Cloud Connect live: Data Storage and the Cloud

James Duncan and Jason Hoffman of Joyent review the state of storage in the cloud. Jason Hoffman gives an overview of where we are, how we got here, and where things might be going. He describes the evolution of cloud computing as follows:

1995 – Internet – Cloud Networking – Network IO turned into a utility

2005 – Intercomputer – Cloud Computing – RAM, CPU and disk I/O

2015 – Interdata – Cloud data management, governance, policy, etc.

Our ability to generate data and our appetite for storage are unlimited. Figuring out which data is unique and which is redundant is critical, as is figuring out which data is important.
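One common way to tell unique data from redundant data is to key each blob by a hash of its content. The sketch below is only an illustration of that idea, not something shown in the talk: the `DedupStore` class and its method names are hypothetical, and a real system would also have to deal with chunking, persistence and garbage collection.

```python
import hashlib


class DedupStore:
    """Toy in-memory store that keeps one copy of each distinct blob (hypothetical)."""

    def __init__(self):
        self._blobs = {}  # content digest -> bytes
        self._names = {}  # logical name -> content digest

    def put(self, name, data):
        """Store data under name; return True only if the content was new."""
        digest = hashlib.sha256(data).hexdigest()
        is_new = digest not in self._blobs
        if is_new:
            self._blobs[digest] = data
        # Redundant content just adds another pointer to the existing blob.
        self._names[name] = digest
        return is_new

    def get(self, name):
        """Return the bytes stored under name."""
        return self._blobs[self._names[name]]

    def unique_bytes(self):
        """Total size of the distinct blobs actually held."""
        return sum(len(blob) for blob in self._blobs.values())
```

Storing the same bytes under two names costs the space of one copy; `put` reports whether the content was already known.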

The key issues in scaling storage, which are driving many new storage solutions such as cloud storage and NoSQL platforms, are administrative, geographic, load and capacity.

James Duncan delves into some of the platforms being used for cloud storage:

A popular way to improve storage access is Memcached, which protects expensive backend I/O by caching frequently accessed data in memory. Many people are migrating to Redis, which has richer functionality and better durability.
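The caching idea can be sketched as a read-through ("cache-aside") lookup: check the cache first, and only hit the backend on a miss. This is a minimal sketch under stated assumptions: `TTLCache` is an in-process stand-in for a cache such as Memcached, not its client API, and `CountingBackend` is a hypothetical placeholder for the expensive store.

```python
import time


class TTLCache:
    """In-process stand-in for an in-memory cache (hypothetical helper)."""

    def __init__(self, ttl_seconds=60.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._items = {}  # key -> (expires_at, value)

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._items[key]  # expired entries count as misses
            return None
        return value

    def set(self, key, value):
        self._items[key] = (self._clock() + self._ttl, value)


class CountingBackend:
    """Placeholder for an expensive database or disk read (hypothetical)."""

    def __init__(self):
        self.reads = 0

    def read(self, key):
        self.reads += 1
        return "value-for-" + key


def cached_read(cache, backend, key):
    """Serve from the cache when possible; fall back to the backend on a miss."""
    value = cache.get(key)
    if value is None:
        value = backend.read(key)
        cache.set(key, value)
    return value
```

Repeated reads of the same key reach the backend once per TTL window, which is how the cache shields the backend I/O. Because the cache holds data only in memory, everything in it has to be recoverable from the backend.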

Eventually consistent document stores include Mongo and Riak, which is clustered and configurable but has no indexes. Others are Project Voldemort, Cassandra and Hadoop. They have a lot in common but differ in the details: for example, Riak is excellent at reads, whereas Voldemort excels at write performance.

Blobstores are scalable object stores like S3; they include MogileFS and OpenStack’s object store, which is based on Rackspace’s Cloud Files.

Ceph is an interesting project that is in the mainline Linux kernel. It is more production worthy than its “alpha” status indicates and is possibly the closest open source contender for building an S3-like object store in house.

Jason: Test reliability and durability of data by unplugging systems and seeing what happens. For example, Mongo is known to lose data under these circumstances.
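The unplug test boils down to one check: after the crash, is every write the system acknowledged still readable? The sketch below simulates that against a toy store and is not a harness from the talk. `BufferedStore` is hypothetical and deliberately acknowledges writes before they are durable; against a real system, `crash()` would be replaced by actually cutting power.

```python
class BufferedStore:
    """Toy store that acknowledges writes at once but persists only on flush()."""

    def __init__(self):
        self._durable = {}  # survives a crash
        self._buffer = {}   # lost on a crash

    def write(self, key, value):
        self._buffer[key] = value
        return True  # acknowledged before the data is durable

    def flush(self):
        self._durable.update(self._buffer)
        self._buffer.clear()

    def crash(self):
        self._buffer.clear()  # simulated power loss drops unflushed writes

    def read(self, key):
        if key in self._buffer:
            return self._buffer[key]
        return self._durable.get(key)


def lost_after_crash(store, writes, flush_every):
    """Apply writes, crash the store, and return the acknowledged keys that were lost."""
    acknowledged = {}
    for count, (key, value) in enumerate(writes, start=1):
        if store.write(key, value):
            acknowledged[key] = value
        if flush_every and count % flush_every == 0:
            store.flush()
    store.crash()
    return sorted(k for k, v in acknowledged.items() if store.read(k) != v)
```

An empty result means nothing acknowledged was lost. Any key in the list is a write the client was told had succeeded but that did not survive the crash.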