Content addressable data management
A direct implication of both the industry and academia proclaiming the Age of Tera-(even the Peta)-scale computing, is that applications have become more data intensive than ever. The increased data volume from applications tackling larger and larger problems has fueled the need for efficient management of this data. In this thesis, we evaluate a technique called Content Addressable Storage or CAS, for managing large volumes of data. This evaluation focuses on the benefits and demerits of using CAS for, (i) improved application performance via lockless and lightweight synchronization of accesses to shared storage data; (ii) improved cache performance; (iii) increase in storage capacity; and, (iv) increased network bandwidth. We present the design of a CAS-based file store that significantly improves the storage performance providing lightweight and lock-less user-defined consistency semantics. As a result, our file-system shows a 28% increase in read-bandwidth and a 13% increase in write bandwidth, over a popular file-system in common use. We use the same experimental file-system to analyze CAS on data from real world application benchmarks. We also estimate the potential benefits of using CAS for a virtual machine based user mobility application, that was in active use at a public deployment for over a period of seven months.