Storage-oriented clusters present unique challenges to the implementation of storage management. Such clusters manage a vast amount of data, most of which is located on secondary storage. Manual storage management in storage-oriented cluster environments is complex, error-prone and tedious. As a result, there is a clear need for automatic storage management (garbage collection) for such clusters. A garbage collector for use in a storage-oriented cluster must be safe, complete and scalable, even in the presence of distributed cycles of garbage in secondary storage. Of the few extant distributed secondary storage garbage collectors, none meets all of these goals whilst also operating efficiently. This paper describes the design and implementation of a new distributed garbage collector, based on the train algorithm, intended specifically for storage-oriented clusters. The collector extends the train algorithm with three mechanisms: an asynchronous distributed termination detection algorithm for detecting isolated trains, a mechanism for deferring metadata updates, and a new external root tracking mechanism that permits interaction with clients that cache and swizzle pointers. Our experiments demonstrate that these extensions successfully adapt the train algorithm for efficient operation in a storage-oriented cluster, fulfilling the stated goals of safety, completeness and scalability.
Cite as: Brodie-Tyrrell, W., Detmold, H., Falkner, K. and Munro, D.S. (2004). Garbage Collection for Storage-Oriented Clusters. In Proc. Twenty-Seventh Australasian Computer Science Conference (ACSC2004), Dunedin, New Zealand. CRPIT, 26. Estivill-Castro, V., Ed. ACS. 99-108.