Distributed algorithms for peer-to-peer systems
In recent years, peer-to-peer (P2P) systems have emerged as a powerful networking paradigm that allows a multitude of resources to be shared at the edge of the Internet in a completely distributed manner. The unprecedentedly large scale of P2P systems demands novel algorithmic solutions to the challenges such systems pose. Although file sharing is still the most prevalent P2P application, a plethora of applications (some of them yet to be discovered) can profit from the P2P paradigm. In this thesis, we present the design and evaluation of distributed algorithms for four problems related to resource management, distributed computation, and data location and replication in P2P systems.
We first discuss the problem of incorporating locality in distributed hash tables (DHTs), a common way of implementing structured P2P networks. We propose a two-level overlay P2P network in which a global overlay provides an information repository for the operation and maintenance of local overlays and directs peers to local overlays. A local overlay acts as a locality-aware cache for the global overlay, grouping peers that are close together in the underlying network. Local overlays are constructed by exploiting the structure of the Internet as a collection of Autonomous Systems (ASes). We present a detailed experimental study that demonstrates response-time improvements of up to 50% compared to a single overlay. Our contributions also include efficient distributed algorithms for maintaining local overlays in the presence of node arrivals and departures.
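The idea of a local overlay acting as a cache in front of the global overlay can be sketched as follows. This is a minimal illustration only, not the system described in the thesis: plain dictionaries stand in for the global DHT and for each per-AS local overlay, and the class name, method names, and AS numbers are all hypothetical.

```python
class TwoLevelOverlay:
    """Toy model: one global store plus one local cache per Autonomous System."""

    def __init__(self):
        self.global_store = {}    # stand-in for the global DHT (assumption)
        self.local_caches = {}    # AS number -> dict, stand-in for a local overlay
        self.global_lookups = 0   # counts how often the global overlay is consulted

    def put(self, key, value):
        # Objects are always published in the global overlay.
        self.global_store[key] = value

    def get(self, as_number, key):
        # Try the requester's local overlay first.
        cache = self.local_caches.setdefault(as_number, {})
        if key in cache:
            return cache[key]
        # Local miss: fall back to the global overlay and cache the result locally.
        self.global_lookups += 1
        value = self.global_store.get(key)
        if value is not None:
            cache[key] = value
        return value
```

In this toy model, a second request for the same key from a peer in the same AS is served locally and never reaches the global overlay, which is the source of the response-time benefit a locality-aware cache is meant to provide.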
As a second problem, we explore the geometry of a DHT (the way in which neighbors and routes are chosen in the DHT) to build PQUESTER, a P2P system for decomposing high-dimensional binary matrices. Binary matrix decomposition can be seen as a preprocessing phase for more expensive data mining techniques, or it can provide patterns that can be directly interpreted in more specific scenarios. In PQUESTER, peers compute rank-one approximations of their local matrices and go through consolidation phases to achieve global rank-one approximations. In a consolidation phase, peers find similar patterns that can be merged to achieve a common rank-one approximation across multiple peers. We show, through extensive experimentation, that PQUESTER achieves levels of precision and recall similar to those of a serial implementation, while discovering only a few extra patterns. Our experiments also show that PQUESTER achieves significant speedup while keeping the load balanced across peers.
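To make the notion of a rank-one approximation of a binary matrix concrete, the sketch below uses a simple alternating heuristic: given a binary matrix, it searches for a binary presence vector x (which rows contain the pattern) and a binary pattern vector y such that the outer product of x and y differs from the matrix in as few entries as possible. This is one common heuristic and is offered as an assumption; the abstract does not state which procedure peers use locally, and the function names and the initialization rule are illustrative.

```python
def rank_one_approx(rows, max_iters=100):
    """Alternating heuristic for a binary rank-one approximation.

    rows: list of equal-length lists of 0/1 values.
    Returns (x, y): x[i] = 1 if row i contains the pattern, y = the pattern.
    """
    m, n = len(rows), len(rows[0])
    # Initialize the pattern with the densest row (an arbitrary choice).
    y = list(max(rows, key=sum))
    x = [0] * m
    for _ in range(max_iters):
        size_y = sum(y)
        # With y fixed, setting x[i] = 1 lowers the error only if row i
        # matches more than half of the ones in y.
        new_x = [1 if 2 * sum(a * b for a, b in zip(row, y)) > size_y else 0
                 for row in rows]
        size_x = sum(new_x)
        # With x fixed, column j joins the pattern only if more than half
        # of the selected rows have a one in that column.
        new_y = [1 if 2 * sum(rows[i][j] for i in range(m) if new_x[i]) > size_x
                 else 0
                 for j in range(n)]
        if new_x == x and new_y == y:
            break  # reached a fixed point
        x, y = new_x, new_y
    return x, y


def hamming_error(rows, x, y):
    """Number of entries where the matrix differs from the outer product of x and y."""
    return sum(abs(rows[i][j] - x[i] * y[j])
               for i in range(len(rows))
               for j in range(len(rows[0])))
```

The heuristic converges to a local optimum that depends on the initial pattern, so a practical system would typically apply it recursively or from several starting points.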
Several characteristics of unstructured P2P networks, e.g., topology flexibility, have led to their adoption in many file-sharing applications. Unfortunately, such networks do not guarantee that shared content can be located. We present a simple but highly effective protocol for object location that gives probabilistic guarantees of finding even rare objects in the network. The protocol relies on randomized techniques for replication of objects (or their references) and for query propagation. We prove analytically, and demonstrate experimentally, that our scheme provides high probabilistic guarantees of success while incurring minimal overhead.
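The flavor of such probabilistic guarantees can be illustrated with a standard birthday-paradox style calculation. The sketch below is an assumption-laden model, not the protocol of the thesis: it supposes that references to an object are placed on `replicas` peers chosen uniformly at random out of `n`, and that a query probes `probes` distinct peers chosen uniformly at random. The function names and parameters are hypothetical.

```python
import math
import random


def miss_probability(n, replicas, probes):
    """Exact probability that none of the probed peers holds a replica.

    Assumes replicas and probes are independent uniform samples, each without
    replacement, from n peers (a modeling assumption).
    """
    if probes > n - replicas:
        return 0.0  # more probes than non-holders: a hit is certain
    return math.comb(n - replicas, probes) / math.comb(n, probes)


def simulate(n, replicas, probes, trials, seed=0):
    """Monte Carlo estimate of the query success rate under the same model."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        holders = set(rng.sample(range(n), replicas))
        probed = rng.sample(range(n), probes)
        if any(peer in holders for peer in probed):
            hits += 1
    return hits / trials
```

Under this model the miss probability is at most exp(-replicas * probes / n), so choosing both quantities on the order of the square root of n already makes a miss unlikely; for example, with n = 10000 and 200 replicas and probes, the bound is exp(-4), or under 2%.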
Distributed P2P storage systems rely on the voluntary participation of peers to effectively manage a storage pool. If disk space on these peers is not carefully monitored and provisioned, the system may not be able to provide availability for certain files. In particular, identifying and eliminating redundant data are important problems that may arise in long-lived systems. We address the problem of duplicate elimination in systems connected over an unstructured P2P network in which there is no a priori binding between an object and its location. We present two randomized protocols that solve this problem in a scalable and decentralized fashion without compromising the availability requirements of the application. Performance results, obtained from both large-scale simulations and a prototype built on PlanetLab, demonstrate that the protocols provide high probabilistic guarantees of success while incurring minimal administrative overhead.