PADMINI: A peer-to-peer distributed data mining system for astronomy researchers
As the amount of data available at geographically distributed sources increases rapidly, the need for efficient distributed data mining is becoming increasingly important. Increasing computational power at lower hardware cost and reliable communication mechanisms have also led to the proliferation of Peer-to-Peer networks. These factors have led to the development of dedicated distributed solutions that can run on Peer-to-Peer networks. Many domains, such as finance, astronomy, and bioinformatics, face varied challenges where such solutions can prove instrumental. This thesis presents PADMINI, a Peer-to-Peer Astronomy Data Mining system. Unlike centralized data mining systems, PADMINI is a web-based system powered by Google Sky and distributed data mining algorithms that run on a collection of computing nodes. PADMINI supports two disparate frameworks, namely Hadoop and the Distributed Data Mining Toolkit. These frameworks enable PADMINI to support a wide range of data mining algorithms. This work presents solutions implemented on PADMINI for specific data mining problems such as Outlier Detection and Classifier Learning. The PADMINI system can also be used to learn classification models from any source of data over the internet, without requiring any kind of support from the host servers. Experimental results that establish the correctness of the solutions and the scalability of the PADMINI system are also provided.