Integrating clustering with page ranking
The enormous growth in the number of documents on the World Wide Web increases the need for improved link navigation and analysis. One important technique for analyzing web link structure is page rank computation. This thesis studies the conventional page ranking algorithm and designs a new page ranking algorithm that incorporates a clustering technique.
An eye tracking study conducted by the search marketing firms Enquiro and Did-it and the eye tracking firm Eyetool showed that the vast majority of eye tracking activity during a search happens in a triangle at the top of the search results page; this area of maximum interest is called the "Golden Triangle". The results of this study have been used to define a clustering technique, which is incorporated into the conventional page ranking algorithm. The new page ranking algorithm has been implemented, and analysis of the results reveals that it requires fewer iterations than the conventional method, thus reducing computation time. The experimental results are based on the SUNYIT.EDU web graph, which consists of 1000 nodes and approximately 24,300 links. This web graph was crawled using an optimized crawler designed for this particular system implementation.
A comparison of the search results for a query, based on the page ranks computed by both page ranking techniques, shows that the relative positions of web pages are the same. This means that incorporating clustering into page ranking reduces the number of iterations, saving computation cost and time, without compromising the quality of search results or the overall user experience with the search engine.
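For orientation, the two quantities compared above (the number of iterations until convergence and the relative order of pages by rank) can be illustrated with a minimal sketch of conventional iterative page ranking. This is not the implementation used in this thesis, and it does not include the clustering step, which is not described in this summary. The function names, the toy graph, the damping factor of 0.85, and the convergence tolerance are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, tol=1e-8, max_iter=1000):
    """Conventional iterative page rank (illustrative sketch).

    links maps each page to the list of pages it links to.
    Returns (ranks, iterations), where iterations is the number of
    passes performed before the total rank change dropped below tol.
    damping and tol are assumed values, not taken from the thesis.
    """
    nodes = sorted(set(links) | {v for outs in links.values() for v in outs})
    n = len(nodes)
    ranks = {u: 1.0 / n for u in nodes}
    for iteration in range(1, max_iter + 1):
        # Rank held by pages without out-links is spread uniformly.
        dangling = sum(ranks[u] for u in nodes if not links.get(u))
        new = {u: (1.0 - damping) / n + damping * dangling / n for u in nodes}
        for u in nodes:
            outs = links.get(u, [])
            for v in outs:
                # Each page passes an equal share of its rank along every link.
                new[v] += damping * ranks[u] / len(outs)
        delta = sum(abs(new[u] - ranks[u]) for u in nodes)
        ranks = new
        if delta < tol:
            return ranks, iteration
    return ranks, max_iter


def ordering(ranks):
    """Pages sorted from highest to lowest rank (ties broken by name)."""
    return sorted(ranks, key=lambda u: (-ranks[u], u))


# Hypothetical four-page graph, used only to exercise the sketch.
toy_links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks, iterations = pagerank(toy_links)
print(ordering(ranks), iterations)
```

Two ranking methods can be compared in the sense used above by running each on the same graph, recording the iteration counts, and checking whether `ordering` returns the same list for both sets of ranks.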