Using Bloom Filters to Speed Up HITS-Like Ranking AlgorithmsAlgorithms and Models for the Web-Graph (2007), pp. 195-201.
|
Reviews
[Write a review of this article]
There are no reviews of this article
Notes for this articleFor each page u in the web graph, a summary is computed, containing:
- A bloom filter for a sample of in-links of size a (~1000)
- A sample of in-links of size b (~5)
- A bloom filter for a sample of out-links of size c (~1000)
- A sample of out-links of size d (~5)
The samples are done using consistent sampling. Bloom filters are variable-length. This speeds up the computation of the base set for HITS given a root set of results.
Find related articles from these CiteULike users
Find related articles with these CiteULike tags
AbstractThis paper describes a technique for reducing the query-time cost of HITS-like ranking algorithm. The basic idea is to compute for each node in the web graph a summary of its immediate neighborhood (which is a query-independent operation and thus can be done off-line), and to approximate the neighborhood graph of a result set at query-time by combining the summaries of the result set nodes. This approximation of the query-specific neighborhood graph can then be used to perform query-dependent link-based ranking algorithms such as HITS and SALSA. We have evaluated our technique on a large web graph and a substantial set of queries with partially judged results, and found that its effectiveness (retrieval performance) is comparable to the original SALSA algorithm, while its efficiency (query-time speed) is substantially higher.
BibTeX record
RIS record