Two applications of information complexity
T.S. Jayram, Ravi Kumar, et al.
STOC 2003
The Web harbors a large number of communities - groups of content-creators sharing a common interest - each of which manifests itself as a set of interlinked Web pages. Newgroups and commercial Web directories together contain of the order of 20,000 such communities; our particular interest here is on emerging communities - those that have little or no representation in such fora. The subject of this paper is the systematic enumeration of over 100,000 such emerging communities from a Web crawl: we call our process trawling. We motivate a graph-theoretic approach to locating such communities, and describe the algorithms, and the algorithmic engineering necessary to find structures that subscribe to this notion, the challenges in handling such a huge data set, and the results of our experiment.
T.S. Jayram, Ravi Kumar, et al.
STOC 2003
Soumen Chakrabarti, Byron Dom, et al.
VLDB Journal
Don Coppersmith, Prabhakar Raghavan
Operations Research Letters
Lydia E. Kavraki, Jean-Claude Latombe, et al.
Journal of Computer and System Sciences