Probabilistic text modeling with Orthogonalized topics
Enpeng Yao, Guoqing Zheng, et al.
SIGIR 2014
Community discovery on large-scale linked document corpora has been a hot research topic for decades. There are two types of links. The first, which we call d2d-link, indicates connections among different documents, such as blog references and research paper citations. The other, which we call u2u-link, represents the co-occurrence or simultaneous participation of different users in one document; typically, each document in a u2u-link corpus has more than one user or author. Examples of u2u-link data include email archives and research paper co-authorship networks. Community discovery in d2d-link data has achieved much success, whereas existing methods for u2u-link data either make no use of the textual content of the documents or make oversimplified assumptions about the users and the textual content. In this paper, we propose a general approach to community discovery for u2u-link data, i.e., multi-user data, by placing topical variables on multiple authors' participation in documents. Experiments on a research proceedings co-authorship corpus and a New York Times news corpus show the effectiveness of our model.