Varun Bhagwan, Tyrone Grandison, et al.
Communications of the ACM
Many real-world analytics problems examine multiple entities or classes that may appear in a corpus. For example, in a customer satisfaction survey analysis there are over 60 categories of (somewhat overlapping) concerns. Each category is backed by a lexicon of terminology associated with the concern (e.g., "Easy, user friendly process" or "Process confusing, too many handoffs"). These lexicons need to be expanded by a subject matter expert because the terminology is not always straightforward (e.g., "handoffs" may also include "ping-pong" and "hot potato" as relevant terms). But given that subject matter expert time is costly, which of the 60+ lexicons should we expand first? We propose a metric for evaluating an existing set of lexicons and identifying which are likely to benefit most from human-in-the-loop expansion. Using our ranking, we achieved a 4× improvement in impact when expanding the first few lexicons on our suggested list compared to a random selection.
Daniel Gruhl, R. Guha, et al.
KDD 2005
Shanu Kumar, Anjali Singh, et al.
WWW 2019
Gabriel Stanovsky, Daniel Gruhl, et al.
EACL 2017