Identifying high value opportunities for human in the loop lexicon expansion

Alfredo Alba; Daniel Gruhl; Chad Deluca; Linda Kato; Anna Lisa Gentile; Chris Kau; Petar Ristoski; Steve Welch

doi:10.1145/3308560.3317305

WWW 2019

Conference paper

13 May 2019

Abstract

Many real-world analytics problems examine multiple entities or classes that may appear in a corpus. For example, in a customer satisfaction survey analysis there are over 60 categories of (somewhat overlapping) concerns. Each of these is backed by a lexicon of terminology associated with the concern (e.g., "Easy, user friendly process" or "Process confusing, too many handoffs"). These lexicons need to be expanded by a subject matter expert, as the terminology is not always straightforward (e.g., "handoffs" may also include "ping-pong" and "hot potato" as relevant terms). But given that subject matter expert time is costly, which of the 60+ lexicons should we expand first? We propose a metric for evaluating an existing set of lexicons and providing guidance on which are likely to benefit most from human-in-the-loop expansion. Using our ranking results we achieved 4 improvement in impact when expanding the first few lexicons from our suggested list as compared to a random selection.
