I can parse you: Grammars for dialogs
Martin Hirzel, Louis Mandel, et al.
SNAPL 2017
The amount of information available on the Web and inside companies is expanding rapidly. As Cloud infrastructure matures (including tools for parallel data processing, text analytics, clustering, etc.), there is growing interest in integrating data to produce higher-value content. This raises new challenges, notably entity matching over large volumes of heterogeneous data. In this paper, we describe an approach for entity matching over large amounts of semistructured data in the Cloud. The approach combines ChuQL[4], a recently proposed extension of XQuery with MapReduce, and a blocking technique for entity matching that can be executed efficiently on top of MapReduce. We illustrate the approach by applying it to automatically extract and enrich references in Wikipedia, and we report on an experimental evaluation. © 2012 ACM.
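To make the idea of blocking on top of MapReduce concrete, here is a minimal sketch in plain Python. It is not the paper's ChuQL code and it does not run on a real MapReduce cluster: the map, shuffle, and reduce steps are simulated in memory. The blocking key (first three letters of the title), the similarity measure, the 0.8 threshold, and the sample records are all assumptions chosen for illustration.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def block_key(record):
    # Hypothetical blocking key: first three letters of the lowercased title.
    return record["title"].strip().lower()[:3]


def map_phase(records):
    # Map: emit (blocking key, record) so candidate duplicates share a key.
    for record in records:
        yield block_key(record), record


def shuffle(pairs):
    # Shuffle: group records by blocking key, as a MapReduce runtime would.
    groups = defaultdict(list)
    for key, record in pairs:
        groups[key].append(record)
    return groups


def similarity(a, b):
    # Assumed similarity measure: character-level ratio on lowercased titles.
    return SequenceMatcher(None, a["title"].lower(), b["title"].lower()).ratio()


def reduce_phase(groups, threshold=0.8):
    # Reduce: compare records only within a block, not across the whole dataset.
    matches = []
    for records in groups.values():
        for a, b in combinations(records, 2):
            if similarity(a, b) >= threshold:
                matches.append((a["id"], b["id"]))
    return matches


# Illustrative records (made up for this sketch).
records = [
    {"id": 1, "title": "Entity Matching in the Cloud"},
    {"id": 2, "title": "Entity matching in the cloud."},
    {"id": 3, "title": "Grammars for Dialogs"},
]
matches = reduce_phase(shuffle(map_phase(records)))
```

The point of blocking is that pairwise comparison happens only inside each block, so the number of comparisons depends on block sizes rather than on the square of the full dataset size. The trade-off is that a poorly chosen key can place true duplicates in different blocks, where they are never compared.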