Affiliation:
1. Microsoft Corporation, USA
2. University of Amsterdam, The Netherlands
3. Gavagai & SICS Stockholm, Sweden
Abstract
There is an increasing amount of structure on the Web as a result of modern Web languages, user tagging and annotation, and emerging robust NLP tools. These meaningful, semantic, annotations hold the promise to significantly enhance information access, by increasing the depth of analysis of today's systems. Currently, we have only started to explore the possibilities and only begun to understand how these valuable semantic cues can be put to fruitful use. The workshop had an interactive format consisting of keynotes, boasters and posters, breakout groups and reports, and a final discussion, which was prolonged into the evening. There was a strong feeling that we made substantial progress. Specifically, each of the breakout groups contributed to our understanding of the way forward. First, annotations and use cases come in many different shapes and forms depending on the domain at hand, but at a higher level there are remarkable commonalities in annotation tools, indexing methods, user interfaces, and general methodology. Second, we got insights in the "exploitation" aspects, leading to a clear separation between the low-level annotations giving context or meaning to small units of information (e.g., NLP, sentiments, entities), and annotations bringing out the structure inherent in the data (e.g., sources, data schema's, document genres). Third, the plan to enrich ClueWeb with various document level (e.g., pagerank and spam scores, but also reading level) and lower level (e.g., named entities or sentiments) annotations was embraced by the workshop as a concrete next step to promote research in semantic annotations.
Publisher
Association for Computing Machinery (ACM)
Subject
Hardware and Architecture,Management Information Systems