In conjunction with ACL-IJCNLP 2015
July 2015, Beijing, China
Narratives are at the heart of information sharing. Ever since people began to share their experiences, they have connected them to form narratives. Modern-day news reports still reflect this narrative structure, but they have proven difficult for automatic tools to summarise, structure, or connect to other reports. This difficulty is partly rooted in the fact that most text processing tools focus on extracting relatively simple structures from the local lexical environment, and concentrate on the document as a unit, or on even smaller units such as sentences or phrases, rather than on cross-document connections. However, current information needs demand a move towards multidimensional and distributed representations which take into account the connections between all relevant elements involved in a “story”. Additionally, most work on cross-document temporal processing focuses on linear timelines, i.e. representations of chronologically ordered events in time. Storylines, though, are more complex, and must take into account temporal, causal and subjective dimensions (e.g., characters’ perspectives, the good versus the bad). How storylines should be represented, how they can be extracted automatically, and how they can be evaluated are open research questions in the NLP and AI communities. In this workshop, we aim to bring together researchers working on representing and extracting narrative structures in news. In particular, we will seek to assess the state of the art in event extraction and linking, as well as in detecting and ranking narratives according to salience.
Motivation: why the topic is of interest now
The reasons for this workshop are threefold:
- Recent advances in NLP technology have made it feasible to look beyond scenario-driven, atomic extraction of events from single documents and work towards extracting story structures from multiple documents, while these documents are published over time as news streams.
- Policy makers, NGOs, information specialists (such as journalists and librarians) and others are increasingly in need of tools that help them find salient stories in large amounts of information, so that they can implement policies more effectively, monitor the actions of “big players” in society, and check facts. Their tasks often revolve around reconstructing cases with respect to either specific entities (e.g. persons or organizations) or events (e.g. Hurricane Katrina).
- Storylines represent explanatory schemas that enable us not only to make better selections of relevant information but also to make projections into the future. They therefore offer valuable potential for exploiting news data.
We invite work on all aspects of generating narrative structures or components thereof from news. This includes topics such as (but not limited to):
|detecting events from news||linguistic expression of relevant events|
|filtering relevant events||cumulation of information from news streams|
|detecting opinions and perspectives||finding trending or serendipitous stories in news|
|tracing perspective change through time||modeling plot structures|
|storyline stability and completeness||annotating storylines|
|temporal or causal ordering of events||linguistic resources for storylines|
|big data as a source for storylines||evaluation of storylines|
|discourse structure and storylines||visualisation of storylines|
|detecting facts and speculations||dynamic event modeling|
Goals and Structure of the workshop
One of the first aims of the workshop is to bring together researchers from different communities (Natural Language Processing, Artificial Intelligence and Humanities) to discuss and share ideas on storyline extraction from news articles. To make the workshop results more effective, we plan to run this as a “working” workshop along the lines of the 1st Workshop on EVENTS. The organizers will provide a set of news articles from different sources, stretching over a period of time and focused on a specific topic, and will ask the participants to provide their own annotations, interpretations, and analyses of this dataset. We will collect these analyses before the workshop and summarize them to facilitate an insightful comparison. We will ask for clear documentation of the annotation schemes so as to enable meaningful comparisons. Furthermore, we will ask participants who have systems and tools for extracting storylines to run them on this common dataset. These results will be compared with the annotated data (only indirect comparisons will be possible). The results of the combined manual and automatic annotation of the common dataset will be used to drive the discussion around three themes:
- Definitions: what is a storyline? how can it be formally and computationally formulated?
- Resources: what are the core markables of a storyline? how should annotation of storylines be performed? can existing annotation schemes be re-used and adapted for storyline annotation? how should we annotate cross-document information concerning events and character perspectives? is it feasible to develop a StoryBank for evaluation?
- Evaluation: how do we determine if an extracted storyline is “good enough”? can standard measures, such as Precision, Recall and F-measure, be applied to evaluate storyline extraction, or do we need different measures? should evaluation take place at a global level, or must it be conducted separately on the different components of a storyline system?
- Tommaso Caselli, VU University Amsterdam
- Marieke van Erp, VU University Amsterdam
- Anne-Lyse Minard, Fondazione Bruno Kessler
- Mark Finlayson, Florida International University
- Ben Miller, Georgia State University
- Jordi Atserias, Yahoo! Barcelona
- Alexandra Balahur, European Commission’s Joint Research Centre
- Piek Vossen, VU University Amsterdam