Is journalism in the text/video/audio business, or is it in the knowledge business? In this class we’ll look at this question in detail, which takes us deep into the issue of how knowledge is represented in a computer. The traditional relational database model is often a poor fit for journalistic work, so we’re going to concentrate on so-called “linked data” representations. Such representations are widely used and increasingly popular; for example, Google recently released the Knowledge Graph. But generating this kind of data from unstructured text is still very tricky, as we’ll see when we look at the Reverb algorithm.
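To make the idea of a “linked data” representation a little more concrete, here is a minimal sketch in Python. It assumes the simplest possible form: each fact is a (subject, predicate, object) triple, roughly in the spirit of RDF. The `ex:` identifiers, the `Triple` type and the `match` helper are all hypothetical names made up for illustration; they are not part of any real vocabulary or library.

```python
from typing import Iterable, Iterator, NamedTuple, Optional


class Triple(NamedTuple):
    """One statement: subject --predicate--> object."""
    subject: str
    predicate: str
    obj: str


# A tiny hand-written knowledge base. The "ex:" names are
# illustrative placeholders, not identifiers from a real vocabulary.
TRIPLES = [
    Triple("ex:TimBernersLee", "ex:invented", "ex:WorldWideWeb"),
    Triple("ex:TimBernersLee", "ex:gaveTalk", "ex:TheNextWebTalk"),
    Triple("ex:Google", "ex:released", "ex:KnowledgeGraph"),
]


def match(
    triples: Iterable[Triple],
    subject: Optional[str] = None,
    predicate: Optional[str] = None,
    obj: Optional[str] = None,
) -> Iterator[Triple]:
    """Yield every triple that agrees with the fields that were given.

    A field left as None acts as a wildcard.
    """
    for t in triples:
        if subject is not None and t.subject != subject:
            continue
        if predicate is not None and t.predicate != predicate:
            continue
        if obj is not None and t.obj != obj:
            continue
        yield t


if __name__ == "__main__":
    # Everything we know about one entity, whatever the predicate.
    for t in match(TRIPLES, subject="ex:TimBernersLee"):
        print(t.subject, t.predicate, t.obj)
```

One thing to notice in this sketch: recording a new kind of fact means appending another triple with a new predicate, whereas a relational table would typically need a new column or a new table first.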
Topics: Structured and unstructured data. Article metadata and schema.org. Linked open data and RDF. Entity extraction. Propositional representation of knowledge. Extracting structured data from unstructured text. The Reverb algorithm. DeepQA. Automatic story writing from data.
- A fundamental way newspaper websites need to change, Adrian Holovaty
- The next web of open, linked data – Tim Berners-Lee TED talk
- Identifying Relations for Open Information Extraction, Fader, Soderland, and Etzioni (Reverb algorithm)
- Standards-based journalism in a semantic economy, Xark
- What the semantic web can represent – Tim Berners-Lee
- Building Watson: an overview of the DeepQA project
- Can an algorithm write a better story than a reporter? Wired, 2012.
Assignment: Entity extraction. Text enrichment experiments using OpenCalais.