Lecture 6: Structured Journalism and Knowledge Representation

Is journalism in the text/video/audio business, or is it in the knowledge business? In this class we’ll look at this question in detail, which takes us deep into the issue of how knowledge is represented in a computer. The traditional relational database model is often inappropriate for journalistic work, so we’re going to concentrate on so-called “linked data” representations. Such representations are widely used and increasingly popular. For example, Google recently released the Knowledge Graph. But generating this kind of data from unstructured text is still very tricky, as we’ll see when we look at the Reverb algorithm.
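To make the idea of a “linked data” representation a little more concrete, here is a minimal sketch in plain Python. It assumes facts are stored as subject-predicate-object triples, the shape used by RDF. The `ex:` identifiers, the example facts, and the `match` helper are all invented for illustration and are not part of any real dataset or library.

```python
# Hypothetical facts, invented for illustration only.
# Each fact is a (subject, predicate, object) triple.
triples = {
    ("ex:Alice", "ex:worksFor", "ex:ExampleCorp"),
    ("ex:Alice", "ex:livesIn", "ex:Springfield"),
    ("ex:ExampleCorp", "ex:headquarteredIn", "ex:Springfield"),
}


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern, sorted; None is a wildcard."""
    return sorted(
        (s, p, o)
        for (s, p, o) in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    )


# A new kind of fact needs no schema change: just add another triple.
triples.add(("ex:Alice", "ex:quotedIn", "ex:Story42"))

about_alice = match(triples, subject="ex:Alice")
in_springfield = match(triples, obj="ex:Springfield")
```

The point of the sketch is the contrast with a relational table: there is no fixed set of columns, so a previously unanticipated relationship (here the hypothetical `ex:quotedIn`) can be recorded without redesigning anything.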

Topics: Structured and unstructured data. Article metadata and schema.org. Linked open data and RDF. Entity extraction. Propositional representation of knowledge. Extracting structured data from unstructured text. The Reverb algorithm. DeepQA. Automatic story writing from data.

Slides (PDF)

Readings

Recommended

Assignment: Entity extraction. Text enrichment experiments using OpenCalais.
