Mark Steedman

Inference in Open-Domain Question-Answering

Open-domain question-answering from text corpora like Wikipedia and the Common Crawl generally requires inference. Perhaps the question is "Who owns Twitter?", but the text only talks about people buying (or not buying) that company. To answer the question, we need a structure of "meaning postulates" that includes one that says buying entails ownership. Such structures are commonly (though inaccurately) referred to as "entailment graphs" (EGs). They are inherently directional: the fact that Twitter, Inc. owns Twitter does not imply that it bought it.

There are two approaches: the sacred and the profane. The profane approach is to hope that large language models (LLMs) such as BERT can be fine-tuned for use as "latent" entailment graphs. I'll argue, following Javad Hosseini, Sabine Weber, and Tianyi Li, that we see no evidence so far that LLMs can learn directional entailment (as opposed to bidirectional similarity).

The sacred approach uses machine reading with parsers to extract a (different) Knowledge Graph (KG) of triples representing events or relations that hold between entities, including buying and owning relations, and builds an entailment graph on the basis of distributional inclusion between the triples. Such entailment graphs gain in precision because they are inherently directional, and they are scalable, but they are inherently sparse, because of the Zipfian distribution of everything.
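As a rough illustration of the idea (not the specific method used in this line of work), directional entailment can be scored by distributional inclusion: "buy" entails "own" to the extent that the entity pairs seen as arguments of "buy" are also seen as arguments of "own", but not necessarily vice versa. The predicates and entity pairs below are toy examples, not extracted data.

```python
# Sketch of directional entailment scoring via distributional inclusion
# over argument contexts. All predicate and entity names are illustrative.

def inclusion_score(premise_args, hypothesis_args):
    """Fraction of the premise predicate's argument pairs that also occur
    with the hypothesis predicate. Asymmetric by construction:
    score(p, h) generally differs from score(h, p)."""
    if not premise_args:
        return 0.0
    return len(premise_args & hypothesis_args) / len(premise_args)

# Toy extracted triples: predicate -> set of (subject, object) entity pairs.
kg = {
    "buy": {("musk", "twitter"), ("microsoft", "github")},
    "own": {("musk", "twitter"), ("microsoft", "github"),
            ("twitter_inc", "twitter")},
}

# Every "buy" context also appears with "own", so buy -> own scores 1.0;
# the converse scores only 2/3, capturing the directionality of entailment.
buy_entails_own = inclusion_score(kg["buy"], kg["own"])   # 1.0
own_entails_buy = inclusion_score(kg["own"], kg["buy"])   # 0.666...
```

Sparsity is visible even here: a predicate pair can only be scored if their arguments actually co-occur in the corpus, which the Zipfian distribution makes rare for most pairs.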

I'll discuss some recent work by Nick McKenna investigating the theory of smoothing EGs using LLMs, and use of WordNet/BabelNet to investigate further distributional asymmetries.




Mark Steedman is a Professor in the School of Informatics at the University of Edinburgh. His research is at the interdisciplinary interface of computer science and cognitive science in natural language processing (NLP) and Artificial Intelligence (AI). He has pioneered the application of computational techniques to the analysis of natural language syntax and semantics, and to the analysis of music.

His most widely recognised invention is Combinatory Categorial Grammar (CCG), a computationally practical theory of natural language grammar and processing (Steedman 1985b, 1987a, 1996a, 2000a, 2012a). His work has been recognized in its linguistic aspect by a Fellowship of the British Academy, and in its applied aspect, by Fellowships of the American Association for Artificial Intelligence (AAAI), the Association for Computational Linguistics (ACL), and the Cognitive Science Society. In 2018, Steedman received the Lifetime Achievement Award of the ACL. His students are employed at Google, Facebook, DeepMind, Apple, and Amazon, as well as on the faculties of the world’s leading universities.