Luke Zettlemoyer
Nonparametric Language Models: Trading Data for Parameters (and Compute) in Large Language Models
Large language models (LLMs) such as ChatGPT have taken the world by storm, but they are incredibly expensive to train, requiring significant amounts of data and computational resources. They also hallucinate, e.g. by regularly introducing made-up facts, and are difficult to keep up to date over time as the world around them changes. In this talk, I will survey some of our recent work on nonparametric and retrieval-based language models, which are instead designed to be easily extensible and to provide much more careful provenance for their predictions. The key idea is to trade parameters for data: rather than attempting to memorize all the world's facts and knowledge in the learned parameters of a single monolithic LM, we instead provide the model with an explicit knowledge store (e.g. a collection of web pages from Wikipedia) that it can use to look up information in real time. This is a new area where best practices are still forming, but I will argue that retrieval augmentation is a very general idea that can lead to much more efficient training, can provide fundamentally new insights into how LLMs work, and is broadly applicable to a range of settings, including e.g. text-to-image generation. I will also provide, to the best of my ability, a guess about where things are going and what it would take to convince every major LLM to go nonparametric in the near future.
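The central mechanism in the abstract, looking up information from an explicit knowledge store at inference time rather than relying on facts memorized in model parameters, can be illustrated with a minimal sketch. The toy knowledge store, bag-of-words retriever, and prompt format below are illustrative assumptions only, not the systems discussed in the talk; real retrieval-augmented LMs use learned dense retrievers over large corpora such as Wikipedia.

# Minimal sketch of retrieval augmentation (illustrative assumptions only):
# retrieve passages from an explicit knowledge store and condition the LM on
# them, so facts come from the datastore rather than from model parameters.
from collections import Counter
import math

# Tiny stand-in for a real knowledge store (e.g. Wikipedia passages).
knowledge_store = [
    "Wikipedia is a free online encyclopedia maintained by volunteers.",
    "Retrieval-augmented language models query an external datastore at inference time.",
    "Nonparametric language models grow their effective capacity with the size of the datastore.",
]

def bow(text):
    # Bag-of-words vector represented as a token-count dictionary.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, k=2):
    # Score every passage against the query and keep the top k.
    ranked = sorted(knowledge_store, key=lambda p: cosine(bow(query), bow(p)), reverse=True)
    return ranked[:k]

def build_prompt(query):
    # Prepend retrieved passages so the model can ground its answer in them
    # (and cite them as provenance) instead of relying on memorized facts.
    passages = retrieve(query)
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

print(build_prompt("What is a nonparametric language model?"))

Updating such a model then amounts to editing or extending the knowledge store, rather than retraining the parameters of a monolithic LM.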