Quotes are an important part of journalism, with news reports often substantiated by citing what different sources say. Research shows that journalists often select sources on the basis of accessibility (availability, time pressure) and quality (credibility, accuracy, topicality), while also paying attention to the balance between sources.
In that sense, there’s value in understanding which sources journalists include and what they quote those sources as saying. There’s also potential benefit in supporting reporters by helping them find a source’s previous statements and compare them with new claims. To support these use cases, we need to extract, analyse, and index quotes computationally. In this post I’ll provide some background before presenting our own LLM-based approach to this problem.

