Searching Digital Libraries with Language Models
Prof Michalis Vlachos, Faculty of Business and Economics (HEC), Department of Information Systems, University of Lausanne
Chaired by
Assoc Prof Miguel Escobar Varela, Deputy Director, Centre for Computational Social Science and Humanities
- 28 January 2026, 02:00 PM to 03:30 PM
- Room AS7-0102, The Shaw Foundation Building
Abstract:
Digital libraries now hold tens of millions of pages, yet most are still accessed through keyword-based search over potentially noisy OCR text. As collections expand, traditional search interfaces struggle to support meaningful discovery, context, and evidence-based answers.
This talk explores how modern language models can enhance the entire digital library pipeline. We examine how large language models (LLMs) can refine OCR output, clean historical text, and enable natural-language search through retrieval-augmented generation. Using real-world archival data, we show improvements in character and word error rates, as well as downstream gains in retrieval quality, evaluated through both standard metrics and LLM-based judging for faithfulness, correctness, and relevance.
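For readers unfamiliar with the OCR metrics mentioned above, the following is a minimal sketch of how character error rate (CER) and word error rate (WER) are commonly defined: the edit distance between a reference transcription and the OCR output, divided by the reference length. The abstract does not say how the speaker computes these metrics, so the function names (`edit_distance`, `cer`, `wer`) and the sample strings are illustrative assumptions only.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning i reference items into an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate, normalized by the reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate over whitespace-separated tokens (a simplifying assumption)."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


# Hypothetical example: ground truth versus a noisy OCR reading.
truth = "the old library"
ocr = "tbe old 1ibrary"
print(f"CER = {cer(truth, ocr):.3f}, WER = {wer(truth, ocr):.3f}")
```

Under these definitions, lower values are better, and a correction step that repairs the two misread characters in the sample would bring both rates to zero.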
We also look beyond ranking results, discussing evidence-driven answers and immersive, augmented-reality interfaces that open new ways to explore large historical collections. We conclude by reflecting on how these advances can improve transparency, reduce misinformation, and reshape the future of search in digital libraries.
Bio:
Michalis Vlachos is a Professor in the Faculty of Business and Economics at the University of Lausanne. Prior to joining academia, he worked as a Research Staff Member at IBM Research in Zurich and at the IBM T.J. Watson Research Center in New York. He holds a PhD in Computer Science from the University of California, Riverside, as well as an MBA from the University of Illinois at Urbana–Champaign.
His research interests include recommender systems and data science, with a focus on practical and scalable methods. He has published extensively in these areas and holds several US patents. His work has been recognized with multiple technical and best-paper awards, as well as competitive research fellowships and grants, including a Fulbright Scholarship, a Marie-Curie International Reintegration Grant, and an ERC grant on the topic “Exact Mining from Inexact Data.”