RANLP 2025 Keynote Speakers:

Eneko Agirre (University of the Basque Country, Spain)
LLMs and low-resource languages
Abstract: Generative AI models are now multilingual, raising new questions about their relative performance across languages and local cultures, especially for communities with fewer speakers. In this talk I will explore some of those questions and the lessons we learned along the way. Is it possible to build high-performing base LLMs for low-resource languages? We have built high-performing open models for Basque based on Llama 3.1 base, accompanied by a fully reproducible end-to-end evaluation suite. Is it possible to instruct them with zero native or machine-translated instructions? We show that raw Basque corpora suffice to adapt Llama 3.1 instruct to Basque with high quality, close to much larger closed models. In the process, we also learned about catastrophic forgetting, found that LLMs do not fully exploit their multilingual potential when prompted in non-English languages, and observed that local knowledge is transferred from the low-resource to the high-resource language. The evaluation suite was recognised with a best resource paper award at ACL 2024.
Bio: Eneko Agirre is Full Professor of Informatics and Head of the HiTZ Basque Center of Language Technology at the University of the Basque Country, UPV/EHU, in San Sebastian, Spain. He has been a visiting researcher or professor at New Mexico State, Melbourne, Southern California, Stanford and New York Universities. He received the Spanish Informatics Research Award in 2021, is a member of Jakiunde, the Basque academy of sciences, and is one of the 95 fellows of the Association for Computational Linguistics (ACL). He was President of ACL's SIGLEX, a member of the editorial boards of Computational Linguistics and the Journal of Artificial Intelligence Research, and Action Editor for the Transactions of the ACL. He is co-founder of the Joint Conference on Lexical and Computational Semantics (*SEM). He is a recipient of three Google Research Awards and six best paper awards and nominations, most recently at ACL 2024. Dissertations under his supervision received best PhD awards from EurAI, the Spanish NLP society and the Spanish Informatics Scientific Association. He has over 200 publications across a wide range of NLP and AI topics, and has given more than 20 invited talks, mostly international.

Preslav Nakov (MBZUAI Abu Dhabi, UAE)
Towards Truly Open, Language-Specific, Safe, Factual, and Specialized Large Language Models
Abstract: First, we will argue for the need for fully transparent open-source large language models (LLMs), and we will describe the efforts of MBZUAI’s Institute on Foundation Models (IFM) towards that, based on the LLM360 initiative. Second, we will argue for the need for language-specific LLMs, and we will share our experience from building Jais, the world’s leading open Arabic-centric foundation and instruction-tuned large language model, Nanda, our open-weights Hindi LLM, Sherkala, our open-weights Kazakh LLM, and some other models. Third, we will argue for the need for safe LLMs, and we will present Do-Not-Answer, a dataset for evaluating the guardrails of LLMs, which is at the core of the safety mechanisms of our LLMs. Fourth, we will argue for the need for factual LLMs, and we will discuss the factuality challenges that LLMs pose. We will then present some recent relevant tools for addressing these challenges developed at MBZUAI: (i) OpenFactCheck, a framework for fact-checking LLM output, for building customized fact-checking systems, and for benchmarking LLMs for factuality, (ii) LM-Polygraph, a tool for predicting an LLM’s uncertainty in its output using cheap and fast uncertainty quantification techniques, and (iii) LLM-DetectAIve, a tool for machine-generated text detection. Finally, we will argue for the need for specialized models, and we will present the zoo of LLMs currently being developed at MBZUAI’s IFM.
Bio: Preslav Nakov is Professor and Department Chair for NLP at the Mohamed bin Zayed University of Artificial Intelligence. He is part of the core team at MBZUAI’s Institute on Foundation Models that developed Jais, the world’s best open-source Arabic-centric LLM, Nanda, the world’s best open-weights Hindi model, Sherkala, the world’s best open-weights Kazakh model, and LLM360, the first truly open LLM (open weights, open data, and open code). Previously, he was Principal Scientist at the Qatar Computing Research Institute, HBKU, where he led the Tanbih mega-project, developed in collaboration with MIT, which aims to limit the impact of “fake news”, propaganda and media bias by making users aware of what they are reading, thus promoting media literacy and critical thinking. He received his PhD degree in Computer Science from the University of California at Berkeley, supported by a Fulbright grant. He is Chair of the European Chapter of the Association for Computational Linguistics (EACL), Secretary of ACL SIGSLAV, and Secretary of the Truth and Trust Online board of trustees. Formerly, he was PC chair of ACL 2022 and President of ACL SIGLEX. He is also a member of the editorial boards of several journals, including Computational Linguistics, TACL, ACM TOIS, IEEE TASL, IEEE TAC, CS&L, NLE, AI Communications, and Frontiers in AI. He authored a Morgan & Claypool book on Semantic Relations between Nominals, two books on computer algorithms, and 250+ research papers. He received a Best Paper Award at ACM WebSci’2022, a Best Long Paper Award at CIKM’2020, a Best Resource Paper Award at EACL’2024, a Best Demo Paper Award (Honorable Mention) at ACL’2020, a Best Task Paper Award (Honorable Mention) at SemEval’2020, a Best Poster Award at SocInfo’2019, and the Young Researcher Award at RANLP’2011. He was also the first to receive the Bulgarian President’s John Atanasoff award, named after the inventor of the first automatic electronic digital computer.
His research has been featured by over 100 news outlets, including Reuters, Forbes, Financial Times, CNN, Boston Globe, Aljazeera, DefenseOne, Business Insider, MIT Technology Review, Science Daily, Popular Science, Fast Company, The Register, WIRED, and Engadget.

Roberto Navigli (Sapienza University of Rome, Italy)
Do Large Language Models Understand Word Meanings?
Abstract: The ability to interpret word meanings in context is a core yet underexplored challenge for Large Language Models (LLMs). While these models demonstrate remarkable linguistic fluency, the extent to which they genuinely grasp word semantics remains an open question. In this talk, we investigate the disambiguation capabilities of state-of-the-art instruction-tuned LLMs, benchmarking their performance against specialized systems designed for Word Sense Disambiguation (WSD), and also evaluating their ability to generate free-form definitions and explanations. We also examine lexical ambiguity as a persistent challenge in Machine Translation (MT), particularly when dealing with rare or context-dependent word senses. Through an in-depth error analysis of both disambiguation and translation tasks, we reveal systematic weaknesses in LLMs, shedding light on the fundamental challenges they face in semantic interpretation. Furthermore, we show the limitations of standard evaluation metrics in capturing disambiguation performance, reinforcing the need for more targeted evaluation frameworks.
Bio: Roberto Navigli is Professor of Natural Language Processing at the Sapienza University of Rome, where he leads the Sapienza NLP Group. He has received two ERC grants on multilingual semantics, highlighted among the 15 projects through which the ERC has transformed science. He has received several prizes, including two Artificial Intelligence Journal prominent paper awards and several outstanding/best paper awards from ACL. He leads the Italian Minerva LLM Project — the first LLM pre-trained in Italian — and is the Scientific Director and co-founder of Babelscape, a successful deep-tech company developing next-generation multilingual NLU and NLG. He is a Fellow of ACL, AAAI, ELLIS, and EurAI, and serves as General Chair of ACL 2025.

Anna Rogers (IT University of Copenhagen, Denmark)
Large language models and factuality
Abstract: People worldwide are increasingly relying on large language models as a part of everyday workflows for discovering and interacting with information, even though the output of these models is not grounded in factuality by design. I will discuss the current directions of work on mitigating this problem, and the impact of large language models on both the information ecosphere and content economy.
Bio: Anna Rogers is an Associate Professor at the IT University of Copenhagen. She holds a PhD degree in Computational Linguistics from the University of Tokyo, followed by postdocs in Machine Learning for NLP (University of Massachusetts) and Social Data Science (University of Copenhagen). Her work focuses on interpretability and robustness of NLP applications based on large language models, as well as their sociotechnical impacts.
Chairs:
Programme Committee Chair: Prof Dr Ruslan Mitkov (Lancaster University, UK and University of Alicante, Spain)
Organising Committee Chair: Prof Dr Galia Angelova (IICT, Bulgarian Academy of Sciences, BG)
Organisers:
Lancaster University, UK
University of Alicante, Spain
Institute of Information and Communication Technologies, BAS, Bulgaria
Bulgarian Association for Computational Linguistics, Bulgaria
RANLP SJR Rank:
The RANLP Proceedings are indexed by Scopus. The SJR rank can be checked on the Scimago Journal Rank (SJR) website.