The tutorials will take place as part of the 14th biennial RANLP conference (RANLP 2023) on 2 and 3 September 2023 in the city of Varna, Bulgaria, and will be delivered by distinguished lecturers.
We plan half-day tutorials, each with a duration of 220 minutes, distributed as follows: 60 min talk + 20 min break + 60 min talk + 20 min break + 60 min talk.
Hate or not Hate? Beyond Binary Hate Speech Detection
Offensive content on the internet, encompassing hate speech and cyberbullying, is a pervasive issue worldwide. Consequently, it has garnered significant attention from the machine learning (ML) and natural language processing (NLP) communities. As a result, numerous systems have been developed and trained to automatically identify potentially harmful content in order to mitigate its impact. Most of these systems classify posts as either harmful or not harmful. In this tutorial, we discuss the way forward for detecting harmful content using machine learning approaches. We will discuss the current state-of-the-art models and benchmarks, followed by ways to improve the explainability and usability of machine learning models.
Bridging the explainability gap in NLP
Iacer Calixto (University of Amsterdam, Netherlands)
Abstract: This tutorial covers the topic of explainability in Natural Language Processing (NLP). In machine learning, an algorithm is said to be more explainable the more understandable to humans are the steps it takes to map input variables onto predictions (i.e., the more its reasoning ‘makes sense’). However, the best performing models for virtually any NLP task, be it a natural language understanding (NLU) task such as sentiment classification or a natural language generation (NLG) task such as machine translation, invariably use large language models (LLMs) with the Transformer architecture. Transformer LLMs are complex neural network architectures that are known to be opaque and hard to interpret. In this tutorial we will cover the main challenges in bridging the gap between Transformer LLMs and explainable-by-design methods for NLP, also referred to as self-explaining or directly interpretable methods. Approximately half of the tutorial will be spent presenting and discussing a few representative methods that try to bridge ‘explainability-by-design’ and the Transformer architecture. The other half will be spent implementing these methods, validating when they work and when they do not, and examining their strengths and weaknesses.
Short bio: Iacer Calixto holds a PhD from Dublin City University (2017) on multi-modal machine translation. He has since been a post-doctoral fellow at the Institute for Logic, Language and Computation at the University of Amsterdam (2018-2022), a visiting post-doctoral fellow at the Center for Data Science at New York University (2019-2022), and a Marie Skłodowska-Curie Global Fellow (2019-2022). Currently, he is an assistant professor at the Amsterdam University Medical Centers at the University of Amsterdam, where he works on methods for responsible and explainable machine learning and natural language processing for problems in medicine and psychology.
Nishant Mishra (University of Amsterdam, Netherlands)
Short bio: Nishant is pursuing his PhD in Responsible AI and NLP for healthcare at the Amsterdam UMC, University of Amsterdam (NL). Before this, he obtained his MSc in Computer Science at McGill University (Canada) in 2022, with a focus on deep learning. His master’s thesis concerned efficient and robust tumor detection in gigapixel pathology slides. Nishant has research experience in both industry and academia.
As part of his PhD, he will explore ways to make NLP models (and AI in general) more explainable and robust, enhancing their usability in niche and sensitive application areas such as healthcare.
Fighting Mis/Disinformation in the Era of Large Language Models: Challenges and Opportunities
Preslav Nakov (Mohamed bin Zayed University of Artificial Intelligence MBZUAI, Abu Dhabi, UAE)
Abstract: With the rise of large language models (LLMs), the battle against misinformation and disinformation has become increasingly complex. In this talk, I will discuss recent and ongoing work on combatting false information propagated through LLMs and on detecting machine-generated content, as well as the risks, challenges, and opportunities this brings. First, I will present a novel paradigm for fact-checking complex claims with program-guided reasoning, which turns a complex claim into a Python program for fact-checking it, which we then execute. I will then discuss the risk of misinformation pollution with large language models, and how we can protect against this. Next, I will introduce a novel approach for faking fake news with the aim of real fake news detection by means of propaganda-loaded training data generation. I will also introduce SCITAB, a new benchmark for compositional reasoning and claim verification on scientific tables. Finally, I will discuss the related problem of detecting machine-generated text: I will first present the M4 corpus for multi-generator, multi-domain, and multi-lingual black-box machine-generated text detection, and then I will describe methods for leveraging log rank information for zero-shot detection of machine-generated text.
Short bio: Preslav Nakov is Professor of NLP at MBZUAI. Nakov obtained an MSc from Sofia University “St. Kl. Ohridski” and a PhD in computer science from the University of California, Berkeley. He was the first person to receive the prestigious John Atanasoff Presidential Award, given by the President of Bulgaria for achievements in the development of the information society. Nakov’s research interests are disinformation, propaganda, fake news and media bias detection, fact checking, machine translation, question answering, sentiment analysis, and lexical semantics.
Personality Detection from Texts
Sanja Štajner (Karlsruhe, Germany)
Short bio: Sanja Štajner has over 14 years of research experience across academia and industry on various psycholinguistic topics in NLP. She holds a multiple Master's degree in Natural Language Processing & Human Language Technologies from the Autonomous University of Barcelona (Spain) and the University of Wolverhampton (UK), and a PhD degree from the University of Wolverhampton (UK). Sanja has authored over 80 peer-reviewed articles in leading international journals and conferences. Since 2018, she has led and participated in many industry-oriented projects that combined psychology and NLP, focusing on personality modelling, sentiment analysis, emotion detection, and mental health assessment. Sanja served as a COLING 2018 area chair for the psycholinguistics and cognitive modelling track, and as an ACL 2022 demo chair. She has experience as a tutorial presenter (EACL 2023, COLING 2018, AIST 2018, RANLP 2017) for international audiences and as a lecturer at Master's and PhD levels.