Keynote speakers

Kyunghyun Cho (New York University)

"A Generalized Framework of Sequence Generation"

Abstract: In this talk, I will describe a generalized framework under which various sequence generation approaches could be formulated. From this generalized framework, I will derive some of the recently proposed approaches to sequence generation beyond conventional left-to-right monotonic generation. They include parallel decoding via iterative refinement, latent-variable non-autoregressive sequence models, masked language model generation and non-monotonic sequential generation.

Bio: Kyunghyun Cho is an assistant professor of computer science and data science at New York University and a research scientist at Facebook AI Research. He was a postdoctoral fellow at the University of Montreal until summer 2015 under the supervision of Prof. Yoshua Bengio, and received PhD and MSc degrees from Aalto University in early 2014 under the supervision of Prof. Juha Karhunen, Dr. Tapani Raiko and Dr. Alexander Ilin. He tries his best to find a balance among machine learning, natural language processing, and life, but almost always fails to do so.

Ken Church (Baidu)

"Setting Appropriate Expectations: Are Deep Nets Too Hot? Too Cold? Or Just Right?"

Abstract: There is considerable excitement over deep nets, and for good reasons. More and more people are attending more and more conferences on Machine Learning. Deep nets have produced substantial progress on a number of benchmarks, especially in vision and speech. This progress is changing the world in all kinds of ways. Face recognition and speech recognition are everywhere. When friends and family used to ask me about my work, I used to have to explain what AI and speech recognition were, but that's no longer necessary now that everyone has experience with such technologies. Most people know that AI works well in simple cases (monologues with clean speech), but dialogues are proving more challenging than monologues, and the cocktail party effect is likely to remain well beyond the state of the art for some time to come. We created the DIHARD challenge to encourage the community to work on diarization (who spoke when), a problem that many thought was solved, based on some early successes. We were so pleased to see that the community found DIHARD 2018 to be both hard and worthwhile that we did it again (see DIHARD 2019). More generally, we have had some successes, and that is a wonderful thing, but much work remains to be done. As Feynman said, "you must not fool yourself -- and you are the easiest person to fool."

Bio: Kenneth Church has worked on many topics in computational linguistics including: web search, language modeling, text analysis, spelling correction, word-sense disambiguation, terminology, translation, lexicography, compression, speech (recognition, synthesis & diarization), OCR, as well as applications that go well beyond computational linguistics such as revenue assurance and virtual integration (using screen scraping and web crawling to integrate systems that traditionally don't talk together as well as they could such as billing and customer care). He enjoys working with large corpora such as the Associated Press newswire (1 million words per week) and even larger datasets such as telephone call detail (1-10 billion records per month) and web logs. He earned his undergraduate and graduate degrees from MIT, and has worked at AT&T, Microsoft, Hopkins, IBM and Baidu. He was the president of ACL in 2012, and of SIGDAT (the group that organizes EMNLP) from 1993 until 2011. He became an AT&T Fellow in 2001 and an ACL Fellow in 2015.

Preslav Nakov (Qatar Computing Research Institute, HBKU)

"Detecting the "Fake News" Before They Were Even Written"

Abstract: Given the recent proliferation of disinformation online, there has also been growing research interest in automatically debunking rumors, false claims, and "fake news". A number of fact-checking initiatives have been launched so far, both manual and automatic, but the whole enterprise remains in a state of crisis: by the time a claim is finally fact-checked, it could have reached millions of users, and the harm caused could hardly be undone. An arguably more promising direction is to focus on fact-checking entire news outlets, which can be done in advance. Then, we could fact-check the news before they were even written: by checking how trustworthy the outlets that published them are.

We will show how we do this in the Tanbih news aggregator, which makes users aware of what they are reading. In particular, we develop media profiles that show the general factuality of reporting, the degree of propagandistic content, hyper-partisanship, leading political ideology, general frame of reporting, stance with respect to various claims and topics, as well as audience reach and audience bias in social media.

Bio: Dr. Preslav Nakov is a Principal Scientist at the Qatar Computing Research Institute (QCRI), HBKU. His research interests include computational linguistics, "fake news" detection, fact-checking, machine translation, question answering, sentiment analysis, lexical semantics, Web as a corpus, and biomedical text processing. He received his PhD degree from the University of California at Berkeley (supported by a Fulbright grant), and he was a Research Fellow at the National University of Singapore, an honorary lecturer at Sofia University, and research staff at the Bulgarian Academy of Sciences. At QCRI, he leads the Tanbih project, developed in collaboration with MIT, which aims to limit the effect of "fake news", propaganda and media bias by making users aware of what they are reading. Dr. Nakov is the Secretary of ACL SIGLEX and of ACL SIGSLAV, and a member of the EACL advisory board. He is a member of the editorial board of TACL, C&SL, NLE, AI Communications, and Frontiers in AI. He is also on the Editorial Board of the Language Science Press Book Series on Phraseology and Multiword Expressions. He co-authored a Morgan & Claypool book on Semantic Relations between Nominals, two books on computer algorithms, and many research papers in top-tier conferences and journals. He also received the Young Researcher Award at RANLP'2011. He was also the first to receive the Bulgarian President's John Atanasoff award, named after the inventor of the first automatic electronic digital computer. Dr. Nakov's research was featured by over 100 news outlets, including Forbes, Boston Globe, Aljazeera, MIT Technology Review, Science Daily, Popular Science, Fast Company, The Register, WIRED, and Engadget, among others.

Sebastian Padó (Stuttgart University)

"Entities as a Window into (Distributional) Semantics"

Abstract: The distinction between categories (“country”) and entities (“Italy”, “UK”), which originates in formal semantics, can be found, strangely enough, in contemporary NLP: Application areas, like Information Extraction or Semantic Web, care predominantly about entities, while work in lexical semantic modelling has concentrated primarily on concepts. I argue that lexical semantics, in particular distributional approaches, has a lot to gain from taking a closer look at entities, since distributional models of entities provide an interesting window into the relationships among words, entities, and categories. I will discuss insights from two kinds of studies: on knowledge base completion and on modeling the semantic relation of instantiation.

Bio: Sebastian Padó is Professor of Computational Linguistics at Stuttgart University. He studied in Saarbrücken and Edinburgh and was a postdoctoral scholar at Stanford University. His core research concerns learning, representing, and processing semantic knowledge (broadly construed) from and in text. Examples include modeling linguistic phenomena, discourse structure, inference, semantic processing, and cross-lingual perspectives on semantics.

Hinrich Schütze (Ludwig Maximilian University, Munich)

"Teaching Deep Networks Lexical Semantics: The Easy Way or the Hard Way?"

Abstract: At the lowest level of deep learning architectures for natural language processing, text is represented by an embedding layer. Embedding units of different granularities are in use, including characters, subwords, words and phrases. After an analysis of the pros and cons of different granularities for accurate representation of lexical semantics, we will look at BERT, a contextualized embedding model whose embedding units are subwords. I will discuss problems we have found with BERT subwords and show how these problems can be solved by attentive mimicking (Schick & Schütze 2019), which mitigates the effect of poorly trained subword representations.

Bio: Hinrich Schütze is professor of computational linguistics and director of the Center for Information and Language Processing at LMU Munich in Germany. Before moving to Munich in 2013, he taught at the University of Stuttgart. He received his PhD in Computational Linguistics from Stanford University in 1995 and worked on natural language processing and information retrieval technology at Xerox PARC, at several Silicon Valley startups and at Google, 1995-2004 and 2008/9. He is a coauthor of Foundations of Statistical Natural Language Processing (with Chris Manning) and Introduction to Information Retrieval (with Chris Manning and Prabhakar Raghavan).