Keynotes
We are excited to share that the following speakers have kindly accepted our invitation to give keynote talks at INLG 2025.
Keynote 1: Verena Rieser (Google DeepMind)

Title: The Next Frontier of AI Alignment: Intentional, Plural, Deep
Abstract: We constantly talk about AI alignment, but rarely ask: aligned to what, and to whom? Current practices rely on a single, monolithic "gold standard" of human values. This talk fundamentally challenges that approach and introduces a new framework, advocating for three distinct and necessary expansions of the alignment mandate:
- Intentionality: Making deliberate choices about system goals, moving past implicit defaults.
- Plurality: Engineering systems that handle diverse, conflicting human perspectives.
- Depth: Aligning AI with long-term human well-being, pushing beyond shallow engagement proxies.
This framework represents the next frontier of AI development. We will illustrate its practical application with concrete published examples from my team and other leading researchers in the field.
Short Bio: Verena Rieser is a Senior Staff Research Scientist at Google DeepMind, where she founded the VOICES team (Voices-of-all in alignment). Her team’s mission is to enhance Gemini’s safety and usability for diverse communities. Verena has pioneered work in data-driven multimodal Dialogue Systems and Natural Language Generation, encompassing conversational RL agents, faithful data-to-text generation, spoken language understanding, evaluation methodologies, and applications of AI for societal good. Verena previously directed the NLP lab as a full professor at Heriot-Watt University, Edinburgh, and held a Royal Society Leverhulme Senior Research Fellowship. She earned her PhD from Saarland University.
Keynote 2: Hadas Kotek (Apple)

Title: Evaluating Safety in LLM Text-to-Text Transformations: The View from Misgendering
Abstract: This talk discusses Apple's approach to Responsible AI and its implementation within the Apple Intelligence suite of generative AI products. As part of a comprehensive safety evaluation, the Responsible AI team works to ensure that features do not introduce or reinforce harmful biases or stereotypes. We focus specifically on our methodology for evaluating gendering and misgendering in text-to-text model transformations. Our goal is to ensure that models do not assign gendered pronouns or associations in a stereotypical manner when the input text does not contain a clear indication that this is warranted. To this end, we introduce a benchmark dataset designed to evaluate text-to-text transformations. We will discuss our approach to dataset generation and curation, share results from state-of-the-art Large Language Models, and highlight key insights into where these models perform well, and where challenges remain.
Short Bio: Dr. Hadas Kotek is a Senior Engineering Manager at Apple and a Research Affiliate at the MIT Department of Linguistics. She currently leads the data and evaluation efforts for the Apple Intelligence Responsible AI team. In this role, she focuses on identifying, evaluating, and developing mitigation strategies for harms and biases in customer-facing products that use Apple’s Large Language Models and Diffusion Models. Dr. Kotek has published on diverse topics in Responsible AI and NLP, including gender bias, hallucinations, evaluating how models handle controversial topics, improving human annotation quality, and human-in-the-loop annotation strategies, as well as on topics in Linguistics, including the structure and meaning of questions and experimental approaches to the study of quantification and numerosity. Prior to joining Apple, Dr. Kotek held visiting teaching and research positions in Linguistics at Yale, New York University, and McGill University.
Keynote 3: Minlie Huang (黄民烈; Tsinghua University)

Title: Social Intelligence with LLMs: on Emotion, Mind and Cognition
Abstract: Today’s LLMs are designed as tools to facilitate the efficiency, productivity, and creativity of human work. However, social intelligence, a significant feature of human intelligence, has been largely neglected in current research. Future AGI must have not only machine intelligence but also social intelligence. In this talk, the speaker will discuss how to embrace social intelligence with LLMs, covering emotion understanding, emotional support, behavior simulation, modeling cognition and theory of mind, and applications in mental health.
Short Bio: Dr. Minlie Huang is a professor at Tsinghua University and the deputy director of Tsinghua University's Foundation Model Center. He was supported by the National Distinguished Young Scholar project and has won several awards from Chinese AI and information processing societies, including the Wuwenjun Technical Advancement Award and the Qianweichang Technical Innovation Award. His research fields include large-scale language models, language generation, AI safety and alignment, and social intelligence. He authored the Chinese book "Modern Natural Language Generation". He has published more than 200 papers in premier conferences and journals (ICML, ICLR, NeurIPS, ACL, EMNLP, etc.), with more than 29,000 citations, and has been selected as one of Elsevier China's Highly Cited Scholars since 2022 and for the AI 2000 list of the world's most influential AI scholars since 2020. He has won several best paper awards or nominations at major international conferences (IJCAI, ACL, SIGDIAL, NLPCC, etc.). He was a key contributor to several large foundation models, including ChatGLM, GLM-4.5, GLM-4.1V-Thinking, and CharacterGLM. He serves as an associate editor for TNNLS, TACL, CL, and TBD, and has served as a senior area chair of ACL, EMNLP, IJCAI, and AAAI more than 10 times. His homepage is at http://coai.cs.tsinghua.edu.cn/hml/.
Keynote 4: Michael White (Ohio State University)

Title: Are LLMs Still “Mid”? Two Case Studies on Evaluating LLMs in High-Stakes Conversational Settings
Abstract: Recent large language models (LLMs) have achieved impressive benchmark results—for example, OpenAI reports that its latest models outperform the average physician on HealthBench. Yet hallucination remains a persistent challenge, raising doubts about whether LLMs can be trusted in high-stakes applications. Some commentators have even labeled AI as “mid tech”—“so-so” technology with limited practical value. In this talk, I present two case studies that probe this question in the context of conversational interaction. The first examines how accurately LLMs answer patient questions about colonoscopy preparation—a simpler task than diagnosis but one with significant implications for clinical practice. We find that recent closed models substantially outperform smaller open models and have nearly eliminated temporal reasoning errors but still produce too many harmful mistakes for safe deployment. The second case study, motivated by our virtual museum tour guide project, introduces VISTA Score, a new framework for automatic, turn-based verification in dialogue. VISTA improves upon LLM-as-a-judge and FActScore for hallucination detection while also handling opinions and abstentions more effectively. I conclude by discussing how combining detection and mitigation strategies can move us toward more trustworthy conversational systems.
Short Bio: Dr. Michael White is Professor and Vice Chair in the Department of Linguistics at The Ohio State University. Prior to joining OSU, Dr. White was a Senior Research Fellow at the University of Edinburgh, and before that he was a partner in the pioneering NLG company CoGenTex. His research has focused on NLG in dialogue with an emphasis on surface realization, extending also to paraphrasing for ambiguity avoidance and data augmentation in the context of OSU’s virtual patient dialogue system. His current research is centered on bootstrapping techniques for training reliable dialogue systems with synthetic conversations and automatic evaluation. He co-organized the NSF Workshop on Shared Tasks in NLG, which provided a crucial impetus for the initial shared tasks in NLG, and he was a co-organizer of the first surface realization shared task. From 2018 to 2021, Dr. White collaborated with conversational AI researchers at the company formerly known as Facebook, where he was twice a Visiting Research Scientist. At present, he is just managing to keep his head above water as one of the ACL Rolling Review Editors in Chief.
Keynote 5: Iryna Gurevych (TU Darmstadt)

Title: Please meet AI, our dear new colleague. In other words: can scientists and machines truly cooperate?
Abstract: How can AI and LLMs facilitate the work of scientists at different stages of the research process? Can technology even make scientists obsolete? The role of AI and Large Language Models (LLMs) in science as a target application domain has been growing rapidly. This includes assessing the impact of scientific work, facilitating the writing and revision of manuscripts, and providing intelligent support for manuscript quality assessment, peer review, and scientific discussion. The talk will illustrate such methods and models using several tasks from the scientific domain. We argue that while AI and LLMs can effectively support and augment specific steps of the research process, expert-AI collaboration may be a more promising mode for complex research tasks.
Short Bio: Iryna Gurevych is Professor of Ubiquitous Knowledge Processing in the Department of Computer Science at the Technical University of Darmstadt in Germany. She is also an adjunct professor at MBZUAI in Abu Dhabi, UAE, and an affiliated professor at INSAIT in Sofia, Bulgaria. She is widely known for fundamental contributions to natural language processing (NLP) and machine learning. Professor Gurevych is a past president of the Association for Computational Linguistics (ACL), the leading professional society in NLP. Her many accolades include being a Fellow of the ACL, an ELLIS Fellow, and the recipient of an ERC Advanced Grant. Most recently, she received the Royal Society's 2025 Milner Award for her major contributions to NLP and artificial intelligence, which combine a deep understanding of human language and cognition with the latest paradigms in machine learning.