Dual Debiasing: Remove Stereotypes and Keep Factual Gender for Fair Language Modeling and Translation Tomasz Limisiewicz, David Mareček and Tomáš Musil
Do My Eyes Deceive Me? A Survey of Human Evaluations of Hallucinations in NLG Patricia Schmidtova, Eduardo Calò, Simone Balloccu, Dimitra Gkatzia, Rudali Huidrom, Mateusz Lango, Fahime Same, Vilém Zouhar, Saad Mahamood and Ondrej Dusek
Mining Contextualized Visual Associations from Images for Creativity Understanding Ananya Sahu, Amith Ananthram and Kathleen McKeown
Evaluating LLMs' Ability to Understand Numerical Time Series for Text Generation Mizuki Arai, Tatsuya Ishigaki, Masayuki Kawarada, Yusuke Miyao, Hiroya Takamura and Ichiro Kobayashi
Can GPT models Follow Human Summarization Guidelines? A Study for Targeted Communication Goals Yongxin Zhou, Fabien Ringeval and François Portet
ViNumFCR: A Novel Vietnamese Benchmark for Numerical Reasoning Fact Checking on Social Media News Nhi Ngoc Phuong Luong, Anh Thi Lan Le, Tin Van Huynh, Kiet Van Nguyen and Ngan Nguyen
Human ratings of LLM response generation in pair-programming dialogue Cecilia Domingo, Paul Piwek, Svetlana Stoyanchev, Rama Sanand Doddipatla, Kaustubh Adhikari and Michel Wermelinger
Evaluating LLM-Generated Versus Human-Authored Responses in Role-Play Dialogues Dongxu Lu, Johan Jeuring and Albert Gatt
Towards Trustworthy Lexical Simplification: Exploring Safety and Efficiency with Small LLMs Akio Hayakawa, Stefan Bott and Horacio Saggion
Enhancing Coherence and Interestingness in Knowledge-Grounded Dialogue Generation Hiroki Onozeki and Michimasa Inaba
KDA: Knowledge Distillation Adapter for Cross-Lingual Transfer Ta-Bao Nguyen, Nguyen-Phuong Phan, Tung Le and Huy Tien Nguyen
Exploring the Power of Large Language Models for Vietnamese Implicit Sentiment Analysis Huy Gia Luu and Dang Van Thin
Live Football Commentary (LFC): A Large-Scale Dataset for Building Football Commentary Generation Models Taiga Someya, Tatsuya Ishigaki and Hiroya Takamura
Input Matters: Evaluating Input Structure's Impact on LLM Summaries of Sports Play-by-Play Barkavi Sundararajan, Somayajulu Sripada and Ehud Reiter
Can LLMs Help Encoder Models Maintain Both High Accuracy and Consistency in Temporal Relation Classification? Adiel Meir and Kfir Bar
Taming the Titans: A Survey of Efficient LLM Inference Serving Ranran Zhen, Juntao Li, Yixin Ji, Zhenlin Yang, Tong Liu, Qingrong Xia, Xinyu Duan, Zhefeng Wang, Baoxing Huai and Min Zhang
Statistical Multicriteria Evaluation of LLM-Generated Text Esteban Garces Arias, Hannah Blocher, Julian Rodemann, Matthias Assenmacher and Christoph Jansen
Who's Laughing Now? An Overview of Computational Humour Generation and Explanation Tyler Loakman, William Thorne and Chenghua Lin
Counterfactual Simulatability of LLM Explanations for Generation Tasks Marvin Limpijankit, Yanda Chen, Melanie Subbiah, Nicholas Deas and Kathleen McKeown
When LLMs Can't Help: Real-World Evaluation of LLMs in Nutrition Karen Jia-Hui Li, Simone Balloccu, Ondrej Dusek and Ehud Reiter
Forecasting Communication Derailments Through Conversation Generation Yunfan Zhang, Kathleen McKeown and Smaranda Muresan
QCoder Benchmark: Bridging Language Generation and Quantum Hardware through Simulator-Based Feedback Taku Mikuriya, Tatsuya Ishigaki, Shunya Minami, Tadashi Kadowaki, Yohichi Suzuki, Shun Naito, Shunya Takada, Takumi Kato, Tamotsu Baseda, Reo Yamada and Hiroya Takamura
Restaurant Menu Categorization at Scale: LLM-Guided Hybrid Clustering Seemab Latif, Ashar Mehmood, Selim Turki, Huma Ameer, Ivan Gorban and Faysal Fateh
LogitRouter: a novel Attention variant for reducing Myopic Routing in Mixture of Experts Felipe Rodriguez and Marcelo Mendoza
Cognitive Flow: An LLM-Automated Framework for Quantifying Reasoning Distillation José Matos, Catarina Silva and Hugo Goncalo Oliveira
OpeNLGauge: An Explainable Metric for NLG Evaluation with Open-Weights LLMs Ivan Kartac, Mateusz Lango and Ondrej Dusek
References Matter: Investigating the Impact of Reference Set Variation on Summarization Evaluation Silvia Casola, Yang Janet Liu, Siyao Peng, Oliver Kraus, Albert Gatt and Barbara Plank
Fine-Tuning, Prompting, RAG: How do Knowledge Graph-to-Russian Text Generation Models generalise to Out-of-Distribution Data? Anna Nikiforovskaya, William Eduardo Soto Martinez, Evan Parker Kelly Chapple and Claire Gardent
Enhancing Named Entity Translation from Classical Chinese to Vietnamese in Traditional Vietnamese Medicine Domain: A Hybrid Masking and Dictionary-Augmented Approach Uyen Bao Nguyen Phuc, Nhu Vo Quynh Pham, Long Hong Buu Nguyen and Dien Dinh
Face the Facts! Evaluating RAG-based Pipelines for Professional Fact-Checking Daniel Russo, Stefano Menini, Jacopo Staiano and Marco Guerini
PRICoT: Principle Retrieval and Injection from Inference Successes and Failures for CoT Improvement Yudai Yamazaki, Naoto Takeda, Yasutaka Nishimura and Kazushi Ikeda
SWI: Speaking with Intent in Large Language Models Yuwei Yin, Eunjeong Hwang and Giuseppe Carenini
Automated and Context-Aware Code Documentation Leveraging Advanced LLMs Swapnil Sharma Sarker and Tanzina Taher Ifty
Effectiveness of Chain-of-Thought in Distilling Reasoning Capability from Large Language Models Cong Thanh Do, Rama Sanand Doddipatla and Kate Knill
From Prototypical to Relational: How LLMs Navigate Complex Analogies Mayukh Das and Wolf-Tilo Balke
Generating Impact and Critique Explanations of Predictions made by a Goal Recognizer Jair da Silva Ferreira Junior, Ingrid Zukerman, Enes Makalic, Cecile L. Paris and Mor Vered
FinStat2SQL: A Text2SQL Pipeline for Financial Statement Analysis Hung Quang Nguyen, Anh Phuong Trinh, Hung Phan Quoc Mai and Phong Tuan Trinh
German4All – A Dataset and Model for Readability-Controlled Paraphrasing in German Miriam Anschütz, Thanh Mai Pham, Eslam Nasrallah, Maximilian Müller, Cristian-George Craciun and Georg Groh
Natural Language Translation of Formal Proofs through Informalization of Proof Steps and Recursive Summarization along Proof Structure Seiji Hattori, Takuya Matsuzaki and Makoto Fujiwara
Annotating Hallucinations in Question-Answering using Rewriting Xu Liu, Guanyi Chen, Kees van Deemter and Tingting He
Short Papers
FreshTab: Sourcing Fresh Data for Table-to-Text Generation Evaluation Kristýna Onderková, Ondrej Platek, Zdeněk Kasner and Ondrej Dusek
Assessing Semantic Consistency in Data-to-Text Generation: A Meta-Evaluation of Textual, Semantic and Model-Based Metrics Rudali Huidrom, Michela Lorandi, Simon Mille, Craig Thomson and Anya Belz
Truth or Twist? Optimal Model Selection for Reliable Label Flipping Evaluation in LLM-based Counterfactuals Qianli Wang, Van Bach Nguyen, Nils Feldhus, Luis Felipe Villa-Arenas, Christin Seifert, Sebastian Möller and Vera Schmitt
Analysing Reference Production of Large Language Models Chengzhao Wu, Guanyi Chen, Fahime Same and Tingting He
Surprisal reveals diversity gaps in image captioning and different scorers change the story Nikolai Ilinykh and Simon Dobnik
How (un)faithful are explainable LLM-based NLG metrics? Alex Terentowicz, Mateusz Lango and Ondrej Dusek
Benchmarking and Improving LVLMs on Event Extraction from Multimedia Documents Fuyu Xing, Zimu Wang, Wei Wang and Haiyang Zhang
Scaling Up Data-to-Text Generation to Longer Sequences: A New Dataset and Benchmark Results for Generation from Large Triple Sets Chinonso Cynthia Osuji, Simon Mille, Ornait O'Connell, Thiago Castro Ferreira, Anya Belz and Brian Davis
Are Multi-Agents the new Pipeline Architecture for Data-to-Text Systems? Chinonso Cynthia Osuji, Brian Timoney, Mark Andrade, Thiago Castro Ferreira and Brian Davis
Incorporating Formulaicness in the Automatic Evaluation of Naturalness: A Case Study in Logic-to-Text Generation Eduardo Calò, Guanyi Chen, Elias Stengel-Eskin, Albert Gatt and Kees van Deemter
Demo Papers
VitaEval: Open-source Human Evaluation Tool for Video-to-Text and Video-to-Audio Systems Goran Topic, Yuki Saito, Katsuhito Sudoh, Shinnosuke Takamichi, Hiroya Takamura, Graham Neubig and Tatsuya Ishigaki
CSPaper Review: Fast, Rubric-Faithful Conference Feedback Lele Cao, Lei You and R&D Team
Echoes of Others: Real-Time LLM Dialogue Generation for Immersive NPC Interaction James McGrath, Michela Lorandi and Anya Belz
ARTIST: A Learning Support System for Fostering Students' Argumentative Writing Skills Thomas Huber and Christina Niklaus
GenChal
Live Commentary Planning and Generation Chung-Chi Chen, Ming-Hung Wang, Ramon Ruiz-Dolz, Chris Reed, Ichiro Kobayashi, Yusuke Miyao and Hiroya Takamura
ReproNLP Shared Task Overview Anya Belz, Craig Thomson, Javier González Corbelle and Malo Ruelle
DCU-ADAPT-modPB at the GEM’24 Data-to-Text Generation Task: Model Hybridisation for Pipeline Data-to-Text Natural Language Generation Chinonso Cynthia Osuji, Rudali Huidrom, Kolawole John Adebayo, Thiago Castro Ferreira and Brian Davis
DCU-NLG-PBN at the GEM’24 Data-to-Text Task: Open-Source LLM PEFT-Tuning for Effective Data-to-Text Generation Michela Lorandi and Anya Belz
DCU-NLG-Small at the GEM’24 Data-to-Text Task: Rule-based generation and post-processing with T5-Base Simon Mille, Malo Ruelle, Mohammed Sabry and Anya Belz
TeamSaarLST at the GEM’24 Data-to-text Task: Revisiting symbolic retrieval in the LLM-age Mayank Jobanputra and Vera Demberg
Long-Form Analogy Evaluation Challenge Bhavya Bhavya, Chris Palaguachi, Yang Zhou, Suma Bhat and ChengXiang Zhai
The 2024 GEM Shared Task on Multilingual Data-to-Text Generation and Summarization: Qualitative Evaluation Results João Sedoc, Simon Mille, Miruna Adriana Clinciu, Yixin Liu, Elizabeth Clark, Kaustubh Dhole, Saad Mahamood and Lining Zhang