Accepted Papers

Long Papers

  • Dual Debiasing: Remove Stereotypes and Keep Factual Gender for Fair Language Modeling and Translation
    Tomasz Limisiewicz, David Mareček and Tomáš Musil
  • Do My Eyes Deceive Me? A Survey of Human Evaluations of Hallucinations in NLG
    Patricia Schmidtova, Eduardo Calò, Simone Balloccu, Dimitra Gkatzia, Rudali Huidrom, Mateusz Lango, Fahime Same, Vilém Zouhar, Saad Mahamood and Ondrej Dusek
  • Mining Contextualized Visual Associations from Images for Creativity Understanding
    Ananya Sahu, Amith Ananthram and Kathleen McKeown
  • Evaluating LLMs' Ability to Understand Numerical Time Series for Text Generation
    Mizuki Arai, Tatsuya Ishigaki, Masayuki Kawarada, Yusuke Miyao, Hiroya Takamura and Ichiro Kobayashi
  • Can GPT models Follow Human Summarization Guidelines? A Study for Targeted Communication Goals
    Yongxin Zhou, Fabien Ringeval and François Portet
  • ViNumFCR: A Novel Vietnamese Benchmark for Numerical Reasoning Fact Checking on Social Media News
    Nhi Ngoc Phuong Luong, Anh Thi Lan Le, Tin Van Huynh, Kiet Van Nguyen and Ngan Nguyen
  • Human ratings of LLM response generation in pair-programming dialogue
    Cecilia Domingo, Paul Piwek, Svetlana Stoyanchev, Rama Sanand Doddipatla, Kaustubh Adhikari and Michel Wermelinger
  • Evaluating LLM-Generated Versus Human-Authored Responses in Role-Play Dialogues
    Dongxu Lu, Johan Jeuring and Albert Gatt
  • Towards Trustworthy Lexical Simplification: Exploring Safety and Efficiency with Small LLMs
    Akio Hayakawa, Stefan Bott and Horacio Saggion
  • Enhancing Coherence and Interestingness in Knowledge-Grounded Dialogue Generation
    Hiroki Onozeki and Michimasa Inaba
  • KDA: Knowledge Distillation Adapter for Cross-Lingual Transfer
    Ta-Bao Nguyen, Nguyen-Phuong Phan, Tung Le and Huy Tien Nguyen
  • Exploring the Power of Large Language Models for Vietnamese Implicit Sentiment Analysis
    Huy Gia Luu and Dang Van Thin
  • Live Football Commentary (LFC): A Large-Scale Dataset for Building Football Commentary Generation Models
    Taiga Someya, Tatsuya Ishigaki and Hiroya Takamura
  • Input Matters: Evaluating Input Structure's Impact on LLM Summaries of Sports Play-by-Play
    Barkavi Sundararajan, Somayajulu Sripada and Ehud Reiter
  • Can LLMs Help Encoder Models Maintain Both High Accuracy and Consistency in Temporal Relation Classification?
    Adiel Meir and Kfir Bar
  • Taming the Titans: A Survey of Efficient LLM Inference Serving
    Ranran Zhen, Juntao Li, Yixin Ji, Zhenlin Yang, Tong Liu, Qingrong Xia, Xinyu Duan, Zhefeng Wang, Baoxing Huai and Min Zhang
  • Statistical Multicriteria Evaluation of LLM-Generated Text
    Esteban Garces Arias, Hannah Blocher, Julian Rodemann, Matthias Assenmacher and Christoph Jansen
  • Who's Laughing Now? An Overview of Computational Humour Generation and Explanation
    Tyler Loakman, William Thorne and Chenghua Lin
  • Counterfactual Simulatability of LLM Explanations for Generation Tasks
    Marvin Limpijankit, Yanda Chen, Melanie Subbiah, Nicholas Deas and Kathleen McKeown
  • When LLMs Can't Help: Real-World Evaluation of LLMs in Nutrition
    Karen Jia-Hui Li, Simone Balloccu, Ondrej Dusek and Ehud Reiter
  • Forecasting Communication Derailments Through Conversation Generation
    Yunfan Zhang, Kathleen McKeown and Smaranda Muresan
  • QCoder Benchmark: Bridging Language Generation and Quantum Hardware through Simulator-Based Feedback
    Taku Mikuriya, Tatsuya Ishigaki, Shunya Minami, Tadashi Kadowaki, Yohichi Suzuki, Shun Naito, Shunya Takada, Takumi Kato, Tamotsu Baseda, Reo Yamada and Hiroya Takamura
  • Restaurant Menu Categorization at Scale: LLM-Guided Hybrid Clustering
    Seemab Latif, Ashar Mehmood, Selim Turki, Huma Ameer, Ivan Gorban and Faysal Fateh
  • LogitRouter: a novel Attention variant for reducing Myopic Routing in Mixture of Experts
    Felipe Rodriguez and Marcelo Mendoza
  • Cognitive Flow: An LLM-Automated Framework for Quantifying Reasoning Distillation
    José Matos, Catarina Silva and Hugo Goncalo Oliveira
  • OpeNLGauge: An Explainable Metric for NLG Evaluation with Open-Weights LLMs
    Ivan Kartac, Mateusz Lango and Ondrej Dusek
  • References Matter: Investigating the Impact of Reference Set Variation on Summarization Evaluation
    Silvia Casola, Yang Janet Liu, Siyao Peng, Oliver Kraus, Albert Gatt and Barbara Plank
  • Fine-Tuning, Prompting, RAG: How do Knowledge Graph-to-Russian Text Generation Models generalise to Out-of-Distribution Data?
    Anna Nikiforovskaya, William Eduardo Soto Martinez, Evan Parker Kelly Chapple and Claire Gardent
  • Enhancing Named Entity Translation from Classical Chinese to Vietnamese in Traditional Vietnamese Medicine Domain: A Hybrid Masking and Dictionary-Augmented Approach
    Uyen Bao Nguyen Phuc, Nhu Vo Quynh Pham, Long Hong Buu Nguyen and Dien Dinh
  • Face the Facts! Evaluating RAG-based Pipelines for Professional Fact-Checking
    Daniel Russo, Stefano Menini, Jacopo Staiano and Marco Guerini
  • PRICoT: Principle Retrieval and Injection from Inference Successes and Failures for CoT Improvement
    Yudai Yamazaki, Naoto Takeda, Yasutaka Nishimura and Kazushi Ikeda
  • SWI: Speaking with Intent in Large Language Models
    Yuwei Yin, Eunjeong Hwang and Giuseppe Carenini
  • Automated and Context-Aware Code Documentation Leveraging Advanced LLMs
    Swapnil Sharma Sarker and Tanzina Taher Ifty
  • Effectiveness of Chain-of-Thought in Distilling Reasoning Capability from Large Language Models
    Cong Thanh Do, Rama Sanand Doddipatla and Kate Knill
  • From Prototypical to Relational: How LLMs Navigate Complex Analogies
    Mayukh Das and Wolf-Tilo Balke
  • Generating Impact and Critique Explanations of Predictions made by a Goal Recognizer
    Jair da Silva Ferreira Junior, Ingrid Zukerman, Enes Makalic, Cecile L. Paris and Mor Vered
  • FinStat2SQL: A Text2SQL Pipeline for Financial Statement Analysis
    Hung Quang Nguyen, Anh Phuong Trinh, Hung Phan Quoc Mai and Phong Tuan Trinh
  • German4All – A Dataset and Model for Readability-Controlled Paraphrasing in German
    Miriam Anschütz, Thanh Mai Pham, Eslam Nasrallah, Maximilian Müller, Cristian-George Craciun and Georg Groh
  • Natural Language Translation of Formal Proofs through Informalization of Proof Steps and Recursive Summarization along Proof Structure
    Seiji Hattori, Takuya Matsuzaki and Makoto Fujiwara
  • Annotating Hallucinations in Question-Answering using Rewriting
    Xu Liu, Guanyi Chen, Kees van Deemter and Tingting He

Short Papers

  • FreshTab: Sourcing Fresh Data for Table-to-Text Generation Evaluation
    Kristýna Onderková, Ondrej Platek, Zdeněk Kasner and Ondrej Dusek
  • Assessing Semantic Consistency in Data-to-Text Generation: A Meta-Evaluation of Textual, Semantic and Model-Based Metrics
    Rudali Huidrom, Michela Lorandi, Simon Mille, Craig Thomson and Anya Belz
  • Truth or Twist? Optimal Model Selection for Reliable Label Flipping Evaluation in LLM-based Counterfactuals
    Qianli Wang, Van Bach Nguyen, Nils Feldhus, Luis Felipe Villa-Arenas, Christin Seifert, Sebastian Möller and Vera Schmitt
  • Analysing Reference Production of Large Language Models
    Chengzhao Wu, Guanyi Chen, Fahime Same and Tingting He
  • Surprisal reveals diversity gaps in image captioning and different scorers change the story
    Nikolai Ilinykh and Simon Dobnik
  • How (un)faithful are explainable LLM-based NLG metrics?
    Alex Terentowicz, Mateusz Lango and Ondrej Dusek
  • Benchmarking and Improving LVLMs on Event Extraction from Multimedia Documents
    Fuyu Xing, Zimu Wang, Wei Wang and Haiyang Zhang
  • Scaling Up Data-to-Text Generation to Longer Sequences: A New Dataset and Benchmark Results for Generation from Large Triple Sets
    Chinonso Cynthia Osuji, Simon Mille, Ornait O'Connell, Thiago Castro Ferreira, Anya Belz and Brian Davis
  • Are Multi-Agents the new Pipeline Architecture for Data-to-Text Systems?
    Chinonso Cynthia Osuji, Brian Timoney, Mark Andrade, Thiago Castro Ferreira and Brian Davis
  • Incorporating Formulaicness in the Automatic Evaluation of Naturalness: A Case Study in Logic-to-Text Generation
    Eduardo Calò, Guanyi Chen, Elias Stengel-Eskin, Albert Gatt and Kees van Deemter

Demo Papers

  • VitaEval: Open-source Human Evaluation Tool for Video-to-Text and Video-to-Audio Systems
    Goran Topic, Yuki Saito, Katsuhito Sudoh, Shinnosuke Takamichi, Hiroya Takamura, Graham Neubig and Tatsuya Ishigaki
  • CSPaper Review: Fast, Rubric-Faithful Conference Feedback
    Lele Cao, Lei You and R&D Team
  • Echoes of Others: Real-Time LLM Dialogue Generation for Immersive NPC Interaction
    James McGrath, Michela Lorandi and Anya Belz
  • ARTIST: A Learning Support System for Fostering Students' Argumentative Writing Skills
    Thomas Huber and Christina Niklaus

GenChal

  • Live Commentary Planning and Generation
    Chung-Chi Chen, Ming-Hung Wang, Ramon Ruiz-Dolz, Chris Reed, Ichiro Kobayashi, Yusuke Miyao and Hiroya Takamura
  • ReproNLP Shared Task Overview
    Anya Belz, Craig Thomson, Javier González Corbelle and Malo Ruelle
  • DCU-ADAPT-modPB at the GEM’24 Data-to-Text Generation Task: Model Hybridisation for Pipeline Data-to-Text Natural Language Generation
    Chinonso Cynthia Osuji, Rudali Huidrom, Kolawole John Adebayo, Thiago Castro Ferreira and Brian Davis
  • DCU-NLG-PBN at the GEM’24 Data-to-Text Task: Open-Source LLM PEFT-Tuning for Effective Data-to-Text Generation
    Michela Lorandi and Anya Belz
  • DCU-NLG-Small at the GEM’24 Data-to-Text Task: Rule-based generation and post-processing with T5-Base
    Simon Mille, Malo Ruelle, Mohammed Sabry and Anya Belz
  • TeamSaarLST at the GEM’24 Data-to-text Task: Revisiting symbolic retrieval in the LLM-age
    Mayank Jobanputra and Vera Demberg
  • Long-Form Analogy Evaluation Challenge
    Bhavya Bhavya, Chris Palaguachi, Yang Zhou, Suma Bhat and ChengXiang Zhai
  • The 2024 GEM Shared Task on Multilingual Data-to-Text Generation and Summarization: Qualitative Evaluation Results
    João Sedoc, Simon Mille, Miruna Adriana Clinciu, Yixin Liu, Elizabeth Clark, Kaustubh Dhole, Saad Mahamood and Lining Zhang