Event: Pre-ACL 2025 Workshop
Location: Copenhagen, Denmark
Date: 26 July 2025
About
The Pre-ACL Workshop in Copenhagen aims to strengthen the Danish NLP community by connecting it with global leaders in the field. ACL, the oldest and largest international conference on NLP, will take place in Vienna, Austria this year. This presents a unique opportunity to bring international experts closer to the local Danish community to foster knowledge exchange, future collaborations, and innovation through discussions on cutting-edge advancements in language technology.
The workshop is supported by the Danish Data Science Academy, as well as the Pioneer Centre for AI.
The programme will combine different formats – talks, poster sessions and round-table discussions – offering direct engagement with global leaders in an intimate setting, to help spark new ideas and promote networking and collaboration.
The target group consists of researchers and practitioners in the areas of Natural Language Processing, Generative AI and Language Technology.
Participants of any seniority and from academia as well as industry are welcome to attend.
Registration
Attendance and registration are free of charge, but spots are limited due to space constraints.
It is possible to attend as an ordinary attendee or as a poster presenter.
The deadline for general sign-up is 5 July 2025. Attendees are invited to submit a poster presentation proposal as part of the registration procedure by 16 June 2025, and will be informed of the outcome of their submissions by 23 June 2025. In addition to the keynote talks, poster presenters will be invited to join round-table discussions, lunch and dinner on the day of the workshop, as well as a pre-workshop social on 25 July, and can apply for financial support for travel and accommodation to attend the workshop.
Sign up here
Schedule
25 July
16.00-18.00 Social event: Canal Tour
26 July
08.30-09.00 On-site registration, poster setup & coffee (P1)
09.00-09.15 Opening remarks by Isabelle Augenstein (Natural History Museum)
09.15-10.30 Session 1: LLMs as multi-modal social reasoning agents. Speakers: Thamar Solorio & Danish Pruthi. Session Chair: Pepa Atanasova. (Natural History Museum)
10.30-11.00 Coffee break (P1)
11.00-12.10 Session 2: Biases and values in humans and LLMs. Speakers: Tanu Mitra & David Jurgens. Session Chair: Dustin Wright. (Natural History Museum)
12.10-13.30 Roundtable lunch discussions (P1)
13.30-14.30 Poster session (P1)
14.30-15.00 Coffee break (P1)
15.00-16.45 Session 3: Potential vs risks of LLMs for decision making. Speakers: Anjalie Field, Dallas Card & Mausam. Session Chair: Arnav Arora. (Natural History Museum)
16.45-17.00 Closing remarks & poster awards by Isabelle Augenstein (Natural History Museum)
18.00-20.00 Dinner & networking
Find the list of accepted posters here
Invited speakers
- Anjalie Field, Assistant Professor at Johns Hopkins University: Anjalie Field is an Assistant Professor in the Computer Science Department at Johns Hopkins University. She is also affiliated with the Center for Language and Speech Processing (CLSP) and the new Data Science and AI Institute. Her research focuses on the ethics and social science aspects of natural language processing, which includes developing models to address societal issues like discrimination and propaganda, as well as critically assessing and improving ethics in AI pipelines. Her work has been published in NLP and interdisciplinary venues, like ACL and PNAS, and in 2024 she was named an AI2050 Early Career Fellow by Schmidt Futures. Prior to joining JHU, she was a postdoctoral researcher at Stanford, and she completed her PhD at the Language Technologies Institute at Carnegie Mellon University.
- Talk: Fairness and Privacy in High-Stakes NLP
- Abstract: Practitioners are increasingly using algorithmic tools in high-stakes settings like healthcare, social services, policing, and education, with particular recent interest in natural language processing (NLP). These domains raise a number of challenges, including preserving data privacy, ensuring model reliability, and developing approaches that can mitigate, rather than exacerbate, historical bias. In this talk, I will discuss our recent work investigating risks of racial bias in NLP for child protective services and ways we aim to better preserve privacy in these types of audits in the future. Time permitting, I will also discuss our development of speech processing tools for police body camera footage, which aims to improve police accountability. Both domains involve challenges in working with messy, minimally processed data containing sensitive information and domain-specific language. This work emphasizes how NLP has the potential to advance social justice goals, like police accountability, but also risks causing direct harm by perpetuating bias, reducing privacy, and increasing power imbalances.
- Dallas Card, Assistant Professor in the School of Information at the University of Michigan: Dallas Card focuses his research on making machine learning more reliable and responsible, and on using machine learning and natural language processing to learn about society from text. His work received a best short paper nomination at ACL 2019, a distinguished paper award at FAccT 2022, and has been covered by The Washington Post, Vox, Wired, and other outlets. Prior to starting at Michigan, Dallas was a postdoctoral researcher with the Stanford Natural Language Processing Group and the Stanford Data Science Institute. He holds a Ph.D. in Machine Learning from Carnegie Mellon University.
- Talk: Semantic change in historical legal texts and throughout the lifespan
- Danish Pruthi, Assistant Professor at IISc Bangalore: Danish Pruthi is an Assistant Professor at the Indian Institute of Science (IISc), Bangalore. He received his Ph.D. from the School of Computer Science at Carnegie Mellon University. He is broadly interested in the areas of natural language processing and deep learning, with a focus on the inclusive development and evaluation of AI models. He completed his bachelor's degree in computer science at BITS Pilani, Pilani. He is also a recipient of the Schmidt Sciences AI2050 Early Career Fellowship, the Siebel Scholarship, the CMU Presidential Fellowship, and industry awards from Google and Adobe Inc. Until recently, his legal name was only Danish, an “edge case” for many deployed NLP systems, leading to airport quagmires and, in equal parts, funny anecdotes.
- Talk: All That Glitters is Not Novel: Plagiarism in AI Generated Research
- Abstract: Automating scientific research is considered the final frontier of science. Recently, several papers have claimed that autonomous research agents can generate novel research ideas. Amidst the prevailing optimism, we discover that, concerningly, a considerable fraction of such research documents are smartly plagiarized. In this talk, I will share details about our effort to measure the extent of plagiarism in AI-generated research.
- David Jurgens, Associate Professor at the University of Michigan: David Jurgens is an associate professor jointly in the School of Information and the Department of Electrical Engineering and Computer Science at the University of Michigan. His research is at the intersection of natural language processing and computational social science.
- Talk: Large-Scale Language, Real-World Complexity: Insights from a Large-Scale Podcast Analysis and Moral Reasoning with LLMs
- Abstract: As NLP expands into modeling complex social phenomena, we need data and benchmarks that reflect real-world language and reasoning. This talk presents two upcoming ACL papers that meet this need across distinct domains. First, I introduce SPORC, a dataset of 1.1M podcast transcripts with metadata, speaker roles, and audio-derived features, offering large-scale access to spontaneous, topic-rich, long-form discourse. Second, I present UniMoral, a multilingual benchmark for moral reasoning, combining dilemmas from psychology and social media with annotations for ethical judgments and cultural context. Evaluations with LLMs reveal both promising capabilities and key limitations. Together, these projects highlight how socially grounded datasets can support richer, more context-sensitive language understanding.
- Mausam, Professor at IIT Delhi: Mausam is a Professor of Computer Science and founding head of the Yardi School of AI at IIT Delhi, as well as an affiliate professor at the University of Washington, Seattle. He has over 100 archival papers, a book, two best paper awards, and an ACL Test of Time Award to his credit. He has been a PC Chair for AAAI, is currently an Editor-in-Chief for ARR, and was recently elected an AAAI Fellow.
- Talk: Robust Semantic Parsing in Low Resource Settings
- Abstract: Most existing semantic parsing systems (e.g., KBQA) only study heavily supervised in-domain settings where all questions are answerable. This severely limits the real-world applicability of these systems. In this talk, we study two extensions: (1) transfer learning, where only a small in-domain training set is available along with a larger out-of-domain training set, and (2) robustness, where questions may not be answerable given the KB. In addition to providing new datasets, we build a series of models using both small and large language models for both tasks. In our final result, we are able to devise GPT-4-based workflows for datasets that include unanswerable questions, which can be developed with very little in-domain training data.
- Tanu Mitra, Associate Professor at University of Washington: Tanu Mitra is an Associate Professor at the University of Washington Information School, where she leads the Social Computing and ALgorithmic Experiences (SCALE) lab. Her research combines computational techniques and social science principles to study complex social processes underlying human behavior in large-scale online social systems. Her current focus is on understanding and designing defenses against problematic information in generative AI technologies and in online social platforms and the algorithms driving them. Tanu’s work employs a range of interdisciplinary methods from the fields of human-computer interaction, machine learning, and natural language processing. Her work has been supported by grants from the NSF, NIH, and DoD, as well as several foundation and industry grants. Her research has been recognized through multiple awards and honors, including an NSF CAREER award, an NSF CRII award, an early-career ONR Young Investigator award, and the Adamic-Glance Distinguished Young Researcher award, along with several best paper awards. Dr. Mitra received her PhD in Computer Science from Georgia Tech’s School of Interactive Computing.
- Talk: Issues of Alignment and Covert Bias in Language Models
- Abstract: As large language models (LLMs) are increasingly integrated into real-world applications, concerns around their alignment with human values and potential for harm have come to the forefront. This talk explores three interrelated questions at the heart of responsible LLM deployment: (1) Do LLMs act in accordance with their stated values, or do their behaviors reveal a value-action gap? (2) Do they express covert harms and stereotypes—especially in relation to non-Western concepts like caste versus Western ones like race? (3) How well do LLM judgments align with human perceptions of offensiveness?
Through recent empirical studies, we show that LLMs often exhibit a value-action gap, generate covert harms, and vary in their alignment with human judgments of offensiveness. These findings highlight the need for culturally grounded, context-aware approaches to evaluating and improving LLM behavior.
- Thamar Solorio, Professor at MBZUAI: Thamar Solorio is a professor in the NLP department at MBZUAI, where she also serves as Senior Director of Graduate Student Education and Postdoctoral Affairs. Her research interests include NLP for low-resource settings and multilingual data, including code-switching and information extraction. More recently, she has been exploring language and vision problems, with a focus on developing inclusive NLP. She served two terms as an elected board member of the North American Chapter of the Association for Computational Linguistics (NAACL), was PC co-chair for NAACL 2019, and recently stepped down as co-Editor-in-Chief of the ACL Rolling Review initiative (ARR). She was the general chair of the 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP).
- Talk: Towards Socially Intelligent Multimodal Artificial Agents
- Abstract: This talk will showcase our recent work on evaluating existing models’ abilities to understand human-human social interactions. I will present a recent benchmark that covers seven distinct theory-of-mind categories in a realistic, narrative-rich scenario. The benchmark includes videos that provide nuanced insight into characters’ mental states. We also show how state-of-the-art models perform on this task. I will conclude with an overview of the many interesting open challenges in this space.
Organisers
- Isabelle Augenstein, Professor at the University of Copenhagen and P1 Co-Lead
- Pepa Atanasova, Tenure-Track Assistant Professor at the University of Copenhagen
- Arnav Arora, PhD student at the University of Copenhagen
- Dustin Wright, Postdoctoral Researcher at the University of Copenhagen
- Yoonna Jang, Postdoctoral Researcher at the University of Copenhagen