Data Quality for AI

Program Co-Directors

Andrzej Wąsowski, Irina Shklovski & Naja Holten Møller

Program Description

The problem of data quality has increasingly become a topic of concern across computer science, both within and beyond AI research. It grows ever more acute as AI systems become more deeply integrated into every aspect of society.

Despite growing awareness that data quality can have differential effects on machine learning performance, there is still little research on how and why these effects arise. Considerations of data quality in AI research struggle with a fundamental challenge: assessing how well data represent real-world concepts. As a result, data quality work within computer science focuses on assessment and mitigation techniques applied after datasets are produced, relying on algorithms based on statistical methods. Yet datasets and their flaws are deeply situated in the contexts of their creation. At the same time, there is broad agreement that high-quality data is urgently needed to create better, more responsible, and more precise AI systems.

Within AI research, the conversation around data quality has been spurred on by the legal requirements the EU AI Act places on high-risk AI systems. The EU AI Act attempts to redress the inherent gap between training datasets and operational data by requiring that high-risk AI systems be trained on high-quality data, involve human supervision, and monitor the quality of input data. However, without guidance on what constitutes a “good” representation, ensuring data quality during creation and assessing it post hoc remain challenging. Existing metrics are also difficult to apply, because current definitions have not moved beyond the notion of “fit for purpose” developed in the mid-1990s.

Despite the Nordic region's leading position in the responsible and ethical development of AI systems, the 2024 Nordic State of AI report lists data quality as a key barrier to wider adoption of AI.

The P1 Data Quality program aims to help answer the challenge of data quality, with the ambition of establishing common ground across disciplines, academia, and industry around its definitions, meanings, and implications. The program also aims to position Denmark as a leader in data quality for AI and data analytics, building a sustainable and internationally recognized forum through the following initiatives:

Community Building

Launch a Data Quality Special Interest Group (DQSIG) with monthly meetings for knowledge exchange
Recruit industrial and societal partners
Organize a seminar on Data Quality to exchange perspectives on data quality and its challenges, including engineering, design, ethics, and societal implications, and to map out the challenges of data quality research in 2026
Engage scholars and industry professionals to contribute to P1 events and activities by sharing experiences and discussing emerging research

International Expansion

Evolve DQSIG into an established international meeting point for data quality discourse
Secure funding for an international workshop on Data Quality for AI
Develop collaborative project applications emerging from DQSIG activities

To address these barriers, the program seeks to engage across disciplines and with ongoing projects on data in the humanities and social sciences, both in Denmark and across the Nordic region.

Our people

Andrzej Wąsowski
IT University of Copenhagen

Dan Witzner Hansen
IT University of Copenhagen

Eve Hoggan
Aarhus University

Irina Shklovski
University of Copenhagen

Jes Frellsen
Technical University of Denmark, Center for Basic Machine Learning Research in Life Science, Danish Data Science Academy (DDSA)

Luca Maria Aiello
IT University of Copenhagen

Mads Hobye
Roskilde University, Illutron

Naja Holten Møller
University of Copenhagen

Sebastian Schwemer
University of Copenhagen, University of Oslo

Tariq Osman Andersen
University of Copenhagen, Vital Beats ApS

Thomas Hildebrandt
University of Copenhagen