P1 Programs

Data Quality for AI

Program Co-Directors

Program Description

Data quality has become a growing concern across computer science, both within and beyond AI research, and the problem grows more acute as AI systems become more deeply integrated into every aspect of society.

Despite growing awareness that data quality can have differential effects on machine learning performance, there has so far been little research on how and why. Considerations of data quality in AI research struggle with the fundamental challenge of assessing how well data represent real-world concepts. As a result, approaches to data quality within computer science focus on assessment and mitigation techniques applied after datasets are produced, relying on algorithmic approaches based on statistical methods. Yet datasets and their flaws are deeply situated in the contexts of their creation. At the same time, there is broad agreement that high-quality data is urgently needed to build better, more responsible, and more precise AI systems.

Within AI research, the conversation around data quality has been spurred on by the EU AI Act and its legal requirements for high-risk AI systems. The EU AI Act attempts to redress the inherent gap between training datasets and operational data by requiring that high-risk AI systems be trained on high-quality data, involve human supervision, and monitor the quality of their input data. However, without guidance on what constitutes a "good" representation, ensuring data quality during creation and assessing it post hoc remain challenging, and existing metrics are difficult to apply because current definitions have not moved beyond the notion of "fit for purpose" developed in the mid-1990s.

Despite the Nordic region's leading position in the responsible and ethical development of AI systems, the 2024 Nordic State of AI report lists data quality as a key barrier to wider AI adoption.

The P1 Data Quality program aims to help address the challenge of data quality, with the ambition of establishing common ground across disciplines, academia, and industry on its definitions, meanings, and implications. The program further aims to position Denmark as a leader in data quality for AI and data analytics, building a sustainable and internationally recognized forum through the following initiatives:

Community Building

  • Launch a Data Quality Special Interest Group (DQSIG) with monthly meetings for knowledge exchange
  • Recruit industrial and societal partners
  • Organize a seminar on data quality with the aim of exchanging different perspectives on data quality and its challenges, including aspects of engineering, design, ethics, and societal implications, and of mapping out the challenges of data quality research in 2026
  • Engage scholars and industry professionals to share experiences and discuss emerging research, contributing to P1 events and activities

International Expansion 

  • Develop DQSIG into a regular international meeting point for data quality discourse
  • Secure funding for an international workshop on Data Quality for AI
  • Develop collaborative project applications emerging from DQSIG activities

To address these barriers, the program seeks to engage across disciplines and with ongoing projects on data in the humanities and social sciences, in Denmark and across the Nordic region.