Visipedia Workshop 2024




About Visipedia

On April 12, 2024, we are holding the Visipedia workshop at the Pioneer Centre for AI in Copenhagen. The Visipedia project (visipedia.org) is jointly led by Serge Belongie’s (KU) and Pietro Perona’s (Caltech) groups. Visipedia’s goal, broadly speaking, is to build computer-vision systems that can be queried and used by large communities of experts to help foster the curation and generation of new knowledge. Visipedia’s prior projects have focused on social good: they have contributed visual systems and statistical backends to the iNaturalist and Merlin Bird ID apps and have produced popular datasets such as CUB and iNat2021. At this one-day workshop we will hear about recent research advances via invited talks and discuss ideas for future progress.



Friday, 12 April

  • 08:30 – 09:00 Check-in & Coffee
  • 09:00 – 09:15 Introduction by Serge Belongie (University of Copenhagen)
  • 09:15 – 10:00 Short Talks on the state of Visipedia
  • 10:00 – 10:30 Talk by Toke Thomas Høye (Aarhus University)
  • 10:30 – 11:00 Coffee break
  • 11:00 – 11:30 Talk by Natalie Iwanycki Ahlstrand & Anders P. Tøttrup (Natural History Museum, University of Copenhagen)
  • 11:30 – 12:00 Talk by Silvia Zuffi (IMATI-CNR)
  • 12:00 – 13:30 Lunch break
  • 13:30 – 14:00 Talk by Hazel Doughty (Leiden University)
  • 14:00 – 14:30 Talk by Björn Ommer (University of Munich)
  • 14:30 – 15:30 Coffee break/walk
  • 15:30 – 16:00 Talk by Amanda Wasielewski (Uppsala University)
  • 16:00 – 16:15 Wrap-up
  • 16:30 – 18:00 Posters at the Pioneer Centre, Øster Voldgade 3


Toke Thomas Høye (Aarhus University)

Title: Globally standardised species monitoring with insect camera traps and deep learning models

Abstract: With computer vision and deep learning, insect camera traps have become an important tool for improving our understanding of insect responses to environmental change. Through computer eyes, it is potentially possible to observe insects effectively, continuously, and non-invasively throughout diurnal and seasonal cycles, and deep learning models can provide estimates of their abundance, biomass, and diversity. I will unpack and visualize the rich, multidimensional data that novel camera-enabled monitoring systems can generate automatically and in a globally standardized manner. Drawing on results from national and continental-scale programs deploying insect camera traps, I will provide a glimpse into the insights that can be derived from trap images and what the future of automated insect monitoring might look like. I will also highlight outstanding challenges and future research avenues to facilitate the broad-scale implementation of insect camera traps for day-active and nocturnal insects.

Bio: I lead a research group focused on developing and applying novel monitoring technology to questions related to species responses to environmental change. We primarily focus on insects and other invertebrates, where monitoring data are particularly limited and where species responses to environmental change are particularly pronounced. We focus on computer-vision-based methods, which hold particular promise for global scalability and expert validation. We collaborate widely in interdisciplinary projects.


Natalie Iwanycki Ahlstrand & Anders P. Tøttrup (Natural History Museum, University of Copenhagen)

Title: Citizen Science at the Natural History Museum of Denmark: finding signs of spring, identifying medieval leather, processing environmental DNA, hunting for fossils and more!

Abstract: Our talk will include a brief overview of the Natural History Museum of Denmark, including our vision and mission. In particular, we will highlight our experiences in engaging society in collecting, analyzing, and publishing new findings from research-grade natural history data. In addition, we’ll present our experiences in developing apps to facilitate natural history monitoring. Future directions, aspirations, and challenges will also be addressed to encourage dialogue and inspiration related to the application of AI to our research.

Bio: Natalie Iwanycki Ahlstrand is a newly hired assistant professor (tenure-track) at the Natural History Museum of Denmark (NHMD). She uses natural history specimens and citizen science to study how plants respond to changing climatic conditions and strives to understand what these changes mean for plants and their interactions with animals and society.

Anders P. Tøttrup is an associate professor (professor-track) and head of the Science and Society section at the Natural History Museum of Denmark. He is an experienced biologist specializing in ornithology and in the methods and applications of citizen science. He currently oversees a research group of approximately 20 staff and manages 10 citizen science projects in Denmark on topics ranging from biology to archaeology.


Silvia Zuffi (IMATI-CNR)

Title: Modelling 3D animals, from 4D scans to language

Abstract: Modelling the 3D shape of animals through parametric, articulated, mesh-based models facilitates solving a set of computer vision problems dedicated to interpreting the wide variety and abundance of visual data we can capture today. In this talk, I will present two recent works where we address modelling animals in diverse scenarios. The first introduces a novel, high-quality, 3D parametric shape model for horses, learned from real 4D scans. The second exploits language to achieve generalization to new species given a multi-species quadruped model.

Bio: Silvia Zuffi is a research scientist at the CNR Institute for Applied Mathematics and Information Technologies in Milan, Italy. She is a member of the Italian National Biodiversity Future Center. She earned her bachelor’s degree in Electronic Engineering from the University of Bologna, Italy, and completed her PhD in computer vision at Brown University in Providence, RI, under the supervision of Michael J. Black. Her research spans various topics, including 3D reconstruction of knee prostheses, color perception, color reproduction in print, colored text readability, human pose estimation, and 3D modelling. Since 2017, her primary focus has been on modelling and estimating the 3D articulated shape of animals for applications in computer vision and graphics.


Hazel Doughty (Leiden University)

Title: Finer-Grained Video Understanding

Abstract: Thanks to revolutions in supervised learning and the emergence of large, labelled video datasets, tremendous progress has been made in automatically recognizing what is happening in a video. However, current algorithms cannot capture finer detail, such as how an action is performed, which is often key to achieving the desired outcome. For instance, existing models could recognize someone performing CPR but fail to identify that it needs to be done faster, firmer, and further up the body to have a chance of resuscitation. In this talk, I’ll argue for using limited labelled data to achieve a finer-grained understanding of video content. In particular, I’ll focus on our recent work on recognizing adverbs, video self-supervision, and adapting vision foundation models to low-resource scenarios.

Bio: Hazel Doughty is an assistant professor at Leiden University. Previously she was a postdoctoral researcher at the University of Amsterdam working with Prof. Cees Snoek. She completed her PhD at the University of Bristol in the UK under the guidance of Prof. Dima Damen. During her PhD she also spent several months as a visiting researcher at Inria Paris. Her research, funded by an NWO Veni grant, focuses on video understanding, particularly fine-grained understanding and learning with incomplete supervision.


Björn Ommer (University of Munich)

Title: Beyond diffusion models for visual synthesis

Abstract: The ultimate goal of computer vision and learning is to build models that understand our (visual) world. Recently, learning such representations of our surroundings has been revolutionized by deep generative models that profoundly change the way we interact with, program, and solve problems with computers. However, most of the progress has come from scaling up models, to the point where the necessary resources have started to have profoundly detrimental effects on future (academic) research, industry, and society.

This talk will contrast the most common generative models to date and highlight the very specific limitations they have despite their enormous potential. We will then investigate mitigation strategies beyond Stable Diffusion to significantly enhance efficiency and democratize AI. Building on novel flow matching, we will present a new framework for efficiently translating small diffusion models to high-resolution image synthesis and to rapid, accurate estimation of scene geometry from a single image. Moreover, we will demonstrate how to effectively extract fine-grained, subject-specific semantic directions from a Text2Image model, enabling their use without costly optimization or adaptation of the diffusion model.

Bio: Björn Ommer is a full professor at the University of Munich, where he heads the Computer Vision & Learning Group. Previously, he was a full professor in the Department of Mathematics and Computer Science at Heidelberg University and a co-director of its Interdisciplinary Center for Scientific Computing. He received his diploma in computer science from the University of Bonn and his PhD from ETH Zurich, and he was a postdoc at UC Berkeley.

Björn serves on the Bavarian AI Council and as associate editor for IEEE T-PAMI. His research interests include semantic scene understanding and retrieval, generative AI and visual synthesis, self-supervised metric and representation learning, and explainable AI. Moreover, he applies this basic research in interdisciplinary projects within neuroscience and the digital humanities. His group has published a series of generative approaches, including work known as “VQGAN” and “Stable Diffusion”, which are now democratizing the creation of visual content and have already opened up an abundance of new directions in research, industry, the media, and beyond.


Amanda Wasielewski (Uppsala University)

Title: Computational Formalism Beyond Art History

Abstract: This presentation will address AI image analysis and generation in relation to the study of visual culture. Techniques in computer vision and machine learning today mirror or relate to traditional formalist methods in the discipline of art history. I will discuss how the categorization of artworks in large datasets affects the interpretation of images. Such AI systems are of pressing interest to humanists concerned with how these tools shape culture and cultural practice. For both old and new formalisms, issues of categorization are of paramount importance. The distinctiveness that categories such as style imply can only be identified by comparison to other distinctive manners of making, meaning that it is highly relative. It is therefore an unstable and slippery foundation upon which to peg mathematical ‘certainty’ in datasets. This talk ultimately addresses what computational formalism means in wider use within visual culture, particularly in scientific studies of images.

Bio: Amanda Wasielewski is Associate Senior Lecturer of Digital Humanities and Associate Professor of Art History in the Department of ALM (Archives, Libraries, Museums) at Uppsala University. Her recent research focuses on the use of artificial intelligence techniques to study and create art, with a particular focus on the theoretical implications of AI-generated images. Wasielewski is the author of three monographs including Computational Formalism: Art History and Machine Learning (MIT Press, 2023).