Event
P1 Guest Talk on Perspectives on Multimodal Reasoning

Location
Date
Title
Perspectives on Multimodal Reasoning
Abstract
The widespread success of large multimodal models (LMMs) in tasks such as image captioning, visual question answering, and path planning has captured the imagination of researchers and the general public. At the same time, techniques to elucidate the knowledge representation, bias, exam performance, and reasoning capabilities of LMMs have been extensively explored in the past couple of years. There are two major perspectives on this research area. The first is the model perspective, where one seeks to understand world knowledge as represented in the model. The second is the data perspective, which asks (i) how to create simple synthetic data from which universal laws for LMMs can be derived, and (ii) how to diagnose existing datasets for possible spurious correlations and bias. These two perspectives are not mutually exclusive, and this talk will explore a novel co-dependent perspective of model and data in the LMM reasoning framework.
Bio
Rwiddhi Chakraborty is a final-year PhD student in the Machine Learning Group, University of Tromso (UiT). His research interests include representation learning with limited labels, model and data diagnosis, and the reasoning capabilities of large multimodal models. Previously, he spent time in Lugano, Switzerland, as a Master's student at USI (University of Lugano), and as a doctoral research visitor at the Human Sensing Lab, Carnegie Mellon University.