Talk on Creative Sensing for People and Robots

Large vision-language models trained on internet-scale data collections have achieved surprising accuracy on many core perception tasks, such as image classification and image segmentation. So where is my robot butler?

It turns out that large pretrained models have a fairly shallow understanding of visual content. Applications with real-world impact, such as robotics, need a deeper 3D understanding of visual data and more attention to non-image modalities (e.g., lidar, video, thermal, tactile, proprioception, and audio). It is not clear that the diversity of reasoning needed to drive a vehicle or load a dishwasher will emerge on its own from ever larger foundation models. In this talk I will highlight recent works from my lab that use temperature sensing, pressure sensing, and audio sensing to better understand humans and to advance robot capabilities.



James Hays has been an associate professor in the School of Interactive Computing at the Georgia Institute of Technology since 2015. Previously, he was the Manning Assistant Professor of Computer Science at Brown University. He is also the director of perception research at Overland AI, a startup focused on off-road autonomy, and was a principal scientist at the self-driving vehicle startup Argo AI from 2017 to 2022. He was a postdoc at the Massachusetts Institute of Technology, received his Ph.D. from Carnegie Mellon University, and received his B.S. from the Georgia Institute of Technology. His research interests span computer vision, robotics, and machine learning. His research often involves finding new data sources to exploit (e.g., geotagged imagery, thermal imagery) or creating new datasets where none existed (e.g., human sketches, HD maps). He is the recipient of the NSF CAREER Award, the Sloan Fellowship, and the PAMI Mark Everingham Prize.