Artifacts

    Course: Real-Time Visual and Machine Learning Systems

    This course explores the principles and applications of real-time visual and machine learning systems. Students will learn how to design, implement, and optimize systems that process visual data and make intelligent decisions in real time. The curriculum covers key topics such as programming in Rust, memory hierarchies, concurrency, and data types. The course focuses on hands-on exercises, followed by a larger code project at the end. All parts of the material are open source to explore at your own pace.

    Course: Machine Learning Operations

    This course explores a number of coding practices that help machine learning practitioners organize, scale, monitor, and deploy machine learning models in either a research or a production setting. The course focuses on hands-on experience with a number of frameworks, both local and in the cloud, for developing machine learning models at scale. All parts of the material are open source to explore at your own pace.

    MMEarth: Exploring Multi-Modal Pretext Tasks For Geospatial Representation Learning

    The volume of unlabelled Earth observation (EO) data is huge, but many important applications lack labelled training data. However, EO data offers the unique opportunity to pair data from different modalities and sensors automatically based on geographic location and time, at virtually no human labor cost. We seize this opportunity to create a diverse multi-modal pretraining dataset at global scale. Using this new corpus of 1.2 million locations, we propose a Multi-Pretext Masked Autoencoder (MP-MAE) approach to learn general-purpose representations for optical satellite images. Our approach builds on the ConvNeXt V2 architecture, a fully convolutional masked autoencoder (MAE). Drawing upon a suite of multi-modal pretext tasks, we demonstrate that our MP-MAE approach outperforms both MAEs pretrained on ImageNet and MAEs pretrained on domain-specific satellite images. This is shown on several downstream tasks, including image classification and semantic segmentation. We find that multi-modal pretraining notably improves the linear probing performance, e.g. 4pp on BigEarthNet and 16pp on So2Sat, compared to pretraining on optical satellite images only. We show that this also leads to better label and parameter efficiency, which are crucial aspects in global-scale applications.
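
    To make the multi-pretext idea concrete, below is a minimal, self-contained PyTorch sketch of multi-pretext masked autoencoding: a shared encoder sees only the unmasked patches of an optical image, and one reconstruction head per modality is trained to fill in the masked regions. Everything here is an illustrative assumption rather than the MMEarth code: a plain convolutional stem stands in for the ConvNeXt V2 encoder, the modality names and channel counts (optical, elevation, landcover) are made up, and a masked MSE is used for every head where the real method would use task-appropriate losses.

        # Toy sketch of a multi-pretext masked autoencoder (not the MMEarth implementation).
        import torch
        import torch.nn as nn

        class MultiPretextMAE(nn.Module):
            """Shared encoder over the masked optical input, plus one
            reconstruction head per target modality."""

            def __init__(self, patch=16, dim=128, targets=None):
                super().__init__()
                self.patch = patch
                # Modality name -> channel count of its reconstruction target (assumed values).
                self.targets = targets or {"optical": 12, "elevation": 1, "landcover": 10}
                # A plain conv stem stands in for the ConvNeXt V2 encoder used in the paper.
                self.encoder = nn.Sequential(
                    nn.Conv2d(12, dim, kernel_size=patch, stride=patch),  # patchify 12-band input
                    nn.GELU(),
                    nn.Conv2d(dim, dim, kernel_size=1),
                )
                # Each head predicts full patch contents for one modality;
                # PixelShuffle rearranges the channels back into a pixel grid.
                self.heads = nn.ModuleDict({
                    name: nn.Sequential(
                        nn.Conv2d(dim, ch * patch * patch, kernel_size=1),
                        nn.PixelShuffle(patch),
                    )
                    for name, ch in self.targets.items()
                })

            def _to_pixels(self, grid_mask):
                # Expand a (B, h, w) patch-level mask to a (B, 1, H, W) pixel-level mask.
                m = grid_mask.repeat_interleave(self.patch, dim=1)
                m = m.repeat_interleave(self.patch, dim=2)
                return m.unsqueeze(1).float()

            def forward(self, optical, mask, targets):
                # mask: (B, H/patch, W/patch) bool grid, True = patch hidden from the encoder.
                pix_mask = self._to_pixels(mask)
                z = self.encoder(optical * (1.0 - pix_mask))  # encode visible patches only
                losses = {}
                for name, head in self.heads.items():
                    pred = head(z)
                    # MSE on masked pixels only; a task-appropriate loss
                    # (e.g. cross-entropy for land cover) would replace this.
                    err = (pred - targets[name]) ** 2 * pix_mask
                    losses[name] = err.sum() / (pix_mask.sum() * pred.shape[1])
                return losses

        model = MultiPretextMAE()
        optical = torch.randn(2, 12, 128, 128)
        targets = {
            "optical": optical,                        # reconstruct the input itself
            "elevation": torch.randn(2, 1, 128, 128),  # stand-in for an elevation target
            "landcover": torch.randn(2, 10, 128, 128),
        }
        mask = torch.rand(2, 8, 8) < 0.6  # hide roughly 60% of the patches
        losses = model(optical, mask, targets)
        sum(losses.values()).backward()

    The point the sketch tries to show is that all pretext heads share a single encoder, so gradients from every modality shape the same representation; that shared representation is what is later evaluated with linear probing on downstream tasks.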