Event

Last Fridays Talks: Learning Theory and Optimization


On the last Friday of each month, we host the Last Fridays Talks, in which one of our seven Collaboratories presents insights from its current work. Join us for a discussion of results and recent papers, followed by socializing for everyone who wishes to attend.

Talk 1

Machine Learning in Untrusted Environments


Abstract

As machine learning systems scale in both model complexity and data volume, they increasingly rely on distributed algorithms to process massive datasets efficiently. Yet, with this scale comes vulnerability: data from diverse sources may be noisy, corrupted, or adversarial, and distributed computing environments are prone to hardware faults, software bugs, and malicious attacks. If left unaddressed, these issues can significantly undermine the reliability of large-scale learning systems. In this talk, I will discuss how to design robust learning algorithms that remain reliable in the face of such real-world challenges. Focusing on stochastic gradient descent (SGD), the foundation of modern ML optimization, I will present recent advances in robust aggregation methods and examine how robustness interacts with data heterogeneity in distributed settings. I will highlight key distinctions between robust machine learning and classical robust statistics, and conclude with some open problems and research directions at the intersection of theory and practice.
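The abstract mentions robust aggregation methods for distributed SGD. As a minimal, illustrative sketch (not the speaker's own method), the snippet below contrasts the plain mean with a coordinate-wise median aggregator: a single corrupted worker can drag the mean arbitrarily far, while the median stays at the honest gradient.

```python
# Illustrative sketch: coordinate-wise median as a robust gradient aggregator.
from statistics import median

def aggregate_mean(grads):
    """Average a list of gradient vectors coordinate by coordinate."""
    n, d = len(grads), len(grads[0])
    return [sum(g[i] for g in grads) / n for i in range(d)]

def aggregate_median(grads):
    """Take the median of each coordinate across workers."""
    d = len(grads[0])
    return [median(g[i] for g in grads) for i in range(d)]

# Nine honest workers report gradients near [1.0, -2.0];
# one faulty worker reports an arbitrarily large vector.
honest = [[1.0, -2.0] for _ in range(9)]
corrupt = [[1e6, 1e6]]
grads = honest + corrupt

print(aggregate_mean(grads))    # mean is dragged far from the honest gradient
print(aggregate_median(grads))  # median stays at [1.0, -2.0]
```

With fewer than half the workers corrupted, the per-coordinate median cannot be pulled outside the range of honest values, which is the basic intuition behind median-style robust aggregation rules.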


Speaker

Nirupam Gupta 


Talk 2

Learning in an Echo Chamber: Online Learning with Replay Adversary


Abstract

Modern learning systems increasingly retrain on their own past predictions through self-training, feedback loops, or continual deployment, creating the risk of echo chambers that amplify early mistakes. This talk presents a simple model of this phenomenon: online learning in the replay setting. In each round, the learner predicts a label, and the environment either reveals the true label or "replays" an old prediction on the same example. Errors are only counted when true labels are shown, yet replayed mistakes can still distract standard algorithms such as the SOA and the halving algorithm. On the technical side, the work introduces the Extended Threshold Dimension (ExThD), a combinatorial measure that exactly characterizes learnability in this replay model: the optimal worst-case mistake bound is equal to ExThD, and no algorithm can do better. We also show that replay is strictly harder than classical mistake-bound learning in the sense that some classes that are easy in the standard model force many errors under replay, and in the proper setting there is a sharp dichotomy: a class is properly learnable under replay if and only if it is (almost) intersection-closed. The results suggest new principles for designing self-training and continual-learning systems that remain robust in their own feedback loops. The talk is based on a SODA 2026 paper by the same name.
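For readers unfamiliar with the mistake-bound model the abstract builds on, here is a minimal sketch of the classic halving algorithm over a finite hypothesis class (thresholds are a hypothetical choice for illustration). It makes at most log2(|H|) mistakes in the standard model; the talk's point is that replayed, unverified labels can mislead such elimination schemes, which is why this sketch only shrinks the version space when a true label is revealed.

```python
# Sketch: halving algorithm in the standard mistake-bound model.
# Predict by majority vote over the version space; eliminate hypotheses
# only on revealed true labels (never on replayed predictions).

def halving_learn(hypotheses, stream):
    """hypotheses: list of functions x -> 0/1; stream: (x, true_label) pairs."""
    version_space = list(hypotheses)
    mistakes = 0
    for x, y in stream:
        votes = sum(h(x) for h in version_space)
        pred = 1 if 2 * votes >= len(version_space) else 0
        if pred != y:
            mistakes += 1
        # keep only hypotheses consistent with the revealed true label
        version_space = [h for h in version_space if h(x) == y]
    return mistakes

# Hypothetical class: thresholds over {0,...,7}, h_t(x) = 1 iff x >= t.
H = [(lambda x, t=t: 1 if x >= t else 0) for t in range(8)]
# Labels generated by the target threshold t = 5.
stream = [(x, 1 if x >= 5 else 0) for x in [3, 6, 4, 5, 7, 0]]
print(halving_learn(H, stream))  # → 1 mistake, within the log2(8) = 3 bound
```

Each mistake at least halves the version space, giving the log2(|H|) bound; the replay setting studied in the talk asks what happens when the environment can withhold true labels and echo old predictions instead.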


Speaker

Amartya Sanyal