Maria Balcan Cadence Design Systems Professor Website CMU Scholars Page ORCiD Office 8205 Gates and Hillman Centers Email ninamf@cs.cmu.edu Phone (412) 268-5295 Department Machine Learning Department Computer Science Department Research Interests Artificial Intelligence Machine Learning Theory Algorithms and Complexity Advisees Siddharth Prasad Kiriaki Fragkia Research Statement My research tackles fundamental questions in Machine Learning, Algorithmic Game Theory, and Algorithms. My work develops deep new connections between these areas, using ideas and insights from each of them to solve some of their central and emerging challenges in innovative ways. Foundations for Machine Learning Machine learning studies the design of automatic methods for extracting information from data and has become a tremendously successful discipline with a wide variety of important applications in areas such as robotics, healthcare, information retrieval, and sustainability. Its past successful evolution was heavily influenced by mathematical foundations developed for several core problems including generalizing from labeled data. However, with the variety of applications of machine learning across science, engineering, and computing in the age of Big Data, re-examining the underlying foundations of the field has become imperative. A major goal of my research is to substantially advance the field of machine learning by developing foundations and algorithms for a number of important modern learning paradigms. These include interactive learning, where the algorithm and the domain expert engage in a dialogue to facilitate more accurate learning from less data compared to the classic approach of passively observing labeled data; distributed learning, where a large dataset is distributed across multiple servers and the challenge lies in learning with limited communication; and multi-task learning, where the goal is to solve multiple related learning problems from less data by taking advantage of relationship among the learning tasks. My goal is to provide new frameworks explaining the fundamental underlying principles, as well as new powerful, principled, and practical learning algorithms designed to satisfy the new types of constraints and challenges of these modern settings (including statistical efficiency, computational efficiency, noise tolerance, limited supervision or interaction, privacy, low communication, and incentives). Algorithmic Game Theory Traditionally, complex systems involving multiple agents each with their own interests in mind have been analyzed through purely game theoretic lenses, but technologies such as the Internet have triggered an increased growth of research concerning algorithmic aspects as well. Yet these approaches are often limited to studying static concepts. My work goes further and shows how machine learning methods can help tackle fundamental open questions regarding information-gathering and dynamics in these settings. For example, in past work, I showed an exciting application of machine learning to automate aspects of auction design and formally address problems of market analysis for designing combinatorial pricing mechanisms with near-optimal revenue guarantees. Along different lines, my current work develops a new approach to analyzing the overall behavior of complex systems in which multiple agents with limited information are selfishly adapting their behavior over time based on past experience. My goal is to develop general techniques for influencing the behavior of natural learning dynamics towards globally good states, as well as to provide powerful tools to reason about economic agents as adaptive, learning entities. Analysis of Algorithms beyond the Worst Case Many important optimization problems are unfortunately provably hard even to approximate well on worst-case instances. However, real-world instances often satisfy certain natural regularities or stability properties. A recent direction in my work is designing algorithms for important optimization problems with strong formal guarantees under natural stability assumptions about the input instances. For example, in the context of clustering I showed that approximation stability assumptions (implicit when modeling clustering as approximately optimizing a distance-based objective, e.g., k-means) could be leveraged to overcome worst-case hardness results. I am interested to further analyze in this framework other problems of finding hidden structure in data. I additionally plan to identify other meaningful and generally applicable models of computation beyond worst-case analysis, that accurately model real-world instances and could provide a useful alternative to traditional worst-case models in a broad range of optimization problems. Publications Preprint Algorithm Configuration for Structured Pfaffian Settings 2024 Balcan M-F, Nguyen AT, Sharma D Journal Article How Much Data Is Sufficient to Learn High-Performing Algorithms? 2024 • Journal of the ACM • 71(5):1-58 Balcan M-F, Deblasio D, Dick T, Kingsford C, Sandholm T, Vitercik E Preprint Learning accurate and interpretable decision trees 2024 Balcan M-F, Sharma D Journal Article Learning to Branch: Generalization Guarantees and Limits of Data-Independent Discretization 2024 • Journal of the ACM • 71(2): Balcan M-F, Dick T, Sandholm T, Vitercik E Conference LEARNING TO RELAX: SETTING SOLVER PARAMETERS ACROSS A SEQUENCE OF LINEAR SYSTEM INSTANCES 2024 • 12th International Conference on Learning Representations, ICLR 2024 Khodak M, Chow E, Balcan MF, Talwalkar A
Preprint Algorithm Configuration for Structured Pfaffian Settings 2024 Balcan M-F, Nguyen AT, Sharma D
Journal Article How Much Data Is Sufficient to Learn High-Performing Algorithms? 2024 • Journal of the ACM • 71(5):1-58 Balcan M-F, Deblasio D, Dick T, Kingsford C, Sandholm T, Vitercik E
Journal Article Learning to Branch: Generalization Guarantees and Limits of Data-Independent Discretization 2024 • Journal of the ACM • 71(2): Balcan M-F, Dick T, Sandholm T, Vitercik E
Conference LEARNING TO RELAX: SETTING SOLVER PARAMETERS ACROSS A SEQUENCE OF LINEAR SYSTEM INSTANCES 2024 • 12th International Conference on Learning Representations, ICLR 2024 Khodak M, Chow E, Balcan MF, Talwalkar A