Shiva Kaul Classical Improvements to Modern Machine Learning Degree Type: Ph.D. in Computer Science Advisor(s): Geoffrey Gordon Graduated: August 2024 Abstract: Over the past decade, a large rift has grown between classical and modern machine learning. The predictive performance of modern learning is incomparably better, but it is easier to analyze and guarantee safety, efficiency, fairness, and other properties of classical learning. In this dissertation, I investigate when it is possible to restore such desiderata to modern machine learning by prudently and strategically incorporating classical techniques. I form syntheses between classical and modern learning which can be categorized according to two high-level strategies: (1) wrapping, in which reliable performance guarantees are extracted from modern, opaque models via classical analytic techniques, or (2) swapping, in which some components of a modern model are rebuilt from classical primitives in a way which improves overall efficiency, tractability, and/or expressivity. These efforts lead to new developments in multiple areas of machine learning.The most important contribution in this dissertation pertains to meta-analysis, a structured form of question-answering which serves as the foundation of evidence-based medicine. Classic meta-analytic techniques are based upon randomized, controlled trials, whose causal validity is trusted; by contrast, modern regression models are trained upon large observational databases whose causal validity is untrusted. I show it is possible by incorporate the untrusted data into meta-analysis without sacrificing validity. This involves basic improvements to full conformal prediction which are of general interest. In a separate, more focused, application to healthcare, I generalize classic, handcrafted heart-rate variability statistics so they can be fine-tuned, via supervised learning, as part of a deep neural network. This leads to more accurate, physiologically-informed models.I also present foundational computational primitives that can be used within future machine learning models and algorithms. The first is an algorithm to (approximately) run nonlinear RNNs for T steps in just O(log T) parallel time. A key innovation of this algorithm is replacing nonlinearity across time by nonlinearity along depth through a provably-consistent scheme of local, parallelizable corrections. Inthis manner, classical linear dynamical systems (also known as state-space models) can be stacked to form fast, nonlinear sequence models. Another new computational primitive is gradient-based optimization over the set of all sequences of orthogonal polynomials. This optimization formulation has connections to many different problems in signal processing and optimization. Finally, I propose fairness criteria that circumvent computational intractability, based upon the geometric notion of margin used throughout learning theory and optimization. Thesis Committee: Geoffrey Gordon (Chair) Zachary Lipton Aditi Raghunathan Ryan Tibshirani (University of California, Berkeley) Srinivasan Seshan, Head, Computer Science Department Martial Hebert, Dean, School of Computer Science Keywords: Machine learning, conformal prediction, healthcare, linear dynamical systems, orthogonal polynomials, fairness CMU-CS-24-137.pdf (5.29 MB) ( 206 pages) Copyright Notice