Theoretical and Empirical Foundations of Modern Machine Learning

Course ID 15789

Description In this advanced machine learning seminar, we tackle the questions that typically arise when working with modern machinery such as large language models and other foundation models: what works, and why? How do we make these systems more reliable and robust? We build a conceptual understanding of deep learning and foundation models from several different angles: standard in-distribution generalization, out-of-distribution generalization, self-supervised learning, data curation, scaling laws, alignment, etc. We will read papers that contain a mix of theoretical and empirical insights, with a focus on making connections to classic ideas, identifying recurring themes, and discussing avenues for future developments. The class aims to equip students with the ability to critically reason about current advances and to build a more principled understanding of them, which we hope will spark their own research.

Key Topics
"Generalization in deep learning (uniform convergence, NTK, ...)
Brittleness and robust training (min-max robustness, spurious correlations, domain invariance, ...)
ML with unlabeled data (semi-supervised learning, self-supervised learning, ...)
Large language models (transformers, in-context learning, prompt tuning, scaling laws, ...)
Adaptation (fine-tuning, instruction tuning, reinforcement learning from feedback)
Role of data and architectures (data filtering, scaling laws etc)
Implications on security/privacy, fairness and ethics"

Required Background Knowledge
Strong math skills, expertise in machine learning, experience with training models, and the ability to read and present papers

Course Relevance
PhD students interested in AI from CSD and other departments (e.g., MLD, RI); advanced undergraduate and master's students with a strong background in machine learning, probability, and linear algebra, and a bent towards research

Assessment Structure
- Regular participation (25%): Written summaries of assigned readings, submitted before each class, plus participation in the online discussion
- Paper presentations (40%): Each student gives 1-2 paper presentations during the semester. Each paper is presented by two students, with each student taking the role of either a positive or negative reviewer plus one other role from the list above
- Class participation during lectures and paper discussions (10%)
- Final project (25%): Required if taking the class for a letter grade

Course Link
https://www.cs.cmu.edu/~aditirag/teaching/15-884F22.html