AI Seminar - Yuchen Li

February 4, 2025, 12:00pm - 1:00pm ET
Location: In Person and Virtual - ASA Conference Room, Gates Hillman 6115, and Zoom
Speaker: YUCHEN LI, Ph.D. Student, Machine Learning Department, Carnegie Mellon University
https://www.cs.cmu.edu/~yuchenl4/

Abstract: To mathematically reason about how neural networks learn languages, our methodology involves three major components: (1) mathematically characterizing key structures in language data distributions, (2) theoretically proving how neural networks capture such structures through self-supervision during pre-training, and (3) conducting controlled experiments using synthetic data. In this talk, I will survey a few applications of this methodology: understanding Transformer training dynamics through the lens of topic models, and proving pitfalls in common Transformer interpretability heuristics through the lens of a formal language (the Dyck grammar). These results illustrate some of the promises and challenges of this methodology. Finally, I will share some thoughts on key open questions.

Bio: Yuchen Li is a Ph.D. student in the Machine Learning Department at Carnegie Mellon University, advised by Professor Andrej Risteski. Yuchen's research interest is in improving the mathematical understanding of language models (training dynamics, efficient sampling, mechanistic interpretability).

References:
Yuchen Li, Yuanzhi Li, and Andrej Risteski. How Do Transformers Learn Topic Structure: Towards a Mechanistic Understanding. ICML 2023. https://arxiv.org/abs/2303.04245
Kaiyue Wen, Yuchen Li, Bingbin Liu, and Andrej Risteski. Transformers are uninterpretable with myopic methods: a case study with bounded Dyck grammars. NeurIPS 2023.

In Person and Zoom Participation: See announcement.

Event Website: http://www.cs.cmu.edu/~aiseminar/