Trustworthy AI: Theory and Practice

Course ID 15783

Description As AI systems become more capable and widely deployed, ensuring their reliability, robustness, and alignment with human intent is critical. This advanced seminar explores the principles behind building trustworthy AI, with a focus on both theoretical foundations and empirical guarantees. We will examine key challenges such as robustness to distribution shifts, adversarial attacks, data poisoning, privacy risks, and jailbreaks, as well as broader concerns in AI alignment and governance. Through a mix of foundational papers and recent advances, the class will investigate recurring themes across security, robustness, and alignment, drawing connections to classical machine learning principles and modern scaling trends. Discussions will emphasize not only what works but also why it works (or fails), aiming to equip students with the conceptual tools to critically assess current methods and develop principled approaches for trustworthy AI. This course is designed for students interested in both theoretical insights and practical implications, bridging research in machine learning, security, and AI alignment to address some of the most pressing challenges in modern AI development.

Key Topics
Robustness, security, and alignment of modern machine learning models such as large language models (LLMs)

Required Background Knowledge
Familiarity with machine learning, deep learning, probability, and linear algebra

Course Relevance
Doctoral students, and CS majors who have completed AI/ML courses

Course Goals
An in-depth grasp of the principles and challenges involved in building frontier AI systems that are reliable, robust, and aligned with human values.

Learning Resources
n/a

Assessment Structure
40% homework, 20% midterm, 30% course project, 10% participation

Extra Time Commitment
n/a