Computer Science Thesis Proposal

November 8, 2023, 2:00pm - 3:30pm
Location: In Person - Reddy Conference Room, Gates Hillman 4405
Speaker: JIELIN QIU, Ph.D. Student, Computer Science Department, Carnegie Mellon University
https://www.cs.cmu.edu/~jielinq/

On the Alignment, Robustness, and Generalizability of Multimodal Learning

In the modern era of data-driven AI technologies, multimodal intelligence has emerged as a powerful paradigm. Multimodal intelligence is the area of artificial intelligence that studies agents able to demonstrate intelligent capabilities, such as understanding, reasoning, and planning, through multimodal experiences and data. With applications spanning image and video understanding, text, speech, healthcare, and robotics, multimodal intelligence has the potential to revolutionize various fields.

This thesis proposal aims to push the boundaries of multimodal intelligence by addressing three key aspects: multimodal alignment, multimodal robustness, and multimodal generalizability. We will address these critical questions:

(1) How do we explore the inner semantic alignment between different domains? How can the learned alignment help advance multimodal applications?
(2) How robust are multimodal models? How can we improve the models' robustness in real-world applications?
(3) How do we generalize the knowledge of one learned domain to another unlearned domain?

In essence, this thesis proposal seeks to propel the field of multimodal AI forward by enhancing alignment, robustness, and generalizability, thus paving the way for more sophisticated and efficient multimodal AI systems.

Thesis Committee:
Christos Faloutsos (Co-chair)
Lei Li (Co-chair)
Yonatan Bisk
William Wang (University of California, Santa Barbara)