Computer Science Thesis Proposal

— 3:30pm

Location:
In Person - Reddy Conference Room, Gates Hillman 4405

Speaker:
JIELIN QIU, Ph.D. Student, Computer Science Department, Carnegie Mellon University
https://www.cs.cmu.edu/~jielinq/

On the Alignment, Robustness, and Generalizability of Multimodal Learning

In the modern era of data-driven AI technologies, multimodal intelligence has emerged as a powerful paradigm. Multimodal intelligence studies agents that demonstrate intelligent capabilities, such as understanding, reasoning, and planning, through multimodal experiences and data. With applications spanning image and video understanding, text, speech, healthcare, and robotics, multimodal intelligence has the potential to transform these fields.

This thesis proposal aims to push the boundaries of multimodal intelligence by addressing three key aspects: multimodal alignment, multimodal robustness, and multimodal generalizability. It addresses the following questions: (1) How can we uncover the inner semantic alignment between different domains, and how can the learned alignment advance multimodal applications? (2) How robust are multimodal models, and how can we improve their robustness in real-world applications? (3) How can we generalize knowledge from a learned domain to an unseen domain? In essence, this thesis proposal seeks to advance multimodal AI by enhancing alignment, robustness, and generalizability, paving the way for more sophisticated and efficient multimodal AI systems.
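
As one concrete illustration of what "semantic alignment between different domains" can mean in practice, the sketch below shows a CLIP-style contrastive objective that maps paired image and text embeddings into a shared space, so matched pairs score higher than mismatched ones. This is a minimal sketch for intuition only, not the method proposed in this thesis; the modality encoders are assumed to exist elsewhere, and the function names and parameters here are hypothetical.

import torch
import torch.nn.functional as F

def contrastive_alignment_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss: pulls matched image/text pairs together
    and pushes mismatched pairs apart in a shared embedding space."""
    # Normalize so that dot products become cosine similarities.
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    # Pairwise similarity matrix: entry (i, j) compares image i with text j.
    logits = image_emb @ text_emb.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)
    # Cross-entropy in both directions (image-to-text and text-to-image).
    loss_i2t = F.cross_entropy(logits, targets)
    loss_t2i = F.cross_entropy(logits.t(), targets)
    return (loss_i2t + loss_t2i) / 2

# Example usage with random tensors standing in for encoder outputs.
if __name__ == "__main__":
    img = torch.randn(8, 512)
    txt = torch.randn(8, 512)
    print(contrastive_alignment_loss(img, txt).item())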

Thesis Committee:

Christos Faloutsos (Co-chair)
Lei Li (Co-chair)
Yonatan Bisk
William Wang (University of California, Santa Barbara)


