Joint AI Seminar / Doctoral Speaking Skills Talk - Victor Akinwande

Time:
1:00pm

Location:
In Person - Newell-Simon 3305

Speaker:
VICTOR AKINWANDE, Ph.D. Student, Computer Science Department, Carnegie Mellon University
https://home.victorakinwande.com/

Adapting Vision-Language Models with Hypernetworks

Self-supervised vision-language models trained with contrastive objectives form the basis of current state-of-the-art methods in AI vision tasks. The success of these models is a direct consequence of the huge web-scale datasets used to train them, but they require correspondingly large vision components to properly learn powerful and general representations from such a broad data domain. This poses a challenge for deploying large vision-language models, especially in resource-constrained environments. 

This talk presents an alternate vision-language architecture, called HyperCLIP, that uses a small image encoder along with a hypernetwork that dynamically adapts image encoder weights to each new set of text inputs. 
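As a rough, hedged sketch of the idea (not the authors' implementation), the snippet below shows a hypernetwork that predicts the weights of a single linear layer of a small image encoder from a pooled text embedding. Every module name, every dimension, and the choice of which layer gets predicted is an assumption made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HyperNetwork(nn.Module):
    """Predicts the weights of one target linear layer from a text embedding.

    Illustrative assumption: which encoder weights HyperCLIP actually
    predicts, and how they are parameterized, is specified in the talk,
    not here.
    """
    def __init__(self, text_dim: int, in_dim: int, out_dim: int):
        super().__init__()
        self.in_dim, self.out_dim = in_dim, out_dim
        # Small MLP emitting a flat vector holding the target weights + biases.
        self.mlp = nn.Sequential(
            nn.Linear(text_dim, 256),
            nn.ReLU(),
            nn.Linear(256, in_dim * out_dim + out_dim),
        )

    def forward(self, text_emb: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
        flat = self.mlp(text_emb)
        weight = flat[: self.in_dim * self.out_dim].view(self.out_dim, self.in_dim)
        bias = flat[self.in_dim * self.out_dim:]
        return weight, bias

# Adapt one layer of a small image encoder to a new set of text inputs.
hyper = HyperNetwork(text_dim=512, in_dim=384, out_dim=384)
text_emb = torch.randn(512)        # stand-in for a pooled text embedding
weight, bias = hyper(text_emb)
features = torch.randn(8, 384)     # stand-in for intermediate image features
adapted = F.linear(features, weight, bias)  # (8, 384), text-conditioned output
```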

With a trained HyperCLIP model, we can generate new zero-shot deployment-friendly image classifiers for any task with a single forward pass through the text encoder and hypernetwork. HyperCLIP increases the zero-shot accuracy of SigLIP-trained models with small image encoders by up to 3% on ImageNet and 5% on CIFAR-100 with minimal training throughput overhead.
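To make the single-forward-pass deployment story concrete, here is a minimal sketch under stated assumptions: the encoders below are random stand-ins rather than trained SigLIP or HyperCLIP components, and all names and shapes are invented for the example. Per task, the class prompts pass through the text encoder and hypernetwork exactly once; the adapted small encoder then serves any number of images.

```python
import torch
import torch.nn as nn

# Random stand-ins for trained components; in practice these would be loaded
# from a HyperCLIP checkpoint. All names and shapes here are hypothetical.
text_encoder = nn.Linear(77, 512)            # prompt tokens -> shared embedding
image_encoder = nn.Linear(3 * 32 * 32, 512)  # small vision backbone
hypernet = nn.Linear(512, 512 * 512)         # text summary -> adapted layer weights

class_prompts = torch.randn(100, 77)         # e.g. one prompt per CIFAR-100 class

# One forward pass through the text encoder and hypernetwork per task:
with torch.no_grad():
    text_embs = text_encoder(class_prompts)                   # (100, 512)
    adapt_w = hypernet(text_embs.mean(dim=0)).view(512, 512)  # generated weights

def classify(images: torch.Tensor) -> torch.Tensor:
    """Zero-shot classification with the task-adapted small encoder."""
    feats = image_encoder(images.flatten(1))   # (B, 512)
    feats = feats @ adapt_w.T                  # apply hypernetwork-generated layer
    return feats @ text_embs.T                 # similarity logits over the classes

logits = classify(torch.randn(4, 3, 32, 32))   # (4, 100)
```

The point of the sketch is only the data flow: weight generation is amortized once per task, so per-image inference cost stays that of the small image encoder.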

Presented in Partial Fulfillment of the CSD Speaking Skills Requirement

Event Website:
https://www.cs.cmu.edu/~aiseminar/

