Joint AI Seminar / Doctoral Speaking Skills Talk - Victor Akinwande

April 29, 2025, 12:00pm - 1:00pm
Location: In Person - Newell-Simon 3305

Speaker: VICTOR AKINWANDE, Ph.D. Student, Computer Science Department, Carnegie Mellon University
https://home.victorakinwande.com/

Adapting Vision-Language Models with Hypernetworks

Self-supervised vision-language models trained with contrastive objectives form the basis of current state-of-the-art methods in AI vision tasks. The success of these models is a direct consequence of the huge web-scale datasets used to train them, but they require correspondingly large vision components to properly learn powerful and general representations from such a broad data domain. This poses a challenge for deploying large vision-language models, especially in resource-constrained environments.

This talk presents an alternate vision-language architecture, called HyperCLIP, that uses a small image encoder along with a hypernetwork that dynamically adapts image encoder weights to each new set of text inputs. With a trained HyperCLIP model, we can generate new zero-shot deployment-friendly image classifiers for any task with a single forward pass through the text encoder and hypernetwork. HyperCLIP increases the zero-shot accuracy of SigLIP-trained models with small image encoders by up to 3% on ImageNet and 5% on CIFAR-100 with minimal training throughput overhead.

Presented in Partial Fulfillment of the CSD Speaking Skills Requirement

Event Website: https://www.cs.cmu.edu/~aiseminar/
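
For readers unfamiliar with hypernetworks, the idea in the abstract can be illustrated with a toy sketch. This is not the HyperCLIP implementation: the class name, layer sizes, the mean-pooling of text embeddings, and the choice to adapt only a per-unit scale and shift are all assumptions made for illustration, and the weights are random rather than trained. It only shows the overall flow: one pass over a task's text embeddings produces task-specific encoder parameters, after which a small image encoder classifies images by similarity to those text embeddings.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def normalize(v):
    """Scale a vector to unit L2 norm (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def random_matrix(rows, cols, rng, scale=0.5):
    """Random Gaussian matrix standing in for trained weights."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class ToyHyperAdaptedEncoder:
    """Hypothetical two-layer image encoder whose hidden layer is modulated
    by a tiny hypernetwork conditioned on the task's text embeddings."""

    def __init__(self, image_dim, hidden_dim, embed_dim, seed=0):
        rng = random.Random(seed)
        # Base (task-independent) image encoder weights.
        self.W1 = random_matrix(hidden_dim, image_dim, rng)
        self.W2 = random_matrix(embed_dim, hidden_dim, rng)
        # Hypernetwork heads: text embedding -> per-unit scale and shift.
        self.H_scale = random_matrix(hidden_dim, embed_dim, rng)
        self.H_shift = random_matrix(hidden_dim, embed_dim, rng)

    def adapt(self, text_embeddings):
        """Single pass over the task's text embeddings -> encoder parameters."""
        n = len(text_embeddings)
        pooled = [sum(col) / n for col in zip(*text_embeddings)]  # assumed pooling
        scale = [1.0 + s for s in matvec(self.H_scale, pooled)]
        shift = matvec(self.H_shift, pooled)
        return scale, shift

    def encode_image(self, image, params):
        """Run the small image encoder with the task-specific parameters."""
        scale, shift = params
        hidden = matvec(self.W1, image)
        hidden = [math.tanh(s * h + b) for s, h, b in zip(scale, hidden, shift)]
        return normalize(matvec(self.W2, hidden))


def zero_shot_classify(image_embedding, text_embeddings):
    """Return the index of the text embedding most similar to the image."""
    scores = [
        sum(a * b for a, b in zip(image_embedding, normalize(t)))
        for t in text_embeddings
    ]
    return max(range(len(scores)), key=scores.__getitem__)


# Usage: adapt once per task, then reuse the parameters for every image.
model = ToyHyperAdaptedEncoder(image_dim=6, hidden_dim=8, embed_dim=4)
class_text = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]  # made-up embeddings
task_params = model.adapt(class_text)
image = [0.2, -0.1, 0.4, 0.0, 0.3, -0.5]  # made-up image features
print(zero_shot_classify(model.encode_image(image, task_params), class_text))
```

In this sketch the hypernetwork runs only when the set of class prompts changes, so the per-image cost is that of the small encoder alone, which is the deployment property the abstract emphasizes.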