Joint AI Seminar / Computer Science Speaking Skills Talk
November 7, 2023, 12:00 pm to 1:00 pm
Location: In Person, ASA Conference Room, Gates Hillman 6115
Speaker: Asher Trockman, Ph.D. Student, Computer Science Department, Carnegie Mellon University (http://ashertrockman.com/)

Mimetic Initialization for Transformers and Convolutional Networks

While neural network weights are typically initialized randomly from univariate distributions, pre-trained weights often have visually discernible multivariate structure. In recent work, we propose a technique called "mimetic initialization" that aims to replicate such structures when initializing convolutional networks and Transformers. We handcraft a class of multivariate Gaussian distributions to initialize filters for depthwise convolutional layers, and we initialize the query and key weights of self-attention layers so that their product approximates the identity. Mimetic initialization substantially reduces training time and increases final accuracy on various common benchmarks. Our technique lets us nearly close the gap between untrained and pre-trained Vision Transformers on small datasets like CIFAR-10, achieving up to a 6% gain in accuracy through initialization alone. For convolutional networks like ConvMixer and ConvNeXt, we observe improvements in accuracy and reductions in training time, even when the convolutional filters are frozen (left untrained) after initialization. Overall, our findings suggest that the benefits of pre-training can be separated into two components: serving as a good initialization and storing transferable knowledge. The former is simple enough to be captured, at least partially, by hand in closed form.

Asher Trockman is a Ph.D. student at Carnegie Mellon University advised by Zico Kolter. He researches deep learning for vision and deep learning phenomena more generally.

Presented as part of the AI Seminar Series.
Presented in Partial Fulfillment of the CSD Speaking Skills Requirement.
In Person and Zoom Participation. See announcement.
Event Website: http://www.cs.cmu.edu/~aiseminar/
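The abstract's two initialization ideas can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the speaker's code.

For self-attention, the sketch picks a hypothetical target matrix alpha*Z + beta*I, where Z is Gaussian noise and I is the identity. It then factors that target with an SVD so that W_q W_k^T reproduces it exactly, which makes the query-key product approximate the identity.

For depthwise convolutions, the sketch assumes a squared-exponential covariance over filter tap positions, so nearby taps are positively correlated. This is a stand-in for the talk's handcrafted distribution, not a reproduction of it.

The names `mimetic_qk_init` and `spatial_filter_init`, and the parameters `alpha`, `beta`, `length_scale`, and `scale`, are illustrative. The (channels, 1, k, k) output shape assumes a PyTorch-style depthwise weight layout.

```python
import numpy as np


def mimetic_qk_init(dim, alpha=0.7, beta=0.7, seed=0):
    """Return (w_q, w_k), each dim x dim, with w_q @ w_k.T == alpha*Z + beta*I.

    Hypothetical parameters: alpha scales Gaussian noise Z (entry variance 1/dim),
    beta scales the identity. Not the speaker's exact recipe.
    """
    rng = np.random.default_rng(seed)
    z = rng.normal(0.0, 1.0 / np.sqrt(dim), size=(dim, dim))
    target = alpha * z + beta * np.eye(dim)
    # Factor the target with an SVD: target = U diag(s) Vt.
    u, s, vt = np.linalg.svd(target)
    root = np.sqrt(s)
    w_q = u * root     # U diag(sqrt(s)): scales column j of U by root[j]
    w_k = vt.T * root  # V diag(sqrt(s))
    # Hence w_q @ w_k.T = U diag(s) Vt = target.
    return w_q, w_k


def spatial_filter_init(channels, k=5, length_scale=1.0, scale=1.0, seed=0):
    """Sample `channels` k x k depthwise filters from N(0, Sigma).

    Assumed covariance: Sigma[p, q] = scale * exp(-|p - q|^2 / (2 * length_scale^2))
    over tap positions p, q, so nearby taps are correlated.
    """
    rng = np.random.default_rng(seed)
    ys, xs = np.meshgrid(np.arange(k), np.arange(k), indexing="ij")
    coords = np.stack([ys.ravel(), xs.ravel()], axis=1).astype(float)  # (k*k, 2)
    sq_dist = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(-1)  # (k*k, k*k)
    cov = scale * np.exp(-sq_dist / (2.0 * length_scale**2))
    # A small diagonal jitter keeps the Cholesky factorization numerically stable.
    chol = np.linalg.cholesky(cov + 1e-6 * np.eye(k * k))
    # Each row z ~ N(0, I); the row z @ chol.T then has covariance chol @ chol.T.
    samples = rng.standard_normal((channels, k * k)) @ chol.T
    return samples.reshape(channels, 1, k, k)
```

The SVD is only one convenient way to split the target into two factors. Any factorization with W_q W_k^T equal to the target would serve the same purpose.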