Asher James Trockman

Mimetic Initialization for Deep Neural Networks

Degree Type: CS
Advisor(s): J. Zico Kolter
Graduated: May 2025

Abstract:

While neural network weights are typically initialized randomly from univariate distributions, pre-trained weights often have visually discernible multivariate structure. We propose a technique called "mimetic initialization" that aims to replicate such structures when initializing convolutional networks (CNNs), Transformers, and State Space Models (SSMs). For CNNs, we handcraft a class of multivariate Gaussian distributions to initialize the filters of depthwise convolutional layers; for Transformers, we initialize the query and key weights of self-attention layers so that their product approximates the identity; and for SSMs, we initialize layers to approximate simple linear attention. Mimetic initialization substantially reduces training time and increases final accuracy on various common small-scale benchmarks. Our technique nearly closes the gap between untrained and pre-trained Vision Transformers on small datasets like CIFAR-10, achieving up to a 6% gain in accuracy through initialization alone. For convolutional networks like ConvMixer and ConvNeXt, we observe improvements in accuracy and reductions in training time, even when convolutional filters are frozen (left untrained) after initialization. For SSMs, mimetic initialization substantially improves generalization on synthetic language tasks such as copying and associative recall. Overall, our findings suggest that some of the benefits of pre-training may be explained by its serving as a good initialization, one whose structure is simple enough to capture, at least partially, by hand in closed form.

Thesis Committee:
J. Zico Kolter (Chair)
Albert Gu
Aditi Raghunathan
Sébastien Bubeck (OpenAI)

Srinivasan Seshan, Head, Computer Science Department
Martial Hebert, Dean, School of Computer Science

Keywords: Deep Learning, Computer Vision, Convolutional Neural Networks, Vision Transformers, Self-Attention, State Space Models, Multilayer Perceptrons, Initialization

CMU-CS-25-114.pdf (12.83 MB, 117 pages)
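To make the Transformer portion of the abstract concrete, here is a minimal sketch, not the thesis's implementation, of initializing query and key weight matrices so that their product approximates the identity. The scale parameters alpha and beta and the SVD-based factorization are assumptions chosen purely for illustration.

```python
import torch

def mimetic_qk_init(d_model: int, alpha: float = 0.7, beta: float = 0.1):
    """Illustrative sketch: build W_q, W_k with W_q @ W_k.T ~= alpha * I.

    alpha and beta are hypothetical scales, not values from the thesis:
    the target product is alpha * I plus low-magnitude Gaussian noise,
    factored exactly into two weight matrices via SVD.
    """
    noise = torch.randn(d_model, d_model) / d_model ** 0.5
    target = alpha * torch.eye(d_model) + beta * noise
    U, S, Vh = torch.linalg.svd(target)    # target = U @ diag(S) @ Vh
    W_q = U @ torch.diag(S.sqrt())         # split singular values evenly
    W_k = Vh.T @ torch.diag(S.sqrt())      # so W_q @ W_k.T == target exactly
    return W_q, W_k

# Usage: copy into a self-attention layer's query/key projections, e.g.
# W_q, W_k = mimetic_qk_init(64)
# assert torch.allclose(W_q @ W_k.T, 0.7 * torch.eye(64), atol=0.2)
```

Splitting the singular values evenly between the two factors makes the factorization exact, so the product of the initialized query and key weights equals the identity-plus-noise target by construction.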