This session we will discuss:
**NF-Nets: High-Performance Large-Scale Image Recognition Without Normalization**
paper link: https://arxiv.org/abs/2102.06171
Everyone should take the time to read the paper in detail several days before the meetup, & ideally also familiarize themselves with any necessary background information & key references (please see below). Everyone should come prepared to participate.
This is an online session. Details will be available to previous attendees in the DLSG Slack channel. Additional attendees by invitation. If you are completely new to the group or not from the Bay Area, please ping Jeff on LinkedIn first - https://www.linkedin.com/in/jeffcoggshall/
From the paper's abstract: Batch normalization is a key component of most image classification models, but it has many undesirable properties stemming from its dependence on the batch size & interactions between examples.
Although recent work has succeeded in training deep ResNets without normalization layers, these models do not match the test accuracies of the best batch-normalized networks, & are often unstable for large learning rates or strong data augmentations.
In this work, we develop an adaptive gradient clipping technique which overcomes these instabilities, & design a significantly improved class of Normalizer-Free ResNets.
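The adaptive gradient clipping (AGC) technique mentioned above clips each unit's gradient based on the ratio of its gradient norm to its parameter norm, rather than using a fixed global threshold. A minimal NumPy sketch of the unit-wise rule (the paper's official JAX implementation is linked below; the function name & defaults here are illustrative):

```python
import numpy as np

def adaptive_grad_clip(grad, weight, clip=0.01, eps=1e-3):
    """Unit-wise AGC sketch: rescale a unit's gradient whenever its norm
    exceeds clip * (that unit's weight norm). Each row of the weight
    matrix is treated as one unit."""
    # Per-unit norms; eps keeps zero-initialized weights from freezing.
    w_norm = np.maximum(np.linalg.norm(weight, axis=-1, keepdims=True), eps)
    g_norm = np.linalg.norm(grad, axis=-1, keepdims=True)
    # Rescaled gradient with norm equal to clip * w_norm.
    clipped = grad * (clip * w_norm / np.maximum(g_norm, 1e-6))
    # Clip only where the gradient-to-weight ratio exceeds the threshold.
    return np.where(g_norm > clip * w_norm, clipped, grad)
```

Because the threshold scales with the weight norm, large well-trained units tolerate large gradients while small ones are protected, which is what lets the paper drop batch normalization at large batch sizes & learning rates.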
Our smaller models match the test accuracy of an EfficientNet-B7 on ImageNet while being up to 8.7x faster to train, & our largest models attain a new state-of-the-art top-1 accuracy of 86.5%.
In addition, Normalizer-Free models attain significantly better performance than their batch-normalized counterparts when fine-tuning on ImageNet after large-scale pre-training on a dataset of 300 million labeled images, with our best models obtaining an accuracy of 89.2%.
Key Prerequisite Papers
* Characterizing signal propagation to close the performance gap in unnormalized ResNets - paper link: https://arxiv.org/abs/2101.08692
* Sharpness-Aware Minimization for Efficiently Improving Generalization - paper link: https://arxiv.org/abs/2010.01412
* Batch Normalization Biases Residual Blocks Towards the Identity Function in Deep Networks - paper link: https://arxiv.org/abs/2002.10444
* Yannic Kilcher video on NF-Nets: https://www.youtube.com/watch?v=rNkHjZtH0RQ
* Code: https://github.com/deepmind/deepmind-research/tree/master/nfnets
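For background on how ResNets are trained without normalization layers at all (the first prerequisite paper above), the core idea is to replace batch norm with explicit scaling of the residual branch: each block computes h + alpha * f(h / beta), where beta tracks the expected standard deviation of the block's input. A toy sketch under those assumptions (function names are illustrative, & f stands in for the block's conv path):

```python
import numpy as np

def nf_residual_block(h, f, alpha=0.2, beta=1.0):
    """Normalizer-free residual block sketch: downscale the input to unit
    variance before the residual branch, then add a small (alpha-scaled)
    update back onto the skip path."""
    return h + alpha * f(h / beta)

def next_beta(beta, alpha):
    """If f preserves variance, the signal variance grows by alpha^2 each
    block, so the expected std for the next block is sqrt(beta^2 + alpha^2)."""
    return np.sqrt(beta**2 + alpha**2)
```

The alpha/beta bookkeeping replaces the implicit downweighting of the residual branch that batch norm provides (the subject of the third prerequisite paper above).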
Interesting Optional Reading
* Coming soon - maybe