blog | Elyssa Hofgard

To Augment or Not to Augment? Diagnosing Distributional Symmetry Breaking

Many popular ML datasets are heavily canonicalized: objects almost always appear in the same orientation. We measure this with a simple classifier test and show theoretically that canonicalization can cause data augmentation to hurt performance. We also give practitioners a flowchart for diagnosing their own datasets.

14 min read · March 02, 2026
