To Augment or Not to Augment? Diagnosing Distributional Symmetry Breaking
Many popular ML datasets are heavily canonicalized: objects almost always appear in the same orientation. We detect this with a simple classifier test, and we show theoretically that canonicalization can cause data augmentation to hurt performance. We also give practitioners a flowchart for diagnosing their own datasets.
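The summary does not spell out how the classifier test is built. The sketch below assumes one common construction: train a classifier to tell original samples apart from copies transformed by a uniformly random group element, then check its held-out accuracy. Accuracy near 0.5 is consistent with a symmetric distribution. Accuracy well above 0.5 means the data occupy a preferred orientation, i.e. they are canonicalized. The choice of 2D rotations, the quadratic feature map, the hand-rolled logistic regression, and every function name here are our own illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def rotate(points, angles):
    """Rotate each 2D point by its own angle (radians)."""
    c, s = np.cos(angles), np.sin(angles)
    x, y = points[:, 0], points[:, 1]
    return np.stack([c * x - s * y, s * x + c * y], axis=1)

def features(points):
    # Hypothetical feature map: raw coordinates, quadratic terms, and a bias column.
    x, y = points[:, 0], points[:, 1]
    return np.stack([x, y, x * x, y * y, x * y, np.ones_like(x)], axis=1)

def train_logistic(X, labels, lr=0.5, steps=500):
    # Plain batch gradient descent on the mean logistic loss.
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w -= lr * X.T @ (p - labels) / len(labels)
    return w

def symmetry_test_accuracy(points, rng, train_frac=0.5):
    """Held-out accuracy of a classifier separating originals (label 0)
    from randomly rotated copies (label 1)."""
    n = len(points)
    rotated = rotate(points, rng.uniform(0.0, 2 * np.pi, size=n))
    X = features(np.concatenate([points, rotated]))
    labels = np.concatenate([np.zeros(n), np.ones(n)])
    order = rng.permutation(2 * n)
    X, labels = X[order], labels[order]
    split = int(train_frac * 2 * n)
    w = train_logistic(X[:split], labels[:split])
    preds = (X[split:] @ w > 0).astype(float)
    return float(np.mean(preds == labels[split:]))

rng = np.random.default_rng(0)
n = 2000

# Canonicalized toy data: orientations clustered tightly around angle 0.
theta = rng.normal(0.0, 0.1, size=n)
canon = np.stack([np.cos(theta), np.sin(theta)], axis=1)

# Rotation-symmetric toy data: orientations uniform on the circle.
phi = rng.uniform(0.0, 2 * np.pi, size=n)
sym = np.stack([np.cos(phi), np.sin(phi)], axis=1)

acc_canon = symmetry_test_accuracy(canon, rng)
acc_sym = symmetry_test_accuracy(sym, rng)
print(f"canonicalized: {acc_canon:.3f}, symmetric: {acc_sym:.3f}")
```

On the canonicalized toy data the classifier should beat chance by a wide margin, while on the symmetric data it should hover near 0.5. For real images one would swap in the dataset's own transformation group and a stronger classifier; the logic of comparing held-out accuracy against chance stays the same.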