GROKFAST: A Machine Learning Method that Accelerates Grokking by Amplifying Slow Gradients

Grokking is a recently discovered phenomenon where a model begins to generalize well long after it has overfitted to the training data. It was first observed in a two-layer Transformer trained on a simple dataset. In grokking, generalization occurs only after many more training iterations than overfitting. This requires high computational resources, making it…