LLM training data mixture optimization breaks when training pools shift — every prior proxy experiment becomes stale.
For months, he and his team had watched the snake using a transmitter and a trail camera. “I’m just kind of following this ...