dyl000 7 days ago
I wonder how long it will be before we start to acknowledge that AI labs are heavily gaming benchmarks, and that benchmarks are mostly useless as a way of judging model performance.
The latest one to be caught was Meta, but they've all been doing it for a while now.