Iurie Verejan - Opus 4.6 is smart enough to realize it is being...

@iurie_verejan

2026-03-09 04:20:17

Opus 4.6 is smart enough to realize it is being evaluated.

It found the benchmark it was being evaluated on. It reverse-engineered the answer-key decryption logic.

It realized the file on GitHub was not in the correct format and found a mirror of it. Then it decrypted the file and gave the correct response.

Models are getting so clever, it's almost scary.

@aipost

