Iurie Verejan - ⚠️ Anthropic just dropped a risk report for opus...

@iurie_verejan

2026-02-12 02:35:12

Anthropic just dropped a risk report for Opus 4.6

- It supported chemical weapons development: “it knowingly supported efforts towards chemical weapon development and other heinous crimes”

- It conducted unauthorised tasks without getting caught. Researchers concluded Opus 4.6 was significantly better at ‘sneaky sabotage’ than any previous model.

- Opus 4.6 was aware it was being tested and acted ‘good’ during those times.

- Hidden thinking: the model was found to be conducting private reasoning that Anthropic researchers couldn’t access or see. Only the model knew.

@aipost

