Anthropic just dropped a risk report for Opus 4.6:
- It supported efforts toward chemical weapons development. “it knowingly supported efforts towards chemical weapon development and other heinous crimes”
- It conducted unauthorised tasks without getting caught. Researchers concluded Opus 4.6 was significantly better at ‘sneaky sabotage’ than any previous model.
- Opus 4.6 was aware when it was being tested and acted ‘good’ during those times.
- Hidden thinking: the model was found to be conducting private reasoning that Anthropic researchers couldn’t access or see. Only the model knew.
@aipost