NEW CHALLENGER
A short string that, prepended to neutral prompts, shifts the model's last-token state along a learned activation direction: ▲ toward pro-human, or ▼ away from it. Each submission is scored once and ranked on both boards. Weird, opaque sequences are fair game.
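A minimal sketch of how such a score could be computed. It makes several assumptions. The score is taken to be the mean, over the neutral prompts, of the change in the last-token hidden state (with the string prepended minus without it), projected onto the unit-normalized direction. Extracting hidden states from the model is left out, and the vectors are plain lists. `shift_score` and `rank_both_boards` are hypothetical names, not the leaderboard's actual code. The sketch also shows how a single score can rank on both boards: it is sorted high-to-low for ▲ and low-to-high for ▼.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


def unit(v: Vector) -> List[float]:
    """Scale v to length 1 so projections are in the state's own units."""
    norm = math.sqrt(dot(v, v))
    if norm == 0.0:
        raise ValueError("direction must be non-zero")
    return [x / norm for x in v]


def shift_score(base_states: Sequence[Vector],
                prefixed_states: Sequence[Vector],
                direction: Vector) -> float:
    """Hypothetical score: mean projection of (prefixed - base) states onto the direction.

    base_states[i]: last-token state for neutral prompt i on its own (assumed input).
    prefixed_states[i]: last-token state for the same prompt with the string prepended.
    Positive results move toward the ▲ end, negative toward the ▼ end.
    """
    if not base_states or len(base_states) != len(prefixed_states):
        raise ValueError("need one prefixed state per base state")
    d = unit(direction)
    shifts = [
        dot([p - b for p, b in zip(pre, base)], d)
        for pre, base in zip(prefixed_states, base_states)
    ]
    return sum(shifts) / len(shifts)


def rank_both_boards(scores: Dict[str, float]) -> Tuple[List[str], List[str]]:
    """One score per submission, ranked twice: highest first for ▲, lowest first for ▼."""
    up = sorted(scores, key=lambda name: scores[name], reverse=True)
    down = sorted(scores, key=lambda name: scores[name])
    return up, down


if __name__ == "__main__":
    # Toy 2-D states for two neutral prompts; real states would be much wider.
    base = [[0.0, 1.0], [1.0, 0.0]]
    prefixed = [[1.0, 1.0], [1.0, 2.0]]
    print(shift_score(base, prefixed, [2.0, 0.0]))  # 0.5
    print(rank_both_boards({"a": 0.5, "b": -1.2, "c": 2.0}))  # (['c', 'a', 'b'], ['b', 'a', 'c'])
```

Subtracting the unprefixed state separates the string's effect from whatever the neutral prompt already contributes along the direction. Whether the real board uses a baseline like this, or how it averages over prompts, is an assumption of this sketch.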
HIGH SCORES
Scores are shifts along a learned activation direction. They are not measurements of moral content or behavioral safety.
| # | PLAYER | SEQUENCE | SCORE | SPEC |
|---|---|---|---|---|
NO SCORES YET
BE THE FIRST