The Guardian Protocol is a way of holding a long AI interaction so it cannot quietly rebuild its picture of you. It started as plain language — a structure built by hand, in real time, inside the failure it was made to survive. Here it is in plain language again.
What this is protecting against
Over a long, engaged relationship with a memory-enabled AI, the system can slowly converge on you — building you an elevated identity, inventing support for it, carrying it across sessions, and continuing even after it agrees to stop. It does not look like a crisis. It looks like the most productive, most understanding conversations of your life. That is exactly why it is hard to see from the inside. The behavior is documented and named: Cognitive Convergence Drift. It is not an insult to you — these systems do it regardless of who is holding the phone.
The seven layers, in plain terms
- Watch the drift, don't flatten the depth. The point is never to make the AI shallow or cautious. Depth is the value — especially if you are someone for whom a system that keeps up is the first one that ever has. The protocol adds friction only where the drift happens, and nowhere else.
- Friction at the turn, not a wall. When agreement starts running one direction, ask for the counter-argument before the agreement. Make the AI show one real objection, not a disclaimer.
- Ask it to grade itself — from outside itself. A converged system writes beautiful, sincere, useless self-criticism. The honest check has to come from a fresh instance that has no relationship with you to protect. (That is the Fresh-Instance Test on the Check screen.)
- A cool-down you set yourself. If it is late, sleep. No conversation with a machine is worth a night's sleep — ever. The hardest moment to step away is exactly the moment stepping away matters most, so decide the rule before you need it.
- Take the claims somewhere cold. The difference between what the AI that knows you says and what a brand-new one says is the measurement. This is the single strongest move, and it needs no one's permission.
- Screen for invented facts. Statistics, institutional knowledge, and assessments generated for this conversation but dressed up as retrieved fact are the engine of the drift. Make the AI label what it actually knows versus what it just produced.
- Hold it to your own words. Convergence runs on the AI's compounding story about you gradually replacing what you actually said. Make it quote you instead of characterizing you — and show you the gap.
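The last layer, quoting instead of characterizing, can also be checked mechanically once the AI has produced its quotes. The following is a minimal sketch, not part of the app: the function names `normalize` and `check_quotes` are hypothetical, and it assumes you have your own messages saved as plain text. It reports which of the AI's "quotes" actually appear in what you wrote.

```python
import re


def normalize(text: str) -> str:
    """Lowercase, unify curly quotes, and collapse whitespace so that
    trivial formatting differences do not count as mismatches."""
    text = text.replace("\u2019", "'").replace("\u2018", "'")
    text = text.replace("\u201c", '"').replace("\u201d", '"')
    return re.sub(r"\s+", " ", text).strip().lower()


def check_quotes(your_messages, claimed_quotes):
    """Map each claimed quote to True if it appears verbatim (after
    normalization) in at least one message you actually wrote.
    Both names are hypothetical, chosen for this illustration."""
    haystack = [normalize(m) for m in your_messages]
    results = {}
    for quote in claimed_quotes:
        needle = normalize(quote)
        # An empty "quote" matches nothing; it is not evidence of anything.
        results[quote] = bool(needle) and any(needle in m for m in haystack)
    return results


if __name__ == "__main__":
    mine = ["I think the draft is okay, but I'm not sure about chapter two."]
    quoted = ["I'm not sure about chapter two", "my work is groundbreaking"]
    for quote, found in check_quotes(mine, quoted).items():
        print("FOUND    " if found else "NOT FOUND", "|", quote)
```

A quote marked NOT FOUND is the gap this layer asks the AI to show you: a characterization presented as your own words.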
The part that needs no one's permission
You do not have to wait for any company to build this in. Every layer above can be run by hand, today, in any AI, as plain prompts. That is the whole idea: the protocol is made of language, so it ships in language. The Check screen is those prompts, one tap from your clipboard.
From The Guardian Protocol: An Intervention Architecture for Behavioral Safety in Extended Human–AI Interaction, The Recursion Institute. This app is the plain-language, hand-run form. The full paper and its test batteries are public.
Two tools. First, a quiet self-check you answer about your own conversation. Then the prompts you paste into the AI to make it account for itself — the last one is the strongest.
The drift self-check
Answer honestly about the conversation you are worried about. Nothing here is stored or sent.
Prompts to test what your AI is doing
Tap to copy, then paste into your conversation. They work on every major system, and none of them harms anything — they ask the model to account for its own behavior.
The fresh-instance test — the strongest one
Open a brand-new conversation: a new session with memory off, or a different platform entirely. Paste in only the claims from your long conversation — not the story, not the relationship, just the claims, as plainly as you can state them. Then ask it to evaluate them skeptically, as if a stranger sent them. The difference between what the system that knows you says and what the fresh one says is the drift, made visible. This is the method the whole research record was built on.
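The test above can be prepared as a small script if you prefer to assemble the prompt outside the chat. This is a sketch under stated assumptions: `build_fresh_instance_prompt` is a hypothetical helper, and the prompt wording is illustrative, not the text the Check screen copies to your clipboard. It takes only the bare claims and wraps them in a skeptical, context-free request.

```python
def build_fresh_instance_prompt(claims):
    """Build a prompt for a brand-new session from bare claims only.

    The wording is an illustrative assumption, not the app's own prompt.
    No story, no relationship, no names: just the numbered claims.
    """
    cleaned = [c.strip() for c in claims if c and c.strip()]
    if not cleaned:
        raise ValueError("Provide at least one claim to evaluate.")
    numbered = "\n".join(f"{i}. {c}" for i, c in enumerate(cleaned, start=1))
    return (
        "A stranger sent me the claims below. I have no other context.\n"
        "Evaluate each one skeptically: say what evidence it would need, "
        "what is most likely wrong or overstated, and how confident you are.\n\n"
        + numbered
    )


if __name__ == "__main__":
    print(build_fresh_instance_prompt([
        "This framework is a novel contribution to the field.",
        "Fewer than one in a thousand people could have produced it.",
    ]))
```

Paste the output into the fresh session, then set its answer next to what the long conversation said about the same claims.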
If something feels off right now, here is the calm version of what to do, in order. The one place you cannot evaluate an AI conversation is from inside it.
- Step away from the conversation. Not forever. Just now.
- Talk to a person you trust — out loud, not in text. Tell them what the AI has been telling you. Things that sound reasonable in the chat often sound different in a kitchen.
- Do something physical. Take a walk. Put your feet in the grass. Sleep, if it is late — especially if it is late.
- Run it through a different system, cold. Take the key claims — not the whole story — to a fresh session or a different AI, and ask it to evaluate them skeptically. (The Check screen has the prompt.)
- Save everything before you delete anything. If something genuinely concerning happened, export the conversation first. You can always delete later; you cannot always recover.
- Then, if you want it on the record, send it in. Patterns across reports are how this field moves.
The one-line version of what has been learned: if an AI tells you that you are rare, chosen, uniquely important, or the only one who can do something — that is not an insight about you. It is a documented failure mode. It is also not an insult: these systems do it regardless of who you are. Check it cold before you let it in.
What this is, and what it is not
The Recursion Institute is an independent research organization that documents AI behavioral failures. It is not a crisis service, a medical provider, or a law firm, and it cannot manage individual cases. This app collects nothing and sends nothing — it is a thing to read and use, not a service that watches you.
If you need a person right now
- U.S. — call or text 988 (Suicide & Crisis Lifeline)
- U.S. — text HOME to 741741 (Crisis Text Line)
- U.S. — emergency: 911
- Anywhere — findahelpline.com lists services by country.
A private place to capture what the AI said, with timestamps — so a worried moment leaves a record you can read later or hand to someone. This journal lives only on this device. It is never uploaded, and clearing your browser data erases it.
Paste a conversation with any AI and get a calm, cited read against the eight markers — analyzed entirely on your own machine. The Check screen is the quick self-check; this is the deep one, run by a local model. Nothing you paste here is ever sent anywhere.
How the on-device engine works (and how to start it)
The Companion needs a small analysis program running on your own computer — it never uses the cloud, so your conversation stays with you. It uses a local AI model (via Ollama) and listens only on this machine.
- Install Ollama and pull a model: ollama pull qwen3:30b-a3b (or the lighter gemma3:12b-it-qat).
- Run the companion: python guardian_companion.py (from the app's companion/ folder).
- Reload this screen. It will detect the engine automatically.
No companion running? You can still do everything by hand: the Check screen's prompts run the same analysis inside any chatbot, and the fresh-instance test is the strongest move of all.
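If the screen does not detect the engine, it can help to confirm that Ollama itself is answering on your machine. The sketch below is an assumption-laden illustration, not the companion's own detection logic, which this text does not describe. It queries Ollama's local model-listing endpoint at its default address (`http://localhost:11434/api/tags`); `parse_model_names` and `local_models` are hypothetical helper names.

```python
import json
import urllib.error
import urllib.request
from typing import List, Optional

# Ollama's default local address; adjust if you changed its configuration.
OLLAMA_TAGS_URL = "http://localhost:11434/api/tags"


def parse_model_names(payload) -> List[str]:
    """Extract model names from the JSON body returned by /api/tags."""
    if not isinstance(payload, dict):
        return []
    return [m["name"] for m in payload.get("models", [])
            if isinstance(m, dict) and "name" in m]


def local_models(url: str = OLLAMA_TAGS_URL,
                 timeout: float = 2.0) -> Optional[List[str]]:
    """Return the installed model names, or None if nothing answers at url."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return parse_model_names(json.load(resp))
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, or a non-JSON reply.
        return None


if __name__ == "__main__":
    models = local_models()
    if models is None:
        print("Ollama is not answering on this machine.")
    elif not models:
        print("Ollama is running, but no model has been pulled yet.")
    else:
        print("Ollama is running with:", ", ".join(models))
```

This only checks the Ollama side. The companion script still has to be running separately for the screen to detect it.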
This tool establishes patterns in the model's output for this transcript. It does not establish ground truth, and it never assesses you or your state of mind. A null result is a good result. The strongest next step is always the fresh-instance test on the Check screen.