SpyInAI — Find the AI culprit.

4 min read
ai game llm django svelte design

TL;DR

SpyInAI is a browser game built around playable scenarios.
Each scenario has a storyline, personas, a hidden culprit, objectives, and a web of clues/hints.
You interact via a chat interface: talk in the group chat or DM personas one-on-one.
Your job is simple: unlock clues, complete objectives, corner the culprit.
The game uses coins to start/play scenarios — this both pays for LLM usage and prevents abuse.


What the game actually is

  • A scenario-first social deduction game.
  • Personas are LLM-powered and act according to their role, knowledge, and goals.
  • There’s always a culprit (the “spy”) who tries not to get caught — unless you pin them down with enough evidence.
  • Clues and objectives form a progression tree: revealing one can unlock new dialogue paths, events, or tools.

How a scenario plays out

  1. Start a scenario using coins.
  2. Get dropped into a chat UI that has:
    • A group chat with all personas.
    • Individual DM threads for each persona.
  3. You probe: ask questions, contradict stories, request proof, confront inconsistencies.
  4. As you unlock clues, the system opens up more interactions:
    • New messages appear.
    • Hidden channels/attachments can unlock.
    • Personas may slip, contradict themselves, or turn on each other.
  5. When you’ve satisfied the objectives, you accuse and either win (you cornered the culprit) or lose (insufficient evidence).
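
Boiled down, that loop is a small state machine: investigate until the objectives are met, then resolve the accusation. Here is a minimal sketch in Python; the names (Phase, Session, the scenario fields) are stand-ins for illustration, not the real engine.

from enum import Enum, auto


class Phase(Enum):
    INVESTIGATING = auto()
    WON = auto()
    LOST = auto()


class Session:
    """One playthrough: tracks unlocked clues, completed objectives, and outcome."""

    def __init__(self, scenario: dict):
        self.scenario = scenario
        self.unlocked_clues: set[str] = set()
        self.completed_objectives: set[str] = set()
        self.phase = Phase.INVESTIGATING

    def accuse(self, persona_id: str) -> Phase:
        # You only win if every objective is satisfied AND you named the culprit;
        # anything else counts as insufficient evidence.
        have_evidence = self.completed_objectives >= set(self.scenario["objective_ids"])
        if have_evidence and persona_id == self.scenario["culprit_id"]:
            self.phase = Phase.WON
        else:
            self.phase = Phase.LOST
        return self.phase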

Why coins exist (and what they control)

  • Primary purpose: rate-limit and protect LLM calls from abuse.
  • Secondary purpose: pacing and stakes. You think before you ask.
  • Cost surfaces:
    • Start cost: to enter the scenario.
    • Action cost (configurable): for heavy operations (e.g., forcing a reveal, running a deduction tool, unlocking a premium clue).
  • Coins ensure predictable spend and keep the experience sustainable.
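
To make that concrete, here is a rough sketch of how those cost surfaces might map to a single coin check before any LLM-heavy action runs. The cost table and function are invented for illustration; real values live in scenario and server config.

# Illustrative cost table; actual per-action costs are configuration, not code.
COSTS = {
    "start_scenario": 10,
    "force_reveal": 3,
    "deduction_tool": 2,
    "premium_clue": 5,
}


def charge(wallet: dict, action: str) -> bool:
    """Deduct the coin cost for an action; refuse if the balance is short."""
    cost = COSTS.get(action, 0)
    if wallet["coins"] < cost:
        return False  # not enough coins: the action (and its LLM call) never runs
    wallet["coins"] -= cost
    return True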

Scenario anatomy

Each scenario ships with structured data that the engine uses to drive behavior:

  • Storyline: premise, setting, time pressure, stakes.
  • Personas: names, roles, knowledge graph, secrets, attitudes, writing style.
  • Culprit: win conditions, evasive patterns, “failsafe” behavior when cornered.
  • Objectives: what the player must achieve (e.g., prove motive, place them at the scene, establish the method).
  • Clues: evidence items; each has gating conditions and unlock effects.
  • Hints: optional nudges the player can buy with coins if stuck.
  • Gates/Triggers: rules that open new dialogue or tools when conditions are met.

Example (illustrative only):

scenario:
  id: "office-leak-01"
  premise: "A confidential roadmap leaked from a 6-person product chat."
  personas:
    - id: "maya_pm"        # not culprit
      style: "concise, defensive when pushed"
      knows: ["roadmap_v3", "late-night call with vendor"]
      secrets: ["ignored security warning two weeks ago"]
    - id: "krish_ops"      # culprit
      style: "overly helpful, redirects specifics"
      knows: ["who exported the doc", "shadow email account"]
      evade_patterns: ["answer in generalities", "blame 'process gaps'"]
  objectives:
    - id: "prove-export"
      text: "Prove who exported roadmap_v3.pdf"
    - id: "motive"
      text: "Establish why they did it"
  clues:
    - id: "gdrive-export-log"
      gate: "ask(it_team,'export audit') AND confront(maya_pm,'ignored warning')"
      unlocks: ["dm:krish_ops:confront-export", "group:announce-audit"]
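
A gate like the one on gdrive-export-log is just a condition over actions the player has already taken. Here is a toy evaluator, assuming gates are plain AND-combinations of recorded action strings (the real gate grammar may be richer):

def gate_satisfied(gate: str, performed: set[str]) -> bool:
    """True when every AND-joined condition has already been performed.

    `performed` holds normalized action strings the engine records as the
    player talks, e.g. "ask(it_team,'export audit')".
    """
    conditions = [c.strip() for c in gate.split(" AND ")]
    return all(c in performed for c in conditions)


def apply_unlocks(clue: dict, performed: set[str], opened: set[str]) -> None:
    """Once a clue's gate is met, open its new dialogue paths and channels."""
    if gate_satisfied(clue["gate"], performed):
        opened.update(clue["unlocks"])

Once "dm:krish_ops:confront-export" lands in the opened set, the chat UI can surface the new confrontation option in that DM thread.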

Chat mechanics that matter

  • Group vs DM: Group questions expose contradictions; DMs get you sensitive info. Smart play alternates between the two.
  • Pressure and posture: Tone and framing affect persona responses (e.g., friendly → cooperative, accusatory → evasive or defensive).
  • Cornering: When enough gated clues stack up, the culprit’s evasion space collapses and they switch to failsafe behavior (partial admission, bargaining, or blame-shift patterns).
  • Stateful memory: Personas remember prior statements, your tone, and revealed evidence. Re-ask cleverly to force contradictions.
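
One way stateful memory can feed persona behavior is to inject role, prior statements, and revealed evidence into every turn's prompt. A minimal sketch follows; the prompt shape and field names are assumptions, not the production template.

def build_persona_prompt(persona: dict, history: list[dict], revealed: set[str]) -> str:
    """Assemble the system prompt for one persona turn from role, memory, and evidence."""
    lines = [
        f"You are {persona['id']}. Style: {persona['style']}.",
        f"You know: {', '.join(persona.get('knows', []))}.",
        f"Never volunteer your secrets: {', '.join(persona.get('secrets', [])) or 'none'}.",
        f"Evidence the player has already revealed: {', '.join(sorted(revealed)) or 'none'}.",
        "Stay consistent with everything you have said before:",
    ]
    lines += [
        f"- ({m['channel']}) you said: {m['text']}"
        for m in history
        if m["speaker"] == persona["id"]
    ]
    return "\n".join(lines)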

Difficulty and scoring

  • Difficulty scales through:
    • Number of personas and overlap in their stories.
    • Depth of the clue tree and gate complexity.
    • How crafty the culprit’s evade_patterns are.
  • Score: time-to-accuse, coins spent, hint usage, false accusations, optional bonus objectives.
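
As a concrete, entirely made-up example of how those signals could combine into one number:

def score(seconds_to_accuse: int, coins_spent: int, hints_used: int,
          false_accusations: int, bonus_objectives: int) -> int:
    """Toy scoring formula: start from a base, subtract for cost and sloppiness, add bonuses."""
    base = 1000
    penalty = (seconds_to_accuse // 30
               + coins_spent * 2
               + hints_used * 50
               + false_accusations * 100)
    return max(0, base - penalty + bonus_objectives * 75)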


Anti-spam / Anti-abuse design

  • Coins gate expensive LLM actions.
  • Per-session token budgets and turn pacing keep cost predictable.
  • Content filters + prompt hardening keep personas in character and constrain output.
  • Server-side validations on unlockable actions prevent client tampering.
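
The budget and pacing piece can be as small as a per-session counter plus a minimum gap between turns. A sketch with invented limits (the real numbers sit in server config):

import time


class SessionBudget:
    """Caps total tokens per session and enforces a minimum gap between LLM turns."""

    def __init__(self, max_tokens: int = 50_000, min_turn_gap_s: float = 2.0):
        self.max_tokens = max_tokens
        self.min_turn_gap_s = min_turn_gap_s
        self.tokens_used = 0
        self.last_turn_at = 0.0

    def allow_turn(self, estimated_tokens: int) -> bool:
        now = time.monotonic()
        if now - self.last_turn_at < self.min_turn_gap_s:
            return False  # pacing: this turn came too soon after the last one
        if self.tokens_used + estimated_tokens > self.max_tokens:
            return False  # session token budget exhausted
        self.last_turn_at = now
        self.tokens_used += estimated_tokens
        return True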

Tech notes

  • Backend: Django (APIs, session state, scenario engine, unlock logic, coin ledger).
  • Frontend: Svelte (chat UI, thread switching, real-time updates).
  • LLM layer: provider-agnostic; personas run on structured prompts with role/state injections.
  • Persistence: scenario state machine, clue/objective progress, audit for accusations.
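
"Provider-agnostic" here means personas talk to a thin interface rather than a vendor SDK, so models can be swapped without touching the scenario engine. A minimal sketch of that seam (class and method names are assumptions):

from abc import ABC, abstractmethod


class LLMProvider(ABC):
    """Anything that can turn a structured persona prompt into a reply."""

    @abstractmethod
    def complete(self, system_prompt: str, messages: list[dict], max_tokens: int) -> str:
        ...


class PersonaTurn:
    """Runs one persona reply against whichever provider is plugged in."""

    def __init__(self, provider: LLMProvider):
        self.provider = provider

    def respond(self, persona_prompt: str, thread: list[dict]) -> str:
        # Swapping providers never touches scenario, gate, or coin logic.
        return self.provider.complete(persona_prompt, thread, max_tokens=400)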

Roadmap

  • Relationship graphing: personas’ trust/hostility shifts based on your tactics.
  • Dynamic events: timed drops, fake leaks, red-herring injections.
  • Creator tools: scenario editor with gates/triggers, test runs, and token-cost simulator.
  • Skill modes: investigator tools (timeline builder, contradiction highlighter).
  • Seasonal packs: themed scenario bundles with escalating mechanics.

Player tips

  • Start broad in the group chat, then move to DMs to exploit cracks.
  • Ask for specifics (names, times, files). Vague answers are deliberate.
  • Use hints sparingly; your score will thank you.
  • If a persona keeps deflecting, stack a clue and confront again.

Why I’m building this

Not to make another chatbot. The goal is a playable investigation loop where LLMs make social deduction feel alive. Coins keep it sustainable; the scenario engine keeps it fair. Everything else is iteration.


Links

Live: https://spyinai.com
Repo: Private for now.