Skip to content

Architecture

KibitzerSession — the core

All of kibitzer's logic lives in KibitzerSession. Hooks, MCP server, and external tools are thin wrappers.

                        KibitzerSession
                    ┌───────────────────────┐
                    │  before_call()        │
                    │  after_call()         │
                    │  validate_calls()     │
                    │  change_mode()        │
                    │  get_suggestions()    │
                    │  get_feedback()       │
                    │  register_tools()     │
                    │  validate_program()   │
                    │  register_docs()      │
                    │  get_doc_context()    │
                    │  get_prompt_hints()   │
                    │  get_correction_hints()│
                    └───────────┬───────────┘
           ┌────────────────────┼────────────────────┐
           │                    │                     │
    Claude Code hooks      MCP server           Python API
    (thin wrappers)        (thin wrapper)       (direct import)
           │                    │                     │
    kibitzer-pre.sh        kibitzer serve      from kibitzer import
    kibitzer-post.sh       ChangeToolMode        KibitzerSession
                           GetFeedback
                    ┌───────────┴───────────┐
                    │                       │
              .kibitzer/              .kibitzer/
              state.json              store.sqlite
              (hot counters)          (event log)

Two persistence stores:

Store Format Purpose
state.json JSON Hot counters — mode, failures, turns, suggestions given. Read/written every hook call.
store.sqlite SQLite Event log — tool calls, denials, mode switches, errors. Append-only, queryable by Riggs via DuckDB ATTACH.

Configuration loads from four tiers (highest priority last):

Tier Source Purpose
1 src/kibitzer/config.toml Package defaults — 6 modes, controller thresholds, plugin settings
2 .kibitzer/config.toml Project-local TOML overrides
3 .kibitzer/policy.duckdb Legacy ducklog policy database
4 .kibitzer/policy.db Umwelt compiled policy (supersedes tier 3)

When an umwelt policy database is present, PolicyConsumer reads resolved mode properties from the cascade and converts them to config dict format via to_config(). This lets policy authors define modes, writable paths, coaching frequency, and transition thresholds in .umw stylesheets.

Hooks and MCP server share state through KibitzerSession. Each hook invocation creates a session (with KibitzerSession(safe_mode=True)), does its work, and saves on exit. The MCP server holds a longer-lived session. External tools like lackpy import KibitzerSession directly.

Hook protocol

Hooks are bash scripts in .claude/hooks/ that pipe stdin to Python. Claude Code sends JSON on stdin and reads JSON from stdout.

PreToolUse receives:

{
  "session_id": "...",
  "tool_name": "Edit",
  "tool_input": {"file_path": "src/foo.py", "old_string": "...", "new_string": "..."},
  "tool_use_id": "..."
}

And outputs one of: - Nothing (exit 0, empty stdout) — allow the tool call - Deny — block the tool call, send reason to agent:

{"hookSpecificOutput": {"hookEventName": "PreToolUse", "permissionDecision": "deny", "permissionDecisionReason": "..."}}
- Suggest — allow the tool call, inject context:
{"hookSpecificOutput": {"hookEventName": "PreToolUse", "additionalContext": "[kibitzer] ..."}}

PostToolUse receives the same fields plus tool_result (Bash: {exitCode, stdout, stderr}, Edit: string or {error: "..."}, etc.). It outputs nothing (allow) or context:

{"hookSpecificOutput": {"hookEventName": "PostToolUse", "additionalContext": "[kibitzer] Mode switched to debug: ..."}}

PreToolUse chain

stdin JSON
    ├── tool_name in {Edit, Write, NotebookEdit}?
    │   └── Path guard: check file_path against mode's writable prefixes
    │       ├── allowed → continue
    │       └── denied → output deny JSON, exit
    └── tool_name == Bash?
        └── Run each interceptor plugin against command string
            ├── no match → exit 0 (allow)
            └── match found → check plugin mode:
                ├── observe → log to intercept.log, exit 0
                ├── suggest → output additionalContext, exit 0
                └── redirect → output deny JSON, exit 0

PostToolUse chain

stdin JSON
    ├── 1. Update counters
    │   ├── total_calls, turns_in_mode, tools_used_in_mode
    │   ├── success_count or failure_count + consecutive_failures
    │   └── Coach counters: edit failures, reads, edits_since_test, last_edit_turn
    ├── 2. Mode controller
    │   ├── Check transition rules (consecutive failures → debug, etc.)
    │   ├── Oscillation guard (don't switch back too quickly)
    │   └── Apply transition if triggered (reset mode-level counters)
    ├── 3. Coach (every N calls)
    │   ├── Discover available tools from .mcp.json
    │   ├── Detect patterns from state (mode-aware)
    │   ├── If fledgling available: query conversation analytics for richer patterns
    │   ├── Filter out already-given suggestions (dedup)
    │   └── Output new suggestions as additionalContext (referencing only available tools)
    └── Save state → .kibitzer/state.json

MCP server

The MCP server runs as a persistent process (via kibitzer serve). It provides two tools:

ChangeToolMode(mode, reason?) — Validates mode exists, updates state, resets counters, returns new mode info. This is how the agent explicitly switches modes (vs. auto-transitions from the mode controller).

GetFeedback(status?, suggestions?, intercepts?) — Returns a combined response with current status, coaching suggestions, and/or the intercept log. All params default true. Reads state.json and intercept.log.

File layout

src/kibitzer/
├── session.py             KibitzerSession + CallResult — the Python API
├── docs.py                DocSection, DocResult, DocRefinement — doc pipeline types
├── failure_modes.py       Shared failure mode taxonomy (7 modes + hint map)
├── store.py               KibitzerStore — SQLite event log (append, query)
├── config.py              Loads config.toml (defaults + project-local merge)
├── state.py               Reads/writes .kibitzer/state.json
├── guards/
│   └── path_guard.py      check_path(file_path, mode_policy) → PathGuardResult
├── interceptors/
│   ├── base.py            InterceptMode enum, Suggestion dataclass, BaseInterceptor
│   ├── registry.py        build_registry() — loads plugins for installed tools
│   ├── blq.py             Test/build command interception
│   ├── jetsam.py          Git command interception
│   └── fledgling.py       Search/navigation interception
├── controller/
│   └── mode_controller.py update_counters(), check_transitions(), apply_transition()
├── coach/
│   ├── observer.py        detect_patterns(state) — mode-aware pattern detection
│   ├── suggestions.py     should_fire(), generate_suggestions() — frequency + dedup
│   ├── fledgling.py       Query fledgling for conversation analytics (Python API + CLI fallback)
│   └── tools.py           Discover available tools from .mcp.json for tailored suggestions
├── hooks/
│   ├── pre_tool_use.py    Thin wrapper: KibitzerSession → before_call → hook output
│   ├── post_tool_use.py   Thin wrapper: KibitzerSession → after_call → hook output
│   └── templates.py       Generates bash hook scripts for .claude/hooks/
├── umwelt/
│   ├── vocabulary.py      register_kibitzer_vocabulary() — properties on state.mode
│   └── consumer.py        PolicyConsumer — wraps PolicyEngine for kibitzer queries
├── mcp/
│   └── server.py          Thin wrapper: delegates to KibitzerSession
├── cli.py                 Click CLI: init, serve
└── config.toml            Default configuration

Grade

Kibitzer is level 1 — specified mutations over structured data:

  • Path guard: reads mode (JSON), checks path against a prefix list, allows or blocks
  • Mode controller: reads counters, compares to thresholds, updates mode
  • Coach: reads counters, detects patterns, formats suggestion strings
  • Interceptors: matches bash commands against string patterns, returns suggestions

No computation channels. No trained judgment. Every decision is traceable to a specified rule in config.toml.