Architecture Overview

Cortex is built from a small number of components, each with a narrow responsibility. The boundaries between them are deliberate — they let different parts run on different schedules, fail independently, and be replaced without rewiring the rest.

Capture scripts are small CLI utilities that write events to the inbox. The reference implementation has scripts for session-stop captures, pre-compact emergency saves, manual captures, and meeting ingest. Each writes a file of the same shape, YAML frontmatter plus a body, into ~/.cortex/inbox/pending/.

A capture script does not touch the database, does not call the LLM, and does not enforce any schema beyond the frontmatter. Its only job is to write a file atomically and emit the new event ID.
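A minimal sketch of such an atomic write, assuming a temp-file-then-rename approach. The function name `write_event`, the `evt-` ID format, and the frontmatter keys `id`, `source`, and `created` are hypothetical; the source only specifies YAML frontmatter plus a body.

```python
import os
import tempfile
import uuid
from datetime import datetime, timezone
from pathlib import Path


def write_event(pending_dir: Path, source: str, body: str) -> str:
    """Write one event file atomically and return its ID."""
    pending_dir.mkdir(parents=True, exist_ok=True)
    event_id = f"evt-{uuid.uuid4().hex[:12]}"  # hypothetical ID format
    created = datetime.now(timezone.utc).isoformat()
    # Hypothetical frontmatter keys; the real schema is not given in the source.
    text = (
        "---\n"
        f"id: {event_id}\n"
        f"source: {source}\n"
        f"created: {created}\n"
        "---\n"
        f"{body}\n"
    )
    # Write to a temp file in the same directory, then rename it into place,
    # so a reader never sees a half-written event. The ".tmp" suffix keeps
    # the temp file out of any "*.md" scan of pending/.
    fd, tmp_path = tempfile.mkstemp(dir=pending_dir, prefix="event-", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(text)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp_path, pending_dir / f"{event_id}.md")
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return event_id
```

Because the rename happens within one directory, the event either appears in pending/ complete or not at all.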

The normalizer is the single-writer process that turns events into records. It runs on demand or on a schedule, and holds an exclusive lockfile while running so that concurrent normalize calls do not race.
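One way to get this exclusivity is to create the lockfile with `O_CREAT | O_EXCL`, which fails if the file already exists. The sketch below assumes that approach; the names `exclusive_lock` and `AlreadyRunning` are hypothetical, and it does not handle a stale lock left behind by a crashed run.

```python
import os
from contextlib import contextmanager
from pathlib import Path


class AlreadyRunning(RuntimeError):
    """Raised when another process already holds the lock."""


@contextmanager
def exclusive_lock(lock_path: Path):
    try:
        # O_EXCL makes creation fail if the lockfile already exists.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        raise AlreadyRunning(f"lock held: {lock_path}") from None
    try:
        # Record the owner's PID to help diagnose a stale lock by hand.
        os.write(fd, str(os.getpid()).encode())
        yield
    finally:
        os.close(fd)
        os.unlink(lock_path)
```

A second normalize call that starts while the first is running gets `AlreadyRunning` and can exit without touching any events.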

For each event the normalizer:

  1. Parses the frontmatter.
  2. Walks the body and decides whether the event yields one record or many.
  3. Generates record IDs and writes record files into PARA directories.
  4. Indexes the new records in the SQLite sidecar.
  5. Moves the event from pending/ to processed/ (or failed/).

A successful normalize is idempotent at the event level: re-processing a moved event is a no-op because it is no longer in pending/.
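The event-level loop can be sketched as follows. This is an illustration under assumptions, not the reference implementation: `normalize_pending` and `split_frontmatter` are hypothetical names, the frontmatter parser handles only flat `key: value` lines (a real one would use a YAML library), and steps 2 to 4 are delegated to a caller-supplied `handle` function.

```python
import shutil
from pathlib import Path
from typing import Callable


def split_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Split '---' delimited frontmatter from the body (flat key: value only)."""
    if not text.startswith("---\n"):
        raise ValueError("missing frontmatter")
    header, sep, body = text[4:].partition("\n---\n")
    if not sep:
        raise ValueError("unterminated frontmatter")
    meta = {}
    for line in header.splitlines():
        key, colon, value = line.partition(":")
        if colon:
            meta[key.strip()] = value.strip()
    return meta, body


def normalize_pending(
    inbox: Path, handle: Callable[[dict[str, str], str], None]
) -> dict[str, int]:
    """Process every event in pending/ and move it to processed/ or failed/."""
    counts = {"processed": 0, "failed": 0}
    for name in ("processed", "failed"):
        (inbox / name).mkdir(parents=True, exist_ok=True)
    for event in sorted((inbox / "pending").glob("*.md")):
        try:
            meta, body = split_frontmatter(event.read_text(encoding="utf-8"))
            handle(meta, body)  # write record files and index them
            outcome = "processed"
        except Exception:
            outcome = "failed"
        # Once moved, the event is out of pending/ and is never picked up again.
        shutil.move(str(event), str(inbox / outcome / event.name))
        counts[outcome] += 1
    return counts
```

Running it a second time over the same inbox finds nothing in pending/ and does no work.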

Everything lives in a directory tree at ~/.cortex/:

~/.cortex/
├── config.yaml
├── cortex.db          SQLite sidecar
├── index.md           human-readable top-level index
├── inbox/
│   ├── pending/       events awaiting normalization
│   ├── processed/     normalized events (kept for audit)
│   └── failed/        events that errored out
├── projects/          PARA: active project work
├── areas/             PARA: ongoing responsibilities
├── resources/         PARA: reference material
├── archive/           PARA: closed/completed
├── meetings/          meeting records and per-meeting indexes
├── daily/             daily journal entries
└── templates/         record templates

The PARA directories contain rec-*.md files. Sub-paths are allowed but not required — flat is fine.

The SQLite sidecar is a single database file (cortex.db) that indexes the filesystem. It includes:

  • A full-text index over record bodies for recall queries.
  • A records table with stable IDs, scopes, slugs, and frontmatter fields.
  • An access events table for activation tracking.
  • A view that aggregates access counts and recency by record.
  • Tables for meeting-specific dedup, related-to relationships, and provenance back to source events.

The sidecar is rebuildable from the filesystem alone. Losing it means losing indexes, not data.
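To make the rebuild property concrete, here is a reduced sketch of a sidecar and a rebuild pass. All table, view, and column names (`records`, `access_events`, `record_activation`, `path`) are hypothetical, the record ID is assumed to be the filename stem, and the full-text index, slugs, frontmatter fields, and meeting tables are left out.

```python
import sqlite3
from pathlib import Path

PARA_DIRS = ("projects", "areas", "resources", "archive")

# Hypothetical, reduced schema: one records table, one access log,
# and a view that aggregates access counts and recency per record.
SCHEMA = """
CREATE TABLE records (
    id    TEXT PRIMARY KEY,
    scope TEXT NOT NULL,
    path  TEXT NOT NULL
);
CREATE TABLE access_events (
    record_id   TEXT NOT NULL,
    accessed_at TEXT NOT NULL
);
CREATE VIEW record_activation AS
SELECT r.id               AS record_id,
       COUNT(a.record_id) AS access_count,
       MAX(a.accessed_at) AS last_accessed
FROM records r
LEFT JOIN access_events a ON a.record_id = r.id
GROUP BY r.id;
"""


def open_sidecar(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def rebuild_records(conn: sqlite3.Connection, root: Path) -> int:
    """Repopulate the records table by scanning rec-*.md under the PARA dirs."""
    count = 0
    with conn:  # one transaction for the whole rebuild
        conn.execute("DELETE FROM records")
        for scope in PARA_DIRS:
            scope_dir = root / scope
            if not scope_dir.is_dir():
                continue
            # rglob covers both flat layouts and optional sub-paths.
            for path in sorted(scope_dir.rglob("rec-*.md")):
                conn.execute(
                    "INSERT INTO records (id, scope, path) VALUES (?, ?, ?)",
                    (path.stem, scope, path.relative_to(root).as_posix()),
                )
                count += 1
    return count
```

The rebuild reads only the record files, so it works against an empty database.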

The MCP server is a long-running process that exposes the canonical operations to AI clients over MCP: capture, recall, reflect, status, supersede. It is the only component that AI clients talk to directly; they never touch the filesystem or the SQLite sidecar themselves.

Hooks are small scripts wired into the AI client's hook system:

  • SessionStart — calls recall and injects relevant records.
  • PreCompact — emergency-captures the current session.
  • Stop — captures the session for later normalization.

Hooks are thin: they shell out to capture scripts or the MCP server rather than implementing logic themselves.

Distillation is a periodic process that walks high-volume scopes and produces rolled-up summary records. Originals are not deleted; summaries are added as new records and can be recalled directly.

AI client ──MCP──▶ MCP server ──▶ recall query ──▶ SQLite + filesystem
    └──hooks──▶ capture scripts ──▶ inbox/pending
                                    normalizer ──▶ records + SQLite
                                    distillation (periodic)

The arrow shapes are doing real work here:

  • MCP is the synchronous request/response surface for AI clients.
  • Hooks fan out to small scripts that do not need to wait for anything.
  • The inbox is the asynchronous boundary between capture and normalize.

Capture is fast because it has nothing to do but write a file. Recall is fast because it reads an indexed sidecar. Normalize is allowed to take its time because nothing waits on it.