db9: Filesystem + Postgres for Agent Workflows

Last modification on

This is a practical, db9-flavored pattern for building agent workflows where artifacts stay as files but state stays queryable.

What db9 is (the 20-second version)

db9 is PostgreSQL (wire-compatible) plus a set of compiled-in extensions aimed at agent workloads:

  • fs9: query/read/write files from SQL, and expose CSV/JSONL/Parquet as relations
  • embedding + vector: generate embeddings server-side with embedding() and search with pgvector operators + HNSW
  • http + pg_cron + branching: API calls, scheduling, and safe experiments (all close to the data)

In practice: keep agent outputs as real files (logs/markdown/diffs), keep indexes/governance as tables (JSONB/FTS/vectors), and use SQL as the glue.

The core idea

Agent systems accumulate two kinds of state:

  • Artifacts (file-shaped): prompts, plans, logs, traces, cached responses, patches, reports.
  • Queryable state (table-shaped): metadata, dedup keys, chunk indexes, run status.

Most stacks split these across object storage + DB + vector DB + queue, then glue them in application code. db9’s model is simpler: files + Postgres, unified by SQL.

1) Files are not blobs: they participate in SQL

Minimal examples:

create extension if not exists fs9;

-- write/read/inspect an artifact
select extensions.fs9_write('/reports/hello.txt', 'hello from db9');
select extensions.fs9_read('/reports/hello.txt');
select extensions.fs9_exists('/reports/hello.txt'), extensions.fs9_size('/reports/hello.txt');

-- treat files as query sources
select * from extensions.fs9('/data/users.csv') limit 5;
select _line_number, line
from extensions.fs9('/logs/run.jsonl')
where line->>'level' = 'error';

Once files are queryable as relations, “debugging” stops being a bespoke UI problem. It becomes SQL.

2) A compact pipeline (docs → chunks → retrieval → report)

This is the minimal RAG-ish loop: files for source + output, tables for indexing + retrieval.

Step A — Source docs live as files

If you already have docs, great. If you want this to be copy/paste runnable, just write a tiny markdown doc into the db9 filesystem first:

create extension if not exists fs9;

select extensions.fs9_write(
  '/docs/agents/intro.md',
  $$
# Agents + Files + SQL

Agents produce artifacts (plans/logs/reports). Files are the natural format.

Postgres turns this into a computable system: query, rank, dedup, schedule.
  $$
);

Step B — Materialize a chunk index in Postgres

Keep the table boring (that’s the point):

create table if not exists doc_chunks (
  path text not null,
  chunk_idx int not null,
  content text not null,
  meta jsonb not null default '{}',
  primary key (path, chunk_idx)
);

Chunking in db9 is built-in: CHUNK_TEXT(content, max_chars, overlap_chars, title) is markdown-aware and prefers natural breakpoints.

insert into doc_chunks (path, chunk_idx, content, meta)
select
  '/docs/agents/intro.md' as path,
  c.chunk_index as chunk_idx,
  c.chunk_text as content,
  jsonb_build_object('source','docs','chunk_pos',c.chunk_pos)
from CHUNK_TEXT(
  content => extensions.fs9_read('/docs/agents/intro.md'),
  max_chars => 3600,
  overlap_chars => 540,
  title => 'agents/intro'
) as c
on conflict (path, chunk_idx) do update
set content = excluded.content,
    meta = excluded.meta;

Step C — Retrieval with FTS (fast, debuggable)

alter table doc_chunks
add column if not exists search_vector tsvector
generated always as (to_tsvector('english', coalesce(content, ''))) stored;

create index if not exists doc_chunks_search_gin
on doc_chunks using gin(search_vector);

select path, chunk_idx, content
from doc_chunks
where search_vector @@ plainto_tsquery('english', 'filesystem sql agents')
order by ts_rank_cd(search_vector, plainto_tsquery('english', 'filesystem sql agents')) desc
limit 8;

(Optional) Semantic retrieval with built-in embeddings

If you want embeddings, db9 supports server-side generation via embedding() (no separate embedding service). (Per db9 docs, embedding() typically requires admin/superuser permissions.)

create extension if not exists embedding;
create extension if not exists vector;

-- default model returns 1024-d vectors
alter table doc_chunks add column if not exists vec vector(1024);

update doc_chunks
set vec = embedding(content)::vector(1024)
where vec is null;

-- cosine distance (lower = more similar)
select path, chunk_idx, content
from doc_chunks
order by vec <=> embedding('how do agents use filesystem + postgres?')::vector(1024)
limit 8;

-- convenience helpers: auto-embed the query text
select path, chunk_idx, content
from doc_chunks
order by VEC_EMBED_COSINE_DISTANCE(vec, 'how do agents use filesystem + postgres?')
limit 8;

Step D — Write the answer as a file, track it as a row

create table if not exists artifacts (
  path text primary key,
  kind text not null,
  meta jsonb not null default '{}',
  created_at timestamptz not null default now()
);

select extensions.fs9_write('/reports/db9-fs-sql-patterns.md', '# Notes\n\n...generated summary...\n');

insert into artifacts (path, kind, meta)
values ('/reports/db9-fs-sql-patterns.md', 'report', '{"inputs":["/docs/agents/intro.md"]}'::jsonb)
on conflict (path) do update
set meta = excluded.meta;

select a.path, extensions.fs9_read(a.path)
from artifacts a
where a.kind = 'report'
order by a.created_at desc
limit 1;

Why this composition works

  • Files keep artifacts transparent and inspectable.
  • Tables keep structure queryable and enforceable.
  • SQL is the workflow language (filter/join/rank/dedup) close to the data.

That’s the operational win: agent state stays visible, and the “why did it do that?” questions become answerable with SQL.