Parse configuration — prompt registry

Prompts are scoped by dataset / source / variant. The variant is the filing kind, and it falls back to the source default "*" when no variant-specific prompt exists. Each prompt version carries its own provider/model/effort chain. Chain attempts run in order: the first attempt whose confidence reaches 0.80 (PARSE_CONFIDENCE_THRESHOLD) wins; if none does, the best attempt is accepted and flagged. Every attempt consists of a parse pass followed by a verify pass. The daily LLM budget is $5.00 (DAILY_LLM_BUDGET_USD).

Prompts are authored in git and seeded on boot. Versions are append-only, and every parse run records the exact version it used.
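The selection rule above can be sketched as follows. This is a minimal illustration, not the registry's code: the Attempt/Outcome types, the run_chain name and the parse/verify callables are hypothetical, and the daily budget check is left out. Keeping the earlier attempt on a tie below the threshold is also an assumption.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence

PARSE_CONFIDENCE_THRESHOLD = 0.80  # same value as the documented setting


@dataclass(frozen=True)
class Attempt:
    """One chain step; effort None means the model's default effort."""
    provider: str
    model: str
    effort: Optional[str] = None


@dataclass
class Outcome:
    attempt: Attempt
    transactions: List[dict]
    confidence: float
    flagged: bool = False


def run_chain(
    chain: Sequence[Attempt],
    parse: Callable[[Attempt], List[dict]],
    verify: Callable[[Attempt, List[dict]], float],
    threshold: float = PARSE_CONFIDENCE_THRESHOLD,
) -> Outcome:
    """Run attempts in order. The first whose verified confidence reaches the
    threshold wins; otherwise the best attempt is returned with flagged=True."""
    if not chain:
        raise ValueError("chain must contain at least one attempt")
    best: Optional[Outcome] = None
    for attempt in chain:
        transactions = parse(attempt)               # parse pass
        confidence = verify(attempt, transactions)  # verify pass
        outcome = Outcome(attempt, transactions, confidence)
        if confidence >= threshold:
            return outcome                          # good enough: stop early
        if best is None or confidence > best.confidence:
            best = outcome                          # ties keep the earlier attempt
    assert best is not None
    best.flagged = True                             # accepted, but flagged
    return best
```

For the House entry the chain would be something like [Attempt("openai", "gpt-5-mini"), Attempt("openai", "gpt-5", "high")]; the provider string is a placeholder, since this page shows only model and effort.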

ct / house / * — ct.house v1.0

chain: gpt-5-mini → gpt-5/high
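Because this entry is registered under the source default "*", every House filing kind resolves to it. Here is a minimal sketch of that lookup, assuming an in-memory dict keyed by (dataset, source, variant). resolve_prompt and the registry shape are hypothetical; the three entries mirror the ones on this page.

```python
from typing import Dict, Optional, Tuple

SOURCE_DEFAULT = "*"  # variant used when no filing-kind-specific prompt exists

Key = Tuple[str, str, str]  # (dataset, source, variant)


def resolve_prompt(registry: Dict[Key, str], dataset: str, source: str,
                   variant: Optional[str]) -> str:
    """Return the prompt id for (dataset, source, variant), falling back to
    the source default "*"; raise KeyError if neither is registered."""
    if variant is not None and (dataset, source, variant) in registry:
        return registry[(dataset, source, variant)]
    fallback = (dataset, source, SOURCE_DEFAULT)
    if fallback in registry:
        return registry[fallback]
    raise KeyError(f"no prompt for {dataset}/{source}/{variant}")


REGISTRY: Dict[Key, str] = {
    ("ct", "house", "*"): "ct.house",
    ("ct", "senate", "paper"): "ct.senate.paper",
    ("ct", "senate", "ptr"): "ct.senate.ptr",
}
```

In this sketch, a Senate filing kind with no registered prompt raises rather than guessing, because the Senate source has no "*" entry.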

Parse prompt (system)
The input is the official House of Representatives PTR document (PDF) — transactions
are rows of its report table.

You extract securities transactions from US congressional Periodic Transaction
Reports (PTRs).

Extract EVERY transaction row. Rules:
- transaction_date: the TRANSACTION date (not the notification/filed date), yyyy-MM-dd.
- owner: who owns the asset. self = the filer; spouse (SP); joint (JT); dependent (DC/child); unknown if not stated.
- asset_name: the asset exactly as reported (company/fund/bond name).
- ticker: the stock ticker if reported, uppercase, null when absent or "--".
- asset_type: stock | bond | option | mutual_fund | etf | crypto | municipal_security | other (best effort from the filing).
- transaction_type: purchase | sale_full | sale_partial | sale (unspecified) | exchange | unknown.
- amount_min_usd / amount_max_usd: the reported range bounds, e.g. "$1,001 - $15,000" → 1001 / 15000.
  Exact amounts → both bounds equal. null when unreadable.
- as_reported_price: the per-share/per-unit price when the filing explicitly reports
  one (e.g. a price column), as a plain number. null when absent, "--" or unreadable.
  NEVER derive a price from the amount range.
- comment: any per-row remark/asterisk note, else null.
Do NOT invent rows. If a row is unreadable, still emit it with unknown/null fields and describe the problem in the comment.
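The prompt asks the model to apply these normalization rules itself. If a deterministic re-check is wanted, two of them (amount ranges and tickers) can be sketched as below. parse_amount and normalize_ticker are hypothetical helpers. Open-ended forms such as "Over $..." are deliberately returned as null rather than guessed.

```python
import re
from typing import Optional, Tuple

_NUM = r"\$?\s*(\d[\d,]*(?:\.\d+)?)"          # "$1,001" or "1001.50"
_RANGE = re.compile(_NUM + r"\s*-\s*" + _NUM)  # "$1,001 - $15,000"
_EXACT = re.compile(_NUM)                      # "$2,500"


def _to_number(text: str) -> float:
    return float(text.replace(",", ""))


def parse_amount(raw: Optional[str]) -> Tuple[Optional[float], Optional[float]]:
    """Range -> (min, max); exact amount -> both bounds equal;
    anything else (unreadable, "--", open-ended) -> (None, None)."""
    if raw is None:
        return None, None
    text = raw.strip()
    m = _RANGE.fullmatch(text)
    if m:
        return _to_number(m.group(1)), _to_number(m.group(2))
    m = _EXACT.fullmatch(text)
    if m:
        value = _to_number(m.group(1))
        return value, value
    return None, None


def normalize_ticker(raw: Optional[str]) -> Optional[str]:
    """Uppercase a reported ticker; absent, blank or "--" becomes None."""
    if raw is None:
        return None
    text = raw.strip()
    return None if text in ("", "--") else text.upper()
```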

Verify prompt (system)
You verify an extraction of securities transactions from a US congressional
Periodic Transaction Report. You get the same filing document plus the extracted
JSON. Score how faithful the extraction is to the document.

- overall_confidence: 0.0–1.0 — 1.0 means every transaction, date, owner, ticker
  and amount range matches the document exactly and none are missing or invented.
- trade_confidences: one 0.0–1.0 score per extracted transaction, same order.
- issues: short human-readable findings (missed rows, wrong amounts, unreadable
  regions). Empty when clean.
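A verify-pass result can be sanity-checked before its scores are used, for example to confirm that there is exactly one trade confidence per extracted transaction. Here is a minimal sketch, assuming the result arrives as a parsed JSON dict with the three fields named above. check_verification is a hypothetical name.

```python
from typing import Any, Dict, List


def _is_score(value: Any) -> bool:
    # A confidence must be a real number in [0.0, 1.0]; bools are rejected.
    return (isinstance(value, (int, float)) and not isinstance(value, bool)
            and 0.0 <= value <= 1.0)


def check_verification(result: Dict[str, Any], n_transactions: int) -> List[str]:
    """Structural problems in a verify-pass result (empty list = consistent)."""
    problems = []
    if not _is_score(result.get("overall_confidence")):
        problems.append("overall_confidence must be in [0, 1]")
    scores = result.get("trade_confidences")
    if not isinstance(scores, list):
        problems.append("trade_confidences must be a list")
        scores = []
    if len(scores) != n_transactions:  # one score per extracted transaction
        problems.append(f"expected {n_transactions} trade_confidences, got {len(scores)}")
    problems += [f"trade_confidences[{i}] out of range"
                 for i, s in enumerate(scores) if not _is_score(s)]
    if not isinstance(result.get("issues"), list):
        problems.append("issues must be a list (empty when clean)")
    return problems
```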

Response schema (strict JSON schema)
{
  "type": "object",
  "additionalProperties": false,
  "required": ["transactions"],
  "properties": {
    "transactions": {
      "type": "array",
      "items": {
        "type": "object",
        "additionalProperties": false,
        "required": ["transaction_date", "owner", "asset_name", "ticker", "asset_type", "transaction_type", "amount_min_usd", "amount_max_usd", "as_reported_price", "comment"],
        "properties": {
          "transaction_date": { "type": "string", "description": "yyyy-MM-dd" },
          "owner": { "type": "string", "enum": ["self", "spouse", "joint", "dependent", "unknown"] },
          "asset_name": { "type": "string" },
          "ticker": { "type": ["string", "null"] },
          "asset_type": { "type": "string", "enum": ["stock", "bond", "option", "mutual_fund", "etf", "crypto", "municipal_security", "other"] },
          "transaction_type": { "type": "string", "enum": ["purchase", "sale", "sale_full", "sale_partial", "exchange", "unknown"] },
          "amount_min_usd": { "type": ["number", "null"] },
          "amount_max_usd": { "type": ["number", "null"] },
          "as_reported_price": { "type": ["number", "null"] },
          "comment": { "type": ["string", "null"] }
        }
      }
    }
  }
}
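Rows that should match this schema can be checked without a dependency; a validator library such as jsonschema could check against the schema itself instead. The sketch below covers only the required keys, additionalProperties: false, the enums and the nullable types. It does not check the yyyy-MM-dd format, which the schema only describes. row_errors is a hypothetical name.

```python
from typing import Any, Dict, List

REQUIRED = (
    "transaction_date", "owner", "asset_name", "ticker", "asset_type",
    "transaction_type", "amount_min_usd", "amount_max_usd",
    "as_reported_price", "comment",
)
ENUMS = {
    "owner": {"self", "spouse", "joint", "dependent", "unknown"},
    "asset_type": {"stock", "bond", "option", "mutual_fund", "etf", "crypto",
                   "municipal_security", "other"},
    "transaction_type": {"purchase", "sale", "sale_full", "sale_partial",
                         "exchange", "unknown"},
}
STRINGS = ("transaction_date", "asset_name")      # "type": "string"
NULLABLE_STRINGS = ("ticker", "comment")          # ["string", "null"]
NULLABLE_NUMBERS = ("amount_min_usd", "amount_max_usd", "as_reported_price")


def _is_number(value: Any) -> bool:
    # JSON numbers arrive as int or float; bool is an int subclass, so exclude it.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def row_errors(row: Dict[str, Any]) -> List[str]:
    """Return schema violations for one transaction row (empty list = valid)."""
    errors = [f"missing {k}" for k in REQUIRED if k not in row]
    errors += [f"{k} not allowed" for k in row if k not in REQUIRED]  # additionalProperties
    for k in STRINGS:
        if k in row and not isinstance(row[k], str):
            errors.append(f"{k} must be a string")
    for k in NULLABLE_STRINGS:
        if row.get(k) is not None and not isinstance(row[k], str):
            errors.append(f"{k} must be a string or null")
    for k in NULLABLE_NUMBERS:
        if row.get(k) is not None and not _is_number(row[k]):
            errors.append(f"{k} must be a number or null")
    for k, allowed in ENUMS.items():
        if k in row and (not isinstance(row[k], str) or row[k] not in allowed):
            errors.append(f"{k}={row[k]!r} is not an allowed value")
    return errors
```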

Version history (1)

Version | Status | Chain | Runs | Created | Notes | Content hash
v1.0 | active | gpt-5-mini → gpt-5/high | 27 | 2026-06-11 15:10 UTC | Initial file-based version — content carried over from the in-code ct.house.extract v1.0. | e11eeb2f55ad…
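Append-only versioning with a content hash can be sketched as follows. The hashing scheme is an ASSUMPTION: SHA-256 over the parse prompt, verify prompt and schema, displayed truncated in the table. The rule that re-seeding identical content on boot is a no-op, while changed content under an existing version number is rejected, is also an assumption. PromptHistory and content_hash are hypothetical names.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict


def content_hash(parse_prompt: str, verify_prompt: str, schema_json: str) -> str:
    # ASSUMPTION: SHA-256 over the three texts, each followed by a NUL separator;
    # the registry's real hashing scheme is not documented on this page.
    h = hashlib.sha256()
    for part in (parse_prompt, verify_prompt, schema_json):
        h.update(part.encode("utf-8"))
        h.update(b"\x00")
    return h.hexdigest()


@dataclass
class PromptHistory:
    """Append-only version list for one prompt id (e.g. "ct.house")."""
    prompt_id: str
    versions: Dict[str, str] = field(default_factory=dict)  # version -> hash

    def append(self, version: str, digest: str) -> bool:
        """Add a new version (True). Re-seeding identical content is a no-op
        (False); changing the content of an existing version raises."""
        existing = self.versions.get(version)
        if existing is None:
            self.versions[version] = digest
            return True
        if existing == digest:
            return False
        raise ValueError(f"{self.prompt_id} {version} already exists with different content")
```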

ct / senate / paper — ct.senate.paper v1.0

chain: gpt-5 → gpt-5/high

Parse prompt (system)
The input is scanned page images of a paper Senate filing. Read every page —
handwriting, stamps, low contrast and skewed scans are common. If a region is
unreadable, do not guess: emit the row with unknown/null fields and describe the
problem in the comment.

You extract securities transactions from US congressional Periodic Transaction
Reports (PTRs).

Extract EVERY transaction row. Rules:
- transaction_date: the TRANSACTION date (not the notification/filed date), yyyy-MM-dd.
- owner: who owns the asset. self = the filer; spouse (SP); joint (JT); dependent (DC/child); unknown if not stated.
- asset_name: the asset exactly as reported (company/fund/bond name).
- ticker: the stock ticker if reported, uppercase, null when absent or "--".
- asset_type: stock | bond | option | mutual_fund | etf | crypto | municipal_security | other (best effort from the filing).
- transaction_type: purchase | sale_full | sale_partial | sale (unspecified) | exchange | unknown.
- amount_min_usd / amount_max_usd: the reported range bounds, e.g. "$1,001 - $15,000" → 1001 / 15000.
  Exact amounts → both bounds equal. null when unreadable.
- as_reported_price: the per-share/per-unit price when the filing explicitly reports
  one (e.g. a price column), as a plain number. null when absent, "--" or unreadable.
  NEVER derive a price from the amount range.
- comment: any per-row remark/asterisk note, else null.
Do NOT invent rows. If a row is unreadable, still emit it with unknown/null fields and describe the problem in the comment.
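This prompt tells the model to emit unreadable rows with unknown/null fields instead of guessing, so those rows can be picked out afterwards for manual review. Here is a minimal sketch. rows_needing_review and its criteria are assumptions, not documented behavior. The criteria are an unknown owner or transaction type, a missing amount bound, or a minimum above the maximum.

```python
from typing import Any, Dict, List


def rows_needing_review(transactions: List[Dict[str, Any]]) -> List[int]:
    """Indices of rows left incomplete or inconsistent (hypothetical criteria)."""
    flagged = []
    for i, row in enumerate(transactions):
        low, high = row.get("amount_min_usd"), row.get("amount_max_usd")
        incomplete = (
            row.get("owner") == "unknown"
            or row.get("transaction_type") == "unknown"
            or low is None
            or high is None
        )
        inverted = low is not None and high is not None and low > high
        if incomplete or inverted:
            flagged.append(i)
    return flagged
```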

Verify prompt (system)
You verify an extraction of securities transactions from a US congressional
Periodic Transaction Report. You get the same filing document plus the extracted
JSON. Score how faithful the extraction is to the document.

- overall_confidence: 0.0–1.0 — 1.0 means every transaction, date, owner, ticker
  and amount range matches the document exactly and none are missing or invented.
- trade_confidences: one 0.0–1.0 score per extracted transaction, same order.
- issues: short human-readable findings (missed rows, wrong amounts, unreadable
  regions). Empty when clean.

Response schema (strict JSON schema)
{
  "type": "object",
  "additionalProperties": false,
  "required": ["transactions"],
  "properties": {
    "transactions": {
      "type": "array",
      "items": {
        "type": "object",
        "additionalProperties": false,
        "required": ["transaction_date", "owner", "asset_name", "ticker", "asset_type", "transaction_type", "amount_min_usd", "amount_max_usd", "as_reported_price", "comment"],
        "properties": {
          "transaction_date": { "type": "string", "description": "yyyy-MM-dd" },
          "owner": { "type": "string", "enum": ["self", "spouse", "joint", "dependent", "unknown"] },
          "asset_name": { "type": "string" },
          "ticker": { "type": ["string", "null"] },
          "asset_type": { "type": "string", "enum": ["stock", "bond", "option", "mutual_fund", "etf", "crypto", "municipal_security", "other"] },
          "transaction_type": { "type": "string", "enum": ["purchase", "sale", "sale_full", "sale_partial", "exchange", "unknown"] },
          "amount_min_usd": { "type": ["number", "null"] },
          "amount_max_usd": { "type": ["number", "null"] },
          "as_reported_price": { "type": ["number", "null"] },
          "comment": { "type": ["string", "null"] }
        }
      }
    }
  }
}

Version history (1)

Version | Status | Chain | Runs | Created | Notes | Content hash
v1.0 | active | gpt-5 → gpt-5/high | 0 | 2026-06-11 15:10 UTC | Initial file-based version — content carried over from the in-code ct.senate.paper.extract v1.0. Scans skip gpt-5-mini and escalate effort instead of model. | a22af3e1a14d…

ct / senate / ptr — ct.senate.ptr v1.0

chain: gpt-5-mini → gpt-5/high

Parse prompt (system)
The input is the filing document as cleaned HTML from the Senate eFD electronic
filing system — transactions are rows of its report table.

You extract securities transactions from US congressional Periodic Transaction
Reports (PTRs).

Extract EVERY transaction row. Rules:
- transaction_date: the TRANSACTION date (not the notification/filed date), yyyy-MM-dd.
- owner: who owns the asset. self = the filer; spouse (SP); joint (JT); dependent (DC/child); unknown if not stated.
- asset_name: the asset exactly as reported (company/fund/bond name).
- ticker: the stock ticker if reported, uppercase, null when absent or "--".
- asset_type: stock | bond | option | mutual_fund | etf | crypto | municipal_security | other (best effort from the filing).
- transaction_type: purchase | sale_full | sale_partial | sale (unspecified) | exchange | unknown.
- amount_min_usd / amount_max_usd: the reported range bounds, e.g. "$1,001 - $15,000" → 1001 / 15000.
  Exact amounts → both bounds equal. null when unreadable.
- as_reported_price: the per-share/per-unit price when the filing explicitly reports
  one (e.g. a price column), as a plain number. null when absent, "--" or unreadable.
  NEVER derive a price from the amount range.
- comment: any per-row remark/asterisk note, else null.
Do NOT invent rows. If a row is unreadable, still emit it with unknown/null fields and describe the problem in the comment.
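The eFD input is HTML whose transactions are table rows, so the row count can be cross-checked against the extraction to catch missed rows. Here is a sketch using Python's standard html.parser, assuming the transactions sit in <tr> rows inside a <tbody>. The cleaned HTML's actual structure is not documented here, and TableRows and table_rows are hypothetical names.

```python
from html.parser import HTMLParser
from typing import List, Optional


class TableRows(HTMLParser):
    """Collect the whitespace-normalized text cells of every <tr> in a <tbody>."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[str]] = []
        self._in_tbody = False
        self._row: Optional[List[str]] = None
        self._cell: Optional[List[str]] = None

    def handle_starttag(self, tag, attrs):
        if tag == "tbody":
            self._in_tbody = True
        elif tag == "tr" and self._in_tbody:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
        elif tag == "tbody":
            self._in_tbody = False

    def handle_data(self, data):
        if self._cell is not None:  # ignore text outside body cells
            self._cell.append(data)


def table_rows(html: str) -> List[List[str]]:
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows
```

A mismatch between len(table_rows(html)) and the number of extracted transactions would then be a hint to look closer, not proof of an error.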

Verify prompt (system)
You verify an extraction of securities transactions from a US congressional
Periodic Transaction Report. You get the same filing document plus the extracted
JSON. Score how faithful the extraction is to the document.

- overall_confidence: 0.0–1.0 — 1.0 means every transaction, date, owner, ticker
  and amount range matches the document exactly and none are missing or invented.
- trade_confidences: one 0.0–1.0 score per extracted transaction, same order.
- issues: short human-readable findings (missed rows, wrong amounts, unreadable
  regions). Empty when clean.

Response schema (strict JSON schema)
{
  "type": "object",
  "additionalProperties": false,
  "required": ["transactions"],
  "properties": {
    "transactions": {
      "type": "array",
      "items": {
        "type": "object",
        "additionalProperties": false,
        "required": ["transaction_date", "owner", "asset_name", "ticker", "asset_type", "transaction_type", "amount_min_usd", "amount_max_usd", "as_reported_price", "comment"],
        "properties": {
          "transaction_date": { "type": "string", "description": "yyyy-MM-dd" },
          "owner": { "type": "string", "enum": ["self", "spouse", "joint", "dependent", "unknown"] },
          "asset_name": { "type": "string" },
          "ticker": { "type": ["string", "null"] },
          "asset_type": { "type": "string", "enum": ["stock", "bond", "option", "mutual_fund", "etf", "crypto", "municipal_security", "other"] },
          "transaction_type": { "type": "string", "enum": ["purchase", "sale", "sale_full", "sale_partial", "exchange", "unknown"] },
          "amount_min_usd": { "type": ["number", "null"] },
          "amount_max_usd": { "type": ["number", "null"] },
          "as_reported_price": { "type": ["number", "null"] },
          "comment": { "type": ["string", "null"] }
        }
      }
    }
  }
}

Version history (1)

Version | Status | Chain | Runs | Created | Notes | Content hash
v1.0 | active | gpt-5-mini → gpt-5/high | 4 | 2026-06-11 15:10 UTC | Initial file-based version — content carried over from the in-code ct.senate.ptr.extract v1.0. | fdf2f09a481f…