Sibyl Memory vs Honcho — independent 1:1 benchmark (June 2026)

Run by an independent Sibyl Memory closed-beta tester on a corpus the tester
built. Both systems ingested the same 42,000-record corpus (200 simulated
companies, 600 stakeholders evolving over 180 days) through their own official
SDK, retrieved the top 8 rows per question for a fixed 250-question suite, and
handed that context to Claude Sonnet 4.6 to answer. The same data, questions,
model, and scoring were applied to both systems.
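
The procedure above can be sketched as a small harness. This is an
illustrative outline only, not the tester's runner: Question, run_suite, and
the retrieve / answer / is_correct callables are hypothetical names, and the
actual scoring rule is not described here.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

TOP_K = 8  # rows retrieved per question, per the description above


@dataclass
class Question:
    text: str
    expected: str


def run_suite(
    questions: Sequence[Question],
    retrieve: Callable[[str, int], List[str]],   # hypothetical: wraps a memory SDK
    answer: Callable[[str, List[str]], str],     # hypothetical: wraps the LLM call
    is_correct: Callable[[str, str], bool],      # hypothetical: scoring rule
) -> int:
    """Return how many questions were answered correctly."""
    correct = 0
    for q in questions:
        rows = retrieve(q.text, TOP_K)[:TOP_K]  # cap context at the top-k rows
        reply = answer(q.text, rows)
        if is_correct(reply, q.expected):
            correct += 1
    return correct


if __name__ == "__main__":
    # Toy stand-ins so the sketch runs without any SDK or API key.
    memory = {"Who leads Acme?": ["Acme is led by Dana."]}
    suite = [Question("Who leads Acme?", "Dana"), Question("Who leads Zed?", "Lee")]
    score = run_suite(
        suite,
        retrieve=lambda text, k: memory.get(text, []),
        answer=lambda text, rows: rows[0] if rows else "unknown",
        is_correct=lambda reply, expected: expected in reply,
    )
    print(f"{score}/{len(suite)} correct")
```

Passing the retriever in as a callable lets the same loop, questions, and
scoring be reused for each system, with only the retrieval step swapped.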

Headline:
  answers correct    Sibyl 243/250 (97.2%)    Honcho 214/250 (85.6%)
  retrieval hit      Sibyl 243/250 (97.2%)    Honcho 219/250 (87.6%)
  context per query  Sibyl 291 tok            Honcho 1,313 tok
  est. answer cost   Sibyl $0.534             Honcho $1.831  (3.4x)
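
The percentages and the cost ratio follow directly from the counts in the
table. A short check, using only the figures above (the dictionary layout and
the pct helper are illustrative, not part of the bundle):

```python
TOTAL = 250  # questions in the fixed suite

headline = {
    "Sibyl": {"correct": 243, "hits": 243, "context_tok": 291, "cost_usd": 0.534},
    "Honcho": {"correct": 214, "hits": 219, "context_tok": 1313, "cost_usd": 1.831},
}


def pct(n: int, total: int = TOTAL) -> float:
    """Percentage of the suite, rounded to one decimal place."""
    return round(100 * n / total, 1)


for name, row in headline.items():
    print(f"{name}: correct {pct(row['correct'])}%, retrieval hit {pct(row['hits'])}%")

# Ratios are Honcho relative to Sibyl.
cost_ratio = headline["Honcho"]["cost_usd"] / headline["Sibyl"]["cost_usd"]
context_ratio = headline["Honcho"]["context_tok"] / headline["Sibyl"]["context_tok"]
print(f"cost ratio {cost_ratio:.1f}x, context ratio {context_ratio:.1f}x")
```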

Contents:
  00-README.md                 the tester's own summary
  01-sibyl-report.md           Sibyl per-category report
  02-sibyl-raw-results.json    Sibyl per-question raw results
  03-honcho-report.md          Honcho per-category report
  04-honcho-raw-results.json   Honcho per-question raw results
  05-honcho-runner.py          the runner used to produce the Honcho baseline

The corpus is synthetic. The tester's OS paths and handle were redacted to "tester".
The runner reads API keys from the environment; none are included.
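
A minimal sketch of reading a key from the environment and failing early when
it is missing. The require_env helper and the variable name EXAMPLE_API_KEY
are hypothetical; check 05-honcho-runner.py for the names the runner actually
expects.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or fail with a clear message."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Set {name} in the environment before running the runner")
    return value


if __name__ == "__main__":
    # EXAMPLE_API_KEY is a placeholder name, not one taken from the runner.
    try:
        key = require_env("EXAMPLE_API_KEY")
        print("key loaded, length", len(key))
    except RuntimeError as err:
        print(err)
```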

Write-up: https://blog.sibylcap.com/beta-analysis
Sibyl Labs, LLC
