Technical deep-dive

How It Works

Everything on this page is what a serious buyer reviews during due diligence. The methodology is not proprietary: it is peer-reviewed and published, and any credentialed researcher can reproduce every confirmed finding.

Pipeline — from raw data to certified finding

Every stage is logged, hashed, and replayable.

1 · Ingest

Cohort extraction from source (MIMIC-IV, eICU, or client data). Timestamped and hashed.

2 · Index

51M documents compared against observed outcomes. Contradiction candidates scored 0–1.

3 · Debate

Advocate builds the case. Adversary attacks it. Arbiter judges. Consensus required.

4 · Validate

Replication on independent cohort (eICU if MIMIC first, or vice versa). Novelty score.

5 · Attest

Ed25519 signature. Merkle chain. OSF timestamp. Evidence hash returned to client.
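
The five stages above can be sketched as a hash-linked, replayable log. This is an illustrative stand-in, not the production pipeline: the stage names match the list, but the `stage` function, payload fields, and chaining convention are assumptions.

```python
import hashlib
import json

def stage(name, payload, chain):
    """Run one hypothetical pipeline stage: hash the payload, link it to
    the previous stage's hash, and append a replayable log entry."""
    prev = chain[-1]["hash"] if chain else "0" * 64
    record = {"stage": name, "prev": prev, "payload": payload}
    digest = hashlib.sha256(json.dumps(record, sort_keys=True).encode()).hexdigest()
    chain.append({**record, "hash": digest})
    return chain

chain = []
for name, payload in [
    ("ingest",   {"source": "MIMIC-IV", "n": 2161}),
    ("index",    {"candidates": 17, "score_range": [0.0, 1.0]}),
    ("debate",   {"verdict": "consensus"}),
    ("validate", {"replication": "eICU", "passed": True}),
    ("attest",   {"signature": "ed25519:<sig>"}),
]:
    stage(name, payload, chain)

# Each entry's hash covers the previous entry's hash, so the log can only
# be replayed in order and any edit breaks the chain downstream.
assert all(chain[i]["prev"] == chain[i - 1]["hash"] for i in range(1, len(chain)))
```

Because each record embeds its predecessor's digest, tampering with an early stage invalidates every later hash, which is what makes the log auditable rather than merely logged.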

01 · Contradiction engine

The engine that indexes 51M documents and compares them to real patient outcomes.

Every published claim about a condition × subgroup is encoded with its source, its cohort definition, and its reported effect size. Every observed outcome is encoded the same way. Contradictions surface where the same quantity, under the same definition, diverges by more than a calibrated threshold.

The engine does not trust one source. Claims are cross-referenced against meta-analyses, Cochrane reviews, clinical practice guidelines, and the primary literature they cite. An "established" estimate is only established if it triangulates across at least three independent sources with compatible cohort definitions.

The comparator side — observed outcomes — goes through cohort-definition audit, care-process leakage detection, and label-stability testing before any contradiction is flagged. A finding does not leave the engine if the comparator is an administrative artifact.
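
A minimal sketch of the flagging rule described above. The function name, the use of standard errors as the calibrated threshold unit, and the exact form of the three-source triangulation rule are all assumptions for illustration:

```python
from statistics import mean

def contradiction_candidate(published, observed_rate, threshold_sd=3.0):
    """Flag a contradiction when the observed rate diverges from the
    triangulated published estimate by more than `threshold_sd` standard
    errors -- and only when at least three independent sources with
    compatible cohort definitions support the published estimate."""
    if len(published) < 3:
        return None  # consensus not established; nothing to contradict
    est = mean(p["effect"] for p in published)
    se = max(p["se"] for p in published)  # conservative: widest reported error
    z = abs(observed_rate - est) / se
    return {"divergence": observed_rate - est, "z": z, "flag": z > threshold_sd}
```

For example, three sources reporting ~6% mortality against an observed 16% would flag; the same observation against only two sources would return `None`, because the "established" side fails triangulation.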

02 · Twin consensus system

Advocate · Adversary · Arbiter. Every candidate finding runs this gauntlet.

Multi-agent adversarial debate, seven patents pending. Every candidate finding is argued for, argued against, and judged, all by independent specialist twins.

Advocate
Builds the strongest case for the contradiction. Specialist twin drawn from relevant clinical domain.
Adversary
Attacks the finding. Seeks alternative explanations — selection bias, confounding, coding artifact.
Arbiter
Judges the exchange. Requires advocate to withstand adversarial attack before certification.

Unanimous consensus is required for a Confirmed label. Split decisions are held in the Exploratory tier and timestamped on OSF pending replication. If the adversary wins outright, the finding is discarded and the engine learns from the refutation — adversary wins become new contradiction-detector training signal.
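
The triage rule above can be sketched as a small function. The boolean role outcomes are illustrative stand-ins, not the real twin interfaces:

```python
def triage(advocate_ok: bool, adversary_conceded: bool, arbiter_confirms: bool) -> str:
    """Map the three role outcomes to a finding tier, per the consensus
    rule: unanimous -> Confirmed; adversary wins outright -> discarded
    (and recycled as detector training signal); anything else -> held
    in the Exploratory tier pending replication."""
    if advocate_ok and adversary_conceded and arbiter_confirms:
        return "CONFIRMED"
    if not advocate_ok and not arbiter_confirms:
        return "DISCARDED"   # adversary win: becomes training signal
    return "EXPLORATORY"     # split decision: timestamped, awaits replication
```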

03 · Cryptographic attestation

Ed25519 Merkle chain. Every finding is timestamped, signed, and auditable.

Every artifact in the pipeline — raw cohort, analysis script, twin debate transcript, replication result, final verdict — is hashed and linked into a Merkle chain. The chain root is signed with Ed25519 and anchored on OSF with a public timestamp.

# Finding f8c6412e-17c2-4234-b93e-da7b622d09ff — MI × Age >70 — attestation header
algorithm:        Ed25519
evidence_hash:    6bf704b3907a061b
signed_by:        rocsite-discovery-2026 (fingerprint 9C4A…3F12)
osf_project:      osf.io/3ws8g
osf_timestamp:    2026-04-22T19:34:10Z
artifacts:
  - cohort_extract.parquet     (MIMIC-IV  n=2,161  deaths=464)
  - outlier_engine.py          (RocSite Discovery Engine)
  - debate_transcript.json     (Advocate / Adversary / Arbiter)
  - replication_eicu.parquet   (eICU-CRD n=4,069  mortality 14.55%)
  - verdict.json               (CONFIRMED · divergence +168.4%)
novelty_score:    0.94
consensus:        unanimous (advocate: ✓, adversary: conceded, arbiter: confirm)

The attestation is verifiable offline. A regulator, auditor, or reviewer can obtain the artifacts, recompute the hashes, and cross-check the signature without contacting RocSite. The chain is designed to survive the company — findings remain verifiable regardless of our future.
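
The offline check can be sketched with stdlib hashing alone. The artifact bytes and the pairing convention (duplicating the last hash on odd levels) are assumptions; real verification would also check the Ed25519 signature over the root, e.g. with the `cryptography` package.

```python
import hashlib

def leaf(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves):
    """Fold pairs of hashes upward until one root remains, duplicating
    the last hash when a level has an odd count (one common convention,
    assumed here)."""
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0]

artifacts = [b"cohort_extract.parquet bytes", b"outlier_engine.py bytes",
             b"debate_transcript.json bytes", b"verdict.json bytes"]
root = merkle_root([leaf(a) for a in artifacts])
# A verifier recomputes `root` from the obtained artifacts and checks the
# Ed25519 signature over it against the published key fingerprint --
# no contact with the issuer required.
```

Changing a single artifact byte changes its leaf, which changes the root, which invalidates the signature: that is the whole audit argument in one chain of implications.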

04 · Novelty scoring

Findings are scored from 0 to 1.0. Auto-certification requires a score of at least 0.75 and unanimous consensus.

Novelty is not just surprise. It's the weighted combination of how much a finding contradicts the published consensus, how well-established that consensus is, how large the observed cohort is, and how reliably the comparator holds up under adversarial attack.

Literature divergence

Magnitude of the gap between observed and published, normalized by the variance of the published estimate.

Consensus strength

How well-established the published claim is — triangulated from meta-analyses, guidelines, primary literature.

Cohort power

Log-scaled sample size and event count. Small-n findings rarely cross the auto-cert threshold even when consensus is clear.

Adversarial robustness

How cleanly the Advocate survived the Adversary. Surviving a hard attack raises the score; barely surviving lowers it.

Auto-certification threshold: novelty ≥ 0.75, unanimous consensus, independent replication passed. Below threshold = Exploratory tier. No manual override can upgrade a finding to Confirmed — the threshold is enforced in code.
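
The gate described above can be sketched as a weighted combination plus a hard-coded threshold check. The weights shown are illustrative assumptions; the real weighting and component definitions are not published here.

```python
def novelty_score(divergence, consensus, power, robustness,
                  weights=(0.35, 0.25, 0.20, 0.20)):
    """Weighted combination of the four components (each normalized to
    [0, 1]): literature divergence, consensus strength, cohort power,
    and adversarial robustness. Weights are illustrative."""
    parts = (divergence, consensus, power, robustness)
    assert all(0.0 <= p <= 1.0 for p in parts), "components must be normalized"
    return sum(w * p for w, p in zip(weights, parts))

def certifiable(score, unanimous, replicated):
    """The threshold is enforced in code: no manual override path exists."""
    return score >= 0.75 and unanimous and replicated
```

Note the gate is conjunctive: a 0.94 novelty score with a split twin decision still lands in the Exploratory tier.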

05 · Pre-registration protocol

OSF pre-registration before any analysis runs. Priority claim, immutable timestamp, public audit trail.

Pre-registration happens before the engine touches the data. The protocol — cohort definition, hypothesis, falsification criteria, planned analyses — is posted to OSF with a time-stamped hash.

That hash is referenced in the final evidence chain. It makes it verifiable that we could not have reverse-engineered a finding after seeing the data: the OSF timestamp provably precedes the data extract.
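
A reviewer's two checks can be sketched as follows; the function name and SHA-256 choice are assumptions, but the logic is exactly the ordering argument above:

```python
import hashlib
from datetime import datetime

def verify_preregistration(protocol_text: str, registered_hash: str,
                           osf_time: datetime, extract_time: datetime) -> bool:
    """Check (1) the protocol text hashes to the value anchored on OSF,
    and (2) the OSF anchor predates the data extract."""
    digest = hashlib.sha256(protocol_text.encode()).hexdigest()
    return digest == registered_hash and osf_time < extract_time
```

Either failure is disqualifying: a hash mismatch means the protocol was edited after anchoring, and a late timestamp means the hypothesis could postdate the data.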

For clients, the pre-registration is held privately until the Confirmed finding is published, then released. Evidence hashes and Merkle roots are part of the public record from day one — only the narrative interpretation is gated on publication.

06 · Reproducibility

MIMIC-IV and eICU are public. Any credentialed researcher can reproduce every confirmed finding.

This is not a slogan. Confirmed findings ship with a minimum-reproducibility bundle: the query that extracts the cohort, the analysis script, the environment lockfile, and the expected outputs with hashes. Clone the repo, run make, compare the digests.
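
The final digest comparison can be sketched in a few lines. The manifest format shown, a mapping from relative path to SHA-256 hex digest, is an assumption, not the bundle's actual schema:

```python
import hashlib
import pathlib

def check_bundle(expected: dict, root: pathlib.Path) -> list:
    """Compare each file in a reproducibility bundle against its expected
    SHA-256 digest; return the relative paths that do not match."""
    mismatches = []
    for rel, want in expected.items():
        got = hashlib.sha256((root / rel).read_bytes()).hexdigest()
        if got != want:
            mismatches.append(rel)
    return mismatches
```

An empty return list means the rerun reproduced the expected outputs bit-for-bit; any listed path pinpoints exactly where a rerun diverged.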

For client-data findings where we cannot republish the raw cohort, we publish the analysis code and the cohort definition (inclusion criteria, code lists, temporal windows), which is enough for an independent group to recreate an equivalent cohort on a different dataset and reproduce the direction and approximate magnitude.

Still reading? You're our buyer.

Most teams stop at the marketing. If you made it this far, you care about defensibility. Let's talk about your data, your claims, and what you need to stand behind.

Contact Adam directly →