Everything on this page is what a serious buyer reviews during due diligence. The methodology is not proprietary: it is peer-reviewed and published. Any credentialed researcher can reproduce every confirmed finding.
Every stage is logged, hashed, and replayable.
Every published claim about a condition × subgroup is encoded with its source, its cohort definition, and its reported effect size. Every observed outcome is encoded the same way. Contradictions surface where the same quantity, under the same definition, diverges by more than a calibrated threshold.
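A minimal sketch of what that encoding and comparison could look like, assuming a simplified record shape (the field names, pooled-variance test, and threshold here are illustrative, not RocSite's actual schema):

```python
from dataclasses import dataclass

@dataclass
class Estimate:
    source: str             # e.g. a DOI, guideline ID, or observed-cohort ID
    cohort_definition: str  # identifier or hash of the inclusion/exclusion spec
    effect_size: float      # reported or observed quantity
    std_error: float        # uncertainty of the estimate

def contradicts(published: Estimate, observed: Estimate,
                z_threshold: float = 3.0) -> bool:
    """Surface a contradiction only when the same quantity, under the
    same cohort definition, diverges beyond a calibrated threshold."""
    if published.cohort_definition != observed.cohort_definition:
        return False  # different quantities; no comparison is made
    pooled_se = (published.std_error**2 + observed.std_error**2) ** 0.5
    z = abs(observed.effect_size - published.effect_size) / pooled_se
    return z > z_threshold
```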
The engine does not trust one source. Claims are cross-referenced against meta-analyses, Cochrane reviews, clinical practice guidelines, and the primary literature they cite. An "established" estimate is only established if it triangulates across at least three independent sources with compatible cohort definitions.
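Continuing the sketch above, a hedged version of that triangulation rule might look like this (`min_sources` matches the text's minimum of three; the compatibility tolerance is an assumption):

```python
def is_established(estimates: list[Estimate], min_sources: int = 3,
                   tolerance: float = 0.1) -> bool:
    """Treat an estimate as established only if at least `min_sources`
    independent sources with compatible cohort definitions agree."""
    by_cohort: dict[str, list[Estimate]] = {}
    for e in estimates:
        by_cohort.setdefault(e.cohort_definition, []).append(e)
    for group in by_cohort.values():
        if len({e.source for e in group}) < min_sources:
            continue  # not enough independent sources for this definition
        mean = sum(e.effect_size for e in group) / len(group)
        if all(abs(e.effect_size - mean) <= tolerance * abs(mean) for e in group):
            return True
    return False
```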
The comparator side — observed outcomes — goes through cohort-definition audit, care-process leakage detection, and label-stability testing before any contradiction is flagged. A finding does not leave the engine if the comparator is an administrative artifact.
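As a sketch of that gate, with each audit stage reduced to a stand-in predicate (the stage names come from the text; the flag-based checks are placeholders for the real tests):

```python
def cohort_definition_audit(cohort: dict) -> bool:
    # Stand-in: the extract matches the registered inclusion/exclusion spec.
    return cohort.get("definition_verified", False)

def care_process_leakage_scan(cohort: dict) -> bool:
    # Stand-in: the outcome is not driven by documentation or coding practice.
    return not cohort.get("leakage_detected", True)

def label_stability_test(cohort: dict) -> bool:
    # Stand-in: labels survive re-extraction and small temporal shifts.
    return cohort.get("labels_stable", False)

def comparator_passes_audit(cohort: dict) -> bool:
    """No contradiction is flagged unless every audit stage passes."""
    checks = (cohort_definition_audit, care_process_leakage_scan,
              label_stability_test)
    return all(check(cohort) for check in checks)
```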
Multi-agent adversarial debate, seven patents pending: every candidate finding is argued for, argued against, and judged, all by independent specialist twins.
Unanimous consensus is required for a Confirmed label. Split decisions are held in the Exploratory tier and timestamped on OSF pending replication. If the adversary wins outright, the finding is discarded and the engine learns from the refutation — adversary wins become new contradiction-detector training signal.
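A toy rendering of that decision rule, with the three roles reduced to booleans (the tier names mirror the text; the role signals are simplified assumptions):

```python
from enum import Enum

class Tier(Enum):
    CONFIRMED = "confirmed"      # unanimous consensus across all three roles
    EXPLORATORY = "exploratory"  # split decision, held pending replication
    DISCARDED = "discarded"      # outright adversary win; becomes training signal

def judge(advocate_holds: bool, adversary_concedes: bool,
          arbiter_confirms: bool) -> Tier:
    """Unanimity is required for Confirmed; anything short is held or discarded."""
    if advocate_holds and adversary_concedes and arbiter_confirms:
        return Tier.CONFIRMED
    if not advocate_holds and not arbiter_confirms:
        return Tier.DISCARDED    # the adversary won outright
    return Tier.EXPLORATORY      # split decision
```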
Every artifact in the pipeline — raw cohort, analysis script, twin debate transcript, replication result, final verdict — is hashed and linked into a Merkle chain. The chain root is signed with Ed25519 and anchored on OSF with a public timestamp.
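A minimal sketch of that chain using `hashlib` and the `cryptography` package's Ed25519 primitives (the pairwise Merkle fold and the throwaway key are illustrative; in practice the signing key would be long-lived and its public half published):

```python
import hashlib
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaf_hashes: list[bytes]) -> bytes:
    """Fold artifact hashes pairwise up to a single root."""
    level = list(leaf_hashes)
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last node on odd levels
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

# Stand-ins for the real artifact bytes named in the text.
artifacts = [b"raw cohort", b"analysis script", b"twin debate transcript",
             b"replication result", b"final verdict"]
root = merkle_root([sha256(a) for a in artifacts])

signing_key = Ed25519PrivateKey.generate()
signature = signing_key.sign(root)  # (root, signature) is what gets anchored on OSF
```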
```
# Finding f8c6412e-17c2-4234-b93e-da7b622d09ff — MI × Age >70 — attestation header
algorithm: Ed25519
evidence_hash: 6bf704b3907a061b
signed_by: rocsite-discovery-2026 (fingerprint 9C4A…3F12)
osf_project: osf.io/3ws8g
osf_timestamp: 2026-04-22T19:34:10Z
artifacts:
  - cohort_extract.parquet (MIMIC-IV n=2,161 deaths=464)
  - outlier_engine.py (RocSite Discovery Engine)
  - debate_transcript.json (Advocate / Adversary / Arbiter)
  - replication_eicu.parquet (eICU-CRD n=4,069 mortality 14.55%)
  - verdict.json (CONFIRMED · divergence +168.4%)
novelty_score: 0.94
consensus: unanimous (advocate: ✓, adversary: conceded, arbiter: confirm)
```
The attestation is verifiable offline. A regulator, auditor, or reviewer can obtain the artifacts, recompute the hashes, and cross-check the signature without contacting RocSite. The chain is designed to survive the company — findings remain verifiable regardless of our future.
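Reusing `sha256` and `merkle_root` from the sketch above, offline verification reduces to recomputing the root from local artifact copies and checking the signature against the published key:

```python
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PublicKey

def verify_offline(artifacts: list[bytes], signature: bytes,
                   public_key_bytes: bytes) -> bool:
    """Recompute the Merkle root and verify the Ed25519 signature;
    no contact with RocSite is required."""
    root = merkle_root([sha256(a) for a in artifacts])
    try:
        Ed25519PublicKey.from_public_bytes(public_key_bytes).verify(signature, root)
        return True
    except InvalidSignature:
        return False
```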
Novelty is not just surprise. It's the weighted combination of how much a finding contradicts the published consensus, how well-established that consensus is, how large the observed cohort is, and how reliably the comparator holds up under adversarial attack. Each factor is broken out below, with an illustrative scoring sketch after the list.
- **Divergence:** magnitude of the gap between observed and published, normalized by the variance of the published estimate.
- **Consensus strength:** how well-established the published claim is, triangulated from meta-analyses, guidelines, and the primary literature.
- **Cohort weight:** log-scaled sample size and event count. Small-n findings rarely cross the auto-cert threshold even when consensus is clear.
- **Adversarial robustness:** how cleanly the Advocate survived the Adversary. Surviving a hard attack adds score; barely surviving loses it.
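As a sketch only, one way those four factors could combine (the weights, squashing functions, and scaling constants are placeholders, not RocSite's calibration):

```python
import math

def novelty_score(divergence_z: float, consensus_strength: float,
                  n: int, events: int, robustness: float,
                  weights: tuple = (0.4, 0.25, 0.2, 0.15)) -> float:
    """Weighted combination of the four factors listed above, each
    squashed into [0, 1] before weighting."""
    w_div, w_con, w_size, w_rob = weights
    divergence = math.tanh(divergence_z / 5)  # gap vs. published, normalized
    size = min((math.log1p(n) + math.log1p(events)) / 20, 1.0)  # log-scaled n, events
    score = (w_div * divergence + w_con * consensus_strength
             + w_size * size + w_rob * robustness)
    return round(score, 2)
```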
Pre-registration happens before the engine touches the data. The protocol — cohort definition, hypothesis, falsification criteria, planned analyses — is posted to OSF with a time-stamped hash.
That hash is what gets referenced in the final evidence chain. It makes it verifiable that a finding was not reverse-engineered after seeing the data: the OSF timestamp on the protocol is provably earlier than the data extract.
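A minimal sketch of both halves, hashing the protocol text and checking that the OSF timestamp predates the extract (the function names and ISO-8601 inputs are illustrative):

```python
import hashlib
from datetime import datetime

def protocol_hash(protocol_text: str) -> str:
    """Digest of the pre-registered protocol; this is the value the
    OSF timestamp covers and the evidence chain later references."""
    return hashlib.sha256(protocol_text.encode("utf-8")).hexdigest()

def preregistration_precedes_extract(osf_timestamp: str,
                                     extract_timestamp: str) -> bool:
    """The protocol's public timestamp must be earlier than the data extract."""
    return datetime.fromisoformat(osf_timestamp) < datetime.fromisoformat(extract_timestamp)

# e.g. preregistration_precedes_extract("2026-04-01T08:00:00+00:00",
#                                       "2026-04-22T19:34:10+00:00")  # -> True
```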
For clients, the pre-registration is held privately until the Confirmed finding is published, then released. Evidence hashes and Merkle roots are part of the public record from day one — only the narrative interpretation is gated on publication.
This is not a slogan. Confirmed findings ship with a minimum-reproducibility bundle: the query that extracts the cohort, the analysis script, the environment lockfile, and the expected outputs with hashes. Clone the repo, run make, compare the digests.
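A sketch of the digest comparison step, assuming the bundle ships a JSON manifest of expected SHA-256 hashes (the manifest name and layout are assumptions, not the bundle's documented format):

```python
import hashlib
import json
from pathlib import Path

def verify_bundle(bundle_dir: str, manifest: str = "expected_hashes.json") -> bool:
    """Recompute the digest of every file the manifest lists and
    compare against the expected value."""
    bundle = Path(bundle_dir)
    expected = json.loads((bundle / manifest).read_text())
    ok = True
    for rel_path, digest in expected.items():
        actual = hashlib.sha256((bundle / rel_path).read_bytes()).hexdigest()
        if actual != digest:
            print(f"MISMATCH {rel_path}: got {actual[:12]}, expected {digest[:12]}")
            ok = False
    return ok
```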
For client-data findings where we cannot republish the raw cohort, we publish the analysis code and the cohort definition (inclusion criteria, code lists, temporal windows), which is enough for an independent group to recreate an equivalent cohort on a different dataset and reproduce the direction and approximate magnitude.
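For illustration only, a published cohort definition might carry roughly this shape (the codes and windows below are invented, not the actual MI × Age >70 definition):

```python
cohort_definition = {
    "inclusion": {
        "diagnosis_codes": ["I21"],  # illustrative ICD-10 family, not the real code list
        "age_min": 71,               # subgroup: age > 70
        "index_event": "first ICU admission",
    },
    "exclusion": {
        "prior_event_within_days": 90,
    },
    "temporal_window": {
        "outcome": "in-hospital mortality",
        "follow_up_days": 30,
    },
}
```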
Most teams stop at the marketing. If you made it this far, you care about defensibility. Let's talk about your data, your claims, and what you need to stand behind.
Contact Adam directly →