EscapeMap: We Built a Cancer Resistance Engine From Public Data Nobody Was Using
A research series on building AI-driven therapeutic intelligence from the graveyard of failed clinical trials.
The Problem Nobody Talks About
Cancer drug development has a dirty secret. About 97% of oncology drugs that enter clinical trials fail. Not because the science was wrong. Not because the targets were irrelevant. But because tumors are adaptive. They route around blockades. They find alternate pathways. They escape.
The field calls this acquired resistance. The honest term is escape. A drug works until the tumor figures out a detour, and then it stops working, and the patient progresses, and the trial reports a negative result, and the dataset gets archived, and the world moves on.
What nobody does is go back into those archived datasets and ask: what exactly happened? What was the molecular state of the tumor right before it escaped? What pathways were activated? What could have predicted this? What would have stopped it?
That is the premise of EscapeMap. And this is the story of what we found when we actually went looking.
What We Set Out to Build
The EscapeMap project has one central hypothesis: tumor escape routes are predictable before they activate. If you can measure the molecular state of a tumor before treatment begins, you can predict which escape route it will use when the drug starts working, and you can design a countermeasure in advance.
This differs from most of the resistance literature. Most resistance research is retrospective: take a tumor that already failed treatment, sequence it, find the mutation. EscapeMap is prospective: take a pre-treatment tumor, score it, predict the failure mode before the first dose is given.
To build this, three things are needed:
- Transcriptomic data from pre-treatment tumors, ideally paired with post-treatment samples from the same patients
- Clinical outcome data: who responded, who progressed, and when
- A validated escape route taxonomy: a map of the molecular pathways tumors use to escape specific drug classes
The third item we built from first principles using DepMap synthetic lethality data and published resistance mechanisms. The first two items required us to go hunting. This blog series is the story of that hunt.
The Data Landscape We Inherited
When the project began, the evidence map looked like this:
| Cancer / Treatment | Phase 1 Expression Anchor | Phase 4 Clinical Anchor |
|---|---|---|
| GBM / Temozolomide + Radiation | 1 candidate (unverified) | None |
| CRC / Bevacizumab | 1 candidate (blocked) | None |
| Breast / Taxane | Exhausted (GEO empty) | 5 anchors (pCR vs RD) |
| Ovarian / Platinum | Unanchored | None |
| GBM / TMZ alone | GEO exhausted | None |
| GBM / Radiation | 1 candidate | None |
Phase 4, the retrospective clinical validation layer where EscapeMap predictions get tested against real patient outcomes, had zero confirmed datasets. Every relevant trial was either in a controlled-access repository behind a 4-to-8 week Data Access Committee review, paywalled in a pharmaceutical archive, or simply not findable.
The CRC/bevacizumab lane was the critical gap. The PACCE trial (NCT00115765), an 842-patient Phase III study that definitively demonstrated bevacizumab escape in mCRC, was the anchor the entire CRC validation framework needed. It was not in any public repository. Status: NOT_IN_REPO.
This was the state of the project on May 10, 2026.
Project Data Sphere: The Graveyard Nobody Opened
On May 12, 2026, the team executed a full catalog sweep of Project Data Sphere (PDS), a free, open-access oncology clinical trial data repository run as an initiative of the CEO Roundtable on Cancer. Unlike dbGaP or EGA, PDS requires no Data Access Committee review, no research proposal, and no institutional approval. Registration is the only gate, and an account is active immediately on registration.
PDS was known to the team. It had not been systematically swept.
The sweep confirmed 99 downloadable datasets across the catalog. Five were immediately downloaded, totaling 3.4 GB uncompressed:
- PACCE (NCT00115765): mCRC, BEV+CHEMO ± panitumumab, N=842
- PRIME (NCT00460265): mCRC, FOLFOX ± panitumumab, N~1,183, full RAS panel
- HORIZON III (NCT00384176): mCRC, XELOX ± cediranib, N~1,422
- MOSAIC (NCT00275210): Stage II/III CRC, adjuvant FOLFOX4, N~2,246
- VELOUR (NCT00561470): mCRC second-line, FOLFIRI ± aflibercept, N~1,226
In a single session, Phase 4 went from zero confirmed datasets to approximately 6,900 patients across five Phase III CRC trials, covering three distinct escape route classes.
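The "approximately 6,900" figure can be checked directly against the enrollment numbers in the list above. A quick sketch, taking the approximate N values at face value:

```python
# Enrollment figures as listed above (values marked N~ are approximate).
trials = {
    "PACCE": 842,
    "PRIME": 1183,
    "HORIZON III": 1422,
    "MOSAIC": 2246,
    "VELOUR": 1226,
}

total_patients = sum(trials.values())
print(f"{len(trials)} trials, {total_patients:,} patients")
```

The sum lands slightly above 6,900, which is consistent with the rounded figure used throughout this post.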
What the PACCE Data Said
The PACCE trial was the first dataset processed. The result was immediately verifiable: the PACCE negative result (panitumumab added to bevacizumab plus chemotherapy showed no benefit and numerically worse outcomes in KRAS wild-type patients) was reproduced from raw individual patient data.
The confirmed endpoint schema: OS measured by DTHDY/DTH, PFS by PFSDYCR/PFSCR, KRAS status by the KRAS column, treatment arm by TRT. All 842 patients accounted for with perfect match.
The Kaplan-Meier results from raw data:
| KRAS Status | Endpoint | BEV+CHEMO | PANI+BEV+CHEMO | p-value |
|---|---|---|---|---|
| Wild-type | OS | 27.9 mo | 23.4 mo | 0.23 |
| Wild-type | PFS | 12.4 mo | 10.9 mo | 0.11 |
| Mutant | OS | 22.3 mo | 21.9 mo | 0.87 |
| Mutant | PFS | 11.1 mo | 10.9 mo | 0.51 |
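For readers who want to see how a median like those in the table falls out of a time column and an event flag (for example DTHDY and DTH), here is a minimal Kaplan-Meier sketch in plain Python. It is illustrative only and not the project's analysis code: the toy values are synthetic, the days-to-months constant is an assumption, and the p-values in the table come from a comparison test that this sketch does not implement.

```python
from collections import Counter

DAYS_PER_MONTH = 30.4375  # assumption: average month length used for conversion


def km_median(times, events):
    """Kaplan-Meier median: first time at which estimated survival S(t) <= 0.5.

    times  : follow-up time per patient (e.g. a days column such as DTHDY)
    events : 1 if the event was observed (e.g. DTH), 0 if censored
    Returns None when the curve never reaches 0.5.
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    exits = Counter(times)  # events and censorings both leave the risk set
    at_risk = len(times)
    survival = 1.0
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            survival *= 1.0 - d / at_risk
            if survival <= 0.5:
                return t
        at_risk -= exits[t]
    return None


# Synthetic toy data, NOT taken from PACCE.
toy_days = [100, 200, 300, 400, 500, 600]
toy_event = [1, 0, 1, 1, 0, 1]

median_days = km_median(toy_days, toy_event)
print(median_days, "days =", round(median_days / DAYS_PER_MONTH, 1), "months")
```

In practice the function would be applied per KRAS stratum and per TRT arm. A median of None means fewer than half of the patients in that group had the event during follow-up.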
The severe AE rate: BEV+CHEMO 70.3% versus PANI+BEV+CHEMO 86.2%. The panitumumab arm was more toxic and less effective. Toxicity was real. Benefit was absent.
This is the escape route fingerprint in clinical data. The MAPK bypass activated on top of dual blockade. Panitumumab generated real biological disruption (the AE profile proves this), but the tumor routed around EGFR suppression through an alternative pathway that remained active regardless of treatment intensification. The gap between toxicity and efficacy is not a drug failure. It is a resistance mechanism working exactly as the EscapeMap taxonomy predicts.
Three Escape Route Classes in Clinical Data
The five PDS datasets together provide clinical evidence for three distinct escape route classes, all from CRC, all from Phase III trials, all with confirmed OS and PFS endpoints.
Class 1: Chemo Resistance Escape (MOSAIC)
MOSAIC enrolled 2,246 Stage II/III CRC patients in the trial that established adjuvant FOLFOX as standard of care. Every patient who relapsed after completing FOLFOX adjuvant therapy is a documented Class 1 chemo escape event. The DFS endpoint captures the escape timeline directly: early relapsers (DFS event within 18 months of FOLFOX completion) represent patients whose resistance mechanisms activated fastest. With 2,246 patients this is the largest dataset in the project, and it gives us a chemo escape timeline in a curative-intent adjuvant setting, a different and arguably more informative context than metastatic disease.
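The early-relapser definition above translates into a simple flag. The sketch below uses hypothetical field names (`dfs_event`, `months_to_event`) and synthetic records, because the actual MOSAIC column names are not given here. Treating "within 18 months" as inclusive of month 18 is also an assumption.

```python
EARLY_RELAPSE_WINDOW_MONTHS = 18  # threshold stated in the text


def is_early_relapser(dfs_event, months_to_event):
    """True if a DFS event occurred within 18 months of FOLFOX completion.

    dfs_event       : 1 if a DFS event was observed, 0 if censored
    months_to_event : months from FOLFOX completion to event or censoring
    The boundary (<= 18) is an assumption about how "within" is read.
    """
    return bool(dfs_event) and months_to_event <= EARLY_RELAPSE_WINDOW_MONTHS


# Synthetic records with hypothetical field names, NOT MOSAIC data.
patients = [
    {"id": "A", "dfs_event": 1, "months_to_event": 7.5},   # early relapse
    {"id": "B", "dfs_event": 1, "months_to_event": 31.0},  # late relapse
    {"id": "C", "dfs_event": 0, "months_to_event": 60.0},  # censored, no event
]

early_ids = [
    p["id"]
    for p in patients
    if is_early_relapser(p["dfs_event"], p["months_to_event"])
]
print(early_ids)
```

Censored patients are never flagged, even with short follow-up, since no escape event was observed for them.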
Class 3: Anti-Angiogenic Escape, Three Independent Datasets
The most powerful finding from the PDS sweep is that anti-angiogenic escape can now be examined across three independent trials using three different VEGF-pathway blocking agents:
- Bevacizumab (VEGF-A antibody), PACCE: no benefit in KRAS-WT, 70.3% severe AEs
- Cediranib (pan-VEGFR TKI), HORIZON III: failed to add benefit to XELOX in mCRC
- Aflibercept (VEGF-A/B + PlGF trap), VELOUR: demonstrated OS benefit in second-line mCRC
Bevacizumab and cediranib failed. Aflibercept worked. The mechanistic difference: aflibercept additionally blocks PlGF (Placental Growth Factor), which macrophages use as an alternative pro-angiogenic signal when VEGF-A is blocked. When VEGF-A blockade triggers the anti-angiogenic escape route, tumors upregulate PlGF-driven angiogenic signaling as a bypass. Aflibercept's broader coverage suppresses this bypass. Bevacizumab and cediranib do not.
This is a mechanistic escape route story (three independent datasets, same cancer, same treatment class, different molecular coverage, different clinical outcomes), provable entirely from clinical IPD without a single RNA-seq read.
Class 2: MAPK/EGFR Escape (PRIME)
PRIME provides the definitive extended RAS stratification dataset in mCRC: full KRAS exon 2/3/4 and NRAS exon 2/3/4 mutation status paired with OS/PFS in approximately 1,183 patients treated with panitumumab plus FOLFOX. Different RAS mutation positions represent different states of MAPK pathway pre-activation. KRAS exon 2 mutations (codons 12/13) are the canonical positions, but exon 3/4 and NRAS mutations represent a more heterogeneous landscape of bypass activation. PRIME allows the escape timeline to be mapped by mutation position, a clinical resolution of MAPK escape that, to our knowledge, has not been attempted with this dataset.
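Mapping escape by mutation position starts with assigning each patient to a stratum. The sketch below shows one possible grouping. The stratum labels, the input representation, and the precedence order (KRAS exon 2 first, then KRAS exon 3/4, then NRAS) are illustrative assumptions, not the encoding used in the PRIME files.

```python
from collections import defaultdict


def ras_stratum(mutations):
    """Assign a coarse RAS stratum from a set of (gene, exon) calls.

    mutations : set of tuples such as {("KRAS", 2)}; an empty set means no
                RAS mutation was detected. The precedence order below is an
                assumption made for illustration.
    """
    if ("KRAS", 2) in mutations:
        return "KRAS exon 2"
    if any(gene == "KRAS" and exon in (3, 4) for gene, exon in mutations):
        return "KRAS exon 3/4"
    if any(gene == "NRAS" and exon in (2, 3, 4) for gene, exon in mutations):
        return "NRAS exon 2/3/4"
    return "RAS wild-type"


# Synthetic patients, NOT PRIME data.
calls = {
    "P1": {("KRAS", 2)},
    "P2": {("NRAS", 3)},
    "P3": set(),
    "P4": {("KRAS", 4), ("NRAS", 2)},
}

by_stratum = defaultdict(list)
for patient_id, muts in calls.items():
    by_stratum[ras_stratum(muts)].append(patient_id)

print(dict(by_stratum))
```

Each stratum can then be passed to a survival analysis separately, so that PFS and OS timelines are compared by mutation position and not only by a single mutant versus wild-type split.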
The GSE196576 Parallel Track
While the PDS extraction proceeded, a separate parallel track addressed the transcriptomic anchor for the CRC/bevacizumab escape route.
GSE196576 is the gene expression dataset from CALGB/SWOG 80405 (the same trial that generated the PACCE clinical context), containing RNA-seq data from 554 primary mCRC tumors. The key paper analyzing this dataset is Innocenti et al. 2022 (PMID 35176136), which identified four immune features as the dominant OS predictors across treatment arms: M2 macrophage score (HR 6.30, 95% CI 3.0–12.15), TGF-β signature (HR 1.35), plasma cells (HR 0.55), and activated memory CD4+ T cells (HR 0.34). The cumulative impact is staggering: patients with all four beneficial immune features achieved a median OS of 42.5 months, while those with 0–1 features plummeted to 17.7 months (p=3.48×10⁻¹¹).
But in the bevacizumab arm specifically, only one feature mattered: the M2 macrophage score. Patients with high M2 macrophage infiltration had a median OS of 19.9 months versus 33.9 months in the low-M2 group (HR 4.73, 95% CI 2.18–9.81, p=9.52×10⁻⁵). The 4-feature immune model provided no significant improvement over covariates alone in the bevacizumab arm (Delta AICc 1.86, LRT p=0.12), indicating that M2 is the sole relevant immune predictor in this treatment context.
The full manuscript (nihms-1782972) provided the complete N-at-risk table from Figure 4B:
- Beneficial (low M2) arm: N=150 at t=0, N=104 at t=25, N=30 at t=50, N=7 at t=75
- Non-beneficial (high M2) arm: N=53 at t=0, N=19 at t=25, N=7 at t=50, N=0 at t=75
These exact values, confirmed from the source table rather than estimated from pixel coordinates, are the inputs for constrained Guyot IPD reconstruction. The N-at-risk table is superior to digitization because it eliminates coordinate extraction error entirely. The Guyot reconstruction using these constraints is the highest-fidelity approximate IPD available without dbGaP access.
The M2 macrophage escape signal is also confirmed as independent of RAS mutation status (OR=0.58, p=0.10), independent of arm assignment (interaction p=0.74), and enriched in right-sided and transverse tumors. It represents an immune-suppressive escape mechanism that operates in parallel to genomic escape, which is precisely why KRAS testing alone cannot identify the patients who will fail bevacizumab therapy.
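Before any reconstruction runs, the N-at-risk values can be encoded and sanity-checked. The sketch below is input preparation only and is not the Guyot algorithm itself. The arm keys and helper names are hypothetical, and the time points are left in whatever unit the published axis uses.

```python
# N-at-risk values transcribed from the Figure 4B table quoted above.
time_points = [0, 25, 50, 75]  # same time unit as the published axis
n_at_risk = {
    "low_M2": [150, 104, 30, 7],   # "beneficial" arm
    "high_M2": [53, 19, 7, 0],     # "non-beneficial" arm
}


def is_valid_at_risk(counts):
    """Numbers at risk can only stay flat or fall as time advances."""
    return all(earlier >= later for earlier, later in zip(counts, counts[1:]))


def interval_losses(counts):
    """Patients leaving the risk set per interval (events plus censoring)."""
    return [earlier - later for earlier, later in zip(counts, counts[1:])]


for arm, counts in n_at_risk.items():
    assert len(counts) == len(time_points)
    assert is_valid_at_risk(counts)
    print(arm, interval_losses(counts))
```

Each interval loss is an upper bound on the number of events in that interval, because some of those patients were censored and did not have an event. That bound is the kind of constraint a reconstruction has to respect when it splits each interval's losses between events and censorings.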
What Changed in One Day
On May 11, 2026, Phase 4 of the EscapeMap project had zero confirmed datasets and no path to validation without controlled-access approval that would take months.
On May 12, 2026, Phase 4 had five confirmed Phase III clinical trial datasets covering approximately 6,900 patients, three escape route classes, and one arm-specific transcriptomic signal with a confirmed N-at-risk table for IPD reconstruction.
The total time elapsed: one working session. The total cost: zero. The resource that changed everything: a public repository that had been sitting open the entire time.
This is the central lesson of Part 1. The data problem in cancer research is not fundamentally a scarcity problem. Enormous quantities of high-quality, patient-level clinical trial data from landmark Phase III oncology trials are publicly available with no access barriers. The PACCE trial, which cost tens of millions of dollars to run and established foundational knowledge about bevacizumab escape in CRC, has been sitting in Project Data Sphere, freely downloadable, since its deposition.
The problem is not scarcity. The problem is that nobody systematically mines these datasets with the specific question: what is the molecular signature of escape, and can it be predicted?
That is exactly the question EscapeMap is built to answer.
What Comes Next
Part 2 of this series will cover the transcriptomic anchor problem: why RNA-seq data from paired pre/post-treatment biopsies is structurally scarce, what we found when we swept every accessible public repository, and the two candidate datasets that survived the quality filter.
Part 3 will cover the DepMap synthetic lethality map: 758 candidate escape route markers, the three that were flagged with severe confounds, and what the cleaned hit list looks like after lineage-restricted re-analysis.
Part 4 will cover the Guyot reconstruction in detail: the methodology, the validation framework, and what it means to use approximate IPD from published N-at-risk tables as a clinical anchor for a transcriptomic scoring engine.
Part 5 will cover the full EscapeMap engine design: how clinical Phase 4 anchors, transcriptomic Phase 1 evidence, and synthetic lethality Phase 2 data are integrated into a deterministic scoring algorithm that predicts escape route activation from a single pre-treatment biopsy.
The graveyard is full of signal. We are going to extract all of it.