Lung Cancer Brain Metastasis. Here's What We Found — and Why It Matters for a Cure.
Lung Cancer Brain Metastasis: A Multi-Dataset Validation Study
Disclaimer: This is a research blog. Nothing here is medical advice. All findings are from publicly available datasets and are for scientific discussion only.
The Problem Nobody Talks About Enough
When lung cancer kills, it usually doesn't kill because of the tumor in the lung. It kills because of what happens next.
Approximately 20–40% of non-small cell lung cancer (NSCLC) patients develop brain metastases — tumors that have broken away from the primary site, crossed the blood-brain barrier, and colonized the most protected organ in the human body. Once that happens, median survival drops to months.
The brain is immunologically distinct, pharmacologically shielded, and biologically alien to the lung cancer cells that arrive there. And yet those cells adapt. They survive. They grow.
The question that launched this project:
What makes a lung cancer cell capable of surviving in the brain?
If we can answer that at the molecular level — if we can identify the specific genes that are switched on or amplified in brain metastases compared to primary lung tumors — then we have targets. And targets are the beginning of cures.
The Assassin Score: Where This Started
CrisPRO's core engine is built around what we call the Assassin Score — a multi-dimensional ranking system that integrates genomic, transcriptomic, proteomic, and functional CRISPR data to identify the most therapeutically actionable genes in a given cancer context.
For the brain metastasis project, we started with 10 candidate genes from a prior CRISPRa (CRISPR activation) screen in a brain-tropic lung cancer model. These weren't random picks. They came from a genome-wide functional screen asking: which genes, when turned on, help lung cancer cells colonize the brain?
The 10 candidates:
| Gene | Gene | Gene | Gene | Gene |
|---|---|---|---|---|
| ATP10D | FAM72A | SLC25A32 | SENP8 | SLC45A4 |
| BACE1 | NOM1 | PTPRC | ROPN1L | TFEB |
Our job was to validate them — not in a cell line, not in a mouse, but in human tissue: real patients, real brain metastases, real primary lung tumors.
The Datasets: What We Actually Used
We built a multi-dataset evidence matrix. Below is an honest accounting of what each dataset is, what it can and cannot tell us, and what we actually found.
GSE223499 — The Izar Lab Dataset (Primary Validation, VERIFIED)
What it is: Single-nucleus RNA sequencing (snRNA-seq) of 31 brain metastases and 11 primary NSCLC tumors. Published in Nature Medicine 2025 by the Izar lab at Columbia (Gonzalez-Ericsson et al.). To our knowledge, this is the largest open-access NSCLC brain metastasis snRNA-seq dataset currently available.
What we did:
- Downloaded all 42 per-sample count matrices (~750 MB)
- Matched cell barcodes to metadata annotations
- Ran pseudobulk differential expression using PyDESeq2
- Analyzed four cell compartments separately: tumor-like cells, CNV-confirmed tumor cells, T cells, and a broad immune aggregate (mixing compartments would confound the biology)
Why pseudobulk? Single-cell DE methods that treat each cell as an independent observation are statistically invalid — cells from the same patient are correlated. Pseudobulk aggregates cells per patient first, then runs DE across patients. It's slower and requires more samples, but the p-values are real.
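The aggregation step described above can be sketched in a few lines. This is a minimal illustration with made-up sample IDs and counts, not the pipeline used for the analysis: it sums raw UMI counts over all cells from the same sample so that each sample, not each cell, becomes one observation for the downstream DE test.

```python
from collections import defaultdict


def pseudobulk(cells):
    """Sum raw UMI counts over all cells that come from the same sample.

    `cells` is an iterable of (sample_id, {gene: count}) pairs, one per cell.
    Returns {sample_id: {gene: summed_count}}, i.e. one profile per sample.
    """
    totals = defaultdict(lambda: defaultdict(int))
    for sample_id, counts in cells:
        for gene, count in counts.items():
            totals[sample_id][gene] += count
    return {sample: dict(genes) for sample, genes in totals.items()}


# Toy data: five cells from two hypothetical samples.
cells = [
    ("BrM_01", {"NOM1": 2, "BACE1": 0}),
    ("BrM_01", {"NOM1": 1, "BACE1": 3}),
    ("BrM_01", {"NOM1": 4}),
    ("PRIM_01", {"NOM1": 1, "BACE1": 1}),
    ("PRIM_01", {"BACE1": 2}),
]
profiles = pseudobulk(cells)
print(profiles)
```

The resulting per-sample count table is what a tool such as PyDESeq2 would take as input, with one row per patient sample and a BrM/primary label as the design variable.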
Honest note on power: 5 of our 10 target genes are underpowered in this dataset. NS results for these genes should be read as inconclusive, not as evidence of absence.
GSE131907 — Kim et al. NSCLC Atlas (VERIFIED this session)
What it is: 208,506 single cells from NSCLC patients, including cells from 10 brain metastasis and 11 primary lung tumor samples. A landmark dataset for NSCLC biology.
What we did:
- Downloaded the full raw UMI matrix (408.7 MB)
- Verified 100% barcode overlap with cell annotations
- Extracted epithelial cells only (15,463 BrM / 7,270 Primary)
- Built pseudobulk count matrices and ran PyDESeq2 fresh in this session
- All three spot-check verifications passed (NOM1/NS_03, BACE1/LUNG_T34, SLC45A4/NS_12 — exact count matches)
Caveat: Size factor estimation used only 10 target genes (not the full transcriptome). LFC values are reliable; padj values are approximate.
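To make this caveat concrete, here is a small sketch of median-of-ratios size factor estimation, assuming that is the estimator in play (it is the DESeq2-style default). The gene names and counts are invented. The point is only that the estimate depends on which genes go into it, so a 10-gene panel is a much thinner basis than the full transcriptome; how much that matters for this dataset is not something the toy example can settle.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    `counts` maps gene -> list of raw counts, one entry per sample.
    Genes with a zero in any sample are skipped, as their geometric
    mean is zero and the ratio is undefined.
    """
    n_samples = len(next(iter(counts.values())))
    log_ratios = [[] for _ in range(n_samples)]
    for row in counts.values():
        if min(row) <= 0:
            continue
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    return [math.exp(median(r)) for r in log_ratios]


# Toy counts: sample 2 is sequenced twice as deeply as sample 1,
# and g4 is the only gene that is truly differentially expressed.
counts = {
    "g1": [10, 20],
    "g2": [100, 200],
    "g3": [5, 10],
    "g4": [50, 400],
}
sf_full = size_factors(counts)
sf_small = size_factors({g: counts[g] for g in ("g3", "g4")})

print("depth ratio, all genes:", sf_full[1] / sf_full[0])
print("depth ratio, two genes:", sf_small[1] / sf_small[0])
```

With all four genes the estimated depth ratio recovers the true 2x difference; with only two genes, one of them differential, the estimate is pulled toward that gene.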
GSE186344 — Gonzalez et al. Pan-Cancer BrM Atlas (RETIRED)
| Claimed | Actual |
|---|---|
| BrM vs Primary differential expression | BrM-only dataset (Gonzalez et al., Cell 2022, PMID: 35063085) |
- 15 brain metastasis specimens from multiple cancer types — no matched primary tumors
- Raw human data not uploaded (patient privacy); only log-normalized data available (DESeq2 requires raw counts)
Verdict: Any prior claims comparing "BrM vs Primary" from this dataset are methodologically invalid. Retired from our BrM vs Primary evidence matrix. Still valuable for BrM tumor architecture and cell-type composition — a different question.
GSE200563 — Spatial RNA-seq (VERIFIED in prior session)
What it is: Spatial transcriptomics of brain metastasis tissue sections, preserving spatial architecture at the tumor-brain interface.
What we did: ROI analysis comparing tumor-enriched vs non-tumor regions. Results represent whole-ROI signals (tumor + stroma + immune), not tumor-cell-specific expression.
CPTAC LUAD — Proteomics (UNVERIFIED — prior session only)
What it is: Clinical Proteomic Tumor Analysis Consortium lung adenocarcinoma dataset — mass spectrometry protein quantification in tumor vs adjacent normal.
Status: Values from prior sessions (NOM1 +1.81, ROPN1L −3.02, SENP8 −1.72) were not independently re-verified this session. Treat as preliminary.
DepMap — CRISPR Dependency Screens (VERIFIED this session)
What it is: Genome-wide CRISPR knockout screens across 1,186 cancer cell lines. Chronos scores from /mnt/datalake/depmap/crispr_screen/CRISPRGeneEffect.csv.
What we found:
- NOM1: mean Chronos = −0.725, 82.5% of cell lines dependent → COMMON ESSENTIAL
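- All other screened targets: not common essential (mean Chronos > −0.1); SLC25A32 is listed as strongly selective in the evidence matrix, and FAM72A is not in the screen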
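A summary of this kind can be computed from a gene-effect table roughly as follows. This is a sketch on invented scores, not the code run against CRISPRGeneEffect.csv. The dependency cutoff of −0.5 is an assumption for illustration; the percentages quoted above may have been derived differently (for example from DepMap's dependency probabilities).

```python
from statistics import mean


def summarize_dependency(scores, threshold=-0.5):
    """Return (mean gene effect, fraction of lines below `threshold`).

    `scores` holds one Chronos-style gene-effect score per cell line;
    None marks a line where the gene was not measured and is ignored.
    The -0.5 default is an illustrative cutoff, not an official one.
    """
    valid = [s for s in scores if s is not None]
    if not valid:
        raise ValueError("no measured cell lines for this gene")
    dependent = sum(s < threshold for s in valid)
    return mean(valid), dependent / len(valid)


# Hypothetical scores for two genes across four cell lines.
toy_scores = {
    "GENE_A": [-0.9, -0.7, -0.6, -0.2],   # broadly required
    "GENE_B": [0.05, -0.1, 0.0, None],    # dispensable, one line missing
}
summary = {gene: summarize_dependency(s) for gene, s in toy_scores.items()}
for gene, (avg, frac) in summary.items():
    print(f"{gene}: mean effect {avg:+.3f}, {frac:.0%} of lines dependent")
```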
Human Protein Atlas (VERIFIED this session)
What we found:
- BACE1: "Group enriched (Brain, Pancreas)" — brain enrichment confirmed
- SENP8: "Tissue enhanced (Testis)" — not "tissue enriched" as previously stated (weaker claim)
The Results: A Verdict for Each Target
BACE1 — The Lead Target (Downgraded)
Verdict: NOMINAL — single dataset, borderline significance
BACE1 (Beta-Secretase 1) is best known from Alzheimer's research — the enzyme that cleaves amyloid precursor protein to generate amyloid-beta peptides. Billions were spent inhibiting it for AD; those trials failed, but compound pharmacology is exquisitely characterized.
Verified data:
| Source | Finding |
|---|---|
| GSE223499 (tumor-like) | LFC = +1.069, pval = 0.016, padj = 0.071 — borderline after correction; large effect, directionally consistent; also significant in immune aggregate (padj = 0.016) |
| GSE131907 (re-run) | padj = 0.574, LFC = +0.358 — NOT significant; prior claim (padj=0.017, log2FC=+1.22) falsified |
| GSE186344 | Previously cited p=1.08e-9 — invalid (BrM-only, no primary) |
| HPA | Group enriched (Brain, Pancreas) — brain enrichment real |
| DepMap | Non-essential (mean Chronos = −0.068, 0.1% dependent) — favorable drug-target profile |
| Literature | Chafe et al. (Sci Transl Med 2025): genome-wide in vivo CRISPRa screen; top hit BACE1; mechanism: BACE1 cleaves EGFR → soluble EGFR fragment promotes brain colonization |
Honest picture: RNA evidence is weaker than previously claimed. GSE223499 shows nominal signal; GSE131907 does not replicate. What remains strong: functional CRISPRa evidence (Chafe et al.) and brain-enriched expression. BACE1 remains the most therapeutically actionable target, not because of our RNA analysis, but because of the published functional screen and the availability of characterized BBB-penetrant inhibitors.
Clinical angle: Failed AD BACE1 inhibitors (verubecestat, atabecestat, lanabecestat) have known PK, BBB penetration, and safety profiles. The trials were stopped mainly because the drugs did not improve cognition, although atabecestat was also halted over liver enzyme elevations, so the safety record varies by compound. Repurposing for NSCLC brain metastasis is a scientifically grounded hypothesis that could reach clinical testing faster than de novo drug development.
NOM1 — The Proliferation Biomarker (Reclassified)
Verdict: CONFIRMED expression signal — NOT a direct drug target
NOM1 is a nucleolar protein involved in ribosome biogenesis. Cancer cells addicted to ribosome production upregulate this machinery.
Verified data:
| Source | Finding |
|---|---|
| GSE223499 | Only target reaching formal significance: padj = 0.0034, log2FC = +0.515; well-powered (MDE = 0.40, observed = 0.52); survives BH across 32,498 genes |
| CPTAC (unverified) | Protein reportedly elevated in LUAD vs adjacent normal (+1.81) |
| DepMap (verified) | Mean Chronos = −0.725; 82.5% of lines dependent → common essential |
Critical reclassification: A common essential gene is a poor direct drug target. Inhibiting NOM1 would kill cancer cells and normal rapidly dividing cells (gut, marrow, hair follicles) alike, leaving little or no therapeutic window. This is why ribosome biogenesis trials struggle with toxicity.
What NOM1 actually tells us: Strong upregulation in BrM (padj=0.0034) reflects a highly proliferative, ribosome-hungry state. NOM1 is a proliferation biomarker, not a drug target. Its value is as a synthetic lethality anchor: what does a NOM1-high, ribosome-addicted BrM cell become uniquely dependent on? That is what the CrisPRO SL engine is designed to answer.
FAM72A — The Cell Cycle Accelerator
Verdict: TRENDING
FAM72A is involved in cell cycle progression and chromosomal instability.
| Source | Finding |
|---|---|
| GSE223499 | pval = 0.013, padj = 0.061, LFC = +1.054 — ~2-fold, misses FDR; powered; also significant in immune aggregate (padj = 0.012) |
| GSE131907 | padj = 0.193, pval = 0.039, LFC = +1.273 — direction confirmed, not significant after correction |
| GSE186344 | Previously p=1.25e-34 — invalid |
| HPA Survival | High FAM72A → unfavorable LUAD prognosis (p = 1.38e-5) |
Caveat: Absent from CPTAC proteome; not in DepMap CRISPR screen. Consistent RNA across two datasets; needs dedicated functional study.
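For readers less used to log2 fold changes, the conversion to a linear fold change is a one-liner. The sketch below applies it to LFC values quoted in this post; nothing here is new data, only arithmetic.

```python
def fold_change(log2_fc):
    """Convert a log2 fold change into a linear fold change."""
    return 2.0 ** log2_fc


# LFC values quoted in this post (BrM vs primary).
reported = {"BACE1": 1.069, "FAM72A": 1.054, "NOM1": 0.515, "PTPRC": -2.244}
for gene, lfc in reported.items():
    print(f"{gene}: log2FC {lfc:+.3f} -> {fold_change(lfc):.2f}x")
```

An LFC near +1 corresponds to roughly a doubling, and a negative LFC to a fold change below 1, so PTPRC at −2.244 is expressed at a small fraction of its primary-tumor level.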
PTPRC — The Immune Depletion Signal
Verdict: REINTERPRETED — immune desert biology confirmed
PTPRC (CD45) is a pan-leukocyte marker expressed on virtually every immune cell. Data consistently show downregulation in brain metastases.
| Source | Finding |
|---|---|
| GSE131907 (re-run) | padj = 0.0021, LFC = −2.244 — significantly down in BrM vs primary lung epithelium (stronger than prior claim) |
| GSE223499 (tumor-like) | NS, as expected — immune marker, not expected in tumor cells |
| HPA Survival | High PTPRC → favorable LUAD prognosis (p = 5.92e-5) |
| DepMap | Non-essential (mean Chronos = −0.086) |
Reinterpretation: PTPRC downregulation reflects immune desert biology. Brain metastases are immunologically cold — explaining limited checkpoint inhibitor efficacy in BrM.
Clinical implication: Combination strategies may need to "heat up" the microenvironment before checkpoint blockade can work.
ROPN1L — The RNA-Protein Paradox
Verdict: CORROBORATED (with critical caveat)
Concordant RNA upregulation across datasets, but CPTAC (unverified) reports dramatic protein downregulation in LUAD tumor (−3.02).
- GSE223499: NS (LFC = +0.184, padj = 0.824) — underpowered; very low expression (baseMean = 23); snRNA-seq poor for lowly expressed genes
RNA-protein discordance needs cell-type-resolved proteomics before therapeutic conclusions.
SENP8 — Negative Evidence
Verdict: CONFLICTED → NEGATIVE RNA EVIDENCE IN TUMOR CELLS
SENP8 belongs to the sentrin/SUMO-specific protease family but acts on NEDD8 rather than SUMO, removing it from modified proteins.
| Source | Finding |
|---|---|
| GSE223499 (tumor) | LFC = −0.470, pval = 0.046 — down in BrM tumor-like cells (wrong direction for driver) |
| GSE131907 | LFC = +0.074, padj = 0.842 — NS |
| CPTAC (unverified) | Protein reportedly down (−1.72) |
| HPA | Tissue enhanced (Testis); tau = 0.58; brain present (20.1 nTPM amygdala) |
| DepMap | Non-essential (mean Chronos = −0.022) |
| GSE200563 (spatial) | Up in BrM regions — but ROIs = whole microenvironment |
Interpretation: SENP8 may be up in BrM microenvironment (astrocytes, TAMs) but down in tumor cells. Not supported as a tumor-cell target by snRNA-seq.
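The HPA row quotes tau = 0.58. Assuming this refers to the standard tau tissue-specificity index, it can be computed as below; the expression vectors are invented and do not reproduce the HPA value. Tau runs from 0 (equally expressed everywhere) to 1 (expressed in a single tissue), which is why a mid-range value fits "tissue enhanced" better than "tissue enriched".

```python
def tau_index(expression):
    """Tau tissue-specificity index for one gene.

    `expression` is a list of non-negative expression values, one per
    tissue. Returns 0 for uniform expression and 1 when the gene is
    expressed in exactly one tissue.
    """
    if len(expression) < 2:
        raise ValueError("need at least two tissues")
    peak = max(expression)
    if peak <= 0:
        raise ValueError("gene is not expressed in any tissue")
    return sum(1 - x / peak for x in expression) / (len(expression) - 1)


uniform = tau_index([10, 10, 10, 10])       # housekeeping-like profile
single_tissue = tau_index([0, 0, 0, 8])     # one-tissue profile
intermediate = tau_index([20, 10, 5, 5])    # enhanced in one tissue
print(uniform, single_tissue, intermediate)
```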
SLC25A32 — A Null Result Worth Explaining
Verdict: NULL (RNA) / METABOLIC DEPENDENCY (DepMap)
Mitochondrial FAD transporter. snRNA-seq shows a true null:
| Source | Finding |
|---|---|
| GSE223499 | LFC = +0.031, padj = 0.949 — powered null (MDE = 0.69, observed = 0.03) |
| GSE131907 | LFC = +0.066, padj = 0.842 — consistent null |
Why prior analyses suggested upregulation: Raw pseudobulk counts were higher in BrM because BrM samples contributed more cells. After library-size normalization, expression is essentially identical between groups, so the earlier signal was a cell-number artifact.
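A toy calculation shows how this kind of artifact arises. The numbers are invented for illustration: the BrM pseudobulk has three times the raw counts for the gene, but also three times the total library size, so the normalized expression is the same.

```python
def counts_per_million(gene_count, library_size):
    """Scale a raw gene count by the sample's total count (CPM)."""
    return gene_count * 1_000_000 / library_size


# Hypothetical pseudobulk samples: BrM aggregates more cells,
# so both the gene count and the library size are larger.
brm = {"gene_count": 300, "library_size": 6_000_000}
primary = {"gene_count": 100, "library_size": 2_000_000}

brm_cpm = counts_per_million(**brm)
primary_cpm = counts_per_million(**primary)
print("raw ratio:", brm["gene_count"] / primary["gene_count"])
print("normalized ratio:", brm_cpm / primary_cpm)
```

CPM is used here only because it is the simplest library-size correction; DESeq2-style size factors serve the same purpose in the actual analysis.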
What remains valid: DepMap essentiality is independent of expression. Therapeutic framing shifts from "overexpressed target" to "metabolic dependency."
SLC45A4, TFEB, ATP10D
| Gene | Verdict | Summary |
|---|---|---|
| SLC45A4 | TRENDING / WEAK | GSE223499: pval=0.064, padj=0.183, LFC=+1.062; GSE131907: opposite direction, NS — not replicated |
| TFEB | INCONCLUSIVE | LFC=−0.336, padj=0.516 (tumor-like); padj=0.014 in T cells; TFEB is a transcription factor, so expression ≠ activity |
| ATP10D | NO_EVIDENCE | LFC=−0.132, padj=0.879; CRISPRa hit = artificial overactivation, not endogenous DE |
The Evidence Matrix (Corrected)
| Gene | GSE223499 (tumor-like) | GSE131907 (re-run) | GSE186344 | CPTAC Protein | DepMap | Verdict |
|---|---|---|---|---|---|---|
| NOM1 | ★ padj=0.0034, LFC=+0.52 ↑ | NS ↑ (padj=0.40) | INVALID | +1.81 ↑ (unverified) | COMMON ESSENTIAL | BIOMARKER (not drug target) |
| BACE1 | ~ padj=0.071, LFC=+1.07 ↑ | NS padj=0.574 ↑ | INVALID | Not detected | Non-essential | NOMINAL (functional lit. strong) |
| FAM72A | ~ padj=0.061, LFC=+1.05 ↑ | ~ pval=0.039 ↑ | INVALID | Not detected | Not screened | TRENDING |
| SLC45A4 | ~ pval=0.064, LFC=+1.06 ↑ | NS ↓ (opposite) | INVALID | Not detected | Non-essential | WEAK |
| ROPN1L | NS ↑ (underpowered) | NS | — | −3.02 ↓ (unverified) | Non-essential | CORROBORATED* |
| PTPRC | NS (immune marker) | ★ padj=0.0021 ↓ | — | ↓ (unverified) | Non-essential | REINTERPRETED |
| SENP8 | ↓ pval=0.046 | NS | — | −1.72 ↓ (unverified) | Non-essential | NEGATIVE |
| SLC25A32 | NULL (padj=0.95) | NULL | — | +0.95 ↑ (unverified) | STRONGLY SELECTIVE | NULL (RNA) |
| TFEB | NS ↓ tumor / ↑ T cells | NS | — | +0.09 (unverified) | Non-essential | INCONCLUSIVE |
| ATP10D | NS | NS | — | ↓ (unverified) | Non-essential | NO_EVIDENCE |
Legend:
- ★ = significant after FDR (padj < 0.05)
- ~ = nominal (pval < 0.05, padj > 0.05)
- INVALID = invalid comparison (GSE186344 is BrM-only)
- * = RNA-protein discordance requires resolution
- CPTAC marked unverified = not re-confirmed this session
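The distinction between pval and padj in the legend comes from Benjamini-Hochberg (BH) correction. The sketch below is a plain implementation on five made-up p-values. PyDESeq2 applies additional steps such as independent filtering, so this will not reproduce the padj values in the matrix; it only shows how a nominally significant p-value can end up above 0.05 after adjustment.

```python
def benjamini_hochberg(pvals):
    """Return BH-adjusted p-values in the same order as `pvals`."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values for five genes.
pvals = [0.001, 0.013, 0.04, 0.2, 0.6]
padj = benjamini_hochberg(pvals)
for p, q in zip(pvals, padj):
    label = "significant" if q < 0.05 else ("nominal" if p < 0.05 else "NS")
    print(f"pval={p:.3f}  padj={q:.4f}  {label}")
```

In this toy case the third gene has pval = 0.04 but padj above 0.05, which is the "~" category in the legend.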
What This Means for a Cure
What we have
- One statistically significant RNA hit: NOM1 (padj=0.0034) — proliferation biomarker, not drug target
- One nominally significant RNA hit with strong functional literature: BACE1 (padj=0.071 GSE223499; Chafe et al. CRISPRa)
- Confirmed immune desert signal: PTPRC down (padj=0.0021 GSE131907)
- True null for SLC25A32 RNA (metabolic dependency framing still valid)
- Repurposable BACE1 compounds (characterized AD inhibitors with BBB penetration)
What we don't have
- Second independent dataset confirming BACE1 RNA (GSE131907 does not replicate)
- Any direct drug target with both confirmed expression and selective essentiality
- Patient-level validation or clinical trial
The path forward
| Stage | Action |
|---|---|
| 1 — Functional validation | Test BACE1 inhibition in NSCLC brain-tropic lines; Chafe mechanism (EGFR cleavage → colonization); orthotopic mouse model |
| 2 — Synthetic lethality (NOM1) | Find second gene essential in NOM1-high, ribosome-addicted BrM cells — CrisPRO SL engine |
| 3 — Biomarker development | Identify BACE1-high BrM patients for trial enrichment |
| 4 — Combination strategy | BACE1 inhibition + immune microenvironment reprogramming (PTPRC-low, cold BrM) |
The Synthetic Lethality Layer: What Comes Next
Layer 1 = target identification. Layer 2 = synthetic lethality.
Instead of directly targeting NOM1 (pan-lethal) or BACE1 (only nominally significant at the RNA level), identify a second gene that is essential specifically in NOM1-high or BACE1-high tumor cells. Inhibit that gene → kill cancer while sparing normal tissue.
This is the PARP inhibitor paradigm applied to brain metastasis: BRCA-mutant cells depend on PARP because they lost one DNA repair pathway. The tumor's biology defines its vulnerability.
For NSCLC brain metastasis: What does a NOM1-high BrM cell become dependent on?
The CrisPRO SL engine — validated for ovarian cancer (MBD4/gemcitabine axis, VALIDATED tier) and expanding to lung — is designed to answer exactly that.
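As a rough illustration of the idea, and explicitly not the CrisPRO SL engine, one simple way to look for such a partner gene is to split cell lines by marker expression and compare a candidate gene's dependency scores between the two groups. All names, values, and the cutoff below are hypothetical.

```python
from statistics import mean


def differential_dependency(marker_expr, gene_effect, cutoff):
    """Mean gene effect in marker-high lines minus marker-low lines.

    `marker_expr` maps cell line -> marker expression (e.g. NOM1).
    `gene_effect` maps cell line -> dependency score for a candidate gene.
    A strongly negative result means the candidate is more essential
    in marker-high lines, a first hint of a synthetic lethal relationship.
    """
    shared = [line for line in marker_expr if line in gene_effect]
    high = [gene_effect[l] for l in shared if marker_expr[l] >= cutoff]
    low = [gene_effect[l] for l in shared if marker_expr[l] < cutoff]
    if not high or not low:
        raise ValueError("both groups need at least one cell line")
    return mean(high) - mean(low)


# Hypothetical cell lines: two marker-high, two marker-low.
marker_expr = {"LINE_1": 8.0, "LINE_2": 7.5, "LINE_3": 2.0, "LINE_4": 1.5}
candidate_effect = {"LINE_1": -1.0, "LINE_2": -0.8, "LINE_3": -0.1, "LINE_4": 0.1}

delta = differential_dependency(marker_expr, candidate_effect, cutoff=5.0)
print(f"dependency shift in marker-high lines: {delta:+.2f}")
```

A real screen would need a proper statistical test, multiple-testing correction across candidate genes, and control for lineage and other confounders before any pair is treated as a lead.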
The Honest Caveats
- Sample size: GSE223499 — 28 BrM vs 11 primary in tumor-like pseudobulk. Enough for discovery, not definitive conclusions.
- Statistical power: 5/10 targets underpowered; NS for BACE1, SLC45A4, ROPN1L, ATP10D, TFEB = inconclusive, not absence.
- STK11 confounding: 24% BrM vs 50% primary carry STK11 mutations — some DE may reflect STK11, not BrM biology.
- GSE186344 retirement: BrM-only; all prior BrM vs Primary p-values from this dataset removed.
- CPTAC unverified: NOM1, ROPN1L, and SENP8 protein values were not re-verified this session and remain preliminary.
- RNA ≠ protein ≠ function: Expression is a starting point, not a finish line.
- No causal evidence yet (except BACE1 via Chafe et al.): correlational DE ≠ proof of causation.
Why This Approach Is Different
Most cancer research: one gene, one dataset, one cancer type.
We took a multi-modal, multi-dataset approach and reported conflicts alongside confirmations. When re-analysis contradicted prior signals (BACE1 in GSE131907; every BrM vs Primary comparison drawn from GSE186344), we updated conclusions rather than defending prior results.
v3 is more conservative than v2:
- BACE1: CONFIRMED → NOMINAL
- NOM1: drug target → proliferation biomarker
- GSE186344: retired for BrM vs Primary
These are not failures — they are the scientific process working correctly.
The cure for lung cancer brain metastasis will not come from a single paper or dataset. It will come from the accumulation of honest, rigorous, multi-modal evidence.
What You Can Do
Researchers: Datasets are public — GSE223499 and GSE131907 on GEO; CPTAC on PDC; DepMap at depmap.org. Everything here is reproducible.
Clinicians: BACE1 inhibitors are pharmacologically characterized, and Chafe et al. (Sci Transl Med 2025) provide a mechanistic rationale. The case for compassionate use or a basket trial is building.
Patients and advocates: This work is for you. The goal is not a paper. The goal is a treatment.
CrisPRO is a research platform. All findings are for scientific discussion only and are not validated for clinical decision-making. For medical decisions, consult a qualified oncologist.
Data sources
- GSE223499 — Gonzalez-Ericsson et al., Nat Med 2025
- GSE131907 — Kim et al., Nat Commun 2020
- GSE186344 — Gonzalez et al., Cell 2022 (BrM architecture reference only)
- CPTAC LUAD — Edwards et al., Cell 2023 (preliminary)
- DepMap 24Q4 — Broad Institute (verified from datalake)
- Human Protein Atlas — Uhlén et al., Science 2015 (verified this session)
- Chafe et al. 2025 — Sci Transl Med (BACE1 CRISPRa screen)
Correction history
| Version | Changes |
|---|---|
| v1 | Original |
| v2 | Corrected NOM1 padj; fixed author attribution; added power analysis |
| v3 | Corrected BACE1 GSE131907; retired GSE186344; reclassified NOM1 as biomarker; updated PTPRC GSE131907; corrected SENP8 HPA classification |