SAE Intelligence: Interpretable Genomic Features
Go beyond the score. See the exact biological features—exons, TF motifs, protein structures—that drive a prediction and understand *why* a variant is disruptive.
Why It Matters
- Transform black-box predictions into transparent, biologically-grounded stories
- Expose the model's internal logic to explain variant impact
- Flag risky designs and steer generative AI (roadmap)
What We Delivered
- Interactive feature visualizations with disruption scores
- Automated prompt safety checks
- Clear biological explanations for every prediction
Built for Different Audiences
SAE Intelligence serves both scientific discovery and engineering excellence
For Scientists
- **Readable Biology:** See features like exon boundaries, TF motifs, and secondary structures.
- **Quantifiable Disruption:** Pinpoint exactly which features a variant impacts with disruption scores (ΔLL).
- **Explainable AI:** Move from a simple score to a full, auditable explanation for every prediction.
For Engineers
- **Live Frontend Components:** Interactive visualizations powered by robust simulations.
- **Clear Data Contracts:** Stable JSON from simulations drives predictable UI behavior.
- **Roadmap to Production:** Clear path from current RUO simulations to future-state backend services.
How It Works Today
Current implementation details and technical foundation
Component 1
**`DynamicOracleExplain` Component:** An interactive, multi-track visualizer that displays SAE features and their disruption scores (ΔLL) directly on the genomic sequence.
Component 2
**`simulateVariantImpactWithSAE` Function:** A powerful simulation in `simulations.ts` that generates the rich feature and attribution data needed to power our visualizations.
Component 3
**Prompt Quality Checker:** A safety gate that flags pathological inputs (like low‑complexity repeats) in our design flows.
Core Capabilities
From feature attribution to activation steering
Feature Attribution (Live)
Technical
We simulate the extraction of active SAE features for a given sequence and calculate the change in log-likelihood (ΔLL) caused by a variant.
Scientific
Connects the model's internal logic to human-readable biological concepts (RUO).
Business
- **Trust:** Defend and document decisions with feature-linked, quantitative explanations.
Use Cases
Today:
1. **Interactive feature tracks** in our `DynamicOracleExplain` component.
2. **Quantitative disruption scores** to rank a variant's impact.
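As a rough illustration of the attribution step, the sketch below derives a ΔLL per feature and ranks the results. It is not the code in `simulations.ts`. The `refLL`/`altLL` inputs, the `FeatureLogLikelihood` type, and the `computeDeltaLL` name are assumptions made for the example. The sign convention (variant minus reference) is chosen so that negative values mean disruption.

```typescript
// Hypothetical input shape: the source names only featureId, description and deltaLL.
interface FeatureLogLikelihood {
  featureId: string;
  description: string;
  refLL: number; // assumed: log-likelihood with the reference allele
  altLL: number; // assumed: log-likelihood with the variant allele
}

interface DeltaLLEntry {
  featureId: string;
  description: string;
  deltaLL: number;
}

// ΔLL = LL(variant) - LL(reference). Negative values mean the variant
// makes the feature less likely, i.e. disrupts it.
function computeDeltaLL(features: FeatureLogLikelihood[]): DeltaLLEntry[] {
  return features
    .map((f) => ({
      featureId: f.featureId,
      description: f.description,
      deltaLL: f.altLL - f.refLL,
    }))
    // Rank most disruptive (most negative) first.
    .sort((a, b) => a.deltaLL - b.deltaLL);
}

const ranked = computeDeltaLL([
  { featureId: "f1", description: "Splice Site", refLL: -1, altLL: -1.5 },
  { featureId: "f2", description: "Exon Boundary", refLL: -2, altLL: -4 },
]);
console.log(ranked.map((r) => `${r.description}: ${r.deltaLL}`));
```

Sorting in ascending order puts the most negative score first, which is the order a ranked disruption view would likely want.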
Prompt Safety (Live)
Technical
Detect low‑complexity repeats and other pathological attractors; flag viral/sensitive content (aligned with Forge safety gates).
Scientific
Reduces junk outputs and improves the reliability of generative demos.
Business
- **Quality:** Fewer dead‑ends in design flows and cleaner, more compelling demos.
Use Cases
Today:
1. **Automated safety checks** on design inputs, with clear user warnings.
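One way a low-complexity check like this could work is sketched below, assuming a DNA alphabet. The `checkPromptQuality` name, the entropy and run-length thresholds, and the warning strings are all hypothetical. Viral or sensitive-content screening is out of scope for this example.

```typescript
interface PromptCheckResult {
  ok: boolean;
  warnings: string[];
}

// Shannon entropy in bits per base; 2.0 is the maximum for a uniform A/C/G/T mix.
function shannonEntropy(seq: string): number {
  const counts: Record<string, number> = {};
  for (let i = 0; i < seq.length; i++) {
    counts[seq[i]] = (counts[seq[i]] || 0) + 1;
  }
  let h = 0;
  for (const key of Object.keys(counts)) {
    const p = counts[key] / seq.length;
    h -= p * (Math.log(p) / Math.LN2);
  }
  return h;
}

// Length of the longest run of one repeated character (homopolymer run).
function longestRun(seq: string): number {
  let best = 0;
  let run = 0;
  for (let i = 0; i < seq.length; i++) {
    run = i > 0 && seq[i] === seq[i - 1] ? run + 1 : 1;
    if (run > best) best = run;
  }
  return best;
}

// Thresholds are illustrative assumptions, not tuned values.
function checkPromptQuality(rawSeq: string, minEntropy = 1.5, maxRun = 8): PromptCheckResult {
  const seq = rawSeq.toUpperCase();
  const warnings: string[] = [];
  if (seq.length === 0) {
    warnings.push("Sequence is empty.");
  } else {
    if (/[^ACGT]/.test(seq)) warnings.push("Sequence contains ambiguous or non-ACGT characters.");
    if (shannonEntropy(seq) < minEntropy) warnings.push("Sequence has low complexity (low entropy).");
    if (longestRun(seq) > maxRun) warnings.push("Sequence contains a long single-base repeat.");
  }
  return { ok: warnings.length === 0, warnings };
}

console.log(checkPromptQuality("AAAAAAAAAAAA"));
console.log(checkPromptQuality("ACGTTGCAAGCTTAGC"));
```

Combining an entropy test with a run-length test catches both short-period repeats such as `ATATATAT` (entropy of 1 bit) and a long single-base run embedded in an otherwise varied sequence.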
Activation Steering (Roadmap)
Technical
Expose endpoints to nudge/target feature activations (e.g., chromatin patterns, motif presence) with compute‑aware beam search.
Scientific
Maps CrisPRO.ai‑style inference‑time scaling to controllable design objectives.
Business
- **Control:** Achieve predictable design quality scaling with transparent, auditable controls.
Use Cases
Roadmap:
1. **Steer** generation towards desired feature sets; **measure** quality and efficacy metrics.
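Because steering is still on the roadmap, the following is only a toy sketch of one possible reading of "compute-aware beam search": a beam search over nucleotides that stops once it has spent a fixed number of scoring calls. The `steerWithBeamSearch` function, its options, and the scorer are all hypothetical. A real scorer would measure SAE feature activations, not count bases.

```typescript
type Scorer = (seq: string) => number;

interface Candidate {
  sequence: string;
  score: number;
}

interface BeamSearchOptions {
  length: number;        // target sequence length
  beamWidth: number;     // candidates kept after each extension step
  maxScoreCalls: number; // compute budget, counted in scorer invocations
}

function steerWithBeamSearch(
  prefix: string,
  score: Scorer,
  opts: BeamSearchOptions
): { sequence: string; score: number; scoreCalls: number } {
  const alphabet = ["A", "C", "G", "T"];
  let beam: Candidate[] = [{ sequence: prefix, score: score(prefix) }];
  let scoreCalls = 1;

  while (beam[0].sequence.length < opts.length) {
    const candidates: Candidate[] = [];
    for (const item of beam) {
      for (const base of alphabet) {
        if (scoreCalls >= opts.maxScoreCalls) break; // budget spent
        const sequence = item.sequence + base;
        candidates.push({ sequence, score: score(sequence) });
        scoreCalls++;
      }
    }
    if (candidates.length === 0) break; // no budget left to extend further
    candidates.sort((a, b) => b.score - a.score);
    beam = candidates.slice(0, opts.beamWidth);
  }
  // May be shorter than opts.length if the budget ran out first.
  return { sequence: beam[0].sequence, score: beam[0].score, scoreCalls };
}

// Toy objective standing in for a feature activation: count of G bases.
const countG: Scorer = (seq) => seq.split("G").length - 1;
console.log(steerWithBeamSearch("", countG, { length: 4, beamWidth: 2, maxScoreCalls: 100 }));
```

Capping scorer calls makes the cost predictable. A larger budget or wider beam explores more candidates, which is the quality-versus-compute trade-off that inference-time scaling refers to.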
Interactive Demonstrations
See SAE Intelligence in action
Feature Overlay Visualization
Genomic Sequence (43044290-43044450):
Feature Types:
Disruption Scores (ΔLL)
- Exon Boundary: High Impact
- TF Motif (AP-1): High Impact
- Secondary Structure: Medium Impact
- Splice Site: Low Impact
Key Insight:
The ΔLL (Delta Log-Likelihood) score quantifies how much a variant disrupts each biological feature. Negative values indicate disruption, with more negative values showing greater impact.
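To make the mapping from score to impact label concrete, here is a minimal sketch that buckets a ΔLL value into the High, Medium, and Low labels used in the demo. The cutoffs and the `classifyImpact` name are assumptions for illustration. The page does not state the thresholds it uses.

```typescript
type ImpactLevel = "High Impact" | "Medium Impact" | "Low Impact";

// Cutoffs are hypothetical; more negative ΔLL means greater disruption.
function classifyImpact(deltaLL: number, highCutoff = -2.0, mediumCutoff = -0.5): ImpactLevel {
  if (deltaLL <= highCutoff) return "High Impact";
  if (deltaLL <= mediumCutoff) return "Medium Impact";
  return "Low Impact"; // includes zero and positive (non-disruptive) values
}

console.log(classifyImpact(-3.1)); // "High Impact"
console.log(classifyImpact(-1.0)); // "Medium Impact"
console.log(classifyImpact(-0.1)); // "Low Impact"
```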
Prompt Safety Checker
Key Benefits:
- Prevents pathological inputs that could generate junk outputs
- Flags low-complexity repeats and ambiguous sequences
- Improves reliability of generative AI demonstrations
- Provides clear suggestions for sequence improvement
Activation Steering (Roadmap)
Overall Progress
46% Complete
AP-1 Binding Sites
Transcription factor binding motifs
Open Chromatin
Accessible chromatin regions
Alpha Helix
Protein secondary structure
Roadmap Feature
Activation steering is currently in development. This demo shows the planned interface for controlling feature activations during generation, with compute-aware beam search and predictable quality scaling.
Planned Benefits:
- Steer generation towards desired biological features
- Predictable quality scaling with transparent controls
- Compute-aware beam search for efficient generation
- Auditable design process with clear provenance
Observed Outcomes
Real-world impact from SAE Intelligence
Institutional Value
Why SAE Intelligence matters for your organization
For the Institution
- Interpretable overlays increase confidence and adoption across teams.
- Safer demos and design explorations with automated prompt checks.
- A clear path to controllable, auditable in-silico design (roadmap).
Technical Implementation
Current state and roadmap details
Data Contract
SAE Features
`{ featureId, description, position, strength }` - The active biological features at specific locations.
Delta LL Series
`{ featureId, description, deltaLL }` - The quantitative disruption score for each feature caused by the variant.
Provenance
`run_id`, `model_profile`, etc.
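The contract above could be typed roughly as follows. The field names come from the contract itself, but the field types, the `SAEPayload` wrapper, and the `isSAEFeature` guard are assumptions. Only the two provenance fields named above are included.

```typescript
interface SAEFeature {
  featureId: string;
  description: string;
  position: number; // assumed: a single genomic coordinate
  strength: number; // assumed: activation strength as a number
}

interface DeltaLLPoint {
  featureId: string;
  description: string;
  deltaLL: number;
}

interface Provenance {
  run_id: string;        // assumed string
  model_profile: string; // assumed string
}

// Hypothetical wrapper; the source does not say how the three parts are bundled.
interface SAEPayload {
  features: SAEFeature[];
  deltaLLSeries: DeltaLLPoint[];
  provenance: Provenance;
}

// Runtime guard so a UI can reject malformed simulation output early.
function isSAEFeature(value: unknown): value is SAEFeature {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.featureId === "string" &&
    typeof v.description === "string" &&
    typeof v.position === "number" &&
    typeof v.strength === "number"
  );
}

console.log(isSAEFeature({ featureId: "f1", description: "Exon Boundary", position: 43044300, strength: 0.8 }));
```

A runtime guard complements the compile-time types, since JSON arriving from a simulation or a future backend is untyped at the boundary.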
Code Locations
Frontend Simulation (Live)
src/utils/simulations.ts
Frontend Component (Live)
src/components/site/blocks/DynamicOracleExplain.tsx
Backend Service (Roadmap)
@/api/routers/sae.py