1. Catalogue
Candidate strategies are framed from 28 academic and practitioner families, including mean reversion, regime filters, channel breakouts, liquidity sweeps, and volatility-gated momentum. Each candidate is written down as testable rules before any backtest runs. No discretionary entries, no machine learning, no adaptive parameters.
2. Engine screen
A vectorized Python engine tests each candidate over five years of institutional data. This layer is fast and merciless: most candidates die here. Survivors advance with their exact parameters frozen.
3. Broker proof
The engine's bars and the deployment broker's bars are different objects: index CFDs trade roughly 23 hours a day, so the daily close itself differs from cash-session data. Every survivor is therefore re-implemented as compiled MQL5 and re-tested on the broker's own price history through the MetaTrader 5 Strategy Tester. Strategies that only work on idealized data die here.
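To illustrate why the two data sources disagree, here is a minimal sketch that builds a daily close from the same hourly prices under two session definitions. The prices, the session hours (a cash-style close at 16:00 and a near-continuous session ending at 22:00), and the function name `daily_close` are all illustrative assumptions, not the broker's actual schedule or the engine's code.

```python
from datetime import datetime, timedelta

def daily_close(bars, session_end_hour):
    """Return {date: last price stamped at or before session_end_hour}.

    bars: list of (datetime, price) tuples sorted by time.
    session_end_hour: hypothetical hour at which the trading day is closed.
    """
    closes = {}
    for ts, price in bars:
        if ts.hour <= session_end_hour:
            closes[ts.date()] = price  # later bars overwrite earlier ones
    return closes

# Hypothetical hourly prices for one day (hours 0..22), drifting upward.
start = datetime(2024, 1, 2, 0, 0)
bars = [(start + timedelta(hours=h), 100.0 + 0.5 * h) for h in range(23)]

cash_close = daily_close(bars, session_end_hour=16)  # cash-session style bar
cfd_close = daily_close(bars, session_end_hour=22)   # ~23-hour CFD style bar

day = start.date()
print(cash_close[day], cfd_close[day])  # 108.0 111.0
```

Any rule keyed to the daily close (a moving-average cross, a channel breakout) can fire on different days under the two definitions, which is why a re-test on the broker's own bars is a separate gate.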
4. Stress
- Full-history windows (2022–2026): each test covers the whole period, including the 2022 bear market. A strategy that only works in a bull regime is labeled regime-dependent or retired.
- In-sample / out-of-sample splits: the OOS decay ratio (OOS profit factor ÷ IS profit factor) is our gold metric; a value near 1.0 means the edge held up on unseen data. Our anchor system holds a ratio of 1.04 over 212 unseen trades.
- Parameter plateaus: a real edge survives neighbouring parameters. A spike that only works at one setting is curve-fitting and is rejected.
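The split and plateau checks above can be sketched in a few lines. This is a minimal illustration, assuming profit factor is computed as gross profit divided by gross loss. The function names, the sample trade lists, and the `min_pf` threshold are hypothetical and not taken from the actual engine.

```python
def profit_factor(trade_pnls):
    """Gross profit divided by gross loss for a list of per-trade P&L values."""
    gross_profit = sum(p for p in trade_pnls if p > 0)
    gross_loss = -sum(p for p in trade_pnls if p < 0)
    if gross_loss == 0:
        return float("inf")  # no losing trades: report infinity
    return gross_profit / gross_loss

def oos_decay_ratio(is_pnls, oos_pnls):
    """OOS PF divided by IS PF. Values near 1.0 mean little decay."""
    return profit_factor(oos_pnls) / profit_factor(is_pnls)

def is_plateau(pf_by_param, best, min_pf=1.0):
    """True if the best setting and its immediate neighbours all clear min_pf.

    min_pf is an assumed threshold chosen for illustration.
    """
    params = sorted(pf_by_param)
    i = params.index(best)
    neighbours = params[max(i - 1, 0): i + 2]
    return all(pf_by_param[p] >= min_pf for p in neighbours)

# Hypothetical per-trade results.
is_pnls = [120, -50, 80, -50]   # IS PF = 200 / 100 = 2.0
oos_pnls = [90, -50, 60, -50]   # OOS PF = 150 / 100 = 1.5
print(oos_decay_ratio(is_pnls, oos_pnls))  # 0.75

# Hypothetical PF by lookback length: a plateau versus a single-setting spike.
plateau = {18: 1.40, 20: 1.50, 22: 1.45}
spike = {18: 0.80, 20: 2.90, 22: 0.70}
print(is_plateau(plateau, best=20), is_plateau(spike, best=20))  # True False
```

A stricter plateau test would widen the neighbourhood or require neighbours to stay within some fraction of the best profit factor; the one-step version here only shows the idea.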
5. Audit
An independent audit graded every public claim against the artifacts on disk — backtest logs, registry records, and exports. Inflated claims were corrected publicly (a celebrated PF 11.49 turned out to be five lucky trades; it is in the graveyard now). Execution defects found by the audit — like orders dying silently in the midnight session break — were fixed and documented.
The funnel in numbers
- 113 strategies catalogued
- 199 backtests across both layers
- 104 strategies killed — 92% rejection rate
- 9 broker-validated systems, 5 deploy-grade
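The funnel figures can be cross-checked with simple arithmetic. The sketch below assumes that every catalogued strategy ends up either killed or broker-validated, which the numbers above are consistent with but do not state outright.

```python
catalogued = 113
killed = 104

survivors = catalogued - killed          # strategies that were not killed
rejection_rate = killed / catalogued     # share of the catalogue rejected

print(survivors, f"{rejection_rate:.0%}")  # 9 92%
```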
Statistical honesty rules
- Sample sizes below 20 trades are labeled low-confidence — anecdotes, not evidence.
- Live results are expected to run 20–40% below backtest (multiple-testing haircut).
- Simulations are labeled as simulations. Telemetry is labeled as telemetry. Never mixed.
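As a rough sketch of how the first two rules could be applied mechanically: the 20-trade threshold and the 20 to 40% haircut come from the rules above, while the function names, the "standard" label, and the choice of which backtest figure to haircut are assumptions made for illustration.

```python
MIN_TRADES = 20               # below this, a result is labeled low-confidence
HAIRCUT_RANGE = (0.20, 0.40)  # expected live shortfall versus backtest

def confidence_label(n_trades):
    """Label a result by sample size ("standard" is a hypothetical label)."""
    return "low-confidence" if n_trades < MIN_TRADES else "standard"

def expected_live_range(backtest_value):
    """Return (pessimistic, optimistic) live expectation after the haircut."""
    low_cut, high_cut = HAIRCUT_RANGE
    return backtest_value * (1 - high_cut), backtest_value * (1 - low_cut)

print(confidence_label(5))     # low-confidence
print(confidence_label(212))   # standard

# A hypothetical backtest figure of 10.0 (for example, percent per year).
pessimistic, optimistic = expected_live_range(10.0)
print(round(pessimistic, 2), round(optimistic, 2))  # 6.0 8.0
```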
