🧠 ML/RL Intelligence Pipeline

From Data to Alpha: Trade-Matrix Learning Architecture
Research-Grade Methodology | Automated Weekly Updates | Production-Deployed
Weekly
Automated Updates
Transfer Learning preserves old model knowledge while adapting to new data. Automated pipeline runs every Sunday with validation gates (IC ≥ 0.03, Sharpe > 0.5).
9-11
Boruta-Selected Features
Automated wrapper feature selection identifies statistically significant features from 112 candidates (56 raw + 56 rank-normalized). Locked feature order prevents silent sklearn prediction errors.
45min
Curriculum RL Training
Progressive difficulty curriculum (3 stages) reduces RL agent training time vs. standard training. PPO with transaction cost modeling and slippage simulation.

🔄 Three Critical Data Pipelines

Business Value: Trade-Matrix separates training, inference, and execution into distinct pipelines. This separation prevents training-inference coupling failures and allows independent optimization of latency (inference) vs. accuracy (training) requirements.
📚
Training Pipeline
Frequency Weekly (Sunday)
Duration 73 minutes total
Data Volume 3+ years (6,977 bars)
Validation 40 WFV windows
Output 3 TL models + 3 RL policies
⚡
Inference Pipeline
Frequency Every 4H bar close
Latency <5ms
Features 9-11 per instrument (Boruta-selected)
Model Loading 4-tier resilient
Output Signal + Confidence + IC
🎯
Execution Pipeline
Frequency Real-time (on signal)
E2E Latency <50ms
RL Position Sizing 4-tier fallback
Risk Checks HRAA v2 + Circuit Breaker
Output Market/Limit orders

📊 Pipeline Deep Dive

Transfer Learning Training Pipeline (Every Sunday)

Methodology: Walk-Forward Validation with 200-bar purge gap prevents data leakage. This exceeds López de Prado's recommendation of h ≈ 0.01T (≈70 bars for our dataset). The purge gap is a well-established technique in quantitative finance research for preventing look-ahead bias.
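The purged walk-forward scheme described above can be sketched as a split generator. This is a minimal pure-Python illustration, not the production code; the window stepping is an assumption consistent with the figures quoted here (40 weekly windows of 42 four-hour bars, 200-bar purge gap over 6,977 bars).

```python
def purged_walk_forward_splits(n_bars, n_windows=40, val_size=42, purge_gap=200):
    """Yield (train_indices, val_indices) pairs for purged walk-forward validation.

    The purge_gap bars between the training and validation windows are skipped
    entirely, so labels computed from training-window bars cannot overlap the
    validation window (look-ahead prevention).
    """
    for w in range(n_windows):
        # Each window's validation block steps forward by val_size bars,
        # with the last window ending at the final bar.
        val_end = n_bars - (n_windows - 1 - w) * val_size
        val_start = val_end - val_size
        train_end = val_start - purge_gap  # purge gap excluded from training
        if train_end <= 0:
            continue
        yield list(range(0, train_end)), list(range(val_start, val_end))

splits = list(purged_walk_forward_splits(6977))
train_idx, val_idx = splits[-1]
# Exactly 200 bars separate the last training bar from the first validation bar.
```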
graph TB
    subgraph "Data Sources"
        BYBIT_HIST["Bybit Historical<br/>4H OHLCV 2022-2025<br/>6,977+ bars"]
        DERIBIT_DVOL["Deribit DVOL<br/>Volatility Index<br/>Real-time + Historical"]
    end
    subgraph "Feature Engineering"
        RAW_FEATURES["Raw Features<br/>112 Total - 56 Raw + 56 Rank"]
        RANK_NORM["Rank Normalization<br/>Quintile Transform"]
        BORUTA["Boruta Selection<br/>9-11 Features/Instrument"]
        LOCKED_FEATURES["Locked Feature Order<br/>Production Consistency"]
    end
    subgraph "Walk-Forward Validation"
        WFV["40 Weekly Windows<br/>200-Bar Purge Gap"]
        TRAIN_WINDOW["Training Window<br/>In-Sample Data"]
        VAL_WINDOW["Validation Window<br/>Out-of-Sample Data"]
        PURGE_GAP["Purge Gap<br/>Prevent Lookahead"]
    end
    subgraph "Transfer Learning - Per Instrument"
        OLD_MODEL["OLD Model<br/>Dynamic Tree Count - Frozen"]
        NEW_TREES["NEW Trees<br/>50-250 - Grid Search"]
        SAMPLE_WEIGHT["Exponential Recency Weighting<br/>decay_lambda=0.005"]
        TL_MODEL["Final TL Model<br/>BTC/ETH/SOL"]
    end
    subgraph "Validation Gates"
        IC_CHECK["IC at least 0.05<br/>TL Training Gate"]
        HITRATE_CHECK["Hit Rate at least 52 pct<br/>Directional Accuracy"]
        SHARPE_CHECK["Sharpe at least 0.5<br/>Risk-Adjusted Return"]
        DEPLOY_DECISION["Deploy or Rollback"]
    end
    subgraph "Model Registry"
        MLFLOW["MLflow Registry<br/>Experiment Tracking"]
        MINIO["MinIO Storage<br/>Model Artifacts 319MB"]
        PROD_TAG["Production Tag<br/>Auto-Promotion"]
    end
    BYBIT_HIST --> RAW_FEATURES
    DERIBIT_DVOL --> RAW_FEATURES
    RAW_FEATURES --> RANK_NORM
    RANK_NORM --> BORUTA
    BORUTA --> LOCKED_FEATURES
    LOCKED_FEATURES --> WFV
    WFV --> TRAIN_WINDOW
    WFV --> VAL_WINDOW
    WFV --> PURGE_GAP
    TRAIN_WINDOW --> OLD_MODEL
    OLD_MODEL --> NEW_TREES
    NEW_TREES --> SAMPLE_WEIGHT
    SAMPLE_WEIGHT --> TL_MODEL
    TL_MODEL --> IC_CHECK
    IC_CHECK --> HITRATE_CHECK
    HITRATE_CHECK --> SHARPE_CHECK
    SHARPE_CHECK --> DEPLOY_DECISION
    DEPLOY_DECISION -->|Pass| MLFLOW
    DEPLOY_DECISION -->|Fail| OLD_MODEL
    MLFLOW --> MINIO
    MINIO --> PROD_TAG
    style TL_MODEL fill:#00d4ff,stroke:#000,stroke-width:2px,color:#000
    style BORUTA fill:#00ff88,stroke:#000,stroke-width:2px,color:#000
    style DEPLOY_DECISION fill:#ffd93d,stroke:#000,stroke-width:2px,color:#000

Why Transfer Learning Outperforms Traditional Retraining

| Aspect | Transfer Learning (Trade-Matrix) | Full Retraining (Alternative Approach) | Advantage |
|---|---|---|---|
| Knowledge Retention | Dynamic tree count frozen from OLD model | Starts from scratch every week | ✓ Preserves patterns from 3+ years of data |
| Adaptation Speed | 50-250 new trees (grid search) + exponential recency weighting | Slow convergence on new regimes | ✓ Faster adaptation to new data |
| Training Stability | Warm-started from previous model | Random initialization each time | ✓ Consistent performance week-over-week |
| Catastrophic Forgetting | Prevented by frozen trees | Risk of losing historical patterns | ✓ Robust to short-term market noise |
| Computational Efficiency | Only trains 50-250 new trees (grid-searched) | Trains 150+ trees from scratch | ✓ ~73 min total pipeline vs. ~180 min estimated for full retraining |
Design Rationale: Transfer Learning enables faster adaptation to regime shifts by preserving historical patterns in frozen trees while training new trees on recent data. This is particularly valuable in volatile crypto markets where market regimes can shift rapidly.
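The exponential recency weighting shown in the diagram (decay_lambda=0.005) can be sketched as follows. This is an illustrative stand-alone function; in practice such weights would be passed as `sample_weight` when fitting the NEW trees, and the function name is not from the production codebase.

```python
import math

def recency_weights(n_bars, decay_lambda=0.005):
    """Exponential recency weights for training samples.

    The newest bar gets weight 1.0; each older bar is down-weighted by
    exp(-decay_lambda * age_in_bars), so recent regimes dominate the
    gradient of the newly added trees.
    """
    return [math.exp(-decay_lambda * (n_bars - 1 - t)) for t in range(n_bars)]

w = recency_weights(6977)
# With decay_lambda=0.005 on 4H bars, a bar roughly one year old
# (~2,190 bars) carries a weight near exp(-10.95), i.e. effectively zero.
```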

Real-Time ML Inference Pipeline (<5ms Latency)

Critical Production Issue Fixed: ERROR #102 and #103 (bar continuity failures) were root-caused and fixed in December 2025. Gap detection now prevents data holes that could cause stale feature computation and incorrect signals.
sequenceDiagram
    participant BYBIT as Bybit Exchange
    participant GAP_DET as Gap Detection
    participant CACHE as Feature Cache
    participant FEAT_ENG as Feature Engineering
    participant MODEL_LOAD as Model Loader
    participant ML_INF as ML Inference
    participant IC_VAL as IC Validator
    participant RL_AGENT as RL Position Sizer

    Note over BYBIT,RL_AGENT: Real-Time Inference - Every 4H Bar Close
    BYBIT->>GAP_DET: New 4H Bar - 2025-01-05 00:00
    rect rgb(100, 50, 0)
        Note over GAP_DET: Gate 1 PRE-BOOTSTRAP: Check Last 200 Bars
        GAP_DET->>GAP_DET: Detect Missing Bars - 00:00 UTC convention
        alt Gap Found
            GAP_DET->>GAP_DET: Severity: CRITICAL/MINOR
            GAP_DET->>BYBIT: Fetch Missing Bars
            Note right of GAP_DET: ERROR 102 Fix: Sequential Startup
        end
    end
    GAP_DET->>CACHE: Check Feature Cache
    alt Cache Hit
        CACHE->>FEAT_ENG: Return Cached Features
    else Cache Miss
        CACHE->>FEAT_ENG: Compute Features
        FEAT_ENG->>FEAT_ENG: 56 Raw Indicators
        FEAT_ENG->>FEAT_ENG: Rank Normalization
        FEAT_ENG->>FEAT_ENG: Select Boruta 9-11
        FEAT_ENG->>CACHE: Store - TTL 1h
    end
    FEAT_ENG->>MODEL_LOAD: Request Model - BTC/ETH/SOL
    rect rgb(0, 50, 100)
        Note over MODEL_LOAD: 4-Tier Resilient Loading
        MODEL_LOAD->>MODEL_LOAD: Tier 1: MLflow Registry - Production Tag
        alt Tier 1 Fails
            MODEL_LOAD->>MODEL_LOAD: Tier 2: Run ID Fallback
        end
        alt Tier 2 Fails
            MODEL_LOAD->>MODEL_LOAD: Tier 3: Direct S3
        end
        alt Tier 3 Fails
            MODEL_LOAD->>MODEL_LOAD: Tier 4: Local Checkpoint
        end
    end
    MODEL_LOAD->>ML_INF: Model + locked_features.json
    rect rgb(0, 100, 50)
        Note over ML_INF: Sub-5ms Inference
        ML_INF->>ML_INF: Validate Feature Order - CRITICAL sklearn checks
        ML_INF->>ML_INF: Model.predict - Regression Output
        ML_INF->>ML_INF: Generate Signal + Confidence
    end
    ML_INF->>IC_VAL: Signal + Confidence
    IC_VAL->>IC_VAL: Calculate Rolling IC - 20-bar window
    alt IC at least 0.03
        IC_VAL->>RL_AGENT: Valid Signal - High Quality
    else IC below 0.03
        IC_VAL->>IC_VAL: Degrade to Kelly Baseline
        IC_VAL->>RL_AGENT: Degraded Signal - Use TIER 3 Fallback
    end
    Note over BYBIT,RL_AGENT: Total Latency under 5ms Cache Hit, under 15ms Cache Miss
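The IC Validator step computes a rolling 20-bar Spearman rank correlation between predicted and realized returns. A minimal pure-Python sketch is below; the ranking here ignores ties (no averaged ranks), which is a simplification of the usual Spearman definition, and the function names are illustrative rather than production identifiers.

```python
def spearman_ic(predicted, realized):
    """Spearman rank correlation between predictions and realized returns.

    Simplified: assigns ordinal ranks without averaging ties, which is
    adequate for continuous return series.
    """
    def ranks(xs):
        order = sorted(range(len(xs)), key=lambda i: xs[i])
        r = [0.0] * len(xs)
        for rank, i in enumerate(order):
            r[i] = float(rank)
        return r

    rp, rr = ranks(predicted), ranks(realized)
    n = len(rp)
    mp, mr = sum(rp) / n, sum(rr) / n
    cov = sum((a - mp) * (b - mr) for a, b in zip(rp, rr))
    var_p = sum((a - mp) ** 2 for a in rp)
    var_r = sum((b - mr) ** 2 for b in rr)
    return cov / (var_p * var_r) ** 0.5

def rolling_ic(predicted, realized, window=20):
    """IC over the most recent `window` bars, compared against the 0.03 gate."""
    return spearman_ic(predicted[-window:], realized[-window:])
```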

Feature Order Validation: Why It's Critical

Production Issue Discovered: In November 2025, we discovered that sklearn predicts on positional columns: when features are supplied in a different order than at training time (e.g. as a plain array, where feature names cannot be checked), predictions are silently wrong — no exception is raised. This behavior can produce arbitrarily bad predictions without any error message.

Our Solution: locked_features.json Artifact

Every model stores its exact feature order as an MLflow artifact:

{
  "model_id": "btcusdt_tl_week51",
  "training_date": "2025-12-22",
  "features": [
    "rsi_14_rank",
    "macd_signal_rank",
    "bb_width_rank",
    "atr_14_rank",
    "volume_ratio_rank",
    "momentum_20_rank",
    "obv_delta_rank",
    "dvol_btc_rank",
    "correlation_eth_rank"
  ],
  "feature_count": 9,
  "checksum": "sha256:a3f2..."
}
                                

Validation at Inference Time

  1. Download locked_features.json from MLflow artifact store
  2. Reorder computed features to match exact training order
  3. Checksum validation ensures no corruption
  4. Fail fast if feature mismatch detected (no silent errors)
Result: Zero feature order incidents since implementation (November 2025). The locked_features.json artifact with checksum validation ensures feature order consistency across all deployments.
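The four validation steps above can be sketched as follows. This is a stand-alone illustration: the function names and the exact checksum recipe (SHA-256 over the comma-joined feature list) are assumptions, not the production implementation.

```python
import hashlib

def feature_checksum(features):
    """Checksum over the ordered feature list (illustrative recipe)."""
    return "sha256:" + hashlib.sha256(",".join(features).encode()).hexdigest()

def reorder_features(computed: dict, locked: list):
    """Return feature values in the exact training order; fail fast on mismatch.

    Raising here is the point: a silent reorder is exactly the failure mode
    the locked_features.json artifact exists to prevent.
    """
    missing = [name for name in locked if name not in computed]
    extra = [name for name in computed if name not in locked]
    if missing or extra:
        raise ValueError(f"Feature mismatch: missing={missing}, extra={extra}")
    return [computed[name] for name in locked]

locked = ["rsi_14_rank", "macd_signal_rank", "bb_width_rank"]
computed = {"bb_width_rank": 0.4, "rsi_14_rank": 0.8, "macd_signal_rank": 0.1}
row = reorder_features(computed, locked)  # [0.8, 0.1, 0.4] — training order restored
```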

Live Trading Execution Pipeline (E2E <50ms)

Design Principle: The 4-tier fallback system (FULL_RL → BLENDED → PURE_KELLY → EMERGENCY_FLAT) ensures graceful degradation of position sizing. If ML signals degrade or RL agents fail, the system falls back to proven Kelly criterion sizing rather than halting entirely.
graph TB
    subgraph "Signal Input"
        ML_SIG["ML Signal<br/>Predicted Return + Confidence"]
        IC_VAL["IC Validation<br/>0.05 Threshold"]
    end
    subgraph "4-Tier RL Fallback System"
        TIER1["TIER 1: FULL_RL<br/>Confidence >= 0.50, IC >= 0.05<br/>100% RL Policy"]
        TIER2["TIER 2: BLENDED<br/>Medium Confidence<br/>50% RL + 50% Kelly"]
        TIER3["TIER 3: PURE_KELLY<br/>Low Confidence or IC < 0.03<br/>100% Kelly Baseline"]
        TIER4["TIER 4: EMERGENCY<br/>Circuit Breaker OPEN<br/>Minimum Position Only"]
    end
    subgraph "Risk Management"
        HRAA["HRAA v2<br/>Position Size Capping"]
        CB["Circuit Breaker<br/>Drawdown > 5%"]
    end
    subgraph "Order Execution"
        ORDER["Order Generation<br/>Market/Limit"]
        BROKER["Bybit API<br/>< 50ms E2E"]
    end
    ML_SIG --> IC_VAL
    IC_VAL -->|Pass| TIER1
    IC_VAL -->|Fail| TIER3
    TIER1 --> HRAA
    TIER2 --> HRAA
    TIER3 --> HRAA
    TIER4 --> HRAA
    HRAA --> CB
    CB -->|OK| ORDER
    CB -->|TRIP| TIER4
    ORDER --> BROKER
    style TIER1 fill:#00d4ff,stroke:#000,stroke-width:2px,color:#000
    style TIER4 fill:#ff6b6b,stroke:#000,stroke-width:2px,color:#000
    style CB fill:#ffd93d,stroke:#000,stroke-width:2px,color:#000
| Tier | Conditions | Position Sizing | Risk Profile | Target Profile |
|---|---|---|---|---|
| TIER 1: FULL_RL | Confidence ≥ 0.50 AND IC ≥ 0.05 | 100% RL Policy | Highest return potential | Aggressive |
| TIER 2: BLENDED | Medium confidence OR IC ≥ 0.03 | 50% RL + 50% Kelly | Balanced risk-reward | Balanced |
| TIER 3: PURE_KELLY | Low confidence OR IC < 0.03 | 100% Kelly Baseline | Conservative, proven strategy | Conservative |
| TIER 4: EMERGENCY | Circuit Breaker OPEN (drawdown > 5%) | 0% position size | Capital preservation mode | Capital preservation |
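The tier-selection logic can be sketched directly from the thresholds above. This is a minimal illustration using the documented gates (confidence 0.50, IC 0.05/0.03, 5% drawdown); the function and tier names are illustrative, not production identifiers.

```python
def select_tier(confidence, ic, circuit_breaker_open, drawdown):
    """Map signal quality and risk state to one of the four sizing tiers."""
    if circuit_breaker_open or drawdown > 0.05:
        return "TIER4_EMERGENCY"    # capital preservation overrides everything
    if confidence >= 0.50 and ic >= 0.05:
        return "TIER1_FULL_RL"      # 100% RL policy
    if ic >= 0.03:
        return "TIER2_BLENDED"      # 50% RL + 50% Kelly
    return "TIER3_PURE_KELLY"       # 100% Kelly baseline

def position_size(tier, rl_size, kelly_size):
    """Blend RL and Kelly position sizes according to the selected tier."""
    return {
        "TIER1_FULL_RL": rl_size,
        "TIER2_BLENDED": 0.5 * rl_size + 0.5 * kelly_size,
        "TIER3_PURE_KELLY": kelly_size,
        "TIER4_EMERGENCY": 0.0,
    }[tier]
```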

Regime-Adaptive Kelly Fractions

| Market Regime | Kelly Fraction | Risk Multiplier (γ) | Typical Conditions |
|---|---|---|---|
| Bull | 67% | 1.5 | Strong upward trends, low volatility |
| Neutral | 50% | 2.0 | Range-bound markets, moderate volatility |
| Bear | 25% | 4.0 | Downward trends, elevated volatility |
| Crisis | 17% | 6.0 | Extreme volatility, market dislocation |
Approach: The 4-tier fallback adapts position sizing based on signal confidence, market regime, and drawdown state. This is conceptually superior to fixed-fraction sizing but actual performance depends on model quality and market conditions. Backtested results showed improvement over fixed sizing in walk-forward validation.
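Note that the regime table's Kelly fractions equal 1/γ (67% ≈ 1/1.5, 50% = 1/2, 25% = 1/4, 17% ≈ 1/6), i.e. fractional Kelly scaled by the risk multiplier. A sketch using the standard two-outcome Kelly formula f* = (p(b+1) − 1) / b follows; the win probability and payoff ratio inputs are placeholders, since the production estimator is not described here.

```python
# Illustrative regime → risk-multiplier map, matching the table above.
REGIME_GAMMA = {"bull": 1.5, "neutral": 2.0, "bear": 4.0, "crisis": 6.0}

def kelly_fraction(p_win, payoff_ratio, regime):
    """Fractional Kelly: full Kelly f* = (p*(b+1) - 1) / b, scaled by 1/gamma.

    Negative-edge signals are floored at zero (no position) rather than
    shorted, which is one common convention.
    """
    full_kelly = (p_win * (payoff_ratio + 1) - 1) / payoff_ratio
    return max(full_kelly, 0.0) / REGIME_GAMMA[regime]

# A 55% hit rate with 1:1 payoff gives full Kelly 0.10;
# in a neutral regime (gamma=2) the deployed fraction is 0.05.
f = kelly_fraction(0.55, 1.0, "neutral")
```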

Fully Automated Weekly Pipeline (73 Minutes)

Operational Excellence: The weekly pipeline automates all 8 steps from data fetch to production deployment in ~73 minutes with zero human intervention. Validation gates (IC ≥ 0.03, Sharpe > 0.5, p < 0.15) ensure only quality models reach production.
1
Data Fetch
~3 minutes

Fetch 1 week of new OHLCV bars (42 bars: 7 days × 6 bars/day) from Bybit for BTC, ETH, SOL. Includes DVOL volatility data from Deribit. Validates timestamp continuity (ERROR #103 fix).
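The timestamp-continuity check can be sketched as a scan over the 4H bar grid. This is an illustrative stand-alone version: the severity threshold (more than 2 missing bars → CRITICAL) is an assumption, not the documented production rule.

```python
FOUR_HOURS = 4 * 60 * 60  # 4H bar spacing in epoch seconds

def find_gaps(bar_timestamps):
    """Return the expected timestamp of every missing 4H bar.

    Timestamps are UTC epoch seconds aligned to the 00:00 UTC 4H grid;
    any jump larger than one bar spacing yields one entry per missing bar.
    """
    missing = []
    for prev, cur in zip(bar_timestamps, bar_timestamps[1:]):
        expected = prev + FOUR_HOURS
        while expected < cur:
            missing.append(expected)
            expected += FOUR_HOURS
    return missing

def severity(missing):
    """Classify a gap scan result (threshold is an illustrative assumption)."""
    if not missing:
        return "OK"
    return "CRITICAL" if len(missing) > 2 else "MINOR"
```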

2
Feature Engineering
~5 minutes

Compute 112 total features (56 raw + 56 rank), apply rank normalization, select 9-11 Boruta features per instrument. Lock feature order in JSON artifact for production consistency.
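The rank-normalization step (the "Quintile Transform" in the training diagram) can be sketched as a percentile rank over a trailing history. The lookback window and bucketing convention here are illustrative assumptions; only the general shape (raw value → [0,1] rank → quintile) comes from the document.

```python
def percentile_rank(history, value):
    """Rank of `value` within its trailing history, in [0, 1].

    Rank features are robust to outliers and scale drift, which is why
    56 of the 112 candidates are rank-transformed copies of raw features.
    """
    below = sum(1 for h in history if h < value)
    return below / max(len(history), 1)

def quintile(rank):
    """Bucket a [0, 1] rank into quintiles 0..4."""
    return min(int(rank * 5), 4)

# Example: an RSI reading near the top of its trailing distribution.
r = percentile_rank([1, 2, 3, 4], 3.5)  # 0.75: three of four values are below
q = quintile(r)                          # bucket 3 of 0..4
```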

3
Transfer Learning Training (3 instruments)
~30 minutes (10min each)

Train TL models for BTC, ETH, SOL (~10 min each). Freeze the OLD model's trees (dynamic count), grid-search 50-250 NEW trees, and apply exponential recency weighting (decay_lambda=0.005) so recent bars dominate. Walk-Forward Validation across 40 weekly windows with a 200-bar purge gap.

4
Precalc Signal Generation
~5 minutes

Generate signals for last 200 bars using new models. Used for IC calculation and sanity checks. Validates model behavior on recent data.

5
RL Agent Training (3 policies)
~15 minutes (5min each with curriculum)

Train RL position-sizing agents with Proximal Policy Optimization (PPO), including a transaction cost model and slippage simulation. A curriculum of 3 progressive difficulty stages improves convergence speed and final policy quality.
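A 3-stage curriculum like the one described could be expressed as a stage schedule with a promotion rule. All stage parameters below (timesteps, fees, slippage, leverage caps, promotion threshold) are illustrative placeholders, not production values; only the 3-stage progressive-difficulty structure comes from the document.

```python
# Hypothetical curriculum: each stage adds friction (fees, slippage) and
# widens the action space, so the agent first learns direction sizing in a
# frictionless world, then adapts to realistic execution costs.
CURRICULUM = [
    {"stage": 1, "timesteps": 50_000,  "fee_bps": 0.0, "slippage_bps": 0.0, "max_leverage": 1.0},
    {"stage": 2, "timesteps": 100_000, "fee_bps": 5.5, "slippage_bps": 2.0, "max_leverage": 2.0},
    {"stage": 3, "timesteps": 150_000, "fee_bps": 5.5, "slippage_bps": 5.0, "max_leverage": 3.0},
]

def next_stage(current_stage, mean_episode_reward, promotion_threshold=0.0):
    """Promote to the next stage once the agent clears the reward threshold."""
    if mean_episode_reward >= promotion_threshold and current_stage < len(CURRICULUM):
        return current_stage + 1
    return current_stage
```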

6
Backtesting & Validation
~5 minutes

Run fast backtest mode (60x speedup via caching) on last 6 months. Calculate Sharpe ratio, hit rate, IC, maximum drawdown. Compare to previous model performance.

7
Validation Gates
~2 minutes

Deploy if ALL pass:
• IC ≥ 0.03 (information coefficient)
• Hit Rate ≥ 52% (directional accuracy)
• Sharpe > 0.5 (risk-adjusted return)
• p-value < 0.15 (statistical significance)
Rollback if ANY fail (keeps previous week's models in production)
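The deploy-or-rollback decision above reduces to an all-gates-pass check. A minimal sketch using the documented thresholds (the function name and metrics-dict shape are illustrative):

```python
def validation_gates(metrics):
    """Deploy only if ALL gates pass; otherwise keep last week's models.

    Returns the decision plus per-gate results so a failure can be
    attributed to a specific metric in alerting.
    """
    gates = {
        "ic": metrics["ic"] >= 0.03,            # information coefficient
        "hit_rate": metrics["hit_rate"] >= 0.52, # directional accuracy
        "sharpe": metrics["sharpe"] > 0.5,       # risk-adjusted return
        "p_value": metrics["p_value"] < 0.15,    # statistical significance
    }
    return ("DEPLOY" if all(gates.values()) else "ROLLBACK", gates)

decision, detail = validation_gates(
    {"ic": 0.041, "hit_rate": 0.545, "sharpe": 0.82, "p_value": 0.09}
)
# decision == "DEPLOY": every gate passed
```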

8
Model Export & Deployment
~8 minutes

Export MLflow artifacts (models + metadata), build Docker container (319MB), push to GHCR, trigger K3S rolling update. Zero-downtime deployment with health checks. Total: 73 minutes from data fetch to production.

Business Continuity: If weekly pipeline fails (GitHub Actions outage, data provider issue), previous week's models remain in production. No manual intervention required. System automatically alerts via Prometheus → Grafana → Slack. Mean Time To Recovery (MTTR): <10 minutes for known issues.