🧠 ML/RL Intelligence Pipeline

From Data to Alpha: Trade-Matrix Learning Architecture
Research-Grade Methodology | Automated Weekly Updates | Production-Deployed
Weekly
Automated Updates
Transfer Learning preserves old model knowledge while adapting to new data. Automated pipeline runs every Sunday with validation gates (IC ≥ 0.03, Sharpe > 0.5).
9-11
Boruta-Selected Features
Automated wrapper feature selection identifies statistically significant features from 112 candidates (56 raw + 56 rank-normalized). Locked feature order prevents silent sklearn prediction errors.
45min
Curriculum RL Training
Progressive difficulty curriculum (3 stages) reduces RL agent training time vs. standard training. PPO with transaction cost modeling and slippage simulation.

🔄 Three Critical Data Pipelines

Business Value: Trade-Matrix separates training, inference, and execution into distinct pipelines. This separation prevents training-inference coupling failures and allows independent optimization of latency (inference) vs. accuracy (training) requirements.
📚
Training Pipeline
Frequency Weekly (Sunday)
Duration 73 minutes total
Data Volume 3+ years (6,977 bars)
Validation 40 WFV windows
Output 3 TL models + 3 RL policies
⚡
Inference Pipeline
Frequency Every 4H bar close
Latency <5ms
Features 9-11 per instrument (Boruta-selected)
Model Loading 4-tier resilient
Output Signal + Confidence + IC
🎯
Execution Pipeline
Frequency Real-time (on signal)
E2E Latency <50ms
RL Position Sizing 4-tier fallback
Risk Checks HRAA v2 + Circuit Breaker
Output Market/Limit orders

📊 Pipeline Deep Dive

Transfer Learning Training Pipeline (Every Sunday)

Methodology: Walk-Forward Validation with 200-bar purge gap prevents data leakage. This exceeds López de Prado's recommendation of h ≈ 0.01T (≈70 bars for our dataset). The purge gap is a well-established technique in quantitative finance research for preventing look-ahead bias.
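The purged walk-forward scheme described above can be sketched as a split generator. This is a minimal pure-Python illustration, not the production code; the window stepping is an assumption consistent with the figures quoted here (40 weekly windows of 42 four-hour bars, 200-bar purge gap over 6,977 bars).

```python
def purged_walk_forward_splits(n_bars, n_windows=40, val_size=42, purge_gap=200):
    """Yield (train_indices, val_indices) pairs for purged walk-forward validation.

    The purge_gap bars between the training and validation windows are skipped
    entirely, so labels computed from training-window bars cannot overlap the
    validation window (look-ahead prevention).
    """
    for w in range(n_windows):
        # Each window's validation block steps forward by val_size bars,
        # with the last window ending at the final bar.
        val_end = n_bars - (n_windows - 1 - w) * val_size
        val_start = val_end - val_size
        train_end = val_start - purge_gap  # purge gap excluded from training
        if train_end <= 0:
            continue
        yield list(range(0, train_end)), list(range(val_start, val_end))

splits = list(purged_walk_forward_splits(6977))
train_idx, val_idx = splits[-1]
# Exactly 200 bars separate the last training bar from the first validation bar.
```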
graph TB
    subgraph "Data Sources"
        BYBIT_HIST["Bybit Historical<br/>4H OHLCV 2022-2025<br/>6,977+ bars"]
        DERIBIT_DVOL["Deribit DVOL<br/>Volatility Index<br/>Real-time + Historical"]
    end
    subgraph "Feature Engineering"
        RAW_FEATURES["Raw Features<br/>112 Total - 56 Raw + 56 Rank"]
        RANK_NORM["Rank Normalization<br/>Quintile Transform"]
        BORUTA["Boruta Selection<br/>9-11 Features/Instrument"]
        LOCKED_FEATURES["Locked Feature Order<br/>Production Consistency"]
    end
    subgraph "Walk-Forward Validation"
        WFV["40 Weekly Windows<br/>200-Bar Purge Gap"]
        TRAIN_WINDOW["Training Window<br/>In-Sample Data"]
        VAL_WINDOW["Validation Window<br/>Out-of-Sample Data"]
        PURGE_GAP["Purge Gap<br/>Prevent Lookahead"]
    end
    subgraph "Transfer Learning - Per Instrument"
        OLD_MODEL["OLD Model<br/>Dynamic Tree Count - Frozen"]
        NEW_TREES["NEW Trees<br/>50-250 - Grid Search"]
        SAMPLE_WEIGHT["Exponential Recency Weighting<br/>decay_lambda=0.005"]
        TL_MODEL["Final TL Model<br/>BTC/ETH/SOL"]
    end
    subgraph "Validation Gates"
        IC_CHECK["IC at least 0.05<br/>TL Training Gate"]
        HITRATE_CHECK["Hit Rate at least 52 pct<br/>Directional Accuracy"]
        SHARPE_CHECK["Sharpe at least 0.5<br/>Risk-Adjusted Return"]
        DEPLOY_DECISION["Deploy or Rollback"]
    end
    subgraph "Model Registry"
        MLFLOW["MLflow Registry<br/>Experiment Tracking"]
        MINIO["MinIO Storage<br/>Model Artifacts 319MB"]
        PROD_TAG["Production Tag<br/>Auto-Promotion"]
    end
    BYBIT_HIST --> RAW_FEATURES
    DERIBIT_DVOL --> RAW_FEATURES
    RAW_FEATURES --> RANK_NORM
    RANK_NORM --> BORUTA
    BORUTA --> LOCKED_FEATURES
    LOCKED_FEATURES --> WFV
    WFV --> TRAIN_WINDOW
    WFV --> VAL_WINDOW
    WFV --> PURGE_GAP
    TRAIN_WINDOW --> OLD_MODEL
    OLD_MODEL --> NEW_TREES
    NEW_TREES --> SAMPLE_WEIGHT
    SAMPLE_WEIGHT --> TL_MODEL
    TL_MODEL --> IC_CHECK
    IC_CHECK --> HITRATE_CHECK
    HITRATE_CHECK --> SHARPE_CHECK
    SHARPE_CHECK --> DEPLOY_DECISION
    DEPLOY_DECISION -->|Pass| MLFLOW
    DEPLOY_DECISION -->|Fail| OLD_MODEL
    MLFLOW --> MINIO
    MINIO --> PROD_TAG
    style TL_MODEL fill:#00d4ff,stroke:#000,stroke-width:2px,color:#000
    style BORUTA fill:#00ff88,stroke:#000,stroke-width:2px,color:#000
    style DEPLOY_DECISION fill:#ffd93d,stroke:#000,stroke-width:2px,color:#000

Why Transfer Learning Outperforms Traditional Retraining

| Aspect | Transfer Learning (Trade-Matrix) | Full Retraining (Alternative Approach) | Advantage |
|---|---|---|---|
| Knowledge Retention | Dynamic tree count frozen from OLD model | Starts from scratch every week | ✓ Preserves patterns from 3+ years of data |
| Adaptation Speed | 50-250 new trees (grid search) + exponential recency weighting | Slow convergence on new regimes | ✓ Faster adaptation to new data |
| Training Stability | Warm-started from previous model | Random initialization each time | ✓ Consistent performance week-over-week |
| Catastrophic Forgetting | Prevented by frozen trees | Risk of losing historical patterns | ✓ Robust to short-term market noise |
| Computational Efficiency | Only trains 50-250 new trees (grid-searched) | Trains 150+ trees from scratch | ✓ ~73 min total pipeline vs. ~180 min estimated for full retraining |
Design Rationale: Transfer Learning enables faster adaptation to regime shifts by preserving historical patterns in frozen trees while training new trees on recent data. This is particularly valuable in volatile crypto markets where market regimes can shift rapidly.
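The exponential recency weighting shown in the diagram (decay_lambda=0.005) can be sketched as follows. This is an illustrative stand-alone function; in practice such weights would be passed as `sample_weight` when fitting the NEW trees, and the function name is not from the production codebase.

```python
import math

def recency_weights(n_bars, decay_lambda=0.005):
    """Exponential recency weights for training samples.

    The newest bar gets weight 1.0; each older bar is down-weighted by
    exp(-decay_lambda * age_in_bars), so recent regimes dominate the
    gradient of the newly added trees.
    """
    return [math.exp(-decay_lambda * (n_bars - 1 - t)) for t in range(n_bars)]

w = recency_weights(6977)
# With decay_lambda=0.005 on 4H bars, a bar roughly one year old
# (~2,190 bars) carries a weight near exp(-10.95), i.e. effectively zero.
```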

Real-Time ML Inference Pipeline (<5ms Latency)

Critical Production Issue Fixed: ERROR #102 and #103 (bar continuity failures) were root-caused and fixed in December 2025. Gap detection now prevents data holes that could cause stale feature computation and incorrect signals.
sequenceDiagram
    participant BYBIT as Bybit Exchange
    participant GAP_DET as Gap Detection
    participant CACHE as Feature Cache
    participant FEAT_ENG as Feature Engineering
    participant MODEL_LOAD as Model Loader
    participant ML_INF as ML Inference
    participant IC_VAL as IC Validator
    participant RL_AGENT as RL Position Sizer

    Note over BYBIT,RL_AGENT: Real-Time Inference - Every 4H Bar Close
    BYBIT->>GAP_DET: New 4H Bar - 2025-01-05 00:00
    rect rgb(100, 50, 0)
        Note over GAP_DET: Gate 1 PRE-BOOTSTRAP: Check Last 200 Bars
        GAP_DET->>GAP_DET: Detect Missing Bars - 00:00 UTC convention
        alt Gap Found
            GAP_DET->>GAP_DET: Severity: CRITICAL/MINOR
            GAP_DET->>BYBIT: Fetch Missing Bars
            Note right of GAP_DET: ERROR 102 Fix: Sequential Startup
        end
    end
    GAP_DET->>CACHE: Check Feature Cache
    alt Cache Hit
        CACHE->>FEAT_ENG: Return Cached Features
    else Cache Miss
        CACHE->>FEAT_ENG: Compute Features
        FEAT_ENG->>FEAT_ENG: 56 Raw Indicators
        FEAT_ENG->>FEAT_ENG: Rank Normalization
        FEAT_ENG->>FEAT_ENG: Select Boruta 9-11
        FEAT_ENG->>CACHE: Store - TTL 1h
    end
    FEAT_ENG->>MODEL_LOAD: Request Model - BTC/ETH/SOL
    rect rgb(0, 50, 100)
        Note over MODEL_LOAD: 4-Tier Resilient Loading
        MODEL_LOAD->>MODEL_LOAD: Tier 1: MLflow Registry - Production Tag
        alt Tier 1 Fails
            MODEL_LOAD->>MODEL_LOAD: Tier 2: Run ID Fallback
        end
        alt Tier 2 Fails
            MODEL_LOAD->>MODEL_LOAD: Tier 3: Direct S3
        end
        alt Tier 3 Fails
            MODEL_LOAD->>MODEL_LOAD: Tier 4: Local Checkpoint
        end
    end
    MODEL_LOAD->>ML_INF: Model + locked_features.json
    rect rgb(0, 100, 50)
        Note over ML_INF: Sub-5ms Inference
        ML_INF->>ML_INF: Validate Feature Order - CRITICAL sklearn checks
        ML_INF->>ML_INF: Model.predict - Regression Output
        ML_INF->>ML_INF: Generate Signal + Confidence
    end
    ML_INF->>IC_VAL: Signal + Confidence
    IC_VAL->>IC_VAL: Calculate Rolling IC - 20-bar window
    alt IC at least 0.03
        IC_VAL->>RL_AGENT: Valid Signal - High Quality
    else IC below 0.03
        IC_VAL->>IC_VAL: Degrade to Kelly Baseline
        IC_VAL->>RL_AGENT: Degraded Signal - Use TIER 3 Fallback
    end
    Note over BYBIT,RL_AGENT: Total Latency under 5ms Cache Hit, under 15ms Cache Miss
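The IC Validator step computes a rolling 20-bar Spearman rank correlation between predicted and realized returns. A minimal pure-Python sketch is below; the ranking here ignores ties (no averaged ranks), which is a simplification of the usual Spearman definition, and the function names are illustrative rather than production identifiers.

```python
def spearman_ic(predicted, realized):
    """Spearman rank correlation between predictions and realized returns.

    Simplified: assigns ordinal ranks without averaging ties, which is
    adequate for continuous return series.
    """
    def ranks(xs):
        order = sorted(range(len(xs)), key=lambda i: xs[i])
        r = [0.0] * len(xs)
        for rank, i in enumerate(order):
            r[i] = float(rank)
        return r

    rp, rr = ranks(predicted), ranks(realized)
    n = len(rp)
    mp, mr = sum(rp) / n, sum(rr) / n
    cov = sum((a - mp) * (b - mr) for a, b in zip(rp, rr))
    var_p = sum((a - mp) ** 2 for a in rp)
    var_r = sum((b - mr) ** 2 for b in rr)
    return cov / (var_p * var_r) ** 0.5

def rolling_ic(predicted, realized, window=20):
    """IC over the most recent `window` bars, compared against the 0.03 gate."""
    return spearman_ic(predicted[-window:], realized[-window:])
```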

Feature Order Validation: Why It's Critical

Production Issue Discovered: In November 2025, we discovered that sklearn predicts on positional columns: when features are supplied in a different order than at training time (e.g. as a plain array, where feature names cannot be checked), predictions are silently wrong — no exception is raised. This behavior can produce arbitrarily bad predictions without any error message.

Our Solution: locked_features.json Artifact

Every model stores its exact feature order as an MLflow artifact:

{
  "model_id": "btcusdt_tl_week51",
  "training_date": "2025-12-22",
  "features": [
    "rsi_14_rank",
    "macd_signal_rank",
    "bb_width_rank",
    "atr_14_rank",
    "volume_ratio_rank",
    "momentum_20_rank",
    "obv_delta_rank",
    "dvol_btc_rank",
    "correlation_eth_rank"
  ],
  "feature_count": 9,
  "checksum": "sha256:a3f2..."
}
                                

Validation at Inference Time

  1. Download locked_features.json from MLflow artifact store
  2. Reorder computed features to match exact training order
  3. Checksum validation ensures no corruption
  4. Fail fast if feature mismatch detected (no silent errors)
Result: Zero feature order incidents since implementation (November 2025). The locked_features.json artifact with checksum validation ensures feature order consistency across all deployments.
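The four validation steps above can be sketched as follows. This is a stand-alone illustration: the function names and the exact checksum recipe (SHA-256 over the comma-joined feature list) are assumptions, not the production implementation.

```python
import hashlib

def feature_checksum(features):
    """Checksum over the ordered feature list (illustrative recipe)."""
    return "sha256:" + hashlib.sha256(",".join(features).encode()).hexdigest()

def reorder_features(computed: dict, locked: list):
    """Return feature values in the exact training order; fail fast on mismatch.

    Raising here is the point: a silent reorder is exactly the failure mode
    the locked_features.json artifact exists to prevent.
    """
    missing = [name for name in locked if name not in computed]
    extra = [name for name in computed if name not in locked]
    if missing or extra:
        raise ValueError(f"Feature mismatch: missing={missing}, extra={extra}")
    return [computed[name] for name in locked]

locked = ["rsi_14_rank", "macd_signal_rank", "bb_width_rank"]
computed = {"bb_width_rank": 0.4, "rsi_14_rank": 0.8, "macd_signal_rank": 0.1}
row = reorder_features(computed, locked)  # [0.8, 0.1, 0.4] — training order restored
```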

Live Trading Execution Pipeline (E2E <50ms)

Design Principle: The 4-tier fallback system (FULL_RL → BLENDED → PURE_KELLY → EMERGENCY_FLAT) ensures graceful degradation of position sizing. If ML signals degrade or RL agents fail, the system falls back to proven Kelly criterion sizing rather than halting entirely.
graph TB
    subgraph "Signal Input"
        ML_SIG["ML Signal<br/>Predicted Return + Confidence"]
        IC_VAL["IC Validation<br/>0.05 Threshold"]
    end
    subgraph "4-Tier RL Fallback System"
        TIER1["TIER 1: FULL_RL<br/>Confidence >= 0.50, IC >= 0.05<br/>100% RL Policy"]
        TIER2["TIER 2: BLENDED<br/>Medium Confidence<br/>50% RL + 50% Kelly"]
        TIER3["TIER 3: PURE_KELLY<br/>Low Confidence or IC < 0.03<br/>100% Kelly Baseline"]
        TIER4["TIER 4: EMERGENCY<br/>Circuit Breaker OPEN<br/>Minimum Position Only"]
    end
    subgraph "Risk Management"
        HRAA["HRAA v2<br/>Position Size Capping"]
        CB["Circuit Breaker<br/>Drawdown > 5%"]
    end
    subgraph "Order Execution"
        ORDER["Order Generation<br/>Market/Limit"]
        BROKER["Bybit API<br/>< 50ms E2E"]
    end
    ML_SIG --> IC_VAL
    IC_VAL -->|Pass| TIER1
    IC_VAL -->|Fail| TIER3
    TIER1 --> HRAA
    TIER2 --> HRAA
    TIER3 --> HRAA
    TIER4 --> HRAA
    HRAA --> CB
    CB -->|OK| ORDER
    CB -->|TRIP| TIER4
    ORDER --> BROKER
    style TIER1 fill:#00d4ff,stroke:#000,stroke-width:2px,color:#000
    style TIER4 fill:#ff6b6b,stroke:#000,stroke-width:2px,color:#000
    style CB fill:#ffd93d,stroke:#000,stroke-width:2px,color:#000
| Tier | Conditions | Position Sizing | Risk Profile | Target Profile |
|---|---|---|---|---|
| TIER 1: FULL_RL | Confidence ≥ 0.50 AND IC ≥ 0.05 | 100% RL Policy | Highest return potential | Aggressive |
| TIER 2: BLENDED | Medium confidence OR IC ≥ 0.03 | 50% RL + 50% Kelly | Balanced risk-reward | Balanced |
| TIER 3: PURE_KELLY | Low confidence OR IC < 0.03 | 100% Kelly Baseline | Conservative, proven strategy | Conservative |
| TIER 4: EMERGENCY | Circuit Breaker OPEN (drawdown > 5%) | 0% position size | Capital preservation mode | Capital preservation |
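The tier-selection logic can be sketched directly from the thresholds above. This is a minimal illustration using the documented gates (confidence 0.50, IC 0.05/0.03, 5% drawdown); the function and tier names are illustrative, not production identifiers.

```python
def select_tier(confidence, ic, circuit_breaker_open, drawdown):
    """Map signal quality and risk state to one of the four sizing tiers."""
    if circuit_breaker_open or drawdown > 0.05:
        return "TIER4_EMERGENCY"    # capital preservation overrides everything
    if confidence >= 0.50 and ic >= 0.05:
        return "TIER1_FULL_RL"      # 100% RL policy
    if ic >= 0.03:
        return "TIER2_BLENDED"      # 50% RL + 50% Kelly
    return "TIER3_PURE_KELLY"       # 100% Kelly baseline

def position_size(tier, rl_size, kelly_size):
    """Blend RL and Kelly position sizes according to the selected tier."""
    return {
        "TIER1_FULL_RL": rl_size,
        "TIER2_BLENDED": 0.5 * rl_size + 0.5 * kelly_size,
        "TIER3_PURE_KELLY": kelly_size,
        "TIER4_EMERGENCY": 0.0,
    }[tier]
```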

Regime-Adaptive Kelly Fractions

| Market Regime | Kelly Fraction | Risk Multiplier (γ) | Typical Conditions |
|---|---|---|---|
| Bull | 67% | 1.5 | Strong upward trends, low volatility |
| Neutral | 50% | 2.0 | Range-bound markets, moderate volatility |
| Bear | 25% | 4.0 | Downward trends, elevated volatility |
| Crisis | 17% | 6.0 | Extreme volatility, market dislocation |
Approach: The 4-tier fallback adapts position sizing based on signal confidence, market regime, and drawdown state. This is conceptually superior to fixed-fraction sizing but actual performance depends on model quality and market conditions. Backtested results showed improvement over fixed sizing in walk-forward validation.
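Note that the regime table's Kelly fractions equal 1/γ (67% ≈ 1/1.5, 50% = 1/2, 25% = 1/4, 17% ≈ 1/6), i.e. fractional Kelly scaled by the risk multiplier. A sketch using the standard two-outcome Kelly formula f* = (p(b+1) − 1) / b follows; the win probability and payoff ratio inputs are placeholders, since the production estimator is not described here.

```python
# Illustrative regime → risk-multiplier map, matching the table above.
REGIME_GAMMA = {"bull": 1.5, "neutral": 2.0, "bear": 4.0, "crisis": 6.0}

def kelly_fraction(p_win, payoff_ratio, regime):
    """Fractional Kelly: full Kelly f* = (p*(b+1) - 1) / b, scaled by 1/gamma.

    Negative-edge signals are floored at zero (no position) rather than
    shorted, which is one common convention.
    """
    full_kelly = (p_win * (payoff_ratio + 1) - 1) / payoff_ratio
    return max(full_kelly, 0.0) / REGIME_GAMMA[regime]

# A 55% hit rate with 1:1 payoff gives full Kelly 0.10;
# in a neutral regime (gamma=2) the deployed fraction is 0.05.
f = kelly_fraction(0.55, 1.0, "neutral")
```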

Fully Automated Weekly Pipeline (73 Minutes)

Operational Excellence: The weekly pipeline automates all 8 steps from data fetch to production deployment in ~73 minutes with zero human intervention. Validation gates (IC ≥ 0.03, Sharpe > 0.5, p < 0.15) ensure only quality models reach production.
1
Data Fetch
~3 minutes

Fetch 1 week of new OHLCV bars (42 bars: 7 days × 6 bars/day) from Bybit for BTC, ETH, SOL. Includes DVOL volatility data from Deribit. Validates timestamp continuity (ERROR #103 fix).
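The timestamp-continuity check can be sketched as a scan over the 4H bar grid. This is an illustrative stand-alone version: the severity threshold (more than 2 missing bars → CRITICAL) is an assumption, not the documented production rule.

```python
FOUR_HOURS = 4 * 60 * 60  # 4H bar spacing in epoch seconds

def find_gaps(bar_timestamps):
    """Return the expected timestamp of every missing 4H bar.

    Timestamps are UTC epoch seconds aligned to the 00:00 UTC 4H grid;
    any jump larger than one bar spacing yields one entry per missing bar.
    """
    missing = []
    for prev, cur in zip(bar_timestamps, bar_timestamps[1:]):
        expected = prev + FOUR_HOURS
        while expected < cur:
            missing.append(expected)
            expected += FOUR_HOURS
    return missing

def severity(missing):
    """Classify a gap scan result (threshold is an illustrative assumption)."""
    if not missing:
        return "OK"
    return "CRITICAL" if len(missing) > 2 else "MINOR"
```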

2
Feature Engineering
~5 minutes

Compute 112 total features (56 raw + 56 rank), apply rank normalization, select 9-11 Boruta features per instrument. Lock feature order in JSON artifact for production consistency.
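The rank-normalization step (the "Quintile Transform" in the training diagram) can be sketched as a percentile rank over a trailing history. The lookback window and bucketing convention here are illustrative assumptions; only the general shape (raw value → [0,1] rank → quintile) comes from the document.

```python
def percentile_rank(history, value):
    """Rank of `value` within its trailing history, in [0, 1].

    Rank features are robust to outliers and scale drift, which is why
    56 of the 112 candidates are rank-transformed copies of raw features.
    """
    below = sum(1 for h in history if h < value)
    return below / max(len(history), 1)

def quintile(rank):
    """Bucket a [0, 1] rank into quintiles 0..4."""
    return min(int(rank * 5), 4)

# Example: an RSI reading near the top of its trailing distribution.
r = percentile_rank([1, 2, 3, 4], 3.5)  # 0.75: three of four values are below
q = quintile(r)                          # bucket 3 of 0..4
```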

3
Transfer Learning Training (3 instruments)
~30 minutes (10min each)

Train TL models for BTC, ETH, SOL (~10 min each). Freeze the OLD model's trees (dynamic count), grid-search 50-250 NEW trees, and apply exponential recency weighting (decay_lambda=0.005) so recent bars dominate. Walk-Forward Validation across 40 weekly windows with a 200-bar purge gap.

4
Precalc Signal Generation
~5 minutes

Generate signals for last 200 bars using new models. Used for IC calculation and sanity checks. Validates model behavior on recent data.

5
RL Agent Training (3 policies)
~15 minutes (5min each with curriculum)

Train RL position-sizing agents with Proximal Policy Optimization (PPO), including a transaction cost model and slippage simulation. A curriculum of 3 progressive difficulty stages improves convergence speed and final policy quality.
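A 3-stage curriculum like the one described could be expressed as a stage schedule with a promotion rule. All stage parameters below (timesteps, fees, slippage, leverage caps, promotion threshold) are illustrative placeholders, not production values; only the 3-stage progressive-difficulty structure comes from the document.

```python
# Hypothetical curriculum: each stage adds friction (fees, slippage) and
# widens the action space, so the agent first learns direction sizing in a
# frictionless world, then adapts to realistic execution costs.
CURRICULUM = [
    {"stage": 1, "timesteps": 50_000,  "fee_bps": 0.0, "slippage_bps": 0.0, "max_leverage": 1.0},
    {"stage": 2, "timesteps": 100_000, "fee_bps": 5.5, "slippage_bps": 2.0, "max_leverage": 2.0},
    {"stage": 3, "timesteps": 150_000, "fee_bps": 5.5, "slippage_bps": 5.0, "max_leverage": 3.0},
]

def next_stage(current_stage, mean_episode_reward, promotion_threshold=0.0):
    """Promote to the next stage once the agent clears the reward threshold."""
    if mean_episode_reward >= promotion_threshold and current_stage < len(CURRICULUM):
        return current_stage + 1
    return current_stage
```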

6
Backtesting & Validation
~5 minutes

Run fast backtest mode (60x speedup via caching) on last 6 months. Calculate Sharpe ratio, hit rate, IC, maximum drawdown. Compare to previous model performance.

7
Validation Gates
~2 minutes

Deploy if ALL pass:
• IC ≥ 0.03 (information coefficient)
• Hit Rate ≥ 52% (directional accuracy)
• Sharpe > 0.5 (risk-adjusted return)
• p-value < 0.15 (statistical significance)
Rollback if ANY fail (keeps previous week's models in production)
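The deploy-or-rollback decision above reduces to an all-gates-pass check. A minimal sketch using the documented thresholds (the function name and metrics-dict shape are illustrative):

```python
def validation_gates(metrics):
    """Deploy only if ALL gates pass; otherwise keep last week's models.

    Returns the decision plus per-gate results so a failure can be
    attributed to a specific metric in alerting.
    """
    gates = {
        "ic": metrics["ic"] >= 0.03,            # information coefficient
        "hit_rate": metrics["hit_rate"] >= 0.52, # directional accuracy
        "sharpe": metrics["sharpe"] > 0.5,       # risk-adjusted return
        "p_value": metrics["p_value"] < 0.15,    # statistical significance
    }
    return ("DEPLOY" if all(gates.values()) else "ROLLBACK", gates)

decision, detail = validation_gates(
    {"ic": 0.041, "hit_rate": 0.545, "sharpe": 0.82, "p_value": 0.09}
)
# decision == "DEPLOY": every gate passed
```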

8
Model Export & Deployment
~8 minutes

Export MLflow artifacts (models + metadata), build Docker container (319MB), push to GHCR, trigger K3S rolling update. Zero-downtime deployment with health checks. Total: 73 minutes from data fetch to production.

Business Continuity: If weekly pipeline fails (GitHub Actions outage, data provider issue), previous week's models remain in production. No manual intervention required. System automatically alerts via Prometheus → Grafana → Slack. Mean Time To Recovery (MTTR): <10 minutes for known issues.