Transfer Learning Training Pipeline (Every Sunday)
Methodology: Walk-Forward Validation with a 200-bar purge gap prevents data leakage. This exceeds López de Prado's recommendation of h ≈ 0.01T (≈70 bars for our dataset). The purge gap is a well-established technique in quantitative finance research for preventing look-ahead bias.
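As a sketch of how such purged splits can be constructed (the function name and signature are illustrative; the numbers mirror this document: 40 weekly windows, 42 validation bars per week, a 200-bar purge gap):

```python
import numpy as np

def purged_walk_forward_splits(n_bars: int, n_windows: int = 40,
                               val_size: int = 42, purge_gap: int = 200):
    """Build (train, validation) index pairs separated by a purge gap.

    Bars inside the gap are discarded so labels computed near the
    boundary cannot leak forward into the validation window.
    """
    splits = []
    for w in range(n_windows):
        val_end = n_bars - w * val_size
        val_start = val_end - val_size
        train_end = val_start - purge_gap  # everything in the gap is dropped
        if train_end <= 0:
            break
        splits.append((np.arange(train_end), np.arange(val_start, val_end)))
    return splits[::-1]  # oldest window first

splits = purged_walk_forward_splits(n_bars=6977)  # dataset size from the text
```

Each training window ends at least 200 bars before its validation window begins, which is the leakage guarantee the methodology note describes.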
```mermaid
graph TB
subgraph "Data Sources"
BYBIT_HIST["Bybit Historical
4H OHLCV 2022-2025
6,977+ bars"]
DERIBIT_DVOL["Deribit DVOL
Volatility Index
Real-time + Historical"]
end
subgraph "Feature Engineering"
RAW_FEATURES["Raw Features
112 Total - 56 Raw + 56 Rank"]
RANK_NORM["Rank Normalization
Quintile Transform"]
BORUTA["Boruta Selection
9-11 Features/Instrument"]
LOCKED_FEATURES["Locked Feature Order
Production Consistency"]
end
subgraph "Walk-Forward Validation"
WFV["40 Weekly Windows
200-Bar Purge Gap"]
TRAIN_WINDOW["Training Window
In-Sample Data"]
VAL_WINDOW["Validation Window
Out-of-Sample Data"]
PURGE_GAP["Purge Gap
Prevent Lookahead"]
end
subgraph "Transfer Learning - Per Instrument"
OLD_MODEL["OLD Model
Dynamic Tree Count - Frozen"]
NEW_TREES["NEW Trees
50-250 - Grid Search"]
SAMPLE_WEIGHT["Exponential Recency Weighting
decay_lambda=0.005"]
TL_MODEL["Final TL Model
BTC/ETH/SOL"]
end
subgraph "Validation Gates"
IC_CHECK["IC at least 0.05
TL Training Gate"]
HITRATE_CHECK["Hit Rate at least 52 pct
Directional Accuracy"]
SHARPE_CHECK["Sharpe at least 0.5
Risk-Adjusted Return"]
DEPLOY_DECISION["Deploy or Rollback"]
end
subgraph "Model Registry"
MLFLOW["MLflow Registry
Experiment Tracking"]
MINIO["MinIO Storage
Model Artifacts 319MB"]
PROD_TAG["Production Tag
Auto-Promotion"]
end
BYBIT_HIST --> RAW_FEATURES
DERIBIT_DVOL --> RAW_FEATURES
RAW_FEATURES --> RANK_NORM
RANK_NORM --> BORUTA
BORUTA --> LOCKED_FEATURES
LOCKED_FEATURES --> WFV
WFV --> TRAIN_WINDOW
WFV --> VAL_WINDOW
WFV --> PURGE_GAP
TRAIN_WINDOW --> OLD_MODEL
OLD_MODEL --> NEW_TREES
NEW_TREES --> SAMPLE_WEIGHT
SAMPLE_WEIGHT --> TL_MODEL
TL_MODEL --> IC_CHECK
IC_CHECK --> HITRATE_CHECK
HITRATE_CHECK --> SHARPE_CHECK
SHARPE_CHECK --> DEPLOY_DECISION
DEPLOY_DECISION -->|Pass| MLFLOW
DEPLOY_DECISION -->|Fail| OLD_MODEL
MLFLOW --> MINIO
MINIO --> PROD_TAG
style TL_MODEL fill:#00d4ff,stroke:#000,stroke-width:2px,color:#000
style BORUTA fill:#00ff88,stroke:#000,stroke-width:2px,color:#000
style DEPLOY_DECISION fill:#ffd93d,stroke:#000,stroke-width:2px,color:#000
```
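The rank normalization step can be sketched as a rolling percentile rank followed by a quintile bucket. Column names, window length, and the exact bucketing rule below are illustrative assumptions, not the production implementation:

```python
import numpy as np
import pandas as pd

def pct_rank_last(window: np.ndarray) -> float:
    """Percentile rank of the newest value within its trailing window."""
    return (window[:-1] < window[-1]).mean()

def to_quintile(pct: float) -> int:
    """Bucket a percentile rank in [0, 1] into quintiles 1-5."""
    return min(int(pct * 5) + 1, 5)

# Illustrative use on one raw indicator (names and window are hypothetical):
df = pd.DataFrame({"rsi_14": np.random.default_rng(0).uniform(20, 80, 500)})
df["rsi_14_rank"] = df["rsi_14"].rolling(200).apply(pct_rank_last, raw=True)
df["rsi_14_quintile"] = df["rsi_14_rank"].dropna().map(to_quintile)
```

Ranking each raw indicator against its own recent history is what makes the 56 `_rank` features comparable across instruments and regimes.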
| Aspect | Transfer Learning (Trade-Matrix) | Full Retraining (Alternative Approach) | Advantage |
| --- | --- | --- | --- |
| Knowledge Retention | Dynamic tree count frozen from OLD model | Starts from scratch every week | ✓ Preserves patterns from 3+ years of data |
| Adaptation Speed | 50-250 new trees (grid search) + exponential recency weighting | Slow convergence on new regimes | ✓ Faster adaptation to new data |
| Training Stability | Warm-started from previous model | Random initialization each time | ✓ Consistent performance week-over-week |
| Catastrophic Forgetting | Prevented by frozen trees | Risk of losing historical patterns | ✓ Robust to short-term market noise |
| Computational Efficiency | Only trains 50-250 new trees (grid-searched) | Trains 150+ trees from scratch | ✓ ~65 min total vs ~180 min for full retraining |
Design Rationale: Transfer Learning enables faster adaptation to regime shifts by preserving historical patterns in frozen trees while training new trees on recent data. This is particularly valuable in volatile crypto markets where market regimes can shift rapidly.
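One way to realize frozen-old-trees-plus-new-trees is scikit-learn's `warm_start` mechanism: refitting with a larger `n_estimators` grows additional trees while leaving existing trees untouched. This is a minimal sketch under that assumption (the actual model class and training code are not specified in this document); a grid search over `n_new_trees` in 50-250 would wrap this function:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def transfer_learn(old_model: RandomForestRegressor,
                   X_new: np.ndarray, y_new: np.ndarray,
                   n_new_trees: int = 50,
                   decay_lambda: float = 0.005) -> RandomForestRegressor:
    """Keep the OLD trees frozen and grow new trees on recent data.

    New trees see exponential recency weights (newest bar weighted most,
    decay_lambda from the diagram above). Note: mutates old_model in place.
    """
    old_model.set_params(warm_start=True,
                         n_estimators=old_model.n_estimators + n_new_trees)
    ages = np.arange(len(y_new))[::-1]        # 0 = most recent bar
    weights = np.exp(-decay_lambda * ages)
    old_model.fit(X_new, y_new, sample_weight=weights)
    return old_model
```

The frozen trees carry the 3+ years of historical patterns; only the appended trees adapt to the latest regime.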
Real-Time ML Inference Pipeline (<5ms Latency)
Critical Production Issue Fixed: ERROR #102 and #103 (bar continuity failures) were root-caused and fixed in December 2025. Gap detection now prevents data holes that could cause stale feature computation and incorrect signals.
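A gap detector for a 4H bar stream reduces to walking the expected UTC grid between consecutive bar timestamps. The severity rule below (more than 2 consecutive missing bars is CRITICAL) is a hypothetical placeholder, since the document only names the CRITICAL/MINOR levels:

```python
from datetime import datetime, timedelta, timezone

BAR = timedelta(hours=4)  # 4H bars on the 00:00 UTC grid

def find_gaps(timestamps: list[datetime]) -> list[datetime]:
    """Return the expected 4H bar-open times missing from a sorted series."""
    missing = []
    for prev, cur in zip(timestamps, timestamps[1:]):
        expected = prev + BAR
        while expected < cur:
            missing.append(expected)
            expected += BAR
    return missing

def classify(missing: list[datetime]) -> str:
    # Hypothetical severity rule: >2 consecutive missing bars is CRITICAL
    if len(missing) > 2:
        return "CRITICAL"
    return "MINOR" if missing else "OK"
```

Detected gaps would trigger a backfill fetch from the exchange before any feature computation, which is what prevents stale features.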
```mermaid
sequenceDiagram
participant BYBIT as Bybit Exchange
participant GAP_DET as Gap Detection
participant CACHE as Feature Cache
participant FEAT_ENG as Feature Engineering
participant MODEL_LOAD as Model Loader
participant ML_INF as ML Inference
participant IC_VAL as IC Validator
participant RL_AGENT as RL Position Sizer
Note over BYBIT,RL_AGENT: Real-Time Inference - Every 4H Bar Close
BYBIT->>GAP_DET: New 4H Bar - 2025-01-05 00:00
rect rgb(100, 50, 0)
Note over GAP_DET: Gate 1 PRE-BOOTSTRAP: Check Last 200 Bars
GAP_DET->>GAP_DET: Detect Missing Bars - 00:00 UTC convention
alt Gap Found
GAP_DET->>GAP_DET: Severity: CRITICAL/MINOR
GAP_DET->>BYBIT: Fetch Missing Bars
Note right of GAP_DET: ERROR 102 Fix: Sequential Startup
end
end
GAP_DET->>CACHE: Check Feature Cache
alt Cache Hit
CACHE->>FEAT_ENG: Return Cached Features
else Cache Miss
CACHE->>FEAT_ENG: Compute Features
FEAT_ENG->>FEAT_ENG: 56 Raw Indicators
FEAT_ENG->>FEAT_ENG: Rank Normalization
FEAT_ENG->>FEAT_ENG: Select Boruta 9-11
FEAT_ENG->>CACHE: Store - TTL 1h
end
FEAT_ENG->>MODEL_LOAD: Request Model - BTC/ETH/SOL
rect rgb(0, 50, 100)
Note over MODEL_LOAD: 4-Tier Resilient Loading
MODEL_LOAD->>MODEL_LOAD: Tier 1: MLflow Registry - Production Tag
alt Tier 1 Fails
MODEL_LOAD->>MODEL_LOAD: Tier 2: Run ID Fallback
end
alt Tier 2 Fails
MODEL_LOAD->>MODEL_LOAD: Tier 3: Direct S3
end
alt Tier 3 Fails
MODEL_LOAD->>MODEL_LOAD: Tier 4: Local Checkpoint
end
end
MODEL_LOAD->>ML_INF: Model + locked_features.json
rect rgb(0, 100, 50)
Note over ML_INF: Sub-5ms Inference
ML_INF->>ML_INF: Validate Feature Order - CRITICAL sklearn checks
ML_INF->>ML_INF: Model.predict - Regression Output
ML_INF->>ML_INF: Generate Signal + Confidence
end
ML_INF->>IC_VAL: Signal + Confidence
IC_VAL->>IC_VAL: Calculate Rolling IC - 20-bar window
alt IC at least 0.03
IC_VAL->>RL_AGENT: Valid Signal - High Quality
else IC below 0.03
IC_VAL->>IC_VAL: Degrade to Kelly Baseline
IC_VAL->>RL_AGENT: Degraded Signal - Use TIER 3 Fallback
end
Note over BYBIT,RL_AGENT: Total Latency under 5ms Cache Hit, under 15ms Cache Miss
```
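The rolling-IC gate computes a Spearman rank correlation between recent predictions and realized returns over a 20-bar window. A minimal sketch, assuming no ties in the inputs (function names are illustrative):

```python
import numpy as np

def spearman_ic(preds: np.ndarray, realized: np.ndarray) -> float:
    """Spearman rank correlation (IC), assuming no ties in either series."""
    rp = preds.argsort().argsort()
    rr = realized.argsort().argsort()
    return float(np.corrcoef(rp, rr)[0, 1])

def route_signal(preds: np.ndarray, realized: np.ndarray,
                 window: int = 20, threshold: float = 0.03) -> str:
    """Below the IC threshold the signal is degraded to the Kelly baseline."""
    ic = spearman_ic(preds[-window:], realized[-window:])
    return "VALID" if ic >= threshold else "DEGRADE_TO_KELLY"
```

This is the handoff point to the position sizer: a degraded signal routes straight to the TIER 3 (PURE_KELLY) fallback described later in this document.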
Production Issue Discovered: In November 2025, we discovered sklearn validates both feature names AND order. Mismatched order causes silent prediction errors—not exceptions. This is a well-documented sklearn behavior that can produce arbitrarily wrong predictions without any error message.
Our Solution: locked_features.json Artifact
Every model stores its exact feature order as an MLflow artifact:
```json
{
  "model_id": "btcusdt_tl_week51",
  "training_date": "2025-12-22",
  "features": [
    "rsi_14_rank",
    "macd_signal_rank",
    "bb_width_rank",
    "atr_14_rank",
    "volume_ratio_rank",
    "momentum_20_rank",
    "obv_delta_rank",
    "dvol_btc_rank",
    "correlation_eth_rank"
  ],
  "feature_count": 9,
  "checksum": "sha256:a3f2..."
}
```
Validation at Inference Time
- Download locked_features.json from MLflow artifact store
- Reorder computed features to match exact training order
- Checksum validation ensures no corruption
- Fail fast if feature mismatch detected (no silent errors)
Result: Zero feature order incidents since implementation (November 2025). The locked_features.json artifact with checksum validation ensures feature order consistency across all deployments.
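The validation steps above can be sketched as follows. The checksum scheme here (SHA-256 over the comma-joined feature names) is an assumption for illustration; the document does not specify how the real artifact's checksum is computed:

```python
import hashlib

def verify_checksum(spec: dict) -> None:
    """Fail fast if the locked-features artifact is corrupted.

    Assumed scheme: sha256 over the comma-joined feature names."""
    digest = hashlib.sha256(",".join(spec["features"]).encode()).hexdigest()
    if spec["checksum"] != f"sha256:{digest}":
        raise ValueError("locked_features.json checksum mismatch")

def reorder(features: dict[str, float], locked: list[str]) -> list[float]:
    """Reorder computed features into the exact training order.

    Raises instead of silently mispredicting when a feature is missing."""
    missing = set(locked) - features.keys()
    if missing:
        raise ValueError(f"feature mismatch: {sorted(missing)}")
    return [features[name] for name in locked]
```

Because `reorder` raises on any mismatch, a renamed or dropped feature surfaces as an explicit error at inference time rather than a silently wrong prediction.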
Live Trading Execution Pipeline (E2E <50ms)
Design Principle: The 4-tier fallback system (FULL_RL → BLENDED → PURE_KELLY → EMERGENCY_FLAT) ensures graceful degradation of position sizing. If ML signals degrade or RL agents fail, the system falls back to proven Kelly criterion sizing rather than halting entirely.
```mermaid
graph TB
subgraph "Signal Input"
ML_SIG["ML Signal
Predicted Return + Confidence"]
IC_VAL["IC Validation
0.05 Threshold"]
end
subgraph "4-Tier RL Fallback System"
TIER1["TIER 1: FULL_RL
Confidence >= 0.50, IC >= 0.05
100% RL Policy"]
TIER2["TIER 2: BLENDED
Medium Confidence
50% RL + 50% Kelly"]
TIER3["TIER 3: PURE_KELLY
Low Confidence or IC < 0.03
100% Kelly Baseline"]
TIER4["TIER 4: EMERGENCY
Circuit Breaker OPEN
Minimum Position Only"]
end
subgraph "Risk Management"
HRAA["HRAA v2
Position Size Capping"]
CB["Circuit Breaker
Drawdown > 5%"]
end
subgraph "Order Execution"
ORDER["Order Generation
Market/Limit"]
BROKER["Bybit API
< 50ms E2E"]
end
ML_SIG --> IC_VAL
IC_VAL -->|Pass| TIER1
IC_VAL -->|Fail| TIER3
TIER1 --> HRAA
TIER2 --> HRAA
TIER3 --> HRAA
TIER4 --> HRAA
HRAA --> CB
CB -->|OK| ORDER
CB -->|TRIP| TIER4
ORDER --> BROKER
style TIER1 fill:#00d4ff,stroke:#000,stroke-width:2px,color:#000
style TIER4 fill:#ff6b6b,stroke:#000,stroke-width:2px,color:#000
style CB fill:#ffd93d,stroke:#000,stroke-width:2px,color:#000
```
| Tier | Conditions | Position Sizing | Risk Profile | Target Risk Profile |
| --- | --- | --- | --- | --- |
| TIER 1: FULL_RL | Confidence ≥ 0.50, IC ≥ 0.05 | 100% RL Policy | Highest return potential | Aggressive |
| TIER 2: BLENDED | Medium confidence OR IC ≥ 0.03 | 50% RL + 50% Kelly | Balanced risk-reward | Balanced |
| TIER 3: PURE_KELLY | Low confidence OR IC < 0.03 | 100% Kelly Baseline | Conservative, proven strategy | Conservative |
| TIER 4: EMERGENCY | Circuit Breaker OPEN (Drawdown > 5%) | 0% Position Size | Capital preservation mode | Capital preservation |
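The tier routing in the table reduces to a small decision function. This sketch uses the thresholds stated above; the exact production logic (e.g. how "medium confidence" is defined) is not specified in this document:

```python
def select_tier(confidence: float, ic: float,
                circuit_breaker_open: bool) -> str:
    """Map signal quality and breaker state to a position-sizing tier."""
    if circuit_breaker_open:
        return "TIER4_EMERGENCY"          # capital preservation
    if confidence >= 0.50 and ic >= 0.05:
        return "TIER1_FULL_RL"            # 100% RL policy
    if ic >= 0.03:
        return "TIER2_BLENDED"            # 50% RL + 50% Kelly
    return "TIER3_PURE_KELLY"             # proven baseline
```

The circuit breaker check comes first so that a drawdown trip overrides even a high-confidence signal.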
Regime-Adaptive Kelly Fractions
| Market Regime | Kelly Fraction | Risk Multiplier (γ) | Typical Conditions |
| --- | --- | --- | --- |
| Bull | 67% | γ = 1.5 | Strong upward trends, low volatility |
| Neutral | 50% | γ = 2.0 | Range-bound markets, moderate volatility |
| Bear | 25% | γ = 4.0 | Downward trends, elevated volatility |
| Crisis | 17% | γ = 6.0 | Extreme volatility, market dislocation |
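Note that each Kelly fraction is approximately 1/γ, so scaling full Kelly by the fraction and dividing by the risk multiplier are the same operation. A minimal fractional-Kelly sketch using the table's numbers (the edge/variance form of full Kelly is a standard simplification, not necessarily the production estimator):

```python
# Regime fractions from the table above (fraction ≈ 1/γ)
KELLY_FRACTION = {"bull": 0.67, "neutral": 0.50, "bear": 0.25, "crisis": 0.17}

def position_fraction(edge: float, variance: float, regime: str) -> float:
    """Fractional Kelly: full Kelly f* = edge / variance, scaled by regime.

    Negative edges are floored at zero (no position rather than a short
    via this path)."""
    full_kelly = edge / variance
    return max(0.0, full_kelly * KELLY_FRACTION[regime])
```

For example, a 2% expected edge with 4% return variance gives full Kelly 0.5, sized down to 0.25 of capital in a neutral regime and about 0.085 in a crisis regime.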
Approach: The 4-tier fallback adapts position sizing based on signal confidence, market regime, and drawdown state. This is conceptually superior to fixed-fraction sizing but actual performance depends on model quality and market conditions. Backtested results showed improvement over fixed sizing in walk-forward validation.
Fully Automated Weekly Pipeline (73 Minutes)
Operational Excellence: The weekly pipeline automates all 8 steps from data fetch to production deployment in ~73 minutes with zero human intervention. Validation gates (IC ≥ 0.03, Sharpe > 0.5, p < 0.15) ensure only quality models reach production.
1. **Data Fetch** (~3 minutes): Fetch 1 week of new OHLCV bars (42 bars: 7 days × 6 bars/day) from Bybit for BTC, ETH, SOL. Includes DVOL volatility data from Deribit. Validates timestamp continuity (ERROR #103 fix).
2. **Feature Engineering** (~5 minutes): Compute 112 total features (56 raw + 56 rank), apply rank normalization, and select 9-11 Boruta features per instrument. Lock the feature order in a JSON artifact for production consistency.
3. **Transfer Learning Training, 3 instruments** (~30 minutes, 10 min each): Train TL models for BTC, ETH, SOL in parallel. Freeze 100 OLD trees, warm-start 50 NEW trees with 5x sample weighting on post-regime data. Walk-Forward Validation across 40 weekly windows with a 200-bar purge gap.
4. **Precalc Signal Generation** (~5 minutes): Generate signals for the last 200 bars using the new models. Used for IC calculation and sanity checks; validates model behavior on recent data.
5. **RL Agent Training, 3 policies** (~15 minutes, 5 min each with curriculum): Train RL position-sizing agents with Proximal Policy Optimization (PPO), including a transaction cost model and slippage simulation. Curriculum learning (3 progressive difficulty stages) improves convergence and final policy quality.
6. **Backtesting & Validation** (~5 minutes): Run fast backtest mode (60x speedup via caching) on the last 6 months. Calculate Sharpe ratio, hit rate, IC, and maximum drawdown; compare to previous model performance.
7. **Validation Gates** (~2 minutes): Deploy only if ALL pass:
   - IC ≥ 0.03 (information coefficient)
   - Hit Rate ≥ 52% (directional accuracy)
   - Sharpe > 0.5 (risk-adjusted return)
   - p-value < 0.15 (statistical significance)
   Rollback if ANY fail (keeps previous week's models in production).
8. **Model Export & Deployment** (~8 minutes): Export MLflow artifacts (models + metadata), build Docker container (319MB), push to GHCR, trigger K3S rolling update. Zero-downtime deployment with health checks. Total: 73 minutes from data fetch to production.
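The step-7 gate logic is an all-or-nothing conjunction of the four thresholds. A minimal sketch (the metrics dict keys are illustrative names, not the pipeline's actual schema):

```python
def passes_gates(metrics: dict) -> bool:
    """All four gates must pass; any failure triggers rollback to the
    previous week's models."""
    return all([
        metrics["ic"] >= 0.03,        # information coefficient
        metrics["hit_rate"] >= 0.52,  # directional accuracy
        metrics["sharpe"] > 0.5,      # risk-adjusted return
        metrics["p_value"] < 0.15,    # statistical significance
    ])
```

Making the gates a pure function of the backtest metrics keeps the deploy/rollback decision deterministic and auditable.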
Business Continuity: If weekly pipeline fails (GitHub Actions outage, data provider issue), previous week's models remain in production. No manual intervention required. System automatically alerts via Prometheus → Grafana → Slack. Mean Time To Recovery (MTTR): <10 minutes for known issues.