🚀 Trade-Matrix

Production Cryptocurrency Trading Platform
Production-Ready | Event-Driven Architecture | Sub-5ms ML Inference
  • CI/CD Cost: $0/mo (CI/CD Free Tier). CI/CD automation on the GitHub Actions PRO free tier. Total infra: ~$96/mo including Azure VM.
  • ML Inference Latency: <5ms (CPU-Only). CPU-only sklearn inference (RF/XGBoost).
  • Deployment Time: 8min (CI/CD Optimized). Weekly model updates with zero downtime.
  • Orchestration: K3S (Self-Hosted). Kubernetes orchestration with liveness/readiness probes and rolling updates.
  • Monitoring Metrics: 413+ (Full Visibility). 71 base metric families × label cardinality (instrument, strategy, status).
  • Model Training Time: 65min (Weekly Updates). Transfer Learning with Walk-Forward Validation.

⚖️ Technical Design Decisions

Design Philosophy: Trade-Matrix prioritizes architectural correctness and research rigor over scale. Every design decision is justified by quantitative research standards, not marketing claims.
| Decision | Trade-Matrix Choice | Rationale |
| --- | --- | --- |
| CI/CD Automation | GitHub Actions (PRO free tier) | $0/mo CI/CD cost within PRO limits (300/3,000 mins, 1.3/100GB bandwidth). Total infra ~$96/mo including Azure VM + storage. |
| ML Inference | <5ms (CPU-only, sklearn RF/XGBoost) | Lightweight models enable sub-5ms inference without GPU. Suitable for 4H bar frequency (6 inferences/day per instrument). |
| Model Updates | Weekly automated (Transfer Learning) | Balances adaptation speed with stability for mid-frequency crypto trading. Preserves old model knowledge via frozen trees. |
| Risk Management | 4-Tier Fallback + Circuit Breaker | Graceful degradation across four tiers: FULL_RL → BLENDED → PURE_KELLY → EMERGENCY_FLAT. Drawdown > 5% triggers automatic position flattening. |
| Position Sizing | RL-based with regime-adaptive Kelly baseline | Adapts to 4 market regimes (Bear: 25%, Neutral: 50%, Bull: 67%, Crisis: 17% Kelly fraction). |
| Feature Selection | Boruta Selection (9-13 features per instrument) | Automated wrapper method selects all statistically significant features. Reduces overfitting vs. hand-picked feature sets. |
| Walk-Forward Validation | 200-bar purge gap (40 weekly windows) | Exceeds López de Prado's recommendation (h ≈ 0.01T ≈ 70 bars). Prevents data leakage between train/test folds. |
| Deployment | GHCR-only (6.24GB base + 319MB models) | Split architecture enables weekly model updates (319MB) without re-deploying base (6.24GB). Zero-downtime rolling updates. |
| Observability | 413+ Prometheus time series + Loki logs | 71 base metric families × label cardinality. 4 Grafana dashboards (Trading Cockpit, Market Analysis, Institutional Analytics, Infrastructure). |
Current Constraints: Trade-Matrix operates on a single Azure B2als_v2 VM (2 vCPU, 4GB RAM, ~$36/mo), with CPU-only inference, 4H bar frequency, 3 instruments (BTC, ETH, SOL), and a single exchange (Bybit). The architecture supports scaling to GPU inference, tick data, and multiple exchanges, but the current scope is intentionally constrained for R&D validation.
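
To make the Walk-Forward Validation entry above more concrete, here is a minimal sketch of how 40 test windows with a 200-bar purge gap could be laid out. It is not the project's actual splitter: the `Fold` type, the `walk_forward_folds` function, the expanding training window, the 42-bar test window (one week of 4H bars at 6 bars/day), and the 7,000-bar history (implied by 0.01T ≈ 70) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass(frozen=True)
class Fold:
    train: range  # bar indices used for fitting
    test: range   # bar indices used for out-of-sample evaluation


def walk_forward_folds(
    n_bars: int,
    n_windows: int = 40,   # number of windows quoted in the text
    test_size: int = 42,   # assumption: one week of 4H bars (6 per day * 7 days)
    purge_gap: int = 200,  # purge gap quoted in the text
) -> Iterator[Fold]:
    """Yield expanding-window folds, dropping `purge_gap` bars before each test window."""
    first_test_start = n_bars - n_windows * test_size
    if first_test_start - purge_gap <= 0:
        raise ValueError("not enough bars for the requested windows and gap")
    for k in range(n_windows):
        test_start = first_test_start + k * test_size
        train_end = test_start - purge_gap  # bars inside the gap are never used
        yield Fold(
            train=range(0, train_end),
            test=range(test_start, test_start + test_size),
        )


# Assumption: roughly 7,000 bars of history, as implied by 0.01T ≈ 70.
folds = list(walk_forward_folds(n_bars=7000))
```

Each fold trains only on bars that end 200 bars before its test window starts, so labels or features that look back up to 200 bars cannot overlap the test period.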

🏗️ System Architecture

High-Level System Components

📊 Data Ingestion Layer

  • Bybit Exchange Integration (live trading)
  • Deribit DVOL (volatility data)
  • 3+ years historical OHLCV (2022-2025)
  • 4-hour bar resolution for institutional-grade analysis

🤖 ML/RL Intelligence

  • Transfer Learning Models (BTC, ETH, SOL)
  • 4-Tier RL Position Sizing
  • Regime Detection (4-state HMM)
  • Weekly automated updates with validation gates

🛡️ Risk Management

  • HRAA v2 Algorithm (hierarchical allocation)
  • Circuit Breaker (3-state FSM)
  • Position Limits (per-instrument)
  • Kelly Criterion Baseline (regime-adaptive)

Event-Driven Core

  • NautilusTrader Framework
  • MessageBus Architecture (sub-ms routing)
  • Real-time Order Management
  • Portfolio tracking with tick-level precision

💾 Data Storage

  • PostgreSQL 15 + TimescaleDB 2.14
  • Redis 7.2 (feature cache, pub/sub)
  • MinIO (ML artifacts, models)
  • MLflow (model registry, experiments)

📈 Monitoring Stack

  • Prometheus 2.48 (413+ metrics)
  • Grafana 10.2 (real-time dashboards)
  • Loki 2.9 (log aggregation)
  • 30-day retention for forensic analysis

graph TB
    subgraph "Data Sources"
        BYBIT[Bybit Exchange<br/>Live Trading]
        DERIBIT[Deribit<br/>DVOL Volatility]
        HISTORICAL[Historical Data<br/>2022-2025 OHLCV]
    end
    subgraph "Trade-Matrix Core Platform"
        subgraph "Intelligence Layer"
            ML[Transfer Learning Models<br/>BTC/ETH/SOL]
            RL[RL Position Sizing<br/>4-Tier Fallback]
            REGIME[Regime Detection<br/>4-State HMM]
        end
        subgraph "Trading Engine"
            MSGBUS[MessageBus<br/>Event Router]
            RISK[Risk Engine<br/>HRAA v2 + Circuit Breaker]
            EXEC[Execution Engine<br/>Order Management]
            PORTFOLIO[Portfolio Engine<br/>Position Tracking]
        end
        subgraph "Data Layer"
            REDIS[(Redis 7.2<br/>Cache & Pub/Sub)]
            POSTGRES[(PostgreSQL + TimescaleDB<br/>Time Series Data)]
            MINIO[(MinIO<br/>ML Artifacts)]
            MLFLOW[(MLflow<br/>Model Registry)]
        end
    end
    subgraph "Infrastructure"
        K3S[K3S Cluster<br/>Production Orchestration]
        GITHUB[GitHub Actions<br/>CI/CD Pipeline]
        PROMETHEUS[Prometheus + Grafana<br/>Monitoring Stack]
    end
    BYBIT -->|WebSocket| MSGBUS
    DERIBIT -->|API| ML
    HISTORICAL -->|Batch| ML
    MSGBUS --> ML
    ML --> RL
    RL --> RISK
    RISK --> EXEC
    EXEC --> PORTFOLIO
    MSGBUS <--> REDIS
    ML <--> MLFLOW
    PORTFOLIO --> POSTGRES
    MLFLOW <--> MINIO
    EXEC -->|Orders| BYBIT
    GITHUB -->|Deploy| K3S
    K3S -->|Runs| MSGBUS
    PROMETHEUS -->|Monitor| K3S
    style ML fill:#00d4ff,stroke:#000,stroke-width:2px,color:#000
    style RL fill:#00ff88,stroke:#000,stroke-width:2px,color:#000
    style RISK fill:#ffd93d,stroke:#000,stroke-width:2px,color:#000
    style MSGBUS fill:#ff6b6b,stroke:#000,stroke-width:3px,color:#fff

Complete System Architecture

Legend: Real-time Data Flow · Batch/Historical Flow · Configuration/Control · Monitoring/Metrics

graph TB
    subgraph ExternalData["External Data Sources"]
        BYBIT_EX[Bybit Exchange - WebSocket]
        DERIBIT_EX[Deribit Exchange - DVOL]
        HISTORICAL_S3[Historical Data - MinIO]
    end
    subgraph NautilusTrader["NautilusTrader Core"]
        MSGBUS[MessageBus]
        DATAENGINE[DataEngine]
        RISKENGINE[RiskEngine - HRAA v2]
        EXECENGINE[ExecutionEngine]
        PORTFOLIO_ENG[PortfolioEngine]
        CACHE[Cache]
        CATALOG[DataCatalog]
        STRATEGIES[ML Strategies]
    end
    subgraph MLServices["ML Services"]
        ML_INFERENCE[Signal Generator]
        RL_AGENT[RL Position Sizer]
        REGIME_DETECT[Regime Detector]
        TL_TRAINER[TL Model Trainer]
        RL_TRAINER[RL Agent Trainer]
        FEATURE_ENG[Feature Engineer]
    end
    subgraph StorageLayer["Storage Layer"]
        REDIS[(Redis)]
        POSTGRES[(PostgreSQL)]
        MINIO[(MinIO)]
        MLFLOW_DB[(MLflow)]
    end
    subgraph Monitoring["Monitoring"]
        PROMETHEUS[Prometheus]
        GRAFANA[Grafana]
        LOKI[Loki]
    end
    subgraph Deployment["Deployment"]
        K3S[K3S Cluster]
        GHCR[GitHub Registry]
        GITHUB_ACTIONS[GitHub Actions]
    end
    BYBIT_EX --> DATAENGINE
    DATAENGINE --> MSGBUS
    MSGBUS --> CACHE
    CACHE --> ML_INFERENCE
    ML_INFERENCE --> RL_AGENT
    RL_AGENT --> STRATEGIES
    STRATEGIES --> MSGBUS
    MSGBUS --> RISKENGINE
    RISKENGINE --> EXECENGINE
    EXECENGINE --> BYBIT_EX
    BYBIT_EX --> PORTFOLIO_ENG
    HISTORICAL_S3 -.-> FEATURE_ENG
    DERIBIT_EX -.-> FEATURE_ENG
    FEATURE_ENG -.-> TL_TRAINER
    TL_TRAINER -.-> MLFLOW_DB
    MLFLOW_DB -.-> ML_INFERENCE
    FEATURE_ENG -.-> RL_TRAINER
    RL_TRAINER -.-> MLFLOW_DB
    MLFLOW_DB -.-> RL_AGENT
    CACHE -.-> REDIS
    PORTFOLIO_ENG -.-> POSTGRES
    CATALOG -.-> MINIO
    TL_TRAINER -.-> MINIO
    MSGBUS -.-> PROMETHEUS
    RISKENGINE -.-> PROMETHEUS
    ML_INFERENCE -.-> PROMETHEUS
    PROMETHEUS -.-> GRAFANA
    K3S -.-> LOKI
    GITHUB_ACTIONS --> GHCR
    GHCR --> K3S
    K3S --> MSGBUS
    style MSGBUS fill:#ff6b6b,stroke:#000,stroke-width:3px,color:#fff
    style ML_INFERENCE fill:#00d4ff,stroke:#000,stroke-width:2px,color:#000
    style RL_AGENT fill:#00ff88,stroke:#000,stroke-width:2px,color:#000
    style RISKENGINE fill:#ffd93d,stroke:#000,stroke-width:2px,color:#000

Component Details & Specifications

NautilusTrader Core Components

  • MessageBus: Event-driven routing between components. Pub/sub pattern for decoupled component communication with zero message loss guarantees.
  • DataEngine: Normalizes market data from multiple sources into a unified format. Currently processes 4H bars from Bybit and Deribit DVOL.
  • RiskEngine: Implements HRAA v2 with per-instrument position limits, portfolio-level constraints, and circuit breaker integration.
  • ExecutionEngine: Order lifecycle management with fill tracking and reconciliation. Manages order submission, execution monitoring, and position updates.
  • PortfolioEngine: Real-time position tracking with mark-to-market PnL updates. Calculates Sharpe ratio, maximum drawdown, and other performance metrics on-the-fly.
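
The circuit breaker that the RiskEngine integrates with is described elsewhere on this page as a 3-state FSM that flattens positions when drawdown exceeds 5%. The sketch below shows one way such a breaker might be structured. It is not NautilusTrader API and not the project's implementation: the class name, the `HALF_OPEN` state name, the 2% recovery level, and the transition rules back to `CLOSED` are assumptions.

```python
from enum import Enum


class BreakerState(Enum):
    CLOSED = "closed"        # trading allowed
    OPEN = "open"            # positions flattened, no new risk
    HALF_OPEN = "half_open"  # assumed third state: cautious recovery


class DrawdownCircuitBreaker:
    """Three-state breaker that opens when drawdown from peak equity exceeds a limit."""

    def __init__(self, max_drawdown: float = 0.05, recovery_drawdown: float = 0.02):
        self.max_drawdown = max_drawdown            # 5% trigger quoted in the text
        self.recovery_drawdown = recovery_drawdown  # hypothetical re-entry level
        self.peak_equity = 0.0
        self.state = BreakerState.CLOSED

    def update(self, equity: float) -> BreakerState:
        """Feed the latest mark-to-market equity and return the new state."""
        self.peak_equity = max(self.peak_equity, equity)
        drawdown = 1.0 - equity / self.peak_equity if self.peak_equity > 0 else 0.0
        if drawdown > self.max_drawdown:
            self.state = BreakerState.OPEN
        elif self.state is BreakerState.OPEN and drawdown <= self.recovery_drawdown:
            self.state = BreakerState.HALF_OPEN
        elif self.state is BreakerState.HALF_OPEN and drawdown == 0.0:
            # Assumption: fully close only once equity makes a new peak.
            self.state = BreakerState.CLOSED
        return self.state


breaker = DrawdownCircuitBreaker()
states = [breaker.update(e) for e in (100.0, 96.0, 94.0, 97.0, 98.5, 101.0)]
```

In this example sequence the breaker stays CLOSED at a 4% drawdown, opens at 6%, moves to HALF_OPEN once drawdown recovers to 1.5%, and closes again when equity reaches a new high.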

ML/RL Services

  • Unified Signal Generator: Ensemble of 3 TL models (BTC, ETH, SOL) with 4-tier resilient loading. Sub-5ms inference via feature caching and optimized sklearn pipelines.
  • RL Position Sizer: Reinforcement Learning agent trained via curriculum learning. 4-tier fallback: FULL_RL → BLENDED (50/50 with Kelly) → PURE_KELLY → EMERGENCY_FLAT (0% on circuit breaker OPEN).
  • Regime Detector: 4-state Hidden Markov Model with Markov-Switching GARCH. Classifies market as Bear/Neutral/Bull/Crisis. Kelly fractions: 25%/50%/67%/17% respectively.
  • TL Model Trainer: Automated weekly training pipeline with Walk-Forward Validation (40 windows, 200-bar purge gap). Boruta feature selection locks 9-13 features per instrument to prevent overfitting.
  • RL Agent Trainer: Soft Actor-Critic (SAC) with curriculum learning. Trains in 45 minutes (vs 120 minutes without curriculum). Environment: Bybit 4H bars, transaction cost model, slippage simulation.
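
As a rough illustration of how the four sizing tiers and the regime-specific Kelly fractions listed above could combine, consider the sketch below. The function names, the `full_kelly` and `rl_size` inputs, and the reading of "Kelly fraction" as a multiplier on a full-Kelly position are assumptions; the source does not specify how the fractions are applied.

```python
from enum import Enum

# Kelly fractions per regime, as listed in the Regime Detector description.
KELLY_FRACTION = {"bear": 0.25, "neutral": 0.50, "bull": 0.67, "crisis": 0.17}


class Tier(Enum):
    FULL_RL = 1
    BLENDED = 2
    PURE_KELLY = 3
    EMERGENCY_FLAT = 4


def kelly_size(regime: str, full_kelly: float) -> float:
    """Scale a full-Kelly position (fraction of capital) by the regime's Kelly fraction."""
    return KELLY_FRACTION[regime] * full_kelly


def position_size(tier: Tier, rl_size: float, regime: str, full_kelly: float) -> float:
    """Return the target position as a fraction of capital for the given tier."""
    baseline = kelly_size(regime, full_kelly)
    if tier is Tier.FULL_RL:
        return rl_size
    if tier is Tier.BLENDED:
        return 0.5 * rl_size + 0.5 * baseline  # 50/50 blend with Kelly
    if tier is Tier.PURE_KELLY:
        return baseline
    return 0.0  # EMERGENCY_FLAT: circuit breaker is OPEN


# Hypothetical inputs: RL policy proposes 20% of capital, full Kelly is 30%.
blended = position_size(Tier.BLENDED, rl_size=0.20, regime="neutral", full_kelly=0.30)
```

With these made-up inputs the neutral-regime Kelly baseline is 15% of capital, so the blended tier lands halfway between the RL proposal and that baseline.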

Storage Systems

  • Redis 7.2: Feature cache (TTL-based), pub/sub for ML signals, session persistence.
  • PostgreSQL 15 + TimescaleDB 2.14: Time-series storage for OHLCV bars, ML predictions, portfolio snapshots. Hypertable compression enabled.
  • MinIO: S3-compatible object store for ML models (200-500MB per model), training datasets, and backtest results. Organized by instrument and version.
  • MLflow: Model registry with lifecycle management (Staging → Production), experiment tracking and artifact versioning. Tag-based promotion workflow.

Monitoring Stack

  • Prometheus 2.48: Collects 413+ time series metrics (71 base families × instrument/strategy/status labels). Retention: 30 days. Scrape interval: 15 seconds.
  • Grafana 10.2: 4 specialized dashboards (Trading Cockpit, Market Analysis, Institutional Analytics, Infrastructure). Auto-refresh: 5 seconds.
  • Loki 2.9: Log aggregation with 30-day retention. Indexes: service, level, instrument, strategy. Query performance: <1s for 10M log lines via LogQL.
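
The "71 base families × label cardinality" figure can be read as follows: each metric family contributes up to one time series per combination of its label values. The sketch below computes that upper bound for a few made-up families. The metric names and cardinalities are illustrative only and are not taken from the Trade-Matrix exporters.

```python
from math import prod


def series_count(families: dict[str, dict[str, int]]) -> int:
    """Upper bound on time series: product of label cardinalities, summed over families."""
    return sum(prod(labels.values()) for labels in families.values())


# Hypothetical families mapping label name -> number of distinct values.
example_families = {
    "orders_total": {"instrument": 3, "strategy": 1, "status": 4},  # up to 12 series
    "ml_inference_seconds": {"instrument": 3},                      # up to 3 series
    "circuit_breaker_state": {},                                    # 1 series, no labels
}

total_series = series_count(example_families)
```

By the same arithmetic, 413 series across 71 families works out to an average of roughly 5.8 series per family.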

Hybrid Deployment Architecture

Cost Optimization: Trade-Matrix keeps CI/CD automation costs at $0/month by staying within the GitHub PRO free tier. Total infrastructure cost is approximately $96/month (Azure VM ~$36, storage ~$10, electricity/internet ~$50).
📦 GitHub Container Registry (GHCR)

  • Base Image: 6.24GB (Python 3.12, dependencies, vendored NautilusTrader)
  • Model Layer: 319MB (TL models, RL policies, feature configs)
  • Total Size: 6.54GB combined
  • Update Frequency: Weekly models, monthly base
  • Bandwidth: 1.3GB/month (within PRO 100GB/month limit)

⚙️ GitHub Actions CI/CD

  • Weekly Pipeline: 73 minutes (training + deployment)
  • Compute Minutes: ~300/month (within PRO 3,000 limit)
  • Automation: 15-step validation pipeline
  • Zero Human Intervention

☸️ K3S Production Cluster

  • Orchestration: Lightweight Kubernetes (K3S 1.28)
  • Auto-scaling: Horizontal pod autoscaling
  • Health Checks: Liveness + readiness probes
  • Zero-Downtime: Rolling updates (max surge 1)

Azure VMSS Ephemeral Workers

  • Instance Type: Standard_D2s_v3 (8GB RAM, 2 vCPU)
  • Scaling: 0 → 1 for 15-25 min seeding tasks
  • Annual Savings: $418/year vs always-on instance
  • Use Case: Signal history pre-calculation

sequenceDiagram
    participant DEV as Developer/PM
    participant GITHUB as GitHub Actions
    participant GHCR as GitHub Container Registry
    participant K3S as K3S Production Cluster
    participant TRADE as Trading System
    Note over DEV,TRADE: Weekly Model Update Workflow (Every Sunday)
    DEV->>GITHUB: git push (trigger weekly pipeline)
    rect rgb(0, 50, 100)
        Note over GITHUB: Phase 1: Training (65 min)
        GITHUB->>GITHUB: Fetch data from Bybit
        GITHUB->>GITHUB: Feature engineering (Boruta)
        GITHUB->>GITHUB: Train TL models (3 instruments)
        GITHUB->>GITHUB: Train RL agents (curriculum)
        GITHUB->>GITHUB: Validate (IC at least 0.03, Sharpe over 0.5)
    end
    rect rgb(0, 100, 50)
        Note over GITHUB: Phase 2: Package Models (3 min)
        GITHUB->>GITHUB: Export MLflow artifacts
        GITHUB->>GITHUB: Build combined container (6.54GB)
        GITHUB->>GHCR: Push to GHCR (within free tier)
    end
    rect rgb(100, 50, 0)
        Note over K3S: Phase 3: Deployment (5 min)
        K3S->>GHCR: Pull new image (6.54GB, cached layers)
        K3S->>K3S: Rolling update (zero downtime)
        K3S->>TRADE: Deploy new trading pods
        TRADE->>TRADE: Health checks pass
        K3S->>TRADE: Route traffic to new pods
        K3S->>K3S: Terminate old pods
    end
    rect rgb(80, 0, 80)
        Note over K3S: Phase 3.5: Signal History Seeding (15-25 min, as needed)
        K3S->>K3S: Scale Azure VMSS 0→1 (Standard_D2s_v3)
        K3S->>TRADE: Run signal pre-calculation (200 bars)
        TRADE->>TRADE: Seed PostgreSQL with historical signals
        K3S->>K3S: Scale VMSS 1→0 (terminate worker)
    end
    TRADE-->>DEV: Deployment complete notification
    DEV->>K3S: Verify metrics (Grafana)
    Note over DEV,TRADE: Total Time: ~73 minutes | CI/CD Cost: $0 (GitHub Actions PRO)

Cost Breakdown vs Traditional Cloud Deployments

Trade-Matrix (GitHub PRO Optimization)

  • Compute: $0/month (300 mins/month ÷ 3,000 free mins = 10% utilization)
  • Container Storage: $0/month (1.5GB ÷ 100GB/month free = 1.5% utilization)
  • Bandwidth: $0/month (1.3GB ÷ 100GB/month free = 1.3% utilization)
  • CI/CD Total: $0/month
  • Infrastructure Total: ~$96/month (Azure VM ~$36, storage ~$10, electricity/internet ~$50)
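
The monthly usage figures quoted above can be roughly cross-checked against the weekly cadence. The sketch below assumes exactly one pipeline run per week and that only the 319MB model layer is transferred on each weekly update; neither assumption is stated outright in the source, so treat the result as a plausibility check rather than a measurement.

```python
WEEKS_PER_MONTH = 52 / 12  # about 4.33

PIPELINE_MINUTES = 65 + 3 + 5  # training + packaging + deployment phases
MODEL_LAYER_GB = 0.319         # 319MB model layer
ACTIONS_QUOTA_MINUTES = 3_000  # GitHub Actions minutes quota quoted above

# Assumption: one full pipeline run per week.
monthly_minutes = PIPELINE_MINUTES * WEEKS_PER_MONTH

# Assumption: only the model layer moves on each weekly update.
monthly_transfer_gb = MODEL_LAYER_GB * WEEKS_PER_MONTH

compute_utilization = monthly_minutes / ACTIONS_QUOTA_MINUTES

print(f"{monthly_minutes:.0f} min/month, {monthly_transfer_gb:.2f} GB/month, "
      f"{compute_utilization:.1%} of Actions minutes")
```

Under these assumptions the pipeline uses a little over 300 minutes and a little over 1.3GB per month, in line with the ~300 minutes and 1.3GB quoted.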

Equivalent AWS Setup

  • EC2 Compute: t3.large (2 vCPU, 8GB RAM) × 2 = $120/month
  • EKS Cluster: Control plane = $73/month
  • ECR Storage: 10GB containers = $1/month
  • S3 + RDS: Storage + backups = $80/month
  • Data Transfer: 100GB/month = $9/month
  • CloudWatch: Monitoring + logs = $30/month
  • Total: $313/month ($3,756/year)

Equivalent GCP Setup

  • GCE Compute: n1-standard-2 × 2 = $100/month
  • GKE Cluster: Control plane = $73/month
  • Container Registry: 10GB = $2/month
  • Cloud Storage + SQL: = $90/month
  • Network Egress: 100GB/month = $12/month
  • Stackdriver: Monitoring + logs = $40/month
  • Total: $317/month ($3,804/year)

Annual Savings

~$2,600/year savings
vs equivalent AWS setup ($313/mo - $96/mo = $217/mo × 12 = $2,604/year)
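
The comparison can be reproduced directly from the line items listed above. The sketch below simply sums the quoted monthly figures. The dictionary keys mirror the bullet labels, and the `annual_savings` helper is a hypothetical name introduced for illustration.

```python
# Monthly line items in USD, copied from the cost lists above.
AWS = {"EC2": 120, "EKS": 73, "ECR": 1, "S3 + RDS": 80, "Data Transfer": 9, "CloudWatch": 30}
GCP = {"GCE": 100, "GKE": 73, "Container Registry": 2, "Cloud Storage + SQL": 90,
       "Network Egress": 12, "Stackdriver": 40}
TRADE_MATRIX_MONTHLY = 96  # total infrastructure cost quoted for Trade-Matrix


def annual_savings(alternative: dict[str, int], baseline: int = TRADE_MATRIX_MONTHLY) -> int:
    """Yearly difference between an alternative's monthly total and the baseline."""
    return (sum(alternative.values()) - baseline) * 12


for name, items in (("AWS", AWS), ("GCP", GCP)):
    print(f"{name}: ${sum(items.values())}/month, saves ${annual_savings(items):,}/year")
```

On these inputs the monthly totals are $313 (AWS) and $317 (GCP), giving annual differences of $2,604 and $2,652 respectively.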
Scalability Note: While current deployment minimizes costs (~$96/month total), the architecture is designed to scale to managed cloud infrastructure (AWS/GCP/Azure) if trading volume requires additional compute. The hybrid container strategy (large base + small models) remains optimal for bandwidth efficiency at any scale.

⚡ Real-Time Trading Workflow

sequenceDiagram
    participant BYBIT as Bybit Exchange
    participant DC as DataClient
    participant MB as MessageBus
    participant DE as DataEngine
    participant C as Cache
    participant ML as ML Inference (Sub-5ms)
    participant RL as RL Position Sizer (4-Tier)
    participant S as Strategy
    participant RE as RiskEngine (HRAA v2)
    participant CB as Circuit Breaker
    participant EE as ExecEngine
    participant P as Portfolio
    Note over BYBIT,P: Live Trading Flow (Typical Latency: <50ms end-to-end)
    BYBIT->>DC: Market Data (WebSocket) - BTC-USDT 4H Bar Close
    DC->>MB: Publish BarEvent
    MB->>DE: Route to DataEngine
    DE->>C: Update Cache
    par Feature Computation
        C->>ML: Extract Features (9-11 Boruta-selected)
        ML->>ML: Model Inference - 4-Tier Resilient Load
        ML->>ML: IC Validation (threshold >= 0.05)
    end
    ML->>RL: Signal + Confidence (e.g., BUY, conf=0.73)
    alt High Confidence 50+ AND High IC 05+
        RL->>RL: TIER 1: FULL_RL - 100 percent RL Policy
    else Medium Confidence OR Medium IC
        RL->>RL: TIER 2: BLENDED - 50 percent RL + 50 percent Kelly
    else Low Confidence OR IC Failure
        RL->>RL: TIER 3: PURE_KELLY - 100 percent Kelly
    end
    RL->>CB: Check Circuit Breaker Status
    alt Circuit Breaker OPEN Drawdown over 5 percent
        CB->>RL: EMERGENCY_FLAT - 0 percent Position Size
        RL->>S: Flatten Position
    else Circuit Breaker CLOSED
        CB->>RL: OK
        RL->>S: Position Size (e.g., 15 percent capital)
    end
    S->>MB: Submit Order (Market/Limit)
    MB->>RE: Risk Validation
    RE->>RE: Check Position Limits - Per-Instrument + Portfolio
    RE->>RE: Calculate VaR Impact
    alt Risk Checks Pass
        RE->>MB: Order Approved
        MB->>EE: Execute Order
        EE->>BYBIT: Place Order
        BYBIT->>EE: Order Acknowledged
        BYBIT-->>EE: Fill Event
        EE->>MB: Broadcast Fill
        MB->>P: Update Position
        P->>P: Calculate PnL - Mark-to-Market
    else Risk Checks Fail
        RE->>MB: Order Rejected
        MB->>S: Rejection Notice
    end
    Note over BYBIT,P: Position Monitored in Real-Time for Circuit Breaker Triggers
Performance Notes: End-to-end latency from market data receipt to order placement averages <50ms, with ML inference contributing <5ms on CPU-only sklearn models. For a 4-hour bar trading strategy, latency is not a competitive differentiator—signal quality and risk management are the primary alpha sources.
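
The tier-selection branch in the workflow above can be sketched as a small pure function. Several details are assumptions rather than facts from the source: the "50+" and "05+" labels are read here as confidence ≥ 0.50 and IC ≥ 0.05, the "low" cut-offs (`low_conf`, `low_ic`) are invented for illustration because the diagram does not give them, and the function name and signature are hypothetical.

```python
def select_tier(
    confidence: float,
    ic: float,
    breaker_open: bool,
    high_conf: float = 0.50,  # assumed reading of "Confidence 50+"
    high_ic: float = 0.05,    # assumed reading of "IC 05+"
    low_conf: float = 0.30,   # invented cut-off for "Low Confidence"
    low_ic: float = 0.0,      # invented cut-off for "IC Failure"
) -> str:
    """Pick a position-sizing tier from signal confidence, IC, and breaker status."""
    if breaker_open:
        return "EMERGENCY_FLAT"  # drawdown breach overrides everything else
    if confidence >= high_conf and ic >= high_ic:
        return "FULL_RL"
    if confidence < low_conf or ic <= low_ic:
        return "PURE_KELLY"
    return "BLENDED"


# Example from the diagram: BUY signal with conf=0.73; the IC values are hypothetical.
tiers = [
    select_tier(0.73, 0.06, breaker_open=False),
    select_tier(0.73, 0.03, breaker_open=False),
    select_tier(0.20, 0.06, breaker_open=False),
    select_tier(0.73, 0.06, breaker_open=True),
]
```

Checking the breaker first means that a strong signal can never override a drawdown breach, which matches the diagram's EMERGENCY_FLAT branch.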