High-Level System Components
📊 Data Ingestion Layer
- Bybit Exchange Integration (live trading)
- Deribit DVOL (volatility data)
- 3+ years of historical OHLCV (2022-2025)
- 4-hour bar resolution for institutional-grade analysis

🤖 ML/RL Intelligence
- Transfer Learning Models (BTC, ETH, SOL)
- 4-Tier RL Position Sizing
- Regime Detection (4-state HMM)
- Weekly automated updates with validation gates

🛡️ Risk Management
- HRAA v2 Algorithm (hierarchical allocation)
- Circuit Breaker (3-state FSM)
- Position Limits (per-instrument)
- Kelly Criterion Baseline (regime-adaptive)

⚡ Event-Driven Core
- NautilusTrader Framework
- MessageBus Architecture (sub-ms routing)
- Real-time Order Management
- Portfolio tracking with tick-level precision

💾 Data Storage
- PostgreSQL 15 + TimescaleDB 2.14
- Redis 7.2 (feature cache, pub/sub)
- MinIO (ML artifacts, models)
- MLflow (model registry, experiments)

📈 Monitoring Stack
- Prometheus 2.48 (413+ metrics)
- Grafana 10.2 (real-time dashboards)
- Loki 2.9 (log aggregation)
- 30-day retention for forensic analysis
```mermaid
graph TB
    subgraph "Data Sources"
        BYBIT[Bybit Exchange<br/>Live Trading]
        DERIBIT[Deribit<br/>DVOL Volatility]
        HISTORICAL[Historical Data<br/>2022-2025 OHLCV]
    end

    subgraph "Trade-Matrix Core Platform"
        subgraph "Intelligence Layer"
            ML[Transfer Learning Models<br/>BTC/ETH/SOL]
            RL[RL Position Sizing<br/>4-Tier Fallback]
            REGIME[Regime Detection<br/>4-State HMM]
        end
        subgraph "Trading Engine"
            MSGBUS[MessageBus<br/>Event Router]
            RISK[Risk Engine<br/>HRAA v2 + Circuit Breaker]
            EXEC[Execution Engine<br/>Order Management]
            PORTFOLIO[Portfolio Engine<br/>Position Tracking]
        end
        subgraph "Data Layer"
            REDIS[(Redis 7.2<br/>Cache & Pub/Sub)]
            POSTGRES[(PostgreSQL + TimescaleDB<br/>Time Series Data)]
            MINIO[(MinIO<br/>ML Artifacts)]
            MLFLOW[(MLflow<br/>Model Registry)]
        end
    end

    subgraph "Infrastructure"
        K3S[K3S Cluster<br/>Production Orchestration]
        GITHUB[GitHub Actions<br/>CI/CD Pipeline]
        PROMETHEUS[Prometheus + Grafana<br/>Monitoring Stack]
    end

    BYBIT -->|WebSocket| MSGBUS
    DERIBIT -->|API| ML
    HISTORICAL -->|Batch| ML
    MSGBUS --> ML
    ML --> RL
    RL --> RISK
    RISK --> EXEC
    EXEC --> PORTFOLIO
    MSGBUS <--> REDIS
    ML <--> MLFLOW
    PORTFOLIO --> POSTGRES
    MLFLOW <--> MINIO
    EXEC -->|Orders| BYBIT
    GITHUB -->|Deploy| K3S
    K3S -->|Runs| MSGBUS
    PROMETHEUS -->|Monitor| K3S

    style ML fill:#00d4ff,stroke:#000,stroke-width:2px,color:#000
    style RL fill:#00ff88,stroke:#000,stroke-width:2px,color:#000
    style RISK fill:#ffd93d,stroke:#000,stroke-width:2px,color:#000
    style MSGBUS fill:#ff6b6b,stroke:#000,stroke-width:3px,color:#fff
```
Complete System Architecture
```mermaid
graph TB
    subgraph ExternalData["External Data Sources"]
        BYBIT_EX[Bybit Exchange - WebSocket]
        DERIBIT_EX[Deribit Exchange - DVOL]
        HISTORICAL_S3[Historical Data - MinIO]
    end

    subgraph NautilusTrader["NautilusTrader Core"]
        MSGBUS[MessageBus]
        DATAENGINE[DataEngine]
        RISKENGINE[RiskEngine - HRAA v2]
        EXECENGINE[ExecutionEngine]
        PORTFOLIO_ENG[PortfolioEngine]
        CACHE[Cache]
        CATALOG[DataCatalog]
        STRATEGIES[ML Strategies]
    end

    subgraph MLServices["ML Services"]
        ML_INFERENCE[Signal Generator]
        RL_AGENT[RL Position Sizer]
        REGIME_DETECT[Regime Detector]
        TL_TRAINER[TL Model Trainer]
        RL_TRAINER[RL Agent Trainer]
        FEATURE_ENG[Feature Engineer]
    end

    subgraph StorageLayer["Storage Layer"]
        REDIS[(Redis)]
        POSTGRES[(PostgreSQL)]
        MINIO[(MinIO)]
        MLFLOW_DB[(MLflow)]
    end

    subgraph Monitoring["Monitoring"]
        PROMETHEUS[Prometheus]
        GRAFANA[Grafana]
        LOKI[Loki]
    end

    subgraph Deployment["Deployment"]
        K3S[K3S Cluster]
        GHCR[GitHub Registry]
        GITHUB_ACTIONS[GitHub Actions]
    end

    BYBIT_EX --> DATAENGINE
    DATAENGINE --> MSGBUS
    MSGBUS --> CACHE
    CACHE --> ML_INFERENCE
    ML_INFERENCE --> RL_AGENT
    RL_AGENT --> STRATEGIES
    STRATEGIES --> MSGBUS
    MSGBUS --> RISKENGINE
    RISKENGINE --> EXECENGINE
    EXECENGINE --> BYBIT_EX
    BYBIT_EX --> PORTFOLIO_ENG
    HISTORICAL_S3 -.-> FEATURE_ENG
    DERIBIT_EX -.-> FEATURE_ENG
    FEATURE_ENG -.-> TL_TRAINER
    TL_TRAINER -.-> MLFLOW_DB
    MLFLOW_DB -.-> ML_INFERENCE
    FEATURE_ENG -.-> RL_TRAINER
    RL_TRAINER -.-> MLFLOW_DB
    MLFLOW_DB -.-> RL_AGENT
    CACHE -.-> REDIS
    PORTFOLIO_ENG -.-> POSTGRES
    CATALOG -.-> MINIO
    TL_TRAINER -.-> MINIO
    MSGBUS -.-> PROMETHEUS
    RISKENGINE -.-> PROMETHEUS
    ML_INFERENCE -.-> PROMETHEUS
    PROMETHEUS -.-> GRAFANA
    K3S -.-> LOKI
    GITHUB_ACTIONS --> GHCR
    GHCR --> K3S
    K3S --> MSGBUS

    style MSGBUS fill:#ff6b6b,stroke:#000,stroke-width:3px,color:#fff
    style ML_INFERENCE fill:#00d4ff,stroke:#000,stroke-width:2px,color:#000
    style RL_AGENT fill:#00ff88,stroke:#000,stroke-width:2px,color:#000
    style RISKENGINE fill:#ffd93d,stroke:#000,stroke-width:2px,color:#000
```
NautilusTrader Core Components
- MessageBus: Event-driven routing between components. Pub/sub pattern for decoupled component communication with zero message loss guarantees.
- DataEngine: Normalizes market data from multiple sources into unified format. Currently processes 4H bars from Bybit and Deribit DVOL.
- RiskEngine: Implements HRAA v2 with per-instrument position limits, portfolio-level constraints, and circuit breaker integration.
- ExecutionEngine: Order lifecycle management with fill tracking and reconciliation. Manages order submission, execution monitoring, and position updates.
- PortfolioEngine: Real-time position tracking with mark-to-market PnL updates. Calculates Sharpe ratio, maximum drawdown, and other performance metrics on-the-fly.
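The pub/sub pattern behind the MessageBus can be sketched in a few lines. This is an illustrative toy, not NautilusTrader's actual implementation, and the topic name is hypothetical:

```python
from collections import defaultdict
from typing import Any, Callable

class MessageBus:
    """Minimal in-process pub/sub bus: topic -> list of handlers."""
    def __init__(self) -> None:
        self._subscribers: dict[str, list[Callable[[Any], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[Any], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: Any) -> None:
        # Deliver to every subscriber; components never call each other directly.
        for handler in self._subscribers[topic]:
            handler(message)

# Wiring mirrors the diagram: DataEngine publishes bars, a strategy consumes them.
bus = MessageBus()
received = []
bus.subscribe("data.bars.BTCUSDT-4H", received.append)
bus.publish("data.bars.BTCUSDT-4H", {"close": 65_000.0})
```

Because publishers only know topic names, risk, execution, and monitoring components can attach and detach without touching the data path.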
ML/RL Services
- Unified Signal Generator: Ensemble of 3 TL models (BTC, ETH, SOL) with 4-tier resilient loading. Sub-5ms inference via feature caching and optimized sklearn pipelines.
- RL Position Sizer: Reinforcement Learning agent trained via curriculum learning. 4-tier fallback: FULL_RL → BLENDED (50/50 with Kelly) → PURE_KELLY → EMERGENCY_FLAT (0% on circuit breaker OPEN).
- Regime Detector: 4-state Hidden Markov Model with Markov-Switching GARCH. Classifies market as Bear/Neutral/Bull/Crisis. Kelly fractions: 25%/50%/67%/17% respectively.
- TL Model Trainer: Automated weekly training pipeline with Walk-Forward Validation (40 windows, 200-bar purge gap). Boruta feature selection locks 9-13 features per instrument to prevent overfitting.
- RL Agent Trainer: Soft Actor-Critic (SAC) with curriculum learning. Trains in 45 minutes (vs 120 minutes without curriculum). Environment: Bybit 4H bars, transaction cost model, slippage simulation.
Storage Systems
- Redis 7.2: Feature cache (TTL-based), pub/sub for ML signals, session persistence.
- PostgreSQL 15 + TimescaleDB 2.14: Time-series storage for OHLCV bars, ML predictions, portfolio snapshots. Hypertable compression enabled.
- MinIO: S3-compatible object store for ML models (200-500MB per model), training datasets, and backtest results. Organized by instrument and version.
- MLflow: Model registry with lifecycle management (Staging → Production), experiment tracking and artifact versioning. Tag-based promotion workflow.
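The TTL-based feature cache pattern is easy to picture with a plain dict standing in for Redis. The sketch below uses an injectable clock so it can be exercised without a Redis server; in production the same behavior comes from Redis `SET ... EX`:

```python
import time

class FeatureCache:
    """TTL feature cache mimicking the Redis pattern with a plain dict."""
    def __init__(self, ttl_seconds: float, clock=time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: dict[str, tuple[float, object]] = {}

    def set(self, key: str, value: object) -> None:
        # Store the value with its absolute expiry time.
        self._store[key] = (self._clock() + self._ttl, value)

    def get(self, key: str, default=None):
        entry = self._store.get(key)
        if entry is None:
            return default
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._store[key]          # lazily evict the expired entry
            return default
        return value
```

A stale feature vector is worse than a recomputed one, so expiry returns the default and forces the caller back to the Feature Engineer.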
Monitoring Stack
- Prometheus 2.48: Collects 413+ time series metrics (71 base families × instrument/strategy/status labels). Retention: 30 days. Scrape interval: 15 seconds.
- Grafana 10.2: 4 specialized dashboards (Trading Cockpit, Market Analysis, Institutional Analytics, Infrastructure). Auto-refresh: 5 seconds.
- Loki 2.9: Log aggregation with 30-day retention. Indexes: service, level, instrument, strategy. Query performance: <1s for 10M log lines via LogQL.
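The 413+ series figure follows from metric-family cardinality: each family multiplies out the value sets of its labels. A quick sketch of that arithmetic, with an entirely hypothetical label set for one family:

```python
def series_count(label_values: dict[str, list[str]]) -> int:
    """Distinct time series one metric family produces: the product of
    the cardinalities of its label value sets."""
    n = 1
    for values in label_values.values():
        n *= len(values)
    return n

# Hypothetical labels for a single family, e.g. a signals counter.
labels = {
    "instrument": ["BTCUSDT", "ETHUSDT", "SOLUSDT"],
    "strategy": ["tl_ensemble", "rl_sizer"],
    "status": ["accepted", "rejected"],
}
```

One family with these labels already yields 3 × 2 × 2 = 12 series, which is why 71 base families expand to 413+ series once instrument/strategy/status labels are applied.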
Hybrid Deployment Architecture
Cost Optimization: Trade-Matrix keeps CI/CD automation at $0/month by staying within the GitHub PRO free tier. Total infrastructure cost is approximately $96/month (Azure VM ~$36, storage ~$10, electricity/internet ~$50).
📦 GitHub Container Registry (GHCR)
- Base Image: 6.24GB (Python 3.12, dependencies, vendored NautilusTrader)
- Model Layer: 319MB (TL models, RL policies, feature configs)
- Total Size: 6.54GB combined
- Update Frequency: weekly models, monthly base
- Bandwidth: 1.3GB/month (within the PRO 100GB/month limit)

⚙️ GitHub Actions CI/CD
- Weekly Pipeline: 73 minutes (training + deployment)
- Compute Minutes: ~300/month (within the PRO 3,000-minute limit)
- Automation: 15-step validation pipeline
- Zero human intervention

☸️ K3S Production Cluster
- Orchestration: lightweight Kubernetes (K3S 1.28)
- Auto-scaling: horizontal pod autoscaling
- Health Checks: liveness + readiness probes
- Zero-Downtime: rolling updates (max surge 1)

⚡ Azure VMSS Ephemeral Workers
- Instance Type: Standard_D2s_v3 (2 vCPU, 8GB RAM)
- Scaling: 0 → 1 for 15-25 minute seeding tasks
- Annual Savings: $418/year vs an always-on instance
- Use Case: signal history pre-calculation
```mermaid
sequenceDiagram
    participant DEV as Developer/PM
    participant GITHUB as GitHub Actions
    participant GHCR as GitHub Container Registry
    participant K3S as K3S Production Cluster
    participant TRADE as Trading System

    Note over DEV,TRADE: Weekly Model Update Workflow (Every Sunday)
    DEV->>GITHUB: git push (trigger weekly pipeline)

    rect rgb(0, 50, 100)
        Note over GITHUB: Phase 1: Training (65 min)
        GITHUB->>GITHUB: Fetch data from Bybit
        GITHUB->>GITHUB: Feature engineering (Boruta)
        GITHUB->>GITHUB: Train TL models (3 instruments)
        GITHUB->>GITHUB: Train RL agents (curriculum)
        GITHUB->>GITHUB: Validate (IC >= 0.03, Sharpe > 0.5)
    end

    rect rgb(0, 100, 50)
        Note over GITHUB: Phase 2: Package Models (3 min)
        GITHUB->>GITHUB: Export MLflow artifacts
        GITHUB->>GITHUB: Build combined container (6.54GB)
        GITHUB->>GHCR: Push to GHCR (within free tier)
    end

    rect rgb(100, 50, 0)
        Note over K3S: Phase 3: Deployment (5 min)
        K3S->>GHCR: Pull new image (6.54GB, cached layers)
        K3S->>K3S: Rolling update (zero downtime)
        K3S->>TRADE: Deploy new trading pods
        TRADE->>TRADE: Health checks pass
        K3S->>TRADE: Route traffic to new pods
        K3S->>K3S: Terminate old pods
    end

    rect rgb(80, 0, 80)
        Note over K3S: Phase 3.5: Signal History Seeding (15-25 min, as needed)
        K3S->>K3S: Scale Azure VMSS 0→1 (Standard_D2s_v3)
        K3S->>TRADE: Run signal pre-calculation (200 bars)
        TRADE->>TRADE: Seed PostgreSQL with historical signals
        K3S->>K3S: Scale VMSS 1→0 (terminate worker)
    end

    TRADE-->>DEV: Deployment complete notification
    DEV->>K3S: Verify metrics (Grafana)
    Note over DEV,TRADE: Total Time: ~73 minutes | CI/CD Cost: $0 (GitHub Actions PRO)
```
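The Phase 1 validation gate (IC at least 0.03, Sharpe above 0.5) amounts to a threshold check over out-of-sample statistics. A sketch with a pure-Python rank IC (Spearman correlation without tie handling); the function names and thresholds-as-constants layout are illustrative, only the threshold values come from the pipeline above:

```python
IC_FLOOR = 0.03       # minimum information coefficient to promote a model
SHARPE_FLOOR = 0.5    # minimum out-of-sample Sharpe ratio

def passes_validation_gates(ic: float, sharpe: float) -> bool:
    """A freshly trained model is promoted only if both gates pass."""
    return ic >= IC_FLOOR and sharpe > SHARPE_FLOOR

def rank(xs: list[float]) -> list[float]:
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    for r, i in enumerate(order):
        ranks[i] = float(r)
    return ranks

def information_coefficient(predictions: list[float], realized: list[float]) -> float:
    """Spearman rank correlation between predicted and realized returns."""
    pr, rr = rank(predictions), rank(realized)
    n = len(pr)
    mp, mr = sum(pr) / n, sum(rr) / n
    cov = sum((a - mp) * (b - mr) for a, b in zip(pr, rr))
    var_p = sum((a - mp) ** 2 for a in pr)
    var_r = sum((b - mr) ** 2 for b in rr)
    return cov / (var_p * var_r) ** 0.5
```

If either gate fails, the pipeline stops before Phase 2, so a degraded model never reaches GHCR or the cluster.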
Trade-Matrix (GitHub PRO Optimization)
- Compute: $0/month (300 mins/month ÷ 3,000 free mins = 10% utilization)
- Container Storage: $0/month (1.5GB ÷ 100GB/month free = 1.5% utilization)
- Bandwidth: $0/month (1.3GB ÷ 100GB/month free = 1.3% utilization)
- CI/CD Total: $0/month
- Infrastructure Total: ~$96/month (Azure VM ~$36, storage ~$10, electricity/internet ~$50)
Equivalent AWS Setup
- EC2 Compute: t3.large (2 vCPU, 8GB RAM) × 2 = $120/month
- EKS Cluster: Control plane = $73/month
- ECR Storage: 10GB containers = $1/month
- S3 + RDS: Storage + backups = $80/month
- Data Transfer: 100GB/month = $9/month
- CloudWatch: Monitoring + logs = $30/month
- Total: $313/month ($3,756/year)
Equivalent GCP Setup
- GCE Compute: n1-standard-2 × 2 = $100/month
- GKE Cluster: Control plane = $73/month
- Container Registry: 10GB = $2/month
- Cloud Storage + SQL: = $90/month
- Network Egress: 100GB/month = $12/month
- Stackdriver: Monitoring + logs = $40/month
- Total: $317/month ($3,804/year)
Annual Savings
~$2,600/year savings
vs the equivalent AWS setup ($313/mo - $96/mo = $217/mo × 12 = $2,604/yr)
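The savings figure is simple arithmetic over the monthly totals listed above; a quick check (totals copied from the lists, rounding is mine):

```python
# Monthly totals quoted in the comparison above (USD/month).
TRADE_MATRIX = 96      # Azure VM ~$36 + storage ~$10 + electricity/internet ~$50
AWS_EQUIVALENT = 313   # EC2 + EKS + ECR + S3/RDS + data transfer + CloudWatch
GCP_EQUIVALENT = 317   # GCE + GKE + registry + storage/SQL + egress + Stackdriver

def annual_savings(alternative: int, baseline: int = TRADE_MATRIX) -> int:
    """Yearly saving of the baseline setup versus a managed-cloud alternative."""
    return (alternative - baseline) * 12

aws_savings = annual_savings(AWS_EQUIVALENT)   # (313 - 96) * 12 = 2604
gcp_savings = annual_savings(GCP_EQUIVALENT)   # (317 - 96) * 12 = 2652
```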
Scalability Note: While current deployment minimizes costs (~$96/month total), the architecture is designed to scale to managed cloud infrastructure (AWS/GCP/Azure) if trading volume requires additional compute. The hybrid container strategy (large base + small models) remains optimal for bandwidth efficiency at any scale.