Weeks 10-12: Paper Trading Testing, Bug Fixes & Architectural Refinements

Implementation Overview

Weeks 10-12 marked a critical phase of stabilisation and refinement for ZephyrApex, focusing on extensive paper trading testing, comprehensive bug fixing, and key architectural changes. I transitioned from IBKR to Alpaca as my primary execution broker, resolved persistent MySQL lock contention issues, implemented the pending order supervisor flow for reliable order management, and significantly enhanced system observability with improved logging and monitoring.

What I Built

Technical Core

  • Pending Order Supervisor Flow: I implemented comprehensive pending order management with account ID invariants, ghost order detection, and automated resubmission for failed orders
  • Alpaca Broker Integration: I transitioned from IBKR to Alpaca as the primary execution broker, including adapter implementation, price normalisation, and bracket order handling
  • MySQL Lock Contention Resolution: I added Redis-based distributed locking mechanisms to resolve database concurrency issues and prevent race conditions in account state management
  • Observability Enhancements: I improved system monitoring with enhanced logging, audit trails, error tracking via Sentry, and performance telemetry for database operations
  • Paper Trading Testing Suite: I conducted extensive QA testing including fuzz testing, load testing, and performance validation with automated smoke test scripts
  • Bug Fixes & Data Quality: I resolved order handling issues, implemented data cleansing for alternative data streams, fixed UI inconsistencies, and improved error handling across the stack
  • Trade Alert Automation: I created a CLI script that generates X (Twitter) trade alerts with visual icons and enforces the character limit
  • Regime Visibility & Sentiment Updates: I enhanced market regime detection and batch sentiment data processing for improved decision-making
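
The trade alert item can be illustrated with a minimal sketch. This is not the actual CLI script: the `TradeAlert` fields, the icon mapping, and the `format_alert` helper are hypothetical names chosen for illustration. It also measures length with Python's `len`, which may differ from how X itself counts characters such as emoji.

```python
from dataclasses import dataclass

MAX_POST_LENGTH = 280  # standard X post limit; X's own weighted counting is ignored here

# Hypothetical icon mapping, chosen for illustration only.
SIDE_ICONS = {"buy": "🟢", "sell": "🔴"}


@dataclass(frozen=True)
class TradeAlert:
    symbol: str
    side: str  # "buy" or "sell"
    quantity: int
    price: float
    note: str = ""


def format_alert(alert: TradeAlert, limit: int = MAX_POST_LENGTH) -> str:
    """Render an alert as a single post, truncating the free-text note to fit."""
    icon = SIDE_ICONS.get(alert.side, "⚪")
    head = f"{icon} {alert.side.upper()} {alert.quantity} ${alert.symbol} @ {alert.price:.2f}"
    if not alert.note:
        return head[:limit]
    text = f"{head} | {alert.note}"
    if len(text) <= limit:
        return text
    # Cut the text and mark the cut with an ellipsis so the post stays within the limit.
    return text[: limit - 1].rstrip() + "…"
```

Keeping the fixed header first means truncation only ever eats into the optional note, so the symbol, side, quantity, and price always survive.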

Code Highlight

This period's focus: implementing the pending order supervisor with distributed locking, so that unsent orders are resubmitted reliably and only one supervisor instance runs a cycle at a time.

    async def _run_loop(self) -> None:
        lock_key = "global:pending_order_supervisor"
        while self.is_running:
            settings = get_settings()
            interval_seconds = max(
                1,
                int(getattr(settings, "pending_order_supervisor_interval_seconds", 5)),
            )
            lease_seconds = max(
                30,
                int(getattr(settings, "pending_order_supervisor_lease_seconds", 120)),
            )
            lock_timeout = max(30, lease_seconds)
            cycle_started_at = datetime.now(timezone.utc)
            self._last_run_at = cycle_started_at
            cycle_lock_skips = 0

            try:
                if self.lock_manager:
                    async with self.lock_manager.lock(lock_key, timeout=lock_timeout):
                        cycle_summary = (
                            await self._sync_worker._resubmit_unsent_pending_orders()
                        )
                else:
                    cycle_summary = (
                        await self._sync_worker._resubmit_unsent_pending_orders()
                    )
                self._cycles_completed += 1
                self._last_success_at = datetime.now(timezone.utc)
                self._last_error = None
                self._last_cycle_summary = {
                    "claimed_count": int(cycle_summary.get("claimed_count", 0)),
                    "submitted_count": int(cycle_summary.get("submitted_count", 0)),
                    "failed_count": int(cycle_summary.get("failed_count", 0)),
                    "retryable_count": int(cycle_summary.get("retryable_count", 0)),
                    "skipped_count": int(cycle_summary.get("skipped_count", 0)),
                    "error_count": int(cycle_summary.get("error_count", 0)),
                }
            except LockAcquisitionError:
                self._lock_skips += 1
                cycle_lock_skips += 1
                self._record_lock_skip(datetime.now(timezone.utc))
                logger.info(
                    "Skipped pending-order supervisor cycle because lock is already held",
                    extra={"lock_key": lock_key},
                )
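            except Exception as e:
                # The published excerpt is truncated at this handler. The lines
                # below are an assumed completion, not the original code.
                self._last_error = str(e)
                logger.exception("Pending-order supervisor cycle failed")

            # Assumed: wait for the configured interval before the next cycle
            # (requires `import asyncio`).
            await asyncio.sleep(interval_seconds)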
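
To show how the skip-on-contention path behaves, here is a small self-contained sketch. It assumes only what the excerpt shows: a `lock(key, timeout=...)` async context manager that raises `LockAcquisitionError` when the lock is already held. `InMemoryLockManager`, `run_cycle`, and `demo` are hypothetical stand-ins; the real lock manager is Redis-based and is not shown in this post.

```python
import asyncio
from contextlib import asynccontextmanager


class LockAcquisitionError(Exception):
    """Raised when the lock is already held by someone else."""


class InMemoryLockManager:
    """Stand-in for a Redis-backed manager; only meaningful within one process."""

    def __init__(self):
        self._held = set()

    @asynccontextmanager
    async def lock(self, key, timeout):
        # `timeout` mirrors the call in the excerpt. A Redis version would
        # typically use it as the key's expiry; it is unused in this sketch.
        if key in self._held:
            raise LockAcquisitionError(key)
        self._held.add(key)
        try:
            yield
        finally:
            # Always release, even if the protected work raises.
            self._held.discard(key)


async def run_cycle(manager, name, results):
    try:
        async with manager.lock("global:pending_order_supervisor", timeout=30):
            await asyncio.sleep(0.01)  # simulate the resubmission work
            results.append((name, "ran"))
    except LockAcquisitionError:
        # Same decision as the supervisor: skip this cycle rather than wait.
        results.append((name, "skipped"))


async def demo():
    manager = InMemoryLockManager()
    results = []
    await asyncio.gather(
        run_cycle(manager, "worker-a", results),
        run_cycle(manager, "worker-b", results),
    )
    return results


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Skipping instead of blocking keeps two supervisor instances from resubmitting the same pending orders; the instance that loses the race simply tries again on its next interval.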

Architecture Decision

Broker Selection: IBKR vs Alpaca

After extensive testing and debugging of IBKR integration issues, I switched to Alpaca as my primary execution broker. In my testing, Alpaca offered a more stable API, better documentation, and simpler authentication than IBKR's TWS-based integration. The change reduced operational complexity while keeping the order types I need (market, limit, stop, and bracket orders) and real-time execution. The transition involved implementing Alpaca-specific price normalisation rules and client order ID generation so that submitted orders comply with their API constraints.
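
As a rough sketch of what price normalisation and client order ID generation can look like (not the adapter's actual code): the rounding rule below assumes at most 2 decimal places for prices at or above $1.00 and at most 4 below it, and the ID length cap is an assumed value. Both should be checked against the current Alpaca documentation. `normalise_price`, `make_client_order_id`, and the ID scheme are hypothetical.

```python
import uuid
from decimal import Decimal, ROUND_HALF_UP

# Assumed cap; confirm the real limit in the Alpaca order documentation.
MAX_CLIENT_ORDER_ID_LENGTH = 48


def normalise_price(price) -> Decimal:
    """Round to 2 decimals at or above $1.00 and to 4 decimals below it (assumed rule)."""
    value = Decimal(str(price))  # go through str to avoid binary float artefacts
    step = Decimal("0.01") if value >= Decimal("1") else Decimal("0.0001")
    return value.quantize(step, rounding=ROUND_HALF_UP)


def make_client_order_id(strategy: str, symbol: str) -> str:
    """Build a unique, length-bounded client order ID (hypothetical scheme)."""
    suffix = uuid.uuid4().hex[:12]
    # Trim the readable prefix so prefix + "-" + suffix never exceeds the cap.
    prefix = f"{strategy}-{symbol}"[: MAX_CLIENT_ORDER_ID_LENGTH - len(suffix) - 1]
    return f"{prefix}-{suffix}"
```

Using `Decimal` rather than `float` keeps the rounded price exact, which matters when the broker rejects orders with too many decimal places.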

Testing Results

Confidence Check

After extensive fixes and additions, all unit tests pass. They cover the critical paper trading scenarios:

  • Distributed locking mechanisms and MySQL contention resolution
  • Alpaca broker adapter integration and order submission
  • Pending order supervisor flow and resubmission logic
  • Observability enhancements and error handling
  • Paper trading engine performance and reliability

Next Steps

With paper trading thoroughly tested and architectural foundations solidified, the focus shifts to live equity trading deployment. I'll monitor Alpaca execution performance, validate the pending order supervisor in production, and continue enhancing observability for real-time trading operations.


Follow @therealkamba on X for regular updates.