Mar 17, 2023 Trading service failure report

Published on Mar 18, 2023. Updated on Apr 12, 2024.

1. Impact and timelines

Between 08:39:00 AM and 09:28:15 AM UTC on March 17, 2023 (49 minutes and 15 seconds), OKX trading systems were partially or fully unavailable.
The timeline for this incident is detailed below:
08:39:00 AM UTC: Some OKX trading systems began raising intermittent alerts. Our monitoring systems immediately notified engineering and teams began investigations.
08:49:00 AM UTC: To ensure an orderly market, all trading was intentionally suspended, and an outage notification was prepared for release. By this point, IT and engineering teams had identified the root cause and were working to resolve the issue.
08:50:00 AM UTC: System outage notification published to the Status page.
09:18:15 AM UTC: Pre-open phase began. Canceling orders, placing or amending post-only orders, and transferring funds to trading accounts were enabled.
09:28:15 AM UTC: All trading services fully resumed.

2. Why did this downtime happen?

Servers underlying a core infrastructure component experienced unexpectedly high transient load due to a log process, causing resource exhaustion and the subsequent failure of the component. This in turn rendered downstream trading systems unable to service some requests. To maintain an orderly market, we suspended all trading services while our team worked on resolving the issue.

3. What steps are we taking to prevent similar disruptions in the future?

1). Scale and optimize the technical specs of the pertinent logs, for example by limiting log file size, to prevent similar disruptions.
2). Improve internal monitoring systems and alert processes, including monitoring of both server-side and client-side issues, so that similar issues are caught before they occur or resolved promptly before they cause any serious disruption.
3). Improve system disruption processing procedures. We will retain complete records of the disruption for reconstruction and in-depth analyses so that we can adopt more comprehensive preventative measures against similar disruptions.
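
As an illustration of the first measure (limiting log file size), the following is a minimal Python sketch built on the standard library's RotatingFileHandler. It does not represent OKX's actual logging configuration. The function names, the logger name, and the size limits are hypothetical values chosen for the example.

```python
import logging
import logging.handlers
import os
import tempfile


def build_capped_logger(path, name="capped_demo", max_bytes=1024, backup_count=2):
    """Return a logger whose files rotate once they approach max_bytes.

    At most backup_count rotated files are kept next to the active file,
    so the on-disk footprint stays bounded. All values are illustrative.
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep demo output out of the root logger
    handler = logging.handlers.RotatingFileHandler(
        path, maxBytes=max_bytes, backupCount=backup_count
    )
    handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger


def total_log_bytes(directory):
    """Sum the sizes of all files in the given log directory."""
    return sum(
        os.path.getsize(os.path.join(directory, entry))
        for entry in os.listdir(directory)
    )


if __name__ == "__main__":
    log_dir = tempfile.mkdtemp()
    log = build_capped_logger(os.path.join(log_dir, "app.log"))
    # Simulate a burst of log writes, far more than one file can hold.
    for i in range(500):
        log.info("order event %d processed", i)
    print("files kept:", sorted(os.listdir(log_dir)))
    print("bytes on disk:", total_log_bytes(log_dir))
```

With a cap like this, a burst of log writes overwrites the oldest rotated file instead of growing without bound. Note that this bounds disk usage only. It does not by itself limit the CPU or I/O load of a busy log process, which would need separate measures such as rate limiting or sampling.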

4. Our commitment to you

OKX is dedicated to providing an ultra-reliable, high-performance, and multi-functional platform for our valued customers. To this end, we will make continued efforts to optimize system performance, stability, and functionality. However, due to the complexity and challenges of running a high-performance trading system 24/7, 365 days a year, unexpected disruptions can occasionally arise.
We understand that timely communication is critical to our customers and that transparency is integral to building trust. In the event of any issues, we will notify our customers as quickly as possible via our official Telegram community channels, the System/Status API channel, and the Status page.

OKX team

March 20, 2023