From Thursday, January 9 to Tuesday, January 14, 2025, Ashby’s scheduling features allowed users and candidates to book over a subset of recurring calendar events (events without an end date). The issue impacted all scheduling features: direct booking, manual scheduling, and auto-scheduling.
Approximately 6.6% of interviews scheduled during this timeframe conflicted with existing recurring events.
Why did this happen?
Ashby keeps an up-to-date record of users' calendar events to power our scheduling features. To keep these features fast, an automated task regularly deletes stored events once they are no longer relevant for scheduling. An update to this task was deployed on January 9. It contained a bug that deleted all recurring events without a specified end date.
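As a simplified illustration of this class of bug (a sketch only, not the production code; the `CalendarEvent` type, its fields, and `is_safe_to_delete` are hypothetical names), a cleanup check has to treat a missing end date as "may still occur" rather than "already over":

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class CalendarEvent:
    """Hypothetical stored event; field names are illustrative only."""
    event_id: str
    is_recurring: bool
    # For a one-off event: when it ends. For a recurring series: when the
    # series ends, or None if the series repeats indefinitely.
    ends_at: Optional[datetime]


def is_safe_to_delete(event: CalendarEvent, now: datetime) -> bool:
    """Return True only when the event can no longer affect scheduling."""
    if event.ends_at is None:
        # No end date means the event may still occur in the future,
        # so it must be kept.
        return False
    return event.ends_at < now
```

In this sketch, the explicit `None` check is the important part: a check that skips it, or that treats a missing end date as being in the past, would mark every open-ended recurring event as deletable.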
How did we resolve the situation?
On Thursday, January 9, 2025 at 3:06AM PST, we deployed the change containing the bug. At 6:05PM PST, the affected code ran as scheduled and deleted events that should not have been deleted.
On Friday, January 10 at 8:19AM PST, we became aware of the issue. By 10:28AM PST, we had identified the steps required to restore the missing events and had begun that process.
By Saturday, January 11 at 1:14AM PST, we had restored what we then thought were all missing events for all customers.
On Monday, January 13 at 6:16AM PST, we received new reports of double-booked interviews. By 2:58PM PST, we had identified an issue in how we had restored missing events, impacting 2.8% of application users.
By 5:11AM PST on January 14, we had fixed the root cause and started restoring the remaining events. By 8:30AM PST, this restoration was finished.
What have we put in place to prevent it from happening in the future?
For the specific automated task that deletes events, we have implemented a way to restore deleted calendar events almost immediately. In addition, we have made our QA process more comprehensive for changes to this automated task.
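One common way to make deletions quickly reversible is a soft delete: the task marks events as deleted and a separate step removes them permanently after a retention window. The sketch below shows that general pattern under stated assumptions; it is not a description of our implementation, and the `EventStore` class, its methods, and the in-memory storage are all hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict, List


class EventStore:
    """Hypothetical in-memory store; a real system would use a database."""

    def __init__(self, retention: timedelta) -> None:
        self._events: Dict[str, dict] = {}
        self._deleted_at: Dict[str, datetime] = {}
        self._retention = retention

    def add(self, event_id: str, payload: dict) -> None:
        self._events[event_id] = payload

    def soft_delete(self, event_id: str, now: datetime) -> None:
        # Mark the event as deleted instead of removing its data.
        if event_id in self._events:
            self._deleted_at[event_id] = now

    def active_ids(self) -> List[str]:
        # Only events that are not marked deleted are visible to scheduling.
        return sorted(e for e in self._events if e not in self._deleted_at)

    def restore_deleted_since(self, since: datetime) -> int:
        # Undo every soft delete made at or after `since`, e.g. a bad task run.
        to_restore = [e for e, t in self._deleted_at.items() if t >= since]
        for event_id in to_restore:
            del self._deleted_at[event_id]
        return len(to_restore)

    def purge(self, now: datetime) -> int:
        # Permanently remove events whose retention window has passed.
        expired = [
            e for e, t in self._deleted_at.items() if now - t > self._retention
        ]
        for event_id in expired:
            del self._deleted_at[event_id]
            del self._events[event_id]
        return len(expired)
```

With this pattern, recovering from a faulty run is a matter of calling `restore_deleted_since` with the time the run started, rather than re-fetching events from each calendar provider.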
As part of the postmortem we conducted after the incident was fully resolved, we determined that better tooling could have allowed us to release this update to a small subset of customers before rolling it out further. While we do this for most features, we do not yet have that capability for these automated tasks. We are currently working on that tooling.
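A staged rollout of this kind is often built on deterministic bucketing: each customer is hashed into one of 100 buckets, and a change is enabled only for buckets below the current rollout percentage. The following is a minimal sketch of that general technique, not the tooling we are building; `in_rollout`, `customer_id`, and `task_name` are hypothetical names.

```python
import hashlib


def in_rollout(customer_id: str, task_name: str, percent: int) -> bool:
    """Return True if this customer falls inside the rollout percentage."""
    # Hash the task and customer together so each task gets its own,
    # stable assignment of customers to buckets.
    digest = hashlib.sha256(f"{task_name}:{customer_id}".encode()).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # value in 0..99
    return bucket < percent
```

Because the bucket for a given customer and task never changes, raising `percent` from 5 to 50 keeps every customer from the initial group in the rollout and only adds new ones.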