Many of our systems at work retrieve data from third-party websites through API calls. When a call fails, it leaves a gap in the data, which leads to inaccurate reports. To mitigate this risk, the systems automatically retry unsuccessful calls. APIs typically limit how many calls can be made within a given time period, so to stay within these limits we increase the interval between attempts exponentially, doubling it after each failure. For example, if the first attempt fails, a system waits 5 seconds before the second attempt, 10 seconds before the third, 20 seconds before the fourth, and so on.
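The retry-and-wait scheme described above can be sketched as follows. This is a minimal illustration, not our production code: the names `Backoff` and `RetryAsync` and the optional `waits` list are hypothetical, and a real system would retry only on transient errors (timeouts, rate-limit responses) rather than on every exception. Rather than computing a power, it simply doubles the delay after each failure.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical helper sketching retry with exponential backoff;
// not the actual code from our systems.
public static class Backoff
{
    // Runs `operation`, retrying up to `maxRetries` times after a failure.
    // The wait starts at `initialDelayMs` and doubles after every failure,
    // e.g. 5000, 10000, 20000, ... ms for initialDelayMs = 5000.
    // If `waits` is supplied, each wait is recorded so the schedule can be inspected.
    public static async Task<T> RetryAsync<T>(
        Func<Task<T>> operation,
        int maxRetries,
        int initialDelayMs,
        List<int>? waits = null)
    {
        int delayMs = initialDelayMs;
        for (int retryCounter = 0; ; retryCounter++)
        {
            try
            {
                return await operation();
            }
            // Sketch only: catches every exception; real code should filter
            // for transient failures. Once retryCounter reaches maxRetries the
            // filter is false and the exception propagates to the caller.
            catch (Exception) when (retryCounter < maxRetries)
            {
                waits?.Add(delayMs);
                await Task.Delay(delayMs);
                delayMs *= 2; // double the interval before the next attempt
            }
        }
    }
}
```

For example, `await Backoff.RetryAsync(() => FetchReportAsync(), maxRetries: 4, initialDelayMs: 5000)`, where `FetchReportAsync` stands in for a hypothetical API call, would wait 5, 10, 20, and 40 seconds between its five attempts before giving up.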
Recently, whilst reviewing application logs, we noticed that the intervals were irregular. The waits between retries were 15, 30, and 60 seconds, then 0 seconds, instead of the expected 5, 10, 20, and 40 seconds. This pattern was puzzling, especially since the C# line that computes the interval looked correct:
int durationInMs = RetryIntervalInMs * (2 ^ retryCounter);
The whole team came together to investigate but could not find the cause, and we went into the weekend without a solution.
That weekend, while writing C code for an embedded program, I was typing a call to the pow() function when I had an epiphany.
In C#, the ^ operator does not raise a value to a power, as our code assumed; on integers, it performs a bitwise exclusive OR (XOR). Like C, C# has no exponentiation operator at all, which is why C needs pow() and C# needs Math.Pow(). All of us had mixed up our languages, some remembering BASIC, others SQL. Astonishing!
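To see what the original line actually computed, this sketch evaluates the buggy and the intended expressions for the first four retries. It assumes `RetryIntervalInMs` is 5000 (matching the 5-second example) and that `retryCounter` starts at 0; neither is stated for the logged system, so the figures illustrate the mechanism rather than reproduce the logs exactly. What does not depend on those assumptions is that `2 ^ 2` is 0, so whenever `retryCounter` is 2 the wait collapses to nothing. The class name `XorVsPower` is hypothetical.

```csharp
using System;

// Hypothetical class comparing the buggy and the intended calculation.
// RetryIntervalInMs = 5000 is an assumption based on the 5-second example.
public static class XorVsPower
{
    public const int RetryIntervalInMs = 5000;

    // The original line: in C#, 2 ^ retryCounter is a bitwise XOR, not a power.
    public static int Buggy(int retryCounter) =>
        RetryIntervalInMs * (2 ^ retryCounter);

    // The intended calculation: 2 raised to the power retryCounter.
    public static int Intended(int retryCounter) =>
        RetryIntervalInMs * (int)Math.Pow(2, retryCounter);

    public static void Print()
    {
        for (int n = 0; n < 4; n++)
        {
            // 2 is 0b10, so XOR with n flips the 2s bit of n:
            // 2 ^ 0 = 2, 2 ^ 1 = 3, 2 ^ 2 = 0, 2 ^ 3 = 1
            Console.WriteLine(
                $"retry {n}: 2 ^ {n} = {2 ^ n}, buggy {Buggy(n) / 1000} s, intended {Intended(n) / 1000} s");
        }
    }
}
```

Calling `XorVsPower.Print()` prints:

```
retry 0: 2 ^ 0 = 2, buggy 10 s, intended 5 s
retry 1: 2 ^ 1 = 3, buggy 15 s, intended 10 s
retry 2: 2 ^ 2 = 0, buggy 0 s, intended 20 s
retry 3: 2 ^ 3 = 1, buggy 5 s, intended 40 s
```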
On the following Monday, after cursing ourselves and having a good laugh, we quickly fixed the bug.
int durationInMs = RetryIntervalInMs * (int)Math.Pow(2, retryCounter);
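Math.Pow works on doubles, so this fix converts to floating point and back. Because the base is 2, an integer left shift gives the same result, and computing in a `long` makes it easy to clamp the value before it overflows an `int` (with a 5000 ms base, 5000 × 2^19 already exceeds `int.MaxValue`). The sketch below assumes a 5000 ms base and a hypothetical cap, `MaxRetryIntervalInMs`, of 5 minutes; neither value, nor the `RetryInterval` class, comes from our code.

```csharp
using System;

// Hypothetical integer-only version of the fixed calculation, with a cap.
public static class RetryInterval
{
    public const int RetryIntervalInMs = 5000;       // assumed base interval
    public const int MaxRetryIntervalInMs = 300_000; // hypothetical cap (5 minutes)

    // RetryIntervalInMs * 2^retryCounter, computed as a left shift on a long
    // and clamped to MaxRetryIntervalInMs.
    public static int DurationInMs(int retryCounter)
    {
        if (retryCounter < 0)
            throw new ArgumentOutOfRangeException(nameof(retryCounter));

        // Limiting the shift to 30 keeps the long far from overflowing;
        // a counter that large is clamped to the cap anyway.
        int shift = Math.Min(retryCounter, 30);
        long durationInMs = (long)RetryIntervalInMs << shift;
        return (int)Math.Min(durationInMs, MaxRetryIntervalInMs);
    }
}
```

With these values, `DurationInMs` returns 5000, 10000, 20000, and 40000 for retry counters 0 to 3, and stops growing at 300000 from counter 6 onwards.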