Building Bulletproof Periodic Tasks
Most modern software systems require execution of periodic tasks.
For example, “Trigger generation of monthly customer invoices on the 1st of each month at 9 AM”.
One venerable tool for handling that on Unix-like operating systems is cron, software that’s now 50 years old.
But these tasks can also be scheduled by Celery beat on Python stacks, Dagster orchestration jobs, or plenty of other solutions.
Here are two critical principles for creating reliable scheduled tasks.
Idempotency ♻️
Any periodic task should be designed to be relaunched multiple times without undesirable behaviour.
This implies enforcing rules like these when interacting with external systems:
- With a SQL database, check that a record does not exist before inserting it, or gracefully handle the SQL error raised on a unique index violation.
- With an external mail server, record that an email has already been sent to a given user, so that a relaunch can skip it.
- With a remote API, keep track of successful calls and distinguish between retryable failures (like network timeouts) and permanent ones (like validation errors).
Most importantly, ensure the code does not crash or create multiple data records if executed multiple times.
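As a minimal sketch of the SQL rule above, here is one way to rely on a unique index, using Python's built-in sqlite3 module. The `invoices` table, its columns, and the `create_invoice` helper are hypothetical names chosen for illustration, and other databases raise their own error types for the same violation.

```python
import sqlite3


def create_invoice(conn, customer_id, period):
    """Insert one invoice per (customer, period).

    Returns True if a row was created, False if it already existed.
    """
    try:
        # "with conn" commits on success and rolls back on error.
        with conn:
            conn.execute(
                "INSERT INTO invoices (customer_id, period) VALUES (?, ?)",
                (customer_id, period),
            )
        return True
    except sqlite3.IntegrityError:
        # The unique index rejected a duplicate: a previous run already
        # created this invoice, so there is nothing left to do.
        return False


# Hypothetical schema: the UNIQUE constraint is what makes relaunches safe.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE invoices ("
    " customer_id INTEGER NOT NULL,"
    " period TEXT NOT NULL,"
    " UNIQUE (customer_id, period))"
)

first = create_invoice(conn, 42, "2024-05")   # creates the row
second = create_invoice(conn, 42, "2024-05")  # duplicate, ignored
count = conn.execute("SELECT COUNT(*) FROM invoices").fetchone()[0]
```

Letting the database enforce uniqueness avoids the race that a separate "check, then insert" can have when two runs overlap.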
Why is it useful? 🤔
- If the task orchestrator was down at the time when the task should have been executed, it can be relaunched later without fear.
- If the task has crashed and you fixed it, you can relaunch it, even if half of the operations have already been done (maybe SQL records had been created, but the emails had not been sent yet).
- You can also execute the task before the scheduled time if it makes sense to do it for business reasons.
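The relaunch scenario can be sketched as follows. This is an illustration under stated assumptions, not a definitive implementation: `RetryableError`, `PermanentError`, `send_pending_emails` and `fake_send` are hypothetical names, and the in-memory `sent_log` set stands in for a durable store such as a database table.

```python
class RetryableError(Exception):
    """Transient failure, e.g. a network timeout; worth trying again later."""


class PermanentError(Exception):
    """Failure that will not fix itself, e.g. a validation error."""


def send_pending_emails(recipients, already_sent, send):
    """Send one email per recipient, skipping those recorded in already_sent.

    Returns (retry_later, rejected) so the caller can report on both.
    """
    retry_later, rejected = [], []
    for recipient in recipients:
        if recipient in already_sent:
            continue  # done by a previous run
        try:
            send(recipient)
        except RetryableError:
            retry_later.append(recipient)
        except PermanentError:
            rejected.append(recipient)
        else:
            already_sent.add(recipient)  # record success only
    return retry_later, rejected


# Fake mail sender simulating an outage that affects one recipient.
delivered = []
outage = {"bob@example.com"}


def fake_send(recipient):
    if recipient in outage:
        raise RetryableError("timeout")
    if "@" not in recipient:
        raise PermanentError("invalid address")
    delivered.append(recipient)


recipients = ["alice@example.com", "bob@example.com", "not-an-address"]
sent_log = set()  # assumption: persisted somewhere durable in a real system

first_run = send_pending_emails(recipients, sent_log, fake_send)
outage.clear()  # the outage is over
second_run = send_pending_emails(recipients, sent_log, fake_send)
```

The second run delivers only what the first one missed: nobody receives a duplicate email, and the invalid address is reported again rather than silently dropped.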
Increased Execution Frequency 🚤
Any periodic task should be scheduled more frequently than necessary.
Of course, this requires the tasks to be idempotent.
For example, the task in charge of sending customer invoices could run every day at 9 AM instead of only on the 1st of the month.
Then, in normal circumstances, the task does nothing except on the 1st of the month.
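With cron, this would mean scheduling `0 9 * * *` (every day at 9 AM) instead of `0 9 1 * *` (the 1st of each month at 9 AM). The task itself then has to decide whether there is anything left to do. A minimal sketch, assuming a hypothetical `run_daily_invoicing` function and an in-memory set of already invoiced periods in place of a real database:

```python
from datetime import date


def run_daily_invoicing(today, invoiced_periods, generate):
    """Generate invoices for the current month unless already done.

    Returns True if invoices were generated during this run.
    """
    period = today.strftime("%Y-%m")
    if period in invoiced_periods:
        return False  # normal case on most days: nothing to do
    generate(period)
    invoiced_periods.add(period)
    return True


# Simulation: the orchestrator was down on May 1st, so the first run
# happens on the 2nd and catches up without anyone intervening.
generated = []
invoiced = set()  # assumption: stored in a database in a real system
results = [
    run_daily_invoicing(date(2024, 5, day), invoiced, generated.append)
    for day in (2, 3, 4)
]
```

In this simulation only the run on the 2nd generates invoices, and the following days are no-ops.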
So, why do such a thing? 🤔
- If the orchestrator was down at the time when the task had to run, the task will be done the next day, even if nobody notices the problem or is able to intervene.
- It lets you detect a bug sooner after a code change: the next day in our example, instead of the 1st of the following month.
- Finally, it forces you to ensure tasks are idempotent.
In practice, make sure these frequent executions do not put significant load on your system or flood it with unwanted logs.
Conclusion
By embracing both idempotency and increased execution frequency, you transform fragile scheduled tasks into resilient system components. These principles not only protect against infrastructure failures but also improve maintainability and reduce operational stress. Remember: the most reliable periodic tasks are those designed to gracefully handle the unexpected.
By Thomas Martin