In a perfect world, every API call would succeed, every database write would commit, and every piece of our automated workflows would run flawlessly. But we don't code in a perfect world. We code in the real world, where networks are fickle, services have downtime, and unexpected inputs are a fact of life. For any serious automation, success isn't defined by the absence of errors, but by how gracefully we handle them when they inevitably occur.
This is especially true when building with atomic actions. At action.do, we see an atomic action as the fundamental building block of any robust agentic workflow. It's a single, indivisible task that either succeeds completely or fails completely, preventing dangerous, partially-completed states.
But what happens when an action does fail? Stopping the entire process isn't always the right answer. A well-designed error-handling strategy is the difference between a brittle script and a resilient, production-grade automation. Let's explore the essential strategies for managing atomic action failures.
The "atomic" nature of an action.do action is its superpower. It guarantees that you won't, for example, charge a customer's credit card but fail to create their user account. If any part of the action fails, the action as a whole fails.
This is a fantastic starting point, but it's not the end of the story. Your workflow orchestrator now needs to know what to do with that failure signal. Should it halt the whole workflow, retry the action, fall back to an alternative, or set the task aside for later handling?
Choosing the right strategy depends entirely on the context of the action. Thoughtful error handling turns your workflow automation from a simple sequence of tasks into an intelligent, self-healing system.
When an atomic action throws an error, you have several powerful strategies at your disposal.
Sometimes, the safest and most sensible thing to do is to stop everything immediately. If an action is a critical dependency for every subsequent step, there's no point in continuing.

    // Inside an action handler...
    try {
      const payment = await paymentGateway.charge(amount, card);
      return { success: true, transactionId: payment.id };
    } catch (error) {
      console.error('Critical payment failure:', error);
      // This will stop the workflow immediately.
      // The original error is attached as `cause` so it is not lost.
      throw new Error('Payment processing failed.', { cause: error });
    }

Many errors are transient. A temporary network hiccup, an API rate limit, or a momentary service outage can cause an action to fail. In these cases, simply trying again after a short delay is often all that's needed. Exponential backoff is a retry strategy in which the wait time grows after each failed attempt (typically doubling), so a struggling service gets progressively more room to recover instead of being hit by a burst of immediate retries.
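As a rough illustration of retrying with exponential backoff, here is a minimal sketch. The names `withRetry`, `backoffDelay`, and `RetryOptions` are hypothetical helpers written for this example, not part of the action.do SDK, and the doubling factor and delay cap are assumptions you would tune for your own services.

```typescript
// Hypothetical retry helper for illustration; not an action.do API.
interface RetryOptions {
  maxAttempts: number; // total attempts, including the first one
  baseDelayMs: number; // wait before the first retry
  maxDelayMs: number;  // upper bound on any single wait
}

// Wait before retry number `retry` (1-based): base * 2^(retry - 1), capped.
function backoffDelay(retry: number, opts: RetryOptions): number {
  return Math.min(opts.baseDelayMs * 2 ** (retry - 1), opts.maxDelayMs);
}

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Runs `operation` until it succeeds or `maxAttempts` is exhausted.
// `wait` is injectable so the delays can be observed or skipped in tests.
async function withRetry<T>(
  operation: () => Promise<T>,
  opts: RetryOptions,
  wait: (ms: number) => Promise<void> = sleep,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= opts.maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      // Only wait if another attempt is coming.
      if (attempt < opts.maxAttempts) {
        await wait(backoffDelay(attempt, opts));
      }
    }
  }
  // Every attempt failed: surface the last error to the caller.
  throw lastError;
}
```

With `baseDelayMs: 100` and `maxDelayMs: 1000`, the waits between attempts would be 100 ms, 200 ms, 400 ms, 800 ms, and then 1000 ms for every retry after that. In practice a small random jitter is often added to each delay so that many clients do not retry in lockstep.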
What if a primary service fails, but you have a backup plan? Graceful degradation allows your workflow to continue with a slightly reduced or different functionality, which is often far better than failing entirely.

    import { action } from '@do-sdk/core';
    import { premiumEmailService, basicSmtpService } from './services';

    export const sendWelcomeEmail = action({
      name: 'send-welcome-email',
      description: 'Sends a welcome email to a new user, with a fallback.',
      inputs: {
        to: { type: 'string', required: true },
        name: { type: 'string', required: true }
      },
      handler: async ({ inputs }) => {
        const { to, name } = inputs;
        try {
          // Attempt 1: Use the premium service
          const result = await premiumEmailService.send({ to, name });
          return { success: true, service: 'premium', messageId: result.id };
        } catch (error) {
          console.warn('Premium email service failed. Attempting fallback...', error);
          // Attempt 2: Use the basic service
          try {
            const fallbackResult = await basicSmtpService.send({ to, name });
            // The action succeeded, just not in the ideal way.
            return { success: true, service: 'fallback', messageId: fallbackResult.id };
          } catch (fallbackError) {
            console.error('All email services failed.', fallbackError);
            // If the fallback also fails, then we fail fast.
            throw new Error(`Unable to send welcome email to ${to}.`);
          }
        }
      },
    });

For tasks that are important but not critically urgent, a Dead-Letter Queue (DLQ) is an excellent pattern. If an action fails even after several retries, instead of terminating the workflow or losing the task forever, you can move the task's data to a separate queue, the DLQ, where it can be inspected and reprocessed later.
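The idea can be sketched in a few lines. Everything below is a hypothetical illustration: `DeadLetterQueue`, `DeadLetter`, and `runOrDeadLetter` are names invented for this example rather than action.do APIs, and the in-memory array stands in for whatever durable queue or table you would actually use.

```typescript
// Hypothetical types for illustration; not part of any action.do API.
interface DeadLetter<T> {
  task: T;          // the original task data, kept for later reprocessing
  reason: string;   // message from the last error
  attempts: number; // how many times the task was tried
  failedAt: string; // ISO timestamp of when the task was parked
}

// In-memory stand-in for a durable queue (a real one would persist entries).
class DeadLetterQueue<T> {
  private entries: DeadLetter<T>[] = [];

  push(entry: DeadLetter<T>): void {
    this.entries.push(entry);
  }

  list(): readonly DeadLetter<T>[] {
    return this.entries;
  }
}

// Tries `handler` up to `maxAttempts` times. On final failure the task is
// parked in the DLQ and `false` is returned instead of throwing, so the
// surrounding workflow can carry on.
async function runOrDeadLetter<T>(
  task: T,
  handler: (task: T) => Promise<void>,
  dlq: DeadLetterQueue<T>,
  maxAttempts = 3,
): Promise<boolean> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      await handler(task);
      return true;
    } catch (error) {
      lastError = error;
    }
  }
  dlq.push({
    task,
    reason: lastError instanceof Error ? lastError.message : String(lastError),
    attempts: maxAttempts,
    failedAt: new Date().toISOString(),
  });
  return false;
}
```

Returning `false` rather than throwing is a deliberate choice in this sketch: the caller can tell that the task was deferred, while the rest of the workflow keeps running. A separate process can then read the parked entries, alert someone, or replay them once the underlying problem is fixed.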
Errors are not just problems to be fixed; they are opportunities to make your systems more resilient. By thinking through failure scenarios at the level of the atomic action, you can build intelligent, self-healing agentic workflows that are robust by design.
The action.do framework encourages this kind of up-front thinking. By encapsulating logic into small, testable, and reusable actions, you can apply the right error-handling strategy for each specific task, leading to more predictable and reliable automation.
Ready to move beyond brittle scripts? Get started with action.do and turn your complex processes into simple, repeatable, and resilient tasks.