In a perfect world, every API call returns a 200 OK, every network connection is stable, and every third-party service has 100% uptime. But we don't build for a perfect world; we build for the real one. In the real world, things fail. The true test of a robust automation system isn't whether it works when everything goes right, but how it behaves when things go wrong.
This is where orchestration shines. While an action.do is the perfect atomic building block for a single task, a workflow.do is the conductor that makes sure the whole symphony plays on, even if one instrument hits a sour note. It's designed with failure in mind, providing powerful tools for advanced error handling and intelligent retry logic right out of the box.
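Imagine a standard user-onboarding script. It’s a simple, linear process:
1. Create a new user record in the database.
2. Charge the subscription fee through a payment gateway.
3. Add the user to the CRM.
4. Send a welcome email.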
What happens if the payment gateway (Step 2) is temporarily down? The script crashes. You're left with a user in your database but no payment, no CRM entry, and no welcome email. Your data is in an inconsistent state, and your new user is left in limbo. This brittle approach is a maintenance nightmare and a poor user experience.
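To make that failure mode concrete, here is a minimal sketch of the naive script in TypeScript. The in-memory arrays and the stub functions (createUser, chargeFee, addToCrm, sendWelcomeEmail) are hypothetical stand-ins, not workflow.do actions, and the payment stub is hard-wired to fail so the partial state is easy to observe.

```typescript
// Hypothetical in-memory stand-ins for the database, CRM, and mail outbox.
const db: string[] = [];
const crm: string[] = [];
const outbox: string[] = [];

// Hypothetical stubs for each step; the payment gateway is simulated as "down".
const createUser = async (email: string) => { db.push(email); return email; };
const chargeFee = async (_userId: string): Promise<void> => {
  throw new Error('503: payment gateway unavailable');
};
const addToCrm = async (userId: string) => { crm.push(userId); };
const sendWelcomeEmail = async (email: string) => { outbox.push(email); };

// The naive, linear onboarding script: no retries, no error handling.
async function onboard(email: string): Promise<void> {
  const userId = await createUser(email); // Step 1 succeeds and writes to the database
  await chargeFee(userId);                // Step 2 throws, so the script stops here
  await addToCrm(userId);                 // Step 3 never runs
  await sendWelcomeEmail(email);          // Step 4 never runs
}
```

After `onboard('ada@example.com')` rejects, `db` holds one user while `crm` and `outbox` are still empty: exactly the inconsistent state described above.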
Robust automation requires more than just executing steps; it requires managing state, handling exceptions, and defining what to do when the unexpected happens.
workflow.do elevates your automation from a fragile script to a resilient, self-healing service. Instead of just failing, it allows you to define a clear, coded policy for dealing with transient errors.
The most common and effective strategy for handling temporary outages is a "retry with exponential backoff." The idea is simple: if an action fails, wait a bit and try again. If it fails again, wait longer before the next attempt. This gives the failing service time to recover without overwhelming it with constant requests.
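As a rough illustration of the mechanics, the sketch below implements retry with exponential backoff in plain TypeScript. It is not how workflow.do implements workflow.retry; retryWithBackoff and BackoffOptions are hypothetical names, with option names chosen only to mirror the retry options in the workflow.do example later in this post.

```typescript
// Hypothetical options, mirroring the ones passed to workflow.retry in the example.
interface BackoffOptions {
  retries: number;       // extra attempts allowed after the first failure
  delay: number;         // wait in milliseconds before the first retry
  backoffFactor: number; // multiplier applied to the wait after each retry
}

function sleep(ms: number): Promise<void> {
  return new Promise<void>((resolve) => setTimeout(resolve, ms));
}

// A library-agnostic sketch of retry with exponential backoff (not the @do-sdk/core API).
async function retryWithBackoff<T>(fn: () => Promise<T>, options: BackoffOptions): Promise<T> {
  let wait = options.delay;
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (error) {
      // Retry budget exhausted: surface the last error to the caller.
      if (attempt >= options.retries) throw error;
      await sleep(wait);
      wait *= options.backoffFactor; // waits grow as delay, delay*f, delay*f^2, ...
    }
  }
}
```

Called as `retryWithBackoff(() => chargeFee(...), { retries: 3, delay: 1000, backoffFactor: 2 })`, it would make up to four attempts, waiting 1s, 2s, and 4s between them.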
Let's see how to implement this in a workflow.do definition. Building on our onboarding scenario, we can wrap the fallible payment action in a call to workflow.retry.
```typescript
import { workflow, action } from '@do-sdk/core';
// Your predefined actions (sendPaymentFailedNotification is used by the fallback below)
import { createUser, chargeFee, sendWelcomeEmail, sendPaymentFailedNotification } from './actions';

const userSignupWorkflow = workflow.create({
  id: 'user-signup-workflow',
  description: 'Handles new user signups with resilient payment processing.',
  execute: async ({ userDetails }) => {
    // This step is unlikely to fail, but could also be wrapped in a try/catch
    const user = await createUser.execute(userDetails);

    try {
      // Attempt the payment action with a built-in retry policy
      const paymentResult = await workflow.retry(
        () => chargeFee.execute({ userId: user.id, amount: 29.99 }),
        {
          retries: 3,      // Attempt up to 3 times AFTER the initial failure
          delay: 1000,     // Wait 1 second before the first retry
          backoffFactor: 2 // Double the wait time on each subsequent retry (1s, 2s, 4s)
        }
      );
    } catch (error) {
      // This block executes only if all retries fail
      console.error(`Permanent payment failure for user ${user.id}:`, error);

      // Execute a defined fallback plan
      await sendPaymentFailedNotification.execute({ email: userDetails.email });

      // Fail the workflow gracefully, leaving the system in a known state
      throw new Error('Payment processing failed permanently.');
    }

    // This part only runs if the 'try' block succeeds
    await sendWelcomeEmail.execute({
      email: userDetails.email,
      name: userDetails.name
    });

    return { success: true, userId: user.id, status: 'active' };
  }
});
```
In this example, if the chargeFee action fails, the workflow doesn’t immediately crash. It automatically waits one second and tries again. If that fails, it waits two seconds, then four. Only after the initial attempt and all three retries have failed does it enter the catch block. This simple addition transforms a brittle process into one that can automatically recover from temporary service glitches.
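The wait schedule follows directly from the options. Assuming workflow.retry applies backoffFactor exactly as the code comments describe, and writing d = delay, f = backoffFactor, r = retries, the nth retry waits w_n, and the total waiting time W is a geometric sum (valid for f ≠ 1; the time spent inside each attempt is not counted):

$$
w_n = d \cdot f^{\,n-1}, \qquad n = 1, \dots, r
$$

$$
W = \sum_{n=1}^{r} d \cdot f^{\,n-1} = d \cdot \frac{f^{r}-1}{f-1} = 1000 \cdot \frac{2^{3}-1}{2-1} = 7000 \text{ ms}
$$

So with these settings the workflow makes four attempts and spends seven seconds waiting in total, plus however long the attempts themselves take, before it reaches the catch block.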
Retries are for transient errors. But what about permanent ones? A standard try/catch block inside a workflow's execute function is your tool for defining explicit business logic for unrecoverable failures.
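This is where you turn a catastrophic error into a manageable business event. Instead of the process simply dying, you can codify your response, as the catch block above does: log the failure, notify the user through a fallback action, and fail the workflow deliberately so the system is left in a known state.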
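Retrying a permanent failure, such as a declined card, only delays the inevitable. One way to keep the two cases apart, sketched below under the assumption that your actions can signal which failures are permanent, is a hypothetical PermanentError type that short-circuits the retry loop and goes straight to the fallback. PermanentError, retryTransient, and chargeWithFallback are illustrative names, not part of @do-sdk/core.

```typescript
// Hypothetical error type for failures no retry can fix (e.g. a declined card).
class PermanentError extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'PermanentError';
    // Restore the prototype chain so `instanceof` also works when compiling to ES5.
    Object.setPrototypeOf(this, PermanentError.prototype);
  }
}

// Retry with exponential backoff, but only for errors that are not permanent.
async function retryTransient<T>(
  fn: () => Promise<T>,
  retries: number,
  delayMs: number,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (error) {
      // A permanent error, or an exhausted retry budget, goes straight to the caller.
      if (error instanceof PermanentError || attempt >= retries) throw error;
      // Wait delayMs, then 2x, 4x, ... before the next attempt.
      await new Promise<void>((resolve) => setTimeout(resolve, delayMs * 2 ** attempt));
    }
  }
}

// Business-level fallback, mirroring the catch block in the workflow example.
async function chargeWithFallback(
  charge: () => Promise<string>,
  notifyUser: (reason: string) => Promise<void>,
): Promise<string> {
  try {
    return await retryTransient(charge, 3, 1000);
  } catch (error) {
    await notifyUser(error instanceof Error ? error.message : String(error));
    throw new Error('Payment processing failed permanently.');
  }
}
```

The design choice is to let the error type, not the retry loop, decide what is retryable: a timeout still gets its retries, while a declined card reaches the notification step after a single attempt.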
By defining these paths in code, you are practicing Business as Code. Your company's policies for handling exceptions are no longer just a paragraph in a dusty handbook; they are living, breathing, and executable parts of your automated services.
This level of resilience is non-negotiable for building sophisticated agentic workflows. An autonomous agent tasked with managing customer subscriptions can't call a developer every time a payment API times out. It needs the intelligence to handle these situations on its own.
Stop writing scripts that break. Start building resilient Services-as-Software that gracefully handle the inevitable failures of the real world. With the advanced error handling and retry logic in workflow.do, you can build automations that you can trust to run, recover, and scale.