Machine learning pipelines can be notoriously fragile. A single, monolithic training script that handles everything from data fetching to model deployment is a common sight, but it's also a single point of failure. When a network glitch interrupts a download or a GPU runs out of memory halfway through, the entire process often crashes, forcing you to start over. Debugging is a nightmare, and recovering from failure is a manual, error-prone effort.
What if we could build AI and ML workflows with the same reliability and precision as modern distributed systems? The key is to stop thinking in monolithic scripts and start thinking in atomic steps.
This is where action.do comes in. It provides a simple, powerful execution layer to build robust, reliable workflows by composing them from single-purpose, atomic actions. It's the fundamental building block for bringing the "Business-as-Code" philosophy to your AI pipelines.
At its core, an atomic action is a single, indivisible operation. It operates on an all-or-nothing basis: it either completes successfully, or it fails entirely, leaving no messy partial state behind. This concept is the bedrock of reliable systems.
In the context of a model training pipeline, this could be:

- Fetching a specific version of a dataset
- Preprocessing the raw data (preprocess-data)
- Running a single training epoch (train-epoch)
- Saving a model checkpoint (save-checkpoint)
- Evaluating the model against a validation set
By breaking a complex process down into these granular actions, you gain unprecedented control and visibility.
action.do is a platform designed to do one thing exceptionally well: execute, audit, and repeat atomic actions. When you build your AI workflow on this foundation, you unlock three critical properties.
First, actions are isolated and resumable. Each action is a self-contained unit: if the train-epoch action fails due to a hardware issue, the preceding preprocess-data action remains complete and valid. You don't have to start from scratch; you can simply investigate and retry the failed step.
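To make that concrete, here is a minimal sketch of a retry helper built on the same execute call used later in this post. The helper itself, its backoff schedule, and the maxAttempts parameter are illustrative choices, not part of the action.do API:

```typescript
import { Do } from '@do-sdk/core';

const client = new Do(process.env.DO_API_KEY);

// Retry a single atomic action with exponential backoff.
// Because the action is atomic, a failed attempt leaves no partial
// state behind, so retrying is always safe.
async function executeWithRetry(
  name: string,
  params: Record<string, unknown>,
  maxAttempts = 3
) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const { result, error } = await client.action.execute({ name, params });
    if (!error) return result;
    console.warn(`Attempt ${attempt} of ${name} failed:`, error);
    // Exponential backoff: 1s, 2s, 4s, ...
    await new Promise((r) => setTimeout(r, 1000 * 2 ** (attempt - 1)));
  }
  throw new Error(`${name} failed after ${maxAttempts} attempts`);
}
```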
Second, every action executed via action.do is a verifiable event. The platform logs which action was called, what parameters were used, and whether it succeeded or failed. This creates an immutable audit trail, making it easy to trace the lineage of a model, debug failures, and ensure compliance. No more guessing which script parameters were used for a specific training run.
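As a rough illustration, each entry in that trail contains something like the record below. The field names are assumptions for the sake of the sketch, not action.do's actual schema:

```typescript
// Illustrative shape of an audit record; field names are assumptions,
// not action.do's actual schema.
interface ActionAuditRecord {
  actionName: string;               // e.g. 'train-epoch'
  params: Record<string, unknown>;  // the exact parameters used
  status: 'succeeded' | 'failed';
  startedAt: string;                // ISO-8601 timestamp
  finishedAt: string;
  error?: string;                   // populated on failure
}
```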
Third, actions are idempotent. Why does that matter? Idempotency ensures that executing the same action multiple times with the same parameters has the same effect as executing it once. Imagine your connection drops while saving a model checkpoint. With an idempotent save-checkpoint action, you can safely retry the operation without fear of corrupting your storage or creating duplicate, conflicting model files. Idempotency is crucial for building self-healing systems that recover from transient failures without causing unintended side effects.
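Here is a minimal sketch of what makes a save-checkpoint action idempotent: the storage key is fully determined by the parameters, so a retry is a no-op rather than a duplicate write. The CheckpointStore interface is hypothetical; any object store with an existence check (S3, GCS, etc.) works the same way:

```typescript
// Hypothetical storage interface for the sake of the sketch.
interface CheckpointStore {
  exists(key: string): Promise<boolean>;
  put(key: string, data: Buffer): Promise<void>;
}

// Idempotent save: the key is derived entirely from the parameters,
// so retrying with the same runId and epoch has no further effect.
async function saveCheckpoint(
  store: CheckpointStore,
  runId: string,
  epoch: number,
  weights: Buffer
): Promise<void> {
  const key = `checkpoints/${runId}/epoch-${epoch}`;
  if (await store.exists(key)) {
    return; // Already saved; nothing to do on retry.
  }
  await store.put(key, weights);
}
```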
Let's move beyond theory. Imagine building a training loop. Instead of a giant function, you can compose it from atomic actions using the .do SDK.
```typescript
import { Do } from '@do-sdk/core';

// Initialize the .do client with your API key
const a = new Do(process.env.DO_API_KEY);

// Define the parameters for this training run
const trainingRunId = 'resnet-v2-run-123';
const datasetVersion = 'imagenet-subset-v1.2';

for (let i = 1; i <= 10; i++) {
  console.log(`Starting Epoch ${i}...`);

  // Execute an atomic training epoch
  const { result, error } = await a.action.execute({
    name: 'train-epoch',
    params: {
      runId: trainingRunId,
      datasetVersion: datasetVersion,
      epochNumber: i
    }
  });

  if (error) {
    console.error(`Epoch ${i} failed:`, error);
    // The workflow can now decide to halt, notify an admin, or retry.
    break;
  } else {
    console.log(`Epoch ${i} succeeded. Accuracy: ${result.accuracy}`);
  }
}
```
Now, if Epoch 5 fails, the error object contains detailed context. The loop stops, but the results and checkpoints from Epochs 1 through 4 are safely stored and accounted for. You can debug the specific failure of Epoch 5 without having to re-run the entire process.
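Resuming is then just a matter of restarting the loop at the first unfinished epoch. A minimal sketch, reusing the client from above and assuming you track the last successful epoch yourself (the audit trail is another place to recover it from):

```typescript
// Resume from the first epoch that has not yet succeeded. The value of
// lastCompletedEpoch is assumed to come from your own bookkeeping.
const lastCompletedEpoch = 4; // epochs 1-4 finished before the failure

for (let i = lastCompletedEpoch + 1; i <= 10; i++) {
  const { result, error } = await a.action.execute({
    name: 'train-epoch',
    params: { runId: trainingRunId, datasetVersion, epochNumber: i }
  });
  if (error) {
    console.error(`Epoch ${i} failed again:`, error);
    break;
  }
  console.log(`Epoch ${i} succeeded. Accuracy: ${result.accuracy}`);
}
```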
action.do provides the individual steps, the building blocks, but that is only part of the story. A complete machine learning pipeline is an orchestrated sequence of these actions. This is where workflow.do comes in.
A workflow.do is a sequence or graph of action.do executions that, together, achieve a larger business outcome—like fully training, evaluating, and deploying a new model. You compose complex workflows from simple, reusable, and reliable atomic actions.
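The workflow.do syntax itself is beyond the scope of this post, but the conceptual shape is easy to sketch with the same primitives. The step names not used earlier in this post (fetch-dataset, evaluate-model, deploy-model) are illustrative:

```typescript
// Conceptual shape of a pipeline: an ordered sequence of atomic
// actions, each executed and audited individually. A workflow.do
// definition expresses this declaratively; this imperative sketch
// is just for illustration.
async function trainEvaluateDeploy(runId: string, datasetVersion: string) {
  const steps = [
    { name: 'fetch-dataset', params: { datasetVersion } },
    { name: 'preprocess-data', params: { runId, datasetVersion } },
    { name: 'train-epoch', params: { runId, epochNumber: 1 } },
    { name: 'evaluate-model', params: { runId } },
    { name: 'deploy-model', params: { runId } }
  ];
  for (const step of steps) {
    const { error } = await a.action.execute(step);
    if (error) throw new Error(`${step.name} failed: ${JSON.stringify(error)}`);
  }
}
```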
The true power of the .do platform is its flexibility. You aren't limited to a pre-canned set of operations. You can define your own custom actions by wrapping your existing Python scripts, microservices, or serverless functions.
Have a custom data augmentation script? Wrap it in an augment-data action. A specialized model evaluation function? Turn it into an evaluate-custom-metrics action. This allows you to turn your existing business and machine learning logic into reusable, auditable, and idempotent components that can be snapped together to build powerful agentic workflows.
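The registration call below is hypothetical (check the platform docs for the real shape), but it shows the idea: your existing logic becomes the handler for a named, parameterized action. The endpoint URL is a placeholder:

```typescript
// Hypothetical registration API; the real action.do interface for
// defining custom actions may differ. The point: existing logic
// becomes the handler for a named, auditable action.
a.action.define({
  name: 'augment-data',
  handler: async (params: { datasetVersion: string; strategy: string }) => {
    // Delegate to your existing script, microservice, or
    // serverless function here.
    const response = await fetch('https://augmenter.internal.example/run', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(params)
    });
    return response.json();
  }
});
```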
By embracing atomic actions, you transform fragile, opaque scripts into robust, observable, and resilient AI systems. Start building your next AI pipeline one atomic step at a time.