Automation in Real Life Looks Nothing Like the Demo
Every automation demo is clean. The data comes in correctly formatted. The API responds on the first call. The edge cases don't exist. The error states are never triggered. The result is elegant, fast, and satisfying to watch.
Production is different. In production, the barcode scanner sends a duplicate. The API returns a 429 at 2am. The record that was supposed to be a vinyl album turns out to be a promotional item with no catalog number and three conflicting identifiers on three different databases. The automation you built for a clean world now has to survive an unclean one.
This is the gap most automation projects fall into. Not because the builder was careless, but because the demo and the deployment are solving different problems.
The real work is in the edge cases
When I built the warehouse automation system for vinyl records, the happy path took about two weeks to design and implement. The next three months were spent on everything else. What happens when a record has no barcode? What happens when Discogs returns three pressing variants and we need to pick the right one? What happens when the image capture fails? What happens when eBay rejects a listing because the title is one character too long?
None of these edge cases are interesting to demo. All of them are critical to operate. A system that handles 80% of cases automatically and falls over on the remaining 20% hasn't reduced the work; it has redistributed it onto the worst possible cases.
Error handling is a design decision, not an afterthought
The question isn't whether your automation will encounter errors. It will. The question is what the system does when it does. Does it fail silently? Does it retry indefinitely? Does it route the failed item to a human review queue with enough context to resolve it quickly? Does it alert someone, and if so, who, and through what channel?
Good error handling design means deciding these questions before the first production failure, not after. It means building the retry logic, the fallback paths, and the notification system as first-class parts of the pipeline, not as patches applied after something breaks in production.
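A minimal sketch of that kind of first-class failure handling: bounded retries with jittered exponential backoff, then routing the item and its failure context to a review queue and a notifier. The function name, queue shape, and parameters are illustrative assumptions, not any specific framework's API:

```python
import random
import time


def process_with_retries(item, handler, max_attempts=3, base_delay=1.0,
                         review_queue=None, notify=None):
    """Run handler(item). Retry transient failures with exponential backoff;
    after the last attempt, route the item plus context to review_queue and
    call notify, rather than failing silently or retrying forever."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return handler(item)
        except Exception as exc:  # in practice, catch only transient error types
            last_error = exc
            if attempt < max_attempts:
                # jittered exponential backoff: ~1s, ~2s, ~4s, ...
                time.sleep(base_delay * 2 ** (attempt - 1) * (1 + random.random()))
    # Out of retries: fail loudly, with enough context to resolve quickly.
    context = {"item": item, "error": repr(last_error), "attempts": max_attempts}
    if review_queue is not None:
        review_queue.append(context)
    if notify is not None:
        notify(context)
    return None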
Data quality is an upstream problem
Most automation failures are data quality problems wearing automation clothes. The pipeline didn't break; the data it received didn't match the assumptions the pipeline was built on. A field that was supposed to be a number is a string. A required field is missing. A date is in a format that wasn't anticipated.
The solution isn't to make the automation more tolerant of bad data. It's to address data quality at the source: validation at ingestion, normalization as a first step, and explicit handling for records that don't meet the minimum standard for automated processing.
The demo is a hypothesis. The system is the proof.
I've stopped showing demos that don't include the error states. Not because I want to make things look harder than they are, but because the error states are where the real design decisions live. Anyone can build a happy path. The question is whether the system can survive contact with reality.
That's the difference between automation that works in a presentation and automation that works at 2am when nobody is watching.
