Twenty-five years of bad deploys
Every architect has the story. Here are mine, with the lessons attached.
The first one
It was 2003. I was twenty-six. The deploy script was a shell script. The shell script had a typo. The typo deleted the production database.
We restored from backup. The backup was three hours old. The CEO did not yell. He looked at me and said: "What are we going to do differently?" That is the question. That has always been the question.
The pattern
Every bad deploy I have ever been part of fits one of three patterns:
1. The human factor. Someone was tired. Someone was rushed. Someone did not read the runbook. The fix is process, not technology.
2. The system factor. The system permitted a state that should have been impossible. The fix is technology, not process.
3. The political factor. Someone shipped without telling the right people. The fix is culture, not technology or process.
The mistake junior architects make is treating all three as the same problem. They are not, and a fix aimed at the wrong one changes nothing.
The lesson
The job of the architect is to make the system forgiving. Not to make humans perfect. Humans will not be perfect. The system has to absorb their imperfection.
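Here is one way "forgiving" might look in code: a minimal Python sketch, not a description of any real deploy tool. Every name in it (`Target`, `Environment`, `drop_database`, `RefusedError`, the `confirm` and `dry_run` parameters) is hypothetical and invented for illustration. The destructive step does nothing by default, and against production it refuses to run unless the caller repeats the exact target name.

```python
from dataclasses import dataclass
from enum import Enum


class Environment(Enum):
    STAGING = "staging"
    PRODUCTION = "production"


class RefusedError(Exception):
    """Raised when the guard blocks a destructive action."""


@dataclass(frozen=True)
class Target:
    name: str
    environment: Environment


def drop_database(target: Target, confirm: str = "", dry_run: bool = True) -> str:
    """Hypothetical destructive step.

    Returns a description of what happened instead of touching a real database.
    """
    # The default is a dry run, so a forgotten flag does nothing harmful.
    if dry_run:
        return f"dry run: would drop {target.name}"

    # Production requires the caller to repeat the exact target name.
    if target.environment is Environment.PRODUCTION and confirm != target.name:
        raise RefusedError(
            f"refusing to drop {target.name}: confirmation does not match"
        )

    return f"dropped {target.name}"


if __name__ == "__main__":
    prod = Target("orders-prod", Environment.PRODUCTION)
    print(drop_database(prod))  # safe by default
    try:
        drop_database(prod, dry_run=False)  # a tired human forgets to confirm
    except RefusedError as err:
        print(err)
```

The tired or rushed human still makes the mistake. The difference is that the mistake now costs an error message instead of a database.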