
Extreme System Administration

About a year ago, after having been exposed to Extreme Programming at my new job, and observing how it spilled over into system administration practices, I scrawled out an outline for a manifesto in a paper notebook. I remember googling for "Extreme System Administration" and being disappointed by the number of relevant hits I got, but it seems I should have been searching for "Agile System Administration".

It turns out that this sort of thing is happening more from the developer direction, and seems to have turned into a movement called DevOps (thanks, Adam), although the grumpy sysadmin in me grimly suspects that some of these developer types (which I'm being paid to be at the moment, I should mention) are going to find the hard-real-time aspects of the job a bit of a surprise. Also, from the perspective of someone who prefers the phrase "Systems Programmer" on his business cards, it has seemed to me that a good sysadmin was already sort of between the pure-development and pure-operations cultures.

So anyway, in the spirit of "The useful things we were doing, let's do more of that"...

Obviously, from XP/Agile:

  • pairing/review

    A second pair of eyes should be nigh-mandatory

  • planning game/customer communication

    perhaps kanban?

  • daily standup meetings
  • refactor mercilessly/you aren't gonna need it/DON'T REPEAT YOURSELF
  • everything possible in version control

    (really, the key thing is that everything needs to be reversible)

  • testing/continuous integration

    (perhaps cruise control for your installer in a VM)

  • test, test, test

    If you're not testing it, you don't know it works
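The "everything needs to be reversible" point can be made concrete with a very small sketch. This is not version control, just a stand-in that shows the shape of a change you can undo; the function names and the ".orig" backup suffix are assumptions made up for the illustration.

```python
from pathlib import Path
import shutil


def apply_reversibly(path: Path, new_text: str) -> Path:
    """Overwrite path with new_text, keeping the old contents in a backup.

    Returns the backup path so the change can be undone later.
    """
    # Hypothetical naming scheme: keep the backup next to the original.
    backup = path.with_name(path.name + ".orig")
    shutil.copy2(path, backup)  # copy contents and metadata before touching anything
    path.write_text(new_text)
    return backup


def roll_back(path: Path, backup: Path) -> None:
    """Undo apply_reversibly by moving the backup over the changed file."""
    backup.replace(path)  # overwrites path and removes the backup in one step
```

A real setup would commit the file to version control instead, which also records who changed what and why.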

But also:

  • AUTOMATE

    If you do something twice, automate it the second time.

  • MONITOR

    If you're not monitoring it, it's down

  • TICKET

    If you're not keeping track of it, you'll forget it

  • INVENTORY

    If you don't know where it is, you don't have it
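One way to read MONITOR and INVENTORY together is sketched below: anything in the inventory with no recent successful check is reported as down, and anything reporting checks that is not in the inventory gets flagged. The function names, the five-minute threshold, and the data shapes are assumptions for the sake of the example.

```python
from datetime import datetime, timedelta


def host_status(inventory, last_ok, now, max_age=timedelta(minutes=5)):
    """Return {host: "up" or "down"} for every host in the inventory.

    A host is "up" only if a check succeeded within max_age of now.
    Missing or stale data counts as "down": if you're not monitoring it, it's down.
    """
    status = {}
    for host in inventory:
        seen = last_ok.get(host)
        fresh = seen is not None and now - seen <= max_age
        status[host] = "up" if fresh else "down"
    return status


def uninventoried(inventory, last_ok):
    """Hosts that report checks but are missing from the inventory."""
    return sorted(set(last_ok) - set(inventory))
```

The useful property is the default: a host nobody set up checks for shows up as a problem instead of silently being assumed fine.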

Emergencies, once dealt with, are an opportunity to ask

  • How could we have noticed this faster?
  • How can we make this impossible or correct it automatically?
  • How do we test for this?

User communications/Ticketing:

  • Support vs. Trouble

    These are distinct processes and need to be handled differently.

  • Support is a customer relationship thing:

    You need to figure out what the question is and answer it

  • Trouble means something needs doing
  • Tickets have potential many/many relationships
    • Many people may have the same problem, and a single ticket may turn out to be several different problems
    • Potentially exposing to users that other users are having similar problems tickles the tension between privacy and transparency: on one hand, users will be embarrassed; on the other hand, encouraging them to help each other is always good.
  • Every support ticket should end with "This was answer [link]" or "This is now answer [link]"
  • Users should be walked past the current problem board on the way to complaining; possibly given a "I am also having this problem" checkbox
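The many/many relationship between tickets and problems, and the "I am also having this problem" checkbox, might look something like the sketch below. The class and method names are invented for illustration; the board shows only counts, which is one possible way to lean toward privacy.

```python
from collections import defaultdict


class TicketBoard:
    """Many-to-many links between tickets and underlying problems."""

    def __init__(self):
        self._problems_by_ticket = defaultdict(set)
        self._tickets_by_problem = defaultdict(set)

    def link(self, ticket_id, problem_id):
        """Record that a ticket is about a problem (the "me too" checkbox)."""
        self._problems_by_ticket[ticket_id].add(problem_id)
        self._tickets_by_problem[problem_id].add(ticket_id)

    def problems_for(self, ticket_id):
        """All problems one ticket turned out to involve."""
        return sorted(self._problems_by_ticket.get(ticket_id, ()))

    def affected_count(self, problem_id):
        """How many tickets point at this problem."""
        return len(self._tickets_by_problem.get(problem_id, ()))

    def current_problems(self):
        """Problem board: (problem, ticket count), most reported first.

        Only counts are exposed, not who filed the tickets.
        """
        counts = ((p, len(t)) for p, t in self._tickets_by_problem.items())
        return sorted(counts, key=lambda pc: (-pc[1], pc[0]))
```

For example, after linking tickets "T1" and "T2" to a problem called "mail-down", the board would list that problem with a count of 2 for the next user to see before filing their own ticket.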

What Problem Do You Think You're Solving?

  • This is something that needs to have a clear answer for every task

Values

  • Repeatability
  • Transparency
  • Privacy
  • Communication
  • Accountability
  • no ad-hoc techniques
  • Rational Security (realistic threat models)
  • The Plural of Anecdote Is Not Data

Transparency vs. Communication

  • Transparency

    What your users find when they go looking

  • Communication

    How you tell users things or how you find things out from users

This is obviously very rough, and is only a slightly cleaned-up version of my notes. I don't actually claim that this is new, or interesting, but it's something that is more useful outside my head than inside.

Also, I really need to get these posts done before midnight...

I buy into everything displayed above but the "put everything into source control". I never, EVER want to see binary files anywhere in source control.

If your version control can't handle it sanely, you need better version control. But what I was trying to say is that you need to be able to come back six months later, figure out what you did, and hopefully roll it back. And you need to do that the same way for everything. I have a tool that looks about correct, right here...

(Also, this is a manifesto, not precise instructions)

Creative Commons License
This work by Karl Ramm is licensed under a Creative Commons Attribution 3.0 United States License.