I am involved in some official project where I got a chance to set up web infrastructure. This was the first time I did any kind of system administration work so it has been an adventure.
Here are some lessons I learned as I did and continue to do this work.
Be ready to invest lots of time reading up documentation. For web applications:
Having this knowledge handy can make recovery easier and less stressful. For example, data errors are especially costly and a bug might make you desperate as you are looking for some magical band-aid on StackOverflow in the middle of an outage.
Common thinking goes something like “We don’t have time to read documentation, let’s release our product first”. And this is fine if you want to throw away your code and your company in a couple of years. If you are in for some long-term project though, the errors you commit here pile on and on. Think of it as a high leverage investment. Reading up about this stuff will help to deal with emergencies much easier because you have the knowledge of internal workings on your fingertips and you already know a few failure modes of the system.
Scenario: I just ran a DELETE
SQL query without qualifying WHERE
what to do?!
Oh, just stop the database, do a Point-In-Time-Recovery (PITR) and restart it
again. This will cause downtime but still better than losing data.
What I mean by that is to use the most solid, middle of the road technologies possible. This point has some nuances. The risk you can afford depends on the component you are talking about. If we are talking about a tool that measures system resources like CPU, RAM, and NIC traffic, you can probably experiment a bit. But for something like your primary data store, you absolutely can’t afford to take risks because if your data gets corrupted/deleted the recovery might be impossible. The idea here is the less the cost of your failures, the more risk you can afford.
While I was doing infra, some people suggested Docker. And Docker looks good, judging by its hype. I didn’t use it because I don’t know how docker replaces processes and how security works in Docker (Docker might not be replacing processes at all).
Knowing how kernel operates lets you know
Another less concrete benefit is it can make you spend less money on your infra. This is particularly important for companies that hope to make a profit anytime soon. And contrary to many people’s opinion, saving money on infra is not time-consuming. It comes a lot from intuitive judgment as you get more and more familiar with this subject.
Having your infrastructure automated using tools like Chef, Puppet, or Ansible brings these benefits.
If you have your infra automated, the only hard work you will do is whenever you
are setting up some component that is different from your existing infra.
Another way to say that is, going from 0
to 1
is difficult but going from
1
to n
is smooth as silk.
This is obvious to anyone who has written a Hello, World
program. But it’s
still pretty important to have a comprehensive logging system and it brings
in some benefits.
The more verbose your logs are, the better. The ideal set up is having logs sent to a different centralized server with a beefy disk, and have all logs time ordered. Just writing this made my mouth water (yeah, I haven’t done it yet).
As a culture, we like to burn our midnight oil. Here’s the deadline, few more steps and you will be done. Or maybe your boss has forced you in lockdown mode. Or maybe the company will sink if you can’t get this done in the next couple of hours.
It might still be fine if you are debugging something. However, doing the infra stuff while you are tired can be catastrophic.
Even if you are doing all the nice things like automating and versioning your infra, it is hard to review. In my experience, reviewing infra code is harder then application code. And infra bugs are harder to find in running systems and are much more costly.
In my opinion, doing it while your mind is functioning at an optimal level shows your respect towards infra work.
I still have a long way to go I will definitely think about these things differently as I get more experienced. But having my thoughts persisted somewhere will help in retrospection later.