You know the big problem with cloud computing? It is TOO EASY to spin up servers. I have encountered cases where a sys admin racked up a huge bill on a POC because he spun up c3.2xlarge servers for a single static website, thinking that he would only be charged when users surf the website. For those who handle infrastructure, there is a temptation to just throw servers at the problem, because “don’t worry, it is cheap! Only $1.50 per hour on so-and-so cloud service!”. Yeah, right. I wonder how often you would patronise a Mr. Bean stall every hour for 2 weeks. It is like saying a credit card is cheap because the interest is less than 1% per day.
Therefore, when you set up an HA architecture for your company, it is important that you take care of the architecture’s cost. After all, to put it bluntly, you are the cost center. Below are some of the guidelines I impose on myself when setting one up.
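To put a number on that "only $1.50 per hour" line, here is a quick sketch of the arithmetic. The hourly rate is the figure quoted above; the function name and the two-server case are just for illustration.

```python
HOURS_PER_DAY = 24


def running_cost(hourly_rate: float, days: int, servers: int = 1) -> float:
    """Cost of leaving `servers` instances switched on around the clock for `days` days."""
    return hourly_rate * HOURS_PER_DAY * days * servers


# One server at the quoted $1.50 per hour, left on for 2 weeks.
print(f"1 server, 2 weeks:   ${running_cost(1.50, days=14):,.2f}")

# Two such servers left on for a 30-day month.
print(f"2 servers, 30 days:  ${running_cost(1.50, days=30, servers=2):,.2f}")
```

The meter runs whether or not anyone visits the site, which is exactly the part the "it is cheap" argument leaves out.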
1) Rule No 1: System must not collapse. Rule No 2: Don’t forget rule No 1
Never ever ever let your production system collapse! The reason is that the uptime of your servers is equivalent to cash flow for your company. Think about this: if your servers collapse, no users can use the app/service, which means no $$ for the company’s top line. It is a no-brainer. Hence don’t ever ever let it collapse.
Before rolling any system out, make sure you run stress tests and unit tests multiple times under multiple conditions. Try to stress test your system under a load of at least 20,000 requests per minute (rpm) to see if it can withstand it. Also, if possible, have human beta testers use the app under that load, so that you really know how fast or slow it performs under real conditions.
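As a rough sketch of what a 20,000 rpm test involves, the snippet below paces calls at a fixed rate using only the Python standard library. The names `run_load` and `send` are hypothetical: `send` stands in for whatever actually hits your endpoint (for example a small wrapper around `urllib.request.urlopen`). In practice you would probably reach for a dedicated load-testing tool; this only illustrates the pacing and the bookkeeping.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def _timed(send: Callable[[], bool]):
    """Run one request; return (success, latency in seconds)."""
    t0 = time.monotonic()
    try:
        success = bool(send())
    except Exception:
        success = False  # any exception counts as a failed request
    return success, time.monotonic() - t0


def run_load(send: Callable[[], bool], rpm: int, duration_s: float, workers: int = 50) -> dict:
    """Fire `send` at roughly `rpm` requests per minute for `duration_s` seconds."""
    interval = 60.0 / rpm               # seconds between request starts
    total = int(rpm * duration_s / 60)  # number of requests to issue
    start = time.monotonic()
    futures = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for i in range(total):
            # Sleep until this request's scheduled start time.
            delay = start + i * interval - time.monotonic()
            if delay > 0:
                time.sleep(delay)
            futures.append(pool.submit(_timed, send))
    results = [f.result() for f in futures]
    ok = sum(1 for success, _ in results if success)
    latencies = sorted(latency for _, latency in results)
    p95 = latencies[int(0.95 * (len(latencies) - 1))] if latencies else None
    return {"sent": total, "ok": ok, "failed": total - ok, "p95_s": p95}


if __name__ == "__main__":
    def fake_send() -> bool:
        # Stand-in for a real request; replace with a call to your staging endpoint.
        time.sleep(0.005)
        return True

    print(run_load(fake_send, rpm=20_000, duration_s=3))
```

At 20,000 rpm the interval between requests is 3 ms, so the thread pool has to be large enough to keep up when individual requests take longer than that.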
2) Under-provision app/web servers and over-provision the database service.
Gone are the days when 2 web servers and 1 database were sufficient. Now you have situations where there are over 30-40 web servers and 3-4 RDS instances. From the start, try to under-provision the web servers and over-provision the database, because it is easy to scale up your web servers but much harder to scale up your DB.
3) Push the autoscaling threshold of web apps to at least 70%
Don’t play safe and autoscale at 50% CPU. Try to push the autoscaling threshold into the 70%-80% CPU range. It gives better bang for the buck while still leaving some room for the CPU to go up further before another instance spins up and takes the load.
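To see why the threshold matters for cost, here is a back-of-envelope sketch. It assumes a simplified model where the CPU load is spread evenly across instances and the autoscaler keeps the average at or below the target; real autoscalers also have cooldowns and warm-up times, which this ignores. The function name and the demand figure are hypothetical.

```python
import math


def instances_needed(total_cpu_percent: float, target_percent: float) -> int:
    """Instances required so that average CPU stays at or below `target_percent`.

    `total_cpu_percent` is the whole workload expressed in CPU percent,
    e.g. 700 means it would keep 7 instances busy at 100% CPU.
    """
    if not 0 < target_percent <= 100:
        raise ValueError("target_percent must be in (0, 100]")
    return max(1, math.ceil(total_cpu_percent / target_percent))


demand = 700  # hypothetical workload: 7 instances' worth of CPU
for target in (50, 70, 80):
    print(f"scale at {target}% CPU: {instances_needed(demand, target)} instances")
# scale at 50% CPU: 14 instances
# scale at 70% CPU: 10 instances
# scale at 80% CPU: 9 instances
```

Under this model, moving the threshold from 50% to 70% serves the same load with 10 instances instead of 14.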
4) Calculate the cost of the setup using the tools available.
Both AWS and GCE have their own calculators to help you estimate your cost, so use them to check whether your setup is busting your budget. AWS also has billing alerts. Use them. Kind of disappointing that, as far as I know, GCE does not have a similar service.
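If you want a second check on top of whatever alerts your provider gives you, a simple projection of month-end spend from month-to-date spend is easy to sketch. The helpers below are hypothetical, not any cloud provider's API; the spend and budget figures are made up, and the projection assumes the daily burn rate so far continues for the rest of the month.

```python
import calendar
from datetime import date


def projected_month_end_spend(spend_to_date: float, today: date) -> float:
    """Linear projection: daily burn rate so far, extended over the whole month."""
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    return spend_to_date / today.day * days_in_month


def over_budget(spend_to_date: float, today: date, budget: float) -> bool:
    """True if the projected month-end spend exceeds `budget`."""
    return projected_month_end_spend(spend_to_date, today) > budget


today = date(2024, 4, 10)  # example date; April has 30 days
spent = 400.0              # made-up month-to-date spend in dollars

print(projected_month_end_spend(spent, today))    # 1200.0
print(over_budget(spent, today, budget=1000.0))   # True
```

Feed it the month-to-date figure from your billing console once a day and you get an early warning well before the invoice arrives.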
5) Negotiate with relevant stakeholders for a soft/silent launch
It is good to have a silent launch a week before the marketing push, so that you can test the production setup under actual usage. Use the opportunity to monitor for errors which you can fix quickly before the users start flowing in.
6) Work closely with the sales/marketing department on the number of users expected to use the app/services.
Most likely you will be like “Huh? Why do I have to liaise with them? My work is overwhelming enough!”. Well, they take care of the top line and you take care of the bottom line. The two groups are what we call in Chinese 唇亡齿寒, which means that without the lips, the teeth will feel cold. So take the initiative, drop by their cubicle with a smile and say hello.
Also, it would be good to learn a thing or two about the sales funnel concept, which I am learning myself. It helps you guard against the sales/marketing staff’s tendency to oversell the product/services, which may cause you to over-provision your infrastructure, which in turn jacks up the cost, which in turn may mean a smaller bonus for you. And yes, that skill and your interaction with the business/marketing department will help you a lot if you ever want to run your own start-up.
In conclusion, managing the cost of an HA setup is actually more about office politics and human interaction than hard-core technical expertise. Do remember that you are a cost center, and management will always see you as such if you are handling the infrastructure. Just like you, I hate the stereotype/stigma. But let’s put a positive spin on it. If you can combine your technical competency with cost management and marketing skills, aka second skilling, you will put yourself in a good position for employment.
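As a rough illustration of how a sales funnel turns a marketing headline into a capacity number, the sketch below multiplies an audience through a series of conversion rates. Every figure here (the reach, the stage names, the rates, the peak share and the users per server) is made up for illustration; plug in whatever your sales and marketing colleagues actually tell you.

```python
import math


def funnel(reach: int, stages):
    """Walk `reach` people through each (stage_name, conversion_rate) in order."""
    remaining = float(reach)
    out = []
    for name, rate in stages:
        remaining *= rate
        out.append((name, round(remaining)))
    return out


def servers_needed(active_users: int, peak_share: float, users_per_server: int) -> int:
    """Servers for the share of active users expected online at the same time."""
    concurrent = round(active_users * peak_share)
    return max(1, math.ceil(concurrent / users_per_server))


# Hypothetical campaign: 1,000,000 people see the ad.
stages = [
    ("click the ad", 0.10),
    ("install the app", 0.20),
    ("active in launch week", 0.50),
]
steps = funnel(1_000_000, stages)
for name, count in steps:
    print(f"{name}: {count:,}")

active = steps[-1][1]  # 10,000 active users out of 1,000,000 reached
print("servers:", servers_needed(active, peak_share=0.10, users_per_server=250))  # servers: 4
```

The headline number was a million, but the number you provision for is a few thousand concurrent users at most, which is why it pays to ask for the conversion rates and not just the reach.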