FACT: Startups suck at scaling their web services.
I started thinking about this the other day. I 'think' it's because startups are more interested in building things than in operating them.
Lijit has been reasonably good at delivering a scalable web service, and before that the team had pretty good success at Raindance (our last gig) as well. It wasn't always that way. We had our share of problems, but over the last decade or so we figured out what to do and what not to do. As I thought about it, I figured it might make a good post.
Random thoughts on startups and their service delivery platforms (aka: you might be a green bean if...)
- Releasing your web service on a single computer is mental. What happens when that computer breaks? This is NOT an acceptable risk regardless of the level of investment in your company. You wouldn't release the service on a laptop, so why are you releasing it on a single server? If you can't afford at least 3 or 4 boxes, wait till you can. You only get one chance to make a first impression. For most services, 2 web/app servers and a live database + a slave backup database is the minimum. You can argue about this with me but I won't listen. If you have less than this, your company is a hobby - not a startup - and you are a green bean.
- Network design is NOT connecting your servers to the network. If you give the network design about as much thought as the power connections, you are a green bean. You have to look high and low to find a startup that invests in a load balancer, which is also mental. DNS Round Robin is NOT a load balancer. Go to eBay and buy yourself a load-balancing switch. This allows you to take a web/app server down and deal with issues without affecting users. And better yet, it allows your web servers to crash without anyone noticing. We have been long-time users of Foundry Networks hardware load balancers. You can buy old models on eBay and it's money well spent. Go find a network engineer to rent to help you design to your service expectations. Network gear is really reliable, so a rent-an-engineer can help you design and configure it, and then you can likely just let it run. I have also had a lot of success with letting the vendor help me design the network and then renting a friend to audit the design.
- Get a BlackBerry. This is not a vanity item, green bean. Register for a free monitoring service that will ping your site and email you when things GET SLOW (not just when they stop). In spite of your best efforts, someone will notice something weird and email you. You want to know this immediately - not the next time you check your email. I can't believe the number of startups out there where dudes go hiking for a day or two and then come back and see things aren't working. If this has happened in your company, you are a green bean. AT LEAST get a cheap old analog pager; they are a couple of bucks a month.
- SUCCESS is not an excuse. I don't care that your site is very popular; if it's slow or doesn't work, it sucks. I'm not going to buy the "we are so popular" excuse if you have one box. If you are slow and have one box, you are a green bean running a hobby on the side.
- Decouple your service from the database whenever possible - seriously. Almost every bad thing that happens to your site or service has something to do with connections to the database: not enough, too many, exhausted pool, memory leaks, slow queries, etc. When possible, build queues and flat files that can recover if you need to reboot the database box or switch between a primary and a backup database. Likely, not everything has to be written or read in real time. Use this to your advantage - I'll take slightly stale data over no service every day of the week.
- Never, ever - ever - ever - ever - use long-running processes. Obviously you can't avoid the database, but the rest you largely can. Use Apache to fork services with short lifetimes. Make a tee shirt that says "I hate long-running processes". Everything leaks memory, green bean.
- Make sure one person is responsible for keeping the service running. A wise man once told me: if everyone is responsible, no one is responsible. You can pass responsibility around, but only one person should be 'responsible' at a time. If you have 15 employees and don't have a dedicated "operations person" yet, you, my friend, are a green bean.
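To make the monitoring bullet above a little more concrete, here is a minimal sketch of a check that treats "slow" as its own alarm state rather than only reporting "down". It is not any particular monitoring service's code; the thresholds, the function names, and the `alert` callback are all assumptions made for illustration, and you would wire `alert` to whatever emails or pages you.

```python
import http.client
import time
import urllib.request

SLOW_THRESHOLD_SECONDS = 2.0  # assumption: tune this to your own service
TIMEOUT_SECONDS = 10.0        # assumption: past this we call the site down


def classify(elapsed_seconds, succeeded, slow_threshold=SLOW_THRESHOLD_SECONDS):
    """Turn one probe result into 'ok', 'slow', or 'down'."""
    if not succeeded:
        return "down"
    if elapsed_seconds > slow_threshold:
        return "slow"
    return "ok"


def probe(url, timeout=TIMEOUT_SECONDS):
    """Fetch the URL once and return (elapsed_seconds, succeeded)."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            response.read()  # include body transfer time in the measurement
            succeeded = 200 <= response.status < 400
    except (OSError, http.client.HTTPException):
        # Covers connection failures, timeouts, and HTTP error responses.
        succeeded = False
    return time.monotonic() - start, succeeded


def check_and_alert(url, alert):
    """Probe once and call alert(message) unless the site is healthy.

    `alert` is a hypothetical callback: send an email, page someone, etc.
    """
    elapsed, succeeded = probe(url)
    status = classify(elapsed, succeeded)
    if status != "ok":
        alert(f"{url} is {status} ({elapsed:.1f}s)")
    return status
```

Run something like `check_and_alert(your_url, print)` from cron every minute or so, on a machine that is NOT one of the boxes being watched, and swap `print` for something that reaches your pager.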
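The "queues and flat files" idea from the database bullet above can be sketched roughly like this. It is an illustration under stated assumptions, not how Lijit did it: the spool file layout, the `events` table, and the function names are made up, `sqlite3` stands in for whatever database you actually run, and it assumes a single writer (concurrent writers would need file locking).

```python
import json
import os
import sqlite3


def enqueue(spool_path, event):
    """Append one event as a JSON line. No database connection is needed."""
    with open(spool_path, "a", encoding="utf-8") as spool:
        spool.write(json.dumps(event) + "\n")
        spool.flush()
        os.fsync(spool.fileno())  # make sure the line survives a crash


def drain(spool_path, connection):
    """Move spooled events into the database and return how many were written.

    If the database is unavailable, the batch stays on disk and is retried
    on the next call, so a database reboot does not lose data.
    """
    batch_path = spool_path + ".draining"
    # Retry a batch left behind by an earlier failed drain, else start a new one.
    if not os.path.exists(batch_path):
        if not os.path.exists(spool_path):
            return 0
        os.replace(spool_path, batch_path)  # new writes go to a fresh spool file
    with open(batch_path, encoding="utf-8") as batch:
        rows = [(line.strip(),) for line in batch if line.strip()]
    try:
        with connection:  # one transaction: commit on success, roll back on error
            connection.executemany(
                "INSERT INTO events (payload) VALUES (?)", rows  # hypothetical table
            )
    except sqlite3.Error:
        return 0  # keep the batch file for the next attempt
    os.remove(batch_path)
    return len(rows)
```

The web/app side only ever calls `enqueue`, so it keeps working while the database box is down; a separate short-lived job calls `drain` on a schedule. The price is exactly the one named above: the database is slightly stale until the next drain.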
I'll give this some more thought. I'm sure there are more. Please comment if you agree or disagree.
Note: The term "Green Bean" is a registered trademark of Jim Lejeal