FACT: Startups suck at scaling their web services.
I started thinking about this the other day. I 'think' it's because startups are more interested in building things than in operating them.
Lijit has been reasonably good at delivering a scalable web service, and before that the team had pretty good success at Raindance (our last gig) as well. It wasn't always that way. We had our share of problems, but over the last decade or so we figured out what to do and what not to do. As I thought about it, I figured it might make a good post.
Random thoughts on startups and their service delivery platforms (aka: you might be a green bean if)
- Releasing your web service on a single computer is mental. What happens when that computer breaks? This is NOT an acceptable risk regardless of the level of investment in your company. You wouldn't release the service on a laptop, so why are you releasing it on a single server? If you can't afford at least 3 or 4 boxes, wait till you can. You only get one chance to make a first impression. For most services, 2 web/app servers and a live database + a slave backup database is the minimum. You can argue about this with me but I won't listen. If you have less than this, your company is a hobby - not a startup - and you are a green bean.
- Network design is NOT connecting your servers to the network. If you give the network design about as much thought as the power connections, you are a green bean. You have to look high and low to find a startup that invests in a load balancer, which is also mental. DNS Round Robin is NOT a load balancer. Go to eBay and buy yourself a load-balancing switch. This allows you to take a web/app server down and deal with issues without affecting users. Better yet, it allows your web servers to crash without anyone noticing. We have been long-time users of Foundry Networks hardware load balancers. You can buy old models on eBay and it's money well spent. Go find a network engineer to rent to help you design to your service expectations. Network gear is really reliable, so a rent-an-engineer can help you design and configure it and then you can likely just let it run. I have also had a lot of success with letting the vendor help me design the network and then renting a friend to audit the design.
- Get a BlackBerry. This is not a vanity item, green bean. Register for a free monitoring service that will ping your site and email you when things GET SLOW (not just when they stop). In spite of your best efforts, someone will notice something weird and email you. You want to know this immediately - not the next time you check your email. I can't believe the number of startups out there where dudes go hiking for a day or two and then come back and see things aren't working. If this has happened in your company, you are a green bean. AT LEAST get a cheap old analog pager; they are a couple bucks a month.
- SUCCESS is not an excuse. I don't care that your site is very popular; if it's slow or doesn't work, it sucks. I'm not going to buy the "we are so popular" excuse if you have one box. If you are slow and have one box, you are a green bean and running a hobby on the side.
- Decouple your service from the database whenever possible - seriously. Almost every bad thing that happens to your site or service has something to do with connections to the database: not enough, too many, exhausted pool, memory leaks, slow queries, etc. When possible, build queues and flat files that can recover if you need to reboot the database box or switch between a primary and backup database. Likely, not everything has to be written or read in real time. Use this to your advantage - I'll take slightly stale data over no service every day of the week.
- Never, ever, ever, ever, ever use long-running processes. Obviously you can't avoid the database, but the rest you largely can. Use Apache to fork services with short lifetimes. Make a tee shirt that says "I hate long-running processes". Everything leaks memory, green bean.
- Make sure one person is responsible for keeping the service running. A wise man once told me: if everyone is responsible, no one is responsible. You can pass responsibility around, but only one person should be 'responsible' at a time. If you have 15 employees and don't have a dedicated "operations person" yet, you, my friend, are a green bean.
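To make the "queues and flat files" point from the database bullet concrete, here is a minimal sketch in Python. It is not how Lijit or Raindance did it; `SpoolQueue`, `write_to_db`, and the `actions` table are made-up names for illustration, and it assumes a single writer process on a local POSIX filesystem. The request path only appends to a flat file, and a separate task drains that file into the database whenever the database is reachable.

```python
import json
import os


class SpoolQueue:
    """Flat-file queue (hypothetical): requests append here, not to the database."""

    def __init__(self, path):
        self.path = path

    def record(self, event):
        # One JSON object per line; an append does not need the database to be up.
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(event) + "\n")

    def drain(self, write_batch):
        """Pass spooled events to write_batch and return how many were flushed.

        If write_batch raises (database down, slow, rebooting), the claimed
        file stays on disk and is retried on the next call.
        """
        claimed = self.path + ".draining"
        # A leftover .draining file means the previous flush failed: retry it first.
        if not os.path.exists(claimed):
            if not os.path.exists(self.path):
                return 0
            os.replace(self.path, claimed)  # new events go to a fresh spool file
        with open(claimed, encoding="utf-8") as f:
            events = [json.loads(line) for line in f if line.strip()]
        write_batch(events)
        os.remove(claimed)
        return len(events)


def write_to_db(conn, events):
    # Assumed schema: actions(username TEXT, action TEXT); conn is a DB-API connection
    # that accepts "?" placeholders (sqlite3 does).
    conn.executemany(
        "INSERT INTO actions (username, action) VALUES (?, ?)",
        [(e["username"], e["action"]) for e in events],
    )
    conn.commit()
```

Call `record()` from the web tier and run `drain()` from a short-lived scheduled job. If the database box is being rebooted, the job fails, the events stay on disk, and users never notice. The price is that the data in the database is slightly stale, which is exactly the trade described above.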
I'll give some more thought to this. I'm sure there are more. Please comment if you agree or disagree.
Note: The term "Green Bean" is a registered trademark of Jim Lejeal
Great post, Todd. I'd love your thoughts on what is probably the next step of running a service: where the actual costs are. As our service grows, we’ve been focused on the actual cost of providing the service. We’re looking at power use, CPU, storage space, and hardware costs. We look at each of our configurations and estimate how much we spend per marginal page served.
Posted by: You Mon Tsang | 2007.12.09 at 10:25 AM
@You Mon: Maybe I’ll take this up in a future post. I will say, however, that most web-service-based businesses have 90% gross margin or better, so this stuff is kind of in the noise. That’s what’s great about these kinds of businesses. Depending on the business, however, storage can certainly get expensive.
Posted by: Todd Vernon | 2007.12.09 at 10:25 AM
Great advice for startups, Todd. These are some of the technology aspects that are often overlooked until something breaks. I love how you have touched on the most important points and condensed them into one post.
Posted by: Tom Chikoore | 2007.12.09 at 10:26 AM
@Tom: Thanks. Most stuff out there is less prescriptive and quickly gets down into the nuts and bolts of specific technologies. This post was designed to simply be a checklist of stuff that makes a big difference in the early days.
Posted by: Todd Vernon | 2007.12.09 at 10:27 AM
Interesting points.
You mention decoupling the database and using flat files - what mechanism should be used to lock and share those flat files? Since there are multiple web servers you’ll need some way to lock/share them, right? NFS and flock? That just sounds like migrating your problems from the database into a new area. Or are you referring to caching out the ultimate HTML page and not just the data behind it? In which case sure, that works fine... for anonymous users.
Posted by: greggles | 2007.12.09 at 10:30 AM
@greggles: Don’t get me wrong, you can get really exotic here, but that’s not what I’m suggesting. A lot of services have stats pages, profile pages, or report-type pages that can be very query intensive. I’m suggesting that a lot of these don’t change much from day to day. Build a task that writes flat files periodically. Then when the page gets served, just serve the flat file. No locks required. In the past I have done a lot of projects where a lot was happening in real time. As these actions happened we wanted to record (or log) that they had happened. Rather than putting database calls right in the middle of a pseudo-real-time process, we just wrote those actions to queues that another process would log as time permitted. That way a database slowdown would not affect a real-time process (web conferencing). The major theme is don’t think of the database as your bitch. It’s very expensive to interact with it (even if it’s not when you first build the product). Always ask yourself whether you ‘have’ to interact with it now, or whether you can ‘easily’ move that out of the flow of the app. My $.02.
Posted by: Todd Vernon | 2007.12.09 at 10:31 AM
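For anyone wondering what "write flat files periodically, no locks required" could look like, here is a rough Python sketch. The names `publish_snapshot` and `read_snapshot` and the JSON format are assumptions for illustration, not anything from the original setup. It assumes a local POSIX filesystem, where renaming a file over another one is atomic, so each web server can run the task itself instead of sharing the file over NFS.

```python
import json
import os
import tempfile


def publish_snapshot(path, build_report):
    """Build the report, write it to a temp file, then swap it into place.

    build_report is any callable that runs the expensive queries and returns
    JSON-serializable data. Readers see either the old file or the new one,
    never a half-written file, so they need no locks.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must be in the same directory so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(build_report(), f)
        os.replace(tmp_path, path)
    except BaseException:
        # Query failed or the write broke: discard the temp file, keep the old snapshot.
        os.unlink(tmp_path)
        raise


def read_snapshot(path):
    # What the page handler calls at request time: no database involved.
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

Run `publish_snapshot` from a scheduled task every few minutes or hours, depending on how stale the page is allowed to be. If the database is slow or down when the task runs, the task fails and the page keeps serving the previous snapshot.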
Right on... do it right the first time, or fail. So what’s the opposite of a green bean?
Posted by: Thomas Jordan | 2007.12.09 at 10:32 AM
@Thomas Jordan: Hmm, I don’t know. I have to think about that. I got the term from Jim Lejeal, one of my co-founders at Raindance. I will consult him.
Posted by: Todd Vernon | 2007.12.09 at 10:33 AM
green bean--very funny but so right on. Great post and comments. As for opposite, I think you just want to be something other than a green bean, not necessarily the opposite. There are many ways to be without being a bean. Don’t be the bean.
Posted by: rando | 2007.12.09 at 10:33 AM
I’ve found that a great way to scale initially for service-based startups is AWS (aws.amazon.com) and RightScale. AWS is pay for what you use and RightScale is $500/month. That is pretty cheap while you figure out if your idea has traction. If you are able to prove your business, then bringing some of the RightScale tools in house probably makes sense, considering the community around AWS. Note that using AWS for all processing, storage, queuing capabilities, etc. probably doesn’t scale long term (in some cases it might), but again it is a great way to prove your idea and offload scalability concerns up front.
Posted by: thompsa6 | 2007.12.09 at 10:34 AM