I wrote an article on caching and some common problems found in production and how to fix them. It was posted on my work site, so I’m just posting a link to it here for those that follow my personal blog.
Archive for Work
I wrote an article on my company blog on using Postgres’ writable CTE feature to improve performance and write cleaner code. The article is available at:
I’ve been told by a number of people I should expand it further to include updates and the implications there and then consider doing a talk on it at a Postgres conference. Hrm….
One of the side projects I work on outside of my job is a Japanese anime site called AniDB. It was put together by some German guys quite some time ago with Perl, PostgreSQL, and some old donated hardware/bandwidth from an ISP. They were kind/foolish enough to let me help out with the site about two years ago. Anidb has almost anything you could possibly need to know about Japanese anime series (much like imdb for movies). Including things like what characters know each other between the different series.
Read the rest of this entry »
Wow it’s been a while since I wrote anything.
Today at work I was reminded about a few different technologies and pieces of advice that I just take for granted because of the people I work around; and today these came together to form a bit of awesomeness I felt like sharing. For a particular client we built a high performance node.js API that drives their mobile application. One of the technologies that powers it is a Riak cluster that we put together to provide a highly available and highly scalable data store for it.
The first is SSDs.
Artur Bergman gives a great rant on why you should just shut up and buy SSDs but at the time we put the cluster together we just couldn’t get them anywhere so we used some fast traditional drives. We recently had a chance to upgrade these nodes onto systems with SSDs as part of a move to a new location in our datacenter. To do this without downtime we added all the new nodes into the existing Riak cluster at line A in the graph. This took our cluster from 5 nodes to 11. This was a mixture of the old and new. At line B we started removing the old nodes down to the six new nodes with SSDs. The difference is pretty dramatic, especially once Riak had finished moving everything over. What you see here is a histogram of the response times for PUTs into Riak. Basically how long it takes to write data into the datastore.
The second is monitoring
The above graphic isn’t from Riak. It’s actually from our API. The API measures how long every backend call it makes takes, be it from Riak, or any of the rest of the myriad backend services. That’s right, _every_ request. We aggregate them, and push them out to Circonus which handles this huge volume of data and graphs it, and alerts us if something is wrong. My coworker Brian Bickerton has even done a write up on how we do this.
The third is better monitoring
Monitoring isn’t a new topic. Neither is testing. But people can’t seem to do either of them well. The graphic earlier is a histogram of the data points. Separating your data into “buckets” so that you can see where it falls. The darker areas mean more items fall into that bucket. I can mouse over any time period to get a more detailed view.
So what this is showing me is that at my cursor 98% of the data is less than where my cursor is, 1% is at my cursor, and 1% is larger. This is out of the 122,814 samples that were taken during that time period. So I can tell the we’ve made a huge impact in reducing the worst case situations for our users. This is actually a really important segment of users that often gets overlooked. It’s important to optimize the common case but if you’re one of the 1% of those users outside of it, it really sucks. Here we’ve shown our worst case writes going from a full second to less than 200ms. But it also shows why histograms are important. Let’s look at that same time period with the AVERAGE response time overlayed. And we’ll cap the graph at 200ms to show things clearer.
The light blue average line shows some really interesting information. Notice how the average response time goes up each night. This is when the backups are happening. We can see how with the SSDs we no longer take any performance hit during the backups. But it also shows us how much information we’re missing by only looking at an average. Our basic use-case performance hasn’t really changed that much. We were around 30-50ms before and now we seems to stay at around 25ms. A nice gain, but from a user’s perspective 20ms is not noticeable We would have no idea about the 1-2% of our users that were getting really horrible 200+ms performance. However these users will notice. The histogram also shows that our normal use cases seems to separate into two different operation types. One that takes < 10ms and another take takes 40-50ms.
It’s really important to be able to tell the difference between theory and reality and to be able to make decisions based on real concrete data; and even more important to be able to measure those decisions to make sure they were right. Monitoring at all levels of your stack provides this and lets me say with confidence:
Just buy the SSDs already.
Going fishing with my coworkers has turned into an annual event. A number of coworkers, friends and I went out fishing on the Chesapeake Bay. Going out on Captain Ireland’s Patience we left the dock at 6am for almost 8 hours of beer and fish. Last year we immediately caught our limit of rockfish and headed back. This year the rockfish were much harder to find and so we caught a large number of bluefish as well. 31 in fact! Add that to 17 rockfish (our limit + one for the boat) and a spanish mackerel and we had an amazing haul.
My wonderful wife cooked them all up tonight and it was delicious. The rockfish we stuffed with salt, scallions, chopped up ginger, oil and baked at 350 until done. We then dip each chunk in a very small amount of soy sauce. We slightly overcooked it but it was still absolutely delicious. The bluefish filets we dipped in flour, salt, old Bay, and ginger powder and then pan fried them for 2 mins on each side. Bluefish is normally a very strong tasting fish, but the pan frying virtually eliminated the fishiness. It had a nice thin crust and was also really good.