Identifying and Fixing Web Application Performance Problems
Identifying performance problems is a daunting task given to many software engineers who are working on scaling up applications. As requests increase in frequency from hundreds, to thousands, to tens of thousands per minute, being able to locate performance bottlenecks and then fix them is crucial for long-term success. While not every application will scale in the same way, I’ve collected some tips that will hopefully help the many engineers facing the same problems that I am facing.
Locating the Areas to Improve
An application has hundreds of endpoints and a ton of code; how can one possibly know where to start looking for performance gains? Through tools like New Relic, I can identify low performers according to their Apdex score and average request time.
My favorite view in New Relic is the “Apdex most dissatisfying” view under transactions. Through this view, I am able to see the requests which most dissatisfy the user base. However, I focus on transactions with moderate to high throughput to see the most gain. We’ve recently lowered our Apdex T threshold from 0.50 (the default) to 0.04 (aggressive). By doing this, we have set a high performance bar that gives us better feedback through the Apdex score.
Another great view is the simple “Databases” tab. By knowing the load on the database, and mapping that to usage, I can understand which transactions are heavy hitters and how their throughput affects the database. Additionally, if memcache or Redis usage is ballooning, it is an indicator that caching may be happening in the wrong place.
After identifying transactions that are candidates for improvement, I follow a few sets of rules, and then explore fringe cases that don’t fall into them on a case-by-case basis.
Assume it is the Database
Databases do a ton of work, and there is a good chance that applications are using them in a way that reduces their maximum throughput. Although there are many types of database issues, the two I see most are n+1 queries and missing indices.
n+1 queries are pesky occurrences where an application requests information from the database inside an iteration when it could have been loaded by the database in a single query outside of the iteration. This topic has been covered pretty heavily, but one new contribution I can add is that a query doesn’t have to hit the database to be a performance deficit. In Rails, the Active Record query cache can save a database call at the expense of fully loading Arel objects. I use RailsPanel to identify “cached queries” and try to remove them in as many cases as possible. I’ve seen performance gains as drastic as 10+ seconds just from cached queries that don’t even hit the database.
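To make the n+1 shape concrete outside of Rails, here is a minimal sketch using hypothetical fetch_comments and fetch_comments_for helpers that stand in for database calls and log each query they would issue. (In Rails itself, the equivalent fix is eager loading, e.g. Post.includes(:comments).)

```ruby
# Hypothetical stand-ins for database calls; each call appends the SQL
# it would run to QUERY_LOG so the query counts are visible.
QUERY_LOG = []

def fetch_comments(post_id)
  QUERY_LOG << "SELECT * FROM comments WHERE post_id = #{post_id}"
  []
end

def fetch_comments_for(post_ids)
  QUERY_LOG << "SELECT * FROM comments WHERE post_id IN (#{post_ids.join(', ')})"
  Hash.new { |h, k| h[k] = [] }
end

post_ids = [1, 2, 3]

# n+1: one query per post, issued inside the iteration.
post_ids.each { |id| fetch_comments(id) }          # 3 queries

# Batched: one query issued outside the iteration.
comments_by_post = fetch_comments_for(post_ids)    # 1 query
```

The difference is invisible at three posts but becomes the dominant cost at three thousand, which is why moderate-to-high-throughput transactions surface it so clearly.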
Missing indices are a very common database issue that is exactly what it sounds like. Outside of the usual index suspects, Postgres offers partial indexes, which can be a huge performance gain in certain situations. Take, for instance, this edited explanation of a filtered scan:
Limit (cost=0.00..88.80 rows=1 width=493) (actual time=0.387..0.387 rows=0 loops=1)
-> Index Scan using index_name on my_table (cost=0.56..12.60 rows=1 width=430) (actual time=0.106..0.106 rows=1 loops=1)
Index Cond: (some_text_field = 'some value'::text)
Filter: (team_id = 1 AND NOT deleted)
Rows Removed by Filter: 1720
Total runtime: 0.406 ms
It is good that this is hitting an index, but do you notice the filter condition on a constant value? By taking advantage of a partial index (WHERE NOT deleted), huge performance gains can be realized:
Limit (cost=0.28..8.30 rows=1 width=493) (actual time=0.038..0.038 rows=0 loops=1)
-> Index Scan using index_name on my_table (cost=0.28..8.30 rows=1 width=493) (actual time=0.038..0.038 rows=0 loops=1)
Index Cond: (some_text_field = 'some value'::text)
Filter: (team_id = 1)
Total runtime: 0.065 ms
There are tons of small tricks like this that are picked up through interacting with EXPLAIN ANALYZE. I encourage people looking to learn to dig into troublesome queries and really take the time to understand the debug output. I’ve found some queries that can go from 15s down to 0.01ms just from partial indices.
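In a Rails codebase, a partial index like the one above can be added with a migration; this is a sketch only, with the table, column, and index names taken from the (already edited) EXPLAIN output, so adjust them to your schema:

```ruby
# Sketch of a migration adding the partial index from the plan above.
# Names here mirror the edited EXPLAIN output and are illustrative.
class AddActiveIndexToMyTable < ActiveRecord::Migration[7.0]
  disable_ddl_transaction!

  def change
    # Only rows WHERE NOT deleted are indexed, so the planner no longer
    # has to filter out deleted rows after the index scan.
    add_index :my_table, [:some_text_field, :team_id],
              where: "NOT deleted",
              algorithm: :concurrently,
              name: "index_my_table_on_some_text_field_active"
  end
end
```

Building the index with algorithm: :concurrently avoids locking writes on a large table, which is usually what you want when retrofitting indices onto a hot production table.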
Look for Sequences of Code to Memoize
Memoization is a hot topic in the Ruby world. I personally reach for @method_name ||= <the content> and stick to one-line methods whenever possible. However, it is also possible to memoize begin ... end blocks and if statements in Ruby.
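Both patterns can be sketched with a small, hypothetical Report class. One caveat worth noting: ||= will re-run the computation whenever the cached result is nil or false, so use a defined? guard if those are legitimate values.

```ruby
# Minimal sketch of the two memoization patterns described above.
class Report
  def initialize(values)
    @values = values
  end

  # One-line memoization: compute once, reuse on later calls.
  def total
    @total ||= @values.sum
  end

  # Memoizing a multi-statement begin...end block.
  def summary
    @summary ||= begin
      avg = @values.sum / @values.length.to_f
      "count=#{@values.length} avg=#{avg.round(2)}"
    end
  end
end

report = Report.new([1, 2, 3, 4])
report.total    # => 10
report.summary  # => "count=4 avg=2.5"
```

Subsequent calls to total and summary return the stored instance variable without recomputing, which is exactly the property that makes memoization safe for request-scoped values.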
When a request is being executed, most memory is going to be specific to that request. A decision can be made: “if this count is 1 at the beginning of the request and 2 at the end, does that matter?” If the answer is no (it usually is), then one can memoize the count method to store the result in memory. This is one way n+1 queries can be removed.
Discover the Context of Code
A continuation of memoization is truly understanding the context that code will be executing in. In order to make decisions like memoization, one must be cognizant of how that code is used elsewhere in the codebase. For instance, will the method be called in the foreground, in the background, once a second, 100 times per second? Code has to work and be performant in all of its contexts, and the only way to achieve that is to fully understand them. It is the job of a software engineer to discover this context.
One example of how drastically context can matter is a class which caches a value in memcache (off box). If the code is executing in background jobs one at a time, then the cache is necessary to prevent recalculation between runs. However, if the code is executing inside an iteration within a single run, then the off-box cache could be traded for memory. Finding instances where application code can be moved from off-box caching to in-memory storage will speed it up significantly.
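The trade-off can be sketched with a hypothetical RemoteCache stub that counts network round trips, standing in for a real memcache client:

```ruby
# Hypothetical off-box cache stub: correct results, but every fetch
# pays a (counted) network round trip.
class RemoteCache
  attr_reader :round_trips

  def initialize
    @store = {}
    @round_trips = 0
  end

  def fetch(key)
    @round_trips += 1            # every call crosses the network
    @store[key] ||= yield
  end
end

def expensive_value
  42  # stands in for a slow calculation
end

remote = RemoteCache.new

# Off-box: prevents recalculation across runs, but inside one run
# each iteration still pays a round trip.
100.times { remote.fetch(:value) { expensive_value } }
remote.round_trips  # => 100

# In-memory: within a single run, memoize locally instead.
local = nil
100.times { local ||= expensive_value }
```

Both versions compute the expensive value once; the difference is the 100 network round trips, which is exactly the cost New Relic surfaces when memcache or Redis usage balloons.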
Conclusion
I don’t usually end my posts with conclusions, but it is important to reiterate that performance optimization is an open book where no single solution will be the end-all. I’ve learned something new almost every time I’ve gone in for speed improvements, and I come out happy every single time. Give it a shot and let me know if you find any cool techniques!