Query conservationist
The fewer the queries the better
I always enjoy trying to do my work creating web applications using the fewest number of database queries possible. It's kind of a pride thing, I guess. That and I suppose it makes to improve an app's performance and reduce the load on a server. If you're into that kind of thing. ;)
A technique I use subtitutes the temptation of using queries whose results call queries and instead uses only two.
Something that commonly comes up involves a needing to learn something specific about every item of a query result. While looping through the first set of results it's easy to just run a query each time to get what I need. Problem comes when your apps scale up from result sets in the double digits to those in the thousands or more. Running thousands of queries just to get what you want is a little selfish, don't you think?
When I create dynamic reports for clients that show row upon row and column upon column of totals pulled from the four corners of a vast datascape the number of queries needed can skyrocket. One technique I'm rather fond of employing does everything needed in two queries. Let's play pretend here and say we're working with a grocer who wants to see a report of how many of different kinds of fruit have been sold everyday for the last month.
The first query might look like this:
SELECT name AS fruit_name
FROM produce
WHERE type = 'fruit'
...And might return results looking like this:
FRUIT_NAME
apple
banana
orange
raspberries
For each of the items in that result set, you could run a query to get the total sold for each day of a month. For the shortest month of the year with 28 days, this would mean 112 queries! You can see how this could easily get out of hand.
Instead, try a second query that gathers up the dates for every fruit sale during the month, en masse. You can then dump these results into an array who's indexes match the name for each type of fruit. For each fruit you can put another array with a slot for each day of the given month that stores the total number each was sold on a given day. It may end up looking like this:
Array()
[apple] => Array()
[2006-02-03] => 4
[2006-02-07] => 12
[2006-02-10] => 9
[orange] => Array()
[2006-02-01] => 8
[2006-02-03] => 19
[2006-02-04] => 13From here it's pretty easy to loop through the first set of data and match on the second set using those spiffy associative indexes.
This is a pretty rudimentary example but it's important for me to be mindful of such techniques as I build bigger and bigger apps. Like I mentioned, when creating reports sometimes the data is being mined from many different areas of the database all with different calculations that would make a single query nearly impossible. By finding a common bit of data like an ID and leveraging some arrays you can keep the number of queries down.



I used this to cut down the query count on a heavy site by 800%(!). Beware, though, that you're trading queries for RAM and there is usually only so much RAM. So, choose wisely.
Left by Jan | Feb. 3, 2007 at 3:11pm
"This is a pretty rudimentary example but it's important for me to be mindful of such techniques as I build bigger and bigger apps. "
Well... Generally it's a good philosophy. However, as apps become bigger and bigger, you might have to rethink this strategy. If you want to scale horizontally by partitioning data across database servers, you might have to go back to doing it the not-so-clever way.
Granted, you'll need to have quite a bit of traffic on your webapp, but still... In these times, any Joe Schmoe can create a webapp and recieve millons of hits in a single day, if it's Digged and Slashdotted and whatnot.
Left by Monkey | Feb. 5, 2007 at 12:27pm
That is a good point. As applications get bigger, performance becomes more and more important. Often, the database is the fastest place to process queries, but just the other day I ran into a situation where is was simpler to process the logic of the query with PHP. I had a search script running around 70 seconds - it was essentially useless because it was so slow. Joe tried to speed it up with some database voodoo, but actually made it run slower. Then I used wrote up a script that grabbed all the data in question, and looped through it. The script completed in 3 seconds, and I was happy.
I'm sure the best way to solve the problem I encountered would be to optimize the query and database for that particular situation, but our first optimizations efforts weren't very good so we did it a different way.
Left by Dave at SC | Feb. 6, 2007 at 8:27am
Leave a Comment