SiteCrafting Blah Blah Blog
Feb. 20, 2007 at 2:53pm
Optimizing GROUP BY with Multiple LEFT JOINS
Creative use of temporary tables
At SiteCrafting, I enjoy working with a large number of different projects, each with their own requirements, technology, and problems to be solved, unfortunately, I sometimes forget about past solutions, until after I have finished writing a piece of code. Such is the case with a query that was eating up some serious processing time.
The problem was with a GROUP BY query with LEFT JOINs to several other tables and summing up totals from those joined tables. This query was taking about 4 minutes 45 seconds to run, and worse yet, it was affecting searches which had nothing to do with that query, and probably also eating up precious memory and cpu resources.
In order to optimze the table I used the EXPLAIN
statement to figure out where the bottle necks were and then
restructured the primary keys and indexing for the tables I was trying
to left join to. This reduced the time of the query to about 35
seconds, this was still too slow and was still eating up too many
resources.
I decided to put the problem to the side and
continued working another project. That's when I remembered using
temporary tables to solve a similar problem for a past project. The
temporary tables solution reduced the query time to about .5 seconds.
Below is an example of the temporary table solution that I implemented for MySQL 4.1 and PHP 4.0.
PLEASE NOTE:
The database user must have the CREATE TEMPORARY TABLE privelege to the database.
Let's start with three sample tables that a hotel uses.
CREATE TABLE `rooms` (
`room_id` int(11) NOT NULL auto_increment,
`floor_number` int(10) NOT NULL,
`room_number` varchar(10) default NULL,
`size` varchar(10) default NULL,
`beds` int(11) default NULL,
`towel_allotment` int(11) default NULL,
PRIMARY KEY (`room_id`)
) TYPE=MyISAM
CREATE TABLE `room_services` (
`service_id` int(11) NOT NULL auto_increment,
`room_id` int(11) default NULL,
`date_serviced` datetime default NULL,
`serviced_by` varchar(50) default NULL,
`order_amount` float default NULL,
PRIMARY KEY (`service_id`),
KEY `room_id` (`room_id`)
) TYPE=MyISAM
CREATE TABLE `room_cleanings` (
`cleaning_id` int(11) NOT NULL auto_increment,
`room_id` int(11) default NULL,
`date_cleaned` datetime default NULL,
`cleaned_by` varchar(50) default NULL,
PRIMARY KEY (`cleaning_id`),
KEY `room_id` (`room_id`)
) TYPE=MyISAM
For some bizarre reason, the management wants a single report that lists the number of times a room received room service and cleaning, even for the new rooms that were added, which have not received any services or cleaning.
We start with this query.
SELECT rooms.room_id,
rooms.floor_number,
rooms.room_number,
COUNT( DISTINCT room_services.service_id) AS vTotalServices,
COUNT( DISTINCT room_cleanings.cleaning_id) AS vTotalCleanings
FROM rooms
LEFT JOIN room_services ON room_services.room_id = rooms.room_id
LEFT JOIN room_cleanings ON room_cleanings.room_id = rooms.room_id
GROUP BY rooms.room_id
ORDER BY rooms.floor_number, rooms.room_number
This looks like a good query. All three tables use the room_id as an index, which is numeric, another plus that queries like. However, this query is too slow, but why should that be.
In order to figure that out, we need to take a look at the total number of records that is being generated by this query. Let's say that there are 100 records for the rooms table, 40,000 for room_services, and 10,000 for room_cleanings. When using left joins, to get the total number of records, where there is a one-to-many relationship, we multiple all the records together.
100 rooms * 40,000 services * 10,000 cleanings = 3,990,000 total
The sample query has almost 4 million records to count up.
This is where a single temporary table can be of help. The room_services table has the most records, a temporary table can be used to store the total number of services for that room.
PLEASE NOTE: In PHP, a temporary table will exist as long as a connection remains open.
Here's a step-by-step example in PHP and MySQL
STEP 1 - create the temporary table
// $connection is a mysql_connection
$sql = <<<EOD
CREATE TEMPORARY TABLE `tmp_service_totals` (
`room_id` int(11) NOT NULL default '0',
`total_services` int(11) default NULL,
PRIMARY KEY (`room_id`)
) TYPE=MyISAM
EOD;
$res = mysql_query($sql, $connection);
STEP 2 - Populate the temporary table
// the temporary table stores the total number of services for each room
$sql = <<<EOD
INSERT INTO `tmp_service_totals` (room_id, total_services )
SELECT room_services.room_id,
COUNT(DISTINCT room_services.service_id )
FROM room_services
GROUP BY rooms_services.room_id
EOD;
$res = mysql_query($sql, $connection);
STEP 3 - Update our original query using the temporary table
// query to get the totals for each room
$sql = <<<EOD
SELECT rooms.room_id,
rooms.floor_number,
rooms.room_number,
tmp_service_totals.total_services AS vTotalServices,
COUNT( DISTINCT room_cleanings.cleaning_id) AS vTotalCleanings
FROM rooms
LEFT JOIN tmp_service_totals ON tmp_service_totals.room_id = rooms.room_id
LEFT JOIN room_cleanings ON room_cleanings.room_id = rooms.room_id
GROUP BY rooms.room_id
ORDER BY rooms.floor_number, rooms.room_number
EOD;
$res = mysql_query($sql, $connection);
STEP 4 - Store the records into an array
while( $record = mysql_fetch_assoc($res) ){
$records[] = $record;
}
STEP 5 - Destroy the temporary table
$sql = <<<EOD
DROP TABLE tmp_service_totals
EOD;
Summary
Let's take another look at the query in step 3. Since the left join between rooms and tmp_service_totals is one-to-one, we will not need to factor that in to determine the total number of records that was generated.
100 rooms * 10,000 cleaning = 100,000 total records
So, we went from having four million records TO one hundred thousand records.
For my real world query, after optimizing tables structures and indexes the time was averaging 34 seconds. After implementing the temporary table, the final average time was about .5 seconds, including the creation of the temporary table and inserting data into that table.
Posted in From the Workbench, MySQL, PHP by Ken Foubert
Comments (0)
Add your comment below