CoderZone.org
Pages: 1 « previous     next »
  Print  
Author Topic: Simple SQL Performance Tip  (Read 5158 times) Bookmark and Share
Cruise Elroy
Newbie
*
Posts: 14



View Profile
« on: Dec 04, 2010, 11:12:02 am »

This is a pretty simple tip, but I see a lot of SQL written that shows that many people haven't been taught this basic concept. The concept is,

"When doing a SELECT in SQL, always do whatever you can to reduce the number of rows collected in the first part(s) of your SQL. That is, include the most restrictive clause first whenever possible."

For example, let's say you have a site with 10,000 users, and, as might be expected, your 'users' table has 10,000 rows, (one for each of them.)

On this site you have a calendar application. Each user can make as many entries in the calendar as they want. Let's say that there are 500,000 rows in the 'calendar' table, and when a user views a given month the application pulls out the rows for them just for that month.

Along comes  user #1000 and clicks to view his calendar for June of 2005. He has 300 rows in the 'calendar' table, but only 5 of them are in June of 2005.

You could do get his rows by doing something like this:

Code:  
Highlight Mode: (MySQL)
  1. SELECT * FROM 'calendar'
  2.     WHERE month = '6'
  3.     AND year = '2005'
  4.     AND user_id = '1000'
 

This is incredibly inefficient, and probably very slow. It's also a memory hog. What this query does is pull every row in the entire table (SELECT * FROM 'calendar'), and then it throws out all of the rows that aren't in June (WHERE month = '6'), then it throws out all of the rows that aren't in 2005 (AND year = '2005'), and finally, it throws out everything that doesn't belong to user 1000 (AND user_id = '1000').

You retrieved 500,000 rows -the entire table- in order to end up with just 5 rows. That's a lot of needless, expensive sifting and discarding of rows to get the ones you want.

What you should probably do instead is select by his user_id first:

Code:  
Highlight Mode: (MySQL)
  1. SELECT * FROM 'calendar'
  2.     WHERE  user_id = '1000'
  3.     AND month = '6'
  4.     AND year = '2005'
 

By only grabbing rows that belong to this user first, the query reduces the initial number of rows retrieved to 300 (down from 500,000). After that, throwing out the other rows for non-matching months and years is a trivial task. The second query uses less memory and runs a whole lot faster.
Logged
Tags:
Pages: 1
  Print  
 
Jump to: