November 2010
M T W T F S S
« Apr   May »
1234567
891011121314
15161718192021
22232425262728
2930  

Loose index scan vs. covered indexes in MySQL

Loose index scan in MySQL can really help optimizing “group by” queries in some cases (for example, if you have only min() and/or max() as your aggregate functions). For example, if you have this query (to find maximum delay for all US flights with departure on Sundays in 2010):

select max(DepDelayMinutes), 	carrier, dayofweek 
from ontime_2010 
where dayofweek = 7 
group by Carrier,  dayofweek

the usual case will be adding a covered index on (dayofweek, Carrier, DepDelayMinutes). And MySQL will use this index fine (using index mean it will use the covered index):

mysql> explain select max(DepDelayMinutes), Carrier, dayofweek from ontime_2010 
where dayofweek =7 group by Carrier, dayofweek\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: ontime_2010
         type: ref
possible_keys: covered
          key: covered
      key_len: 2
          ref: const
         rows: 905138
        Extra: Using where; Using index
1 row in set (0.00 sec)

However, as the dayofweek part has low number of unique values, mysql will have to scan a lots of index entries (estimated rows: 905138).

MySQL can use loose index scan. Unfortunately, a o lots of limitations apply:

  • The query is over a single table.
  • The GROUP BY names only columns that form a leftmost prefix of the index and no other columns.
  • The only aggregate functions used in the select list (if any) are MIN() and MAX(), same column
  • etc… (see docs for details)

As our example query is suitable for loose index scan, we can create another index:

mysql> alter table ontime_2010 add key lis1(Carrier, dayofweek, DepDelayMinutes);

mysql> explain select max(DepDelayMinutes), Carrier, dayofweek from ontime_2010 
where dayofweek =7 group by Carrier, dayofweek \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: ontime_2010
         type: range
possible_keys: DayOfWeek,covered
          key: lis1
      key_len: 5
          ref: NULL
         rows: 19
        Extra: Using where; Using index for group-by
1 row in set (0.01 sec)

Here, Using index for group-by, means that MySQL uses loose index scan.
Also, and it is really great, it works with range on dayofweek too:

mysql> explain select max(DepDelayMinutes), Carrier, dayofweek from ontime_2010 
where dayofweek > 3 group by Carrier, dayofweek \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: ontime_2010
         type: range
possible_keys: DayOfWeek,covered
          key: lis1
      key_len: 5
          ref: NULL
         rows: 19
        Extra: Using where; Using index for group-by
1 row in set (0.00 sec)

And original covered index does not work with ranges in where clause:

mysql> explain select max(DepDelayMinutes), Carrier, dayofweek from ontime_2010 use index (covered) 
where dayofweek > 3 group by Carrier, dayofweek \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: ontime_2010
         type: range
possible_keys: covered
          key: covered
      key_len: 2
          ref: NULL
         rows: 2416543
        Extra: Using where; Using index; Using temporary; Using filesort

In the above example, MySQL uses index but still have to create temporary table and filesort.

Now speed comparison:
I’m using “ontime” flight performance statistics data from transtats.bts.gov
The table only consist of data for 2010.
Table size: 5M rows, ~2G in size. Table structure:

CREATE TABLE `ontime_2010` (
  `YearD` int(11) DEFAULT NULL,
  `MonthD` tinyint(4) DEFAULT NULL,
  `DayofMonth` tinyint(4) DEFAULT NULL,
  `DayOfWeek` tinyint(4) DEFAULT NULL,
  `Carrier` char(2) DEFAULT NULL,
  `Origin` char(5) DEFAULT NULL,
  `DepDelayMinutes` int(11) DEFAULT NULL,
... more fields here ...
) ENGINE=InnoDB DEFAULT CHARSET=latin1

Results (cached index and data):
“where dayofweek = 7” (ref)

  • Loose index scan: 0 sec
  • Covered index: 0.6 sec

“where dayofweek > 3” (range)

  • Loose index scan: 0 sec
  • index (+ filesort): 5.53 sec

I will also present the findings from this article (among other things) during the upcoming webinar, Getting the Best MySQL Performance in Your Products: Part 3, Query Tuning, which will take place on Tuesday, Nov 23 (webinar is free, webinar registration)

Comments are closed.