MySQL, which we host on Amazon’s RDS, is one of the most important parts of our stack to optimize for performance. I found that in order to make one query against a table with a few million rows run quickly enough, I needed to add a composite index across six columns. I was a bit surprised to see this error come back from MySQL:
ERROR 1071 (42000): Specified key was too long; max key length is 3072 bytes
A bit of reading through the MySQL documentation informed me that:
The InnoDB internal maximum key length is 3500 bytes, but MySQL itself restricts this to 3072 bytes.
This limit applies to the length of the combined index key in a multi-column index.
What does MySQL do with those extra 428 bytes? Who knows! What is clear is that the index I needed covered too many columns that were too large, and MySQL was not going to create that index.
I was trying to create a composite index of five VARCHAR(255) columns and one INT(11). Because the collation on the table was UTF-8 and because MySQL assumes three bytes of storage for each Unicode character1, that puts us at 255 characters * 3 bytes * five columns + 4 bytes for that last INT(11). That’s 3829 bytes ““ 757 too many!
Luckily, a look through one of those VARCHAR(255) columns showed that it only contained a few distinct values. Changing that column to an ENUM brought it from 765 to just two bytes, which puts us at 3066 bytes total ““ under the limit by a whole six bytes!
MySQL can create and drop indexes rather quickly, but performing the migration to alter the column type on a table with that many millions of rows is not something MySQL does well. Fortunately, for Rails users there is the wonderful LHM gem which adds robust, no downtime migrations for large MySQL tables. Using the gem, I was able to perform the lengthy migration with no downtime, create the multi-column index on my newly-shrunk table, and drop one particularly nasty query’s runtime by 95%.
So if you ever bump into an error about long key lengths when trying to create an index, take a look at the columns you’re indexing and see if you can’t shrink them down a bit. See if you can switch any VARCHARs to ENUMs or if your VARCHAR(255)s really need to be 255 characters wide. If you’re feeling lucky, you could try applying an ASCII-only collation to your Unicode VARCHAR columns, but you never know where Unicode characters can crop up! 🐍
If you read that and thought to yourself, “Doesn’t Unicode usually take four bytes, not three?” you’re right. Usually. MySQL implements the less popular three-byte UTF-8 by default, but offers a utf8mb4 character set for four-byte UTF-8. ↩
The post Adventures in MySQL: When Composite Indexes Go Long appeared first on Civis Analytics.