Things they don't teach you in school
I’ve read some articles recently on the speed differences between InnoDB and MyISAM database formats under MySQL. The articles suggest that InnoDB is faster in many respects than MyISAM. On the other hand, many people will tell you that MyISAM is not a relational database format, and can’t be compared to InnoDB. So, which database format is better, and why should you care?
Well, if you’re doing much with MySQL such as hosting your forums, blog, or other dynamically published information on your website, you’ll want to know which is more appropriate for your site. MySQL’s default database engine is called MyISAM which is a flat file, B-tree indexed database engine. It is very fast (if indexed properly). So fast, in fact, it would be hard to find any relational database system thats faster in basic queries. The InnoDB database engine is a transactional, relational database engine that supports all of the SQL buzzwords like stored procedures, views, transactions, ACID compliance, etc. So which is right for you? That’s a loaded question with many answers. But unless you have full control over your server, chances are your ISP doesn’t have MySQL configured to use InnoDB, so the question might be pointless. However, if you run your own server, you can probably reconfigure MySQL to use InnoDB. The question is, should you?
For anything but raw power and speed necessities, I would recommend InnoDB for one simple reason: Fault tolerant. I have been personally burned on more than one occasion by a corrupt MySQL table. This may happen if your hard drive filles up or your server crashes unexpectedly, leaving the database in an unsafe state. InnoDB uses a transaction log and can create checkpoints which help to keep your database healthy in times of troubled weather. MyISAM doesn’t have this feature.
But recently I had the need to load a very large dataset and de-dupe the tables. The source files totaled over 200 million lines, which needed to be parsed and imported. I tried Microsoft SQL Server 2005 with this project, and it took all night to import it using the integration tools. I created some custom integration package with some VB script to parse the files. Microsoft SQL Server is of course a better choice for large scale data warehouses because it can take full advantage of Windows’ superior threading, but for this project I needed something fast, simple, and portable.
I imported one of the files which contains about 2 million records. I created the table using the InnoDB engine. I was very disappointed at the results. I’ll let you be the judge:
LOAD DATA INFILE ‘/work/nameservers.csv’ INTO TABLE nameservers FIELDS TERMINATED BY ‘,’ OPTIONALLY ENCLOSED BY ” (nameserver,nameserverid);
The table structure is simply nameserverid int, nameserver varchar(80);
InnoDB results
Query OK, 1856807 rows affected (1 hour 14 min 48.78 sec)
Records: 1856807 Deleted: 0 Skipped: 0 Warnings: 0
MyISAM results:
mysql> LOAD DATA INFILE ‘/work/nameservers.csv’ INTO TABLE nameservers FIELDS TERMINATED BY ‘,’ OPTIONALLY ENCLOSED BY ” (nameserver,nameserverid);
Query OK, 1856807 rows affected (18.66 sec)
Records: 1856807 Deleted: 0 Skipped: 0 Warnings: 0
Furthermore, indexing is also much faster with MyISAM:
InnoDB:
ysql> create index ix_ns on nameservers2(nameserver);
Query OK, 1856807 rows affected (10 min 16.35 sec)
Records: 1856807 Duplicates: 0 Warnings: 0MyISAM:
mysql> create index ix_ns on nameservers(nameserver);
Query OK, 1856807 rows affected (36.57 sec)
Records: 1856807 Duplicates: 0 Warnings: 0
Am I missing something? I might be wrong in my overall assessment, but for shear speed, it’s hard to beat MyISAM. What’s your experience with large scale MySQL databases?
web·pit n. a place of discovery; a repository of information; where coders come for enlightenment; a programmers diary.
Please click here to login to post a comment.