Database corner: Beginner’s guide to MySQL storage engines

SQL | Tech and Tools | Published November 1, 2016 | arvindl

When a database is created, one often overlooked but critical factor in performance is the storage engine (particularly as the database grows). In many instances, the temptation is to just accept the default and continue on developing your project. This can lead to unexpected negative impacts on performance, backups, and data integrity later in the application life cycle, such as when your team implements analytics and MySQL dashboards.

To avoid these potential pitfalls, we are going to take a closer look at some of the most widely used storage engines supported by MySQL (as of version 5.7).

Supported Storage Engines

What are my options?

MySQL 5.7 supports ten storage engines (InnoDB, MyISAM, Memory, CSV, Archive, Blackhole, NDB, Merge, Federated, and Example), though not every build includes all of them. To see which ones are available and supported by your server, use this command:

mysql> SHOW ENGINES\G

This will output a list of storage engines and tell you which are available, which are not, and which one is currently the default. The “Support” field will display ‘YES’, ‘NO’, or ‘DEFAULT’, respectively. (A fourth value, ‘DISABLED’, means the engine is supported but has been turned off.)
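The same information is also exposed as a table, which can be easier to filter than the SHOW ENGINES output. A minimal sketch, assuming your account is allowed to read information_schema:

```sql
-- List only the engines this server can actually use,
-- along with whether each one supports transactions.
SELECT ENGINE, SUPPORT, TRANSACTIONS
FROM information_schema.ENGINES
WHERE SUPPORT IN ('YES', 'DEFAULT')
ORDER BY ENGINE;
```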

In some applications, the need may arise to have different storage engines for different tables in the same database. This is an example of why you need to carefully plan the data model for your application. In most cases, however, only one storage engine will be needed.
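To check whether an existing database already mixes engines, you can query the table metadata. This is only a sketch: 'my_app' is a hypothetical schema name, so substitute your own.

```sql
-- Show the storage engine behind every table in one schema.
-- 'my_app' is a placeholder, not a name from this article.
SELECT TABLE_NAME, ENGINE
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'my_app'
  AND TABLE_TYPE = 'BASE TABLE'
ORDER BY ENGINE, TABLE_NAME;
```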

Storage Engine Capabilities

What are they good at?

Let’s take a closer look at some of the most commonly used storage engines. This will give us an idea of what each engine was designed to do and how each can best be used to serve our business goals.

InnoDB: The default option in MySQL 5.7, InnoDB is a robust storage engine that offers:

  • Full ACID compliance
  • Commit, rollback, and crash-recovery
  • Row-level locking
  • FOREIGN KEY referential-integrity constraints
  • Increased multi-user concurrency (via non-locking reads)

Given this feature set, it is easy to see why InnoDB is the default engine in MySQL. It performs well and provides the guarantees that most databases need. A comprehensive discussion of all of its capabilities is outside the scope of this article, but this is the engine that will most likely be used in the majority of applications.
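As a rough illustration of two of these features, the sketch below shows a rollback undoing several statements at once and a FOREIGN KEY constraint rejecting an orphaned row. The customers and orders tables are hypothetical and exist only for this example.

```sql
-- Hypothetical tables used only for illustration.
CREATE TABLE customers (
    customer_id INT NOT NULL PRIMARY KEY,
    name        VARCHAR(50) NOT NULL
) ENGINE=InnoDB;

CREATE TABLE orders (
    order_id    INT NOT NULL PRIMARY KEY,
    customer_id INT NOT NULL,
    FOREIGN KEY (customer_id) REFERENCES customers (customer_id)
) ENGINE=InnoDB;

-- Both inserts belong to one transaction, so ROLLBACK undoes them together.
START TRANSACTION;
INSERT INTO customers VALUES (1, 'Example Customer');
INSERT INTO orders VALUES (100, 1);
ROLLBACK;

-- This statement fails: the foreign key requires an existing customer,
-- and there is no customer 999.
INSERT INTO orders VALUES (101, 999);
```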

MyISAM: The characteristics that set MyISAM apart are:

  • Full-text search indexes (also available in InnoDB since MySQL 5.6)
  • Table-level locking
  • No support for transactions

Though it is a fast storage engine, it is best suited to read-heavy applications, such as data warehousing and web applications that don’t need transaction support or ACID compliance.
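A minimal sketch of a MyISAM table with a full-text index follows. The articles table and its rows are made up for illustration. Boolean mode is used here because, on a very small MyISAM table, a natural-language search ignores words that appear in half or more of the rows.

```sql
-- Hypothetical table with a full-text index over two columns.
CREATE TABLE articles (
    article_id INT NOT NULL PRIMARY KEY,
    title      VARCHAR(200) NOT NULL,
    body       TEXT NOT NULL,
    FULLTEXT KEY ft_title_body (title, body)
) ENGINE=MyISAM;

INSERT INTO articles VALUES
    (1, 'Choosing a storage engine', 'InnoDB and MyISAM make different trade-offs'),
    (2, 'Backups', 'Plan backups before the database grows');

-- Find rows that contain both words somewhere in title or body.
SELECT article_id, title
FROM articles
WHERE MATCH (title, body) AGAINST ('+storage +engine' IN BOOLEAN MODE);
```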

NDB (or NDBCLUSTER): If your database will run in a clustered environment, NDB is the storage engine of choice. It is best when you need:

  • Distributed computing
  • High redundancy
  • High availability
  • The highest possible uptimes

Take note that support for NDB is not included in the standard MySQL Server 5.7 binaries. You will have to install the latest binary release of MySQL Cluster instead. If you’re developing in a cluster environment, though, you probably have the necessary experience to deal with these tasks.

CSV: A useful storage engine when data needs to be shared with other applications that use CSV-formatted data. The tables are stored as comma-separated values text files. Though this makes sharing the data with scripts and applications easier, one drawback is that the CSV files are not indexed. So, the data should be stored in an InnoDB table until the import/export stage of the process.

Blackhole: This engine accepts but does not store data. Much like writing to /dev/null on UNIX, the data is discarded, so queries always return an empty set. This can be useful in a distributed database environment where you do not want to store data locally, or in performance and other testing situations.
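The behavior is easy to see for yourself. In this sketch, audit_sink is a hypothetical table name, and the example assumes the Blackhole engine is listed as available on your server:

```sql
-- Hypothetical table; rows written to it are accepted and then discarded.
CREATE TABLE audit_sink (
    event_id INT NOT NULL,
    message  VARCHAR(100) NOT NULL
) ENGINE=BLACKHOLE;

INSERT INTO audit_sink VALUES (1, 'this row is discarded');

-- Nothing was stored, so this returns an empty set.
SELECT * FROM audit_sink;
```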

Archive: Just as the name implies, this engine is excellent for seldom-referenced historical data. The tables are not indexed and compression happens upon insert. Transactions are not supported. Use this storage engine for archiving and retrieving past data.
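A small sketch of an Archive table, using a hypothetical page_views_2015 table. It assumes the Archive engine is available on your server.

```sql
-- Hypothetical table for historical data that is written once and rarely read.
CREATE TABLE page_views_2015 (
    viewed_at DATETIME NOT NULL,
    url       VARCHAR(255) NOT NULL
) ENGINE=ARCHIVE;

-- ARCHIVE tables accept INSERT and SELECT;
-- UPDATE and DELETE are not supported.
INSERT INTO page_views_2015 VALUES ('2015-06-01 12:00:00', '/pricing');

SELECT COUNT(*) FROM page_views_2015;
```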

Federated: This storage engine is for creating a single, local, logical database by linking several different physical MySQL servers. No data is stored on the local server and queries are automatically executed on the respective remote server. It is perfect for distributed data mart environments and can vastly improve performance when using MySQL for analytical reporting.
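A Federated table is defined locally but points at a table on another server through a connection string. The sketch below is illustrative only: the host, credentials, schema, and table names are all hypothetical, and the Federated engine usually has to be enabled at server startup before SHOW ENGINES reports it as available.

```sql
-- Hypothetical remote server, credentials, schema, and table.
-- The local column definitions must match the remote table.
CREATE TABLE remote_sales (
    sale_id INT NOT NULL,
    amount  DECIMAL(10,2) NOT NULL,
    PRIMARY KEY (sale_id)
) ENGINE=FEDERATED
  CONNECTION='mysql://report_user:secret@remote-host.example:3306/sales_db/sales';

-- Queries against the local name are executed on the remote server.
SELECT COUNT(*) FROM remote_sales;
```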

Designating a storage engine

How do I change which storage engine is used?

The storage engine that is used is established upon table creation. As previously stated, InnoDB is the default storage engine in MySQL versions 5.5 and higher. If you would like to use a different one, it is best to do this within your CREATE TABLE statement. For instance, let’s say that you have identified a table that needs to use the CSV storage engine. Your overly simplified CREATE TABLE statement might look like this:

mysql> CREATE TABLE Shared_Data (
    -> Data_ID INTEGER NOT NULL,
    -> Name VARCHAR(50) NOT NULL,
    -> Description VARCHAR(150) NOT NULL
    -> ) ENGINE=CSV;

Every column is declared NOT NULL because the CSV engine does not support nullable columns. After which we would perform an INSERT statement as usual:

mysql> INSERT INTO Shared_Data VALUES
    -> (1, 'device one', 'the latest version of the best tech'),
    -> (2, 'device two', 'the fastest one on the market');

Upon success, if you inspect the database directory, there should now be a ‘Shared_Data.CSV’ file in it that contains the records you have inserted into the Shared_Data table.

The same methodology can be used for any one of the many storage engines that MySQL supports. Though it is possible to change the storage engine after a table has been created with an ALTER TABLE statement, it is best practice to plan accordingly and set it in the beginning.
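For completeness, here is a sketch of changing the engine after the fact, using the Shared_Data table from the example above. Keep in mind that converting a table rebuilds it, which can take a while on large tables. The last statement shows one way to change the default for new tables; it affects only the current session.

```sql
-- Convert the example table once it no longer needs to be a CSV file.
ALTER TABLE Shared_Data ENGINE = InnoDB;

-- Confirm which engine the table uses now (see the Engine column).
SHOW TABLE STATUS LIKE 'Shared_Data';

-- Change the default engine for tables created later in this session only.
SET default_storage_engine = MyISAM;
```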

In closing

MySQL has many options

As you can see, MySQL offers support for storage engines designed to handle very different tasks in many different environments. Identifying which engines to use and when to use them can help us avoid unnecessary complications and performance issues as our applications scale.

Whether you need 99.999% uptime and reliability on your distributed computing cluster or you need ACID compliant transaction support with FOREIGN KEY constraints, MySQL has a storage engine to suit your needs.

As always, proper planning and identification of your project goals and requirements is the best way to accurately identify which storage engines are best suited for your application. Hopefully, this article serves as a useful starting point for helping you in that respect.

This article originally appeared here. Republished with permission.