Monday, September 23, 2013

Understanding Storage IOPs

Input/output operations per second, commonly termed IOPs, is an important performance measure of a data storage system. Input operations, i.e. writes to the storage, are different from output operations, i.e. reads from the storage, and their corresponding IOP numbers also differ. It is therefore important to know both the inputs per second and the outputs per second a storage system can deliver. These numbers are not independent of each other - reading from the storage system while writing to it may affect the performance of both operations. IOPs are therefore often quoted as a weighted mix of inputs and outputs per second. For example, a storage system may be capable of performing 100,000 IOPs with 70% reads and 30% writes, with the operations being performed concurrently. The storage engineer needs to validate the concurrent input/output requirements of the application against what the storage system can actually deliver.
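Such a validation can be sketched in a few lines. The quoted figure below (100,000 IOPs at a 70/30 read/write mix) is the example from the text; the application's demand numbers are hypothetical:

```python
# Hypothetical sketch: check whether an application's concurrent
# read/write demand fits within a storage system's quoted mixed-workload
# IOPs. A real validation would also have to match block sizes, latency,
# and synchronous/asynchronous assumptions.

def fits_quoted_mix(app_reads_per_s, app_writes_per_s,
                    quoted_iops, quoted_read_fraction):
    """True if the application's read and write rates both fall
    within the quoted read/write mix."""
    max_reads = quoted_iops * quoted_read_fraction
    max_writes = quoted_iops * (1 - quoted_read_fraction)
    return app_reads_per_s <= max_reads and app_writes_per_s <= max_writes

print(fits_quoted_mix(60_000, 25_000, 100_000, 0.70))  # True: both within the mix
print(fits_quoted_mix(60_000, 35_000, 100_000, 0.70))  # False: writes exceed 30,000/s
```

Note that an application whose total demand is below 100,000 operations per second can still exceed the quoted figure if its read/write ratio differs from the quoted mix.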

But IOPs by themselves don’t paint the whole picture of storage performance. Here are the questions that lead to the complete picture:

  1. What is the size of the read/written data block while measuring IOPs?
  2. What is the end-to-end latency seen by the application in reading or writing the data block?
  3. With regard to writes, are the reported IOP numbers for synchronous writes or asynchronous writes?
  4. With regard to reads, what is the role of cache?

Large data blocks take longer to write to and read from storage, so IOP numbers for 4kB blocks will be very different from IOPs for 1MB blocks. The most relevant block size for an IOP measure is one that corresponds to the sizes of blocks the application actually writes to the storage system. Knowing the IO profile of the application is key to choosing an appropriate storage system for it.
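The interaction between block size and IOPs is easiest to see through throughput. A quick illustrative sketch (the IOP figures are made up):

```python
# Sketch: the same storage system delivers very different IOP counts at
# different block sizes, because throughput, not operation count, is
# often the binding constraint. All figures here are illustrative.

def throughput_mb_per_s(iops, block_size_bytes):
    """Throughput implied by an IOP rate at a given block size."""
    return iops * block_size_bytes / 1_000_000

# 100,000 IOPs at 4 kB blocks vs 1,000 IOPs at 1 MB blocks:
print(throughput_mb_per_s(100_000, 4_096))    # 409.6 MB/s
print(throughput_mb_per_s(1_000, 1_000_000))  # 1000.0 MB/s
```

A system quoted at 100,000 IOPs with 4kB blocks would need 25x that throughput to sustain 100,000 IOPs with 1MB blocks, which is why the block size behind a quoted number matters so much.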

Latency is a key IOP qualifier. Storage system latency is the time from when the application issues an IO request to when the request is completed - the read data is delivered to the application, or the storage system acknowledges that the data block has been written. The important question with respect to writes is when the storage system acknowledges that the data block has been persisted on non-volatile storage. For some applications writes may be asynchronous - they are acknowledged before they have been persisted on non-volatile storage. Since asynchronous IOP and latency numbers look better (higher IOPs, lower latency), storage vendors’ promotional material often quotes asynchronous write IOPs.
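The sync/async gap can be observed directly on a local filesystem. A minimal sketch, assuming a POSIX-like system; the file path, block size, and single-sample timing are arbitrary simplifications (a real benchmark would take many samples):

```python
# Hedged sketch: time a buffered ("asynchronous") write, which returns as
# soon as the OS accepts the data, against a write followed by fsync(),
# which waits until the data has been persisted.
import os
import tempfile
import time

block = b"\0" * 4096
with tempfile.NamedTemporaryFile(delete=False) as f:
    path = f.name
fd = os.open(path, os.O_WRONLY)

t0 = time.perf_counter()
os.write(fd, block)              # buffered: acknowledged before reaching media
buffered_latency = time.perf_counter() - t0

t0 = time.perf_counter()
os.write(fd, block)
os.fsync(fd)                     # synchronous: wait until persisted
sync_latency = time.perf_counter() - t0

os.close(fd)
os.unlink(path)
print(f"buffered write: {buffered_latency * 1e6:.1f} us")
print(f"fsync'd write:  {sync_latency * 1e6:.1f} us")
```

On typical hardware the fsync'd write is far slower, which is exactly why asynchronous numbers look better in promotional material.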

Some systems have battery-backed non-volatile RAM to allow the acknowledgement to be sent to the application as soon as the data block is written to RAM - usually orders of magnitude faster than storage media like SSD or disk. The question then becomes: how large is this non-volatile RAM, i.e. how many data blocks can it hold before they need to be persisted on the slower storage media? While some applications have bursty write profiles which play nicely with this (limited) non-volatile RAM, applications that require sustained write performance may not benefit much from such methods.
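A toy model makes the burst-vs-sustained distinction concrete. All parameters below are hypothetical: a buffer absorbs write bursts at full speed while draining to slower media, and once it fills, writes are throttled to the media rate:

```python
# Toy model (all figures hypothetical): a non-volatile RAM buffer absorbs
# a write burst at the offered rate while draining to media at a fixed
# rate. Once the buffer fills, incoming writes proceed at media speed.

def effective_write_iops(offered_iops, media_iops, buffer_blocks,
                         burst_seconds):
    """Average write IOPs actually achieved over a burst."""
    if offered_iops <= media_iops:
        return offered_iops                    # media keeps up on its own
    fill_rate = offered_iops - media_iops      # blocks/s accumulating in buffer
    time_to_fill = buffer_blocks / fill_rate   # seconds until buffer is full
    if burst_seconds <= time_to_fill:
        return offered_iops                    # whole burst fits in the buffer
    # After the buffer fills, writes are throttled to media speed.
    absorbed = (offered_iops * time_to_fill
                + media_iops * (burst_seconds - time_to_fill))
    return absorbed / burst_seconds

print(effective_write_iops(50_000, 10_000, 40_000, 0.5))   # 50000: burst fits
print(effective_write_iops(50_000, 10_000, 40_000, 10.0))  # 14000.0: throttled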

Similarly, RAM can be used to cache data for reads - read latency decreases when cache hits serve data blocks from RAM instead of reading them off slower storage media. The size of the RAM cache, as well as the application’s read pattern - is some data read more often than other data? - are important considerations when working with caches.
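The effect of the cache on average read latency follows directly from the hit rate. A sketch with made-up latencies (0.1 ms for a RAM hit, 5 ms for a media miss):

```python
# Sketch with hypothetical latencies: expected read latency as a
# weighted average of cache hits (served from RAM) and misses
# (served from slower media).

def avg_read_latency_ms(hit_rate, ram_ms=0.1, media_ms=5.0):
    """Expected latency = hit_rate * RAM latency + miss_rate * media latency."""
    return hit_rate * ram_ms + (1 - hit_rate) * media_ms

for h in (0.0, 0.5, 0.9, 0.99):
    print(f"hit rate {h:.0%}: {avg_read_latency_ms(h):.3f} ms")
```

With these numbers, going from a 50% to a 99% hit rate cuts average latency from 2.55 ms to about 0.15 ms, which is why a skewed read pattern (hot data that fits in cache) benefits so much more than a uniformly random one.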

The key to having an intelligent conversation about IOPs is to know your application and to seek definitive answers about latency, data block sizes, synchronous/asynchronous write assumptions, and caching.