In contrast with conventional disk-based tables, in-memory tables possess neither row IDs nor clustering keys to help SQL Server find a row. In-memory tables are therefore required to have at least one index. In this installment we shall look at the details of the hash index. As we shall see, a hash index is most useful for helping SQL Server find a single row among many: the so-called “needle-in-a-haystack” seek.
A hash index can be placed on any desired column, but we shall consider its most important use, the indexing of a table’s primary key. Each key value is provided as input to a hash algorithm that yields an integer from a potentially enormous range. Though Microsoft is not forthcoming about the details of this particular hash function, all hash functions have in common the fact that while the same input always yields the same hash value, it is not feasible to work backwards from a hash value to the input that produced it. When a table is defined, the table’s developer decides on a fixed number of hash buckets; each represents a single hash value. As rows are inserted into the table, the key for each new row is fed to the hash function, and a pointer to the memory location of the new row is placed in the hash bucket matching the hash value of the key. Later, if SQL Server needs to look up a row based on its key, the hash value for that key tells SQL Server in which bucket to look for the pointer to the desired row.
There is no rule that says each unique key must generate a unique hash value. Indeed, if the number of hash buckets is small, there are certainly going to be different rows with the same hash value. The pointer in the bucket points to the first row with that value, and that row has a pointer to the next row with the same hash value, which may have a pointer to a third, and so forth. As we shall see later, this linked-list structure is critical in managing the updates of row values. However, it also means that if there are not enough hash buckets for the number of rows in a table, the access time for individual rows will increase as SQL Server must chase down lists of pointers rather than find the row it wants directly from the first pointer in the hash bucket.
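You can see how long these pointer chains have become by querying the `sys.dm_db_xtp_hash_index_stats` DMV; an average chain length much greater than one, combined with few empty buckets, suggests the bucket count is too low. A sketch:

```sql
-- Inspect hash-chain lengths for every hash index in the current database.
-- Long average chains mean SQL Server is walking linked lists of rows
-- instead of jumping straight to the row from the bucket pointer.
SELECT
    OBJECT_NAME(hs.object_id) AS table_name,
    i.name                    AS index_name,
    hs.total_bucket_count,
    hs.empty_bucket_count,
    hs.avg_chain_length,
    hs.max_chain_length
FROM sys.dm_db_xtp_hash_index_stats AS hs
JOIN sys.indexes AS i
  ON i.object_id = hs.object_id
 AND i.index_id  = hs.index_id;
```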
When creating an in-memory table with a hash index, define a bucket count that not only exceeds the number of unique key values in the table now, but also exceeds the number expected in the future, up to the time that the table will be dropped and a new, larger version created.
Remember, you cannot add an index to an existing Hekaton table; every index must be included in the initial CREATE TABLE statement. Remember also that you cannot alter a Hekaton table, so the bucket count you select is permanent for the lifetime of the table.
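A minimal sketch of such a CREATE TABLE statement follows; the table name, columns, and BUCKET_COUNT are illustrative, and you should size the bucket count for your own projected row counts:

```sql
-- BUCKET_COUNT is fixed at creation time; choose it for the table's
-- projected size, not just its current size.
CREATE TABLE dbo.Customer
(
    CustomerID   INT NOT NULL
        PRIMARY KEY NONCLUSTERED
        HASH WITH (BUCKET_COUNT = 2097152),  -- sized for ~2,000,000 expected rows
    CustomerName NVARCHAR(100) NOT NULL
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);
```

Note that SQL Server rounds the bucket count you specify up to the next power of two, so the value above is what you would get even if you asked for 2,000,000.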
One doesn’t normally think of “compiling” a table, but in fact SQL Server creates code that defines the necessary data structures for loading rows of a particular table into RAM.
When SQL Server starts up, one of its many jobs is to load Hekaton tables into memory. To do this, it generates and compiles the C-language code that describes the table structure. If you are curious, you can locate the “xtp” folder, which should be in the default folder for your SQL Server databases. This folder is empty when SQL Server is shut down; when SQL Server starts, subfolders appear, each bearing the database ID number of a database that contains in-memory tables. Within each folder, you will see C source code, various object files, and DLLs representing the compiled data structure that is the table. The object ID of your in-memory table can be seen in the name of the corresponding DLL file. As far as I know, the only reason these files exist is that the Windows operating system requires that natively compiled executable files be represented by a file in the OS filesystem. (This is not a requirement for .NET DLL files.) These files never change while SQL Server is running, and they are deleted when SQL Server shuts down.
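Rather than browsing the filesystem, you can also ask SQL Server which Hekaton DLLs it has loaded into its process; the `sys.dm_os_loaded_modules` DMV flags them with a distinctive description:

```sql
-- List the natively compiled Hekaton modules currently loaded.
-- The name column shows the full path of each DLL; the object ID
-- of the in-memory table appears in the file name.
SELECT name, description
FROM sys.dm_os_loaded_modules
WHERE description = 'XTP Native DLL';
```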
You can open the C-language files if you wish, but only to satisfy your curiosity; there is no practical information to be gleaned from the generated code.
After the DLLs are created, SQL Server can begin to stream data from the special FILESTREAM filegroup created for the in-memory tables. If you examine the physical location of the in-memory filegroup, you will see GUID-labelled folders that contain the actual data. If you check the modified dates of the files within, you will note that they change as new row data is entered. Note also that the file sizes are exact powers of two.
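For reference, this filegroup is created when the database is first enabled for in-memory tables. A sketch, in which the database name and path are illustrative:

```sql
-- Add a memory-optimized filegroup and a container for it.
-- The FILENAME here names a folder, not a file; the GUID-labelled
-- checkpoint files described above will appear beneath it.
ALTER DATABASE SalesDB
    ADD FILEGROUP SalesDB_mod CONTAINS MEMORY_OPTIMIZED_DATA;

ALTER DATABASE SalesDB
    ADD FILE (NAME = 'SalesDB_mod', FILENAME = 'C:\Data\SalesDB_mod')
    TO FILEGROUP SalesDB_mod;
```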
It should now be clear why the requirements for in-memory tables are so stringent and why they cannot be modified once created. A table modification would require recompiling the DLL, and recompiling the DLL would require that the old version be removed from RAM. Since the DLL is the data structure for the row data, removing the DLL means removing the entire in-memory table. In other words, the very structure of an in-memory table forces it to be dropped in order to effect changes.
In the next section, we will take a look at in-memory nonclustered indexes.
Please do not hesitate to drop a quick note if you have any comments or questions.
To learn more about new enhancements in SQL Server 2014, including Hekaton tables, have a look at our new 3-day hands-on course, SQL Server 2014 Performance Enhancements. You can take the course at one of our global Education Centers or online from home or office.