Like supporting actors and actresses who never enjoy the spotlight, data types in SQL Server play critical roles and yet rarely receive attention. In fact, selecting the best data types for columns is one of the most important decisions during SQL Server table design. For that reason, let’s take a few moments to review some of the fundamental features of the most commonly used SQL Server data types.
Character data in SQL Server can be represented either by a one-byte-per-character data type or Unicode, a two-byte per character representation. In Unicode, the codes for written English contain zero in their first byte and the ANSI code for that character in the second byte. In other words, if you are using a Unicode type to store English, you are doubling your storage requirements and cutting your data read performance in half. Save Unicode for those cases where it is necessary.
Character data in SQL Server can also be saved in variable-length and fixed-length data types. If your data does not take up the full allotted space in a fixed-length column, the data is padded with spaces so it does. In contrast, in variable-length columns only the actual data characters are stored. While this seems to suggest that variable-length columns are better, variable-length columns incur overhead. For this reason, you should only use variable-length columns when the advantages are significant. As a rough rule of thumb, the first variable-length column in a table costs you four extra bytes per row. Each additional variable-length column in the table costs an additional two bytes per row. Furthermore, there is a small amount of additional processing associated with variable length data.
CHAR – fixed-length ANSI characters
VARCHAR – variable-length ANSI characters
NCHAR – fixed-length Unicode characters (The “N” stands for “national”.)
NVARCHAR – variable-length Unicode characters
In addition, VARCHAR(MAX) and NVARCHAR(MAX) can be used when data may (or actually does) exceed the roughly 8000 byte limit for fitting row data on an SQL Server page.
BINARY, VARBINARY and VARBINARY(MAX) are exactly analogous to CHAR, VARCHAR, and VARCHAR(MAX). The only difference is that the bytes in the binary data types are not assumed to represent character data. The SQL functions that operate on character data generally operate on binary data in exactly the same way.
The integer numeric data types are easy to understand. Since we want our table rows to take up as few bytes as possible, we simply select the smallest integer data type that can do the job.
BIGINT – an eight-byte type with a range of roughly minus to plus nine quintillion (not a number you bump into every day).
INT – a four-byte type with a range of
SMALLINT – two bytes, from
TINYINT – One byte. SQL Server’s only unsigned data type ranging from 0 to 255.
BIT – a single bit can be used to represent true/false, yes/no, up/down, or any other binary choice.
Of course, real world data ofter involves fractional quantities. For historical reasons, there are two names, DECIMAL and NUMERIC, for SQL Server’s exact decimal fraction data types. Doesn’t matter which name you use. Here, we’ll stick with DECIMAL. A DECIMAL declaration includes two specifications, precision and scale. Precision specifies the total number of digits and scale represents the number of digits to the right of the decimal point (In other words, how “big” the number is.) For example,
is suitable for a number that requires two places to the right of the decimal and a maximum of seven digits. The largest number that can be represented in this declaration would be 99999.99. DECIMAL supports a precision up to 38 digits, but as always we only declare a data type as big as we really need.
MONEY and SMALLMONEY
MONEY and SMALLMONEY are not ANSI standard data types and today are something of an anachronism, They were created in an era when the storage of DECIMAL data was inefficient for currency values. But Microsoft has improved the efficiency of the DECIMAL data type and MONEY not longer provides an advantage for data storage. Making matters worse, calculations using the MONEY date type can yield erroneous results. Don’t worry; if you only do sums your calculations will be correct. But this is not necessarily the case for calculations involving division. For example, an analyst may wish to calculate the fraction that each product contributes to the total revenue. In this calculation we are likely dividing small numbers by a substantially larger one. In such a case, we may well get an answer that is close, but not quite numerically correct. If you would like to learn more about the MONEY data type, Learning Tree has a blog on the topic: https://blog.learningtree.com/is-money-bad-the-money-datatype-in-sql-server/.
The FLOAT and REAL data types support the IEEE 754 standard for floating point data. For commercial applications, however, it is generally better to use the exact DECIMAL data type to represent decimal fractions. FLOAT and REAL incur roundoff error in ways often not appreciated by businesspeople. Best to leave floating point data to the scientists and engineers.
DATETIME is the oldest and still the most commonly used datetime data type in SQL Server. The precision is just a tad over three milliseconds. In many business applications saving time to the nearest minute is good enough, making SMALLDATETIME a better choice and saving four bytes for each data point.
Microsoft has introduced a number of new datetime data types which provide many advantages. DATE stores only the date with no time value, not only saving bytes but avoiding mishaps when query writers forget about the possible impact of time in the DATETIME data type, particularly when querying over intervals of time. DATETIME2 (not the most imaginatively named data type) provides greater precision. DATETIMEOFFSET can be valuable for global organizations, as it saves the time zone as well as the date and time.
We don’t have the time and space here to discuss all of SQL Server’s data types in the detail they deserve, but here are a few observations on some of the more specialized data types.
XML stores xml text in SQL Server, but is much more powerful than storing xml as a text data type as it provides the additional tools of XQuery and XPath to aid in the querying and manipulation of this data.
UNIQUEIDENTIFER is a huge number, sixteen bytes, that is required in some applications including merge replication and the use of filestream data storage. In general, however, uniqueidentifier is a poor choice for keys. Keys usually appear in multiple locations in a database, and unnecessarily repeating sixteen byte keys is generally wasteful. Use UNIQUEIDENTIFIER only when it is required by the task at hand.
GEOMETRY and GEOGRAPHY are types used to represent spatial data. Spacial data could include diagrams and office floor plans, but is quite often map data. Geometry is a flat representation and geography takes into account the curvature of the earth. Clearly these are specialty items; you either don’t need them or you need them a lot.
Selecting the most suitable data types in SQL Server is a task that should receive careful attention. We’ve provided a brief overview here, but this should only be the beginning of a careful exploration of SQL Server data types. including the advantages and disadvantages of each.
AUTHOR: Dan Buskirk
SQL Server Training