Shakespeare famously asked “What’s in a name?” A great deal, as it turns out, since the words used to describe things affects how we perceive them and think about them. Perhaps it’s the rapid rate of change that makes technology particularly susceptible to the effects of how we use words to describe things. Terms too precise can simply cease to be used, while overly casual application of terms can obscure important distinctions.
This seems to be the case when folks talk about “artificial” and “surrogate” keys. Let’s take a closer look. Everyone seems to agree that a “natural” key is some characteristic out there in the real world that can serve as a key in a database. Employees already have a social security number (or social insurance number, or whatever your country calls it). If you were creating a table of employees you might well use this unique value as a key. In contrast with “natural keys”, however, many people use “artificial keys” and “surrogate keys” as if they were synonymous; in doing so an important distinction can be lost. Long before there were computers numbers were assigned to, say, purchase orders. These numbers are “artificial” in the sense that they are automatically generated and the actual numerical value has no intrinsic meaning. Order numbers, however, are used not just by computers as keys. Employees use them, and certainly customers may call up and inquire about a particular order number. Originally, in the early days of database technology, the term “surrogate key” was used to talk about values pointing to the physical location of records in a database. These values are internal to the database and are not available to database user, nor would any database user want them. We won’t consider that meaning here, though the concept might well be applied to the “uniquified” key SQL Server creates when building a clustered index on a non-unique column.
We’ll consider the more common distinction of the word “surrogate”, which is to refer to a key created by database designers solely for the internal use of the database, not for use by staff workers or database users. One of the more common examples of a surrogate key is in data warehouses where a date is uniquely identified by an integer, not by a datetime datatype. The actual datetime value, along with its associated characteristics, can always be looked up in the dimension table. In some cases, this can provide performance benefits, particularly if the datetime datatype employed takes up more bytes than an integer. There is another potential value to an integer surrogate as well. Specific integers can be used to capture meanings unavailable using datetime datatypes, for example “not applicable” or “date to be determined”.
To sum up, natural keys are existing characteristic values of database objects that serve as keys. Artificial keys are generated automatically and have no intrinsic meaning, but are important to database users in their work. Surrogate keys are created by database developers for some specific design purpose and are meaningless to, and not to be used by, clerical staff and database users.