<home

Why Keys Must Be Composite

“For a successful technology, reality must take precedence over public relations, for nature cannot be fooled”

- Richard Feynman
Personal observations on the reliability of the Shuttle
Appendix to the Rogers Commission report

Keys are sets of attributes. A composite key is a key consisting of more than one attribute - multiple attributes making up a single key.

In the relational model of data composite keys are fundamentally important. Relational databases would not be possible without composite keys. Before E.F.Codd’s relational model (1969), databases relied on “navigational” storage structures that sorted and linked data using addresses or pointers between data elements. With data models of that pre-relational kind, a single attribute (very probably an artificial one created specifically for the purpose) is sufficient for identifying and navigating between data elements.

Relational databases are different. The crucial innovation that Codd made was the idea of an abstraction of all data as relations whereby data was stored, accessed and manipulated by its natural structure alone. The attributes of data themselves are what identify information in a relational database. Relations consist of unique tuples (“rows”) and the tuples represent unique instances of things in the real world.

Real things don’t always come with single-valued identifiers attached to them. To identify correctly the things we wish to model in a relational database and to ensure that facts are uniquely and accurately recorded it is absolutely necessary to allow their identifying keys to be sets of attributes rather than just single values.

Consider a relational table of exam results such as the following:

The purpose of the Results table is to record individual exam results. A student may take multiple exams (Charlotte sat History and French on the same day) but will never be permitted to take the same exam twice in a single day. The composite key consisting of {Student, Exam, Date} is therefore the only possible key of this table – the only irreducibly unique set of attributes guaranteed to identify tuples.

The composite key serves at least three important functions. Firstly, the presence of a tuple with a particular combination of key values asserts that the specified student took the specified exam on the specified date. Secondly, the key accurately identifies a particular exam result – the user of the database knows that by querying the table on a given combination of Student, Exam and Date she is guaranteed to find at-most one result. Thirdly therefore, the key implements a data integrity rule by preventing duplicate data from entering the table.

It is sometimes assumed that a simple key – a key consisting of just one single attribute – is preferable to a composite one, even if the simple key is artificially created for that purpose and serves no other function. Here’s a revised version of the Results table, with a new simple key attribute called Id:

The values of the Id attribute serve no purpose here other than to identify a tuple in the database. The Id attribute contains no useful information and does not identify an exam result in reality – it is simply a label appended to a tuple that was already properly identifiable even without the Id. The Id is unlikely to be useful to students or professors unless they have access to the table so that they can see the “real” key – the attributes of Student, Exam and Date, which do identify an exam result in the real world. Enforcing the uniqueness of Id does not enforce the uniqueness of the real data attributes, so the presence of Id doesn’t do anything to improve data integrity either.

It ought to be obvious that the simple key is no substitute for the original composite key. Even in the presence of the simple key, the composite is still essential for correct identification of the facts that the table is supposed to represent. Enforcement of the composite key is still just as necessary as before to keep duplicate data out of the table. We wouldn’t want to allow the following situation to arise:

What was the result of Charlotte’s History exam?

If we do choose to create a simple key therefore, we still need to make {Student, Exam, Date} a key as well. To be as useful as the original version of the table, the revised table must now have two keys: {Student, Exam, Date} and {Id}.

In conclusion, composite keys are essential to data integrity and correct identification of data. Database tuples describe uniquely identifiable things, which by definition should have meaningful keys consisting of one or more attributes. Appending a simple key to tuples that already have a composite key doesn’t make them any “more unique” than they were before. Nor does a simple key obviate the need for composite keys.