Adding redundant data

A correct data model avoids redundancy by keeping any attribute only in the table for the entity that it describes. If the attribute data is needed in a different context, you join tables to make the connection. But joining takes time. If a frequently used join affects performance, you can eliminate it by duplicating the joined data in another table.

In the stores_demo database, the manufact table contains the names of manufacturers and their delivery times. An actual working database might contain many other attributes of a supplier, such as address and sales representative name.

The contents of manufact are primarily a supplement to the stock table. Suppose that a time-critical application frequently refers to the delivery lead time of a particular product but to no other column of manufact. For each such reference, the database server must read two or three pages of data to perform the lookup.

You can add a new column, lead_time, to the stock table and fill it with copies of the lead_time column from the corresponding rows of manufact. That arrangement eliminates the lookup and therefore speeds up the application.

Like derived data, redundant data takes space and poses an integrity risk. In the example described in the previous paragraph, many extra copies of the lead time for each manufacturer can exist. (Each manufacturer can appear in stock many times.) The programs that insert or update a row of manufact must also update multiple rows of stock.

The integrity risk is simply that the redundant copies of the data might not be accurate. If a lead time is changed in manufact, the stock column is outdated until it is also updated. As you do with derived data, define the conditions under which redundant data might be wrong.

For more information about database design, see the HCL OneDB™ Database Design and Implementation Guide.