12.2 Normalizing a table

It is important in terms of table storage efficiency and processing efficiency to normalize tables. The columns that constitute a table should be examined during table normalization.

This section describes the following normalization topics:

Organization of this section
(1) Normalization for improved table storage efficiency
(2) Normalization for improved table processing efficiency

(1) Normalization for improved table storage efficiency

If a table has multiple columns that contain similar data, the table should be normalized so that all the columns contain unique data. Such normalization improves the efficiency of data storage for the table. This subsection describes how this works using the example depicted in the following figure.

Figure 12-1 Multiple columns in a table containing similar data

[Figure]

The PCODE and PNAME columns in the STOCK table have a one-to-one correspondence before normalization, which means that the data in these columns is redundant. For this case, another table called PRODUCT can be created that consists of the STOCK table's PCODE and PNAME columns. The PRODUCT table is created so that the PCODE and PNAME columns do not contain duplicative data.

(2) Normalization for improved table processing efficiency

(a) Same table used by multiple applications

If the same table is used by multiple applications, normalization can result in a separate table for each application. This can improve the level of concurrent execution for each table. This subsection describes how this works using the example depicted in the following figure.

Figure 12-2 Same table used by multiple applications

[Figure]

The PMANAGE (product management) table is used by the inventory management application and the orders management application. The PMANAGE table can be normalized to the STOCK table that is used only by the inventory management application and the ORDERS table that is used only by the orders management application.

(b) Columns with different access frequencies

If some of a table's columns are accessed frequently and some are not, normalization can result in a table consisting of the columns that are accessed frequently and a table consisting of the columns that are accessed infrequently. This subsection describes how this works, using the example shown in the following figure.

Figure 12-3 Columns with different access frequencies

[Figure]

Taking the STOCK table shown in the above figure, if the retrieval frequency ratio of the PNO and SQUANTITY columns to the PNAME and PRICE columns is 9:1, normalization of the STOCK table should result in a table (STOCK2) consisting of the columns that are retrieved frequently and a table (PRODUCT) consisting of the columns that are retrieved infrequently.

Assume that 10,000 physical input/output operations are required in order to retrieve all entries in the STOCK table. When the STOCK table is divided into the two tables STOCK2 and PRODUCT, the numbers of physical input/output operations required in order to retrieve all entries in these two tables drops to 4,500 (5,000 x 0.9) and 500 (5,000 x 0.1), respectively. As a result, only 5,000 physical input/output operations are required to retrieve all entries, thereby improving the overall table processing efficiency.