What is dimensional data?
Traditional relational databases, such as OLTP databases, are organized around a list of records. Each record contains related information that is organized into attributes (fields). The customer table of the stores_demo demonstration database, which includes fields for name, company, address, phone, and so forth, is a typical example. While this table has several fields of information, each row in the table pertains to only one customer. If you wanted to create a two-dimensional matrix with customer name and any other field, for example, phone number), you would realize that there is only a one-to-one correspondence. The following table is an example of a database table with fields that have only a one-to-one correspondence.
Customer | Phone number ---> | ||
---|---|---|---|
Ludwig Pauli | 408-789-8075 | ---------------- | ---------------- |
Carole Sadler | ---------------- | 415-822-1289 | ---------------- |
Philip Currie | ---------------- | ---------------- | 414-328-4543 |
You could put any combination of fields from the preceding customer table in this matrix, but you would always end up with a one-to-one correspondence, which shows that this table is not multidimensional and would not be well suited for a dimensional database.
However, consider a relational table that contains more than a one-to-one correspondence between the fields of the table. Suppose you create a table that contains sales data for products sold in each region of the country. For simplicity, the company has three products that are sold in three regions. The following table shows how you might store this data in a table, using a normalized data model. This table lends itself to multidimensional representation because it has more than one product per region and more than one region per product.
Product | Region | Unit Sales |
---|---|---|
Football | East | 2300 |
Football | West | 4000 |
Football | Central | 5600 |
Tennis racket | East | 5500 |
Tennis racket | West | 8000 |
Tennis racket | Central | 2300 |
Baseball | East | 10000 |
Baseball | West | 22000 |
Baseball | Central | 34000 |
Although this data can be forced into the three-field relational table, the data fits more naturally into the two-dimensional matrix in the following table. This matrix better represents the many-to-many relationship of product and region data shown in the previous table.
Region | Central | East | West | |
---|---|---|---|---|
Product | Football | 5600 | 2300 | 4000 |
Tennis Racket | 2300 | 5500 | 8000 | |
Baseball | 34000 | 10000 | 22000 |
The performance advantages of the dimensional model over the normalized model can be great. A dimensional approach simplifies access to the data that you want to summarize or compare. For example, using the dimensional model to query the number of products sold in the West, the database server finds the West column and calculates the total for all row values in that column. To perform the same query on the normalized table, the database server has to search and retrieve each row where the Region column equals 'West' and then aggregate the data. In queries of this kind, the dimensional table can total all values of the West column in a fraction of the time it takes the relational table to find all the 'West' records.