Use the snowflake schema for hierarchical dimension tables
A snowflake schema is a variation on the star schema,
in which very large dimension tables are normalized into multiple
tables. Dimensions with hierarchies can be decomposed into a snowflake
structure when you want to avoid joins to big dimension tables when
you are using an aggregate of the fact table. For example, if you
have brand information that you want to separate out from a product dimension
table, you can create a brand snowflake that consists of a single
row for each brand and that contains significantly fewer rows than
the product dimension table. The following figure shows a snowflake
structure for the brand and product line elements and the brand_agg aggregate
table.
If you create an aggregate table, brand_agg, that consists
of the brand code and the total revenue per brand, you can use the
snowflake schema to avoid the join to the much larger sales table.
For example, you can use the following query on the brand and brand_agg tables:
SELECT brand.brand_name, brand_agg.total_revenue
FROM brand, brand_agg
WHERE brand.brand_code = brand_agg.brand_code
AND brand.brand_name = 'Anza'
Without a snowflaked dimension table, you use a SELECT UNIQUE or SELECT DISTINCT statement on the entire product table (potentially, a very large dimension table that includes all the brand and product-line attributes) to eliminate duplicate rows.
While snowflake schemas are unnecessary when the dimension tables are relatively small, a retail or mail-order business that has customer or product dimension tables that contain millions of rows can use snowflake schemas to significantly improve performance.
If an aggregate table is not available, any joins to a dimension
element that was normalized with a snowflake schema must now be a
three-way join, as the following query shows. A three-way join reduces
some of the performance advantages of a dimensional database.
SELECT brand.brand_name, SUM(sales.revenue)
FROM product, brand, sales
WHERE product.brand_code = brand.brand_code
AND brand.brand_name = 'Alltemp'
GROUP BY brand_name