Overview of data warehousing
Data warehouse databases provide a decision support system (DSS) environment in which you can evaluate the performance of an entire enterprise over time.
In the broadest sense, the term data warehouse is used to refer to a database that contains very large stores of historical data. The data is stored as a series of snapshots, in which each record represents data at a specific time. By analyzing these snapshots you can make comparisons between different time periods. You can then use these comparisons to help make important business decisions.
Data warehouse databases are optimized for data retrieval. The duplication or grouping of data, referred to as database denormalization, increases query performance and is a natural outcome of the dimensional design of the data warehouse. By contrast, traditional online transaction processing (OLTP) databases automate day-to-day transactional operations. OLTP databases are optimized for data storage and strive to eliminate data duplication. Databases that achieve this goal are referred to as normalized databases.
An enterprise data warehouse (EDW) is a data warehouse that services the entire enterprise. An enterprise data warehousing environment can consist of an EDW, an operational data store (ODS), and physical and virtual data marts.
A data warehouse can be implemented in several different ways. You can use a single data management system, such as HCL OneDB™, for both transaction processing and business analytics. Or, depending on your system workload requirements, you can build a data warehousing environment that is separate from your transactional processing environment.
HCL OneDB uses the umbrella terms data warehousing and data warehousing environment to encompass any of the following forms that you might use to store your data:
- Data warehouse
- A database that is optimized for data retrieval to facilitate
reporting and analysis. A data warehouse incorporates information
about many subject areas, often the entire enterprise. Typically you
use a dimensional data model to design a data warehouse. The data
is organized into dimension tables and fact tables using star and
snowflake schemas. The data is denormalized to improve query performance.
The design of a data warehouse often starts from an analysis of what
data already exists and how to collected in such a way that the data
can later be used. Instead of loading transactional data directly
into a warehouse, the data is often integrated and transformed before
it is loaded into the warehouse.The primary advantage of a data warehouse is that it provides easy access to and analysis of vast stores of information on many subject areas.
- Data mart
- A database that is oriented towards one or more specific subject areas of a business, such as tracking inventories or transactions, rather than an entire enterprise. A data mart is used by individual departments or groups. Like a data warehouse, you typically use a dimensional data model to build a data mart. For example the data mart might use a single star schema comprised of one fact table and several dimension tables. The design of a data mart often starts with an analysis of what data the user needs rather than focusing on the data that already exists.
- Operational data store
- A subject-oriented system that is optimized for looking up one or two records at a time for decision making. An operational data store (ODS) is a hybrid form of data warehouse that contains timely, current, integrated information. Including the ODS in the data warehousing environment enables access to more current data more quickly, particularly if the data warehouse is updated by one or more batch processes rather than updated continuously. The data typically is of a higher level granularity than the transaction. You can use an ODS for clerical, day-to-day decision making. This data can serve as the common source of data for data warehouses.