 |
 |
| Oracle Tips: |
|
 |
 |

DATA WAREHOUSING & BUSINESS INTELLIGENCE
Core principles of data warehouse design
Tobey Teorey 02.27.2006
Rating: -3.04- (out of 5)




|
The following is an excerpt from Chapter 8 of Database
Modeling and Design: Logical Design, Fourth Edition by Toby Teorey, published by Elsevier in 2006. Toby Teorey is a professor in the Electrical Engineering and Computer Science Department at the University of Michigan, Ann Arbor. Professor Teorey's current research focuses on database design and data warehousing, OLAP, advanced database systems and performance of computer networks.
Click here to read the full chapter.
A data warehouse is a large repository of historical data that can be integrated
for decision support. The use of a data warehouse is markedly different
from the use of operational systems. Operational systems contain
the data required for the day-to-day operations of an organization. This
operational data tends to change quickly and constantly. The table sizes
in operational systems are kept manageably small by periodically purging
old data. The data warehouse, by contrast, periodically receives historical
data in batches, and grows over time. The vast size of data warehouses
can run to hundreds of gigabytes, or even terabytes. The problem
that drives data warehouse design is the need for quick results to queries
posed against huge amounts of data. The contrasting aspects of data
warehouses and operational systems result in a distinctive design
approach for data warehousing.
A data warehouse contains a collection of tools for decision support
associated with very large historical databases, which enables the end
user to make quick and sound decisions. Data warehousing grew out of
the technology for decision support systems (DSS) and executive information
systems (EIS). DSSs are used to analyze data from commonly
available databases with multiple sources, and to create reports. The
report data is not time critical in the sense that a real-time system is, but
it must be timely for decision making. EISs are like DSSs, but more powerful,
easier to use, and more business specific. EISs were designed to provide
an alternative to the classical online transaction processing (OLTP)
systems common to most commercially available database systems.
OLTP systems are often used to create common applications, including
those with mission-critical deadlines or response times.
Let us now take a look at the core requirements and principles that guide
the design of data warehouses (DWs) [Simon, 1995; Barquin and Edelstein,
1997; Chaudhuri and Dayal, 1997; Gray and Watson, 1998]:
- DWs are organized around subject areas. Subject areas are analogous
to the concept of functional areas, such as sales, project
management, or employees, as discussed in the context of ER diagram
clustering in Section 4.5. Each subject area has its own conceptual
schema and can be represented using one or more entities
in the ER data model or by one or more object classes in the
object-oriented data model. Subject areas are typically independent
of individual transactions involving data creation or manipulation.
Metadata repositories are needed to describe source
databases, DW objects, and ways of transforming data from the
sources to the DW.
- DWs should have some integration capability. A common data
representation should be designed so that all the different individual
representations can be mapped to it. This is particularly useful if
the warehouse is implemented as a multidatabase or federated
database.
- The data is considered to be nonvolatile and should be mass
loaded. Data extraction from current databases to the DW
requires that a decision should be made whether to extract the
data using standard relational database (RDB) techniques at the
row or column level or specialized techniques for mass extraction.
Data cleaning tools are required to maintain data quality—for
example, to detect missing data, inconsistent data, homonyms,
synonyms, and data with different units. Data migration, data
scrubbing, and data auditing tools handle specialized problems in
data cleaning and transformation. Such tools are similar to those
used for conventional relational database schema (view) integration.
Load utilities take cleaned data and load it into the DW,
using batch processing techniques. Refresh techniques propagate
updates on the source data to base data and derived data in the
DW. The decision of when and how to refresh is made by the DW
administrator and depends on user needs (e.g., OLAP needs) and
existing traffic to the DW.
- Data tends to exist at multiple levels of granularity. Most important,
the data tends to be of a historical nature, with potentially
high time variance. In general, however, granularity can vary
according to many different dimensions, not only by time frame
but also by geographic region, type of product manufactured or
sold, type of store, and so on. The sheer size of the databases is a
major problem in the design and implementation of DWs, especially
for certain queries, updates, and sequential backups. This
necessitates a critical decision between using a relational database
(RDB) or a multidimensional database (MDD) for the implementation
of a DW.
- The DW should be flexible enough to meet changing requirements
rapidly. Data definitions (schemas) must be broad enough
to anticipate the addition of new types of data. For rapidly changing
data retrieval requirements, the types of data and levels of
granularity actually implemented must be chosen carefully.
- The DW should have a capability for rewriting history, that is,
allowing for "what-if" analysis. The DW should allow the administrator
to update historical data temporarily for the purpose of
"what-if" analysis. Once the analysis is completed, the data must
be correctly rolled back. This condition assumes that the data are
at the proper level of granularity in the first place.
- A usable DW user interface should be selected. The leading
choices today are SQL, multidimensional views of relational data,
or a special-purpose user interface. The user interface language
must have tools for retrieving, formatting, and analyzing data.
- Data should be either centralized or distributed physically. The
DW should have the capability to handle distributed data over a
network. This requirement will become more critical as the use of
DWs grows and the sources of data expand.
Printed with permission from Morgan Kaufmann, a division of Elsevier.
Copyright 2006. Database Modeling and Design: Logical Design, Fourth
Edition by Toby Teorey. For more information about this title and other
similar books, please visit elsevier.com.
 |

|
Rate this Tip
|
To rate tips, you must be a member of SearchOracle.com. Register now
to start rating these tips. Log in if you are already a member.
|


');
// -->
DISCLAIMER: Our Tips Exchange is a forum for you to share technical advice and expertise with your peers and to learn from other enterprise IT professionals. TechTarget provides the infrastructure to facilitate this sharing of information. However, we cannot guarantee the accuracy or validity of the material submitted. You agree that your use of the Ask The Expert services and your reliance on any questions, answers, information or other materials received through this Web site is at your own risk.
|
 |
|
|
 |
|
 |
 |
 |
 |
| TechTarget provides enterprise IT professionals with the information they need to perform their jobs - from developing strategy, to making cost-effective IT purchase decisions and managing their organizations' IT projects - with its network of . |
|
| | |
All Rights Reserved, , TechTarget |
|
|
|
|
|