Home > Oracle Tips > Data Warehousing & Business Intelligence > Core principles of data warehouse design
Oracle Tips:
EMAIL THIS
 TIPS & NEWSLETTERS TOPICS 

DATA WAREHOUSING & BUSINESS INTELLIGENCE

Core principles of data warehouse design


Tobey Teorey
02.27.2006
Rating: -3.04- (out of 5)


Digg This!    StumbleUpon Toolbar StumbleUpon    Bookmark with Delicious Del.icio.us   


The following is an excerpt from Chapter 8 of Database Modeling and Design: Logical Design, Fourth Edition by Toby Teorey, published by Elsevier in 2006. Toby Teorey is a professor in the Electrical Engineering and Computer Science Department at the University of Michigan, Ann Arbor. Professor Teorey's current research focuses on database design and data warehousing, OLAP, advanced database systems and performance of computer networks.

Click here to read the full chapter.


A data warehouse is a large repository of historical data that can be integrated for decision support. The use of a data warehouse is markedly different from the use of operational systems. Operational systems contain the data required for the day-to-day operations of an organization. This operational data tends to change quickly and constantly. The table sizes in operational systems are kept manageably small by periodically purging old data. The data warehouse, by contrast, periodically receives historical data in batches, and grows over time. The vast size of data warehouses can run to hundreds of gigabytes, or even terabytes. The problem that drives data warehouse design is the need for quick results to queries posed against huge amounts of data. The contrasting aspects of data warehouses and operational systems result in a distinctive design approach for data warehousing.

A data warehouse contains a collection of tools for decision support associated with very large historical databases, which enables the end user to make quick and sound decisions. Data warehousing grew out of the technology for decision support systems (DSS) and executive information systems (EIS). DSSs are used to analyze data from commonly available databases with multiple sources, and to create reports. The report data is not time critical in the sense that a real-time system is, but it must be timely for decision making. EISs are like DSSs, but more powerful, easier to use, and more business specific. EISs were designed to provide an alternative to the classical online transaction processing (OLTP) systems common to most commercially available database systems. OLTP systems are often used to create common applications, including those with mission-critical deadlines or response times.

Let us now take a look at the core requirements and principles that guide the design of data warehouses (DWs) [Simon, 1995; Barquin and Edelstein, 1997; Chaudhuri and Dayal, 1997; Gray and Watson, 1998]:

  1. DWs are organized around subject areas. Subject areas are analogous to the concept of functional areas, such as sales, project management, or employees, as discussed in the context of ER diagram clustering in Section 4.5. Each subject area has its own conceptual schema and can be represented using one or more entities in the ER data model or by one or more object classes in the object-oriented data model. Subject areas are typically independent of individual transactions involving data creation or manipulation. Metadata repositories are needed to describe source databases, DW objects, and ways of transforming data from the sources to the DW.

  2. DWs should have some integration capability. A common data representation should be designed so that all the different individual representations can be mapped to it. This is particularly useful if the warehouse is implemented as a multidatabase or federated database.

  3. The data is considered to be nonvolatile and should be mass loaded. Data extraction from current databases to the DW requires that a decision should be made whether to extract the data using standard relational database (RDB) techniques at the row or column level or specialized techniques for mass extraction. Data cleaning tools are required to maintain data quality—for example, to detect missing data, inconsistent data, homonyms, synonyms, and data with different units. Data migration, data scrubbing, and data auditing tools handle specialized problems in data cleaning and transformation. Such tools are similar to those used for conventional relational database schema (view) integration. Load utilities take cleaned data and load it into the DW, using batch processing techniques. Refresh techniques propagate updates on the source data to base data and derived data in the DW. The decision of when and how to refresh is made by the DW administrator and depends on user needs (e.g., OLAP needs) and existing traffic to the DW.

  4. Data tends to exist at multiple levels of granularity. Most important, the data tends to be of a historical nature, with potentially high time variance. In general, however, granularity can vary according to many different dimensions, not only by time frame but also by geographic region, type of product manufactured or sold, type of store, and so on. The sheer size of the databases is a major problem in the design and implementation of DWs, especially for certain queries, updates, and sequential backups. This necessitates a critical decision between using a relational database (RDB) or a multidimensional database (MDD) for the implementation of a DW.

  5. The DW should be flexible enough to meet changing requirements rapidly. Data definitions (schemas) must be broad enough to anticipate the addition of new types of data. For rapidly changing data retrieval requirements, the types of data and levels of granularity actually implemented must be chosen carefully.

  6. The DW should have a capability for rewriting history, that is, allowing for "what-if" analysis. The DW should allow the administrator to update historical data temporarily for the purpose of "what-if" analysis. Once the analysis is completed, the data must be correctly rolled back. This condition assumes that the data are at the proper level of granularity in the first place.

  7. A usable DW user interface should be selected. The leading choices today are SQL, multidimensional views of relational data, or a special-purpose user interface. The user interface language must have tools for retrieving, formatting, and analyzing data.

  8. Data should be either centralized or distributed physically. The DW should have the capability to handle distributed data over a network. This requirement will become more critical as the use of DWs grows and the sources of data expand.

Printed with permission from Morgan Kaufmann, a division of Elsevier. Copyright 2006. Database Modeling and Design: Logical Design, Fourth Edition by Toby Teorey. For more information about this title and other similar books, please visit elsevier.com.


Rate this Tip
To rate tips, you must be a member of SearchOracle.com.
Register now to start rating these tips. Log in if you are already a member.




Digg This!    StumbleUpon Toolbar StumbleUpon    Bookmark with Delicious Del.icio.us   


RELATED CONTENT
Data Warehousing & Business Intelligence
Tip: Exposing logical attributes with virtual columns
Top five data management buzzwords
A bit wiser with Oracle technology
RFID: Metadata in motion
Oracle vs. SAP
Learning Guide: Data warehousing and business intelligence
Saving lives with Oracle
Do people really understand data warehousing?
The BI application consolidation challenge
MetaBase scripting for the Oracle data warehousing DBA

Oracle BI (business intelligence)
Oracle's business intelligence unit scores a win at the polls, signs Gallup
Oracle's Fusion Applications: Way behind or right on track?
Oracle updates its enterprise performance management system, details Hyperion integration
Enterprise search a key component of Oracle portal project
Data warehouse appliances demand diligence
Will Oracle outbid SAP for Business Objects?
Oracle unveils BI suite for small companies
Top five data management buzzwords
SQL analytics: A self-service model
New Oracle BI tool adds color to compliance efforts
Oracle BI (business intelligence) Research

Oracle data warehousing
IOUG chief discusses OpenWorld, DBA pay and security
Oracle enters the appliance market with Exadata
Tip: Exposing logical attributes with virtual columns
Oracle users prepare for MDM
Data modeling tools no substitute for hard work
eHarmony spurns Microsoft, finds match with Oracle 10g
Separate database for each client in Oracle 10g
Help with data warehouse disaster recovery planning
ORA-600 error and slow server response after upgrade
Extracting data from a specific period

RELATED GLOSSARY TERMS
Terms from Whatis.com − the technology online dictionary
bioinformatics  (SearchOracle.com)

RELATED RESOURCES
2020software.com, trial software downloads for accounting software, ERP software, CRM software and business software systems
Search Bitpipe.com for the latest white papers and business webcasts
Whatis.com, the online computer dictionary

DISCLAIMER: Our Tips Exchange is a forum for you to share technical advice and expertise with your peers and to learn from other enterprise IT professionals. TechTarget provides the infrastructure to facilitate this sharing of information. However, we cannot guarantee the accuracy or validity of the material submitted. You agree that your use of the Ask The Expert services and your reliance on any questions, answers, information or other materials received through this Web site is at your own risk.



Oracle Development Solutions - SQL, J2EE, XML, SOA
HomeNewsTopicsTipsAsk the ExpertsMultimediaWhite PapersProductsBlogs
About Us  |  Contact Us  |  For Advertisers  |  For Business Partners  |  Site Index  |  RSS
SEARCH 
TechTarget provides enterprise IT professionals with the information they need to perform their jobs - from developing strategy, to making cost-effective IT purchase decisions and managing their organizations' IT projects - with its network of technology-specific Web sites, events and magazines.

TechTarget Corporate Web Site  |  Media Kits  |  Reprints  |  Site Map




All Rights Reserved, Copyright 2003 - 2008, TechTarget | Read our Privacy Policy
  TechTarget - The IT Media ROI Experts