levselector.com New York
home > Data Warehouse  Email

Data Warehousing,
OLAP & Reporting.
 
On This Page: Other Pages:
- Operations vs. Analysis
- Star & Snowflake configurations
- data mart & data warehouse
- multi-dimentional analysis, OLAP
- software
- books
- MDX (Multi-Dimensional Expressions)
- Informatica
- ERP, CRM, etc.
- ETL
- Teradata

(Executive Information Systems,  Decision Support Systems,  Statistics and Technical Data Analysis,  Neural Networks,  End-User Query and Reporting,  Data Warehousing,  Mapping and Visualization,  Data Mining and OLAP,  )

Operations vs Analysis home - top of the page - email

Let's distinguish between databases optimized for 2 types of work: operations and research.
 
-- operational processing - OLTP(On-Line-Transaction-Processing) Research and Analysis - DSS (Decision Support System),  OLAP (On-Line-Analytical-Processing), Data mining
Optimized for  inserts, updates, and deletes queries
Frequency of updates Frequently (may be every second) Usually once a day. Data for analysis is prepared once a day (at night) at a staging area, then loaded into the main OLAP database - and then used during the day.
Number of indexes Few indexes Many indexes
Level of normalizing the database Normalized to some reasonable degree Heavily de-normalized for easier and faster querying

Some analysis (for example, Multi-Dimensional Analysis) is really much better done using specialized software instead of standard RDBMS.

Star schema & Snowflake configurations home - top of the page - email

One of the difficulties of querying a normalized database is in the big number of tables you may need to join sequentially in one query.  You can easily have to chain 10 and more tables.  This is difficult for a user (he must know his tables really well), and it may have very poor performance. Or even crash the database.

The common approach to resolve this problem is to try to restructure the data. You denormalize your tables. You also restructure them into so-called "star"-configuration to avoid long chains.  This means that you create one big  "facts" table (the center of the star) surrounded by 10-15 "dimension" tables.  This way you avoid long chains. Then you basically query one central "fact" table - and narrow your scope by joining it with some "dimension" tables.  Your chain length =1. Sometimes you may add extra 2-nd layer (chain length=2 - details) - this is called the Snowflake configuration - see images:
 
Star Configuration Snow Flake Configuration

The Star Schema is also known as a "star-join schema", "data cube", and "multidimensional schema".

The main benefit of Star schema configuration is that it makes easy for users to to make reports/queries, especially implementing multi-dimensional views of data with different granularity for different dimensions. The applications (reports) become simplier and easier to understand for the user.

To optimize queries, it is necessary to create corresponding indexes. For example, for a report using data from 4 dimensions (d1,d2,d3,d4) you may use the following index:

          create index i1 on MyFactTable (id_d1, id_d2, id_d3, id_d4)

Database servers may have special built-in optimizations to perform a query on a star schema. For example, in Oracle you can set STAR_TRANSFORMATION_ENABLED=TRUE and create bitmap indexes on keys in the fact table.

Fact table is usually very big (tens of millions of rows - and larger) and gets more and more new data on a daily basis. It make sense to partition it by some time interval, for example by a month or quater of the year. To do this you need to add a column to the fact table (for example, month) by which you will do the partitionning.

There are ways to bulk-load data in parallel - methods are different depending on the database brand (DB2, Oracle, ...). For example, for speed you can disable indexes and logging while doing the load - and then recreate the indexes after the load. Note - as rebuilding of an index for a big table takes a lot of time, there are ways to speed up this as well (partition the index, thus rebuilding index locally, use parallel calculations, turn off logging while rebuilding the index).


Data Mart & Data Warehouse home - top of the page - email

Data mart is usually a set of partiular data designed to meet some specific knowledge needs. It may be one set of tables in Star or Snowflake configuration. But may be something else. A big database with thousands of tables can be represented (for analysis) by tens of Data Marts. Together these Data Marts constitute a "Data Warehouse".

Note, that several different star systems can re-use the same dimension tables.
 

Multi-Dimensional Analysis home - top of the page - email

Example - 4 dimensions:  stores, time periods, products, measures.
Thus data relationships can not be fit in a 2-dimentional table. It is more like a multi-dimensional cube (hyper-cube).
You can slice this cube in different directions - and show a 2-dimensional slice (or 3-dimensional picture).
Note, that some of cube dimensions correspond to Dimensions of the data mart (something you would group by - stores, time periods, products), whereas others may be calculated measures (total, average, etc.).
Each dimension usually has some hierarchy - multiple levels of granularity (for example for time: hour, day, week, month, etc.).

The OLAP existed for ~30 years, but the term OLAP (On-Line Analytical Processing) was introduced only in 1993 by Dr. Codd et al. in paper "Providing OLAP to User-Analysts: An IT Mandate".  Yes, the same Codd who formulated database principles in 70s. The work was sponsored by Arbor Software, the creators of ESSBASE.

OLAP calculations should be fast (under 30 seconds).

ROLAP - analysis system based on Relational database.
MOLAP - analysis system using multidimensional database (much faster).
HOLAP - Hybrid OLAP.

Formulas
Links.

When designing your model, adding hierarchies takes less computer resources than adding dimensions.

Search google for olap freethink, praxis olap software, infocube, oracle express, etc.
- http://altaplana.com/olap/olap.html - collection of information related to Online Analytical Processing (OLAP)
- http://www-users.cs.york.ac.uk/~kimble/research/ak/info.html -
- www.infogoal.com/dmc/dmcdwh.htm - Directory of Data Warehouse, Data Mining, and Decision Support Resources
- http://bprc.warwick.ac.uk/bp-tool.html - hardware/software vendors
- www.praxisint.com/solution/businessintelligence/index.asp - Praxis International (FreeThink)
- www.credata.com/research/dwsites.html - Hot Sites on Data Warehousing
- http://www.dmreview.com/master.cfm?NavID=152&AuthorID=1573 - OLAP Market Review
- www.dmreview.com/ - DM Review Magazine

Search Google for OLAP:
- http://directory.google.com/Top/Computers/Software/Databases/OLAP/?tc=1 -
- http://directory.google.com/Top/Computers/Software/Databases/Data_Warehousing/ -
- http://directory.google.com/Top/Computers/Software/Databases/Data_Mining/ -

software home - top of the page - email

There are so many specialized software packages for data mining and OLAP - more than enough to have your head spin. For a sample list go to
- www.knowledgestorm.com/ - and search for OLAP - you will get  ~200s of software solutions. Here are some (not properly sorted, and I am sure that some important ones are missed - please let me know at - selector@pipeline.com) :
 
 
- www.businessobjects.com/ - BusinessObjects - powerful reporting system, a BI (Business Intelligence) platform and integrated BI toolset.  Have designer application, deliver via browser. Has effective and optimized subscribtion/distribution mechanism. Allows to do multidimensional analysis, slicing/dicing, drill down/up.

- www.cognos.com/products/powerplay/ - Cognos PowerPlay - by Cognos, Inc. - the world's best-selling OLAP software lets users explore volumes of data with sub-second response.  Can use browser or Excel as a client. Has data store and application server to serve thosands of users simultaneously.

- Microsoft Analysis Services for MS SQL Server. Took big share of the market. MDX language (Multidimensional Expression (MDX) - like SQL, but works with multidimensional cubes).  ProClarity (www.proclarity.com/) offers third party OLAP CLient (Knosys).

- www.hyperion.com/ - Hyperion's Essbase OLAP server and Analysis Tools - used by millions of users.

- DB2 OLAP server - IBM's Essbase clone,  IBM Red Brick® Warehouse - a high-performance database server (from Informix)

- http://www.teradata.com - Teradata (spin-off from NCR in 2007) - database engine optimized for Datawarehousing (for reading, not for inserting).  Used for huge data warehouses (Fortune 100 companies, for example, major arilines). NCR = National Cash Register.

- www.oracle.com/olap/ - Oracle Express - Multidimensional Server (MOLAP), the 4GL SPL (Server Programming Language), Oracle Express Objects toolkit (to handle multi-dimensional data), many utilities for administraiont, monitoring, loading the data, OWB (Oracle Warehouse Builder - ROLAP), Oracle Express Analyzer, Oracle Discoverer, Express Web Publisher, Spreadsheet Add-ins, Oracle Darwin (datamining software that finds meaningful patterns hidden within corporate data), Oracle Financials, etc.

- www.crystaldecisions.com - by Crystal Decisions (Former Seagate Software) - web-based infrastructure  for analysis and reporting accross the enterprise (Crystal Reports, Crystal Enterprise, Crystal Applications, Crystal Analysis, Holos Analytic System, Seagate Info, BI Technology Spectrum). Simple distribution of pdf reports.

- www.applix.com/solutions/ - Applix ITM1

- www.MicroStrategy.com - MicroStrategy - OLAP, etc.

- www.brio.com/products/overview.html - Brio Intelligence, Brio Portal, Brio Reports - build business intelligence, enterprise reporting, analytics. Slow performance. Powerful query analyser tool which works over the web (via plugin).  Good Javascript model which allows the building of "EIS screens" to give pushbutton / search box / sort etc. without the user having to understand how to use pivottables and drill-down graphs.

- www.sap.com/solutions/bi/ - mySAP Business Intelligence, Business Warehouse, etc. - by SAP America, Inc.

- www.actuate.com - Actuate - The Actuate e.Reporting Suite 5. Report server for report generation - produces ROI (proprietary format), PDF and spreadsheet.  Reports have rich content and can be hyper-linked. Actuate runs on both Unix and NT servers. Example of usage: generate 30,000 reports each night.

* www.hammerman.com/faq.html - database reporting FAQ
* www.livewarepub.com/ - R&R Report Writer - Liveware Publishing, Inc. (X-base and SQL versions (formerly known as Arpeggio))
* www.srs-inc.com/Products/ReportSmith/RSFeatures.asp - ReportSmith - (Strategic Reporting Systems, Inc.)
 
 
- www.informatica.com/ - Informatica - good product for data preparation in the staging area.
- www.whitelight.com/ - WhiteLight analytic platform
- www.siebel.com - Siebel eMarketing - by Siebel Systems, Inc.  - Siebel eMarketing enables organizations to create, execute, and assess Web-based marketing campaigns.
-  -  ADAPT (Application Design for Analytical Processing Technologies) by Symmetry Corporation - design methodology specifically to represent the objects and operations found in OLAP applications.
-  - OutlookSoft Enterprise Analytic Portal - by OutlookSoft
-  - WebFOCUS Business Intelligence Dashboard & Developers studiio - by Information Builders, Inc. - reporting
- www.PeopleSoft.com - PeopleSoft Tree and Cube Manager - by PeopleSoft -  users can define the data they want to extract into an OLAP cube.

-  - KPMG's Interpreter - transfer pricing software -  by KPMG LLP
-  - Cardinal Supervisor and Cardinal Explorer - by Symtrax - automatically captures, formats and distributes output from Mainframe, ERP, database and other systems.
-  - Outfront CRM[tm] by Primal Solutions, Inc. - helps you identify who your high-value customers are and why, and how to ...
-  - PolyAnalyst - by Megaputer Intelligence Inc.  - a complete high-end data mining tool.
-  - Datapan manufacturerAgent - by Askfor Business Solutions - real-time pdf customized reports
-  - DI-Atlantis - by Dimensional Insight, Inc.- multidimensional software package for intuitive access and analysis.
-  - eEnterprise Business Analytics Series - by Microsoft Great Plains Business Solutions

-  - DocuAnalyzer - by Mobius Management Systems, Inc.
-  - Infinium Business Intelligence Analytic Server (BIAS) - by Infinium Software, Inc. - a multi-dimensional business analysis
-  - Horizon OLAP Reporting and Analysis - by Walker Interactive Systems, Inc. - OLAP Reporting and Analysis is the first reporting application built specifically for financial reporting, analysis and modeling from a multi-dimensional database.
 

-  - DataTracker - by Silvon Software, Inc. - OLAP
-  - MARS - by Salford Systems  - an automated regression software tool which is highly competitive to Neural Nets in the solution but the insights are more clear.  See also CART (decision tree tool to search for patterns)
-  - Adaytum e.Planning - by Adaytum Software Inc  - an enterprise business planning solution

-  - IDENTITY SYSTEMS (IDS) - by Search Software America - Persons, Organizations and Addresses can be searched, matched and retrieved despite the error, variation or format of the data.
-  - BizWorks - by interBiz - an eBusiness Process Management Suite

-  - WizWhy - by WizSoft, Inc.  - a data mining tool for revealing trends, rules, errors, and interesting phenomena in the data
-  - Hummingbird BI/Suite - by Hummingbird Ltd.  - a fully integrated, scalable business intelligence solution (OLAP)
-  - CorManage - by CorVu Corporation - Balanced Scorecard applications, OLAP Query and Reporting, executive dashboard alerts, forecasting and what-if analysis.
-  - AnalyzerSuite - by ChiCor Information Management, Inc.  - OLAP-based AnalyzerSuite  - create graphical representations of multi-dimensional data.
-  - Comshare Decision - by Comshare, Inc.  - Enables rapid development of custom business intelligence applications such as balanced scorecard, customer and market analysis and BI portals.

-  - NetWORKS ONEview - by Manugistics, Inc.  - a set of applications based on industry-standard OLAP technology
-  - iM:Analyzer - by iMedeon, Inc. - a Results Performance Manager (RPM) with a knowledge repository
-  - Monarch/ES Report Portal: Business Intelligence & Web Enterprise Reporting - by Datawatch Corporation -
-  - Nuggets - by Data Mining Technologies, Inc.  - Data Mining Software Toolkit - Rule Induction Engine, Prediction, Segmentation, Validation. Uses state of the art proprietary rule induction without statistics.
-  - WebFOCUS OLAP - by Information Builders, Inc.  - automatically OLAP-enables any report, so users can slice-and-dice information and detect patterns in any data-with or without cubes.
-  - BayesOn-Line - by Gensym Corporation  -  BayesOn-Line (BOL) is a software tool for reasoning about uncertainty and for learning relationships among variables using a graphical modeling technology.
-  - Deap - by Deap Systems, Inc.  - identification, extraction, transformation, and transportation of operational SAP R/3 data into alternate RDBMSs.
-  - KnowledgeAccess, KnowledgeSeeker, KnowledgeSTUDIO - by ANGOSS Software Corp  - data miningwith Excel data

-  - Monarch Data Pump - by Datawatch Corporation - extracts data from reports and/or database - and passes it further in various formats
-  - Commerce Intelligence[tm] 4.0
-  - Proactive[, Accelerators, Essentials - by Personify, Inc. -
-  - Affinium Model - by Unica Corporation - award-winning data mining software
-  - Intelli-Mine Business Intelligence Software for Retail - by Intelli-Mine - an analytical platform
-  - Data Warehouse and Data Mart Services - by Appix, Inc. - allow you to design, execute, and implement data warehouses.
-  - DQmart - by Decision Support Inc.  - a data mart solution for financial institutions that use the MISER system.

-  - PROPHIX - by EPS Software - Multi-user financial software application used for Budgeting, analysis, planning, reporting.
-  - Synergen Series? EAM/CMMS Solution - by Synergen, Inc.  - enterprise asset management EAM/CMMS software solution
-  - WebFOCUS Developers Studio
-  - SPSS Base, SmartViewer - by SPSS, Inc. - share report cubes, graphs and tables electronically
-  - Business Decision Manager, Baan Business Intelligence Suite (BaanBIS) - by Baan USA, Inc. manage real-time business knowledge across the entire enterprise

books home - top of the page - email

Books:
- Corporate Information Factory, 2nd Edition - by William H. Inmon, Claudia Imhoff, Ryan Sousa (2000)
- Building the Data Warehouse (3rd Edition) -- byWilliam H. Inmon (2002);
- Managing the Data Warehouse -- by William H. Inmon et al (1996)
- Data Warehouse Performance -- by William H. Inmon et al (1998)
- The Data Warehouse Lifecycle Toolkit : Expert Methods for Designing, Developing, and Deploying Data Warehouses -- by Ralph Kimball (1998)
- The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling (Second Edition) - by Ralph Kimball, et al (2002)
- OLAP Solutions: Building Multidimensional Information Systems - by Erik Thomsen (1997)
- Data Warehousing, Data Mining, and OLAP (Data Warehousing/Data Management) - by Alex Berson, Stephen J. Smith (Contributor) (????) 

- Data Warehouse Design Solutions - by Christopher Adamson, Michael Venerable (????)
- Data Warehousing and Decision Support : The State of the Art, Volume 1 - by Pam Roth (1995)
- Data Warehousing in the Real World : A Practical Guide for Building Decision Support Systems - by Sam Anahory, Dennis Murray (Contributor) (1997)
- Data Warehousing; Building the Corporate Knowledge Base - by Thomas Hammergren (????)
- Business Intelligence : The IBM Solution - by Mark Whitehorn, Mary Whitehorn (1999)

- Data Model Patterns : Conventions of Thought - by David C. Hay (1995)
- The Balanced Scorecard : Translating Strategy into Action - by Robert S. Kaplan, David P. Norton (1996)
- e-Business Intelligence: Turning Information into Knowledge into Profit - by Bernard Liautaud, Mark Hammond (Contributor) (????)
- Building the Customer-Centric Enterprise: Data Warehousing Techniques for Supporting Customer Relationship Management - by Claudia Imhoff, et al (????)
- Business Information Warehouse for SAP - by Naeem Hashmi (2000)

- Data Warehouse Project Management - by Sid Adelman, Larissa Terpeluk Moss (????)
- Data Warehousing For Dummies® - by Alan R. Simon (????)
- Data Warehouse : From Architecture to Implementation - by Barry Devlin (1996)
- Building and Managing the Meta Data Repository: A Full Lifecycle Guide - by David Marco (????)
- Data Mining and Business Intelligence : A Guide to Productivity - by Stephan Kudyba, Richard Hoptroff (????)

- Business Intelligence Using Smart Techniques : Environmental Scanning Using Text Mining and Competitor Analysis Using Scenarios and Manual Simulation - by Charles Halliman (????)
- Data Warehousing and Business Intelligence for E-Commerce - by Alan R. Simon, Steven L. Shaffer (????)
- Data Mining Techniques : For Marketing, Sales, and Customer Support - by Michael J. A. Berry, Gordon S. Linoff (????)
- Introduction to Data Mining and Knowledge Discovery, Third Edition - by Herbert A. Edelstein (????)
- Advances in Knowledge Discovery and Data Mining - by Usama M. Fayyad (Editor), et al (????)
- Neural Networks for Pattern Recognition - by Christopher M. Bishop (1995)
- Web Analytics : Translating Clicks into Business [DOWNLOAD: PDF from Amazon] - by Guy Crease et al. (????)

- Professional SQL Server 2000 DTS (Data Transformation Service) - by Mark Chaffin, et al (2000)
- MDX Solutions: With Microsoft SQL Server Analysis Services -- by George Spofford (????) - Book about the language (MDX) used for querying Microsoft OLAP databases.
- Microsoft(r) SQL Server(tm) 2000 Analysis Services Step by Step - by Olap Train (Editor), Reed Jacobson (????)

- Oracle Express Olap - by Sergei Arkhipenkov, Dmitri Golubev (2001)
- Microsoft(r) SQL Server(tm) 2000 Analysis Services Step by Step - by Olap Train (Editor), Reed Jacobson (????)
- SQL Server Developer's Guide to OLAP with Analysis Services - by Mike Gunderloy, Tim Sneath (????)
- Unlocking OLAP With SQL Server 7 and Excel 2000 - by Wayne S. Freeze (1999)
- Microsoft Olap Solutions - by Erik Thomsen, et al (1999)
- Professional Data Warehousing with SQL Server 7.0 and OLAP Services - by Sakhr Youness (2000)
- Microsoft OLAP Unleashed - by Timothy Peterson, et al (1997)
- Microsoft Olap/Analysis Services Unleashed - Timothy Peterson, Tim Peterson (2001)

- SQL Server 7 OLAP Developer's Guide (With CD-ROM) - by William C. Amo, Bill Amo (1999)
- A Practical Guide to Microsoft(R) OLAP Server - by John Shumate (1998)
- Data Warehousing, Data Mining, and OLAP (Data Warehousing/Data Management) - by Alex Berson, Stephen J. Smith (Contributor) (????)
- Visual Basic Developer's Guide to OLAP Services with ASP and SQL Server (With CD-ROM) - by Tim Sneath, Tim Sneath (2002)
- SQL Server Developer's Guide to OLAP with Analysis Services - by Mike Gunderloy, Tim Sneath (1998)