Skip to main content

How do I build a data warehouse?


The steps are as follows:
  1. Start with understanding the process followed by the org. / company for which you’re thinking of creating a Datawarehouse. Decide whether you need just a Database or a Data Federation or a complete Datawarehouse.
  2. Create a list of questions that you want to answer for the business (that’ll help the business analyse data better & make better decisions using the data) through the datawarehouse.
  3. Find out the Data Source Points & the kind of systems that the data resides in (For example, some data might be in files whereas some might be in Websites like twitter / facebook or databases such as Oracle / SQL Server / MySQL).
  4. Build a Requirement Specification Document & get it approved / reviewed by the Business Owners.
  5. Perform Data Modelling to create a set of tables that you’d need in the Datawarehouse / Database. Also define the kind of dimensions (Slowly Changing, Degenerated, Conformed, Static, etc) & facts (Early Arriving) in case you’re planning to go for Dimensional Data Modelling.
  6. Prepare a Technical Design Document (&/or Mapping Sheet). This should ideally have the exact information on how each column in your datawarehouse is loaded & what exactly are the business rules.
  7. Choose the ETL Tool (or programming language) that is best suited to your company’s budget & datawarehouse requirements.
  8. Create ETL Jobs / Mappings / Pipelines & schedule them depending on how often you want the data to be refreshed in the DataWarehouse.
  9. Re-iterate the process if there are changes (the model where changes are expected soon, is called the Agile Model; the other one where no changes are required anytime soon, is called the Waterfall Model). Don’t forget to prepare a timeline at each step & decide estimated time before you enter the first / third step.
  10. Optimise any step(s) implemented.
(B) Which are the different kinds of building?
There are 2 approaches to data warehousing:
1. Kimball or Bottom-up Approach
In this approach, the data marts facilitating reports and analysis are created first; these are then combined together to create a broad data warehouse.
2. Inmon or Top-down Approach
In this approach, a normalized data model is designed first. Then the dimensional data marts, which contain data required for specific business processes or specific departments are created from the data warehouse.
Note: William H. Inmon (Bill Inmon) is recognized by many as the father of the datawarehouse
(C) What is the process involved using an ETL tool like Informatica?
When using an ETL Tool like Informatica, ETL Developers use GUI components of Informatica to build mappings while referring the mapping sheet or technical design document. The advantage that these tools provide are 1. Visual flow 2. Structured system design 3. Operational resilience 4. Data-lineage and impact analysis 5. Advanced data profiling and cleansing 6. Performance. For detailed info. see: The 7 biggest benefits of ETL tools
(D) How do you test it?
The ETL Developers perform testings such as Unit Testing & Peer Review before sending the code to QA / Testing Team.
ETL Testers can either test it using GUI based ETL Testing Tools like Data Validation Option (DVO) or using SQL Queries or via Manual validations all three of these following the steps as mentioned below:
  1. Analyse Business Requirements & Understand the Business Process using documents like Business Requirement Specification Document, Technical Design Document / Mapping Sheet, etc.
  2. Prepare Test Cases
  3. Implement Test Cases
  4. Share Results with the team & Perform regression testings whenever required.
(E) How can you build data mart using this?
Data Marts are built depending on the requirements of the teams who want a specialised view on data for a specific purpose. Read this classic e-book on Data Marts for details: http://ftp://ftp.wiley.com/public/computer_books/updates/guide.pdf
F. Is data mart going to be OLTP or OLAP? I believe , DWH will be and has to be OLAP.
OLTP (Online Transaction Processing) are the systems from which a Datawarehouse is built. A Datawarehouse is an OLAP (Online Analytical Processing) & so are the Data Marts built from it.
The difference between these systems can be read from: What is the difference between OLTP and OLAP?
(F) What will the DWH be, when it is in star schema?
When a DWH is in Star Schema, it’ll look like the diagram below:-
See details on the following link: Star schema
Hope this answer would be read completely atleast by one. If possible, next time please use a different thread for each of such questions :)

Comments

Popular posts from this blog

Problem: Date Formatting cannot be Changed in Microsoft Excel

In this article, we will learn how to change the date formatting. We will use “Text to Column” wizard to resolve the problem of change the date formatting in Microsoft Excel. Let’s understand the functions: - Text to Column:  “Text to Column” is used for separating the cell content which is depending on the way your data is arranged. You can divide the data on the basis of content in the cell such as space, comma, period, semicolon, etc. Let’s take an example and understand how we can convert the date into Text. We have dates, foramatted as text in column A. Now, we want to convert it into date format.     If we want to convert the formatting into numbers, then we need to follow below given steps:- Select the range A2:A11. Go to Data tab, and click on Text to Columns from the Data tools group.     Covert Text to Columns Wizard – Step1 of 3 dialog box will appear. Select fixed width, and click on Next button.     Skip step-2, and...
How To Remove Gphone Virus Well gphone.exe is nowadays one of the most dangerous virus spreading very rapidly. It is a Trojan and changes your IE homepage and sends tries to open gtalk and yahoo messenger. It even sends messages to gtalk contacts. Its icon is just like that of folder icon and people thinking of folder click on it get infected by the virus. Gphone virus basically is a 260 kb .exe file which looks like a folder and it can take any name of any other folder if you have clicked on the virus folder which looks like a folder but it is not. If you have a folder name ―movies‖ in your D drive it will make a exe file in the folder named movies.exe and if you click on that exe file it too work as a virus. It makes .exe files in all the folders you have with the name of the folder. How to remove this virus Method 1 1. Go to Task Manager then Processes and then click on gphone.exe and click on end process. 2. Manually go to folder where gphone.exe is present and delete it. 3...