The steps are as follows:
- Start with understanding the process followed by the org. / company for which you’re thinking of creating a Datawarehouse. Decide whether you need just a Database or a Data Federation or a complete Datawarehouse.
- Create a list of questions that you want to answer for the business (that’ll help the business analyse data better & make better decisions using the data) through the datawarehouse.
- Find out the Data Source Points & the kind of systems that the data resides in (For example, some data might be in files whereas some might be in Websites like twitter / facebook or databases such as Oracle / SQL Server / MySQL).
- Build a Requirement Specification Document & get it approved / reviewed by the Business Owners.
- Perform Data Modelling to create a set of tables that you’d need in the Datawarehouse / Database. Also define the kind of dimensions (Slowly Changing, Degenerated, Conformed, Static, etc) & facts (Early Arriving) in case you’re planning to go for Dimensional Data Modelling.
- Prepare a Technical Design Document (&/or Mapping Sheet). This should ideally have the exact information on how each column in your datawarehouse is loaded & what exactly are the business rules.
- Choose the ETL Tool (or programming language) that is best suited to your company’s budget & datawarehouse requirements.
- Create ETL Jobs / Mappings / Pipelines & schedule them depending on how often you want the data to be refreshed in the DataWarehouse.
- Re-iterate the process if there are changes (the model where changes are expected soon, is called the Agile Model; the other one where no changes are required anytime soon, is called the Waterfall Model). Don’t forget to prepare a timeline at each step & decide estimated time before you enter the first / third step.
- Optimise any step(s) implemented.
(B) Which are the different kinds of building?
There are 2 approaches to data warehousing:
1. Kimball or Bottom-up Approach
In this approach, the data marts facilitating reports and analysis are created first; these are then combined together to create a broad data warehouse.
2. Inmon or Top-down Approach
In this approach, a normalized data model is designed first. Then the dimensional data marts, which contain data required for specific business processes or specific departments are created from the data warehouse.
In this approach, a normalized data model is designed first. Then the dimensional data marts, which contain data required for specific business processes or specific departments are created from the data warehouse.
Note: William H. Inmon (Bill Inmon) is recognized by many as the father of the datawarehouse
(C) What is the process involved using an ETL tool like Informatica?
When using an ETL Tool like Informatica, ETL Developers use GUI components of Informatica to build mappings while referring the mapping sheet or technical design document. The advantage that these tools provide are 1. Visual flow 2. Structured system design 3. Operational resilience 4. Data-lineage and impact analysis 5. Advanced data profiling and cleansing 6. Performance. For detailed info. see: The 7 biggest benefits of ETL tools
(D) How do you test it?
The ETL Developers perform testings such as Unit Testing & Peer Review before sending the code to QA / Testing Team.
ETL Testers can either test it using GUI based ETL Testing Tools like Data Validation Option (DVO) or using SQL Queries or via Manual validations all three of these following the steps as mentioned below:
- Analyse Business Requirements & Understand the Business Process using documents like Business Requirement Specification Document, Technical Design Document / Mapping Sheet, etc.
- Prepare Test Cases
- Implement Test Cases
- Share Results with the team & Perform regression testings whenever required.
(E) How can you build data mart using this?
Data Marts are built depending on the requirements of the teams who want a specialised view on data for a specific purpose. Read this classic e-book on Data Marts for details: http://ftp://ftp.wiley.co m/public/computer_books/u pdates/guide.pdf
F. Is data mart going to be OLTP or OLAP? I believe , DWH will be and has to be OLAP.
OLTP (Online Transaction Processing) are the systems from which a Datawarehouse is built. A Datawarehouse is an OLAP (Online Analytical Processing) & so are the Data Marts built from it.
The difference between these systems can be read from: What is the difference between OLTP and OLAP?
(F) What will the DWH be, when it is in star schema?
When a DWH is in Star Schema, it’ll look like the diagram below:-
See details on the following link: Star schema
Hope this answer would be read completely atleast by one. If possible, next time please use a different thread for each of such questions :)
Comments