Skip to main content

How do I build a data warehouse?


The steps are as follows:
  1. Start with understanding the process followed by the org. / company for which you’re thinking of creating a Datawarehouse. Decide whether you need just a Database or a Data Federation or a complete Datawarehouse.
  2. Create a list of questions that you want to answer for the business (that’ll help the business analyse data better & make better decisions using the data) through the datawarehouse.
  3. Find out the Data Source Points & the kind of systems that the data resides in (For example, some data might be in files whereas some might be in Websites like twitter / facebook or databases such as Oracle / SQL Server / MySQL).
  4. Build a Requirement Specification Document & get it approved / reviewed by the Business Owners.
  5. Perform Data Modelling to create a set of tables that you’d need in the Datawarehouse / Database. Also define the kind of dimensions (Slowly Changing, Degenerated, Conformed, Static, etc) & facts (Early Arriving) in case you’re planning to go for Dimensional Data Modelling.
  6. Prepare a Technical Design Document (&/or Mapping Sheet). This should ideally have the exact information on how each column in your datawarehouse is loaded & what exactly are the business rules.
  7. Choose the ETL Tool (or programming language) that is best suited to your company’s budget & datawarehouse requirements.
  8. Create ETL Jobs / Mappings / Pipelines & schedule them depending on how often you want the data to be refreshed in the DataWarehouse.
  9. Re-iterate the process if there are changes (the model where changes are expected soon, is called the Agile Model; the other one where no changes are required anytime soon, is called the Waterfall Model). Don’t forget to prepare a timeline at each step & decide estimated time before you enter the first / third step.
  10. Optimise any step(s) implemented.
(B) Which are the different kinds of building?
There are 2 approaches to data warehousing:
1. Kimball or Bottom-up Approach
In this approach, the data marts facilitating reports and analysis are created first; these are then combined together to create a broad data warehouse.
2. Inmon or Top-down Approach
In this approach, a normalized data model is designed first. Then the dimensional data marts, which contain data required for specific business processes or specific departments are created from the data warehouse.
Note: William H. Inmon (Bill Inmon) is recognized by many as the father of the datawarehouse
(C) What is the process involved using an ETL tool like Informatica?
When using an ETL Tool like Informatica, ETL Developers use GUI components of Informatica to build mappings while referring the mapping sheet or technical design document. The advantage that these tools provide are 1. Visual flow 2. Structured system design 3. Operational resilience 4. Data-lineage and impact analysis 5. Advanced data profiling and cleansing 6. Performance. For detailed info. see: The 7 biggest benefits of ETL tools
(D) How do you test it?
The ETL Developers perform testings such as Unit Testing & Peer Review before sending the code to QA / Testing Team.
ETL Testers can either test it using GUI based ETL Testing Tools like Data Validation Option (DVO) or using SQL Queries or via Manual validations all three of these following the steps as mentioned below:
  1. Analyse Business Requirements & Understand the Business Process using documents like Business Requirement Specification Document, Technical Design Document / Mapping Sheet, etc.
  2. Prepare Test Cases
  3. Implement Test Cases
  4. Share Results with the team & Perform regression testings whenever required.
(E) How can you build data mart using this?
Data Marts are built depending on the requirements of the teams who want a specialised view on data for a specific purpose. Read this classic e-book on Data Marts for details: http://ftp://ftp.wiley.com/public/computer_books/updates/guide.pdf
F. Is data mart going to be OLTP or OLAP? I believe , DWH will be and has to be OLAP.
OLTP (Online Transaction Processing) are the systems from which a Datawarehouse is built. A Datawarehouse is an OLAP (Online Analytical Processing) & so are the Data Marts built from it.
The difference between these systems can be read from: What is the difference between OLTP and OLAP?
(F) What will the DWH be, when it is in star schema?
When a DWH is in Star Schema, it’ll look like the diagram below:-
See details on the following link: Star schema
Hope this answer would be read completely atleast by one. If possible, next time please use a different thread for each of such questions :)

Comments

Popular posts from this blog

The Best Web Hosting Services

Are you looking for the best web  hosting  services for your needs? Whether you need a place to host your small personal blog or a major corporate website, the following list will help you identify the best hosts to use. Finding the best web hosting service isn’t quite as straightforward as searching Google and choosing the one with the lowest price. There are a lot of issues to consider, including the reasons for  why  you need hosting and  how  you intend to use it. Once you have a handle on that, finding the right host becomes much easier. Choose one that’s undersized and you’ll end up with website outages and slow page loads, but choose one that’s oversized and you’ll be throwing money away. Defining Your Web Hosting Needs Before choosing your web host, you’ll need to think about your requirements. Consider the following concerns and decide the importance of each item on a scale of 0 to 10 (with 0 being not at all...

Now You Can Use Reliance Jio 4G Services On 2G And 3G Smartphones

Indians will always be at the top in availing any free internet facility. As, Reliance Jio aims to offer free 4G internet to the 90% of Indians, hence, with one of its services now you can use Jio 4G services on 2G and 3G smartphones. Now You Can Use Reliance Jio 4G Services On 2G And 3G Smartphones Who doesn’t want the free internet? Of course, we all want, Indians will always be at the top in availing any free internet facility. Jio, which is also known as Reliance Jio and officially as Reliance Jio Infocomm Limited has already given its users free unlimited 4G data for 90 days. As the Reliance Jio aims to offer free 4G internet to the 90% of Indians along with the free voice calls and messaging services. So, we all must agree that Indians are always at the peak when it comes about available any internet facility. We all know Jio, which is also known as Reliance Jio Infocomm Limited has previously given its users free unlimited 4G data for 90 days. Not only that but even th...

10 Essential Tips To Keep In Mind While Surfing The Internet

BENGALURU: Internet is an amazing resource which is stapled in many people’s day-to-day lives. It is very much informative but at the same time encircled with many dangers. Listed below are the strategies you can follow to stay safe on the internet as stated by Tech Radar India. Use your discretion on social media Online scams crops up almost everywhere in webpage such as an e-mail, tweet, Facebook post, or many other places. Never click on links that do not look like a real address or pop-ups that claim you have won millions of dollars—all these are scams in which one can easily get trapped. Also don’t fall prey to e-mails which ask you to help someone transfer a large amount of money out of their country delineating their long sad story. Hackers can access data through various innovative ways like sharing links of content that they feel the targeted person is likely to click. Be careful of what you are sharing Limit the content that you share on social media. Facebook...