Skip to main content

How do I build a data warehouse?


The steps are as follows:
  1. Start with understanding the process followed by the org. / company for which you’re thinking of creating a Datawarehouse. Decide whether you need just a Database or a Data Federation or a complete Datawarehouse.
  2. Create a list of questions that you want to answer for the business (that’ll help the business analyse data better & make better decisions using the data) through the datawarehouse.
  3. Find out the Data Source Points & the kind of systems that the data resides in (For example, some data might be in files whereas some might be in Websites like twitter / facebook or databases such as Oracle / SQL Server / MySQL).
  4. Build a Requirement Specification Document & get it approved / reviewed by the Business Owners.
  5. Perform Data Modelling to create a set of tables that you’d need in the Datawarehouse / Database. Also define the kind of dimensions (Slowly Changing, Degenerated, Conformed, Static, etc) & facts (Early Arriving) in case you’re planning to go for Dimensional Data Modelling.
  6. Prepare a Technical Design Document (&/or Mapping Sheet). This should ideally have the exact information on how each column in your datawarehouse is loaded & what exactly are the business rules.
  7. Choose the ETL Tool (or programming language) that is best suited to your company’s budget & datawarehouse requirements.
  8. Create ETL Jobs / Mappings / Pipelines & schedule them depending on how often you want the data to be refreshed in the DataWarehouse.
  9. Re-iterate the process if there are changes (the model where changes are expected soon, is called the Agile Model; the other one where no changes are required anytime soon, is called the Waterfall Model). Don’t forget to prepare a timeline at each step & decide estimated time before you enter the first / third step.
  10. Optimise any step(s) implemented.
(B) Which are the different kinds of building?
There are 2 approaches to data warehousing:
1. Kimball or Bottom-up Approach
In this approach, the data marts facilitating reports and analysis are created first; these are then combined together to create a broad data warehouse.
2. Inmon or Top-down Approach
In this approach, a normalized data model is designed first. Then the dimensional data marts, which contain data required for specific business processes or specific departments are created from the data warehouse.
Note: William H. Inmon (Bill Inmon) is recognized by many as the father of the datawarehouse
(C) What is the process involved using an ETL tool like Informatica?
When using an ETL Tool like Informatica, ETL Developers use GUI components of Informatica to build mappings while referring the mapping sheet or technical design document. The advantage that these tools provide are 1. Visual flow 2. Structured system design 3. Operational resilience 4. Data-lineage and impact analysis 5. Advanced data profiling and cleansing 6. Performance. For detailed info. see: The 7 biggest benefits of ETL tools
(D) How do you test it?
The ETL Developers perform testings such as Unit Testing & Peer Review before sending the code to QA / Testing Team.
ETL Testers can either test it using GUI based ETL Testing Tools like Data Validation Option (DVO) or using SQL Queries or via Manual validations all three of these following the steps as mentioned below:
  1. Analyse Business Requirements & Understand the Business Process using documents like Business Requirement Specification Document, Technical Design Document / Mapping Sheet, etc.
  2. Prepare Test Cases
  3. Implement Test Cases
  4. Share Results with the team & Perform regression testings whenever required.
(E) How can you build data mart using this?
Data Marts are built depending on the requirements of the teams who want a specialised view on data for a specific purpose. Read this classic e-book on Data Marts for details: http://ftp://ftp.wiley.com/public/computer_books/updates/guide.pdf
F. Is data mart going to be OLTP or OLAP? I believe , DWH will be and has to be OLAP.
OLTP (Online Transaction Processing) are the systems from which a Datawarehouse is built. A Datawarehouse is an OLAP (Online Analytical Processing) & so are the Data Marts built from it.
The difference between these systems can be read from: What is the difference between OLTP and OLAP?
(F) What will the DWH be, when it is in star schema?
When a DWH is in Star Schema, it’ll look like the diagram below:-
See details on the following link: Star schema
Hope this answer would be read completely atleast by one. If possible, next time please use a different thread for each of such questions :)

Comments

Popular posts from this blog

The Best Web Hosting Services

Are you looking for the best web  hosting  services for your needs? Whether you need a place to host your small personal blog or a major corporate website, the following list will help you identify the best hosts to use. Finding the best web hosting service isn’t quite as straightforward as searching Google and choosing the one with the lowest price. There are a lot of issues to consider, including the reasons for  why  you need hosting and  how  you intend to use it. Once you have a handle on that, finding the right host becomes much easier. Choose one that’s undersized and you’ll end up with website outages and slow page loads, but choose one that’s oversized and you’ll be throwing money away. Defining Your Web Hosting Needs Before choosing your web host, you’ll need to think about your requirements. Consider the following concerns and decide the importance of each item on a scale of 0 to 10 (with 0 being not at all important and 10 being critically important): Speed  — H

Google Photos can now stabilize all your shaky phone camera video

G oogle Photos is where all my photos are. Long ago I was a man of SmugMug, and then Flickr, and then at some point spent days and days copying years of images to iCloud Photo Library before eventually disregarding that and switching to Google. What can I say? I’m a simple person who can be easily delighted and swayed by automatic GIF creation and reliable backups. And Google Photos keeps getting better. Here’s the latest example: now the mobile app can automatically stabilize videos in your camera roll with a tap. A lot of flagship smartphones offer optical image stabilization when shooting video, a hardware feature that helps keep footage smooth. Others, like Google’s Pixel, use software to try and stabilize jerky movements. Putting stabilization inside the Google Photos app could enhance results further if you’re already working with hardware OIS, or improve recordings significantly if your phone lacks any means of steadying things out of the box. The stabilized video is croppe