Basics of data warehousing and options with Power BI

Anyone dealing with the topic of business intelligence is quickly confronted with the term data warehouse. A question that often follows is whether a data warehouse is still needed with Power BI. In this article, I will explain the most important terms in this area. I will also show you how a data warehouse fits into a BI landscape and what role Power BI plays in this context.

    Add a header to begin generating the table of contents

    A look into the world of data warehousing

    In the first part, we first ignore Power BI and take a look at the data warehouse itself. In doing so, we answer the questions:

    • What is a data warehouse?
    • What does a classic data warehouse architecture look like?
    • What are the key technologies involved?
    • What are the advantages and disadvantages of a data warehouse?

    First, a brief recap: The core activities of a reporting solution

    In a previous Post I have explained the essential core activities of a reporting solution. These are:

    • Data creation
    • Data integration
    • Data modeling
    • Data presentation

    These core activities form the basis for the description of the different data warehousing technologies. Below you will find again the image from the article, which shows the different activities.

    Core activities of a reporting solution and classification of the Power BI basic components

    If the terms don't mean anything to you yet, I recommend you to take a look at the Post.

    What is a data warehouse?

    A data warehouse is a central location for the storage of company-wide harmonized data, which is kept for the construction of analytical data models. With a data warehouse, the risk is minimized that different data models output different figures for the same facts.

    The more data integration logic is transferred to a centrally controlled data warehouse, the higher the probability that e.g. in different data models the same sales or the same number of employees is will be returned.

    In the following chapters, we will now successively go through different versions of a BI architecture, taking into account various technologies.

    The traditional data warehouse architecture

    As a result of various technical innovations and cloud computing, the data warehouse has undergone significant change in recent years. However, the new technologies often complement the data warehouse and do not replace it. Therefore, we will start our journey through the worlds of data warehousing with a traditional architecture, which still occurs regularly in exactly this form today.

    Presentation of the basic components

    A traditional data warehouse architecture consists of three different core components:

    • data warehouse
    • Data Marts
    • ETL tool
    However, we will get to know more components in the course of the article.

    The following figure shows the interaction between these components based on the core activities of a reporting solution. I then go into more detail about the terms data warehouse, data mart and ETL.

    Traditional data warehouse architecture with data marts

    As you can see in the picture, the reporting core activities data integration and data modeling take place twice, once in the scope of the data warehouse and once in the scope of the data marts. To better understand the differences, the core components are now described in more detail.

    Term Description
    Data Mart In principle, a data mart corresponds exactly to the analytical data model that you would build even without a data warehouse and which I describe in the post about the core activities of a reporting solution explained above. This data model is a data-based representation of the use case you want to analyze. A good data model follows the star schema, which is why I've given the Star scheme a separate post . Once the data model is available, additional calculations (metrics and KPIs) will be added so that they can be used in the final phase, data presentation.
    data warehouse The main objective of the data warehouse is the company-wide harmonization of data for analytical purposes. The data warehouse is usually based on a relational database. Accordingly, the data is stored in the form of tables and linked to each other by using keys and relationships. Typically, the concepts of the star schema are already taken into account here as well in order to reduce the data integration activities in the scope of the data mart as far as possible and to increase the data integrity between several data marts.
    ETL Tools With Power Query, Power BI Desktop already has ETL functionalities, but there are of course also specialized ETL tools with a greater range of functions for covering data integration requirements, which are used in data warehouse architectures. Furthermore, with Power Query, data always ends up in Power BI. The highly specialized tools, on the other hand, are able to load data to a wide variety of destinations.

    Comparison of the data warehouse and the data marts

    In order to better understand the differences between data warehouse and data mart, you will find a comparison based on various criteria below.

    Criterion data warehouse Data Mart
    Destination Provision of company-wide harmonized data as a basis for analytical purposes Provision of a data model incl. KPIs as the basis for reports for a specific use case or department
    Quantity In the best case, a single data warehouse Multiple data marts, for example per use case or department
    Data source Operational source systems Data warehouse (company-wide harmonized data) as well as operational source systems via direct connection (see next chapter)
    Granularity Often highest possible level of detail, or at the transaction level Frequently aggregated data (for better performance) and only details where needed
    Rollers Data architect, database administrator and ETL specialist Data Analyst

    Hybrid filling of data marts

    In a data warehouse architecture, data marts are often not filled exclusively by the data warehouse. This is because often only data with a certain relevance is processed and provided in the data warehouse. The evaluation of relevance can be different in each company. 

    For this reason, it is common for a data analyst to use financial data from the data warehouse, for example, and also to import further data from the source systems via a direct connection, which is not taken into account by the data warehouse. Another classic case would be a separate Excel file where certain planning figures, mapping tables or comments are still kept. The picture below illustrates the scenario described here.

    Hybrid data filling of a data mart

    Other technologies on the rise

    In the previous chapters, the focus was on the traditional data warehouse, which is based on relational databases. However, you have most likely heard of the term "data lake".

    DWH component Description
    Data Lake A data lake is a highly flexible, cost-effective and multifunctional data store. The most significant difference to the data warehouse is that the data is not stored in the form of tables and relationships (structured data), but directly as a file. Classic file types are e.g. CSV, JSON and Parquet (semi-structured data). However, image, video or PDF files (unstructured data) can also be uploaded without any problems and without any transformation. A data lake is, simplified, comparable to the file explorer on your laptop, except that a data lake has enormous amounts of more memory, more power as well as additional functions. A data lake complements rather than replaces a data warehouse.
    Data Lakehouse Just for starters, the above terms are already a lot of input. Therefore, we will keep it short here. Relational databases and data lakes each have each their strengths and weaknesses. Based on this, new all-in-one solutions have been established over time, which map functionalities from relational databases, data lakes and ETL tools together in one solution, so-called data lakehouses.
    ELT We have always used the term ETL so far and now comes the term ELT? Just as ETL stands for Extract, Transform, Load, ELT stands for Extract, Load, Transform. So it's really just a different order of activities. This approach has become established, especially with the advance of the Data Lake, because here data can be extracted flexibly and cost-effectively directly from the source systems and stored directly, regardless of the file format. At the same time, it ensures more stability and performance for the end-users working in the source system, since the computationally intensive transformations are only performed after the data has been stored separately. To keep it simple, I generally use the term ETL, which can mean both.

    In the following picture we no can see the additional data lake in our architecture. A data lakehouse would accordingly be a combination of all components (data warehouse, data lake and ETL) in one solution.

    Modern Data Warehouse Architecture

    Term "Data Warehouse Architecture Since we have now seen that behind the term data warehousing you can find a wide variety of technologies, I will simply use the term DWH . By this I mean the collection of all the tools presented above for collecting, integrating and holding data with the purpose of making it available to data analysts for building analytical data models.

    Do I need a data warehouse? Advantages and disadvantages at a glance

    Perhaps you are now asking yourself whether a data warehouse is worthwhile for your company. Perhaps the following overview of advantages and disadvantages will help you to assess this. No distinction is made between data warehouse, data lake or data lakehouse.

    But of a certain size or complexity, it is no longer a question of whether, but how? With all the pros and cons, it's important to understand that once you reach a certain complexity of requirements and company size, you can't avoid a separate data integration layer (in whatever form). In this case, it is no longer a question of whether you need it, but only how you set it up.

    Data Warehousing and Power BI

    After the general introduction to the world of data warehousing, we will now look at the whole thing in the context of Power BI. In doing so, we will look at two different scenarios:

    1.  Power BI as a data mart with separate data warehouse
    2.  Data Warehousing with Power BI Dataflows

    Power BI also works without a data warehouse A data warehouse is not a mandatory requirement for Power BI. Thanks to the extensive functionalities of Power BI, effective reporting solutions can be implemented even without a data warehouse. This is especially true when the previous reporting is based on Excel only. However, as soon as several Power BI datasets and reports have been established, implementing a data warehousing layer should be considered.

    Scenario 1: Using Power BI with separate data warehouse

    Before we look at scenarios where we use Power BI for data warehouse activities, let's first look at the classic scenario where Power BI is used for building the data marts which are linked to a separate data warehouse. This data warehouse can exist on different technologies from different vendors. Here, I'll introduce you to the main technologies from the Microsoft Azure stack. Microsoft Azure is the cloud computing platform from Microsoft. With Microsoft Azure it is possible to set up entire IT infrastructures flexibly and to integrate a variety of software solutions directly. The setup and maintenance of these solutions is typically handled by the IT department or by a service provider, i.e. not by the business department.

    Please note that the technologies presented can be used different combinations. Since this article does not focus on Microsoft Azure, I will keep the explanations really short.

    Microsoft Azure component Description
    Azure SQL A service for the provision of a relational database as the basis for the traditional data warehouse
    Azure Data Lake Storage A service for providing a data lake for the storage of various media and file types
    Azure Data Factory An ETL solution to meet simple and complex data integration requirements
    Azure Synapse Microsoft's own Data Lakehouse, which combines a variety of features from Azure SQL, Data Lake Storage and Data Factory into one solution

    The following figure shows how the Azure technologies fit into the DWH architecture presented so far where Power BI is fully used to cover the data mart and data presentation layer. The dataflows, which are presented as optional, will be introduced in the following chapter.

    Modern Data Warehouse Architecture with Microsoft Azure and Power BI

    Scenario 2: Data warehousing with Power BI Dataflows: cost-efficient, easy to use and effective, but with limits

    Just as there is the Power Query module in Power BI Desktop, there are the so-called Dataflows in the Power BI Service. A dataflow is basically a stand-alone variant of a Power Query module with the difference that dataflows are located directly in the Power BI Service. Dataflows can be created within all workspaces except the Personal Workspaces. 

    Architecture of the solution

    Just like with Power Query, when you create a dataflow, you connect to one or more data sources using the familiar connectors. Once connected, you experience the same look and feel as in the Power Query module in Power BI Desktop, even with additional features. The special thing about a dataflow is that the result of the query is also stored directly in the dataflow. Multiple dataflows can also be linked together. This is also the reason why a central data integration layer can be built using dataflows. Because after the data has been run through a dataflow, you can connect to the dataflow with Power BI Desktop, for example, and build your solution based on the data harmonized in the dataflow. You can also perform additional transformation in Power Query before loading the data into your data model (dataset).

    Layer for enterprise-wide data integration using Power BI Dataflows

    In this example, the enterprise-wide integration activities that would otherwise take place in a DWH are covered within a Power BI Workspace with Premium license (Premium per user also possible). Power BI Dataflows are used to perform the ETL steps. 

    The Power BI Premium Workspace is the single source of truth for enterprise-wide data. From here, the data is distributed to the data analysts in the various areas (e.g. Finance or HR) via further data flows. This ensures that the respective data analysts only see the data they are supposed to see. 

    Below you will find key arguments why this approach is an option worth considering for implementing a central data integration layer:

    * requires Power BI Premium, or Power BI Premium per user

    Power BI Data Marts In May 2022, Microsoft introduced a new component within Power BI Services, Power BI Data Marts. Briefly described, Power BI Data Marts combine the components Dataflow and Dataset. In addition, data can also be queried via SQL, for example. Personally, I have not yet used Power BI Data Marts in a productive scenario, and a look at the Power BI Community shows that Power BI Data Marts are often still being piloted. For this reason, I have excluded Power BI Data Marts here. At this point, you should first take away that "Data Mart" is a common BI architecture term (as described above) and "Power BI Data Mart" is a Power BI Service product component. If you want to learn more about Power BI Data Marts, you can find an introductory Microsoft Learn post here. Introduction to datamarts - Power BI | Microsoft Learn.

    Limits of the solution

    However, as mentioned in the title, this solution also has its limits and cannot be scaled endlessly. With increasing data, number of people and complexity of requirements, highly specialized tools for data integration, data storage and data management are needed. Below you will find a list of limitations that are not or only moderately covered in the Power BI. Dataflow scenario.

    For some people, this may seem like a lot of disadvantages at first. But not all of these disadvantages have to affect your company. Therefore, it is all the more important to really evaluate what is needed in the end and in what time frame. Because one should not forget that the special tools require significantly more extensive know-how, both technically and organizationally. In addition, the introduction of one of the special tools is significantly more complex and requires a significantly higher investment with a longer implementation time.

    However, precisely for this reason, the approach presented here can be a highly efficient and effective solution or an intermediate step on the way to a future DWH architecture. 



    This post gave you an insight into the basic architectural components of data warehousing and placed Power BI in that context. I hope you were able to glean the following points from the article.

    • A data warehouse is a central location for the storage of company-wide harmonized datawhich are used for the construction analytical data models be held
    • a data warehouse is often based on a relational database and is now often supplemented by a data lake
    • For the filling of data warehouses and data lakes come special ETL tools for use
    • A Data Lakehouse is a all-in-one solutionwhich combines the previously mentioned solutions together
    • Power BI also works without a data warehouse. But with increasing company size, number of stakeholders and amount/variety, the topic should be data warehousing evaluated and addressed
    • If you want to perform data warehousing in conjunction with Power BI, there are two basic scenarios1) Data warehousing with Power BI Premium and Dataflows 2) Power BI with separate DWH solution 
    • Power BI Dataflows is a very cost-efficient, effective solution, directly feasible solution, but eventually reaches its limits, especially with increasing company size
    • Power BI with separate DWH solution enables the mapping of the most complex scenarios and can be scaled almost endlessly, but comes with high costs and longer implementation times
    • Due to the high flexibility and low implementation effort are Power BI Dataflows also a solid option for an intermediate step during your journey torwards a "real" DWH architecture.

    I'm so glad you made it this far! Stay datenhungrig and see you soon!


    WordPress Cookie Notice by Real Cookie Banner