No scaling without a solid data model: an introduction to the star schema

In this article we will focus on one of the most important topics in data modeling, namely the star schema. If you want to build scalable models with Power BI, you can't avoid the star schema.

    Add a header to begin generating the table of contents

    Introduction to the star schema

    The star schema is a long-established best practice for building dimensional data models. It is used in relational data warehouses as well as in analytical data models in the context of reporting. The star schema and other methods were coined in particular by Ralph Kimball, a luminary in the field of dimensional modeling.

    Why should I bother with the star schema?

    Especially people starting with Power BI tend to bypass the application of the star schema, even if they have already heard about it. Please do not make this mistake and try to include the methods presented here in your data model as early as possible. 

    Below is a selection of key reasons why your data models should follow the star schema. By the way, the counterpart of a star schema is a single flat table (Flat Table), which contains all the required data. So when we talk about the advantages of a star schema, we compare it to the flat table approach.

    Traceability: When using the star schema, the underlying data is split into different tables and related to each other. By splitting it up, both you and third parties can understand the contents of the resulting data model much better, which also simplifies collaboration.

    Scalability: The star schema has been a widely known practice for many years. I dare to say that almost all advanced BI developers know and can apply this model. Such standardized methods, which are also widely known and accepted, are the driver for real scaling.

    Better DAX Measures: With very small models and simple requirements, this aspect honestly has less weight. But from each model further requirements can follow with the time and sometime you are at the point where the DAX coding becomes more and more cumbersome if the solution is not based on a clean data model. But even applying it to simple models is goal-oriented, because this way you always use the same methods.

    Performance: Using the star schema minimizes the amount of data in your model. As a rule of thumb, the larger the underlying data set and the more extensive the calculations based on it, the greater the performance improvement of a star schema model compared to a simple flat table approach. 

    Structure of the star schema

    Now let's look at the essential components of a star schema. The following figure shows a classic star schema. As you can see, it consists of a fact table, several dimension tables, and relationships between the fact and dimension tables.

    Core components of the star schema

    Now let's go down one level and take a look using the Date-Dimension table (Dim_Date) and the fact table (Fact_Sales) the main components of the star schema. Below you will find the short description of each component.

    Core components of the star schema
    No. Component Brief description
    1 Dimension tables represents a specific master data category to be used for the analysis of the data, e.g. products, customers or time
    2 Entity is a row within the dimension table and corresponds to an object within the dimension table. For example, in the Dim_Date dimension table, a tag is an entity and thus has its own row. In the Dim_Product table, each individual product is its own entity and thus listed as its own row.
    3 Primary key column a column in a dimension table that holds a unique value for each entity. This value is also called primary key. The primary key uniquely identifies an entity and thus also a row in the dimension table.
    4 Attributes are additional columns within the dimension table, which are used for the description, categorization and grouping of the entities within the dimension table. For the Dim_Date table, for example, these are the columns Month, Quarter and Year. For example, 01.03.2023 has the value "March" for the attribute "Month". For example, the product "Banana" could have the value "Fruit" for the attribute "Product group".
    5 Fact table is the table that contains the value columns to be analyzed and, in addition, so-called foreign key columns. The fact table forms the basis for the calculations.
    6 Value column is a column in the fact table that contains the values for the calculations, for example, the turnover, account balance or the quantity of a sold product. Accordingly, a fact table can have several columns of values.
    7 Foreign key columns are columns contained in the fact table, each of which is linked to the primary key column of a dimension table via a relationship. A foreign key column uses the values of the assigned primary key column. In this way it is determined to which entity a row of the fact table (and thus also the value in the value column) belongs.
    8 1:n relationship the technical connection between a primary key column and a foreign key column. In a classic star schema, the relationship is a 1:n relationship, since each value occurs only once in the primary key column and values can occur multiple times in the foreign key column.

    How do I get data into star schema format?

    In an article on the Core activities of a reporting solution I addressed the fact that source systems usually hold the data for transactional processing (OLTP), but we would like to have the data in a format optimized for analytical processing (OLAP). Transferring the data from an OLTP format to an OLAP format requires data integration using ETL activities. 

    The star schema is also a form of OLAP format and you should always be interested in having the data in this form while working with Power BI before performing calculations and visualizing your results. 

    There are two basic scenarios in conjunction with Power BI for how you get your star schema format.

    1. You connect to the source system and use Power Query to bring the data into the format using ETL rules
    2. A data warehouse already exists in your company. In the best case, this should also be set up in consideration of the star schema methods, so that you only need access to the relevant tables.

    Of course, there is also the hybrid variant, where the data is available in the data warehouse, but you still add ETL logic in Power Query. This often happens, because the data warehouse provides the data in such a way that as many people as possible benefit from it, while you may have individual special requirements for your situation.

    More article on star schema and ETL in the works! This article is first about the star schema itself. The conversion of existing data into the star schema using Power Query is covered in a separate article.

    What does it all look like in Power BI?

    Now let's take a look at how the star schema is mapped in Power BI. In addition, I'll show you common errors in this context and how to recognize them. Since we will cover the topic of data integration and ETL in a separate article, we will assume here that the dimension and fact tables are already available via a data warehouse.

    Build star schema in Power BI

    After connecting to the data warehouse via Power Query, we have a fact table called Fact_Sales and two dimension tables called Dim_Date and Dim_Product. 

    In the Dim_Date table, one row corresponds to exactly one day in the calendar year. Thus, we use the date column named PK_Date as the primary key column.

    In the Dim_Product table, each row corresponds to exactly one specific product. In the data warehouse, a product primary key (PK_ProductKey) has been generated for each row using an ascending sequence of numbers so that each product has a unique key value.

    If you look closely, you will see that in the Dim_Product table there is also a column called ProductLabel, which shows the "real" product number, which is also used in day-to-day business. This could theoretically also be used as a primary key column. However, the schema in the data warehouse was set up in such a way that new keys were generated for the products.

    A technically generated primary key, as in our product table, is called a surrogate key. A primary key used in day-to-day business is called a natural key. Other examples of natural keys are the personnel number, account number or customer number.

    Dimension tables with unique primary key column

    Now let's have a look at the fact table. As you can see, there are the foreign key columns ProductKey and FK_Date. In contrast to the dimension tables, the key values can occur redundantly there, since several products can be purchased on one day. And when this happens, a row is created in the fact table with each sale and the same tag is used as a foreign key reference to the Dim_Date table.

    In addition, we see the value column called SalesAmount, which maps the sales revenue of each sale. For example, the first row reflects that on 02.01.20 the product with primary key value 956 was sold, generating a revenue of 1,544.40 Euros.

    Fact table with foreign key columns in Power Query

    Since our tables have already been prepared in the data warehouse, we continue to load them directly into the Data Model View.

    In the Data Model View, the relationships between the primary key columns of the dimension tables and the foreign key columns of the fact table are now set.

    When loading the data via Power Query into the Data Model View, Power BI tries to set the relationships on its own based on the column names. If this is not the case, you can set the relationships yourself via drag & drop by clicking on a primary key field and dragging it to the corresponding foreign key field in the fact table.

    Star Schema in the Power BI Data Model View

    Relationships have two main properties in Power BI, cardinality and filter direction.

    Cardinality of the relationship:

    Notice on the image the ends of the relationships. The asterisk symbol shows the n-side of the 1:n relationship. The 1 symbol shows the 1 side of the 1:n relationship. If it looks different in your model, the values in the primary key column of the dimension table are not unique and/or the values in the foreign key column of the fact table are not ambiguous. The latter can happen and is theoretically ok, but is rather rarely the case.

    Filter direction of the relationship:

    In addition to cardinality, a relation in Power BI has a filter direction, which is indicated by the arrow. In our case, Dim_Date can filter Fact_Sales, for example. Fact_Sales, however, cannot filter Dim_Date. The same is true for Dim_Product and Fact_Sales. Even though Dim_Product can filter the Fact_Sales table, it cannot filter Dim_Date.

    Using the star schema in Power BI

    Once the data model is ready, we can start building the report in the Report View. In the following example, I use a matrix visual to analyze sales by time and product color. 

    Using the star schema in a matrix visual

    As you can see, I have dragged columns from the various tables into the visual.

    • The sales from the Fact_Sales table are shown in the value field of the matrix visual
    • The attributes Year, Quarter and Month from the Dim_Date table are in the row field of the matrix visual
    • The attribute for the product color from the Dim_Product table is in the column field of the matrix visual

    Although the columns come from different tables, it is possible to evaluate them with each other. This works solely via the set 1:n relationships. Although the sales are in the fact table, they can be evaluated via the key columns including the relationship using the dimension table attributes.

    Classic errors when using the star schema in Power BI

    In connection with the star schema, there are typical errors that you will encounter more often, especially in the beginning. Therefore, I will now show you which errors these are and when they occur.

    Missing values in the primary key column (dimension table)

    Sometimes it happens that you have key values in a foreign key column in the fact table, but they do not appear in the linked primary key column. For example, in the example below, the fact table has values for 2020, but my Dim_Date dimension table only lists date values from 2021. Therefore, Power BI cannot derive a link between the date values in the fact table and those in the dimension table.

    This is reflected in the fact that some expressions show meaningful results, but additional values are displayed without assignment (see image below).

    Star schema error: missing values in dimension table

    Wrong columns for relationship or different data types

    Another classic source of error is that the relationship between the wrong columns was set by you or by Power BI (in case of automatic assignment). If this happens and the columns do not have matching key values, Power BI will not be able to find values for the values in the dimension table. The result will look like in the image below. You will also get the same problem if you link the correct columns, but these columns do not have the same data type.

    Notice: Check carefully if the correct columns are linked and make sure that the linked columns have the same data type.

    Star schema error: different data types or wrong fields related

    Missing relationship

    For example, if you are working with multiple fact tables (see next chapter) and not every dimension table is linked to every fact table, then it can also happen that you accidentally merge a fact table with an unlinked dimension table into a visual. Under certain circumstances, Power BI then identifies that there is no relationship between these tables and returns the following error message.


    Star Schema Error: No Relationship

    Notice: In connection with the use of measures (which you should start early), there are advanced scenarios in which tables without a relationship to a fact table are deliberately used as an auxiliary table for calculations. In these scenarios, no error occurs, since a calculation behavior is provided via the DAX logic.

    Expansion stages of the star schema

    We have looked at the classic star schema in this article. However, as requirements increase, the need arises to expand the model to include other aspects. Even though we are focusing on the basics in this article, I would like to give you initial impetus for further topics related to the star schema with the following overview.

    Expansion stage Brief description
    Snowflakes schema a form of star schema in which several dimension tables are connected in series. For example, there is a dimension table for products and one for product groups. The product table is linked to the fact table and the product group table is linked to the product table. Personally, I try to avoid this form in my models. However, there are some scenarios where it does add value.
    Multiple fact tables This is a very common and, above all, effective means for cross-functional evaluation. In our example we have only looked at one fact table and several dimension tables. Of course, there could be a Fact_Temperature table in addition to our Fact_Sales table. If now both fact tables are linked with a Dim_Date table it is possible to analyze the sales in connection with the weather, although the data are in different tables.
    Junk Dimension In our example, the dimension tables were selective and had a number of other attributes for a grouped evaluation. However, there are regularly also single-column categories without further relevant attributes according to which one would like to evaluate data. To avoid having to create a new dimension table for each of these categories, there is the principle of the junk dimension. The junk dimension combines several foreign categories in one table, whereby a separate row is generated for all combinations of category attributes. This junk dimension can then be linked to the fact table and filtered via the junk dimension without inflating the data model with several single-column dimensions.
    Slowly Changing Dimensions Dimensions represent the master data of a data model. In our simple example, however, we assume that an entity has a certain value related to an attribute. But what if an attribute value for an entity changes and I also want to evaluate this change historically? The requirement to map history in dimension tables is called Slowly Changing Dimensions, and there are four different types that describe approaches for dealing with historical analysis of dimension tables.


    This article gave you an introduction to the basics of the star schema. I hope you were able to take the following points from the article.

    • The Structure of the star schema is one of the main principles for data modeling with Power BI
    • It promotes the Traceability, Scalability, the DAX Coding and the Performance
    • A star schema consists of Dimension tables, one or more fact tables, primary key columns, foreign key columns, attributes, relationships, and value columns.
    • Dimension tables represent the Master data and include Attributes for the Categorization
    • With the help of the Dimension tables the Fact Tables filtered. whereby the therein contained Values on the basis of the Attributes grouped evaluated can be
    • You know how a Star Schema in Power BI looks like and how you can quickly detect errors in your model
    • You should use the described methods for both small as well as large data models apply

    I'm so glad you made it this far! Stay datenhungrig and see you soon!

    WordPress Cookie Notice by Real Cookie Banner