Designing an OLAP Solution using SSAS


Overview

This guide has been created for people who want step by step instructions on evaluating a set of requirements or a problem statement and converting it into an OLAP solution using SSAS. In the process, you will learn how to

1. Organize a Data Source View using Diagrams.

2. Determine and create Calculated Columns and Named Queries.

3. Determine the Attributes required for each dimension.

4. Design the Attribute properties.

5. Design Attribute Relationships.

6. Design User defined Hierarchies.

7. Review Dimension Usage (Relationships between Dimensions and Measure Groups)

8. Determine and create the measures required in the cube.

This guide assumes basic knowledge of SSAS and OLAP like what are dimensions, measures, cubes, how to create them using SSAS etc.

The source project accompanying this guide can be downloaded from here. The project uses the AdventureWorks2008R2 sample databases which can be downloaded from here.

Scenario

You are the lead solution designer for the multidimensional design team at Adventure Works. Adventure Works sells bicycles and components, through an Internet site and retail outlets. Although the company has been tracking sales for the last several years using its online transaction processing (OLTP) database, it now also wants to analyze sales data to compare the results of the two sales approaches (sales thru its website and sales thru retailers). It also wants to identify key sales trends.

Interview transcripts with the following stakeholders are provided:

Sales Manager—Retail Division
Sales Manager—Internet Division
Marketing Director

Interview Transcript for Stephen Jiang: Sales Manager—Reseller Division

The Reseller Sales division covers all of our worldwide resellers, and we’re responsible for setting sales quotas for all of our sales employees, and for tracking sales by resellers. The Sales Management team doesn’t really do much in the way of detailed analysis, other than reviewing printed reports.

The exception to that is a small team of sales analysts, led by José Saraiva. Usually they show us sales totals for each quarter by geography and by product category or sub category, but they’re able to drill down into specific resellers and products, so for example if one region shows a particularly low sales total one quarter, we can find out whether sales were low across all resellers and products. The sales are totaled by quarter, but I’d like for us to be able to break that down into monthly totals and sometimes even daily totals if necessary. Although not very frequently, but sometimes we may also want to see a breakup of sales figures against Sales Order Numbers.

We define sales quotas on a quarterly basis for our sales people, and I’d like to be able to track each employee’s performance against the quota to see whether I need to provide any additional support or training to help them meet their target.

Interview Transcript for Linda Mitchell: Sales Manager—Internet Division

The Internet Sales division processes all orders placed through our Web site. I need to be able to see what products are bought through our Internet site, and information about the customers who buy the products. At the moment, I get a monthly report showing me order totals for each product that month. I can also use Excel to export a list of all orders for the month, but the list is generally too big to be useful for anything other than grouping by customer to find high-spending repeat customers. It also takes ages to export the data, and I often just cancel part way through.

I know that we sell products all over the world, but I’d like to be able to compare order patterns and figures in geographical locations. I’d be really interested in being able to see the different results by country or region, but I’d also like to be able to drill down to state or province level, then into cities, and right to postal code levels.

The monthly order totals are useful, but I’d like to be able to find patterns in Internet orders at a lower level of granularity. For example, it would be useful to know what proportion of our Internet orders are placed at weekends, and to be able to see how Internet order totals are affected by holidays. I’d also like to be able to see the cost associated with each sale, so that we can get an idea of which items are most profitable.

Interview Transcript for David Bradley: Marketing Director

The Marketing department tries to promote sales for both the Internet and reseller channels. Our main strategy is to create time-limited, special offer campaigns to promote sales of specific products, but we’d like to be able to gather customer data that would help us take a more direct marketing approach.

We currently look at marketing campaigns by financial periods. For example, we need to see aggregated sales totals for the company’s financial year, and be able to drill into financial quarters, months, and right down to individual days. We need to compare these campaigns based on the financial period, whether the sales were from the Retail or Internet Sales division, several different customer demographic properties and products. To justify marketing campaign costs we also want to see the average revenue earned per customer in a particular time period or for a new product launch. The product information needs to be broken down by category and subcategory. Currently, we are using several complex, static reports in Excel in addition to Reporting Services. No one report can give us the option to easily slice and dice all of the properties we need to determine the effectiveness of different campaigns.

We’d like to be able to analyze Internet sales by customer, based on demographics such as age, address, education level, marital status, gender, occupation, number of children, yearly income, car ownership, and so on. We know that some of our Internet customers have provided this information, but it’s “locked up” somewhere in the sales system. We’d like to be able to get at that data and use it to identify buying patterns and trends that will help us create effective direct marketing campaigns. For example, we have recently identified that a customer’s gender and commute distance between home and office plays a very important role in his buying patterns. In the near future we would like to see most of our sales figures sliced by commute distance and gender.

For the reseller channel, we need to be able to see a breakdown of sales by region, and we need to be able to drill into individual resellers to see how the type of reseller (such as specialty bike shop, value-added reseller, or warehouse) affects what the reseller buys from us.

Additionally, when we analyze customer sales, we’d like to be able to see the total number of discrete sales, not just the total amount of money spent. Also we may frequently want to see consolidated sales figures of the reseller as well as the internet sales department.

Task 1: Organizing the Data Source View

1. Open a solution named “Module4_Starter.sln” from the source code accompanying this guide.

2. Open the “AdventureWorks” data source view and spend some time reviewing the tables that are included in it.

3. Go to the Diagram organizer and create 3 diagrams and name them as “Reseller Sales Division”, “Internet Sales Division” and “Marketing Division”.

4. Add the tables related to each division in the respective diagram. You may re-arrange the tables or choose the “Auto Arrange” option to make the diagram more legible. Make sure the following tables are placed in the “Reseller Sales Division” diagram –

DimProduct, DimProductCategory, DimProductSubCategory, DimReseller, DimEmployee, DimSalesTerritory, DimDate, DimGeography, FactResellerSales.

Make sure the following tables are placed in the “Internet Sales Division” diagram –

DimProduct, DimProductCategory, DimProductSubCategory, DimDate, DimCustomer, DimSalesTerritory, DimGeography, FactInternetSales.

Make sure the following tables are placed in the “Marketing Division” diagram –

DimPromotion, DimEmployee, DimDate, DimGeography, FactInternetSales, FactResellerSales.

Task 2: Creating Named Columns and Queries in Data Source View

1. Based on the interview transcripts, we will need to make some changes to the tables that are currently present in the DSV.

2. Firstly, we need to concatenate the FirstName, MiddleName and LastName columns in the DimCustomer table, so that we can return a complete customer name from our cube. Right click on the DimCustomer table and select the “New Named Calculation” option. Specify the column name as “FullName” and the following query as the Expression –

CASE
WHEN MiddleName IS NULL THEN
FirstName + ' ' + LastName
ELSE
FirstName + ' ' + MiddleName + '.' + ' ' + LastName
END

3. Repeat step no. 2 for the DimEmployee table as well.

4. In the DimTime table, we have columns that return a Month Number or Month Name like April, May, June etc. However when a month is used to slice a measure, the user won’t be able to tell in which year the month is falling. For example, if you look at the figure below, it is impossible to tell in which year the month of July is falling under. The same is applicable for Quarters as well. The other problem with this data the way it currently is, is that there is no uniqueness in the Month attribute of the Date dimension. Without uniqueness, we will not be able to create attribute relationships (and attribute relationships are a very critical part of SSAS project design).

5. To address these issues, we will need to create new calculated columns in the DimTime table. For example, we will create a new calculated column (say MonthName) and assign a concatenated value of EnglishMonthName + CalendarYear. The result of this can be seen in the figure below –

6. Add a new calculated column to the DimTime table.

7. Specify MonthName as the Column name and assign the following expression to it.

EnglishMonthName+’ ‘+ CONVERT(CHAR (4), CalendarYear)

8. Create additional calculated columns in the DimTime table as per the table below –

Column Name Expression
CalendarQuarterOfYear ‘CY Q’ + CONVERT(CHAR (1), CalendarQuarter)
FiscalQuarterOfYear ‘FY Q’ + CONVERT(CHAR (1), FiscalQuarter)
CalendarQuarterDesc Q’ + CONVERT(CHAR (1), CalendarQuarter) +
‘ CY ‘ +
CONVERT(CHAR (4), CalendarYear)
FiscalQuarterDesc Q’ + CONVERT(CHAR (1), FiscalQuarter) +’ ‘+
‘FY ‘ +
CONVERT(CHAR (4), FiscalYear)
CalendarYearDesc ‘CY ‘+ CONVERT(CHAR (4), CalendarYear)
FiscalYearDesc ‘FY ‘+ CONVERT(CHAR (4), FiscalYear)

9. In the interview transcripts, David Bradley mentions – “Also we may frequently want to see consolidated sales figures of the reseller as well as the internet sales department.” At the moment, Reseller Sales data is coming from the FactResellerSales table and the Internet data is coming from the FactInternetSales table. To be able to consolidate the data and present it, we can create a named query (analogous to a view in a database) and use it to combine the data from both the tables.

To create a named query, right click in the Data Source View and click New Named Query. Enter FactSalesSummary as the name of the query and assign the following SQL UNION query to it.

SELECT ProductKey, OrderDateKey, DueDateKey, ShipDateKey, ResellerKey, NULL AS CustomerKey, EmployeeKey, PromotionKey, CurrencyKey, SalesTerritoryKey, SalesOrderNumber, SalesOrderLineNumber, RevisionNumber, OrderQuantity, UnitPrice, ExtendedAmount, UnitPriceDiscountPct, DiscountAmount, ProductStandardCost, TotalProductCost, SalesAmount, TaxAmt, Freight, CarrierTrackingNumber, CustomerPONumber, 'Reseller' AS SalesChannel
FROM dbo.FactResellerSales
UNION
SELECT ProductKey, OrderDateKey, DueDateKey, ShipDateKey, NULL AS ResellerKey, CustomerKey, NULL AS EmployeeKey, PromotionKey, CurrencyKey, SalesTerritoryKey, SalesOrderNumber, SalesOrderLineNumber, RevisionNumber, OrderQuantity, UnitPrice, ExtendedAmount, UnitPriceDiscountPct, DiscountAmount, ProductStandardCost, TotalProductCost, SalesAmount, TaxAmt, Freight, CarrierTrackingNumber, CustomerPONumber, 'Internet' AS SalesChannel
FROM dbo.FactInternetSales

Task 3: Implementing Dimensions

1. The next step is to create the dimensions. The interview transcripts should help you in determining and designing the Dimensions that will be required in this OLAP solution. A good Dimension design is very critical for the success of a SSAS solution. To help you out, the keywords highlighted in red are the Dimensions that will be required and the keywords highlighted in blue are the attributes that will be required. As a recommendation, when it comes to adding attributes to a dimension, it is always better to be conservative rather than liberal. That means, if you are not sure if a particular attribute is going to be useful or not, it is better not to add it to the dimension.

2. The table below gives a list of all the attributes that we will need to add to the dimensions along with the Key Column. In some situations, the attribute has a different Name Column assigned to it. As a good dimension design, you also need to pay attention to certain properties like AttributeHierarchyEnabled, AttributeHierarchyVisible, AttributeHierarchyOptimized and GroupingBehavior. For a more detailed discussion on the different properties that can be used to define attributes in a dimension, refer to the following post.

Dimensions Attribute Name Key Column Name Column Enabled Optimized Visible Grouping
Behaviour
Geography
Geography Key GeographyKey TRUE TRUE FALSE Discourage
Country CountryRegionCode EnglishCountryRegionName TRUE TRUE FALSE Encourage
State-Province StateProvinceCode StateProvinceName TRUE TRUE FALSE Encourage
City City TRUE TRUE FALSE Encourage
Postal Code PostalCode TRUE TRUE FALSE Discourage
Promotion
Promotion Key PromotionKey EnglishPromotionName TRUE TRUE FALSE Discourage
Promotion Type EnglishPromotionType TRUE TRUE FALSE Encourage
Promotion Category EnglishPromotionCategory TRUE TRUE FALSE Encourage
Product
Product ProduckKey EnglishProductName TRUE TRUE FALSE Discourage
Sub Category ProductSubCategoryKey EnglishProductSubCategoryName TRUE TRUE FALSE Encourage
Category CategoryKey EnglishProductCategoryName TRUE TRUE FALSE Encourage
Model Name ModelNameAssembly TRUE TRUE FALSE Encourage
Customer
Customer CustomerKey FullName TRUE TRUE TRUE Discourage
CommuteDistance CommuteDistance TRUE TRUE TRUE Encourage
Education EnglishEducation TRUE TRUE TRUE Encourage
Gender Gender TRUE TRUE TRUE Encourage
Home Owner HouseOwnerFlag TRUE TRUE TRUE Encourage
Marital Status MaritalStatus TRUE TRUE TRUE Encourage
Number of Children NumberChildrenAtHome TRUE TRUE TRUE Encourage
Occupation EnglishOccupation TRUE TRUE TRUE Encourage
Phone Phone FALSE FALSE TRUE Discourage
Email EmailAddress FALSE FALSE TRUE Discourage
Birth Date BirthDate FALSE FALSE FALSE Discourage
Yearly Income YearlyIncome TRUE TRUE TRUE Encourage
Reseller
Reseller ResellerKey ResellerName TRUE TRUE FALSE Discourage
Business Type BusinessType TRUE TRUE FALSE Encourage
Geography Key GeographyKey FALSE TRUE FALSE Discourage
Sales Territory
Region SalesTerritoryKey SalesTerritoryRegion TRUE TRUE FALSE Discourage
Group SalesTerritoryGroup TRUE TRUE FALSE Encourage
Country SalesTerritoryCountry TRUE TRUE FALSE Encourage
Employee
Employee Key EmployeeKey FullName TRUE TRUE FALSE Discourage
Parent Employee ParentEmployeeKey TRUE TRUE TRUE Discourage
Department DepartmentName TRUE TRUE TRUE Encourage
Title Title TRUE TRUE TRUE Encourage
Reseller Sales Order Details
Reseller Sales Order SalesOrderNumber + SalesOrderLineNumber SalesOrderNumber TRUE TRUE TRUE Encourage
Internet Sales Order Details
Internet Sales Order SalesOrderNumber + SalesOrderLineNumber SalesOrderNumber TRUE TRUE TRUE Encourage

3. To add an attribute to a Dimension, drag the Key column from the Data Source View pane and drop it in the Attributes pane.

4. To change the Name Column of an attribute, click on the attribute and then edit its NameColumn property. Select the alternate column as the new Name Column.

5. Optionally change the other properties like AttributeHierarchyEnabled, AttributeHierarchyOptimized etc. as per the table above.

Task 4: Designing Attribute Relationships and User Defined Hierarchies

1. In the previous task, when we added attributes to the Dimension, a default set of Attribute Relationships would’ve been automatically created by SSAS. The default set of attribute relationships would link every attribute of the dimension to its key attribute. For example, in the figure below, all the attributes of the Geography dimension are linked to its key attribute which is Geography Key. While there is nothing wrong in this set of relationships, it is not the most optimized. Creating proper attribute relationships is a key design factor to improve and optimize the performance of your OLAP solutions.

2. Since we can deduce that every Geography Key can be linked to a Postal Code and every Postal Code can have only one City and every City can have only one State and every State can have only one Country, we need to modify the default set of attribute relationships to reflect it. The new set of attribute relationships should be as shown below

3. To design a new set of attribute relationships, make sure you are in the Attribute Relationships tab of the Dimension designer. Delete all the current relationships from the Attribute Relationships panel.

4. Right click on the Key Attribute and select the New Attribute Relationship option.

5. In the Create Attribute Relationship dialog, select the relevant Related Attribute and click OK.

6. Then using this Related attribute as the Source, create another relationship to the next attribute in the chain.

7. The attribute relationships for each of the Dimensions in the OLAP solution should be follows –

Dimension Attribute Relationships
Promotion
Product
Customer
Country
Reseller
Sales Territory
Employee
Date

8. By default all attribute relationships created by SSAS are flexible relationships. In the attribute relationship diagram, a flexible relationship is depicted using a hollow arrow, whereas a Rigid relationship is depicted using a solid black arrow as shown below. A Rigid relationship is a relationship that will not change over a period of time. For example, a particular City will always remain under a State and is never likely to change (other than in some rare and extreme circumstances). Also a given Month will always fall in a particular Quarter. Such relationships between attributes can be changed to Rigid. Rigid relationships are always better than flexible relationships and should be used wherever applicable. Review the above table to see which relationships are Rigid and which ones are flexible.

9. After creating attribute relationships, the next thing to concentrate on are User Defined Hierarchies. Just like attribute relationships, it is not mandatory to create UDHs, but it is strongly recommended for a number of reasons. Most of the time your UDHs will match the attribute relationships, however that is not necessary. When a UDH matches an attribute relationship, it is called a natural user defined hierarchy. To create a UDH, go to the Dimension Structure tab of the Dimension designer. In the Hierarchies pane, drag and drop the topmost level of the UDH. Drag and drop the other subsequent levels accordingly. For example, in the figure below, a UDH has been created for the Geography dimension. The name of the UDH is Geography. The first level is Country; State-Province is at the second level etc.

10. Each dimension should have UDHs as shown below –

Dimension User Defined Hierarchies
Promotion
Product
Customer. Note the warning reported by BIDS. It is warning you that the UDH is unnatural. While a natural hierarchy is always recommended, there can be situations where you may create an unnatural hierarchy as well, as in this case.
Geography
Reseller
Sales Territory
Employee
Date. There is no limit to the number of UDHs that can be created.

Task 5: Designing Measures

1. Open the AdventureWorks cube.

  1. In the “Dimensions” pane in the lower left corner, right click on the “Adventure Works” node and select the “Add cube dimension” option. Shift select all the Dimensions and click “Ok”.
  1. Determine the measures that will be required by reviewing the interview transcripts. The keywords highlighted in green indicate the measures that will be required in this OLAP solution. One of the measures – average revenue per customer will need to be defined as a Calculated Measure.
  2. In the “Measures” pane in the upper left corner of the cube designer, right click and select the “New Measure” option. Add the following measures as shown in the table below –
Measure Name Fact Table Aggregation Function
Internet Sales Amount FactInternetSales Sum
Internet Sales Quantity FactInternetSales Sum
Internet Transaction Count FactInternetSales Count
Number of Customers FactInternetSales Distint Count
Reseller Sales Amount FactResellerSales Sum
Reseller Sales Quantity FactResellerSales Sum
Reseller Transaction Count FactResellerSales Count
Total Sales Amount FactSalesSummary Sum
Total Sales Quantity FactSalesSummary Sum
Sales Amount Quota FactSalesQuota Sum

5. The Measures pane should appear as shown below –

6. To create a calculated measure, go to the Calculations tab in the cube designer and click on the New Calculated Member button.

7. Create a new calculated member named Avg Revenue Per Customer and add it to the Measures Hierarchy. The formula for the member should be Internet Sales Amount divided by the Number of Customers. Pay attention to the Non-empty behavior property.

Task 6: Dimension Usage

1. The Dimension Usage tab in the cube designer can be used to review the relationships that exist between dimensions and measures (via measure groups). For example, from the figure below we can see that currently the Geography dimension is not related to any measure group and hence any measure. So we will never be able to effectively slice any measure by the any attribute of the Geography dimension. This dimension is currently practically useless. The relationships that have already been created are Regular Relationships. These relationships are automatically created by BIDS on the basis of the primary-foreign key constraints existing in the tables that were used in the Data Source View. A couple of the relationships (highlighted in the figure below) are Fact Relationships. Fact Relationships are automatically created by BIDS whenever the same table is used as a dimension as well as a fact table. For example, the FactInternetSales table is a fact table, because the Internet Sales Amount, Internet Order Quantity etc. measures are coming from it. The same table is also a dimension table because the Internet Sales Order Details Dimension is based on it. In OLAP terminology, such a dimension would be known as a degenerate dimension.

2. Coming back to the Geography dimension, we would like to use it to slice measures coming from the Reseller Sales measure group. Since there is no common column between the FactResellerSales table and the DimGeography table, BIDS has not been to create a Regular Relationship. However we can see from the DSV that the FactResellerSales table is related to the DimReseller table, which in turn is related to the DimGeography table. Hence an indirect (or referenced) relationship can exist between FactResellerSales and DimGeography.

3. To create a new referenced relationship, click on the cell intersecting the Reseller Sales measure group and the Geography Dimension. In the Define Realtionship dialog, select Referenced as the Relationship type and Reseller as the Intermediate Dimension. Geography Key is the common attribute in Reference (Geography) as well as the Intermediate (Reseller) dimension.

4. The referenced relationship is created as follows –

5. The cube and the OLAP solution are now complete as per the requirements.

As a next step, you can refer to this guide on optimizing an SSAS solution using Partitions, Proactive Caching and Aggregations.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: