Optimizing an OLAP Solution using SSAS


Overview

This guide has been created for people who want step-by-step instructions on evaluating a set of requirements or a problem statement and implementing an OLAP solution for it using SSAS. In the process, you will learn how to:

1. Design a cube partitioning strategy for measure groups.

2. Choose an appropriate storage setting and location for each partition.

3. Design proactive caching, choosing appropriate silence intervals and latency.

4. Design an aggregation strategy.

The source code accompanying this guide can be downloaded from here. The source project uses the AdventureWorks2008R2 sample databases, which can be downloaded from here.

Scenario

You are the lead solution designer for the multidimensional design team at Adventure Works. Adventure Works sells bicycles and components in America and Europe. Due to regional differences and requirements, the two regions use different OLTP systems to maintain their sales information. In spite of the differences in the systems and data formats used in these two regions, the SSIS team has done a great job of standardizing the data structures when the data is loaded into the warehouse. The ETL package developed by the team loads the daily transactional data into two tables having the same structure, viz. FactAmericaSales and FactEuropeSales. The team has assured management that when the company diversifies into other regions, it will be able to replicate the same strategy.

The management team at AdventureWorks Headquarters (HQ) uses an SSAS cube for its analysis and reporting requirements. The team at HQ is more interested in consolidated information from all the regions (Europe and America at the moment) than in data sliced region by region.

Problem Statement 1 – The management team at the AdventureWorks Headquarters (HQ) currently uses an SSAS cube for their reporting and analysis requirements. The cube contains two measure groups, each pointing to the fact table for one region (America or Europe). This forces management to analyze the data on a per-region basis, which is neither efficient nor useful. You have been assigned to fix this on a top-priority basis.

Problem Statement 2 – Management is also very concerned about the latency involved in seeing up-to-date information in the cube. Sales happen around the clock, and the transactional data is loaded into the warehouse on an hourly basis. Management would ideally like to analyze order information within a window of 3 hours, and has instructed the database administration team to schedule cube processing at an interval of no more than 3 hours. As per company policy, changes to certain sales-related information are allowed only in the current month. Some financial data related to orders can be changed until the financial year ends, but such changes are rare.

Problem Statement 3 – The database administration team is skeptical about whether the current hardware resources will be able to satisfy management's latency requirements. They feel that frequent processing of all the sales information will strain the resources considerably, and that the problem will be compounded after the multidimensional design team completes its task of consolidating the regional sales data. They are also worried about whether the storage capacity of the current SSAS server will be able to handle the data volume once AdventureWorks expands to more regions. They have suggested in the past that the multidimensional design team use some unused servers to store data that is more than two years old and very infrequently accessed; they hope that, with the new design decisions being taken to improve cube efficiency, their suggestion will be given serious thought. They also want the development team to take the pressure on storage capacity into account while designing aggregations (if any).

Preparing the AdventureWorksDW database to be used with this guide

This guide uses a modified version of the AdventureWorksDW2008R2 database, which contains two additional tables, viz. FactAmericaSales and FactEuropeSales. These tables are copies of the FactInternetSales table (in both structure and data) and are used to implement the solutions to the problem statements above. The source project accompanying this guide contains a file called PrepareDB.sql; open and execute its contents.

Task 1: Designing and Implementing a Partitioning strategy

1. Open a solution named Module5_Starter.sln from the source project accompanying this guide.

2. Open the cube named Module5 and you will notice that there are two measure groups viz. Sales-America and Sales-Europe.

3. Deploy the project. You may need to change the deployment properties or connection information of the data source.

4. Go to the cube browser and drag the measures from the Sales – America measure group to the data region of the cube browser. Note the values of the measures. Repeat the same for the measures in the Sales – Europe measure group. You will use these values to validate the partitions you implement in the subsequent tasks.

5. Since the AdventureWorksDW2008R2 database contains sample data from the years 2005 to 2008, we will treat 2008 as the current financial year and June as the current month.

Solving Problem Statement 1

6. For the first problem statement, the ideal solution is to create a single measure group pointing to one of the fact tables, and then change the partition's source query so that it unions in the data from the fact tables of the other regions. With only one measure group, users no longer have to view multiple measures for things like Sales Amount, Order Quantity, etc.

7. Since we have decided to create one consolidated measure group, delete the existing measure groups present in the cube.

8. You should also delete the three time dimensions – DueDate, ShipDate and Date from the Dimensions pane. Your cube design should appear as shown below.

9. Right click the now empty “Module5” node in the Measures pane and choose the “New measure” option. From the “Source Table” drop down, choose the “FactAmericaSales” table. From the “Source Columns” listbox, choose the “OrderQuantity” column. Ensure that the “Usage” dropdown has a value of “Sum” and click “OK”.

10. Repeat the above step and add a measure using the SalesAmount column from the same table.

11. Add a measure called Sales Count. For usage, choose Count of Rows.

12. Rename the measure group by right clicking on “Fact America Sales” node and choosing the “Rename” option. Change the name of the measure group to “Sales Summary”.

13. Deploy the project. Go to the “Browser” tab in the cube designer and drag the “Order Quantity” measure from the Measures node to the data region of the pivot table. Since the measure group currently refers to only one of the fact tables (FactAmericaSales), the value displayed should be 60398.

Solving Problem Statement 2 and 3

14. In problem statement 2, it is mentioned that most updates happen to data for the current month (in our case, June 2008). Updates to data for the current year (2008) are also possible, but rare. Data older than the current year cannot be updated. Keeping this in mind, we need at least a Sales_Summary_CurrentMonth partition and a Sales_Summary_CurrentYear partition. We could store all the data for previous years in a single additional partition, but that partition would become extremely large and difficult to maintain. A better strategy is to store each past financial year in its own partition. Since we have data from 2005 onwards, we will create five partitions in total: Sales_Summary_2005, Sales_Summary_2006, Sales_Summary_2007, Sales_Summary_CurrentYear, and Sales_Summary_CurrentMonth.
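The boundary logic behind this scheme can be sketched in a few lines of illustrative Python (this is not part of the SSAS project; the function name and the June 2008 cut-off are assumptions taken from this guide's scenario):

```python
# Illustrative sketch of the partitioning boundaries described above.
# Assumes integer OrderDateKey values of the form yyyymmdd and that the
# "current" period is June 2008, as in this guide's scenario.
def partition_for(order_date_key):
    year = order_date_key // 10000
    month = (order_date_key // 100) % 100
    if year < 2008:
        # One MOLAP partition per closed financial year (data starts in 2005)
        return "Sales_Summary_%d" % year
    if month < 6:
        # January-May 2008: rarely updated, HOLAP
        return "Sales_Summary_CurrentYear"
    # June 2008 onwards: frequently updated, ROLAP
    return "Sales_Summary_CurrentMonth"

print(partition_for(20061115))  # Sales_Summary_2006
print(partition_for(20080210))  # Sales_Summary_CurrentYear
print(partition_for(20080603))  # Sales_Summary_CurrentMonth
```

Spelling the boundaries out in one place makes it easy to see that every OrderDateKey falls into exactly one partition, which is what the partition source queries in the steps below must also guarantee.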

15. The other properties of these partitions would be as follows –

Partition storage settings:

Sales_Summary_2005 – Storage mode: MOLAP; location: remote. The data for this period never changes, so MOLAP is the best storage mode, and since the data is likely to be rarely used in reports, the partition can be stored on a remote server.
Sales_Summary_2006 – Storage mode: MOLAP; location: remote. Same rationale as Sales_Summary_2005.
Sales_Summary_2007 – Storage mode: MOLAP; location: local. The data for this period never changes, so MOLAP is the best storage mode; since the data is likely to be used frequently in reports, the partition is stored on the local SSAS server.
Sales_Summary_CurrentYear – Storage mode: HOLAP; location: local. The data for this period changes, but rarely, so we can try HOLAP as the storage mode: raw data stays in the relational system and aggregations are stored in SSAS. Since the data is likely to be used frequently in reports, the partition is stored on the local SSAS server.
Sales_Summary_CurrentMonth – Storage mode: ROLAP; location: local. The data for this period changes very frequently and latency is critical, so ROLAP is the appropriate storage mode: raw data stays in the relational system. Since the data is likely to be used frequently in reports, the partition is stored on the local SSAS server.

Proactive caching settings:

Sales_Summary_2005 – Proactive caching: disabled. The data for this period never changes, so proactive caching serves no purpose.
Sales_Summary_2006 – Proactive caching: disabled. Same rationale as Sales_Summary_2005.
Sales_Summary_2007 – Proactive caching: disabled. Same rationale as Sales_Summary_2005.
Sales_Summary_CurrentYear – Proactive caching: enabled; silence interval: 1 hour; latency: 3 hours. Proactive caching lets this partition serve data and aggregations from its cache rather than retrieving them from the RDBMS every time. Since the business rules permit a latency of 3 hours, we set the silence interval and latency accordingly. A silence interval of 1 hour means that SSAS will continue to serve outdated data from its cache for up to 1 hour after the data has changed in the source system; the cache is then rebuilt, and while the new cache is being built, SSAS continues to use the old, outdated one. If rebuilding the cache does not complete within 3 hours, we want SSAS to drop the outdated cache and query the underlying source database directly until the new cache is ready. This is what the latency property controls.
Sales_Summary_CurrentMonth – Proactive caching: enabled; silence interval: 1 hour; latency: 3 hours. Same rationale as Sales_Summary_CurrentYear.
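The interplay of the silence interval and latency can be modeled with a short illustrative sketch (a simplified mental model only; the actual SSAS proactive-caching state machine has more states and options than shown here):

```python
# Simplified model of what a proactive-caching partition serves, given the
# 1-hour silence interval and 3-hour latency chosen above. Times are in
# minutes since the source data last changed.
SILENCE_INTERVAL_MIN = 60   # quiet period to wait before rebuilding the cache
LATENCY_MIN = 180           # tolerate a stale cache for at most this long

def served_from(minutes_since_change, rebuild_finished):
    if rebuild_finished:
        return "fresh MOLAP cache"
    if minutes_since_change < SILENCE_INTERVAL_MIN:
        return "stale cache (waiting out the silence interval)"
    if minutes_since_change < LATENCY_MIN:
        return "stale cache (rebuild in progress)"
    # Latency exceeded: drop the stale cache and go straight to the RDBMS
    return "relational source (ROLAP) until the rebuild finishes"
```

The key point the model captures is that queries never fail over this timeline; what changes is only where the answer comes from and how stale it may be.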

 

Aggregation settings:

Sales_Summary_2005 – Aggregations: disabled. The data for this period is going to be rarely used in reports, so we can do without aggregations.
Sales_Summary_2006 – Aggregations: disabled. Same rationale as Sales_Summary_2005.
Sales_Summary_2007 – Aggregations: enabled, 30%. Users may frequently compare the current period against the previous year, so we design aggregations for roughly a 30% performance gain on this partition.
Sales_Summary_CurrentYear – Aggregations: enabled, 50%. Data in this period is going to be used frequently, so we design aggregations for roughly a 50% performance gain on this partition.
Sales_Summary_CurrentMonth – Aggregations: disabled (0%). Designing aggregations for ROLAP partitions is rarely useful and is not recommended.

16. To implement these decisions, go to the Partitions tab and you should see that there is a default partition created for the “Sales Summary” measure group. The source of the partition is the “FactAmericaSales” table. Delete this partition.

17. Click on the “New Partition” link to create new partitions. In the “Specify Source Information” step of the Partition wizard, enable the checkbox next to the “FactAmericaSales” table and click on the “Next” button.

18. In the “Restrict Rows” step of the Partition Wizard, enable the “Specify a query to restrict rows” option. Delete the query that has been automatically populated in the Query box and replace it with the following query. The query restricts the partition to data for the year 2005 and combines the data from the two regional fact tables using UNION ALL.

SELECT * FROM dbo.FactAmericaSales WHERE OrderDateKey >= 20050101 AND OrderDateKey <= 20051231
UNION ALL
SELECT * FROM dbo.FactEuropeSales WHERE OrderDateKey >= 20050101 AND OrderDateKey <= 20051231

19. Click on the “Check” button to ensure that the query syntax is correct; if it is not, correct the errors before proceeding. If the query check completes successfully, click on the “Next” button.

20. In the “Processing and Storage Locations” page, leave the default options the way they are. For the sake of simplicity, we will be creating and processing all partitions locally. For step by step instruction on how to set up partitions on remote servers refer to the following link – http://msdn.microsoft.com/en-us/library/ms174837(SQL.90).aspx

21. In the “Completing the Wizard” step of the Partition Wizard, enter “Sales_Summary_2005” as the name of the partition and select the “Design aggregations later” option. Click on the “Finish” button. A new partition containing data only for the year 2005 is created.

22. Repeat steps 17 to 21 for all the partitions mentioned below –

Sales_Summary_2006:
SELECT * FROM dbo.FactAmericaSales WHERE OrderDateKey >= 20060101 AND OrderDateKey <= 20061231
UNION ALL
SELECT * FROM dbo.FactEuropeSales WHERE OrderDateKey >= 20060101 AND OrderDateKey <= 20061231

Sales_Summary_2007:
SELECT * FROM dbo.FactAmericaSales WHERE OrderDateKey >= 20070101 AND OrderDateKey <= 20071231
UNION ALL
SELECT * FROM dbo.FactEuropeSales WHERE OrderDateKey >= 20070101 AND OrderDateKey <= 20071231

Sales_Summary_CurrentYear:
SELECT * FROM dbo.FactAmericaSales WHERE OrderDateKey >= 20080101 AND OrderDateKey <= 20080531
UNION ALL
SELECT * FROM dbo.FactEuropeSales WHERE OrderDateKey >= 20080101 AND OrderDateKey <= 20080531

Sales_Summary_CurrentMonth:
SELECT * FROM dbo.FactAmericaSales WHERE OrderDateKey >= 20080601
UNION ALL
SELECT * FROM dbo.FactEuropeSales WHERE OrderDateKey >= 20080601
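Since the four queries differ only in their OrderDateKey boundaries, it can help to generate them rather than edit each by hand (an illustrative Python helper, not part of the guide's project; the table names are taken from the scenario):

```python
# Generates UNION ALL partition source queries like the ones shown above,
# so the OrderDateKey boundaries stay consistent across both fact tables.
FACT_TABLES = ["dbo.FactAmericaSales", "dbo.FactEuropeSales"]

def partition_query(low_key, high_key=None):
    predicate = "OrderDateKey >= %d" % low_key
    if high_key is not None:
        predicate += " AND OrderDateKey <= %d" % high_key
    return "\nUNION ALL\n".join(
        "SELECT * FROM %s WHERE %s" % (table, predicate) for table in FACT_TABLES
    )

print(partition_query(20060101, 20061231))  # Sales_Summary_2006
print(partition_query(20080601))            # Sales_Summary_CurrentMonth (open-ended)
```

Generating the queries also makes it obvious that the current-month partition is deliberately open-ended: new daily loads fall into it without any query changes.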

23. Your partitions list should appear as shown below

24. Deploy your solution. Go to the “Browser” tab and click on the “Reconnect” button on the toolbar. If the “Order Quantity” measure is not already present in the pivot table, drag it from the “Sales Summary” folder under the Measures node. The value should now be 120796, which is the correct consolidated value from both regions (60398 from America and 60398 from Europe).

Task 2: Implementing ProActive Caching

1. In the Partitions tab of the cube designer, right click on the “Sales_Summary_CurrentYear” partition and select the “Storage Settings” option.

2. Change the slider to “Real-time HOLAP” and click on the “Options” button.

3. Change the “Silence Interval” to 1 hour and “Latency” to 3 hours.

4. Click on the “Notifications” tab. Ensure that the “SQL Server” option is enabled and enable the checkbox for the “Specify tracking tables” option. Select “FactEuropeSales” and “FactAmericaSales” as the tracking tables. Click on the “OK” button to close the dialog.

5. Right click on the “Sales_Summary_CurrentMonth” partition and select the “Storage Settings” option. Change the slider to “Real-time ROLAP” and click on the “Options” button. Leave the default values for proactive caching. Click on the “Notifications” tab. Ensure that the “SQL Server” option is enabled and enable the checkbox for the “Specify tracking tables” option. Select “FactEuropeSales” and “FactAmericaSales” as the tracking tables. Click on the “OK” button to close the dialog.

6. Deploy the project.

Task 3: Testing ROLAP Partitions

1. Open the file “updatesales.sql” from the source project accompanying this guide. Execute the query. The query deletes a SalesOrder (4 rows) from the “FactAmericaSales” table.

2. Go back to the cube designer in BIDS and open the “Browser” tab. Check the value of the “Order Quantity” measure; you may need to click on the “Reconnect” button on the toolbar. The value of the measure should now be 120792. This validates our ROLAP partition: we do not need to reprocess the cube every time the source data is updated.
