Proactive Caching Explained


What is Proactive Caching?

SSAS provides us with different storage modes for cube partitions and dimensions – MOLAP, HOLAP and ROLAP. In MOLAP, the source data as well as the aggregations are stored in a multidimensional structure. This mode gives us the best query performance, but very high latency. At the other end, the ROLAP storage mode does not store a copy of the source data in the multidimensional database; it leaves the data in the underlying relational database itself. Because of this, performance is not as good as MOLAP, but there is a misconception that it produces less latency and that users can see the data as it changes in the source system.

Proactive caching comes into the picture when you want the best of both worlds – users able to see the most recent data, while at the same time taking advantage of the performance benefits of the MOLAP mode. As the name suggests, data and aggregations are cached by SSAS, and this cache gives us the performance boost. At the same time, SSAS can monitor the source system for changes to the data. On a set schedule, or whenever the data in the underlying system changes, SSAS can invalidate the old cache and build a new one with the most up-to-date data from the source system.

Proactive caching can be configured on a partition in a cube or on a dimension. Also note that this feature is available only in the Enterprise edition of SSAS and not in the Standard edition.

Exploring Proactive Caching

In order to understand proactive caching a little better, I have put together a small demo through which I will explain some other details. The demo is based on the AdventureWorks Analysis Services Project that ships as a sample with SQL Server.

If you look at the storage settings for any partition in a cube, you have the following standard settings –

With the exception of the MOLAP option (on the extreme right-hand side), all the other options use proactive caching (by default) in some way or another. Scheduled MOLAP is the only option which uses proactive caching but does not monitor the source system for changes; rather, it updates the cache at a fixed, predefined interval. The Automatic, Medium Latency and Low Latency options are more or less the same and differ only in the values used for certain properties. To experiment with proactive caching, I am going to use one of these three options – let’s say the Automatic MOLAP option. Since this option depends on notifications sent by SQL Server whenever data in the source changes, we must specify one or more tracking tables. Alternatively, we could use scheduled polling instead of tracking tables.
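For reference, a scheduled polling query is just a query that returns a single scalar value which changes whenever the tracked data changes; SSAS re-runs it on the polling interval and rebuilds the cache when the returned value differs from the previous run. A minimal sketch against the demo table (the exact query is your choice):

```sql
-- A polling query must return a single value that changes whenever the
-- tracked table changes. A row count works for insert-only fact tables;
-- a MAX over an audit/timestamp column is more robust if rows are updated.
SELECT COUNT(*) FROM dbo.FactInternetSales;
```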

If I click on the Options button, I am presented with the following dialog through which I can enable or disable proactive caching, change its behavior and control notifications.

Right now, I am going to leave all these default values untouched and go to the Notifications tab. I have selected FactInternetSales as the tracking table (because the partition on which I am experimenting, is based on the FactInternetSales table).

Before I start browsing the cube, I am going to use SQL Server Profiler and start two profiler traces. One trace will run on the SSAS database and the other one will run on the AdventureWorksDW2008R2 relational database.

For the trace on SSAS, I have only taken events from the Queries and Query Processing Category.

For the trace on the AdventureWorksDW relational database, I have only taken the RPC:Completed, SQL:BatchStarting and SQL:BatchCompleted events. Furthermore, to reduce clutter, I have a column filter created on the DatabaseName column.

In the cube browser, I drag the Internet Order Quantity measure from the Internet Sales measure group. The current value displayed is 60398.

If I take a look at the Profiler trace for SSAS, I can see the presence of a number of Query Dimension and Calculate NonEmpty Begin events. The presence of these events indicates the building of the MOLAP cache in memory.

If I come back to the cube browser and refresh the window or drag some other measure from the same measure group, this time SSAS simply picks up the data from its internal cache.

This is evident from the Profiler trace which now shows the presence of the Get Data from Cache event.

All along, the SQL Profiler trace for the source relational database has been completely blank, proving that whatever queries we have fired so far have not needed to access the source system.

Now I am going to modify a row in the FactInternetSales table in the AdventureWorksDW2008R2 database. I have increased the value in the OrderQuantity column by one for one of the rows. If I come back to the cube browser, and refresh it, (and remember we have not re-processed the cube or the partition after changing the data), we will see the Internet Order Quantity incremented by one.
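The modification itself is an ordinary UPDATE statement. Something along these lines, where the WHERE clause shown is purely illustrative – any single row of the fact table will do:

```sql
-- Increment OrderQuantity for one row of the fact table.
-- The key values below are only an example; pick any one row.
UPDATE dbo.FactInternetSales
SET OrderQuantity = OrderQuantity + 1
WHERE SalesOrderNumber = 'SO43697'
  AND SalesOrderLineNumber = 1;
```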

We can also now see a flurry of events generated in the trace window of the RDBMS related to the notifications sent by SQL Server to SSAS and the subsequent processing of the affected partitions triggered by SSAS.

This sequence of activities shows us the most basic capabilities of proactive caching. Cache the data, wait for notifications about changes in data, when notifications arrive, recreate the cache and serve hot fresh data.

Now let’s look at some other ways in which we can control proactive caching. There are 4 important properties –

  1. Silence Interval – This property tells SSAS how long the source system must stay quiet (i.e. no further change notifications) after a change before SSAS starts rebuilding its cache. For example, if I set Silence Interval to 5 mins and some data changes in the source system at 08:00 hrs, SSAS will start rebuilding the cache at 08:05 hrs, provided no further changes arrive in the meantime; every new notification resets the timer. This prevents SSAS from rebuilding the cache over and over while the source system is being updated in rapid bursts – the rebuild waits until the burst of updates has died down.
  2. Silence Override Interval – Because the Silence Interval timer is reset by every new notification, a source system that changes continuously could postpone the cache rebuild indefinitely. The Silence Override Interval puts a cap on this: it is the maximum amount of time SSAS will wait after the first change notification before starting to rebuild the cache, regardless of whether the source has gone quiet. For example, with a Silence Interval of 10 mins and a Silence Override Interval of 60 mins, even if changes keep arriving every 5 mins, SSAS will start rebuilding the cache 60 mins after the first notification.
  3. Latency – This property controls how long stale data can be served by SSAS while the cache is being rebuilt. For example, suppose SSAS receives a change notification at 08:00 hrs and starts rebuilding its cache, which may take an hour to complete. If, as per the business requirements, the maximum latency that can be tolerated is 30 mins, then 30 mins would be the value assigned to the Latency property. When the time specified by the Latency property expires and SSAS has yet to finish rebuilding its cache, it temporarily switches over to the ROLAP mode and directly accesses the data in the source RDBMS. For the remainder of the rebuild, the partition acts like a ROLAP partition. The Latency property works in conjunction with the Bring Online Immediately property.
  4. Rebuild Interval – If you want the cache to be rebuilt on a fixed schedule, regardless of whether the data in the source system has changed or not, you can specify an appropriate value in the Rebuild Interval property.
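Under the covers, these four settings become child elements of the partition’s ProactiveCaching element in the ASSL definition, expressed as XSD durations. The fragment below is a rough sketch of what the designer generates – element order is simplified and the source/notification bindings are omitted:

```xml
<!-- Sketch of a partition's ProactiveCaching settings (ASSL, simplified).
     Durations use the XSD format: PT10S = 10 seconds, PT30M = 30 minutes.
     A value of -PT1S conventionally means "not used / infinite". -->
<ProactiveCaching>
  <SilenceInterval>PT10S</SilenceInterval>                 <!-- Silence Interval -->
  <SilenceOverrideInterval>PT10M</SilenceOverrideInterval> <!-- Silence Override Interval -->
  <Latency>PT30M</Latency>                                 <!-- max staleness before ROLAP fallback -->
  <ForceRebuildInterval>-PT1S</ForceRebuildInterval>       <!-- Rebuild Interval; -PT1S = disabled -->
</ProactiveCaching>
```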


In the settings above, I have specified a silence interval of 2 minutes. The initial value of the Internet Order Quantity measure is 60398.

Using SQL Server Management Studio, I increment the value of the OrderQuantity column by 1 for the first row in the FactInternetSales table. However SSAS still continues to return the old value for the Internet Order Quantity measure.

I will have to wait for 2 minutes before refreshing the cube browser to be able to see the new updated value.

A look at the Profiler trace for SQL Server confirms our understanding. Below we can see that SQL Server Management Studio fired an update query at 19:01 hrs. and SSAS reprocessed our partition at 19:03, exactly 2 minutes after the update.

General misconceptions regarding the ROLAP mode and caching

During most of my SSAS training sessions, there is one point where I often find experienced SSAS professionals misunderstanding proactive caching and most of the misunderstanding has to do with the ROLAP storage mode. A lot of SSAS professionals will tell me that if I always (every MDX query) want SSAS to pick up data from the source tables, I should use ROLAP as the storage mode and disable proactive caching. This way nothing will be cached and the data delivered as output will always be the most up-to-date data.

That is wrong. By definition, the ROLAP storage mode leaves data in the underlying relational system and does not make a copy of it in the multidimensional (SSAS) database. However, before serving any data as the output of an MDX query, SSAS always caches the data. The cache is mandatory and, to the best of my knowledge, there is currently no way to turn this caching behavior off. You can clear the cache whenever you want, but you cannot turn the caching feature off. The important thing to understand here is that proactive caching <> SSAS cache. When you turn off proactive caching (for any storage mode, including ROLAP), you are simply turning off the feature that allows SSAS to detect changes in the source system and refresh the cache with fresh data when changes happen. You are not, as some people imagine, turning off the caching capability of SSAS. Proactive caching on or off, the SSAS cache still remains.

To prove this, I am going to change the storage mode of one of the partitions to ROLAP. I have also disabled proactive caching.

Now, the cube browser currently shows me a value of 60398 for Internet Order Quantity. I increment the value of OrderQuantity in the source system by 1. And no matter how long I wait, the output returned will always be 60398.

The profiler trace for the relational engine shows no trace of activity from SSAS no matter how many times I refresh my cube browser.

However the profiler trace for SSAS shows us the presence of the Get Data from Cache event, indicating that even though the storage mode is ROLAP, SSAS is still serving the query from its internal cache and not making a round trip to the relational engine every time I run an MDX query.

And since proactive caching has been disabled, there are no notifications sent to SSAS informing it about the data changes in the source system and SSAS continues to dish out stale data from its internal cache.

Taking a cue from this blog post by Jamie Thomson, you can clear the SSAS cache manually using an XMLA command as follows –

<ClearCache xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <Object>
    <DatabaseID>Adventure Works DW 2008R2</DatabaseID>
  </Object>
</ClearCache>


If I refresh my cube browser after executing the above XMLA query, I get the new data returned as output.

A quick glance at the RDBMS profiler trace confirms that SSAS did reconnect to the FactInternetSales table to rebuild its internal cache.

So this should effectively prove that even in the ROLAP mode, caching does take place and you are not making a round trip between SSAS and your relational system for every query that is fired.


Designing Dimensions and Attributes



Dimension design is one of the most critical factors in designing a high performance SSAS solution. A bad dimension design can very easily break an otherwise well designed cube. As you would already know, dimensions are composed of attributes, and these attributes can be related to one another through User Defined Hierarchies as well as Attribute Relationships. Understanding and mastering the difference between Hierarchies and Relationships is important and can immensely help improve cube performance.

KeyColumn

The most important property that can be used to define an attribute is the KeyColumn property. The KeyColumn property can point to a single column in the source table or even multiple columns and is needed to uniquely identify each member of the attribute. Keep in mind the following recommendations while working with the KeyColumn property of an attribute.

  1. KeyColumns should hold as small a value as possible. When given a choice between pointing the KeyColumn of an attribute like Country to a string column called CountryName or a numeric column called CountryID, it is always better to go with CountryID. This reduces the size of the dimension and the processing time.
  2. Another benefit of going with the CountryID column instead of CountryName is that an ID is always constant over different locales, whereas Names can change. For example, a country name of India may be used wherever the locale is English, but if the default regional settings of an end-user is Hindi, then we would like the country name returned to him to be भारत. The KeyColumn needs to be a constant value not dependent on factors like locale, culture, etc.
  3. Sometimes you may need to pay special attention to the uniqueness of the values returned by the KeyColumn. For example, when choosing the KeyColumn for the ZipCode attribute of the Geography dimension, you may be tempted to choose the zipcode column itself. However there is every possibility that two cities in different corners of the world may have the same zipcode. So zipcode simply may not be enough to enforce uniqueness between different members of the ZipCode attribute. You may have to concatenate the ZipCode column with the City column, to enforce uniqueness.
  4. Special care needs to be taken with attributes of the Time dimension. For example, if I point the KeyColumn of the Month attribute to the MonthNumber or MonthName column in my source table, then the output produced by the cube would be something like this. There would be no way the user will be able to differentiate between the same month falling in different years or quarters.

The solution would be to have a composite key, comprising Month and Year, as the KeyColumn for the Month attribute. Composite keys should be used only when required, as they increase the size of the dimension and the processing time.
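In the dimension’s ASSL definition, such a composite key simply shows up as multiple KeyColumn entries under KeyColumns. A simplified sketch – the bindings are abbreviated here, whereas the designer emits full data-type and column-binding elements:

```xml
<!-- Sketch: a Month attribute keyed on (Year, MonthNumber) and named by
     MonthName. Column bindings are abbreviated; the designer generates
     complete Source/ColumnBinding elements for each entry. -->
<Attribute>
  <Name>Month</Name>
  <KeyColumns>
    <KeyColumn> <!-- binding to CalendarYear --> </KeyColumn>
    <KeyColumn> <!-- binding to MonthNumberOfYear --> </KeyColumn>
  </KeyColumns>
  <NameColumn> <!-- binding to EnglishMonthName --> </NameColumn>
</Attribute>
```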

NameColumn

The NameColumn property specifies the column whose value will be displayed to users. By default, if the NameColumn is not explicitly assigned, it is automatically set to the value of the KeyColumn property. In situations where the value returned by the KeyColumn is the final value that you want to show to your end-users, this behavior is fine and works well. However in most situations if your KeyColumn is something like CategoryID, then you probably want CategoryName to be returned and shown in your reports.

The NameColumn property becomes mandatory whenever the KeyColumn is composite. The reason for this should be self-explanatory: if the KeyColumn is a composite value, then which one of the numerous values that it is composed of should be returned to the end-user?

The source column that NameColumn points to is valid only for the default locale. Using Dimension or Cube translations, the NameColumn of an attribute can be assigned to different source columns for different locales. For example, in the AdventureWorksDW database, the Product table has columns like EnglishProductName, SpanishProductName, FrenchProductName etc. The NameColumn can point to EnglishProductName for the default locale, but point to SpanishProductName for the Spanish locale.

Since every dimension must have a Key Attribute (analogous to the primary key in a table), the Key Attribute mostly points to the primary key in the underlying table. For example, the Product dimension will use ProductID as its Key Attribute. However ProductID in most cases would be of no use to end-users. They would rather see the names of the products than some cryptic ID or a running integer value. The Key Attributes in such cases are excellent candidates for making effective use of the NameColumn property. The Key Attribute of the Product dimension can have its KeyColumn point to ProductID, but have its NameColumn property point to ProductName. This way we continue to use ProductID as the identifying column, but show a much more useful value to the end-user. This also allows us to eliminate the ProductName attribute from the dimension. Reducing the number of attributes in a dimension is always welcome.

ValueColumn

The ValueColumn property is perhaps one of the least understood and least used properties. It allows us to hold some additional information about the attribute. For example, for a ProductName attribute, the KeyColumn could be ProductID, the NameColumn could be ProductName and the ValueColumn could be ProductPhoto. The ValueColumn is typically used for calculations and its value can be accessed using the MemberValue MDX function.
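As a quick illustration, the MemberValue function surfaces whatever the ValueColumn holds. Assuming a hypothetical setup where the Product attribute’s ValueColumn points to a numeric list-price column, a query along these lines would expose it:

```mdx
-- Sketch: expose the ValueColumn of the Product attribute via MemberValue.
-- Assumes the ValueColumn points to a numeric ListPrice column (illustrative).
WITH MEMBER Measures.[Product Value] AS
  [Product].[Product].CurrentMember.MemberValue
SELECT Measures.[Product Value] ON COLUMNS,
       [Product].[Product].Members ON ROWS
FROM [Adventure Works]
```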

AttributeHierarchyEnabled

In SSAS, every attribute added to a dimension results in the creation of an attribute hierarchy. SSAS also automatically creates member properties for every attribute that is directly related to the Key Attribute of the dimension. Attribute hierarchies can be used on the query axes of MDX queries, but member properties cannot; they can only be accessed by querying the attribute (mostly the Key Attribute) that contains the member property. For example, an employee’s date of birth or email address may be needed for informational purposes in reports, but not to slice or dice a measure on a query axis. Hence date of birth and email address can be classified as member properties rather than attribute hierarchies. So if we only want to access an attribute as a member property, we can disable the automatically generated attribute hierarchy by setting the AttributeHierarchyEnabled property to false.
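Member properties are reached through the Properties function on members of the owning attribute. For example, assuming Email Address is defined as a member property of the Employee attribute (illustrative names):

```mdx
-- Sketch: read a member property. Properties() is accessed through the
-- attribute that owns the property; the property cannot sit on an axis
-- of its own the way an attribute hierarchy can.
WITH MEMBER Measures.[Email] AS
  [Employee].[Employee].CurrentMember.Properties("Email Address")
SELECT Measures.[Email] ON COLUMNS,
       [Employee].[Employee].Members ON ROWS
FROM [Adventure Works]
```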

Disabling AttributeHierarchyEnabled improves the processing performance of the cube. It also decreases the cube size, because the attribute will no longer be indexed or aggregated. However this can turn into a drawback as well, since filtering measures using member properties is much slower than using attribute hierarchies.

AttributeHierarchyOptimizedState

The AttributeHierarchyOptimizedState property can be used to turn off the creation of bitmap indexes for an attribute. The following section is taken from the SSAS 2008R2 Performance Guide.

“During processing of the primary key attribute, bitmap indexes are created for every related attribute. Building the bitmap indexes for the primary key can take time if it has one or more related attributes with high cardinality. At query time, the bitmap indexes for these attributes are not useful in speeding up retrieval, because the storage engine still must sift through a large number of distinct values. This may have a negative impact on query response times.

For example, the primary key of the customer dimension uniquely identifies each customer by account number; however, users also want to slice and dice data by the customer’s social security number. Each customer account number has a one-to-one relationship with a customer social security number. You can consider removing the creation of bitmaps for the social security number.

You can also consider removing bitmap indexes from attributes that are always queried together with other attributes that already have bitmap indexes that are highly selective. If the other attributes have sufficient selectivity, adding another bitmap index to filter the segments will not yield a great benefit.

For example, you are creating a sales fact and users always query both date and store dimensions. Sometimes a filter is also applied by the store clerk dimension, but because you have already filtered down to stores, adding a bitmap on the store clerk may only yield a trivial benefit. In this case, you can consider disabling bitmap indexes on store clerk attributes.”

The full guide can be downloaded from here.

AttributeHierarchyOrdered

The AttributeHierarchyOrdered property controls whether members of an attribute hierarchy are sorted in a particular order or not. The OrderBy and OrderByAttribute properties can be used to further control the exact way in which sorting is desired. In case there are certain attributes where the order of members is not important, setting AttributeHierarchyOrdered to false can save processing time for the dimension.

IsAggregatable

The IsAggregatable property indicates whether an attribute can be aggregated in a hierarchy. If this property is true, then SSAS will automatically add an “ALL” member to the attribute.

In a majority of dimensions, an aggregation of all the descendants which rolls up to the ALL level is desired, but there are certain cases where it could be detrimental. The best example for this can be found in cubes which contain data that deals in multiple currencies. In such cases you will have a currency dimension. Performing a summation across all currencies would not make any business sense. So in order to prevent queries from accessing the ALL level, you can disable the ALL member by setting IsAggregatable to false.

Still to come – Attribute Relationships and User Defined hierarchies

Designing a Security strategy for an OLAP Solution using SSAS


Overview

This guide has been created for people who want step by step instructions on evaluating a set of requirements or a problem statement and implementing it into an OLAP solution using SSAS. Using this guide, you will learn how to –

1. Create various roles in the SSAS database

2. Assign users and permissions to each role.

3. Control access to a cube using Dimension Data, Cell data permissions

4. Control access to a cube using an external .net assembly

This guide assumes basic knowledge of SSAS and OLAP like what are dimensions, measures, cubes, how to create them using SSAS etc.

The source project accompanying this guide can be downloaded from here. The project uses the AdventureWorks2008R2 sample databases which can be downloaded from here.

Scenario

You are the lead solution designer for the multidimensional design team at Adventure Works. The company has started using an SSAS cube for its analysis and reporting requirements. Most of the employees in the organization use Excel for running ad-hoc queries against the database. At the moment, all employees have unrestricted access to the information present in the cubes. The AdventureWorks management wants to change this and has requested you to design and implement security features that only allow employees to see data that they are authorized to see.

The following table displays a list of users who need access to the cube.

Name | Windows Username | Designation | Department | Reports To
Ken Sanchez | Ken | CEO | – | –
Brian Welcker | Brian | COO | – | Ken Sanchez
Amy Alberts | Amy | Marketing Manager | Marketing | Brian Welcker
Stephen Jiang | Stephen | Marketing Manager | Marketing | Brian Welcker
Syed Abbas | Syed | Marketing Manager | Marketing | Brian Welcker
Jae Pak | Jae | Marketing Executive | Marketing | Amy Alberts
Rachel Valdez | Rachel | Marketing Executive | Marketing | Amy Alberts
David Campbell | Davidc | Marketing Executive | Marketing | Stephen Jiang
Lynn Tsoflias | Lynn | Marketing Executive | Marketing | Syed Abbas
David Bradley | Davidb | CFO | Finance | Brian Welcker
John Wood | John | Accounts Executive | Finance | David Bradley
Jill Williams | Jill | HR Manager | Human Resources | Brian Welcker
Kevin Brown | Kevin | CTO | IT | Brian Welcker
James Hamilton | James | DBA | IT | Kevin Brown

Interview transcripts with the following stakeholders are provided:

Brian Welcker

Amy Alberts

David Bradley

Kevin Brown

Interview Transcript for Brian Welcker – COO

However much I would want to provide unrestricted access to information and data across our organization, there are certain compulsions and organizational boundaries that need to be respected. Hence our top priority at the moment is making sure that people don’t get to see data that they are not responsible for or not authorized to access. Of particular importance here, is our Sales and Finance related data. Only I, Ken, and people from the HR department should be able to get access to sales related information for all the employees. Finance folks should not have any access to this data. The people from the Marketing department should be able to see only information of Sales Persons who report to them directly or indirectly. To put it in plain and simple words, Amy should not be able to see Stephen’s sales figures and Syed should not be able to see Rachel’s sales figures. There are some other restrictions that we need to enforce too, but you should speak to Amy to get more clarity on those.

Interview Transcript for Amy Alberts – Marketing Manager

I am sure Brian must’ve already told you that he doesn’t want the Marketing department looking at data related to Sales Persons that don’t report to them directly or indirectly. Apart from that there are some other restrictions that we want to enforce. Since we operate in a number of geographical locations across the world, we don’t want Sales People assigned to one territory peeking at data that comes from another territory. I believe information about which Territory has been assigned to a Sales Person is already stored somewhere in our system.

Interview Transcript for David Bradley: CFO

I don’t have much to add here apart from what Brian has already discussed with you. One thing that I require is the ability to allow only people from the Finance department to have access to financial data. However Jill from the HR department is a little sensitive about disclosing detail-level data related to an account heading called “Labor Expenses”. She wants to ensure that Accounts Executives like John Wood do not have access to the values under it.

Interview Transcript for Kevin Brown: CTO

A lot of executives in the organization have asked me whether the DBAs and other engineers working in the IT department have access to all the sensitive information that is present in the cube. I know they do at the moment, but I don’t want them to. I also know that they periodically need to carry out administrative tasks on the database, but that is all they should have permissions for. I would like to ensure that I am the only person with administrative privileges as far as the cube is concerned.

Task 1: Creating Roles

1. Open a solution named Module6_Starter.sln from the source code accompanying this guide.

2. Based on the requirements presented earlier, the list of roles that would need to be created in SSAS and the users who would need to be added as members of those roles are mentioned below. A further explanation about the choices is given in the subsequent points.

Role | Members | Cube Administrator | Process Dimensions | Measure groups the role should not have permission for | Hierarchies that should be restricted for this role | Security Level
Senior Managers | Ken Sanchez, Brian Welcker | False | False | – | – | –
Marketing | Amy Alberts, Stephen Jiang, Syed Abbas | False | False | Financial Reporting | Employee.Employees, Geography.Geography | Dimension data
Finance Managers | David Bradley | False | False | Reseller Sales, Internet Sales | – | –
Finance Users | John Wood | False | False | Reseller Sales, Internet Sales | Account.Accounts | Cell data
Cube Admins | Kevin Brown | True | True | – | – | –
IT Engineers | James Hamilton | False | True | Reseller Sales, Internet Sales, Financial Reporting | – | –

3. As per Kevin Brown’s interview transcript, only he should have administrative access to the cube. The other Engineers should have permissions to perform administrative related activities like processing dimensions and partitions, but not have access to dimension data or measures.

4. As per Brian Welcker’s interview transcript, Finance guys should not have any access to Sales data. Hence both Finance Managers and Finance Users are denied access to the Reseller and Internet Sales measure groups. Also only Senior Managers are allowed unrestricted access to Sales data. The folks from the Marketing department are only allowed to see data for the Employees that report directly or indirectly to them. Hence the marketing role will need to be given selective permissions to the Employee.Employees hierarchy.

5. As per Amy Alberts’ interview transcript, Sales people should be allowed to see data only for the geographical location that they are assigned to. Hence the Marketing role will also need restricted access to the Geography.Geography hierarchy.

6. As per David Bradley’s interview transcript, only Finance folks should be allowed to see financial data. Hence the marketing role should be denied access to the Financial Reporting measure group. Also the Finance Users role should not be able to see measures under an Account head called Labor Expenses. Hence Cell Data permissions would need to be configured for the Account.Accounts hierarchy.

  7. In the Analysis Services project that you opened earlier, right-click on the “Roles” node in Solution Explorer and select the “New Role” option. A new file called “Role.role” should be created and added under the node.
  8. Rename the file to “Senior Managers.role”. When you are prompted to change the name of the object as well, click on the “Yes” button.
  9. In the “General” tab of the role designer, enable the “Read Definition” checkbox.
  10. Click on the “Data Sources” tab and select the “Read” permission under the “Access” column for the “AdventureWorks DW” data source.
  11. Click on the “Cubes” tab and select the “Read” permission under the “Access” column for the “Module6_Starter” cube.
  12. Repeat steps 7 to 11 for all the other roles as well. At the end, you should have the following roles created – Senior Managers, Marketing, Finance Managers, Finance Users, Cube Administrators, IT Engineers.

Task 2: Assigning Permissions to Roles

1. Open the “Cube Administrators” role and enable the “Full Control (Administrator)” checkbox.

2. Open the “IT Engineers” role and enable the “Process database” checkbox.

3. Open the “Marketing” role and click on the “Dimension Data” tab. Select the “Measures Dimension” option in the “Dimension” dropdown.

4. Since members of the Marketing role are not allowed access to financial figures, remove the checkbox next to the “Amount” measure. Close the role.

5. Open the “Finance Managers” role and click on the “Dimension Data” tab. Select the “Measures Dimension” option in the “Dimension” dropdown. Since members of the Finance department are not allowed to see Sales information, remove the checkbox next to all the measures except the “Amount” measure.

6. Repeat the above step for the “Finance Users” role as well.

7. Open the “IT Engineers” role and click on the “Cubes” tab. Since IT Engineers can only perform administrative tasks and are not allowed access to any data in the cube, change the permissions to “None” under the “Access” column for the “Module6_Starter” cube.

8. Deploy the project. Then open the “Module6_Starter” cube and go to the “Browser” tab.

9. Click on the “Change User” button on the toolbar and select one of the roles you have created. Ensure that the selected role can only view measures as per the scenario document. For example, if we try to browse the cube using the Finance Users role, we have access only to the Financial Reporting measure group.

Task 3: Assigning Dimension Data permissions to roles.

1. Open the “Marketing” role and click on the “Dimension Data” tab. Select the “Employee” dimension in the “Dimension” dropdown. Select the “Employees” hierarchy in the “Attribute” dropdown.

2. Click the “Advanced” tab and in the “Allowed Member Set” textbox, enter the following MDX expression. This MDX expression filters the members of the Employees hierarchy to include only the member whose Login ID attribute is equal to the currently logged-in user ID. This ensures that each employee can only view sales-related information for employees above or below him in the hierarchy.

FILTER( [Employee].[Employees].Members, [Employee].[Employees].CurrentMember.Properties("Login ID") = UserName)

3. Go to the Membership tab of the Marketing role and add your current logged in account as a member in this role.

4. Deploy the solution.

5. In order to test this functionality, you need to take your system username and assign it to one of the users in the Marketing role, e.g. Amy Alberts. Open SQL Server Management Studio and execute the following SQL query against the AdventureWorksDW2008R2 database. The query updates the LoginID column of Amy Alberts (EmployeeKey = 290) to the author’s system login – replace it with your own. It also assigns a different sales territory (Australia) to her, which will help us in the subsequent tasks.

Update dbo.DimEmployee Set LoginID='dellxpsnauzadk', SalesTerritoryKey=9 where EmployeeKey=290
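Before moving on, it can help to confirm that the update took effect. The following check query is my addition, not part of the guide’s source code:

```sql
-- Verify that EmployeeKey 290 now carries the new login and territory 9 (Australia)
SELECT EmployeeKey, LoginID, SalesTerritoryKey
FROM dbo.DimEmployee
WHERE EmployeeKey = 290
```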

6. Go back to the cube browser and click on the “Change User” button. Enable the Roles radio button and select the Marketing Role.

7. Drag the Employee.Employees hierarchy to the columns section of the cube browser. Drag the “Reseller Sales Amount” measure to the data section of the cube browser.

8. Without implementing the Dimension Data permissions for the Marketing role, this is the output that Amy Alberts would have seen.

9. However, if the Dimension Data permissions have been implemented successfully, you should be able to see sales information only for Amy Alberts, the people who report to her, and her superiors. Sales information of other marketing executives like Stephen Jiang or Syed Abbas should not be displayed.

Task 4: Assigning Cell Data Permissions

1. Open the “Finance Users” role and click on the “Cell Data” tab.

2. Enable the “Enable Read Permissions” check box and enter the following MDX expression in the textbox. This expression allows members of the role to see a cell value only if the Account member’s parent member is not named “Labor Expenses”.

[Account].[Accounts].CurrentMember.Parent.Name <> "Labor Expenses"

3. Deploy the solution. Go to the cube browser and click on the “Change User” button. Select the Roles radio button and choose the “Finance Users” role.

4. Drag the Account.Accounts hierarchy to the columns section of the cube browser. Drag the “Amount” measure (from the Financial Reporting measure group) to the data section of the cube browser. Expand the accounts until you reach the child members under the “Labor Expenses” account. You should not see any cell values for the child accounts, as shown below.

Task 5: Using a .NET Assembly to control permissions

1. Open the “Marketing” role and click on the “Dimension Data” tab. Select the “Geography” dimension in the “Dimension” dropdown. Select the “Geography Key” attribute in the “Attribute” dropdown.

2. Click the “Advanced” tab and in the “Allowed Member Set” textbox, enter the following MDX expression.

Filter([Geography].[Country].Children,StrToMember([SSASLibrary].[SSASLibrary].[Class1].GetEmployeeTerritory(UserName)))

3. This MDX expression filters the Geography dimension to include only those members whose Sales Territory Key is equal to the Sales Territory of the current logged in user. This will ensure that each member of the Marketing Role can only view Sales related information for the geographical locations that are assigned to him. This MDX expression uses a managed stored procedure called “GetEmployeeTerritory” to retrieve the Sales Territory that is assigned to a Sales Person. The source code for the stored procedure is located in the SSASLibrary folder in the source code accompanying this guide.

4. The source code connects to the AdventureWorksDW2008R2 database, queries the DimEmployee table and retrieves the territory assigned to the LoginID (passed to it as an input parameter). The sales territory is then formatted as a member expression and returned as the output of the function. The MDX expression uses the StrToMember function to convert the string value to a member expression. Note that this requirement could have been met without an external .NET assembly, but I have used one just to illustrate one more way in which complex access control requirements can be implemented.
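The exact implementation ships in the SSASLibrary folder, but based on the description above, the relational lookup inside GetEmployeeTerritory presumably boils down to something like the following query (the parameter name is an assumption); the procedure then wraps the returned key in an MDX member string before handing it back to StrToMember:

```sql
-- Hypothetical sketch of the lookup performed by GetEmployeeTerritory
SELECT SalesTerritoryKey
FROM dbo.DimEmployee
WHERE LoginID = @LoginID
```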

5. Deploy the solution. Go to the cube browser and click on the “Change User” button. Select the Roles radio button and choose the “Marketing” role. Drag the Geography.Geography hierarchy to the columns section of the cube browser. Drag the “Reseller Sales Amount” measure to the data section of the cube browser.

6. You should only be able to see sales information of the country assigned to the current logged in user (provided the current logged in user is a member of the Marketing Role). For example, in the figure below, since Amy Alberts has been assigned a Sales Territory of 9 (Australia), she can see only data for that location and no other.

Optimizing an OLAP Solution using SSAS


Overview

This guide has been created for people who want step by step instructions on evaluating a set of requirements or a problem statement and implementing it in an OLAP solution using SSAS. In the process, you will learn how to –

1. Design a cube partitioning strategy for measure groups.

2. Choose an appropriate storage setting and location for each partition.

3. Design proactive caching, latency and refresh intervals.

4. Design an aggregation strategy.

The source code accompanying this guide can be downloaded from here. The source project uses the AdventureWorks2008R2 sample databases, which can be downloaded from here.

Scenario

You are the lead solution designer for the multidimensional design team at Adventure Works. Adventure Works sells bicycles and components in America and Europe. Due to regional differences and requirements, the two regions use different OLTP systems for maintaining their sales information. Despite the differences in the systems and data formats used in these two regions, the SSIS team has done a great job of standardizing the data structures when the data is loaded into the warehouse. The ETL package developed by the team loads the daily transactional data into two different tables having the same structure, viz. FactAmericaSales and FactEuropeSales. The team has assured the management that when the company diversifies into other regions, they will be able to replicate the same strategy.

The management team at AdventureWorks Headquarters (HQ) uses an SSAS cube for their analysis and reporting requirements. The team at the HQ is more interested in consolidated information from all the regions (Europe and America at the moment) than in data sliced region-wise.

Problem Statement 1 – The management team at the AdventureWorks Headquarters (HQ) currently uses an SSAS cube for their reporting and analysis requirements. The cube contains two measure groups, each pointing to the fact table for the European or American region. This forces the management to analyze the data on a per-region basis, which is not very efficient or useful. You have been assigned to fix this on a top priority basis.

Problem Statement 2 – The management is also very concerned about the latency involved in seeing up-to-date information in the cube. Sales happen round the clock and the transactional data is loaded into the warehouse on an hourly basis. The management would ideally like to analyze order information within a window of 3 hours. They have instructed the database administration team to schedule cube processing at a time interval of not more than 3 hours. As part of the company policy, changes to certain sales-related information are allowed only in the current month. Some financial data related to orders can be changed until the financial year ends, but these changes are rare and infrequent.

Problem Statement 3 – The database administration team is skeptical about whether the current hardware resources will be able to satisfy the management’s latency requirements. They feel that frequent processing of all the sales information will strain the resources to a great extent, and that the problem will be compounded after the multidimensional design team completes its task of consolidating the regional sales data. They are also worried about whether the storage capacity on the current SSAS server will be able to handle the data volume once AdventureWorks expands to more regions. They have suggested in the past that the multidimensional design team try to use some unused servers to store data which is more than two years old and very infrequently accessed. They hope that this time, with the new design decisions being taken to improve cube efficiency, their suggestions will be given serious thought. They also want the development team to take the pressure on storage capacity into account while designing aggregations (if any).

Preparing the AdventureWorksDW database to be used with this guide

This guide uses a modified version of the AdventureWorksDW2008R2 database. There are two additional tables, viz. FactAmericaSales and FactEuropeSales, created in this modified database. These tables are just copies of the FactInternetSales table (both in terms of structure as well as data). These two tables are used to provide solutions based on the above problem statements. The source project accompanying this guide contains a file called PrepareDB.sql. Open and execute the contents of this file.

Task 1: Designing and Implementing a Partitioning strategy

1. Open a solution named Module5_Starter.sln from the source project accompanying this guide.

2. Open the cube named Module5 and you will notice that there are two measure groups viz. Sales-America and Sales-Europe.

3. Deploy the project. You may need to change the deployment properties or connection information of the data source.

4. Go to the cube browser and drag the measures from the Sales – America measure group to the data region of the cube browser. Note the values of the measures. Repeat the same for the measures in the Sales – Europe measure group. You will use these values to validate the partitions you implement in the subsequent tasks.

5. Since the AdventureWorksDW2008R2 database contains sample data from the year 2005 to 2008, we will consider the current financial year as 2008. Also the current month would be assumed to be June.

Solving Problem Statement 1

6. For the first problem statement, the ideal solution would be to create a single measure group pointing to one of the fact tables and then change the source query of the partition to union in the data from the fact tables for the other regions. This way, since there is only one measure group, users will not have to view duplicate measures for things like Sales Amount, Order Quantity etc.

7. Since we have decided to create one consolidated measure group, delete the existing measure groups present in the cube.

8. You should also delete the three time dimensions – DueDate, ShipDate and Date from the Dimensions pane. Your cube design should appear as shown below.

9. Right click the now empty “Module5” node in the Measures pane and choose the “New measure” option. From the “Source Table” drop down, choose the “FactAmericaSales” table. From the “Source Columns” listbox, choose the “OrderQuantity” column. Ensure that the “Usage” dropdown has a value of “Sum” and click “OK”.

10. Repeat the above step and add a measure using the SalesAmount column from the same table.

11. Add a measure called Sales Count. For usage, choose Count of Rows.

12. Rename the measure group by right clicking on “Fact America Sales” node and choosing the “Rename” option. Change the name of the measure group to “Sales Summary”.

13. Deploy the project. Go to the “Browser” tab in the cube designer and drag the “Order Quantity” measure from the Measures node to the data region of the pivot table. Since the measure group refers to only one of the fact tables (FactAmericaSales), the value displayed should be 60398.

Solving Problem Statement 2 and 3

14. In problem statement 2, it is mentioned that most of the updates happen to data that is valid for the current month (which in our case is June 2008). Updates to data for the current year (which in our case is 2008) are also possible but rare and infrequent. Data older than the current year cannot be updated. Keeping this in mind, we can create at least 3 partitions for our measure group. One of them could be called Sales_Summary_CurrentMonth and another Sales_Summary_CurrentYear. We could store all the data for all the previous years in a single separate partition, but that could make the partition extremely large and difficult to maintain. A better strategy would be to store data for each financial year in its own partition. Since we have data from the year 2005 onwards, the total set of partitions that we would create is – Sales_Summary_2005, Sales_Summary_2006, Sales_Summary_2007, Sales_Summary_CurrentYear and Sales_Summary_CurrentMonth.

15. The other properties of these partitions would be as follows –

Partition Name Storage mode Storage Location Rationale
Sales_Summary_2005 MOLAP Remote Since the data for this period never changes, MOLAP is the best storage mode. Also since this data is likely to be rarely used in reports, we can store this partition on remote servers
Sales_Summary_2006 MOLAP Remote Since the data for this period never changes, MOLAP is the best storage mode. Also since this data is likely to be rarely used in reports, we can store this partition on remote servers
Sales_Summary_2007 MOLAP Local Since the data for this period never changes, MOLAP is the best storage mode. Since this data is likely to be used frequently in reports, we can store this partition on local SSAS servers
Sales_Summary_CurrentYear HOLAP Local Since the data for this period changes, but rarely, we can try and use HOLAP as the storage mode. Raw data would be stored in the relational system and aggregations would be stored in SSAS. Since this data is likely to be used frequently in reports, we can store this partition on local SSAS servers
Sales_Summary_CurrentMonth ROLAP Local Since the data for this period changes very frequently and latency is critical, we can use ROLAP as the storage mode. Raw data would be stored in the relational system. Since this data is likely to be used frequently in reports, we can store this partition on local SSAS servers

 

Partition Name Proactive Caching Silence Interval Latency Rationale
Sales_Summary_2005 False NA NA Since the data for this period never changes, proactive caching is useless and should not be enabled.
Sales_Summary_2006 False NA NA Since the data for this period never changes, proactive caching is useless and should not be enabled.
Sales_Summary_2007 False NA NA Since the data for this period never changes, proactive caching is useless and should not be enabled.
Sales_Summary_CurrentYear True 1 hour 3 hours We can take advantage of proactive caching for this partition so that data and aggregations can be served from the cache rather than being retrieved from the RDBMS every time. Since a latency of 3 hours is permitted as per the business rules, we can set the silence interval and latency accordingly. A silence interval of 1 hour means that after a change notification, SSAS waits for 1 hour of inactivity in the source tables before it starts rebuilding the cache; until the rebuild completes, it continues to serve the old, outdated cache. If rebuilding the cache does not complete within 3 hours, we want SSAS to drop the outdated cache and access the underlying source database directly until the new cache is built. This is taken care of by the latency property.
Sales_Summary_CurrentMonth True 1 hour 3 hours Same as above.
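For reference, in the ASSL definition of a partition these two settings appear as XSD duration values (PT1H = 1 hour). A minimal sketch of the relevant fragment, with the other proactive-caching elements omitted, looks like this:

```xml
<ProactiveCaching>
  <SilenceInterval>PT1H</SilenceInterval> <!-- rebuild after 1 hour of quiet in the source -->
  <Latency>PT3H</Latency>                 <!-- drop the stale cache after 3 hours -->
</ProactiveCaching>
```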

 

Partition Name Aggregations Enabled Aggregation Percentage Rationale
Sales_Summary_2005 False NA Since the data for this period is going to be rarely used in reports, we can turn off aggregations.
Sales_Summary_2006 False NA Since the data for this period is going to be rarely used in reports, we can turn off aggregations.
Sales_Summary_2007 True 30% Since users may frequently compare data in the current time period versus previous year, we aggregate about 30% of the data in this partition.
Sales_Summary_CurrentYear True 50% Since data in this time period is going to be frequently used, we aggregate about 50% of the data in this partition.
Sales_Summary_CurrentMonth False 0% Designing aggregations for ROLAP partitions is rarely useful and is generally not recommended.

 

16. To implement these decisions, go to the Partitions tab and you should see that there is a default partition created for the “Sales Summary” measure group. The source of the partition is the “FactAmericaSales” table. Delete this partition.

17. Click on the “New Partition” link to create new partitions. In the “Specify Source Information” step of the Partition wizard, enable the checkbox next to the “FactAmericaSales” table and click on the “Next” button.

18. In the “Restrict Rows” step of the Partition Wizard, enable the “Specify a query to restrict rows” option. Delete the query that has been automatically populated in the Query box and replace it with the following query. The query restricts the partition to data for the year 2005 and combines the data from the two regional fact tables using UNION ALL.

Select * from dbo.FactAmericaSales WHERE OrderDateKey >= '20050101' and OrderDateKey <= '20051231'
 UNION ALL
 Select * from dbo.FactEuropeSales WHERE OrderDateKey >= '20050101' and OrderDateKey <= '20051231'

19. Click on the “Check” button to ensure that the query syntax is correct. If not, correct the errors before proceeding. If the query check completes successfully, click on the “Next” button.

20. In the “Processing and Storage Locations” page, leave the default options as they are. For the sake of simplicity, we will create and process all partitions locally. For step-by-step instructions on how to set up partitions on remote servers, refer to the following link – http://msdn.microsoft.com/en-us/library/ms174837(SQL.90).aspx

21. In the “Completing the Wizard” step of the partition wizard, enter “Sales_Summary_2005” as the name of the partition and select the “Design Aggregations later” option. Click on the “Finish” button. A new partition which contains data only for the year 2005 is created.

22. Repeat steps 17 to 21 for all the partitions mentioned below –

Partition Name SQL Query
Sales_Summary_2006 Select * from dbo.FactAmericaSales WHERE OrderDateKey >= '20060101' and OrderDateKey <= '20061231'
UNION ALL
Select * from dbo.FactEuropeSales WHERE OrderDateKey >= '20060101' and OrderDateKey <= '20061231'
Sales_Summary_2007 Select * from dbo.FactAmericaSales WHERE OrderDateKey >= '20070101' and OrderDateKey <= '20071231'
UNION ALL
Select * from dbo.FactEuropeSales WHERE OrderDateKey >= '20070101' and OrderDateKey <= '20071231'
Sales_Summary_CurrentYear Select * from dbo.FactAmericaSales WHERE OrderDateKey >= '20080101' and OrderDateKey <= '20080531'
UNION ALL
Select * from dbo.FactEuropeSales WHERE OrderDateKey >= '20080101' and OrderDateKey <= '20080531'
Sales_Summary_CurrentMonth Select * from dbo.FactAmericaSales WHERE OrderDateKey >= '20080601'
UNION ALL
Select * from dbo.FactEuropeSales WHERE OrderDateKey >= '20080601'

23. Your partitions list should appear as shown below

24. Deploy your solution. Go to the “Browser” tab and click on the “Reconnect” button on the toolbar. If the “Order Quantity” measure is not already present in the pivot table, drag it from the “Sales Summary” measure group under the Measures node. The value should now be 120796. This is the correct consolidated value from both regions – America and Europe (60398 in each).

Task 2: Implementing Proactive Caching

1. In the Partitions tab of the cube designer, right click on the “Sales_Summary_CurrentYear” partition and select the “Storage Settings” option.

2. Change the slider to “Realtime HOLAP” and click on the “Options” button.

3. Change the “Silence Interval” to 1 hour and “Latency” to 3 hours.

4. Click on the “Notifications” tab. Ensure that the “SQL Server” option is enabled and enable the checkbox for the “Specify tracking table” option. Enable “FactEuropeSales” and “FactAmericaSales” as the tracking tables. Click on the “Ok” button to close the dialog.

5. Right click on the “Sales_Summary_CurrentMonth” partition and select the “Storage Settings” option. Change the slider to “Realtime ROLAP” and click on the “Options” button. Leave the default values for proactive caching. Click on the “Notifications” tab. Ensure that the “SQL Server” option is enabled and enable the checkbox for the “Specify tracking table” option. Enable “FactEuropeSales” and “FactAmericaSales” as the tracking tables. Click on the “Ok” button to close the dialog.

6. Deploy the project.

Task 3: Testing ROLAP Partitions

1. Open the file “updatesales.sql” from the source project accompanying this guide. Execute the query. The query deletes a SalesOrder (4 rows) from the “FactAmericaSales” table.

2. Go back to the cube designer in BIDS and open the “Browser” tab. Check the value of the “Order Quantity” measure. You may need to click on the “Reconnect” button on the toolbar. The value of the measure should now be 120792. This validates our ROLAP partition and shows that we do not need to reprocess the cube every time the source data is updated.
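For reference, the shipped updatesales.sql is roughly along these lines – a delete of one sales order from the America fact table. The order number below is a placeholder, not the one the real script uses:

```sql
-- Sketch only: deletes the 4 line-item rows of a single sales order.
-- See updatesales.sql in the source project for the actual order number.
DELETE FROM dbo.FactAmericaSales
WHERE SalesOrderNumber = N'SO00000'
```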

Designing an OLAP Solution using SSAS


Overview

This guide has been created for people who want step by step instructions on evaluating a set of requirements or a problem statement and converting it into an OLAP solution using SSAS. In the process, you will learn how to

1. Organize a Data Source View using Diagrams.

2. Determine and create Calculated Columns and Named Queries.

3. Determine the Attributes required for each dimension.

4. Design the Attribute properties.

5. Design Attribute Relationships.

6. Design User defined Hierarchies.

7. Review Dimension Usage (Relationships between Dimensions and Measure Groups)

8. Determine and create the measures required in the cube.

This guide assumes basic knowledge of SSAS and OLAP – what dimensions, measures and cubes are, how to create them using SSAS, and so on.

The source project accompanying this guide can be downloaded from here. The project uses the AdventureWorks2008R2 sample databases which can be downloaded from here.

Scenario

You are the lead solution designer for the multidimensional design team at Adventure Works. Adventure Works sells bicycles and components through an Internet site and retail outlets. Although the company has been tracking sales for the last several years using its online transaction processing (OLTP) database, it now also wants to analyze sales data to compare the results of the two sales approaches (sales through its website and sales through retailers). It also wants to identify key sales trends.

Interview transcripts with the following stakeholders are provided:

Sales Manager—Reseller Division
Sales Manager—Internet Division
Marketing Director

Interview Transcript for Stephen Jiang: Sales Manager—Reseller Division

The Reseller Sales division covers all of our worldwide resellers, and we’re responsible for setting sales quotas for all of our sales employees, and for tracking sales by resellers. The Sales Management team doesn’t really do much in the way of detailed analysis, other than reviewing printed reports.

The exception to that is a small team of sales analysts, led by José Saraiva. Usually they show us sales totals for each quarter by geography and by product category or subcategory, but they’re able to drill down into specific resellers and products, so for example if one region shows a particularly low sales total one quarter, we can find out whether sales were low across all resellers and products. The sales are totaled by quarter, but I’d like for us to be able to break that down into monthly totals and sometimes even daily totals if necessary. Although not very frequently, we may also want to see a breakdown of sales figures by Sales Order Number.

We define sales quotas on a quarterly basis for our sales people, and I’d like to be able to track each employee’s performance against the quota to see whether I need to provide any additional support or training to help them meet their target.

Interview Transcript for Linda Mitchell: Sales Manager—Internet Division

The Internet Sales division processes all orders placed through our Web site. I need to be able to see what products are bought through our Internet site, and information about the customers who buy the products. At the moment, I get a monthly report showing me order totals for each product that month. I can also use Excel to export a list of all orders for the month, but the list is generally too big to be useful for anything other than grouping by customer to find high-spending repeat customers. It also takes ages to export the data, and I often just cancel part way through.

I know that we sell products all over the world, but I’d like to be able to compare order patterns and figures in geographical locations. I’d be really interested in being able to see the different results by country or region, but I’d also like to be able to drill down to state or province level, then into cities, and right to postal code levels.

The monthly order totals are useful, but I’d like to be able to find patterns in Internet orders at a lower level of granularity. For example, it would be useful to know what proportion of our Internet orders are placed at weekends, and to be able to see how Internet order totals are affected by holidays. I’d also like to be able to see the cost associated with each sale, so that we can get an idea of which items are most profitable.

Interview Transcript for David Bradley: Marketing Director

The Marketing department tries to promote sales for both the Internet and reseller channels. Our main strategy is to create time-limited, special offer campaigns to promote sales of specific products, but we’d like to be able to gather customer data that would help us take a more direct marketing approach.

We currently look at marketing campaigns by financial periods. For example, we need to see aggregated sales totals for the company’s financial year, and be able to drill into financial quarters, months, and right down to individual days. We need to compare these campaigns based on the financial period, whether the sales were from the Retail or Internet Sales division, several different customer demographic properties and products. To justify marketing campaign costs we also want to see the average revenue earned per customer in a particular time period or for a new product launch. The product information needs to be broken down by category and subcategory. Currently, we are using several complex, static reports in Excel in addition to Reporting Services. No one report can give us the option to easily slice and dice all of the properties we need to determine the effectiveness of different campaigns.

We’d like to be able to analyze Internet sales by customer, based on demographics such as age, address, education level, marital status, gender, occupation, number of children, yearly income, car ownership, and so on. We know that some of our Internet customers have provided this information, but it’s “locked up” somewhere in the sales system. We’d like to be able to get at that data and use it to identify buying patterns and trends that will help us create effective direct marketing campaigns. For example, we have recently identified that a customer’s gender and commute distance between home and office plays a very important role in his buying patterns. In the near future we would like to see most of our sales figures sliced by commute distance and gender.

For the reseller channel, we need to be able to see a breakdown of sales by region, and we need to be able to drill into individual resellers to see how the type of reseller (such as specialty bike shop, value-added reseller, or warehouse) affects what the reseller buys from us.

Additionally, when we analyze customer sales, we’d like to be able to see the total number of discrete sales, not just the total amount of money spent. Also we may frequently want to see consolidated sales figures of the reseller as well as the internet sales department.

Task 1: Organizing the Data Source View

1. Open a solution named “Module4_Starter.sln” from the source code accompanying this guide.

2. Open the “AdventureWorks” data source view and spend some time reviewing the tables that are included in it.

3. Go to the Diagram organizer and create 3 diagrams named “Reseller Sales Division”, “Internet Sales Division” and “Marketing Division”.

4. Add the tables related to each division in the respective diagram. You may re-arrange the tables or choose the “Auto Arrange” option to make the diagram more legible. Make sure the following tables are placed in the “Reseller Sales Division” diagram –

DimProduct, DimProductCategory, DimProductSubCategory, DimReseller, DimEmployee, DimSalesTerritory, DimDate, DimGeography, FactResellerSales.

Make sure the following tables are placed in the “Internet Sales Division” diagram –

DimProduct, DimProductCategory, DimProductSubCategory, DimDate, DimCustomer, DimSalesTerritory, DimGeography, FactInternetSales.

Make sure the following tables are placed in the “Marketing Division” diagram –

DimPromotion, DimEmployee, DimDate, DimGeography, FactInternetSales, FactResellerSales.

Task 2: Creating Named Calculations and Named Queries in the Data Source View

1. Based on the interview transcripts, we will need to make some changes to the tables that are currently present in the DSV.

2. First, we need to concatenate the FirstName, MiddleName and LastName columns in the DimCustomer table, so that we can return a complete customer name from our cube. Right click on the DimCustomer table and select the “New Named Calculation” option. Specify the column name as “FullName” and the following expression –

CASE
WHEN MiddleName IS NULL THEN
FirstName + ' ' + LastName
ELSE
FirstName + ' ' + MiddleName + '.' + ' ' + LastName
END
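If you want to preview the concatenation before deploying, the same expression can be run directly in SSMS against the AdventureWorksDW2008R2 database. This check query is my addition, not part of the guide’s steps:

```sql
SELECT TOP (5)
    FirstName, MiddleName, LastName,
    CASE
        WHEN MiddleName IS NULL THEN FirstName + ' ' + LastName
        ELSE FirstName + ' ' + MiddleName + '.' + ' ' + LastName
    END AS FullName
FROM dbo.DimCustomer
```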

3. Repeat step no. 2 for the DimEmployee table as well.

4. In the DimDate table, we have columns that return a month number or a month name like April, May, June etc. However, when a month is used to slice a measure, the user won’t be able to tell which year the month falls in. For example, if you look at the figure below, it is impossible to tell which year the month of July falls under. The same is applicable to quarters as well. The other problem with this data as it currently stands is that there is no uniqueness in the Month attribute of the Date dimension. Without uniqueness, we will not be able to create attribute relationships (and attribute relationships are a very critical part of SSAS project design).

5. To address these issues, we will need to create new calculated columns in the DimDate table. For example, we will create a new calculated column (say MonthName) and assign it a concatenated value of EnglishMonthName + CalendarYear. The result of this can be seen in the figure below –

6. Add a new calculated column to the DimDate table.

7. Specify MonthName as the Column name and assign the following expression to it.

EnglishMonthName + ' ' + CONVERT(CHAR(4), CalendarYear)

8. Create additional calculated columns in the DimDate table as per the table below –

Column Name            Expression
CalendarQuarterOfYear  'CY Q' + CONVERT(CHAR(1), CalendarQuarter)
FiscalQuarterOfYear    'FY Q' + CONVERT(CHAR(1), FiscalQuarter)
CalendarQuarterDesc    'Q' + CONVERT(CHAR(1), CalendarQuarter) + ' CY ' + CONVERT(CHAR(4), CalendarYear)
FiscalQuarterDesc      'Q' + CONVERT(CHAR(1), FiscalQuarter) + ' FY ' + CONVERT(CHAR(4), FiscalYear)
CalendarYearDesc       'CY ' + CONVERT(CHAR(4), CalendarYear)
FiscalYearDesc         'FY ' + CONVERT(CHAR(4), FiscalYear)
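The expressions above can be spot-checked together with one query in SSMS (a sketch; it assumes the standard DimTime columns used throughout this walkthrough):

```sql
-- Spot-check the calculated-column expressions against DimTime
SELECT TOP 5
    EnglishMonthName + ' ' + CONVERT(CHAR(4), CalendarYear) AS MonthName,
    'CY Q' + CONVERT(CHAR(1), CalendarQuarter) AS CalendarQuarterOfYear,
    'Q' + CONVERT(CHAR(1), CalendarQuarter) + ' CY ' + CONVERT(CHAR(4), CalendarYear) AS CalendarQuarterDesc,
    'CY ' + CONVERT(CHAR(4), CalendarYear) AS CalendarYearDesc
FROM dbo.DimTime;
```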

9. In the interview transcripts, David Bradley mentions – “Also we may frequently want to see consolidated sales figures of the reseller as well as the internet sales department.” At the moment, Reseller Sales data is coming from the FactResellerSales table and the Internet data is coming from the FactInternetSales table. To be able to consolidate the data and present it, we can create a named query (analogous to a view in a database) and use it to combine the data from both the tables.

To create a named query, right click in the Data Source View and click New Named Query. Enter FactSalesSummary as the name of the query and assign the following SQL UNION query to it.

SELECT ProductKey, OrderDateKey, DueDateKey, ShipDateKey, ResellerKey, NULL AS CustomerKey, EmployeeKey, PromotionKey, CurrencyKey, SalesTerritoryKey, SalesOrderNumber, SalesOrderLineNumber, RevisionNumber, OrderQuantity, UnitPrice, ExtendedAmount, UnitPriceDiscountPct, DiscountAmount, ProductStandardCost, TotalProductCost, SalesAmount, TaxAmt, Freight, CarrierTrackingNumber, CustomerPONumber, 'Reseller' AS SalesChannel
FROM dbo.FactResellerSales
UNION
SELECT ProductKey, OrderDateKey, DueDateKey, ShipDateKey, NULL AS ResellerKey, CustomerKey, NULL AS EmployeeKey, PromotionKey, CurrencyKey, SalesTerritoryKey, SalesOrderNumber, SalesOrderLineNumber, RevisionNumber, OrderQuantity, UnitPrice, ExtendedAmount, UnitPriceDiscountPct, DiscountAmount, ProductStandardCost, TotalProductCost, SalesAmount, TaxAmt, Freight, CarrierTrackingNumber, CustomerPONumber, 'Internet' AS SalesChannel
FROM dbo.FactInternetSales
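To preview the consolidated data the named query will expose, a similar query can be run directly in SSMS (a sketch; UNION ALL is used here because the SalesChannel literal guarantees the two branches can never produce identical rows):

```sql
-- Preview consolidated sales per channel, i.e. the figures the named query will feed the cube
SELECT SalesChannel,
       COUNT(*)         AS OrderLines,
       SUM(SalesAmount) AS TotalSales
FROM (
    SELECT SalesAmount, 'Reseller' AS SalesChannel FROM dbo.FactResellerSales
    UNION ALL
    SELECT SalesAmount, 'Internet' AS SalesChannel FROM dbo.FactInternetSales
) AS s
GROUP BY SalesChannel;
```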

Task 3: Implementing Dimensions

1. The next step is to create the dimensions. The interview transcripts should help you determine and design the dimensions that will be required in this OLAP solution. Good dimension design is critical to the success of an SSAS solution. To help you out, the keywords highlighted in red are the dimensions that will be required, and the keywords highlighted in blue are the attributes that will be required. As a recommendation, when adding attributes to a dimension it is better to be conservative than liberal: if you are not sure whether a particular attribute is going to be useful, it is better not to add it to the dimension.

2. The table below gives a list of all the attributes that we will need to add to the dimensions along with the Key Column. In some situations, the attribute has a different Name Column assigned to it. As a good dimension design, you also need to pay attention to certain properties like AttributeHierarchyEnabled, AttributeHierarchyVisible, AttributeHierarchyOptimized and GroupingBehavior. For a more detailed discussion on the different properties that can be used to define attributes in a dimension, refer to the following post.

Dimension  Attribute Name  Key Column  Name Column  Enabled  Optimized  Visible  Grouping Behavior
Geography
Geography Key GeographyKey TRUE TRUE FALSE Discourage
Country CountryRegionCode EnglishCountryRegionName TRUE TRUE FALSE Encourage
State-Province StateProvinceCode StateProvinceName TRUE TRUE FALSE Encourage
City City TRUE TRUE FALSE Encourage
Postal Code PostalCode TRUE TRUE FALSE Discourage
Promotion
Promotion Key PromotionKey EnglishPromotionName TRUE TRUE FALSE Discourage
Promotion Type EnglishPromotionType TRUE TRUE FALSE Encourage
Promotion Category EnglishPromotionCategory TRUE TRUE FALSE Encourage
Product
Product ProductKey EnglishProductName TRUE TRUE FALSE Discourage
Sub Category ProductSubCategoryKey EnglishProductSubCategoryName TRUE TRUE FALSE Encourage
Category CategoryKey EnglishProductCategoryName TRUE TRUE FALSE Encourage
Model Name ModelNameAssembly TRUE TRUE FALSE Encourage
Customer
Customer CustomerKey FullName TRUE TRUE TRUE Discourage
CommuteDistance CommuteDistance TRUE TRUE TRUE Encourage
Education EnglishEducation TRUE TRUE TRUE Encourage
Gender Gender TRUE TRUE TRUE Encourage
Home Owner HouseOwnerFlag TRUE TRUE TRUE Encourage
Marital Status MaritalStatus TRUE TRUE TRUE Encourage
Number of Children NumberChildrenAtHome TRUE TRUE TRUE Encourage
Occupation EnglishOccupation TRUE TRUE TRUE Encourage
Phone Phone FALSE FALSE TRUE Discourage
Email EmailAddress FALSE FALSE TRUE Discourage
Birth Date BirthDate FALSE FALSE FALSE Discourage
Yearly Income YearlyIncome TRUE TRUE TRUE Encourage
Reseller
Reseller ResellerKey ResellerName TRUE TRUE FALSE Discourage
Business Type BusinessType TRUE TRUE FALSE Encourage
Geography Key GeographyKey FALSE TRUE FALSE Discourage
Sales Territory
Region SalesTerritoryKey SalesTerritoryRegion TRUE TRUE FALSE Discourage
Group SalesTerritoryGroup TRUE TRUE FALSE Encourage
Country SalesTerritoryCountry TRUE TRUE FALSE Encourage
Employee
Employee Key EmployeeKey FullName TRUE TRUE FALSE Discourage
Parent Employee ParentEmployeeKey TRUE TRUE TRUE Discourage
Department DepartmentName TRUE TRUE TRUE Encourage
Title Title TRUE TRUE TRUE Encourage
Reseller Sales Order Details
Reseller Sales Order SalesOrderNumber + SalesOrderLineNumber SalesOrderNumber TRUE TRUE TRUE Encourage
Internet Sales Order Details
Internet Sales Order SalesOrderNumber + SalesOrderLineNumber SalesOrderNumber TRUE TRUE TRUE Encourage

3. To add an attribute to a Dimension, drag the Key column from the Data Source View pane and drop it in the Attributes pane.

4. To change the Name Column of an attribute, click on the attribute and then edit its NameColumn property. Select the alternate column as the new Name Column.

5. Optionally change the other properties like AttributeHierarchyEnabled, AttributeHierarchyOptimized etc. as per the table above.

Task 4: Designing Attribute Relationships and User Defined Hierarchies

1. In the previous task, when we added attributes to the dimensions, a default set of attribute relationships was created automatically by SSAS. This default set links every attribute of the dimension to its key attribute. For example, in the figure below, all the attributes of the Geography dimension are linked to its key attribute, Geography Key. While there is nothing wrong with this set of relationships, it is not optimal. Creating proper attribute relationships is a key design factor in improving and optimizing the performance of your OLAP solutions.

2. Since we can deduce that every Geography Key is linked to exactly one Postal Code, every Postal Code belongs to exactly one City, every City to exactly one State-Province, and every State-Province to exactly one Country, we need to modify the default set of attribute relationships to reflect this. The new set of attribute relationships should be as shown below

3. To design a new set of attribute relationships, make sure you are in the Attribute Relationships tab of the Dimension designer. Delete all the current relationships from the Attribute Relationships panel.

4. Right click on the Key Attribute and select the New Attribute Relationship option.

5. In the Create Attribute Relationship dialog, select the relevant Related Attribute and click OK.

6. Then using this Related attribute as the Source, create another relationship to the next attribute in the chain.

7. The attribute relationships for each of the Dimensions in the OLAP solution should be follows –

Dimension Attribute Relationships
Promotion
Product
Customer
Geography
Reseller
Sales Territory
Employee
Date

8. By default, all attribute relationships created by SSAS are flexible relationships. In the attribute relationship diagram, a flexible relationship is depicted using a hollow arrow, whereas a Rigid relationship is depicted using a solid black arrow as shown below. A Rigid relationship is one that will not change over time. For example, a particular City will always remain under the same State-Province (other than in some rare and extreme circumstances), and a given Month will always fall in the same Quarter. Such relationships between attributes can be changed to Rigid. Rigid relationships let SSAS retain aggregations when a dimension is incrementally processed, so they should be used wherever the data genuinely never changes. Review the above table to see which relationships are Rigid and which ones are flexible.

9. After creating attribute relationships, the next thing to concentrate on is User Defined Hierarchies (UDHs). Just like attribute relationships, creating UDHs is not mandatory, but it is strongly recommended for a number of reasons. Most of the time your UDHs will follow your attribute relationships, though that is not required. When a UDH follows the attribute relationships, it is called a natural user defined hierarchy. To create a UDH, go to the Dimension Structure tab of the Dimension designer. In the Hierarchies pane, drag and drop the topmost level of the UDH, then drag and drop the subsequent levels accordingly. For example, in the figure below, a UDH named Geography has been created for the Geography dimension. The first level is Country, State-Province is at the second level, and so on.

10. Each dimension should have UDHs as shown below –

Dimension User Defined Hierarchies
Promotion
Product
Customer. Note the warning reported by BIDS. It is warning you that the UDH is unnatural. While a natural hierarchy is always recommended, there can be situations where you may create an unnatural hierarchy as well, as in this case.
Geography
Reseller
Sales Territory
Employee
Date. There is no limit to the number of UDHs that can be created.

Task 5: Designing Measures

1. Open the AdventureWorks cube.

2. In the “Dimensions” pane in the lower left corner, right click on the “Adventure Works” node and select the “Add Cube Dimension” option. Shift-select all the Dimensions and click “OK”.

3. Determine the measures that will be required by reviewing the interview transcripts. The keywords highlighted in green indicate the measures that will be required in this OLAP solution. One of the measures, average revenue per customer, will need to be defined as a Calculated Measure.

4. In the “Measures” pane in the upper left corner of the cube designer, right click and select the “New Measure” option. Add the following measures as shown in the table below –
Measure Name Fact Table Aggregation Function
Internet Sales Amount FactInternetSales Sum
Internet Sales Quantity FactInternetSales Sum
Internet Transaction Count FactInternetSales Count
Number of Customers FactInternetSales Distinct Count
Reseller Sales Amount FactResellerSales Sum
Reseller Sales Quantity FactResellerSales Sum
Reseller Transaction Count FactResellerSales Count
Total Sales Amount FactSalesSummary Sum
Total Sales Quantity FactSalesSummary Sum
Sales Amount Quota FactSalesQuota Sum

5. The Measures pane should appear as shown below –

6. To create a calculated measure, go to the Calculations tab in the cube designer and click on the New Calculated Member button.

7. Create a new calculated member named Avg Revenue Per Customer and add it to the Measures Hierarchy. The formula for the member should be Internet Sales Amount divided by the Number of Customers. Pay attention to the Non-empty behavior property.
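In MDX script form, the calculated member would look roughly like this (a sketch assuming the measure names created above; in the Calculations form view, the Non-empty behavior property corresponds to the NON_EMPTY_BEHAVIOR clause):

```mdx
// Sketch of the Avg Revenue Per Customer calculated member
CREATE MEMBER CURRENTCUBE.[Measures].[Avg Revenue Per Customer]
 AS [Measures].[Internet Sales Amount] / [Measures].[Number of Customers],
FORMAT_STRING = "Currency",
NON_EMPTY_BEHAVIOR = { [Measures].[Number of Customers] },
VISIBLE = 1;
```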

Task 6: Dimension Usage

1. The Dimension Usage tab in the cube designer can be used to review the relationships that exist between dimensions and measures (via measure groups). For example, from the figure below we can see that the Geography dimension is currently not related to any measure group, and hence to any measure. So we will never be able to slice any measure by any attribute of the Geography dimension; for all practical purposes, the dimension is useless at the moment. The relationships that have already been created are Regular relationships. These are created automatically by BIDS on the basis of the primary key-foreign key constraints between the tables used in the Data Source View. A couple of the relationships (highlighted in the figure below) are Fact relationships. Fact relationships are created automatically by BIDS whenever the same table is used as both a dimension table and a fact table. For example, the FactInternetSales table is a fact table because the Internet Sales Amount, Internet Sales Quantity and other measures come from it, and it is also a dimension table because the Internet Sales Order Details dimension is based on it. In OLAP terminology, such a dimension is known as a degenerate dimension.

2. Coming back to the Geography dimension, we would like to use it to slice measures coming from the Reseller Sales measure group. Since there is no common column between the FactResellerSales table and the DimGeography table, BIDS has not been able to create a Regular relationship. However, we can see from the DSV that the FactResellerSales table is related to the DimReseller table, which in turn is related to the DimGeography table. Hence an indirect (or Referenced) relationship can exist between FactResellerSales and DimGeography.

3. To create a new referenced relationship, click on the cell at the intersection of the Reseller Sales measure group and the Geography dimension. In the Define Relationship dialog, select Referenced as the relationship type and Reseller as the intermediate dimension. Geography Key is the common attribute in both the reference dimension (Geography) and the intermediate dimension (Reseller).

4. The referenced relationship is created as follows –

5. The cube and the OLAP solution are now complete as per the requirements.

As a next step, you can refer to this guide on optimizing an SSAS solution using Partitions, Proactive Caching and Aggregations.
