Setting up Search Infrastructure – Part III


Some of the content appearing in these posts is taken from the SharePoint 2010 Search Evaluation Guide which can be downloaded from here.

 

This post covers the following –

  • Creating Metadata Properties
  • Search Reports
  • Creating Keywords, Definitions and Best Bets
  • Creating Search Scopes

 

Part I of this series is available here

Part II of this series is available here

 

Creating Metadata Properties

Crawled properties represent the metadata for content that is indexed. Typically, crawled properties include column data for SharePoint list items, document properties for Microsoft Office or other binary file types, and HTML metadata in Web pages. Administrators map crawled properties to managed properties in order to provide useful search experiences. For example, an administrator might create a managed property named Client that maps to various crawled properties, such as Customer and Client, from different content repositories. Managed properties can then be used across enterprise search solutions, such as in defining search scopes and in applying query filters.

In this procedure you will create a custom column. You will then crawl the lists so that their columns are indexed, and then you will create a managed metadata property that maps to columns in the lists.

  1. Browse to your SharePoint site. Navigate to any existing list and create a new column called Technology in it. Edit the properties of some of the list items so that they contain a value in the newly added column.
  2. Go to the Search Service Application and start a Full Crawl on the Local SharePoint Sites content source. You need not wait until the crawl completes.
  3. Go to the Search Center and run a search query as follows – “<column name>:<value>” (for example – Technology:CRM). You should notice that there are no search results returned. This is because either the content source has not yet been crawled or the crawled property has not yet been mapped to a Managed Property.
  4. On the Quick Launch of the Search Service Application, in the Queries and Results section, click Metadata Properties | New Managed Property.
  5. In the Property Name text box, type Technology.
  6. Click Add Mapping.
  7. In the Select a category drop-down list, ensure that All categories is selected. In the Crawled property name box, type ows_Technology, and then click Find.
  8. Click the ows_Technology(Text) property in the search results, and then click OK.
  9. Check the Allow this property to be used in scopes check box. Click OK.
  10. Start a Full crawl of the Local SharePoint Sites content source. Wait until the crawl completes. It should take about 2-3 minutes.
  11. Navigate to the Search Center and re-run the search query. This time you should see matching items in the Search Results.
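
The property query used in the steps above can also be built programmatically, which is handy when you want to link straight to a filtered results page. The following is only a minimal Python sketch: the site URL http://intranet/sites/search is made up, and I am assuming the results page lives at Pages/results.aspx and reads the query text from the k query-string parameter, as it does in my Search Center.

```python
from urllib.parse import quote


def property_query(prop, value):
    """Return a keyword query of the form <managed property>:<value>."""
    # Values containing whitespace are quoted so they are treated as one phrase.
    if any(ch.isspace() for ch in value):
        value = '"' + value + '"'
    return prop + ":" + value


def results_url(search_center, query):
    """Build a link to the results page (page path and 'k' parameter are assumptions)."""
    return search_center.rstrip("/") + "/Pages/results.aspx?k=" + quote(query, safe="")


query = property_query("Technology", "CRM")
print(query)
# The host name below is hypothetical; substitute your own Search Center URL.
print(results_url("http://intranet/sites/search", query))
```

Remember that the link only returns results once the managed property exists and a full crawl has completed.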

 

 

Search Reports

The following step-by-step instructions will help you get started working with search reports.

Running Administration Reports

  1. On the Quick Launch of the Search Service Application, in the Reports section, click Administration Reports.
  2. Click Search administration reports.
  3. Click each of the reports to review the information contained.
  4. On the Quick Launch, in the Reports section, click Web Analytics Reports.
  5. Click each of the links on the Quick Launch to view the different reports.

 

Creating Keywords, Definitions, Best Bets, and Synonyms

Best Bets are URLs to documents that are associated with one or more keywords. Typically these documents or sites are ones that you expect users will want to see at the top of the search results list. Best Bets are returned by queries that include the associated keywords, regardless of whether the URL has been indexed. Site collection administrators can create keywords and associate Best Bets with them.

Synonyms are words that mean the same thing as other words. For example, you might consider laptop and notebook to mean the same thing. Administrators can create synonyms for keywords that information workers are likely to search for in their organization. Additionally, synonyms that can be used to improve recall of relevant documents are stored in thesaurus files.

  1. Browse to the Search Center site. On the Site Actions menu, click Site Settings.
  2. In the Site Collection Administration section, click Search keywords.
  3. Click Add Keyword.
  4. In the Keyword Phrase text box, type SharePoint.
  5. In the Synonyms text box, type SharePoint Foundation; SharePoint Server; Windows SharePoint Services.
  6. Click Add Best Bet.
  7. In the URL text box, type http://www.microsoft.com/sharepoint.
  8. In the Title text box, type SharePoint on the Web.
  9. In the Description text box, type SharePoint home page on http://www.microsoft.com.
  10. Click OK.
  11. Click Add Best Bet.
  12. In the URL text box, type http://msdn.microsoft.com/sharepoint.
  13. In the Title text box, type SharePoint Developer.
  14. In the Description text box, type SharePoint home page on MSDN.
  15. Click OK.
  16. In the Keyword Definition text box, type Collaboration and Search Platform.
  17. Click OK.
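
To make the relationship between a keyword, its synonyms and its Best Bets a bit more concrete, here is a small Python model of the entries created above. It is purely illustrative and not how SharePoint stores keywords internally. The class and function names are my own, and the matching rule (the whole query must equal the keyword phrase or one of its synonyms, ignoring case) is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class BestBet:
    url: str
    title: str
    description: str = ""


@dataclass
class Keyword:
    phrase: str
    synonyms: list = field(default_factory=list)
    definition: str = ""
    best_bets: list = field(default_factory=list)


def build_index(keywords):
    """Map the keyword phrase and every synonym (lower-cased) to its Keyword."""
    index = {}
    for kw in keywords:
        for term in [kw.phrase] + list(kw.synonyms):
            index[term.strip().lower()] = kw
    return index


def best_bets_for(query, index):
    """Return the Best Bets for a query, or an empty list if no keyword matches."""
    kw = index.get(query.strip().lower())
    return kw.best_bets if kw else []


sharepoint = Keyword(
    phrase="SharePoint",
    synonyms=["SharePoint Foundation", "SharePoint Server", "Windows SharePoint Services"],
    definition="Collaboration and Search Platform",
    best_bets=[
        BestBet("http://www.microsoft.com/sharepoint", "SharePoint on the Web",
                "SharePoint home page on http://www.microsoft.com"),
        BestBet("http://msdn.microsoft.com/sharepoint", "SharePoint Developer",
                "SharePoint home page on MSDN"),
    ],
)
index = build_index([sharepoint])
for bet in best_bets_for("sharepoint server", index):
    print(bet.title, "->", bet.url)
```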

 

Creating Search Scopes

  1. Browse to the Search Center site. On the Site Actions menu, click Site Settings.
  2. In the Site Collection Administration section, click Search scopes.
  3. Click New Scope.
  4. In the Title text box, type File System. In the Display Groups section, check all check boxes. Click OK.
  5. In the Search Dropdown section, next to File System, click Add rules.
  6. In the Scope Rule Type section, click Web Address.
  7. In the Host Name text box, specify your UNC path (example: \\win2k8\Documents). Click OK.
    You may be notified that the scope will be updated in a few minutes. If so, either wait the required number of minutes and then continue at step 10, or perform steps 8 and 9 and then continue at step 10.
  8. In Central Administration, go to the Search Service Application | Search Administration.
  9. In the System Status section, next to Scopes needing update, click Start update now.
  10. Switch back to your Search Center. On the Site Actions menu, click Edit Page.
  11. Edit the Search Box Web Part. In the properties of the Web Part, expand the Scopes Dropdown section.
  12. In the Dropdown mode list, click Show scopes dropdown.
  13. Click OK.
  14. On the ribbon, click Save. Note that the scopes drop-down list appears, and that your new File System scope is included in the list.

 

Setting up Search Infrastructure – Part II


Some of the content appearing in these posts is taken from the SharePoint 2010 Search Evaluation Guide which can be downloaded from here.

 

This post covers the following –

  • Creating Authoritative Pages
  • Creating Federated Locations

 

Part I of this series is available here

Queries and Results Settings

The following step-by-step instructions will help you get started working with queries and results settings.

Creating Authoritative Pages

Many ingredients go into the FAST Search for SharePoint 2010 Search Engine algorithm. They include: Contextual Relevance, Metadata Extraction, Automatic Language Detection, File Type Biasing, Click Distance, Anchor Text, URL Depth and URL Matching. When a user enters a search term into your SharePoint 2010 Search box and clicks search, they are presented with a results page. A LOT goes into turning that innocent click into highly relevant results. The SharePoint 2010 Search Engine delivers highly relevant results because it has a robust search algorithm which decides how to rank the results. The search algorithm determines whether a particular result (link) is on page 1, position 1 or on page 17, position 3, and it takes all of the ingredients listed above into account when deciding how results rank.

Although it has served the Search Engine world well to NOT trust humans, SharePoint 2010 Search allows you to influence at least one of the ingredients that help determine a page's rank, in the form of Authoritative Pages. Authoritative Pages fall under the “Click Distance” ingredient. An Authoritative Page is a page which you have declared as, well…, somehow better than the rest. You can have as many Authoritative Pages as you need, at different levels of authority. You are essentially saying that a page should be considered a better match for any given search term that qualifies it as a result candidate. Keep in mind that this is merely one ingredient (or category) weighed by the algorithm used for ranking results. Declaring an Authoritative Page does not guarantee that it will rank well for every search term used (and it shouldn’t).

  1. In the Search Center, enter a search query (for example: SharePoint Deployment). Note the position of a particular document in the search results.
  2. Copy the link to the document.
  3. In Central Administration, on the Quick Launch of the FAST Query Search Service Application, in the Queries and Results section, click Authoritative Pages.
  4. Add a new line in the Most authoritative pages box and paste the URL. Ensure that the Refresh Now check box is selected, and then click OK.
  5. In the System Status section of the FAST Query SSA, you will see the value Computing Ranking displayed in the Background activity label. Wait for a minute or so for new rankings to be computed. When the computation is done, the value in the label will change to None.
  6. In the Search Center, run the same query again and note the difference in the rank of the document in the results.
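
If the idea of click distance is new to you, think of it as the smallest number of links a user has to follow to get from an authoritative page to a given page. The Python sketch below shows only that idea, as a breadth-first search over a made-up link graph. It is not the actual ranking algorithm, which combines click distance with all the other ingredients, and the page names are invented for illustration.

```python
from collections import deque


def click_distances(links, authoritative):
    """Breadth-first search from the authoritative pages; returns {page: clicks}."""
    dist = {page: 0 for page in authoritative}
    queue = deque(authoritative)
    while queue:
        page = queue.popleft()
        for target in links.get(page, ()):
            if target not in dist:  # first visit is the shortest path in BFS
                dist[target] = dist[page] + 1
                queue.append(target)
    return dist


# Hypothetical site structure: page -> pages it links to.
links = {
    "home": ["hr", "it"],
    "hr": ["policies"],
    "it": ["deploy-guide"],
    "deploy-guide": ["appendix"],
}

before = click_distances(links, ["home"])
after = click_distances(links, ["home", "deploy-guide"])
print("deploy-guide:", before["deploy-guide"], "->", after["deploy-guide"])
print("appendix:", before["appendix"], "->", after["appendix"])
```

Declaring deploy-guide as authoritative shortens the distance for that page and for everything it links to, which is why the rank of the document can change after the ranking is recomputed.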

 

 

Creating Federated Locations

Federation is the concept of retrieving search results from multiple search providers, based on a single query performed by an information worker. For example, your organization might include federation with Bing.com so that results are returned by SharePoint Server and Bing.com for a given query.

  1. On the Quick Launch of the Search Service Application, in the Queries and Results section, click Federated Locations.
  2. Click Import Location and browse to the YouTube.FLD (Federated location definition) file and click OK.
  3. Once the FLD file is successfully imported, click Edit Location.
  4. In the Edit Federated Location Page, notice that the information from the FLD file has been extracted. Go to the Trigger section. Since we want to federate the search to YouTube only if the search query matches the following pattern “video Harley Davidson”, enable the Prefix radio button and enter video in the textbox.

  1. Browse to your Search Center site. In the search box, type a search term and press [ENTER] to get the Search Results page.
  2. Edit the Search Results Page. In the Right Zone, add the Federated Results web part from the Search category.

  1. Edit the Web Part and in the properties pane for the Web Part, in the Location section drop-down list, click YouTube, and then click OK.
  2. On the ribbon, click Save and Close.
  3. Enter a search query with the video prefix and you should see results from YouTube appearing in your search results.
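
The effect of the Prefix trigger can be pictured with a few lines of Python. This is a sketch of the behaviour as I understand it, not SharePoint code. I am assuming that the prefix comparison ignores case and that only the text after the prefix is forwarded to the federated location, and the function name is my own.

```python
def apply_prefix_trigger(query, prefix):
    """Return the text to send to the federated location, or None if not triggered."""
    words = query.split()
    # The location is only queried when the first word is the prefix
    # and there is something left to search for.
    if len(words) < 2 or words[0].lower() != prefix.lower():
        return None
    return " ".join(words[1:])


print(apply_prefix_trigger("video Harley Davidson", "video"))
print(apply_prefix_trigger("Harley Davidson", "video"))
```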

Setting up Search Infrastructure – Part I


Some of the content appearing in these posts is taken from the SharePoint 2010 Search Evaluation Guide which can be downloaded from here.

 

This post covers the following –

  • Creating Enterprise Search Centers
  • Creating Content Sources
  • Creating Crawl Rules

 

The enterprise search features provided by SharePoint Server 2010 can be administered at the site collection level and at the Search service application level. The following sections provide step-by-step instructions for working with various aspects of enterprise search in SharePoint Server 2010. Administrators can use the Search Administration pages to manage search settings that affect all Web applications that consume the search service. Administrators will typically start here when configuring the search system. The main day-to-day operations include creating content sources, configuring crawler settings, configuring settings to improve relevance for those content sources, adding federated content repositories, and working with search reports. The following step-lists provide instructions for performing common operations in all of these scenarios.

 

Creating Enterprise Search Centers

Search Center is a site based on the Search Center site template. It provides a focused user interface that enables information workers to run queries and work with search results.

The following procedure creates a Search Center at the root Web for a site collection. This is the generally recommended approach and architecture for creating Search Center sites with SharePoint Server 2010.

  1. Click Start>All Programs>Microsoft SharePoint 2010 Products>SharePoint 2010 Central Administration.
  2. In the Application Management group, click on the Create Site Collections link.
  3. Create a new site collection in the web application of your choice. In the Title text box, type Search Center. In the Description text box, type Enterprise Search Center for SharePoint 2010.
  4. In the Web Site Address section, select /sites/ in the drop-down list, and then type search in the text box. In the Template Selection section, click the Enterprise tab. Click FAST Search Center.
  5. In the Primary Site Collection Administrator section, type your name in the text box, and then click Check Names. Click OK.
    After a short period of time, the site collection is created and the Top-Level Site Successfully Created page appears.
  6. Click the hyperlink to the new site collection to start exploring the Search Center.

 

Creating Content Sources

Content sources are definitions of systems that will be crawled and indexed. For example, administrators can create content sources to represent shared network folders, SharePoint sites, other Web sites, Exchange public folders, third-party applications, databases, and so on.

  1. Start SharePoint 2010 Central Administration.
  2. In the Application Management section, click Manage service applications | FAST Content.
  3. On the Quick Launch, in the Crawling section, click Content Sources.
  4. Click New Content Source.
  5. In case you do not see a content source for SharePoint Sites already created, create it before proceeding to the next step.

  1. Create a new content Source named Documents to point to a File Share on your machine which contains a bunch of Documents. Optionally create a crawl schedule while defining this Content source.

    Note : You will need to specify a path using UNC naming conventions and may need to share the folder before you can specify the path.

  2. After the Content Source has been created, start a Full Crawl on it.


 

Creating Crawl Rules

Crawl rules specify how crawlers retrieve content to be indexed from content repositories. For example, a crawl rule might specify that specific file types are to be excluded from a crawl, or might specify that a specific user account is to be used to crawl a given range of URLs.

Crawl schedules specify the frequency and dates/times for crawling content repositories. Administrators create crawl schedules so that they do not have to start all crawl processes manually.

A crawler impact rule governs the load that the crawler places on source systems when it crawls the content in those source systems. For example, one crawler impact rule might specify that a content repository that is not used heavily by information workers should be crawled by requesting 64 documents simultaneously, whereas another crawler impact rule might specify less aggressive crawl characteristics for systems that are constantly in use by information workers.

  1. On the Quick Launch of the FAST Content Service Application, in the Crawling section, click Crawl Rules.
  2. Click New Crawl Rule.
  3. Specify the path file://<<machinename>>/<<sharename>> of the content source you created earlier. Include all items in this path. Since the default content access account may not have adequate permissions to access the file share, use the Specify a different content access account option in the Specify Authentication section to specify credentials that have read access to the content source. Close the page.

     

  1. Start a crawl of the content source to make sure there are no errors in the crawl rule.
  2. Navigate to the Search Center website and enter a search query to make sure that content from the file system is appearing in the search results.
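
The content source is defined with a UNC path, while the crawl rule path uses the file:// form of the same location. If you have a lot of shares to set up, a tiny helper like the Python sketch below can do the conversion. The function name is my own, and it deliberately does nothing clever with special characters in share names.

```python
def unc_to_file_url(unc_path):
    """Convert \\\\machine\\share[\\folder] to file://machine/share[/folder]."""
    if not unc_path.startswith("\\\\"):
        raise ValueError("not a UNC path: " + unc_path)
    parts = [p for p in unc_path[2:].split("\\") if p]
    if len(parts) < 2:
        raise ValueError("a UNC path needs a machine name and a share name")
    return "file://" + "/".join(parts)


# win2k8 and Documents are the example names used elsewhere in this series.
print(unc_to_file_url(r"\\win2k8\Documents"))
```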

FAST Search for SharePoint 2010 – Creating a custom property extractor


  1. Open Visual Studio and create a Custom Property Extraction Dictionary. A property extraction dictionary is an xml file and defines which words will be searched for in the indexed items and indexed in the associated managed property. Each entry has a key and an optional value. The key is the string that must be present in the item. Depending on the type of property extractor (whole words or part words), the matching of the key can be case-sensitive or case-insensitive. Note: A key should not contain any apostrophes. If it does, the term will never be matched.

  1. Save the file as technologies.xml. Ensure there are no spaces or new lines after the closing dictionary tag, or the dictionary will generate an error. The custom dictionary must be saved in UTF-8 format without a byte order mark (BOM). In Visual Studio, click on File, Advanced Save Options and select Unicode (UTF-8 without signature) option for Encoding.
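
If you would rather generate the dictionary than type it by hand, the Python sketch below produces a file that respects the rules mentioned above: no apostrophes in keys, UTF-8 without a BOM, and nothing after the closing dictionary tag. Treat it as a sketch under assumptions: the dictionary/entry layout with key and value attributes is my reading of the format, the sample terms are invented, and ET.indent needs Python 3.9 or later.

```python
import xml.etree.ElementTree as ET


def build_dictionary(entries):
    """Return the dictionary file content as UTF-8 bytes (no BOM, no trailing newline)."""
    root = ET.Element("dictionary")
    for key, value in entries.items():
        if "'" in key:
            raise ValueError("key must not contain an apostrophe: %r" % key)
        attrs = {"key": key}
        if value is not None:  # the value is optional
            attrs["value"] = value
        ET.SubElement(root, "entry", attrs)
    ET.indent(root)  # pretty-print; leaves nothing after the root's closing tag
    body = ET.tostring(root, encoding="unicode")
    return ('<?xml version="1.0" encoding="utf-8"?>\n' + body).encode("utf-8")


def save_dictionary(path, entries):
    # Binary mode so that no BOM or newline translation sneaks in.
    with open(path, "wb") as handle:
        handle.write(build_dictionary(entries))


# Hypothetical sample terms.
sample = {"sharepoint": "SharePoint", "crm": "CRM", "biztalk": None}
print(build_dictionary(sample).decode("utf-8"))
```

Call save_dictionary("technologies.xml", sample) to write the file to disk.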

  1. Rename technologies.xml to wholewords_extraction1.xml.
  2. Navigate to the FASTSearch\components\resourcestore\dictionaries\matching folder and replace the original wholewords_extraction1.xml with the file you just created.
  3. Use either the FASTQuery service application in Central Administration or PowerShell to create a new managed property called technology and link it to a crawled property called wholewords1. If you want to use PowerShell, you can use the following script –

# Built-in crawled property populated by the wholewords1 extractor
$c = Get-FASTSearchMetadataCrawledProperty -Name wholewords1

# Type 1 = Text
$m = New-FASTSearchMetadataManagedProperty -Name technology -Type 1

# Allow the property to be used as a refiner
$m.RefinementEnabled=1

Set-FASTSearchMetadataManagedProperty -ManagedProperty $m

New-FASTSearchMetadataCrawledPropertyMapping -ManagedProperty $m -CrawledProperty $c

 

  1. Activate the wholewords1 property extractor by editing the FASTSearch\etc\config_data\DocumentProcessor\optionalprocessing.xml file. Scroll down in the file to the wholewordsextractor1 processor and set its active attribute to yes.

  1. Restart the document processor by executing the following command –

psctrl reset

  1. Go to the FAST Search Connector Service application and start a Full crawl on the relevant content sources. While the full crawl is in progress, you can proceed to the next stage.
  2. Go to the Search Center and open the search results page. Configure the search results page to enable refinement on your new managed property. Switch the page to Edit mode by clicking Site Actions, Edit Page.
  3. Locate the Refinement Panel Web Part and edit its properties.

  1. In the EditorZone that opens on the right-hand side of the page, expand the Refinement category and click the ellipsis button next to Filter Category Definition.

  1. Copy the entire XML contents into an XML editor such as Visual Studio 2010 or SharePoint Designer.
  2. Add the following xml to the file –

     

    <Category Title="Technologies" Description="" Type="Microsoft.Office.Server.Search.WebControls.ManagedPropertyFilterGenerator" MetadataThreshold="3" NumberOfFiltersToDisplay="10" MaxNumberOfFilters="20" ShowMoreLink="True" MappedProperty="technology" MoreLinkText="show more" LessLinkText="show fewer" ShowCounts="Count" />

     

     

  3. Copy the modified xml contents and paste them back into Filter Category Definition property. Click OK to close the EditorZone. Save and Close the page.

Note: Ensure that the Use Default Configuration checkbox is not enabled. If you leave it enabled, the modifications made to the Filter category definition will be lost when you click OK.
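
If you have to make this change on several results pages, editing the XML by hand gets tedious. The Python sketch below adds the Technologies category to a Filter Category Definition string and leaves the XML untouched if a category for that managed property is already there. The SAMPLE string is a trimmed, hand-written stand-in for what you copy out of the web part. The real definition contains many more categories and attributes.

```python
import xml.etree.ElementTree as ET

# Trimmed stand-in for the XML copied from the Filter Category Definition property.
SAMPLE = (
    '<FilterCategories>'
    '<Category Title="Result Type" MappedProperty="FileExtension" />'
    '</FilterCategories>'
)


def add_refiner(filter_xml, title, mapped_property):
    """Append a managed-property refiner category unless one already exists."""
    root = ET.fromstring(filter_xml)
    for category in root.findall("Category"):
        if category.get("MappedProperty") == mapped_property:
            return filter_xml  # nothing to do
    ET.SubElement(root, "Category", {
        "Title": title,
        "Description": "",
        "Type": "Microsoft.Office.Server.Search.WebControls.ManagedPropertyFilterGenerator",
        "MetadataThreshold": "3",
        "NumberOfFiltersToDisplay": "10",
        "MaxNumberOfFilters": "20",
        "ShowMoreLink": "True",
        "MappedProperty": mapped_property,
        "MoreLinkText": "show more",
        "LessLinkText": "show fewer",
        "ShowCounts": "Count",
    })
    return ET.tostring(root, encoding="unicode")


updated = add_refiner(SAMPLE, "Technologies", "technology")
print(updated)
```

Parsing the XML first also catches curly quotes or other copy-and-paste damage before you paste the result back into the web part.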

  1. If the crawl has completed, proceed to test the custom property extraction. Search for a keyword like deployment and you should see the presence of a new refiner called Technologies in the refinement panel.

 

In this exercise, we made use of a built-in crawled property called wholewords1 as well as a built-in property extractor. In the next exercise, we will create a custom crawled property as well as a custom property extractor instead of using the built-in ones.

 

  1. Upload the custom property extraction dictionary to the FAST Search Server 2010 for SharePoint resource store, either by using the Windows PowerShell command Add-FASTSearchResource or by copying it to the FASTSearch\components\resourcestore\dictionaries\matching folder.

    Add-FASTSearchResource -FilePath c:\temp\technologies.xml -Path dictionaries\matching\technologies.xml

     

  2. Next, configure the custom property extraction item processing stage through an XML configuration file named CustomPropertyExtractors.xml. The file should be present in the FASTSearch\etc\config_data\DocumentProcessor folder. If this is the first time you are creating a custom property extractor, you will need to create a new file; otherwise, add to the existing file.
  3. Create a new file or open the already existing file CustomPropertyExtractors.xml.
  4. Add the following xml to the file. Note that the property value should be the name of the crawled property (which will be created subsequently) and dictionary name should be the name of the xml file you copied to the resourcestore folder in the earlier step.

<?xml version="1.0" encoding="utf-8"?>

<extractors>

    <extractor name="Technology terms" type="Verbatim" property="mytechnologyterms">

        <dictionary name="technologies" yield-values="yes"/>

    </extractor>

</extractors>

 

  1. Delete the previously created managed property – technology from either Central Administration UI or PowerShell. To delete the property via PowerShell, use the following commands –

$m = Get-FASTSearchMetadataManagedProperty -Name technology

Remove-FASTSearchMetadataManagedProperty -ManagedProperty $m

 

  1. On the administration server, type the following command:

    psctrl reset

    This resets all currently running item processors in the system, and activates the new item processing configuration.

     

  2. To use the extracted data in queries or query refinement, you must create the crawled property and map it to a managed property within the index schema. All extracted crawled properties must be in the crawled property category named MESG Linguistics with property set value 48385c54-cdfc-4e84-8117-c95b3cf8911c and Variant Type 31.

$cp = New-FASTSearchMetadataCrawledProperty -Name mytechnologyterms -Propset 48385c54-cdfc-4e84-8117-c95b3cf8911c -VariantType 31

$mp = New-FASTSearchMetadataManagedProperty -Name technology -Type 1

$mp.StemmingEnabled=0

$mp.RefinementEnabled=1

$mp.MergeCrawledProperties=1

$mp.Update()

New-FASTSearchMetadataCrawledPropertyMapping -ManagedProperty $mp -CrawledProperty $cp

 

  1. Start a full crawl of all the relevant content sources from the FAST Search Connector Service Application.
  2. If you have already completed the previous exercise, then as long as the name of the managed property created in both the exercises is the same (technology), you don’t need to configure the Refinement Panel webpart. Wait for the crawls to complete.
  3. Test the search results and you should see the presence of a refiner called Technologies in the refinement panel.

     

Restricting Search Results to contain only List Items / Documents


A very common request that comes up every now and then is to omit views, sites, lists/libraries etc. from Search results and only show list items/documents.

For example, in my test environment, if I search for a term “SharePoint“, I get 23 search results. Some of the items in the search results are site names or view names as shown below –

While not questioning the value or validity of these items in the search results, sometimes it may be necessary to make sure only documents or list items are included. So if I re-submit my search for SharePoint, but this time by suffixing a property called isDocument, I get only 14 items returned in the results. All those 14 items are documents. I have a list item in the Task and Announcements list which contains the word SharePoint in its Title, but those have been omitted as well.

If I alter my query to include a property called contentclass having a value starting with STS_ListItem, I get back 16 items. Two additional list items (one from the Announcements and the other from the Tasks list) are now present in the search results.

So achieving this was pretty simple, but how do you know which properties to include in your search and what values to assign to them? This is where you need to inspect the raw XML returned by the Query Service. The raw XML gives us a wealth of information related to the search results. From the XML I can see the list of managed properties returned, and I can then use those properties to limit the information returned from search.

In case you are interested in knowing how to get search results information in raw xml format, refer to my blog post – Viewing Search Results in raw XML Format.

If you are not comfortable telling your end-users to always include a property in their search queries, you could add the property name:value as an additional query term.

  1. To accomplish this, edit the Search Box webpart.
  2. Under the Query Text Box section, specify the property name:value in the Additional query terms property. Uncheck the Append Additional terms to the query checkbox to hide this additional query term from the end-user by not having it echoed in the Search Box when the page is refreshed.

  1. Save and Close the page. Now when I search for SharePoint, even without specifying any other search limiter, only documents matching the search term are displayed in the results.
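
Conceptually, the Additional query terms setting just tacks the restriction onto whatever the user typed before the query is executed. The Python sketch below shows that idea and nothing more. The two restriction strings are my assumptions about the exact syntax (isDocument:1 for documents only, and a trailing * to match contentclass values starting with STS_ListItem), so check them against your own raw XML before relying on them.

```python
# Hypothetical restriction terms; verify the property values in your environment.
DOCUMENTS_ONLY = "isDocument:1"
LIST_ITEMS_AND_DOCUMENTS = "contentclass:STS_ListItem*"


def effective_query(user_query, additional_terms=""):
    """Combine the user's text with the hidden additional query terms."""
    parts = [user_query.strip(), additional_terms.strip()]
    return " ".join(part for part in parts if part)


print(effective_query("SharePoint", DOCUMENTS_ONLY))
print(effective_query("SharePoint", LIST_ITEMS_AND_DOCUMENTS))
```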

     

Viewing Search Results in raw XML Format


When you use FAST Search for SharePoint 2010 or the Enterprise Search Service in SharePoint, search results are rendered by default through the Core Results Web Part. This Web Part gets an XML document as input from the Search Service and uses an XSLT style sheet to customize the appearance of the results before displaying them. Many requirements related to changing the way search results are rendered can be met by modifying the default XSLT used by the Web Part.

For example, in my test environment I have the same image uploaded in a Picture Library as well as a Document Library. When I search for a term which causes both of these images to be returned in the search results, you will notice that the item returned from the Picture Library is displayed using a Thumbnail, whereas the item returned from the Document Library is rendered differently.

To explore why these two items are rendered differently in the search results, we can edit the properties of the Search Core Results Web Part and inspect the XSLT.

Under the Core Results category in the Editor Pane, expand the Display Properties section and click on XSL Editor. You will need to clear the Use Location Visualization checkbox.

The XSLT is almost 720 lines long and it would be better to use some XML Editor to understand it. Somewhere in the XSLT is a condition which checks if the current item is coming from a Picture Library and if there is a thumbnail present. If yes, then it proceeds to render the thumbnail or a generic image.

Not only is this XSLT useful for altering the way in which search results are rendered, it is also useful for debugging and understanding better how search results are actually returned.

For debugging purposes, it is better to view the raw XML returned by the Query Service. In my development environment, I always recommend that developers look at the raw XML when working on search-related tasks.

  1. Assuming you have a site created using the Enterprise Search Center or FAST Search Center template, go to the search center site and create a new publishing page.

  1. Give XML Results as the Title and XMLResults.aspx as the URL Name. Select Search Results as the Page Layout. Click Create.

  1. Save and Close the newly created page.
  2. Edit the default.aspx page in the Search Center. Click Add New Tab.

  1. Specify Tab Name as XML and XMLResults.aspx (or whatever page you created in the previous steps) as the Page. Click Save.
  2. Click on the newly created XML tab.

 

  1. You should be redirected to the XMLResults.aspx page. Edit this page and repeat the previous step to add the same XML Tab on this page as well.
  2. To ensure that when we type a search query in the XML tab, the search results are displayed in the XML tab and not in the All Sites tab, edit the Search Webpart.

  1. Expand the Miscellaneous category and type XMLResults.aspx in the Target Search results page URL textbox. Click OK.

  1. While the page is still in Edit mode, edit the properties of the Search Core Results web part as well.

  1. Expand Display Properties and click on XSL Editor. You may need to disable the Use Location Visualization checkbox to be able to click on the XSL Editor Button.
    Delete the original XSLT and replace it with the following and then click OK.

<?xml version="1.0" encoding="UTF-8"?>

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>

<xsl:template match="/">

<xmp><xsl:copy-of select="*"/></xmp>

</xsl:template>

</xsl:stylesheet>

 

  1. Save and Close the page.
  2. Type a search query in the Search box of the XML Tab. This time you should see search results in their raw XML format.
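
Once you have the raw XML, you can paste it into a file and explore it with a few lines of script instead of scrolling through it. The Python sketch below lists which properties come back for each result and pulls out the values of one of them. The RAW string is a hand-written stand-in that I am assuming has the same general shape as the real output (an All_Results root with one Result element per hit and one child element per property), and the URLs in it are invented.

```python
import xml.etree.ElementTree as ET

# Hand-written stand-in for the XML shown on the XML tab.
RAW = """
<All_Results>
  <Result>
    <id>1</id>
    <title>SharePoint Deployment Guide</title>
    <url>http://intranet/docs/deployment.docx</url>
    <contentclass>STS_ListItem_DocumentLibrary</contentclass>
  </Result>
  <Result>
    <id>2</id>
    <title>SharePoint Team Site</title>
    <url>http://intranet/sites/team</url>
    <contentclass>STS_Site</contentclass>
  </Result>
</All_Results>
"""


def property_names(raw_xml):
    """Return the distinct property (element) names found under the results."""
    names = []
    for result in ET.fromstring(raw_xml).iter("Result"):
        for child in result:
            if child.tag not in names:
                names.append(child.tag)
    return names


def values_of(raw_xml, prop):
    """Return the value of one property for every result, in result order."""
    return [r.findtext(prop, default="") for r in ET.fromstring(raw_xml).iter("Result")]


print(property_names(RAW))
print(values_of(RAW, "contentclass"))
```

Looking at the contentclass values this way is how you can decide which property restrictions are worth adding to your queries.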
