FAST Search for SharePoint 2010 – Creating a custom property extractor


  1. Open Visual Studio and create a custom property extraction dictionary. A property extraction dictionary is an XML file that defines which words are searched for in the indexed items and stored in the associated managed property. Each entry has a key and an optional value. The key is the string that must be present in the item. Depending on the type of property extractor (whole words or word parts), key matching is either case-sensitive or case-insensitive. Note: A key must not contain apostrophes. If it does, the term will never be matched.

  1. Save the file as technologies.xml. Ensure there are no spaces or new lines after the closing dictionary tag; otherwise the dictionary will generate an error. The custom dictionary must be saved in UTF-8 format without a byte order mark (BOM). In Visual Studio, click File, Advanced Save Options, and select Unicode (UTF-8 without signature) for Encoding.
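The dictionary rules above (keys without apostrophes, UTF-8 without a BOM, nothing after the closing tag) are easy to get wrong by hand, so here is a minimal Python sketch that generates the file instead. It assumes the dictionary layout is a `<dictionary>` root with one `<entry key="..." value="..."/>` element per term. That layout, the sample terms, and the helper names `build_dictionary_xml` and `write_dictionary` are assumptions for illustration, so compare the output with a dictionary that already works on your farm before relying on it.

```python
from xml.sax.saxutils import quoteattr


def build_dictionary_xml(entries):
    """Build dictionary XML from a mapping of key -> optional value."""
    lines = ['<?xml version="1.0" encoding="utf-8"?>', "<dictionary>"]
    for key, value in entries.items():
        # Keys with apostrophes (straight or curly) would never be matched.
        if "'" in key or "\u2019" in key:
            raise ValueError(f"apostrophe in key: {key!r}")
        attributes = f"key={quoteattr(key)}"
        if value is not None:
            attributes += f" value={quoteattr(value)}"
        lines.append(f"  <entry {attributes}/>")
    lines.append("</dictionary>")
    # Joining without a final newline leaves nothing after the closing tag.
    return "\n".join(lines)


def write_dictionary(path, entries):
    # "utf-8" (unlike "utf-8-sig") does not emit a byte order mark.
    with open(path, "w", encoding="utf-8", newline="\n") as handle:
        handle.write(build_dictionary_xml(entries))


if __name__ == "__main__":
    # Hypothetical sample terms; replace them with your own.
    print(build_dictionary_xml({"sharepoint": "SharePoint", "powershell": None}))
```

Calling `write_dictionary("technologies.xml", {...})` with your own terms should produce a file that satisfies the encoding and trailing-whitespace requirements described in this step.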

  1. Rename technologies.xml to wholewords_extraction1.xml.
  2. Navigate to the FASTSearch\components\resourcestore\dictionaries\matching folder and replace the original wholewords_extraction1.xml with the file you just created.
  3. Use either the FAST Query service application in Central Administration or PowerShell to create a new managed property called technology and map it to the crawled property called wholewords1. If you want to use PowerShell, you can use the following script:

$c = Get-FASTSearchMetadataCrawledProperty -Name wholewords1

$m = New-FASTSearchMetadataManagedProperty -Name technology -Type 1

$m.RefinementEnabled=1

Set-FASTSearchMetadataManagedProperty -ManagedProperty $m

New-FASTSearchMetadataCrawledPropertyMapping -ManagedProperty $m -CrawledProperty $c

 

  1. Activate the wholewords1 property extractor by editing the FASTSearch\etc\config_data\DocumentProcessor\optionalprocessing.xml file. Scroll down to the wholewordsextractor1 processor entry and activate it.
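If you would rather script this change than edit the file by hand, something like the following Python sketch could work. It assumes each extractor is listed in optionalprocessing.xml as a `<processor name="..." active="yes|no"/>` element; that layout is an assumption, so check your own copy of the file first. The function name `activate_processor` is hypothetical, and Python 3.8 or later is needed to keep comments that sit inside the root element.

```python
import xml.etree.ElementTree as ET


def activate_processor(xml_text, processor_name):
    """Return xml_text with the named processor's active attribute set to yes."""
    # insert_comments=True keeps comments inside the root element (Python 3.8+).
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
    root = ET.fromstring(xml_text, parser=parser)
    matches = [p for p in root.iter("processor") if p.get("name") == processor_name]
    if not matches:
        raise LookupError(f"no processor named {processor_name!r}")
    for processor in matches:
        # Assumed attribute name and value; verify against your file.
        processor.set("active", "yes")
    return ET.tostring(root, encoding="unicode")
```

The returned text has no XML declaration and drops anything outside the root element, so keep a backup of the original file and compare the two before replacing it.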

  1. Restart the document processors by executing the following command:

psctrl reset

  1. Go to the FAST Search Connector service application and start a full crawl on the relevant content sources. While the full crawl is in progress, you can proceed to the next stage.
  2. Go to the search center and open the search results page, which must be configured to enable refinement on your new managed property. Switch the page to edit mode by clicking Site Actions, Edit Page.
  3. Locate the Refinement Panel Web Part and edit its properties.

  1. In the EditorZone that opens on the right-hand side of the page, expand the Refinement category and click the ellipsis (...) button next to Filter Category Definition.

  1. Copy the entire XML contents into an XML editor such as Visual Studio 2010 or SharePoint Designer.
  2. Add the following XML to the file, alongside the existing Category elements:

     

    <Category Title="Technologies" Description="" Type="Microsoft.Office.Server.Search.WebControls.ManagedPropertyFilterGenerator" MetadataThreshold="3" NumberOfFiltersToDisplay="10" MaxNumberOfFilters="20" ShowMoreLink="True" MappedProperty="technology" MoreLinkText="show more" LessLinkText="show fewer" ShowCounts="Count" />

     

     

  3. Copy the modified XML and paste it back into the Filter Category Definition property. Click OK to close the EditorZone, then save and close the page.

Note: Ensure that the Use Default Configuration checkbox is cleared. If you leave it selected, the modifications made to the Filter Category Definition will be lost when you click OK.
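Quotes copied from a web page are often converted to curly quotes, which makes the XML invalid, so it can be worth checking the edited Filter Category Definition before pasting it back. The following Python sketch parses the XML and looks for a Category element mapped to a given managed property. The function name `find_refiner` is hypothetical, and the sketch only checks well-formedness and the MappedProperty attribute, not whether SharePoint will accept the definition.

```python
import xml.etree.ElementTree as ET


def find_refiner(definition_xml, mapped_property):
    """Return the attributes of the Category mapped to mapped_property, or None."""
    # ET.fromstring raises ET.ParseError if the XML is not well formed,
    # for example when attribute values are wrapped in curly quotes.
    root = ET.fromstring(definition_xml)
    for category in root.iter("Category"):
        if category.get("MappedProperty") == mapped_property:
            return dict(category.attrib)
    return None
```

For example, `find_refiner(edited_xml, "technology")` should return the attributes of the Technologies category once the element above has been added, and `None` if it is missing.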

  1. Once the crawl has completed, test the custom property extraction. Search for a keyword such as deployment; you should see a new refiner called Technologies in the refinement panel.

 

  1. In this exercise, we used a built-in crawled property called wholewords1 and a built-in property extractor. In the next exercise, we will create a custom crawled property and a custom property extractor instead of using the built-in ones.

 

  1. Upload the custom property extraction dictionary to the FAST Search Server 2010 for SharePoint resource store, either by using the Windows PowerShell cmdlet Add-FASTSearchResource or by copying it to the FASTSearch\components\resourcestore\dictionaries\matching folder.

    Add-FASTSearchResource -FilePath c:\temp\technologies.xml -Path dictionaries\matching\technologies.xml

     

  2. Next, configure the custom property extraction item processing stage through an XML configuration file named CustomPropertyExtractors.xml. The file belongs in the FASTSearch\etc\config_data\DocumentProcessor folder.
  3. If this is your first custom property extractor, create a new CustomPropertyExtractors.xml file; otherwise, open the existing one.
  4. Add the following XML to the file. Note that the property value should be the name of the crawled property (which will be created in a later step), and the dictionary name should be the name of the XML file you copied to the resourcestore folder in the earlier step (without the .xml extension, as in the example).

<?xml version="1.0" encoding="utf-8"?>

<extractors>

    <extractor name="Technology terms" type="Verbatim" property="mytechnologyterms">

        <dictionary name="technologies" yield-values="yes"/>

    </extractor>

</extractors>
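Since the extractor configuration has to line up with both the dictionary file and the crawled property, a quick consistency check can save a wasted full crawl. The Python sketch below reads a CustomPropertyExtractors.xml with the structure shown above, lists each extractor, and reports dictionaries that have no matching .xml file in a given folder. The function names `read_extractors` and `missing_dictionaries` are hypothetical, and the folder you pass in is assumed to be your local copy of the dictionaries\matching folder.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def read_extractors(config_xml):
    """Return one dict per <extractor> element in the configuration."""
    root = ET.fromstring(config_xml)
    return [
        {
            "name": extractor.get("name"),
            "type": extractor.get("type"),
            "property": extractor.get("property"),
            "dictionaries": [d.get("name") for d in extractor.iter("dictionary")],
        }
        for extractor in root.iter("extractor")
    ]


def missing_dictionaries(extractors, dictionary_dir):
    """Return dictionary names that have no <name>.xml file in dictionary_dir."""
    folder = Path(dictionary_dir)
    return [
        name
        for extractor in extractors
        for name in extractor["dictionaries"]
        if not (folder / f"{name}.xml").is_file()
    ]
```

The `property` value of each extractor is the crawled property name to create later, so it is also a handy list to compare against the PowerShell commands in the following steps.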

 

  1. Delete the previously created managed property, technology, using either the Central Administration UI or PowerShell. To delete the property via PowerShell, use the following commands:

$m = Get-FASTSearchMetadataManagedProperty -Name technology

Remove-FASTSearchMetadataManagedProperty -ManagedProperty $m

 

  1. On the administration server, type the following command:

    psctrl reset

    This resets all currently running item processors in the system, and activates the new item processing configuration.

     

  2. To use the extracted data in queries or query refinement, you must create the crawled property and map it to a managed property within the index schema. All extracted crawled properties must be in the crawled property category named MESG Linguistics with property set value 48385c54-cdfc-4e84-8117-c95b3cf8911c and Variant Type 31.

$cp = New-FASTSearchMetadataCrawledProperty -Name mytechnologyterms -Propset 48385c54-cdfc-4e84-8117-c95b3cf8911c -VariantType 31

$mp = New-FASTSearchMetadataManagedProperty -Name technology -Type 1

$mp.StemmingEnabled=0

$mp.RefinementEnabled=1

$mp.MergeCrawledProperties=1

$mp.Update()

New-FASTSearchMetadataCrawledPropertyMapping -ManagedProperty $mp -CrawledProperty $cp

 

  1. Start a full crawl of all the relevant content sources from the FAST Search Connector Service Application.
  2. If you have already completed the previous exercise and the managed property has the same name in both exercises (technology), you don't need to reconfigure the Refinement Panel Web Part. Wait for the crawls to complete.
  3. Test the search results; you should see a refiner called Technologies in the refinement panel.

     
