Here is an overview of how I recently implemented search for a web site built using Sitecore. I did not have the option to design the site templates from scratch. Instead, I inherited a messy template inheritance structure with some inconsistencies in the design. I also had no budget, so Coveo was not an option.
Index Creation
I started by creating a custom index to use for search. I didn’t like the idea of tacking a large number of computed fields onto the default Sitecore indexes.
The search index needed to include items from the entire content tree as well as the media library. I added two crawler location definitions.
<locations hint="list:AddCrawler">
  <crawler type="Sitecore.ContentSearch.ExcludeItemCrawler, XL.Website">
    <Database>master</Database>
    <Root>/sitecore/content</Root>
  </crawler>
</locations>
<locations hint="list:AddCrawler">
  <crawler type="Sitecore.ContentSearch.ExcludeItemCrawler, XL.Website">
    <Database>master</Database>
    <Root>/sitecore/media library</Root>
  </crawler>
</locations>
In the index configuration, I defined all of the templates that I needed to include in the index.
<include hint="list:IncludeTemplate">
  <Product>{272C2195-AFE6-47CC-9707-BC8FB1909BE4}</Product>
  <Article>{ABAEACC9-AC67-4A0D-B0DE-54982D2D3246}</Article>
  ...
</include>
I also defined the common fields that I would need in the index, both to search on and to display in the UI.
<fieldMap type="Sitecore.ContentSearch.FieldMap, Sitecore.ContentSearch">
  <fieldNames hint="raw:AddFieldByFieldName">
    <field fieldName="MetadataTitle" storageType="YES" indexType="UNTOKENIZED" vectorType="NO" boost="1f" type="System.String" settingType="Sitecore.ContentSearch.LuceneProvider.LuceneSearchFieldConfiguration, Sitecore.ContentSearch.LuceneProvider">
      <analyzer type="Sitecore.ContentSearch.LuceneProvider.Analyzers.LowerCaseKeywordAnalyzer, Sitecore.ContentSearch.LuceneProvider" />
    </field>
    <field fieldName="MetadataKeywords" storageType="YES" indexType="UNTOKENIZED" vectorType="NO" boost="1f" type="System.String" settingType="Sitecore.ContentSearch.LuceneProvider.LuceneSearchFieldConfiguration, Sitecore.ContentSearch.LuceneProvider">
      <analyzer type="Sitecore.ContentSearch.LuceneProvider.Analyzers.LowerCaseKeywordAnalyzer, Sitecore.ContentSearch.LuceneProvider" />
    </field>
  </fieldNames>
</fieldMap>
Computed Fields
So, what about the data that is not stored consistently across the site? You can aggregate the needed data using a computed field.
using System;
using Sitecore;
using Sitecore.ContentSearch;
using Sitecore.ContentSearch.ComputedFields;
using Sitecore.Data;
using Sitecore.Data.Items;
using Sitecore.Data.Managers;
using Sitecore.Diagnostics;

public class MetadataDescriptionField : IComputedIndexField
{
    public string FieldName { get; set; }
    public string ReturnType { get; set; }

    public object ComputeFieldValue(IIndexable indexable)
    {
        Assert.ArgumentNotNull(indexable, "indexable");

        var indexableItem = indexable as SitecoreIndexableItem;
        if (indexableItem == null)
        {
            Log.Warn(string.Format("{0} : unsupported IIndexable type : {1}", this, indexable.GetType()), this);
            return null;
        }

        string sDescription = String.Empty;

        // Pick the source field based on the template the item derives from.
        if (indexableItem.Item.IsDerived(new Sitecore.Data.ID("some-template-guid")))
        {
            Sitecore.Data.Fields.Field stringField = indexableItem.Item.Fields["Somefieldname"];
            if (stringField != null)
            {
                sDescription = stringField.Value;
            }
        }
        else
        {
            // Handle other templates ...
        }

        return sDescription;
    }
}

public static class ItemExtensions
{
    // Relies on an IsDerived extension for Template that is not shown in this post.
    public static bool IsDerived([NotNull] this Item item, [NotNull] ID templateId)
    {
        return TemplateManager.GetTemplate(item).IsDerived(templateId);
    }
}
The computed fields get added to the custom Lucene index configuration.
<fields hint="raw:AddComputedIndexField">
  <field fieldName="Description" storageType="YES" indexType="UNTOKENIZED">Someproject.Indexes.Computed.MetadataDescriptionField, Somenamespace</field>
  ...
</fields>
I added a variety of computed fields. Some of them use the LinkManager to store URLs to pages, or the MediaManager to store URLs for images, because the presentation layer needs them. In addition, Multilist fields needed to be converted into a usable format so that they could be used for faceting.
Computed Fields for Facets
If you are using facet values in your presentation layer, rather than GUIDs, you will need to convert your Multilist fields into tokenized facet value data in the Lucene search index.
using System.Collections.Generic;
using Sitecore.ContentSearch;
using Sitecore.ContentSearch.ComputedFields;
using Sitecore.Data.Fields;
using Sitecore.Data.Items;
using Sitecore.Diagnostics;

public class CategoryField : IComputedIndexField
{
    public string FieldName { get; set; }
    public string ReturnType { get; set; }

    public object ComputeFieldValue(IIndexable indexable)
    {
        Assert.ArgumentNotNull(indexable, "indexable");

        var indexableItem = indexable as SitecoreIndexableItem;
        if (indexableItem == null)
        {
            Log.Warn(string.Format("{0} : unsupported IIndexable type : {1}", this, indexable.GetType()), this);
            return null;
        }

        List<string> sReturn = new List<string>();
        if (indexableItem.Item != null)
        {
            if (indexableItem.Item.IsDerived(new Sitecore.Data.ID("some-template-guid")))
            {
                Sitecore.Data.Fields.MultilistField multilistField = indexableItem.Item.Fields["somefieldname"];
                if (multilistField != null)
                {
                    // Replace the selected item IDs with a readable field value from each target item.
                    sReturn = HelperClass.GetListValues(multilistField, "someotherfieldname");
                }
            }
            else
            {
                ...
            }
        }

        return sReturn;
    }
}

public class HelperClass
{
    public static List<string> GetListValues(MultilistField multiListField, string fieldName)
    {
        List<string> results = new List<string>();
        if (multiListField == null)
        {
            return results;
        }

        foreach (Sitecore.Data.ID sitecoreID in multiListField.TargetIDs)
        {
            // SitecoreHelper is a project-specific utility class that is not shown in this post.
            Item sitecoreItem = SitecoreHelper.GetItem(sitecoreID.ToString());
            string result = SitecoreHelper.GetFieldValue(sitecoreItem, fieldName);
            if (!string.IsNullOrWhiteSpace(result))
            {
                results.Add(result);
            }
        }

        return results;
    }
}
<fields hint="raw:AddComputedIndexField">
  <field fieldName="Description" storageType="YES" indexType="UNTOKENIZED">Someproject.Indexes.Computed.MetadataDescriptionField, Somenamespace</field>
  <field fieldName="ComputedCategory" storageType="YES" indexType="TOKENIZED">Someproject.Indexes.Computed.CategoryField, Somenamespace</field>
  ...
</fields>
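To make the conversion concrete: a Multilist field stores its selection as a pipe-delimited list of item IDs, and the computed field swaps each ID for a readable value. Here is a minimal sketch of that idea in plain C#, with no Sitecore dependency. The class name, the method name, and the dictionary standing in for the content database are all hypothetical, for illustration only.

```csharp
using System;
using System.Collections.Generic;

public static class MultilistSketch
{
    // Resolves a pipe-delimited list of item IDs to display values.
    // 'lookup' is a hypothetical stand-in for reading a field from each target item.
    public static List<string> ResolveValues(string rawValue, IDictionary<string, string> lookup)
    {
        var results = new List<string>();
        if (string.IsNullOrWhiteSpace(rawValue))
        {
            return results;
        }

        foreach (string id in rawValue.Split(new[] { '|' }, StringSplitOptions.RemoveEmptyEntries))
        {
            // Skip IDs that do not resolve and values that are blank.
            string value;
            if (lookup.TryGetValue(id, out value) && !string.IsNullOrWhiteSpace(value))
            {
                results.Add(value);
            }
        }

        return results;
    }
}
```

Given a lookup that maps two IDs to "Annual Reports" and "News", a raw value holding both IDs resolves to those two strings, and any ID missing from the lookup is silently dropped.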
Tokenized versus Untokenized
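An important setting in the index field configuration is indexType. An untokenized field is indexed as a single term, so the whole value is kept together as one string in the Lucene index. A tokenized field is broken up by the analyzer into separate terms, typically one per word.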
Facet Values Containing Spaces
One major gotcha that I encountered was facet values that contain spaces. Because the facet fields are tokenized, a facet value containing spaces is split into separate terms, which wreaked havoc with the facet result sets. I found an excellent blog post by Ryan Bailey, referencing a solution provided by Martina Welander.
Adding the computed facet fields to the fieldMap section, with the LowerCaseKeywordAnalyzer, solved the problem.
<fieldMap type="Sitecore.ContentSearch.FieldMap, Sitecore.ContentSearch">
  <fieldNames hint="raw:AddFieldByFieldName">
    ...
    <field fieldName="ComputedCategory" storageType="YES" indexType="TOKENIZED" vectorType="NO" boost="1f" type="System.String" settingType="Sitecore.ContentSearch.LuceneProvider.LuceneSearchFieldConfiguration, Sitecore.ContentSearch.LuceneProvider">
      <analyzer type="Sitecore.ContentSearch.LuceneProvider.Analyzers.LowerCaseKeywordAnalyzer, Sitecore.ContentSearch.LuceneProvider" />
    </field>
    ...
  </fieldNames>
</fieldMap>
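To see why the analyzer matters for facets, here is a rough simulation in plain C#. It does not use Lucene at all. The two "analyzers" are simplified stand-ins that I wrote for illustration (a real tokenizing analyzer does more than lowercase and split on whitespace), and every name in the sketch is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FacetSketch
{
    // Counts how many values produce each term under the given analyzer.
    public static Dictionary<string, int> CountFacets(
        IEnumerable<string> values, Func<string, IEnumerable<string>> analyze)
    {
        return values
            .SelectMany(v => analyze(v).Distinct())
            .GroupBy(term => term)
            .ToDictionary(g => g.Key, g => g.Count());
    }

    // Simplified tokenizing analyzer: lowercase, then split on whitespace.
    public static IEnumerable<string> Tokenizing(string value)
    {
        return value.ToLowerInvariant()
            .Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
    }

    // Simplified keyword analyzer: lowercase, keep the whole value as one term.
    public static IEnumerable<string> LowerCaseKeyword(string value)
    {
        return new[] { value.ToLowerInvariant() };
    }
}
```

For the values "Annual Reports", "Press Releases" and "Annual Reports", the tokenizing stand-in yields four facet terms ("annual", "reports", "press", "releases"), while the keyword stand-in yields the two you would expect ("annual reports" with a count of 2 and "press releases" with a count of 1). Note that the terms come back lowercased either way.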
Dynamically Excluding Content
One of the requirements for this site was the ability to explicitly hide content from the site search based on an item field value. The direct solution would be to filter the results at query time. Instead, I chose to move this requirement into a custom crawler: the search and faceting logic was already complex, and I did not want to complicate it further.
public class ExcludeItemCrawler : SitecoreItemCrawler
{
    protected override bool IsExcludedFromIndex(SitecoreIndexableItem indexable, bool checkLocation = false)
    {
        // Respect the standard exclusion rules first.
        bool isExcluded = base.IsExcludedFromIndex(indexable, checkLocation);
        if (isExcluded)
            return true;

        Item item = (Item)indexable;

        // If it's a wildcard
        if (item.Name == "*")
            return true;

        // Several complex checks
        if (somecondition)
            return true;
        ...
        return false;
    }
}
<locations hint="list:AddCrawler">
  <crawler type="Somenamespace.ContentSearch.ExcludeItemCrawler, Somenamespace">
    <Database>master</Database>
    <Root>/sitecore/content</Root>
  </crawler>
</locations>
<locations hint="list:AddCrawler">
  <crawler type="Somenamespace.ContentSearch.ExcludeItemCrawler, Somenamespace">
    <Database>master</Database>
    <Root>/sitecore/media library</Root>
  </crawler>
</locations>
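The actual exclusion checks are elided in the crawler above. Purely as an illustration of what a field-driven rule could look like, here is a small sketch in plain C# that uses a dictionary in place of a Sitecore item. The class name and the "Exclude From Search" field name are hypothetical; the one Sitecore detail it leans on is that a checked checkbox field has the raw value "1".

```csharp
using System.Collections.Generic;

public static class ExclusionSketch
{
    // Hypothetical name of a checkbox field that editors tick to hide an item.
    public const string ExcludeFieldName = "Exclude From Search";

    // Returns true when the item should be left out of the index.
    public static bool IsExcluded(string itemName, IDictionary<string, string> fields)
    {
        // Wildcard items are never indexed.
        if (itemName == "*")
        {
            return true;
        }

        // A checked checkbox field is stored as "1"; anything else means "include".
        string flag;
        return fields.TryGetValue(ExcludeFieldName, out flag) && flag == "1";
    }
}
```

Keeping the decision in one predicate like this is what makes the crawler approach attractive to me: excluded items never reach the index, so the query and faceting code does not need to know the rule exists.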
OK, so now that we have the data that we want in the Lucene index, I’ll talk about how to query it in Part II.