If you have a multi-language or multi-site installation in Umbraco and want a site search using Examine, you'll run into the issue that the indexes contain the results for ALL of the sites, not just the site the user is currently on.
I've been working on a multi-language site recently and ran into just this issue. Here's how I got round it and made a search that can be included on all of the sites, with no changes needing to be made.
First up, how can we limit the search? Handily, we can use the path property, which stores the page's position in the Umbraco content tree in a format something like -1,1060,1075,1230, where -1 denotes the content root and the remaining numbers are the ids of the nodes from the top of the tree down to, and including, the page you're looking at.
In the user control that runs the search, we can get the current node and, rather than walking back up the tree, simply split its path on the commas and take the second item in the array (index 1), which is the id of the site root node, like this:
//get the current page and read the site root id from its path
var currentPage = umbraco.NodeFactory.Node.GetCurrent();
string parentId = currentPage.Path.Split(',')[1];
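As a standalone sketch of that lookup (the SitePath class and GetSiteRootId method are hypothetical, not part of Umbraco), the parsing can be pulled into a small helper that also guards against a path too short to contain a site root:

```csharp
using System;

// Hypothetical helper: extracts the site root id from an Umbraco-style
// path string such as "-1,1060,1075,1230".
public static class SitePath
{
    public static string GetSiteRootId(string path)
    {
        if (string.IsNullOrEmpty(path))
            throw new ArgumentException("Path is empty.", nameof(path));

        // Index 0 is the content root (-1); index 1 is the top-level site node.
        string[] ids = path.Split(',');
        if (ids.Length < 2)
            throw new ArgumentException("Path has no site root: " + path, nameof(path));

        return ids[1];
    }
}
```

For a deep page such as "-1,1060,1075,1230" this returns "1060", and for the site's home page itself ("-1,1060") it also returns "1060", so the same code works wherever the search control is placed.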
Now that we know the root node of the current site, how can we use it in our search? Conveniently, you can just add the path field to your index settings file. The catch is that the path is stored in the index in its comma-separated form, which is no good for searching: Examine treats it as one big string, so searching the raw path for the root node id returns no results. If you replace the commas with spaces before the path is indexed, each id in the path is treated like a separate word, so you can search for your root node id and get back only the pages that have that node in their path.
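To make the difference concrete, here is a rough sketch that splits a field value on whitespace only. This is a simplified stand-in, not the analyzer Examine actually uses (that depends on your index configuration and has its own tokenization rules), so treat it as an illustration of the idea. The TokenDemo class is hypothetical:

```csharp
using System;
using System.Linq;

public static class TokenDemo
{
    // Simplified stand-in for an analyzer: split on spaces only.
    public static string[] Tokenize(string fieldValue) =>
        fieldValue.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

    // True if the search term equals one of the field's terms exactly.
    public static bool MatchesTerm(string fieldValue, string term) =>
        Tokenize(fieldValue).Contains(term);
}
```

The raw path "-1,1060,1075,1230" comes out as a single term, so a search for "1060" misses it, while "-1 1060 1075 1230" yields four terms and matches. Because matching is per whole term, a search for "106" does not match a page under node 1060, which a naive substring check on the raw path would get wrong.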
So how do we alter the index? Easy! You can plug into the Examine events to alter the index as it's being written. Basically, we want to hook into the GatheringNodeData event, get the path field, replace the commas with spaces, and then save the result as a new field in the Examine index. Here's an example of the code that we used in an ApplicationBase class to hook up the event handler and make the changes:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Web;
using umbraco.BusinessLogic;
using Examine;

namespace MySite.UmbracoExtensions.EventHandlers
{
    public class CmsEvents : ApplicationBase
    {
        public CmsEvents()
        {
            //Add event to allow searching by site section
            var indexerSite = ExamineManager.Instance.IndexProviderCollection["SiteSearchIndexer"];
            indexerSite.GatheringNodeData += new EventHandler<IndexingNodeDataEventArgs>(SetSiteSearchFields);
        }

        //modifies the index field for the path variable, so that it can be searched properly
        void SetSiteSearchFields(object sender, IndexingNodeDataEventArgs e)
        {
            //grab the current data from the Fields collection (skip nodes without a path)
            string path;
            if (!e.Fields.TryGetValue("path", out path)) return;

            //let's get rid of those commas!
            path = path.Replace(",", " ");

            //add as new field, as path seems to be a reserved word in Lucene
            e.Fields.Add("searchPath", path);
        }
    }
}
Obviously you'd need to change the "SiteSearchIndexer" part to the name of your indexer to get it to work! You'll also need to make sure that the path is included in your index (look at the default indexes in your Examine config files for an example of this).
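Because the handler only rewrites a dictionary of field values, the transformation itself can be pulled out and tested without Umbraco or Examine running. The sketch below assumes the fields arrive as a dictionary of strings, as e.Fields does in the Examine versions of this era (check yours); the SearchPathField class is hypothetical:

```csharp
using System.Collections.Generic;

// Hypothetical, Umbraco-free version of SetSiteSearchFields so the
// comma-to-space rewrite can be unit tested on its own.
public static class SearchPathField
{
    public const string SourceField = "path";
    public const string TargetField = "searchPath";

    public static void AddSearchPath(IDictionary<string, string> fields)
    {
        // Nodes without a path are left untouched.
        string path;
        if (!fields.TryGetValue(SourceField, out path))
            return;

        // Assign via the indexer rather than Add, so processing the same
        // dictionary twice overwrites instead of throwing a duplicate-key error.
        fields[TargetField] = path.Replace(",", " ");
    }
}
```

The original "path" field is left as it was, so anything else relying on it keeps working; only the extra "searchPath" field carries the space-separated copy.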
Now all we need to do is make our Examine search look for the root id in the "searchPath" field. Here's the finished code where we get the root node, and use it in an example Examine search:
//do search
var searcher = ExamineManager.Instance.SearchProviderCollection["SiteSearchSearcher"];
var criteria = searcher.CreateSearchCriteria(UmbracoExamine.IndexTypes.Content);
Examine.SearchCriteria.IBooleanOperation filter = null;

//search on main fields ("Search" holds the user's search term, e.g. from a TextBox)
filter = criteria.GroupedOr(new string[] { "pageHeading", "pageContent", "navigationText" }, Search);

//only show results in the current path
var currentPage = umbraco.NodeFactory.Node.GetCurrent();
string parentId = currentPage.Path.Split(',')[1];
filter = filter.And().Field("searchPath", parentId);

//don't show hidden pages
filter = filter.Not().Field("umbracoNaviHide", "1");

var resultsTemp = searcher.Search(filter.Compile());
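To see what the two filters in that query are meant to select, here is an in-memory approximation. It is a model of the intended logic, not how Lucene evaluates queries: a page is kept only if its searchPath contains the root id as a whole term and its umbracoNaviHide value is not "1". The Page record, SiteFilter class, and sample ids are made up for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for an indexed document.
public record Page(string Name, string SearchPath, string UmbracoNaviHide);

public static class SiteFilter
{
    public static List<Page> Apply(IEnumerable<Page> pages, string rootId) =>
        pages
            // AND searchPath:rootId (whole-term match, not substring)
            .Where(p => p.SearchPath
                .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
                .Contains(rootId))
            // NOT umbracoNaviHide:1
            .Where(p => p.UmbracoNaviHide != "1")
            .ToList();
}
```

Given visible and hidden pages from a site rooted at 1060 plus pages from another site rooted at 2010, Apply(pages, "1060") keeps only the visible pages under 1060.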
And now your search should only return results for pages in the current site, not pages from ALL of the sites! It's simple to set up, and a good example of how easily Umbraco can be extended through its event model!
You can also use this technique to search a specific area of the site, e.g. have a dropdown to filter the search by the News area or the Events area. You could also have a single index for multiple sites, allowing for a search that spans all the sites as well.
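One way to wire up such a dropdown is to let its value override the site root id, so the same searchPath clause filters either by site or by section. This is a sketch: the dropdown, its values, and the SearchScope class are hypothetical, and the section ids would be the node ids of your own News or Events pages:

```csharp
using System;

public static class SearchScope
{
    // Returns the node id to match against the searchPath field:
    // the selected section if one was chosen, otherwise the site root.
    public static string GetScopeId(string currentPath, string selectedSectionId)
    {
        if (!string.IsNullOrWhiteSpace(selectedSectionId))
            return selectedSectionId.Trim();

        // Fall back to the site root (second entry in the path).
        string[] ids = currentPath.Split(',');
        if (ids.Length < 2)
            throw new ArgumentException("Path has no site root: " + currentPath);
        return ids[1];
    }
}
```

The returned id would then be passed to filter.And().Field("searchPath", ...) in place of parentId. Since a section node's id only appears in the paths of pages below it, no other change to the query is needed.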
Nice, but... this means you are using the same index for all languages, i.e. the same Lucene analyzer for all languages, which is not always a good idea. Stemming, stopwords, etc. do not work the same in English as in German, French, or whatever...
You can also define multiple index sets in the config and set a start node id on each one, so that each set only indexes content below that node. You then end up with a separate index for each region/language and can search just by specifying the correct searcher by name.
This also means you can specify a different analyzer for the different languages.
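With one index set per language, the search code only has to pick the right searcher. A minimal sketch, assuming a hand-maintained map from each site's root node id to a searcher name defined in your Examine config (the SearcherSelector class, the ids, and the searcher names are all hypothetical):

```csharp
using System;
using System.Collections.Generic;

public static class SearcherSelector
{
    // Hypothetical root-node-id -> searcher-name mapping; the names must
    // match the searchers defined in your Examine configuration.
    private static readonly Dictionary<string, string> SearchersByRoot =
        new Dictionary<string, string>
        {
            { "1060", "EnglishSiteSearcher" },
            { "2010", "GermanSiteSearcher" },
        };

    // Picks the searcher for the site the path belongs to, or the
    // fallback name if the site root is not in the map.
    public static string GetSearcherName(string path, string fallback)
    {
        string[] ids = path.Split(',');
        string name;
        if (ids.Length >= 2 && SearchersByRoot.TryGetValue(ids[1], out name))
            return name;
        return fallback;
    }
}
```

In the search control this would be used as ExamineManager.Instance.SearchProviderCollection[SearcherSelector.GetSearcherName(currentPage.Path, "SiteSearchSearcher")], after which the searchPath filter is no longer needed because each index only holds one site.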
Nice work!