Sitecore: Solr spell check

Extending Sitecore ContentSearch provider with Solr spell check

On one of our projects we ran into the problem that Sitecore 7 does not support, out of the box, some useful Solr features such as spell check and similar results, both of which were in great demand.

After some investigation with a reflector we found that this kind of customization would not be easy, because many classes in the Sitecore Solr provider were not designed for third-party customization: a lot of them are internal, and important members are private.

Enable spell check on Solr side

First, configure the spellcheck component in solrconfig.xml (for our solution it was enough to change the spellchecker's field setting).

<lst name="spellchecker">
  ...
  <str name="field">field_name</str>
  ...
</lst>

Then enable this component for the /select handler so that its results are included in the response to queries generated by Sitecore.

<requestHandler name="/select" class="solr.SearchHandler">
  ...
  <arr name="last-components"> 
    ...
    <str>spellcheck</str>
  </arr>
</requestHandler>

After that, you can check that the configuration works by requesting a URL similar to this:

http://<core url>/select?spellcheck=true&spellcheck.q=wroong+word&spellcheck.collate=true
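If you want to run the same check from code (for example in a small smoke test), the URL can be assembled as in the sketch below. The class name, method name and the example core URL are assumptions made for illustration; only the query parameters come from the URL above.

```csharp
using System;

public static class SpellCheckUrl
{
    // Builds a spell check test URL for a Solr core.
    // coreUrl is a placeholder, e.g. "http://localhost:8983/solr/my_core" (hypothetical).
    public static string Build(string coreUrl, string phrase)
    {
        // Uri.EscapeDataString percent-encodes the phrase (a space becomes %20).
        return coreUrl.TrimEnd('/')
            + "/select?spellcheck=true&spellcheck.collate=true&spellcheck.q="
            + Uri.EscapeDataString(phrase);
    }
}
```

The phrase is escaped separately so that characters such as & or = in user input cannot break the query string.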

Solr is ready, and we can move on to Sitecore.

Solr provider in Sitecore

As usual, extending Sitecore starts with digging into the sources with ILSpy or a similar tool, and this time was no exception.

When working with ContentSearch, we get results by calling GetResults or GetFacets, which are implemented as extension methods on IQueryable<> and add the required node to the LINQ call chain.

However, digging further shows that the ContentSearch LINQ parser and mapper understand only a limited set of methods (defined by an enum) with no values reserved for future use, so the only option is to work closer to SolrNet queries.

Extending Solr Provider

To do this, we create an extension method for IQueryable<> and pass it the search context and the spell check query. Inside the extension we create a SolrNet QueryOptions object and configure spell checking.

public static string CheckSpelling<TSource>(this IQueryable<TSource> query, IProviderSearchContext context, string text = null)
{
    var extendedQuery = (SolrCompositeQuery)((IHasNativeQuery)query).Query;
    extendedQuery.Methods.Add(new GetResultsMethod(GetResultsOptions.Default));

    var parameters = new SpellCheckingParameters { Collate = true };

    if (!string.IsNullOrEmpty(text))
    {
        parameters.Query = text;
    }

    var newQuery = new ExtendedCompositeQuery(
        extendedQuery.Query,
        extendedQuery.Filter,
        extendedQuery.Methods,
        extendedQuery.VirtualFieldProcessors,
        extendedQuery.FacetQueries,
        new QueryOptions
        {
            SpellCheck = parameters,
            Rows = 0
        }
        );

    var linqToSolr = new CustomLinqToSolrIndex<TSource>((SolrSearchContext)context, null);
    var response = linqToSolr.Execute<ExtendedSearchResults<TSource>>(newQuery);

    return GetSpellCheckedString(response.SpellCheckedResponse);
}
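The GetSpellCheckedString helper called on the last line is not shown in this post. A minimal sketch, assuming the collation arrives as a plain string and that "no suggestion" should be reported as null (both are assumptions, as is the wrapper class name), could look like this:

```csharp
using System;

public static class SpellCheckHelper
{
    // Hypothetical helper: normalizes the collation string returned by Solr.
    // Returns null when Solr produced no suggestion (null, empty or whitespace-only input).
    public static string GetSpellCheckedString(string collation)
    {
        return string.IsNullOrWhiteSpace(collation) ? null : collation.Trim();
    }
}
```

In practice the method would probably live as a private static member of the same class that declares CheckSpelling, so that it can be called unqualified as above.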

Then QueryOptions should be added as a property to a type inherited from SolrCompositeQuery.

public class ExtendedCompositeQuery : SolrCompositeQuery
{
    public QueryOptions QueryOptions { get; set; }

    public ExtendedCompositeQuery(AbstractSolrQuery query, AbstractSolrQuery filterQuery, IEnumerable<Sitecore.ContentSearch.Linq.Methods.QueryMethod> methods, IEnumerable<IFieldQueryTranslator> virtualFieldProcessors, IEnumerable<FacetQuery> facetQueries, QueryOptions options)
        : base(query, filterQuery, methods, virtualFieldProcessors, facetQueries)
    {
        this.QueryOptions = options;
    }
}

A composite query can be executed with the LinqToSolrIndex<> class, but the standard implementation of this class does not recognize our QueryOptions, so we use it as the base class for CustomLinqToSolrIndex<>.

public class CustomLinqToSolrIndex<TItem> : LinqToSolrIndex<TItem>
{
    private readonly SolrSearchContext context;

    private readonly string cultureCode;

    /// <summary>
    /// Initializes a new instance of the <see cref="CustomLinqToSolrIndex{TItem}" /> class.
    /// </summary>
    /// <param name="context">The context.</param>
    /// <param name="executionContext">The execution context.</param>
    public CustomLinqToSolrIndex(SolrSearchContext context, IExecutionContext executionContext)
        : base(context, executionContext)
    {
        Assert.ArgumentNotNull(context, "context");
        this.context = context;
        var executionContext1 = this.Parameters.ExecutionContext as CultureExecutionContext;
        var culture = executionContext1 == null ? CultureInfo.GetCultureInfo(Settings.DefaultLanguage) : executionContext1.Culture;
        this.cultureCode = culture.TwoLetterISOLanguageName;
        ((SolrFieldNameTranslator)this.Parameters.FieldNameTranslator).AddCultureContext(culture);
    }

    /// <summary>
    /// Executes the specified composite query.
    /// </summary>
    /// <typeparam name="TResult">The type of the result.</typeparam>
    /// <param name="compositeQuery">The composite query.</param>
    /// <returns></returns>
    public TResult Execute<TResult>(ExtendedCompositeQuery compositeQuery)
    {
        if (!typeof(TResult).IsGenericType || typeof(TResult).GetGenericTypeDefinition() != typeof(ExtendedSearchResults<>))
        {
            return base.Execute<TResult>(compositeQuery);
        }

        var resultType = typeof(TResult).GetGenericArguments()[0];
        var solrQueryResults = this.Execute(compositeQuery, resultType);
        var type = typeof(SolrSearchResults<>).MakeGenericType(
            new[]
            {
                resultType
            });
        var methodInfo = this.GetType().GetMethod("GetExtendedResults", BindingFlags.Instance | BindingFlags.NonPublic).MakeGenericMethod(typeof(TResult), resultType);
        var selectMethod = this.GetSelectMethod(compositeQuery);
        var instance = Activator.CreateInstance(
            type,
            new object[]
            {
                this.context,
                solrQueryResults,
                selectMethod,
                compositeQuery.VirtualFieldProcessors
            });
        return (TResult)methodInfo.Invoke(this, new[] { compositeQuery, instance, solrQueryResults });
    }

    /// <summary>
    /// Executes the specified composite query.
    /// </summary>
    /// <param name="compositeQuery">The composite query.</param>
    /// <param name="resultType">Type of the result.</param>
    /// <returns></returns>
    internal SolrQueryResults<Dictionary<string, object>> Execute(ExtendedCompositeQuery compositeQuery, Type resultType)
    {
        var options = compositeQuery.QueryOptions;
        if (compositeQuery.Methods != null)
        {
            var list1 = (compositeQuery.Methods).Where(m => m.MethodType == QueryMethodType.Select).Select(m => (SelectMethod)m).ToList();
            if ((list1).Any())
            {
                foreach (var str in list1.SelectMany(selectMethod => (IEnumerable<string>)selectMethod.FieldNames))
                {
                    options.Fields.Add(str.ToLowerInvariant());
                }
                if (!this.context.SecurityOptions.HasFlag(SearchSecurityOptions.DisableSecurityCheck))
                {
                    options.Fields.Add("_uniqueid");
                    options.Fields.Add("_datasource");
                }
            }

            var list2 = compositeQuery.Methods.Where(m => m.MethodType == QueryMethodType.GetResults).Select(m => (GetResultsMethod)m).ToList();
            if (list2.Any())
            {
                if (options.Fields.Count > 0)
                {
                    options.Fields.Add("score");
                }
                else
                {
                    options.Fields.Add("*");
                    options.Fields.Add("score");
                }
            }

            var list3 = compositeQuery.Methods.Where(m => m.MethodType == QueryMethodType.OrderBy).Select(m => (OrderByMethod)m).ToList();
            if (list3.Any())
            {
                foreach (var orderByMethod in list3)
                {
                    var field = orderByMethod.Field;
                    options.AddOrder(
                        new[]
                        {
                            new SortOrder(field, orderByMethod.SortDirection == SortDirection.Ascending ? Order.ASC : Order.DESC)
                        });
                }
            }

            var list4 =
                compositeQuery.Methods.Where(m => m.MethodType == QueryMethodType.Skip).Select(m => (SkipMethod)m).ToList();
            if (list4.Any())
            {
                var num = list4.Sum(skipMethod => skipMethod.Count);
                options.Start = num;
            }

            var list5 =
                compositeQuery.Methods.Where(m => m.MethodType == QueryMethodType.Take).Select(m => (TakeMethod)m).ToList();
            if (list5.Any())
            {
                var num = list5.Sum(takeMethod => takeMethod.Count);
                options.Rows = num;
            }

            var list6 =
                compositeQuery.Methods.Where(m => m.MethodType == QueryMethodType.Count).Select(m => (CountMethod)m).ToList();
            if (compositeQuery.Methods.Count == 1 && list6.Any())
            {
                options.Rows = 0;
            }

            var list7 =
                compositeQuery.Methods.Where(m => m.MethodType == QueryMethodType.Any).Select(m => (AnyMethod)m).ToList();
            if (compositeQuery.Methods.Count == 1 && list7.Any())
            {
                options.Rows = 0;
            }

            var list8 =
                compositeQuery.Methods.Where(m => m.MethodType == QueryMethodType.GetFacets).Select(m => (GetFacetsMethod)m).ToList();
            if (compositeQuery.FacetQueries.Count > 0 && (list8.Any() || list2.Any()))
            {
                foreach (
                    var facetQuery in
                        GetFacetsPipeline.Run(
                            new GetFacetsArgs(
                                null,
                                compositeQuery.FacetQueries,
                                this.context.Index.Configuration.VirtualFieldProcessors,
                                this.context.Index.FieldNameTranslator)).FacetQueries.ToHashSet())
                {
                    if (facetQuery.FieldNames.Any())
                    {
                        var minimumResultCount = facetQuery.MinimumResultCount;
                        if (facetQuery.FieldNames.Count() == 1)
                        {
                            var fieldNameTranslator = this.FieldNameTranslator as SolrFieldNameTranslator;
                            var str = facetQuery.FieldNames.First();
                            if (fieldNameTranslator != null && str == fieldNameTranslator.StripKnownExtensions(str) && this.context.Index.Configuration.FieldMap.GetFieldConfiguration(str) == null)
                            {
                                str = fieldNameTranslator.GetIndexFieldName(str.Replace("__", "!").Replace("_", " ").Replace("!", "__"), true);
                            }
                            var queryOptions = options;
                            var solrFacetQueryArray1 = new ISolrFacetQuery[1];
                            solrFacetQueryArray1[0] = new SolrFacetFieldQuery(str)
                                                      {
                                                          MinCount = minimumResultCount
                                                      };
                            var solrFacetQueryArray2 = solrFacetQueryArray1;
                            queryOptions.AddFacets(solrFacetQueryArray2);
                        }
                        if (facetQuery.FieldNames.Count() > 1)
                        {
                            var queryOptions = options;
                            var solrFacetQueryArray1 = new ISolrFacetQuery[1];
                            solrFacetQueryArray1[0] = new SolrFacetPivotQuery
                                                      {
                                                          Fields = new[]
                                                                   {
                                                                       string.Join(",", facetQuery.FieldNames)
                                                                   },
                                                          MinCount = minimumResultCount
                                                      };
                            var solrFacetQueryArray2 = solrFacetQueryArray1;
                            queryOptions.AddFacets(solrFacetQueryArray2);
                        }
                    }
                }
                if (!list2.Any())
                {
                    options.Rows = 0;
                }
                //var list9 =
                //    compositeQuery.Methods.Where(m => m.MethodType == QueryMethodType.Cast).Select(m => (GetSpellCheck)m).ToList();
                //if (list9.Any())
                //{
                //    options.Rows = 0;
                //    options.SpellCheck = new SpellCheckingParameters { Collate = true };
                //}
            }
        }

        if (compositeQuery.Filter != null)
        {
            options.AddFilterQueries(
                new ISolrQuery[]
                {
                    compositeQuery.Filter
                });
        }

        options.AddFilterQueries(
            new ISolrQuery[]
            {
                new SolrQueryByField("_indexname", this.context.Index.Name)
            });

        if (!Settings.DefaultLanguage.StartsWith(this.cultureCode))
        {
            var queryOptions = options;
            var solrQueryArray1 = new ISolrQuery[1];
            solrQueryArray1[0] = new SolrQueryByField("_language", this.cultureCode + "*")
                                 {
                                     Quoted = false
                                 };
            var solrQueryArray2 = solrQueryArray1;
            queryOptions.AddFilterQueries(solrQueryArray2);
        }

        var loggingSerializer = new SolrLoggingSerializer();
        var q = loggingSerializer.SerializeQuery(compositeQuery.Query);

        try
        {
            if (!options.Rows.HasValue)
            {
                options.Rows = ContentSearchConfigurationSettings.SearchMaxResults;
            }
            SearchLog.Log.Info("Query - " + q);
            SearchLog.Log.Info("Serialized Query - ?q=" + q + "&" + string.Join("&", loggingSerializer.GetAllParameters(options).Select(p => string.Format("{0}={1}", p.Key, p.Value)).ToArray()));

            return this.SolrOperations.Query(q, options);
        }
        catch (Exception ex)
        {
            if (!(ex is SolrConnectionException) && !(ex is SolrNetException))
            {
                throw;
            }
            var message = ex.Message;
            if (ex.Message.StartsWith("<?xml"))
            {
                var xmlDocument = new XmlDocument();
                xmlDocument.LoadXml(ex.Message);
                var xmlNode1 = xmlDocument.SelectSingleNode("/response/lst[@name='error'][1]/str[@name='msg'][1]");
                var xmlNode2 = xmlDocument.SelectSingleNode("/response/lst[@name='responseHeader'][1]/lst[@name='params'][1]/str[@name='q'][1]");
                if (xmlNode1 != null && xmlNode2 != null)
                {
                    SearchLog.Log.Error(string.Format("Solr Error : [\"{0}\"] - Query attempted: [{1}]", xmlNode1.InnerText, xmlNode2.InnerText));
                    return new SolrQueryResults<Dictionary<string, object>>();
                }
            }
            Log.Error(message, this);
            return new SolrQueryResults<Dictionary<string, object>>();
        }
    }

    /// <summary>
    /// Gets the extended results.
    /// </summary>
    /// <typeparam name="TResult">The type of the result.</typeparam>
    /// <typeparam name="TDocument">The type of the document.</typeparam>
    /// <param name="compositeQuery">The composite query.</param>
    /// <param name="processedResults">The processed results.</param>
    /// <param name="results">The results.</param>
    /// <returns></returns>
    internal TResult GetExtendedResults<TResult, TDocument>(ExtendedCompositeQuery compositeQuery, SolrSearchResults<TDocument> processedResults, SolrQueryResults<Dictionary<string, object>> results)
    {
        var type = typeof(TResult);

        var hits = processedResults.GetSearchHits();
        var facetResults = this.FormatFacetResults(processedResults.GetFacets(), compositeQuery.FacetQueries);

        var obj = Activator.CreateInstance(type, hits, processedResults.NumberFound, facetResults);

        if (type.HasProperty("SpellCheckedResponse"))
        {
            var spellCheckPropetry = type.GetProperty("SpellCheckedResponse");
            if (spellCheckPropetry != null && spellCheckPropetry.CanWrite)
            {
                spellCheckPropetry.SetValue(obj, results.SpellChecking.Collation);
            }
        }

        if (type.HasProperty("SimilarResults"))
        {
            var similarResultsPropetry = type.GetProperty("SimilarResults");
            if (similarResultsPropetry != null && similarResultsPropetry.CanWrite)
            {
                similarResultsPropetry.SetValue(obj, results.SimilarResults);
            }
        }

        return (TResult)Convert.ChangeType(obj, typeof(TResult));
    }

    private SelectMethod GetSelectMethod(SolrCompositeQuery compositeQuery)
    {
        var type = this.GetType().BaseType;
        var method = type.GetMethod("GetSelectMethod", BindingFlags.NonPublic | BindingFlags.Instance | BindingFlags.Static);
        try
        {
            return (SelectMethod)method.Invoke(this, new object[] { compositeQuery });
        }
        catch (Exception ex)
        {
            Log.Error("Signature of internal LinqToSolrIndex<TItem>.GetSelectMethod has changed or method not found", ex, this);
            return null;
        }
    }

    private FacetResults FormatFacetResults(Dictionary<string, ICollection<KeyValuePair<string, int>>> facetResults, List<FacetQuery> facetQueries)
    {
        var type = this.GetType().BaseType;
        var method = type.GetMethod("FormatFacetResults", BindingFlags.NonPublic | BindingFlags.Instance);
        try
        {
            return (FacetResults)method.Invoke(this, new object[] { facetResults, facetQueries });
        }
        catch (Exception ex)
        {
            Log.Error("Signature of internal LinqToSolrIndex<TItem>.FormatFacetResults has changed or method not found", ex, this);
            return new FacetResults();
        }
    }

    private ISolrOperations<Dictionary<string, object>> SolrOperations
    {
        get
        {
            var solrSearchIndex = this.context.Index as SolrSearchIndex;

            if (solrSearchIndex != null)
            {
                return typeof(SolrSearchIndex)
                    .GetProperty("SolrOperations", BindingFlags.NonPublic | BindingFlags.Instance)
                    .GetValue(solrSearchIndex) as ISolrOperations<Dictionary<string, object>>;
            }
            return null;
        }
    }
}

In the class above, we add an Execute method that takes our ExtendedCompositeQuery as a parameter. It runs the custom logic (the internal Execute overload that returns SolrQueryResults) when the caller asks for extended search results (with the SpellCheckedResponse property), and falls back to the base implementation in all other cases.
We also needed some reflection tricks to implement the new query execution logic:

  • getting ISolrOperations, SelectMethod and FacetResults from non-public members of the base classes
  • duplicating the source code of the SearchResults and SolrSearchResults classes, as the originals are sealed or internal ☹

*** These points are definitely not ideal, but as already mentioned, ContentSearch.Solr.Provider was not designed for easy extensibility.
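The ExtendedSearchResults<> type used by CheckSpelling and GetExtendedResults is not listed in this post. The sketch below shows one shape it could take, derived only from how the code above uses it: a three-argument constructor matching the Activator.CreateInstance call, and two writable properties that are set through reflection. The property names Hits, TotalSearchResults and Facets, and the object type chosen for SimilarResults, are assumptions; the code needs the Sitecore.ContentSearch assemblies to compile.

```csharp
using System.Collections.Generic;
using Sitecore.ContentSearch.Linq;

// Sketch only: a stand-in for the duplicated SearchResults class, extended with spell check data.
public class ExtendedSearchResults<TSource>
{
    // Parameter order mirrors Activator.CreateInstance(type, hits, numberFound, facetResults).
    public ExtendedSearchResults(IEnumerable<SearchHit<TSource>> hits, int totalSearchResults, FacetResults facets)
    {
        Hits = hits;
        TotalSearchResults = totalSearchResults;
        Facets = facets;
    }

    public IEnumerable<SearchHit<TSource>> Hits { get; private set; }

    public int TotalSearchResults { get; private set; }

    public FacetResults Facets { get; private set; }

    // Set via reflection in GetExtendedResults, so the setter must be public.
    public string SpellCheckedResponse { get; set; }

    // Typed as object here because the exact SolrNet type depends on the SolrNet version (assumption).
    public object SimilarResults { get; set; }
}
```

Keeping the two extra properties writable is what lets GetExtendedResults fill them in after construction without a second constructor overload.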

Once all the changes above are implemented, we can call our extension and process the results.

var query = context.GetQueryable<T>().Filter( <filter query predicate> );
var corrected = query.CheckSpelling(context, "<phrase to check>");

This makes a separate request to Solr and does not return any matching results, because QueryOptions sets the number of result rows to 0. To get search results or facets, call the required method after calling CheckSpelling.
