Plug-in: How do I configure the shop search with Smartstore MegaSearch?
Saturday, December 26, 2020

For an introduction, we recommend first reading the article Plug-in: Smartstore MegaSearch "Finding instead of searching":

https://smartstore.com/en/feature-talk-smartstore-megasearch

The MegaSearch plug-in replaces the simple standard search in Smartstore and offers many of the possibilities of a full-text search based on Lucene.Net. In contrast to the standard search, MegaSearch does not query the database directly but uses a file-based search index, with the goal of making search as fast and flexible as possible. MegaSearch currently includes one index for catalog data and one for forum data. The MegaSearch Plus plug-in extends MegaSearch with localised (language-dependent) data, multi-store support and access restrictions, and product and specification attributes. This article explains the many and sometimes complex search settings available with MegaSearch.

A note on terminology. Lucene.Net is a .NET port of the Apache Lucene search engine library. Product attributes (also known as product variants) are properties of a product that the buyer can select, such as size and colour. Specification attributes, on the other hand, are not selectable and are displayed purely as text on product detail pages. Both types of attributes can be used as filters in search results.

A word in advance on search speed. It depends on several factors: the configuration described here, the data volume and, above all, the data quality. If large amounts of uncleaned data are pumped into a shop without thought, the performance of the search can suffer significantly. In our tests we tended to get good results when searching almost three million products, but less good values when faceting more than 40,000 specification attributes or categories, whatever the purpose of that many categories may be. In other words, a growing amount of data to be faceted affects search performance more than a large amount of product or catalog data affects the actual full-text search. The term speed is relative here. With a typical amount of data the search is very fast, and when a search setting is changed the difference in speed is hardly noticeable, as it lies in the millisecond range.

The settings that affect search are divided into three main areas: the general search settings, the MegaSearch configuration and, in special cases, the editing of objects such as specification attributes.

General search settings

These settings affect both the standard search and MegaSearch. They can be found under Configuration > Settings > Search.

The Search mode determines when a term counts as a match for the search term and therefore produces a search hit. The setting affects both the number of hits and the speed of the search. "Is equal to" (exact match) is the fastest with comparatively few hits, "Starts with" is somewhat slower with significantly more hits, and "Contains" is the slowest with very many hits.
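
The difference between the three modes can be sketched in a few lines of Python. This only illustrates the matching semantics on a handful of made-up terms and is not how MegaSearch or Lucene.Net implements them; the names SearchMode, term_matches and count_hits are hypothetical.

```python
from enum import Enum


class SearchMode(Enum):
    EXACT_MATCH = "Is equal to"
    STARTS_WITH = "Starts with"
    CONTAINS = "Contains"


def term_matches(indexed_term: str, query: str, mode: SearchMode) -> bool:
    """Return True if an indexed term counts as a hit for the query."""
    indexed_term, query = indexed_term.lower(), query.lower()
    if mode is SearchMode.EXACT_MATCH:
        return indexed_term == query
    if mode is SearchMode.STARTS_WITH:
        return indexed_term.startswith(query)
    return query in indexed_term


def count_hits(terms: list[str], query: str, mode: SearchMode) -> int:
    """Count how many indexed terms match the query in the given mode."""
    return sum(term_matches(term, query, mode) for term in terms)


# Made-up sample terms; a real index would hold terms derived from products.
terms = ["book", "bookshelf", "notebook", "booking", "table"]
hits = {mode: count_hits(terms, "book", mode) for mode in SearchMode}
```

In this toy data set the number of hits grows from exact match to "Starts with" to "Contains", which mirrors the trade-off between precision and number of hits described above.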

The Search fields setting specifies which data fields are searched. The fewer fields, the faster the search, although the gain in speed is usually comparatively small. The product name (or the topic title for forums) is not selectable because it is always searched. The field for product tags plays a special role here: if a product should be found by a term that does not occur in the content of any search field, you can create a product tag for that term and assign it to the product. If you monitor the search log (plug-in Smartstore Search Log) and notice that a particular search term yields no or too few hits, it is worth considering assigning a corresponding product tag to the relevant products.

The Default sort order specifies how search hits are sorted by default. "Best results" ranks hits with high relevance before those with lower relevance. The relevance (or scoring value) of a search hit is not a percentage match with the search term. Such a value does not exist. It is therefore also not possible to filter search hits by, say, a match of 70 percent or more.

"Open product directly at SKU, MPN or GTIN" checks the database before searching to see whether the search term matches one of the mentioned product identifiers. If it does, the relevant product page is opened directly and no list of search hits is shown.
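
The lookup that happens before the actual search can be pictured roughly as follows. This is a minimal sketch under assumptions: the Product class, the function names and the URL patterns are hypothetical and do not reflect Smartstore's real data model or routes.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import quote_plus


@dataclass
class Product:
    id: int
    name: str
    sku: str = ""
    mpn: str = ""
    gtin: str = ""


def find_by_identifier(products: list[Product], term: str) -> Optional[Product]:
    """Return the first product whose SKU, MPN or GTIN equals the term."""
    wanted = term.strip().lower()
    if not wanted:
        return None
    for product in products:
        identifiers = (product.sku, product.mpn, product.gtin)
        # Empty identifiers are skipped so they can never match.
        if any(wanted == value.lower() for value in identifiers if value):
            return product
    return None


def resolve_search(products: list[Product], term: str) -> str:
    """Open the product page on an identifier match, else the hit list."""
    product = find_by_identifier(products, term)
    if product is not None:
        return f"/product/{product.id}"       # hypothetical URL pattern
    return f"/search?q={quote_plus(term)}"    # hypothetical URL pattern


catalog = [
    Product(1, "Cistus Incanus Powder", sku="XLB-A9", gtin="4006381333931"),
    Product(2, "Oak chair", sku="CH-100"),
]
```

With this sample catalog, resolve_search(catalog, "xlb-a9") leads straight to the product page, while resolve_search(catalog, "oak chair") falls through to the regular search.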

We'll skip the settings for Instant Search (search-as-you-type) because they are self-explanatory and go directly to Result filtering. MegaSearch offers so-called drill-down faceting: the more filters are applied, the fewer search results remain, and each filter shows the number of hits it would return. A large number of categories, product attributes or specification attributes can slow down the search because all combinations of these values must be created and checked for search hits. Decreasing "Maximum number of filters" (default is 20) can speed up this process if the upper limit is reached earlier as a result. Suppose there are tens of thousands of categories containing no or very few products. Then it may happen (depending on the current search term) that faceting has to check thousands of empty categories before it finds the first one with a matching product. This happens on all pages where facets are offered, that is, on the search page and on all category and manufacturer pages. The effect is even stronger if the catalog setting "Include products from subcategories" is activated.
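
The idea of drill-down faceting with hit counts can be sketched as follows. This is a simplified illustration over in-memory dictionaries, not the way MegaSearch computes facets; the function names are hypothetical, and max_filters only loosely mirrors "Maximum number of filters".

```python
from collections import Counter


def facet_counts(hits: list[dict], facet_field: str, max_filters: int = 20):
    """Count the hits per facet value and keep at most max_filters values."""
    counts = Counter()
    for product in hits:
        for value in product.get(facet_field, []):
            counts[value] += 1
    return counts.most_common(max_filters)


def drill_down(hits: list[dict], facet_field: str, value: str) -> list[dict]:
    """Apply one filter: keep only the hits that carry the facet value."""
    return [p for p in hits if value in p.get(facet_field, [])]


# Made-up search hits for illustration.
hits = [
    {"name": "Red blanket", "category": ["Living"], "colour": ["red"]},
    {"name": "Blue blanket", "category": ["Living"], "colour": ["blue"]},
    {"name": "Red towel", "category": ["Bath"], "colour": ["red"]},
]

colour_facets = facet_counts(hits, "colour")
red_hits = drill_down(hits, "colour", "red")
category_facets_after_filter = facet_counts(red_hits, "category")
```

Each applied filter narrows the hit list, and the remaining facets are recounted on the narrowed list. The sketch also hints at why many facet values are expensive: every value has to be counted against the current hits, even if most of them end up empty.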

You can hide individual filters with the "Deactivated" setting, although the filter for categories is always displayed. You can also display unavailable products by default via "Include unavailable products".

MegaSearch configuration

These settings can be found under Plugins > MegaSearch.

The grid shows information about the catalog index and the forum index. On the right there is a menu with commands such as "Rebuild" (recreate the index) and a command for updating the index (transfer the data of changed products to the index). A scheduled task ensures that the related index is updated at a regular interval. For large amounts of data this interval should not be too short, since in addition to product data, further metadata of categories, product and specification attributes etc. needs to be stored in the index. "Show settings" displays index-specific settings. Some changes only become effective after the index has been rebuilt.

Filters for product and specification attributes can be activated in this section (MegaSearch Plus required). Attribute data is then stored in the search index as well, and the corresponding filters are displayed in the frontend. The option "Ignore Allow filtering on product level" is a special case for specification attributes. You can specify at attribute level and at product level whether filtering by a specification attribute is enabled in the frontend. This option instructs MegaSearch to generally ignore "Allow filtering" at product level, which can speed up faceting of a large number of specification attributes. We recommend setting "Allow filtering" only at specification attribute level (if possible) because it makes working with (many) specification attributes easier.

Top categories and Top manufacturers are displayed as links in Instant Search so that the search hits can be filtered directly by the desired category or manufacturer. The "Boost search fields" setting is an important instrument for influencing the order of search hits. A higher boost value ranks a product higher in the search results if the hit was found in the related field. The gaps between the individual boost values should not be too large, as this can lead to unwanted results.
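
Why large gaps between boost values can produce unwanted results is easier to see with a toy scoring function. The sketch below simply adds up the boost of every field that contains the search term; real Lucene.Net scoring is considerably more involved, and all field names, boost values and products here are made up.

```python
def score(product: dict, term: str, boosts: dict[str, float]) -> float:
    """Add up the boost of every field in which the term occurs."""
    term = term.lower()
    return sum(
        boost
        for field, boost in boosts.items()
        if term in product.get(field, "").lower()
    )


def rank(products: list[dict], term: str, boosts: dict[str, float]) -> list[str]:
    """Return the product names ordered by descending score."""
    ordered = sorted(products, key=lambda p: score(p, term, boosts), reverse=True)
    return [p["name"] for p in ordered]


products = [
    {
        "name": "Oak chair",
        "short_description": "Solid wood seating",
        "full_description": "Handmade in small batches.",
    },
    {
        "name": "Seat cushion",
        "short_description": "Cushion for any chair",
        "full_description": "Fits a standard dining chair.",
    },
]

moderate = {"name": 2.0, "short_description": 1.5, "full_description": 1.0}
extreme = {"name": 10.0, "short_description": 1.5, "full_description": 1.0}

moderate_ranking = rank(products, "chair", moderate)
extreme_ranking = rank(products, "chair", extreme)
```

With the moderate values, two matches in the descriptions can outweigh a single match in the name. With the extreme value for the name, a match in that one field dominates everything else, regardless of how well the other fields match.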

"Active indexes" specifies which indexes are active. If you do not use a forum in the frontend, you can deactivate the forum index and also the associated scheduled task that updates it. "Always rebuild index" instructs MegaSearch to always rebuild the index instead of updating it. The shop then does not have to look for updates of product data in the background.

Alternative suggestions for a search term are optionally displayed as links in the "Did you mean?" section. For this purpose the product names are stored as so-called n-grams in a separate index. With hundreds of thousands of products, this indexing may take a disproportionately long time. It therefore makes sense to specify an upper limit using "Maximum number of indexed suggestions".
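
How n-grams make "Did you mean?" suggestions possible can be sketched like this. It is a simplified illustration using character trigrams and set overlap, not the algorithm Lucene.Net uses; the function names and the max_indexed parameter (a stand-in for "Maximum number of indexed suggestions") are assumptions.

```python
def ngrams(text: str, n: int = 3) -> set[str]:
    """Split a text into overlapping character n-grams."""
    text = text.lower()
    if len(text) < n:
        return {text}
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def build_suggestion_index(product_names: list[str], max_indexed: int, n: int = 3):
    """Index the n-grams of at most max_indexed product names."""
    return {name: ngrams(name, n) for name in product_names[:max_indexed]}


def suggest(query: str, index: dict[str, set[str]], limit: int = 3, n: int = 3):
    """Return the names that share the most n-grams with the query."""
    query_grams = ngrams(query, n)
    scored = []
    for name, grams in index.items():
        if name.lower() == query.lower():
            continue  # an exact match needs no suggestion
        shared = len(query_grams & grams)
        if shared:
            # Ratio of shared to all n-grams (Jaccard similarity).
            scored.append((shared / len(query_grams | grams), name))
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [name for _, name in scored[:limit]]


names = ["notebook", "bookshelf", "table", "notepad"]
index = build_suggestion_index(names, max_indexed=3)
suggestions = suggest("notbook", index)
```

The misspelt query "notbook" still shares several trigrams with "notebook", so that name is suggested first. Because max_indexed is 3 here, "notepad" never enters the index, which shows the trade-off of the upper limit: less indexing work, but fewer candidates for suggestions.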

The settings for Text analysis are intended for advanced users and control how the search data is processed internally by Lucene.Net. Lucene.Net does not directly compare the search phrase with, for example, the product name. Instead it compares the terms derived from that data. Consequently, it is not the original text that is stored in the index but the terms emitted from it (with the help of so-called analyzers). Similarly, when a search term is entered, it is also split into terms to make it comparable with the index. In the context of internal processing it is therefore common to distinguish between the search phase and the indexing phase.

The setting "Text analysis" specifies the "standard analyzer", which is always used as a fallback if no particular analyzer is assigned to a field. By default, MegaSearch processes text in a language-specific way (recommended), so select a different analyzer only in special cases (experimental). This has no effect on the fields SKU, GTIN and manufacturer part number because these fields are always analyzed as keywords (KeywordAnalyzer). The Minimum word length specifies the minimum length of the terms emitted by the analyzer. Shorter terms are ignored during both the search and the indexing phase. With a very large search index or a very large number of products, it can be useful to increase this value to avoid "noise" in the search hits and to achieve fewer but more accurate hits.
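
The interplay of keyword fields and the minimum word length can be illustrated with a small sketch. It is a rough approximation: the regular expression only stands in for a real analyzer, and the field keys, the function name and the default of 2 are assumptions rather than MegaSearch values.

```python
import re

# Fields treated as a single keyword, as described for SKU, GTIN and MPN.
KEYWORD_FIELDS = {"sku", "gtin", "mpn"}


def analyze_field(field: str, text: str, min_word_length: int = 2) -> list[str]:
    """Emit the terms that would be indexed for one field."""
    if field in KEYWORD_FIELDS:
        # Keyword analysis: the whole value is one term, unchanged.
        return [text]
    # Crude stand-in for a text analyzer: lowercase, split on non-alphanumerics.
    terms = re.findall(r"[0-9a-z]+", text.lower())
    # Terms shorter than the minimum word length are dropped.
    return [term for term in terms if len(term) >= min_word_length]


short_terms_kept = analyze_field("name", "A 4K TV on a stand", min_word_length=2)
short_terms_dropped = analyze_field("name", "A 4K TV on a stand", min_word_length=3)
sku_terms = analyze_field("sku", "XLB-A9", min_word_length=3)
```

Raising the minimum word length from 2 to 3 removes short terms such as "4k" and "tv" from this product name, which reduces noise but also means that searches for those short terms no longer find it. The SKU is unaffected because it is kept as one term.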

Further options for even finer control are displayed when "Enable advanced text analysis" is activated. "Differing text analysis" specifies a word tokenization and filtering that differs from the "standard analyzer" above. It is used, for instance, for product names, which are very important for the search. In the context of text analysis the product name is a special case because the type of text may differ from shop to shop. "Red living room blanket with check pattern for cosy evenings" would be a descriptive name and "XLB-A9.Cistus Incanus Powder" a signifying one. If the product names are mainly signifying, it may be useful to choose a different tokenization via "Differing text analysis" to prevent terms from becoming too fragmented or even being filtered out.

Example: with the MegaSearch default settings (without advanced text analysis), the product name "XLB-A9.Cistus Incanus Powder" yields the terms xlb, a9, cistus, incanus, powder. Because of the first two terms, a search for "xlb-a9.cistus" can return too many hits and may be perceived as inaccurate. With Whitespace selected for "Differing text analysis", the emitted terms would instead be xlb-a9.cistus, incanus, powder. With the same search term, a more accurate hit list can be expected.

"Enable advanced text analysis" also allows you to enter lists for special cases, e.g. abbreviations, synonyms, word dictionary additions and exceptions for word combinations. This can be important for specific, frequently occurring terms. A shop for computer hardware could, for instance, enter synonyms such as "notebook,laptop,convertible,mobile computer" here.
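
The effect of the two tokenizations on the product name from the example can be reproduced with a short sketch. The two functions only imitate the term lists given above and are not the real Lucene.Net analyzers; matching on any shared term is an assumption, and the second and third product names are made up.

```python
import re


def standard_like_terms(text: str) -> list[str]:
    """Imitate the default behaviour: split on every non-alphanumeric."""
    return re.findall(r"[0-9a-z]+", text.lower())


def whitespace_terms(text: str) -> list[str]:
    """Imitate the Whitespace option: split on whitespace only."""
    return text.lower().split()


def matching_products(names: list[str], query: str, tokenize) -> list[str]:
    """Return the names that share at least one term with the query."""
    query_terms = set(tokenize(query))
    return [name for name in names if query_terms & set(tokenize(name))]


product_names = [
    "XLB-A9.Cistus Incanus Powder",
    "XLB-B2.Rosehip Powder",   # made-up product
    "A9 Cable Adapter",        # made-up product
]
query = "xlb-a9.cistus"

default_hits = matching_products(product_names, query, standard_like_terms)
whitespace_hits = matching_products(product_names, query, whitespace_terms)
```

With the fragmenting tokenization, the query breaks into xlb, a9 and cistus, so all three products match through one of the fragments. With whitespace tokenization, the query stays a single term and only the intended product is found.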

Search settings when editing objects

Product and specification attributes have a property to show or hide facets (MegaSearch Plus required). In the case of specification attributes, it is also possible to show or hide them at product level when assigning attributes to a certain product. However, we recommend specifying this directly on the attribute because it makes working with attributes easier. If you observe a negative impact on performance as a result of too many attributes, you should activate the already mentioned option "Ignore Allow filtering on product level" and disable "Allow filtering" for those specification attributes that do not necessarily have to be filterable in the frontend.

For Search filter presentation you can choose between checkboxes, colour and image boxes and "Numeric Range". The latter is only available for specification attributes and requires numerical values to be entered for the options. This filter type is useful, for example, for colour ranges or colour nuances, which can then be filtered with a from-to selection in the frontend. It is also suitable for attributes with a large number of options, where selecting single options would be too confusing. The setting "Index option names" causes the names of options to be stored in the search index, so that products are also found when searching by option name.
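
A from-to selection over numerical option values can be sketched as follows. The data layout, the attribute name "Screen size" and the inclusive bounds are assumptions made for illustration, not details taken from MegaSearch.

```python
from typing import Optional


def filter_by_range(
    products: list[dict],
    attribute: str,
    lower: Optional[float] = None,
    upper: Optional[float] = None,
) -> list[dict]:
    """Keep the products whose numeric attribute value lies within the range."""
    result = []
    for product in products:
        value = product["attributes"].get(attribute)
        if value is None:
            continue  # products without the attribute never match
        if lower is not None and value < lower:
            continue
        if upper is not None and value > upper:
            continue
        result.append(product)
    return result


products = [
    {"name": "Notebook 13", "attributes": {"Screen size": 13.3}},
    {"name": "Notebook 15", "attributes": {"Screen size": 15.6}},
    {"name": "Notebook 17", "attributes": {"Screen size": 17.3}},
    {"name": "Mouse", "attributes": {}},
]

mid_sized = filter_by_range(products, "Screen size", lower=14, upper=16)
large = filter_by_range(products, "Screen size", lower=15)
```

Instead of ticking many single options, the shopper only picks a lower and an upper bound, which is why this presentation suits attributes with a large number of numeric options.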

Planned features

The following options and enhancements are currently being planned for MegaSearch.

  • AND-combine the words of a search term. This means that the more words the search phrase contains, the fewer hits are generated.
  • Featured sorting, according to the order specified by the merchant. This option is currently only available for product lists of categories, not for search pages.
  • Search index for Page Builder stories.
  • Search of categories provides a link to the category.
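
The first planned feature, AND-combining the words of a search term, can be illustrated with a short sketch. It contrasts AND with OR combination on a made-up index; the function name and data are hypothetical, and the sketch says nothing about how the feature would actually be implemented.

```python
def search(index: dict[str, set[str]], phrase: str, combine_with_and: bool) -> list[str]:
    """Match products whose terms contain all (AND) or any (OR) query words."""
    words = phrase.lower().split()
    if not words:
        return []
    matcher = all if combine_with_and else any
    return [
        name
        for name, terms in index.items()
        if matcher(word in terms for word in words)
    ]


# Made-up index: product name mapped to its indexed terms.
index = {
    "Red wool blanket": {"red", "wool", "blanket"},
    "Blue wool blanket": {"blue", "wool", "blanket"},
    "Red towel": {"red", "towel"},
}

or_hits = search(index, "red blanket", combine_with_and=False)
and_hits = search(index, "red blanket", combine_with_and=True)
```

With OR combination every additional word can only add hits, whereas with AND combination every additional word narrows the result, which is the behaviour described in the first list item.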