Your misconfigured Magento search module will hurt you

One of MageStack's many features is its native support for replacement Magento search engines; in an instant, you can enable a Sphinx instance, or a SOLR instance or Elastic - to replace MySQL. Replacing the search engine in Magento has immediate benefits,

  • Improved search accuracy
  • Improved search performance
  • Improved layered navigation performance
  • Reduced server load

We love Magento, but the native search is terrible, so dropping in a 3rd party module to replace the search functionality seems almost like a necessity.

What to choose

There are plenty of search modules out there, but we do tend to see a number of modules used very frequently. They each have their pro's/con's - and that's beyond the scope of this post to articulate all of them, as my focus is solely going to be on performance.

I touched on this briefly in one of my preparing for peak articles, improving Magento TTFB - which is certainly worth a read.

Real vs. emulated engines

Most modules out there introduce functionality via two different means,

  1. Emulated
    Where the search functionality is achieved using PHP methods, or extended MySQL queries. You'll still see an improvement over the standard Magento FullText search, but it will come at the penalty of greatly increased CPU utilisation. The reason being is that whilst PHP is a very versatile language, it cannot offer the performance of an application dedicated strictly to searching.

  2. Dedicated
    Where there is a dedicated piece of software running on the stack, built solely for the purpose of search. These not only offer accuracy improvements that often beat emulated search - but offer massively increased performance and scalability.

Who would want to use an emulated search engine then?

That would be users who host their stores on infrastructure that doesn't support dedicated search engines. This could range from,

  • Shared hosting, where is isn't possible to provide a dedicated search engine per customer
  • Low end hosting (VPS/Cloud), where the physical constraints (CPU/RAM/Disk I/O) of the environment cannot support the demands of a dedicated search engine
  • Standard server hosting, where the basic control panel (cPanel/Plesk/Interworx) cannot facilitate a dedicated search engine
  • DIY hosting, where the expertise of the system administrator doesn't extend so far as to configuring a dedicated search engine

Whilst we would love the world to have Sonassi Magento hosting and leverage the dedicated search engines we provide as standard - we're not always going to be the right fit for everyone. So it does mean that 3rd party search extension makers need to make their modules flexible enough to work in any environment; even if it means there is a performance overhead.

After all, I'm sure most merchants would prefer a relatively slow, accurate search - over a fast, inaccurate search.

Popular choices

Again, this isn't meant to be a definitive comparison of all search modules, but merely those we see most often. Some of these modules are emulated only, and some offer dedicated, with emulated fallback for compatibility.

Module Engine Type
PHP4U Blast Lucene PHP Emulated
Solarium SOLR Dedicated
Magento Enterprise Search SOLR Dedicated
SOLR Bridge SOLR Dedicated
IntegerNet Solr SOLR Dedicated
Mirasvit Sphinx Search Sphinx Dedicated/Emulated Fallback
Bubbleshop Elastic ElasticSearch Dedicated


A real world example

So you've selected a module that is a good fit for your needs, uses a high performance dedicated engine, installed it - and tested the results. It all looks great. But oddly, server loads have increased and your search results are slower to display - what's going on?.

Over the past week, I've seen 5 stores using Mirasvit Sphinx search, set to run in emulated mode - not dedicated mode; that's what spurred me to write up this article. The performance penalty of this particular module in emulated mode is extremely significant, and not just in the magnitudes of seconds - but tens of seconds.

Identifying the problem

Date: Sat, 05 Mar 2016 10:45:32
Action: alert
Host: dh1.cX.sonassihosting.com
Description: loadavg(5min) of 23.1 matches resource limit [loadavg(5min)<13.0]
Remedy: Autoscale initiating

Taking an isolated incident today with a customer on Intensive support, after our proactive monitoring picked up increased server load and MageStack prepared itself to auto-scale. As a matter of course, we always perform a preliminary investigation to identify why the stack wants to scale - so that we can avoid a customer paying for additional resources if they don't need to be.

The symptoms of the issue were,

  • Web node errors PHP Errors server reached pm.max_children
  • DB node high load (1,000% MySQL CPU)

So upon checking the MySQL query log, it built a picture as to what the cause was.

        Id      User         Host/IP         DB              Cmd Query or State
        --      ----         -------         --              --- --------------
   3170280      example_     web1            example_magliv  Query SELECT `s`.`product_id`, (CASE WHEN `s`.`name` LIKE " product cubapro converion kit 700r to 700 category 28 " THEN 400 ELSE 0 END+CASE WHEN `s`.`name` LIKE "% product cubapro converion kit 700r to 700 c

Normal Magento search queries don't look like this - and certainly don't take so long to execute that they appear in the slow query log. So it is indicative of a 3rd party module performing a more complex MySQL query than usual. Locating the module was easy by scanning for a partial in the query,

[http]$ grep -ri "ELSE 0" app/code/local/Mirasvit/
app/code/local/Mirasvit/SearchSphinx/Model/Engine/Fulltext.php:                $fullCases[] = 'CASE WHEN '.$cond.' THEN '.$weight.' ELSE 0 END';

With the culprit found - and this being a known issue. The settings in the admin for the store were checked to reveal the module was running in emulated mode,

Magento Sphinx Emulated

So the settings were corrected, to put the engine into dedicated mode, leveraging the dedicated high performance Sphinx engine in MageStack - and to disable "Search on 404" (read more on why Magento 404's are bad for performance)

Magento Sphinx Dedicated

After making the changes, MageStack rebuilt the Sphinx core for the 12,000 products and reindexed within 17 seconds,

[sphinx1 .microcloud /]
[/]$ magestack sphinx reload
Building core for example.com ...
-
using config file '/etc/sphinxsearch/sphinx.conf'...
indexing index 'examplecom'...
collected 123397 docs, 150.8 MB
sorted 21.0 Mhits, 100.0% done
WARNING: 707 duplicate document id pairs found
total 123397 docs, 150759315 bytes
total 17.310 sec, 8709077 bytes/sec, 7128.40 docs/sec
-
rotating indices: successfully sent SIGHUP to searchd (pid=24641).
Restarting sphinxsearch: Ok
-
using config file '/etc/sphinxsearch/sphinx.conf'...
listening on 127.0.0.1:9315
listening on 172.16.0.51:9315
precaching index 'examplecom'
rotating index 'examplecom': success
precached 1 index in 0.022 sec
sphinxsearch is restarted.

The performance impact

The results were immediately noticeable with a huge immediate drop in both CPU utilisation, from a whopping 90% CPU usage to a healthy 10%. Healthy CPU usage is always considered to be <25% - above this, performance will degrade.

Magento Sphinx CPU

Average stack load time dropped from near 20 seconds TTFB, to a customer and SEO pleasing 0.2s TTFB,

Magento Sphinx Load Time

The financial impact of this simple misconfiguration of two settings would have equated to the auto-scaling of a single 40-Core overflow server, at £60.49/day - and that's not to mention the measurable loss in sales and the potential SEO impact from a search engine seeing 20+ second page load times.

[nectar_btn size="large" button_style="regular" button_color_2="Extra-Color-1" solid_text_color_override="#13348e" class="button button-bigger" text="Read More: Quick Magento performance wins" url="https://www.sonassi.com/blog/quick-magento-performance-wins" /]