Your misconfigured Magento search module will hurt you
One of MageStack's many features is its native support for replacement Magento search engines; in an instant, you can enable a Sphinx instance, or a SOLR instance or Elastic - to replace MySQL. Replacing the search engine in Magento has immediate benefits,
- Improved search accuracy
- Improved search performance
- Improved layered navigation performance
- Reduced server load
We love Magento, but the native search is terrible, so dropping in a 3rd party module to replace the search functionality seems almost like a necessity.
What to choose
There are plenty of search modules out there, but we do tend to see a number of modules used very frequently. They each have their pro's/con's - and that's beyond the scope of this post to articulate all of them, as my focus is solely going to be on performance.
I touched on this briefly in one of my preparing for peak articles, improving Magento TTFB - which is certainly worth a read.
Real vs. emulated engines
Most modules out there introduce functionality via two different means,
-
Emulated
Where the search functionality is achieved using PHP methods, or extended MySQL queries. You'll still see an improvement over the standard Magento FullText search, but it will come at the penalty of greatly increased CPU utilisation. The reason being is that whilst PHP is a very versatile language, it cannot offer the performance of an application dedicated strictly to searching. - Dedicated
Where there is a dedicated piece of software running on the stack, built solely for the purpose of search. These not only offer accuracy improvements that often beat emulated search - but offer massively increased performance and scalability.
Who would want to use an emulated search engine then?
That would be users who host their stores on infrastructure that doesn't support dedicated search engines. This could range from,
- Shared hosting, where is isn't possible to provide a dedicated search engine per customer
- Low end hosting (VPS/Cloud), where the physical constraints (CPU/RAM/Disk I/O) of the environment cannot support the demands of a dedicated search engine
- Standard server hosting, where the basic control panel (cPanel/Plesk/Interworx) cannot facilitate a dedicated search engine
- DIY hosting, where the expertise of the system administrator doesn't extend so far as to configuring a dedicated search engine
Whilst we would love the world to have Sonassi Magento hosting and leverage the dedicated search engines we provide as standard - we're not always going to be the right fit for everyone. So it does mean that 3rd party search extension makers need to make their modules flexible enough to work in any environment; even if it means there is a performance overhead.
After all, I'm sure most merchants would prefer a relatively slow, accurate search - over a fast, inaccurate search.
Popular choices
Again, this isn't meant to be a definitive comparison of all search modules, but merely those we see most often. Some of these modules are emulated only, and some offer dedicated, with emulated fallback for compatibility.
Module | Engine | Type |
---|---|---|
PHP4U Blast Lucene | PHP | Emulated |
Solarium | SOLR | Dedicated |
Magento Enterprise Search | SOLR | Dedicated |
SOLR Bridge | SOLR | Dedicated |
IntegerNet Solr | SOLR | Dedicated |
Mirasvit Sphinx Search | Sphinx | Dedicated/Emulated Fallback |
Bubbleshop Elastic | ElasticSearch | Dedicated |
A real world example
So you've selected a module that is a good fit for your needs, uses a high performance dedicated engine, installed it - and tested the results. It all looks great. But oddly, server loads have increased and your search results are slower to display - what's going on?.
Over the past week, I've seen 5 stores using Mirasvit Sphinx search, set to run in emulated mode - not dedicated mode; that's what spurred me to write up this article. The performance penalty of this particular module in emulated mode is extremely significant, and not just in the magnitudes of seconds - but tens of seconds.
Identifying the problem
Date: Sat, 05 Mar 2016 10:45:32
Action: alert
Host: dh1.cX.sonassihosting.com
Description: loadavg(5min) of 23.1 matches resource limit [loadavg(5min)<13.0]
Remedy: Autoscale initiating
Taking an isolated incident today with a customer on Intensive support, after our proactive monitoring picked up increased server load and MageStack prepared itself to auto-scale. As a matter of course, we always perform a preliminary investigation to identify why the stack wants to scale - so that we can avoid a customer paying for additional resources if they don't need to be.
The symptoms of the issue were,
- Web node errors PHP Errors
server reached pm.max_children
- DB node high load (1,000% MySQL CPU)
So upon checking the MySQL query log, it built a picture as to what the cause was.
Id User Host/IP DB Cmd Query or State
-- ---- ------- -- --- --------------
3170280 example_ web1 example_magliv Query SELECT `s`.`product_id`, (CASE WHEN `s`.`name` LIKE " product cubapro converion kit 700r to 700 category 28 " THEN 400 ELSE 0 END+CASE WHEN `s`.`name` LIKE "% product cubapro converion kit 700r to 700 c
Normal Magento search queries don't look like this - and certainly don't take so long to execute that they appear in the slow query log. So it is indicative of a 3rd party module performing a more complex MySQL query than usual. Locating the module was easy by scanning for a partial in the query,
[http]$ grep -ri "ELSE 0" app/code/local/Mirasvit/
app/code/local/Mirasvit/SearchSphinx/Model/Engine/Fulltext.php: $fullCases[] = 'CASE WHEN '.$cond.' THEN '.$weight.' ELSE 0 END';
With the culprit found - and this being a known issue. The settings in the admin for the store were checked to reveal the module was running in emulated mode,
So the settings were corrected, to put the engine into dedicated mode, leveraging the dedicated high performance Sphinx engine in MageStack - and to disable "Search on 404" (read more on why Magento 404's are bad for performance)
After making the changes, MageStack rebuilt the Sphinx core for the 12,000 products and reindexed within 17 seconds,
[sphinx1 .microcloud /]
[/]$ magestack sphinx reload
Building core for example.com ...
-
using config file '/etc/sphinxsearch/sphinx.conf'...
indexing index 'examplecom'...
collected 123397 docs, 150.8 MB
sorted 21.0 Mhits, 100.0% done
WARNING: 707 duplicate document id pairs found
total 123397 docs, 150759315 bytes
total 17.310 sec, 8709077 bytes/sec, 7128.40 docs/sec
-
rotating indices: successfully sent SIGHUP to searchd (pid=24641).
Restarting sphinxsearch: Ok
-
using config file '/etc/sphinxsearch/sphinx.conf'...
listening on 127.0.0.1:9315
listening on 172.16.0.51:9315
precaching index 'examplecom'
rotating index 'examplecom': success
precached 1 index in 0.022 sec
sphinxsearch is restarted.
The performance impact
The results were immediately noticeable with a huge immediate drop in both CPU utilisation, from a whopping 90% CPU usage to a healthy 10%. Healthy CPU usage is always considered to be <25% - above this, performance will degrade.
Average stack load time dropped from near 20 seconds TTFB, to a customer and SEO pleasing 0.2s TTFB,
The financial impact of this simple misconfiguration of two settings would have equated to the auto-scaling of a single 40-Core overflow server, at £60.49/day - and that's not to mention the measurable loss in sales and the potential SEO impact from a search engine seeing 20+ second page load times.