The new Data Search capability we introduced last month needed a server-side mechanism for data indexing and data discovery, to enable fast searching of data and insights. We benchmarked Solr and Elasticsearch and found Solr significantly faster and more robust in the area of NLP/ML, so we went ahead and integrated Solr into Germain.
Benchmark
- Indexed 1 million rows in less than 4 minutes on our own 3-year-old desktop, without any tuning
- Searches across these 1 million rows take at most 500 ms
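To put the indexing figure in perspective, it can be converted to an average throughput. The snippet below is only a back-of-the-envelope sketch: the function name `rows_per_second` is ours, and it treats the "less than 4 minutes" figure as an upper bound on the elapsed time, so the result is a lower bound on throughput.

```python
def rows_per_second(rows: int, minutes: float) -> float:
    """Average indexing throughput for a run that took `minutes` minutes."""
    return rows / (minutes * 60)


ROWS = 1_000_000          # rows indexed in the benchmark
MAX_INDEX_MINUTES = 4     # "less than 4 minutes", used here as an upper bound

# Because the run took less than 4 minutes, this is a minimum rate.
min_rate = rows_per_second(ROWS, MAX_INDEX_MINUTES)
print(f"Solr indexed at least {min_rate:,.0f} rows per second")
```

Under these assumptions the desktop sustained a little over 4,000 rows per second on average, with no tuning applied.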
Comparison of Apache Solr & Elasticsearch
| | Solr | Elasticsearch |
|---|---|---|
| Index speed, 1 million rows (out of the box) | ~4 min | ~22 min |
| Index speed, 1 million rows (with simple optimizations) | not tested | ~8 min |
| Index size | ~500 MB | ~750 MB |
| Requires additional tool/software to pull from DB and insert into search platform | No | Yes (Logstash) |
| Simple query API | Yes | Yes |
| Built-in scheduler for updates | No | Yes (Logstash) |
| Returns entire document as search result | Yes | Yes |
| Full-text search features (misspelling, synonyms, ...) | Yes (very advanced) | Yes |
| Overall application | Text search | Analytical querying, filtering, and grouping |
| Nested documents support | No | Yes |
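As an illustration of the simple query API and the misspelling tolerance listed in the comparison, the sketch below builds a request URL for Solr's standard `/select` handler and appends the Lucene fuzzy operator (`~N`) to each search term. This is not Germain's actual integration: the host, the core name `germain_data`, and the helper `build_solr_query_url` are hypothetical, and the query assumes the core has a default search field configured. The code only constructs the URL; it does not contact a server.

```python
from urllib.parse import urlencode


def build_solr_query_url(base_url: str, core: str, text: str,
                         fuzzy_distance: int = 1, rows: int = 10) -> str:
    """Build a Solr /select URL that tolerates small misspellings.

    Each whitespace-separated term gets the Lucene fuzzy suffix "~N",
    where N is the maximum edit distance allowed for a match.
    Special query characters in `text` are not escaped in this sketch.
    """
    fuzzy_terms = " ".join(f"{term}~{fuzzy_distance}" for term in text.split())
    params = {
        "q": fuzzy_terms,   # main query string
        "rows": rows,       # maximum number of documents to return
        "wt": "json",       # ask for a JSON response
    }
    return f"{base_url.rstrip('/')}/{core}/select?{urlencode(params)}"


# Hypothetical host and core name; "respnse" is deliberately misspelled.
url = build_solr_query_url("http://localhost:8983/solr", "germain_data",
                           "respnse time")
print(url)
```

With a fuzzy distance of 1, a term such as "respnse" can still match documents containing "response", which is the kind of misspelling handling the table refers to.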