
Backend - Integrated Apache Solr for Data Indexing and Discovery (<500 ms search in a 1-million-row dataset)

The Data Search capability we introduced last month needed a server-side mechanism for data indexing and discovery to enable fast searching of data and insights. We benchmarked Solr and ElasticSearch and found Solr significantly faster and more robust in the area of NLP/ML, so we went ahead and integrated Solr into Germain.

Benchmark

  • Indexed 1 million rows in less than 4 minutes on our own 3-year-old desktop, without any tuning
  • Searches across these 1 million rows take at most 500 ms
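A search-latency check like the one above could be sketched roughly as follows. This is a minimal illustration and not the harness we actually used: the host, the collection name `germain`, and the helper names are assumptions made for the example. The only Solr-specific part is the standard `/select` query handler with its `q`, `rows` and `wt` parameters.

```python
import time
from urllib.parse import urlencode
from urllib.request import urlopen


def build_select_url(base_url: str, collection: str, query: str, rows: int = 10) -> str:
    """Build a URL for Solr's standard /select query handler."""
    params = urlencode({"q": query, "rows": rows, "wt": "json"})
    return f"{base_url.rstrip('/')}/solr/{collection}/select?{params}"


def worst_latency_ms(search, queries) -> float:
    """Run each query through `search` and return the slowest wall-clock time in ms."""
    worst = 0.0
    for query in queries:
        start = time.perf_counter()
        search(query)
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        worst = max(worst, elapsed_ms)
    return worst


def http_search(query: str) -> bytes:
    """Hypothetical search callable: assumes a local Solr with a 'germain' collection."""
    url = build_select_url("http://localhost:8983", "germain", query)
    with urlopen(url, timeout=5) as response:
        return response.read()
```

With a running Solr instance, `worst_latency_ms(http_search, ["error", "timeout"])` returns the slowest observed response time, which can then be compared against the 500 ms figure. Passing the search function in as a parameter keeps the timing logic testable without a live server.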


Comparison of Apache Solr & ElasticSearch



| Criterion | Solr | ElasticSearch |
|---|---|---|
| Index speed, 1 million rows (out of the box) | ~4 min | ~22 min |
| Index speed, 1 million rows (with simple optimizations) | not tested | ~8 min |
| Index size | ~500 MB | ~750 MB |
| Requires additional tool/software to pull from DB and insert into search platform | No | Yes (Logstash) |
| Simple query API | Yes | Yes |
| Built-in scheduler for updates | No | Yes (Logstash) |
| Returns entire document as search result | Yes | Yes |
| Full-text search features (misspelling, synonyms, ..) | Yes (very advanced) | Yes |
| Overall application | Text search | Analytical querying, filtering, and grouping |
| Nested documents support | No | Yes |
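
To put the out-of-the-box numbers in perspective, the short calculation below turns them into throughput and ratios. The inputs are the approximate figures from the comparison, so the results are rough estimates rather than measured values; the variable and function names are made up for this example.

```python
ROWS = 1_000_000

# Approximate out-of-the-box figures taken from the comparison.
index_minutes = {"Solr": 4, "ElasticSearch": 22}
index_size_mb = {"Solr": 500, "ElasticSearch": 750}


def rows_per_second(rows: int, minutes: float) -> float:
    """Convert a total indexing time in minutes into average rows per second."""
    return rows / (minutes * 60)


solr_rate = rows_per_second(ROWS, index_minutes["Solr"])          # roughly 4,167 rows/s
es_rate = rows_per_second(ROWS, index_minutes["ElasticSearch"])   # roughly 758 rows/s

# How many times longer ElasticSearch took to index the same rows.
index_time_ratio = index_minutes["ElasticSearch"] / index_minutes["Solr"]   # 5.5

# How much larger the ElasticSearch index was on disk.
index_size_ratio = index_size_mb["ElasticSearch"] / index_size_mb["Solr"]   # 1.5
```

In other words, without tuning Solr indexed the same data about 5.5 times faster and produced an index about two thirds the size.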