Backend - Integrated Apache SolR for Data Indexing and Discovery (<500 ms search on a 1-million-row dataset)
The new Data Search capability we introduced last month needed a server-side data indexing and discovery mechanism to enable fast searching of data and insights. We benchmarked SolR and ElasticSearch and found SolR significantly faster at indexing and more robust in the area of NLP/ML, so we went ahead and integrated SolR into Germain.
Benchmark
- Indexed 1 million rows in less than 4 minutes on our own 3-year-old desktop, without any tuning
- Searches across these 1 million rows take at most 500 ms
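A search-latency figure like the one above can be reproduced with a small script against Solr's standard `/select` request handler. The following is a minimal sketch, not the harness used for this benchmark: the host, the collection name `germain_data`, and the field name in the sample query are hypothetical. It reports both the wall-clock time seen by the client and the `QTime` value that Solr returns in its response header.

```python
import json
import time
import urllib.parse
import urllib.request


def build_select_url(base_url, collection, query, rows=10):
    """Build a URL for Solr's standard /select request handler."""
    params = urllib.parse.urlencode({"q": query, "rows": rows, "wt": "json"})
    return f"{base_url.rstrip('/')}/solr/{collection}/select?{params}"


def timed_search(url, fetch=None):
    """Run one search and return (elapsed_ms, num_found, qtime_ms).

    `fetch` is an optional callable taking a URL and returning the raw
    JSON response body; it defaults to a plain HTTP GET. Passing a stub
    makes the function usable without a running Solr server.
    """
    if fetch is None:
        def fetch(u):
            with urllib.request.urlopen(u, timeout=10) as resp:
                return resp.read()

    start = time.perf_counter()
    body = fetch(url)
    elapsed_ms = (time.perf_counter() - start) * 1000.0

    data = json.loads(body)
    num_found = data["response"]["numFound"]   # total matching documents
    qtime_ms = data["responseHeader"]["QTime"]  # server-side query time (ms)
    return elapsed_ms, num_found, qtime_ms


# Example (assumes a Solr instance on localhost with a hypothetical
# collection "germain_data" and a hypothetical field "message"):
#
#   url = build_select_url("http://localhost:8983", "germain_data",
#                          "message:timeout")
#   print(timed_search(url))
```

The client-side time includes network and JSON transfer overhead, so it will normally be somewhat higher than `QTime`; comparing the two helps show where the time is spent.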
Comparison of Apache SolR & ElasticSearch
Feature | Solr | ElasticSearch |
Index Speed for 1 million rows (out of the box) | ~4 min | ~22 min |
Index Speed for 1 million rows (with simple optimizations) | not tested | ~8 min |
Index Size | ~500 MB | ~750 MB |
Requires additional tool/software to pull from DB and insert into search platform | No | Yes (Logstash) |
Simple Query API | Yes | Yes |
Built-in scheduler for updates | No | Yes (Logstash) |
Returns entire document as search result | Yes | Yes |
Full-Text Search Features (misspelling, synonyms, ...) | Yes (very advanced) | Yes |
Best suited for | Text search | Analytical querying, filtering, and grouping |
Nested documents support | No | Yes |
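As an illustration of the misspelling tolerance listed in the table, one mechanism Solr offers is the fuzzy term syntax of its standard query parser (`term~N`, where `N` is an edit distance of at most 2). The sketch below builds such a query string; whether Germain uses this particular mechanism is not stated here, and the field name `title` is a hypothetical placeholder.

```python
import re

# Characters and operators with special meaning in Solr's standard query parser.
_SPECIAL = re.compile(r'([+\-!(){}\[\]^"~*?:\\/]|&&|\|\|)')


def escape_term(term):
    """Backslash-escape query-parser special characters in a user term."""
    return _SPECIAL.sub(r'\\\1', term)


def fuzzy_query(field, text, max_edits=1):
    """Build a query requiring every word, each matched within max_edits edits.

    max_edits must be 0, 1, or 2; 0 produces an exact-term query.
    """
    if max_edits not in (0, 1, 2):
        raise ValueError("max_edits must be 0, 1, or 2")
    suffix = f"~{max_edits}" if max_edits else ""
    terms = [escape_term(t) for t in text.split()]
    return " AND ".join(f"{field}:{t}{suffix}" for t in terms)


# Example with a hypothetical field name:
#   fuzzy_query("title", "conection timeout", 2)
```

The resulting string can be passed as the `q` parameter of a search request. Escaping the user's input first keeps characters such as `:` or `~` from being interpreted as query operators.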