Elasticsearch schema (preferred)
Elasticsearch is a platform for real-time search, analytics, and logging. Germain prefers Elastic over any other datastore.
Basic high-level process to set up Elasticsearch as the data store for Germain. Please contact us for the script and more details.
Get a host for ES and Germain
Get a DB for the Germain config (SQL DB)
Export the config JSON from Stage Germain
Create a Temporary Germain Server
It will be used to set up the Elasticsearch database before data migration.
It will need a SQL database for the config.
Change the queue names to have new prefixes so that it can use the same ActiveMQ without interfering with the existing server
Set germain.elastic.properties.indexPrefix to something more meaningful (Stage)
Configure the connection to ES
No Services will be needed
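A sketch of the properties the temporary server might need; apart from `germain.elastic.properties.indexPrefix`, the property keys below are assumptions for illustration, not confirmed Germain settings:

```properties
# Assumed key: a new queue prefix so the temporary server can share
# the same ActiveMQ broker without interfering with the existing one
germain.jms.queuePrefix=TEMP_MIGRATION

# From this doc: a more meaningful index prefix for the source environment
germain.elastic.properties.indexPrefix=Stage

# Assumed keys: Elasticsearch connection for the temporary server
germain.elastic.hosts=http://es-host:9200
germain.elastic.username=germain
germain.elastic.password=********
```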
Set up the Elasticsearch DB
Make sure the Elastic DB host is in the same timezone as the data source (Central Time)
Import the search indexes from
Prepare the DB
Verify that it connects to the DB okay
Use the special REST endpoint that applies the rest of the Germain config to the ES DB
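One way to verify the connection is to hit Elasticsearch's standard `_cluster/health` endpoint. A minimal sketch in Python, assuming a placeholder host URL:

```python
import json
from urllib.request import urlopen

def cluster_is_ready(health_json: str) -> bool:
    """Return True if the cluster health status is usable (green or yellow)."""
    health = json.loads(health_json)
    return health.get("status") in ("green", "yellow")

def check_connection(es_url: str = "http://es-host:9200") -> bool:
    """Query the standard _cluster/health endpoint and evaluate the response."""
    with urlopen(f"{es_url}/_cluster/health") as resp:
        return cluster_is_ready(resp.read().decode("utf-8"))

# Example response shape from GET /_cluster/health:
sample = '{"cluster_name": "germain", "status": "yellow", "number_of_nodes": 1}'
# A yellow single-node cluster is acceptable for this check
ready = cluster_is_ready(sample)
```

`es-host:9200` is a placeholder; substitute the real host from your setup.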
Data Migration Script
Fill in the details in the migration tool's .yml file
Run the script once per data type, with arguments as per the examples (same folder as the tool)
Multiple instances can be run in parallel
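The per-type runs above could be sketched like this; the script name (`migrate.sh`), its arguments, and the data type names are hypothetical placeholders for the real tool and its examples:

```python
from concurrent.futures import ThreadPoolExecutor
import subprocess

# Hypothetical data types; substitute the real ones from the tool's examples.
DATA_TYPES = ["UserAction", "Transaction", "LogEvent"]

def migrate(data_type: str) -> int:
    """Run the migration script for one data type; return its exit code."""
    cmd = ["./migrate.sh", "--type", data_type]  # placeholder command
    return subprocess.run(cmd).returncode

def migrate_all(runner=migrate, parallelism: int = 3) -> dict:
    """Run one migration per data type, a few in parallel."""
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        return dict(zip(DATA_TYPES, pool.map(runner, DATA_TYPES)))
```

`runner` is injectable so the orchestration can be tested without the real script.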
I might adapt a script/tool to automate this
Query Validation for Elastic
Germain supports the validation of Elasticsearch queries.
Log on to Germain.
Left Menu > System > Engine Settings > 'Component Types' page.
Click the '+' icon.
Select 'Elasticsearch Query Monitor Component'.
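For illustration, a query of the shape such a monitor component might validate; the field names (`status`, `@timestamp`) are assumptions, but the structure is standard Elasticsearch query DSL:

```json
{
  "query": {
    "bool": {
      "filter": [
        { "term":  { "status": "ERROR" } },
        { "range": { "@timestamp": { "gte": "now-5m" } } }
      ]
    }
  }
}
```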
Percentiles or Std Deviation need raw data
Percentile and std deviation measures are not supported by Elastic's roll-up mechanism; only min, max, avg, sum, and count are. Germain/Elastic does support percentiles and std deviation as long as the raw data is still retained.
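A small sketch of why this is: additive roll-up measures recombine exactly across buckets, while a percentile cannot be recovered from per-bucket summaries:

```python
# Two "hours" of raw response times (illustrative values)
raw = [1, 1, 1, 1, 100,   2, 2, 2, 2, 2]
hour1, hour2 = raw[:5], raw[5:]

# Roll-up keeps only summaries like sum/count/min/max per hour:
rollup = [
    {"sum": sum(hour1), "count": len(hour1), "max": max(hour1)},
    {"sum": sum(hour2), "count": len(hour2), "max": max(hour2)},
]

# Exact from roll-up: the overall average recombines perfectly.
total_avg = sum(b["sum"] for b in rollup) / sum(b["count"] for b in rollup)

# Not recoverable from roll-up: a percentile needs the raw values.
def p90(values):
    """Nearest-rank 90th percentile (simplified)."""
    ordered = sorted(values)
    return ordered[int(0.9 * (len(ordered) - 1))]

true_p90 = p90(raw)  # computable only while raw data is retained
```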
Germain's "aggregation" equates to "roll-up" in Elastic: same basic idea, compressing data over time by reducing the level of detail.
Built-in support for roll-up at a configurable time, into variable time windows.
For now, Germain rolls up into an hourly index that can be kept past the raw data window.
A single API call queries both the raw and aggregated indexes; results are merged automatically.
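Conceptually, the merge might look like this sketch (simplified; field names are illustrative, not Germain's actual result format):

```python
# Merge query results from a raw index and an hourly roll-up index,
# preferring raw data points where both exist for the same timestamp.
def merge_results(raw_hits, rollup_hits):
    """Combine per-timestamp metric points; raw wins on overlap."""
    merged = {h["ts"]: h for h in rollup_hits}
    merged.update({h["ts"]: h for h in raw_hits})  # raw overrides roll-up
    return sorted(merged.values(), key=lambda h: h["ts"])
```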
Time-series data streams are generally append-only/write-once; to update data, we either use the Elastic update API or push the update directly to the underlying index.
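For example, an update pushed directly to a backing index uses Elasticsearch's standard `_update` API; the index name and document id below are placeholders:

```json
POST .ds-germain-facts-000001/_update/abc123
{
  "doc": { "status": "RESOLVED" }
}
```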
For our fact data streams, we use 1-day "hot" storage, then read-only "cold" storage for as long as raw retention is configured.
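This hot/cold lifecycle maps naturally onto an Elasticsearch ILM policy; a sketch, with the delete age standing in for whatever raw retention is configured:

```json
{
  "policy": {
    "phases": {
      "hot":    { "min_age": "0ms", "actions": { "rollover": { "max_age": "1d" } } },
      "cold":   { "min_age": "1d",  "actions": { "readonly": {} } },
      "delete": { "min_age": "30d", "actions": { "delete": {} } }
    }
  }
}
```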