Join us on Thursday, October 24th for snacks, drinks, and a presentation from Wyn Bennett of Simon Data. The talk will cover offline creation of Elasticsearch indices with Hadoop MapReduce and loading those indices into a live cluster.
At Simon Data we make heavy use of Elasticsearch. Our heavy use meant that we needed to index hundreds of millions of documents in Elasticsearch multiple times per day on shared clusters without impacting performance. The typical bulk-load approach simply was not working: bulk loading our clusters degraded their read performance and, on many occasions, took a cluster down entirely. We decided a better approach would be to build the index offline and then use Elasticsearch's restore functionality to bring the offline index into the live cluster. This cut our load times from hours to minutes and allowed for seamless index updates on a live cluster. This talk will break down how we accomplished this and how other teams can do the same.