Elasticsearch in Production, Real-Time Geo-Replication and Custom Tokenization

Name: Elasticsearch in Production, Real-Time Geo-Replication and Custom Tokenization
Start: 2014-10-14T18:00:00-04:00
End: 2014-10-14T20:00:00-04:00
Location: hack/reduce

Boston

Oct 14, 2014, 10:00 PM – Oct 15, 2014, 12:00 AM (UTC)


About this event

Join us for three great talks. As usual, food will be provided. We recommend you show up around 6pm to get a chance to chat and network before the talks begin:
Elasticsearch in production
Elasticsearch lets you build amazing things with little effort, and it has gone to great lengths to make Lucene's features readily available in a distributed setting. However, when it comes to running Elasticsearch in production, you still have a fairly complicated system on your hands: one with high demands on network stability, a huge appetite for memory, and an assumption that all users are trustworthy. This talk will cover some of the lessons we've learned from securing and herding hundreds of Elasticsearch clusters.
Presenter: Konrad Beiske is a senior software engineer at Found AS, a company whose primary product is a hosted Elasticsearch service. Konrad holds a master’s degree in Computer Science, with an emphasis on databases and distributed systems, and has focused on Elasticsearch for the past two years. He gives presentations about Elasticsearch and distributed systems at meetups and conferences, and he writes regularly on the Foundation blog.
Implementing Real-Time Geo-Replication with ElasticSearch
ElasticSearch is a phenomenal data store: its easy approach to scalability using symmetric nodes has dramatically improved the way we operate scalable persistence services. Despite this huge success, some challenges remain, especially in operating ElasticSearch across geographically distant clusters for fault tolerance and disaster recovery. In this talk, I'd like to share a set of new, open source ElasticSearch plugins I'm building that use the PubNub fault-tolerant global data stream network as a medium for cross-cluster document replication and indexing. These include a storage event listener for propagating document changes and a new ElasticSearch River for indexing. I'd love to get feedback on these use cases and on the plugins' design and implementation, and also to hear about the geo-replication challenges other folks might be facing. I'll have the code up on GitHub with a HOWTO and a downloadable demo bundle that folks can try if they'd like to follow along during the presentation.
Presenter: Sunny Gleason is founder and Cloud Guy at SunnyCloud, a company that provides Cloud, Web & Mobile application development, hosting and operations to businesses in the cloud. He specializes in real-time protocols and scalable persistence solutions. Before all that, Sunny was a Platform Engineer developing Cloud Computing solutions at Ning and Amazon.com.
ElasticSearch Custom Tokenization
We will explore the various techniques available in ElasticSearch / Lucene for tokenizing text, with special emphasis on the benefits of the StandardTokenizer. In addition, we'll walk through a new ElasticSearch plugin (developed at Traackr) that lets users customize the default word segmentation rules implemented by Lucene's StandardTokenizer.
Presenter: Bryan Warner is Lead Engineer at Traackr, an Influencer Management Search Engine and Platform. Among other things, Bryan specializes in Traackr's Scala-based API and in leveraging ElasticSearch's capabilities to drive everything from search to analytics for marketing professionals.