Elasticsearch at Stack Overflow

Stack Exchange - 110 William St New York New York City
Tue, Oct 22, 2019, 6:00 PM (EDT)

74 RSVP'ed

About this event

Heya all!

As we kick into fall, we are super excited to announce our next Meetup at Stack Overflow. We welcome Andrew Montalenti, co-founder and CTO of Parse.ly, as our guest speaker. Andrew will be giving a talk titled: Improving High-Cardinality Analytics with Index-Stored HyperLogLogs

From the Elastic side, Peter Soderberg, Elastic Solutions Architect, will be speaking about the first open source APM: Elastic APM.

NB: Stack Overflow is located on the 28th floor.

**BONUS: We will be giving away a free ticket to the Elastic{ON} Tour - NY at this event.** (https://www.elastic.co/elasticon/tour/new-york)

----

Talk 1: Improving High-Cardinality Analytics with Index-Stored HyperLogLogs

Abstract:
One of Elasticsearch's most powerful analytics features is the "cardinality" aggregation, which can do blazing-fast distinct counts across millions of high-cardinality document values by leveraging the probabilistic data structure, HyperLogLog++ (aka HLL). In this presentation, Andrew Montalenti, the co-founder and CTO of Parse.ly, will discuss the open source work his team has done to bring "index-stored HLLs" to Elasticsearch. We'll begin with a discussion of what HLL is and how it works -- in other words, how cardinality aggregations work under the hood. Then we'll discuss why massive cost and performance savings (on the order of 10x) can come from index-storage of serialized HLLs, with the key trade-off being individual value searchability. Finally, we'll showcase how a small team of (primarily Python) programmers managed to get its head around Elasticsearch's Java codebase in order to build a new custom aggregation and a new index type to support this use case, and why we're releasing this functionality as open source. We'll close with a discussion of the broader effort for index-stored data sketches in ES, which, in the future, might include the percentiles aggregation and its TDigest data structure.
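
For readers who want a feel for the idea before the talk, here is a minimal sketch of a HyperLogLog in Python. It is a toy version of the original HLL algorithm, not the HLL++ variant Elasticsearch uses and not the code from the talk. The class name `HyperLogLog`, the SHA-1 based hash, and the default of 12 index bits are all assumptions made for illustration.

```python
import hashlib
import math


class HyperLogLog:
    """Toy HyperLogLog (original algorithm, not HLL++). Illustrative only."""

    def __init__(self, p: int = 12) -> None:
        self.p = p                      # number of bits used to pick a register
        self.m = 1 << p                 # number of registers (4096 for p=12)
        self.registers = [0] * self.m

    def add(self, value: str) -> None:
        # Assumption: a 64-bit hash taken from the first 8 bytes of SHA-1.
        digest = hashlib.sha1(value.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        idx = h >> (64 - self.p)                   # top p bits choose the register
        rest = h & ((1 << (64 - self.p)) - 1)      # remaining 64 - p bits
        # Rank = 1-based position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def merge(self, other: "HyperLogLog") -> "HyperLogLog":
        # Merging is an element-wise max, so sketches built separately
        # (for example, one per document) can be combined later.
        if self.p != other.p:
            raise ValueError("cannot merge sketches with different precision")
        merged = HyperLogLog(self.p)
        merged.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return merged

    def count(self) -> float:
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)           # bias constant, valid for m >= 128
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            return m * math.log(m / zeros)         # small-range (linear counting) correction
        return raw


# Two sketches with overlapping visitors, then merged into one.
day1, day2 = HyperLogLog(), HyperLogLog()
for i in range(30000):
    day1.add(f"user-{i}")
for i in range(20000, 50000):
    day2.add(f"user-{i}")

merged = day1.merge(day2)
print(round(merged.count()))   # an estimate of the 50,000 distinct users
```

The sketch keeps only 4,096 small registers no matter how many values are added, and its typical relative error is roughly 1.04 / sqrt(4096), or about 1.6%. Because `merge` needs only the registers, a serialized sketch can stand in for the raw values at aggregation time, which is the trade-off the abstract describes: the individual values are no longer searchable.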

Bio:
Andrew Montalenti (@amontalenti) is the co-founder and CTO of Parse.ly (https://parse.ly), the creator of the top audience analytics system for content teams. Parse.ly tech is leveraged by top sites like Ars Technica, Bloomberg, and The Wall Street Journal, empowering content pros with real-time & historical analytics over all their web assets, in every digital channel. Andrew has over a decade of experience in finance, high tech, and online media, and earned his Computer Science degree at NYU. As a dedicated Pythonista, JavaScript hacker, and open source advocate, he works at the intersection of large-scale distributed systems, real-time measurement, and content analysis technologies. Relevant to Elasticsearch users, he is the author of "Lucene: The Good Parts", and was a presenter at Elastic{ON} in 2015, on "Web Content Analytics at Scale". He has also presented at PyData, PyCon, and several other technology conferences.

Talk 2: Elastic APM

Abstract:
Over the last few years, Elastic has focused on optimizing the Elastic Stack for the use cases adopted by our community. This talk will focus on Elastic APM - the first open source APM solution. We'll explore instrumenting an application, the data model for metrics and logs, distributed tracing, and real user monitoring -- the tools that enable real-time monitoring and intuitive troubleshooting.

Bio:
Peter Soderberg is a Solutions Architect at Elastic based in Brooklyn, NY. He has spent years gleaning insights from diverse data, using tools like Elasticsearch and the Hadoop ecosystem.

---

Looking forward to seeing everyone in a few weeks!

PS: For those who missed our July Meetup at Vimeo, you can find the recording here: vim.io/2XLxPeR .

Kind regards,
Danielle

When

Tuesday, Oct 22
6:00 PM - 8:00 PM (EDT)

Where

Stack Exchange
110 William St New York
