Providing Metadata Discovery on Large-Volume Data Sets at eBay

Silicon Valley

Feb 8, 2019, 2:00 – 4:00 AM UTC (Thursday, Feb 7, 6:00 – 8:00 PM Pacific)


About this event

Join us for our first meetup of 2019! The agenda for the evening is:

6:00 pm: Doors open
6:30 pm: "Providing Metadata Discovery on Large-Volume Data Sets at eBay"
7:45 pm: We'll wrap things up

Talk Abstract:

Many big data systems collect petabytes of data on a daily basis. Such systems are often designed primarily to query raw data records for a given time range with multiple data filters. However, discovering or identifying the unique attributes present in such large datasets can be difficult. Runtime aggregations over large datasets, for example finding the unique hosts that emitted logs for an application within a particular time range, require high computational power and can be extremely slow. Sampling the raw data is one option for attribute discovery; however, such an approach would miss sparse or rare attributes within large volumes of data.
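To make the sampling trade-off concrete, here is a toy illustration on synthetic data (not eBay data): four busy hosts log thousands of records while one hypothetical host, `rare-host`, logs exactly once. An exact scan finds every distinct host, while a roughly 1% systematic sample (every 97th record, an arbitrary choice for this sketch) finds the busy hosts but misses the rare one.

```python
# Toy, synthetic data: 10,000 log records from four busy hosts,
# plus one hypothetical host that logged exactly once.
records = [f"web-{i % 4}" for i in range(10_000)]
records[4321] = "rare-host"

# Exact discovery: scan every record and keep every distinct value.
exact = set(records)

# Sampling-based discovery: keep roughly 1% of records (every 97th one).
sampled = set(records[::97])

print(sorted(exact - sampled))  # ['rare-host']
```

Any sampling rate below 100% carries this risk: an attribute that appears in only a handful of records is likely to be skipped, which is why guaranteed discovery requires looking at every record, or at least every distinct attribute set.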

The metadata store is our internal system for providing guaranteed real-time discovery of all the unique attributes (or metadata) within truly massive volumes of different monitoring signals. Its backend relies primarily on Elasticsearch and RocksDB. Elasticsearch enables aggregations that find the unique attributes over a time range, while RocksDB lets us de-duplicate identical hashes within a time window to avoid redundant writes.
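The write and read paths in this design can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not eBay's implementation: a plain in-memory dict (`InMemoryKV`, hypothetical) stands in for RocksDB, the hash is a SHA-256 over a canonical JSON encoding of the attribute set, and the one-hour window, the `window_start` field, and the helpers `DedupingMetadataWriter` and `unique_values_query` are all hypothetical names. The query function only builds an Elasticsearch request body (a `range` filter plus a `terms` aggregation); it does not contact a cluster.

```python
import hashlib
import json
from typing import Dict, List, Optional


class InMemoryKV:
    """Hypothetical stand-in for RocksDB: a simple in-memory key-value map."""

    def __init__(self) -> None:
        self._data: Dict[bytes, bytes] = {}

    def get(self, key: bytes) -> Optional[bytes]:
        return self._data.get(key)

    def put(self, key: bytes, value: bytes) -> None:
        self._data[key] = value


def attribute_hash(attributes: Dict[str, str]) -> str:
    """Stable hash of an attribute set, independent of key order."""
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class DedupingMetadataWriter:
    """Forwards each distinct attribute set at most once per time window."""

    def __init__(self, kv: InMemoryKV, window_seconds: int = 3600) -> None:
        self.kv = kv
        self.window_seconds = window_seconds
        # Documents that would be bulk-indexed into Elasticsearch.
        self.pending_docs: List[dict] = []

    def observe(self, attributes: Dict[str, str], ts: int) -> bool:
        """Return True if the attribute set is new for this window."""
        window_start = (ts // self.window_seconds) * self.window_seconds
        h = attribute_hash(attributes)
        key = f"{window_start}:{h}".encode("utf-8")
        if self.kv.get(key) is not None:
            return False  # Already seen in this window: skip the redundant write.
        self.kv.put(key, b"1")
        self.pending_docs.append({**attributes, "window_start": window_start, "hash": h})
        return True


def unique_values_query(field: str, gte: int, lt: int, size: int = 1000) -> dict:
    """Elasticsearch request body: distinct values of `field` within [gte, lt)."""
    return {
        "size": 0,  # Only the aggregation result is needed, not the hits.
        "query": {"range": {"window_start": {"gte": gte, "lt": lt}}},
        "aggs": {"unique_values": {"terms": {"field": field, "size": size}}},
    }
```

For example, observing `{"app": "checkout", "host": "web-1"}` at epoch second 1549584000 returns `True`, the same set with keys reordered 1,000 seconds later returns `False`, and the same set in the next hour returns `True` again. The `terms` aggregation assumes the attribute (here `host`) is mapped as a `keyword` field, and its `size` caps how many distinct values come back, so very high-cardinality attributes would need paging, for instance with a `composite` aggregation.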

About our speakers:

Sudeep Kumar has 10+ years of software development experience across the e-commerce, embedded systems, and telecom domains. He is currently working on an Elasticsearch-as-a-service initiative within eBay's platform group. Sudeep's tech interests lie in solving big data problems and building scalable, resilient application platforms and frameworks.

Saurabh Mehta is an engineering leader for the Telemetry & Monitoring platform at eBay. He currently leads a team of senior engineers working on eBay's telemetry and monitoring systems, which scale to petabytes of data and billions of data points per day. He has more than 12 years of experience in the technology space. He is passionate about learning, problem solving, and working on big data and distributed systems.



Friday, February 8, 2019
2:00 AM – 4:00 AM UTC
