Data Mastering at Scale with AI and Elasticsearch


Mar 10, 2021, 8:30 – 9:30 AM


About this event

This meetup is hosted in English.


Data Mastering at Scale with AI and Elasticsearch

Speaker: Sonal Goyal

Sonal is the founder of an enterprise startup working in the areas of data unification and consolidation. Aficx enables enterprises to break data silos and build holistic views of customers, vendors and accounts for analytics, deduplication, AML and data quality. Sonal is a program committee member of Strata Data and AI and a repeat speaker at Spark Summit and Strata.

Speech abstract: Disparate data sources are a big hurdle to enterprise sales and marketing, supply chain optimization, compliance and risk modeling. The lack of a trusted, unified view of customers, suppliers, products and parts affects personalization, analytics, cross-selling, recommendations, spend optimization and other core business functions. Unifying this data is challenging because of schema-level and record-level variations such as typos, missing fields and abbreviations. The scale of the data and the variety of systems and formats make it a tough problem to solve. In this talk, we describe how we are leveraging Elasticsearch and ML over Spark to provide a unified view of mastered customers, suppliers, supplies and other entities. Elastic is a core part of our application, enabling quick access to and discovery of mastered records at scale.

Smart analysis using Elasticsearch

Speaker: Vivek Pemawat

Vivek works at Cloudera on the Infrastructure and Tools teams as a Staff Software Engineer, leading multiple infrastructure and tooling projects such as logging and monitoring infrastructure, a log analysis tool and a workflow platform. Vivek has more than 10 years of experience in infrastructure and tooling.

Speech abstract: In this talk, Vivek will present how Cloudera used Elasticsearch to build a log analysis tool that automates the manual debugging of system test (ST) and end-to-end (E2E) runs at scale and makes engineers' lives easier.

The tool automatically suggests a Jira issue and performs root cause analysis (RCA) of failures, with terabytes of logs ingested daily.

We will cover these topics:

- How we built the tool and eased the process of finding the root cause

- Ingest and store logs in Elasticsearch for ST and E2E runs.

- Learning from data to suggest the root cause of an issue during failure analysis

- Viewing sliced logs using Elasticsearch and Kibana

- Deployment story of the Elasticsearch stack on OCP

- Streaming of logs for all microservices running on OCP
