Basis Technology is graciously hosting this Elasticsearch meetup at their Alewife location. Come enjoy some food, refreshments and networking at 6pm before the talk begins.
Topic: An Elasticsearch Plugin for Simple Fuzzy Name Matching
Normalization is crucial to high-quality search results -- who wants irrelevant variations between queries and documents leading to missed hits (e.g., “celebrity” vs. “celebrities”)? Normalizing dictionary words works, but what if your application focuses on names? Whether you’re tackling log analysis, e-commerce, watch list screening or other applications, names are often the key. Can you find “Abdul Jabbar, Karim” if you search for “Kareem AbdalJabar” or “كريم عبد الجبار”?
Applications using Elasticsearch provide some fuzziness by mixing its built-in edit-distance matching and phonetic analysis with more generic analyzers and filters (see example #1 or #2). We’ve tried to go beyond that to provide both better matching and a simpler integration. We use a custom Mapper and Score Function so that linguistic nuances can be handled behind the scenes. We’ll talk about how we built this sort of plugin for Rosette, how it can be customized, and how it connects to the broader trend of entity-centric search.
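For readers unfamiliar with the baseline approach mentioned above, here is a minimal sketch of how an application might combine edit-distance matching with phonetic analysis in plain Elasticsearch. It is not the Rosette plugin and not code from the talk. It assumes a recent Elasticsearch with the analysis-phonetic plugin installed; the field name `name`, the sub-field `phonetic`, and the names `name_phonetic` and `name_phonetic_analyzer` are hypothetical. The functions only build request bodies as Python dictionaries, so nothing here contacts a cluster.

```python
import json


def build_name_index_body(field="name"):
    """Index settings and mappings for a name field with a phonetic sub-field.

    Assumes the analysis-phonetic plugin is installed; all custom names
    below are illustrative, not taken from the talk.
    """
    return {
        "settings": {
            "analysis": {
                "filter": {
                    # Phonetic token filter provided by the analysis-phonetic plugin.
                    "name_phonetic": {
                        "type": "phonetic",
                        "encoder": "double_metaphone",
                        "replace": False,  # keep the original token alongside its code
                    }
                },
                "analyzer": {
                    "name_phonetic_analyzer": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": ["lowercase", "asciifolding", "name_phonetic"],
                    }
                },
            }
        },
        "mappings": {
            "properties": {
                field: {
                    "type": "text",
                    "fields": {
                        "phonetic": {
                            "type": "text",
                            "analyzer": "name_phonetic_analyzer",
                        }
                    },
                }
            }
        },
    }


def build_fuzzy_name_query(name, field="name"):
    """Query body that matches on either edit distance or phonetic similarity."""
    return {
        "query": {
            "bool": {
                "should": [
                    # Built-in edit-distance matching on the plain field.
                    {"match": {field: {"query": name, "fuzziness": "AUTO"}}},
                    # Phonetic matching on the sub-field.
                    {"match": {field + ".phonetic": {"query": name}}},
                ],
                "minimum_should_match": 1,
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_fuzzy_name_query("Kareem AbdalJabar"), indent=2))
```

Even in this small form, the application has to manage analyzers, sub-fields and query clauses itself, and it does nothing for cross-script queries such as the Arabic example above. That is the integration burden the plugin approach aims to hide.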
Brian Sawyer joined Basis in 2010. He is an Engineering Manager and the Product Owner of the Rosette Name Indexer (RNI), using Lucene and Lucene-backed search applications to provide name matching solutions. He holds a B.S. in Computer Science and Cognitive Psychology from Northeastern University.
Chris Mack is the Director of Customer Engineering for text analytics at Basis Technology. Chris's team designs solutions and delivers services to adapt text analytics components to a broad range of customer problems. Chris has spent the last 20 years in software development, data analytics, business strategy, and business operations. Chris received his B.S. in Management from Bentley University, where he also studied Computer Information Systems.