Elasticsearch Boston NLP/ML Hackday!

Name: Elasticsearch Boston NLP/ML Hackday!
Start: 2013-03-16T09:00:00-04:00
End: 2013-03-16T21:00:00-04:00
Location: hack/reduce
Boston

About this event

Please RSVP at no cost at http://elasticsearchbostonnlphackday.eventbrite.com/, not through meetup.com.
We are changing the format for this meetup: let’s do a full hackday where NLP/ML is the topic and folks get to work hands-on with Elasticsearch, NLP/ML libraries and unstructured data sets. It will be very open-ended and all about seeing what kinds of applications or insights people can achieve with these tools in that short time frame. And it will be free.
Data
Traackr will provide a subset of influencer data (mainly articles), probably in the form of a MongoDB instance, and may also give access to their beta API, which is currently in the works.
Embedly will be participating in the ML/NLP hackday, offering tips and support. Embedly can take any link and return important features associated with that URL, including the title, description, and images on a given page. The Embedly API includes useful NLP tools like entity extraction, keyword extraction, related articles, and text extraction. These tools can help build powerful apps and analytics with social data. For example, using the API alongside Twitter data expands the tweets to include relevant data associated with their links, offering deeper insights beyond the tweets themselves. Use the API in your analysis pipeline to clean data, prepare it, or build feature vectors. We're excited to see how you use the API, and more than happy to answer any questions.
We will also provide pre-populated Elasticsearch indices with various datasets from Twitter, Wikipedia, etc. Details to follow. Other datasets are also welcome, so please get in touch and let us know if you’d like to provide something. We can also work with you to pre-download some data onto the cluster ahead of time so that nobody has to wait for downloads on the day of the event.
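As a rough illustration of the "build feature vectors" step mentioned above, here is a minimal bag-of-words sketch in plain Python (standard library only). The sample texts and the function names are made up for illustration, and this is not tied to the Embedly API or to any of the datasets provided on the day; libraries such as scikit-learn offer more complete vectorizers.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_vocabulary(documents):
    """Map every distinct token to a column index, in sorted order."""
    tokens = set()
    for doc in documents:
        tokens.update(tokenize(doc))
    return dict((tok, i) for i, tok in enumerate(sorted(tokens)))


def vectorize(text, vocabulary):
    """Return a list of token counts aligned with the vocabulary columns."""
    counts = Counter(tokenize(text))
    vector = [0] * len(vocabulary)
    for tok, n in counts.items():
        if tok in vocabulary:  # tokens outside the vocabulary are ignored
            vector[vocabulary[tok]] = n
    return vector


# Hypothetical sample texts standing in for article titles or tweets.
docs = [
    "Search is fun",
    "Machine learning and search",
    "Fun with machine learning",
]
vocab = build_vocabulary(docs)
vectors = [vectorize(d, vocab) for d in docs]
```

Each entry of vectors is a fixed-length list of counts, one column per vocabulary word, which is the shape most ML libraries expect as input.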
Venue/Cluster
Hack/Reduce will be hosting us! They will also provide us with a 10-machine cluster, pre-loaded with Hadoop. We’ll install Elasticsearch on it, along with the pre-populated indices mentioned above.
Format
Date: Saturday March 16th, 9am to 9pm
Kick off at 9am with a couple of short presentations to get us warmed up (see presentation info below)
Following the presentations, people will form their teams (or can work individually if they wish to). We will invite folks to come up to the mic and talk about their ideas to attract potential team members or express their interest in a certain category of problems to solve so that others can recruit them. We have seen this format work well in the past. Also, the cluster information will be posted at that time.
After teams have formed, we simply code all day
Finish coding by 7:30pm
7:30pm to 9pm: teams show off what they managed to build
9pm: crowd votes for the best idea (by a round of applause).
Top three winners will get one of these prizes:
Lucene in Action book
Mining the Social Web book
Programming Collective Intelligence book
$100 for best use of the Embed.ly API
Prerequisites
All attendees:
Git and git client (to download or share code)
A GitHub account
Text editor or IDE of choice
If you'll work with Java:
Java 6 or later
Maven 3.x
If you'll work with Python (very likely you'll want to because of the libs):
Python 2.7.x (not 3.x)
Python libraries: nltk, scikit-learn, pybrain, gensim, pymongo
Optional:
It's recommended that you download and play with Elasticsearch locally if only to get familiar with the basic commands.
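For a feel of what those basic commands look like, the sketch below builds two common REST requests (indexing a document and running a match query) and prints them as curl commands instead of sending them. It assumes a local node on the default HTTP port 9200; the index name "tweets", the type name "tweet" and the helper functions are hypothetical. The index/type/id URL layout reflects Elasticsearch releases from around the time of this event, so check the documentation for the version you install.

```python
import json

ES_URL = "http://localhost:9200"  # assumption: local node, default HTTP port


def index_request(index, doc_type, doc_id, document):
    """Describe the request that stores one JSON document under an id."""
    url = "%s/%s/%s/%s" % (ES_URL, index, doc_type, doc_id)
    return ("PUT", url, json.dumps(document, sort_keys=True))


def search_request(index, field, text):
    """Describe a full-text match query against a single field."""
    body = {"query": {"match": {field: text}}}
    url = "%s/%s/_search" % (ES_URL, index)
    return ("GET", url, json.dumps(body, sort_keys=True))


def as_curl(request):
    """Render a (method, url, body) tuple as a curl command for display."""
    method, url, body = request
    return "curl -X%s '%s' -d '%s'" % (method, url, body)


# Hypothetical index, type and document, for illustration only.
print(as_curl(index_request("tweets", "tweet", "1", {"text": "hello boston"})))
print(as_curl(search_request("tweets", "text", "boston")))
```

The printed commands can be pasted into a terminal once a local node is running. Text containing single quotes would need extra shell escaping, which this sketch does not handle.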
Food & Drinks
Breakfast, lunch and dinner will be provided
Food sponsors: Traackr, Elasticsearch, Embed.ly
Morning Presentations
Presenter: Kawandeep Virdee, Engineer @ Embed.ly
Title: Machine Learning and NLP for Startups
Abstract: The rise of social media APIs is an opportunity for artists, hackers, and entrepreneurs to create value through machine learning and natural language processing. Such growing, active datasets are fuel for startups, be it making life more convenient, telling stories, or solving some of the world's big problems. To get the ideas flowing before the ML/NLP hackday, I'll discuss companies built on machine learning insights, as well as Embedly's new NLP features. Finally, available frameworks and resources will be presented to get you started.
Presenter: Igor Motov, Software Developer @ Elasticsearch
Title: Elasticsearch Customization using plug-ins
Abstract: Plug-in support is one of the really important features of Elasticsearch. A powerful plug-in infrastructure together with a dependency injection framework makes it incredibly easy to customize many aspects of Elasticsearch behavior and extend its functionality. This talk will cover a typical plug-in structure and the loading process. We will also talk about basic Elasticsearch architecture and how plug-ins fit into it, and outline the steps required to build an Elasticsearch plug-in. Code examples of several common plug-in types will be shown and discussed.