Every day we are overloaded with more data than we know what to do with. Most of it is prepared and stored by competent data pipeline engineers to make it useful. In many cases, this is proprietary data that belongs to one entity or another. But there is more data out there. You just have to know where to find it and how to process it.
This presentation focuses on the United States Mortality data set provided by the Centers for Disease Control (CDC). The project at the center of this presentation involves decoding the nearly 250GB of data in this data set.
The personal angle of this project is to look for changes in patterns of death and disease possibly influenced by changes in our environment or food supply. The data access system provided by the CDC is, in my view, inadequate for the deep level of research needed to prove or disprove my theory that these changes have happened, continue to happen, and have an impact on everyone.
As mentioned above, the CDC's data access system is inadequate for my needs; Elasticsearch and Kibana fill that gap perfectly. Unfortunately, the data provided on the CDC FTP site is in an archaic tape format that has to be decoded before I can use it for my research. This presentation goes into more detail about what I am trying to do, how I did it, and why I chose Perl, Elasticsearch, and Kibana for this research.
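To give a flavor of the decoding step, here is a minimal Perl sketch of pulling fields out of a fixed-width record with unpack. The field names and offsets below are hypothetical placeholders; the real positions come from the CDC's published record-layout documentation and vary by year.

```perl
use strict;
use warnings;

# Hypothetical record: a 2-char state code, 2 filler spaces,
# then a 4-char ICD-10 cause-of-death code. Real layouts differ.
my $record = "TX  I219";

# 'A2'  -> take 2 ASCII chars (state), trimming trailing spaces
# 'x2'  -> skip 2 filler bytes
# 'A4'  -> take 4 ASCII chars (cause code)
my ($state, $icd10) = unpack 'A2 x2 A4', $record;

print "state=$state cause=$icd10\n";   # state=TX cause=I219
```

Each decoded record can then be serialized (to JSON, for example) and indexed into Elasticsearch in batches; the unpack template itself is the only part that has to change when the CDC revises the layout.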
Presentation Outline and Notes:
Bryan Vest is an Information Technology veteran with over twenty years of experience across many aspects of the field. Starting at a small computer store repairing and building personal computers, Bryan worked his way through the intricacies of Information Technology, ending up working on big data projects as a support engineer helping others stabilize and optimize their big data platforms. Using t…