Strata 2016: Docker for Data Scientists

May 11, 2016 Michelangelo D'Agostino

In March, I got the chance to speak at O’Reilly’s Strata+Hadoop World 2016 about how we use Docker to power our data science work at Civis. Now that we have the video of the talk, we thought this would be a great time to share it.

As data scientists, we inhabit an ever-changing landscape of languages, packages, and frameworks. With something new popping up seemingly every day, it is easy to succumb to tool fatigue. If this sounds familiar, you may have missed the increasing popularity of Linux containers in the DevOps world, and in particular the rise of Docker. In the talk, I showcase Docker’s benefits to the data scientist: it makes data science code and environments more portable and shareable, it smooths the transition from development to production, and it gives data scientists a common basis for collaborating with software engineers.

I go through a total beginner’s tutorial on containers, on Docker, and on the Docker tool ecosystem using a “real-life” end-to-end data science example. Well, sort of real-life, if you’re interested in training a deep neural network to distinguish photos of pugs from photos of golden retrievers. You can find the slides here, and all of the code is up on GitHub. Enjoy!

