Suneeta Mall

Rambling of a curious engineer & data scientist

The 3-R's of Data-Science - Repeatability, Reproducibility, and Replicability

May 2019

The reproducibility crisis is real in data-science. It has been recognized in data-science research, and several efforts, e.g. at the International Conference on Learning Representations, have been underway to improve reproducibility in data-science research. Dr. Joelle Pineau, an Associate Professor at McGill University and lead for Facebook’s Artificial Intelligence Research lab (FAIR), covered the importance of reproducibility in her talk.

Industry adoption of data-science in the last 5 years has been phenomenal. As per KDnuggets, machine-learning/data-science surpassed the big data frenzy in 2017! According to a recent survey in the UK (year 2016), 84% of startups were primarily focussed on data-science. What's more interesting is that more than half of these companies preferred to build, train and use their own models as opposed to sourcing them from elsewhere. According to Evolving Data Infrastructure - Ben Lorica and Paco Nathan (O’Reilly, Oct 2018), 58% of industries were seriously building data-science based solutions, with only 14% indicating no involvement in data or data-science just yet.

Big Data Vs Machine Learning

The 3-R’s of Data-Science - Repeatability, Reproducibility, and Replicability is a talk I presented at YOW Data 2019 (Sydney) in May 2019. In this talk, I covered the 3-R’s (Repeatability, Reproducibility, and Replicability) and the tools and techniques available to practice reproducible data-science. Slides can be accessed on this link.

So here I am at YOW Data 2019 (Sydney)!