Building More Reliable Data Pipelines for Nearmap Deep Learning Models: An Evolutionary Case Study

July 2021

I spoke about our recent experience working with ML pipelines at Kafka Summit 2021. The talk was titled “Building More Reliable Data Pipelines for Nearmap's Deep Learning Models: An Evolutionary Case Study”.

Continual learning on a continually evolving dataset is the norm for the AI team at Nearmap. For several years, we have run a software system & data pipelines that manage this ever-growing dataset. During that time, both our needs & the system have evolved – we improvised and learned from early limitations & challenges.

One of the biggest challenges of MLOps is building data systems right! Reliable, fault-tolerant, & continually flowing pipelines are the foundation, along with the additional capabilities needed for data quality control, reconciliation, & lineage/tracking.

Based on our learnings, we have built a new generation of our system (based on Kafka) with one aim – the much-discussed “operation vacation”: full automation of the system with zero manual intervention.

In this session, we will go into detail on the challenges we encountered, the lessons we learned, what we improved, and lastly: are we on vacation yet?

Slides can be found here. Much to my disappointment, the recording is not available.

Suneeta Mall

Rambling of a curious engineer & data scientist