ChatGPT vs Me: As a Children's Author

January 7, 2023

Machine-learning, AI, Data-science, Book, LLM

For all the right reasons, large language models have taken the world by storm! It’s pretty impressive what ChatGPT can do: unprecedented coherence, relevance and tone of delivery for synthetic text generation. Well done, OpenAI!

Review and comparison of two manifold learning algorithms: t-SNE and UMAP

June 9, 2022

Machine-learning, AI, Data-science, Data, Data-Centric-AI, t-SNE, UMAP, PyTorch

What are manifold learning algorithms? What are t-SNE and UMAP? What are the differences between them? How can I explore the features of a dataset using t-SNE and UMAP? These are my notes from a recent exploration of t-SNE and UMAP, in which I applied them to a multi-label dataset to understand the abilities and limits of these algorithms.

Confident Learning: Label errors are imperative! So what can you do?

May 16, 2022

Machine-learning, AI, Deep-Learning, Data, Data-Centric-AI, Confident-Learning, PyTorch

What makes deep learning so great, despite what you may have heard, is data! There is an old saying that sums it up pretty well:

Pydra - Pydantic and Hydra for configuration management of model training experiments

March 15, 2022

Machine-learning, AI, Software

How do you manage the configurations of your model training experiments? Having good configuration management in place improves the user experience and simplifies experiment management. Some of the indirect advantages of a good config solution are clean, reliable, and simple application code. This is true for any software application; however, some applications demand a higher investment in the config piece than others. Perhaps one key differentiator here is how many config fields there are and how they relate to one another. This can get real messy real quick with deep learning experiments, where the list of configs and hyperparameters can rapidly grow out of hand as soon as one moves beyond ad hoc experimentation.

Review of recent advances in dealing with data size challenges in Deep Learning

December 31, 2021

Machine-learning, AI, Deep-Learning, Data, Data-Centric-AI

The energy and excitement in the machine learning and deep learning communities are infectious these days. So many groundbreaking advances are happening in this area, but I have often found myself wondering why the one thing that makes it all shine (yes, I am talking about the dark horse of deep learning, the data) is so underappreciated. The last few years of DL research have given me much joy and excitement, and I now carry hope that going forward we will see exciting progress that explores advances in deep learning in conjunction with data! In this article, I summarise some of the recent developments in the deep learning space that I have been blown away by.

WTH! Who killed my pod - Whodunit?

March 14, 2021

Kubernetes, OOM

A few days ago, I deployed a brand new application onto a self-managed Kubernetes cluster (hereafter referred to as Kube). Suffice it to say, all hell broke loose. The pods were getting OOMKilled with error code 137 left and right!

End-to-end reproducible Machine Learning pipelines on Kubernetes

December 23, 2019

Machine-learning, AI, Kubernetes, Reproducible-ml

This is Part 3, End-to-end reproducible Machine Learning pipelines on Kubernetes, of the technical blog series titled Reproducibility in Machine Learning. Part 1 & Part 2 can be found here & here respectively.

Realizing reproducible Machine Learning - with Tensorflow

December 22, 2019

Machine-learning, AI, Reproducible-ml

This is Part 2, Realizing reproducible Machine Learning - with Tensorflow, of the technical blog series titled Reproducibility in Machine Learning. Part 1 & Part 3 can be found here & here respectively.

Reproducibility in Machine Learning - Research and Industry

December 21, 2019

Machine-learning, AI, Reproducible-ml

This is Part 1, Reproducibility in Machine Learning - Research and Industry, of the technical blog series titled Reproducibility in Machine Learning. Part 2 & Part 3 can be found here & here respectively.

Reproducibility in Machine Learning blog series

December 20, 2019

Machine-learning, AI, Kubernetes, Reproducible-ml

This technical blog series, titled “Reproducibility in Machine Learning”, is divided into three parts:

Reproducibility in Machine Learning - Research and Industry
Realizing reproducible Machine Learning - with Tensorflow
End-to-end reproducible Machine Learning pipelines on Kubernetes

Suneeta Mall

Rambling of a curious engineer & data scientist
