The Ultimate Guide to Machine Learning & AI Datasets for 2025

Back when I first wrote this post in 2022, the AI landscape was a different place. Finding a good dataset felt like a treasure hunt, piecing together links from university pages, old GitHub repos, and forum posts. I started this list to keep track of the gems I stumbled upon during literature reviews or through …

Bengali Datasets in 2025 for Named Entity Recognition and Sentiment Analysis

(Updated in 2025) A major limitation of current AI research is its overemphasis on English and the under-representation of Bengali (Bangla). Despite being the seventh most spoken language worldwide, Bengali remains a low-resource language in Natural Language Processing (NLP). This scarcity of resources poses significant challenges for developing robust …

AI and ML Datasets to start with – Part 2

Datasets form the basis of Artificial Intelligence and Machine Learning. Over the last few years, the focus of AI has shifted from model-centric to data-centric approaches. Supervised learning depends heavily on labeled data (i.e., features paired with ground-truth labels), from which a function mapping the features to the labels is learned. In unsupervised learning, we depend …
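To make the supervised case concrete, here is a minimal sketch (not taken from the post) of learning a mapping from features to labels: ordinary least-squares fitting of a line f(x) = w*x + b on a tiny labeled dataset. The function name `fit_linear` and the toy data are hypothetical, chosen only for illustration.

```python
from typing import Callable, Sequence


def fit_linear(features: Sequence[float], labels: Sequence[float]) -> Callable[[float], float]:
    """Learn f(x) = w*x + b from labeled (feature, label) pairs via closed-form least squares."""
    if len(features) != len(labels) or len(features) < 2:
        raise ValueError("need at least two labeled examples of matching length")
    n = len(features)
    mean_x = sum(features) / n
    mean_y = sum(labels) / n
    # Slope w = covariance(x, y) / variance(x); intercept makes the line pass through the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(features, labels))
    var = sum((x - mean_x) ** 2 for x in features)
    if var == 0:
        raise ValueError("features must not all be identical")
    w = cov / var
    b = mean_y - w * mean_x
    return lambda x: w * x + b


# Hypothetical toy dataset: each feature x is paired with its ground-truth label y.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
f = fit_linear(xs, ys)
print(f(4.0))  # → 9.0
```

The learned function generalizes to an unseen input (x = 4.0) because it captured the relationship in the labeled pairs; real supervised models follow the same idea with far richer function families.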

Working with Imbalanced Datasets

Theory

Imbalanced datasets, in the context of supervised classification problems, are those whose class distribution is highly skewed, with one or more classes far outnumbering the others. Most standard supervised learning algorithms implicitly assume a roughly balanced distribution and optimize overall accuracy, so on skewed data they tend to become biased toward the majority class. This bias can be addressed to some extent when we …
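A minimal sketch of why accuracy maximization misleads on skewed data, assuming binary labels held in plain Python lists and a hypothetical 95/5 class split. A classifier that always predicts the majority class scores 95% accuracy while never detecting the minority class; balanced accuracy (the mean of per-class recall) exposes this. The last helper computes inverse-frequency class weights using the formula n_samples / (n_classes * count_c), the same heuristic scikit-learn applies for `class_weight="balanced"`; it is shown as one common mitigation, not necessarily the approach the full post describes.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def class_distribution(labels: Sequence[Hashable]) -> Dict[Hashable, int]:
    """Count how many examples belong to each class."""
    return dict(Counter(labels))


def accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Fraction of predictions that match the ground truth."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Mean of per-class recall, so each class counts equally regardless of size."""
    recalls = []
    for c in set(y_true):
        idx = [i for i, t in enumerate(y_true) if t == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


def balanced_class_weights(labels: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Inverse-frequency weights: n_samples / (n_classes * count_c)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}


# Hypothetical skewed dataset: 95 negatives, 5 positives.
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100  # majority-class baseline: always predicts 0

print(class_distribution(y_true))         # → {0: 95, 1: 5}
print(accuracy(y_true, y_pred))           # → 0.95
print(balanced_accuracy(y_true, y_pred))  # → 0.5
print(balanced_class_weights(y_true))
```

With these weights, each minority example counts about 19 times as much as a majority example in a weighted loss, so both classes contribute equally in aggregate.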