Datasets – Medical AI Insights

Working with Imbalanced Datasets: A Practical Guide for ML and Medical AI Researchers

March 29, 2026 / July 23, 2025 by medical-ai-insights

Why 98% accuracy can mean a useless model, and how to detect, measure, and fix class imbalance with resampling, SMOTE, and cost-sensitive learning. Updated for 2025; originally published August 2018. Class imbalance is one of the most common real-world problems in machine learning, and it’s especially severe in medical datasets. Here’s a …

The Ultimate Guide to Machine Learning & AI Datasets for 2025

June 24, 2025 / April 24, 2025 by medical-ai-insights

Back when I first wrote this post in 2022, the AI landscape was a different place. Finding a good dataset felt like a treasure hunt, piecing together links from university pages, old GitHub repos, and forum posts. I started this list to keep track of the gems I stumbled upon during literature reviews or through …

Bengali Datasets in 2025 for Named Entity Recognition and Sentiment Analysis

June 22, 2025 / March 20, 2025 by medical-ai-insights

(Updated in 2025) A major limitation of current AI research is the overemphasis on English and the under-representation of the Bengali or Bangla language. Despite being the seventh most spoken language worldwide, Bengali remains a low-resource language in the field of Natural Language Processing (NLP). This scarcity of resources poses significant challenges for developing robust …