Bangla Language Datasets for Sentiment Analysis and NER

A major limitation of current research in Machine Learning and Natural Language Processing (NLP) is that it focuses on only a few languages, particularly English. In this article, we will talk about the resources available in the Bangla language for the NLP tasks of Named Entity Recognition (NER) and Sentiment Analysis. The Bangla language (many sources also call … Read more

AI and ML Datasets to start with – Part 2

Datasets form the basis of Artificial Intelligence and Machine Learning. Over the last few years, the field has shifted from model-centric to data-centric AI. Supervised learning depends heavily on labeled data (i.e., features with ground-truth labels), where a mapping function is learned between the features and labels. In unsupervised learning, we depend … Read more

Working with Imbalanced Data sets

Theory: Imbalanced data sets, in the context of supervised classification problems, are those whose class distribution is highly skewed or disproportionate. Because general supervised learning algorithms assume a balanced class distribution, they simply maximize overall accuracy. This, in turn, biases the model toward the majority class, a problem that can be addressed to some extent when we … Read more
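To make the accuracy-maximization point concrete, here is a minimal sketch in plain Python. The toy labels (95 negatives, 5 positives) and the helper names `majority_baseline`, `accuracy`, and `recall` are hypothetical and chosen only for illustration; they are not taken from the article.

```python
from collections import Counter


def majority_baseline(labels):
    """Return the most frequent label (what a trivial accuracy maximizer predicts)."""
    return Counter(labels).most_common(1)[0][0]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the ground truth."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive):
    """Fraction of true `positive` samples that were predicted as `positive`."""
    preds_for_positives = [p for t, p in zip(y_true, y_pred) if t == positive]
    return sum(p == positive for p in preds_for_positives) / len(preds_for_positives)


# Hypothetical skewed data set: 95 samples of class 0, 5 samples of class 1.
y_true = [0] * 95 + [1] * 5

# A "model" that always predicts the majority class.
majority = majority_baseline(y_true)
y_pred = [majority] * len(y_true)

print("accuracy:", accuracy(y_true, y_pred))                        # 0.95
print("minority recall:", recall(y_true, y_pred, positive=1))       # 0.0
```

The baseline scores 95% accuracy while never detecting a single minority sample, which is why accuracy alone is a misleading metric on skewed class distributions.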

Datasets to start with – Part 1

In this article, I compile a list of datasets and codebases from recent papers across diverse domains that I have come across. I usually stumbled upon them during the literature survey for one of my works, and found the most recent ones mostly on Twitter. I have covered a list of data challenge competitions … Read more