Top 10 Conference Shared Tasks / Data Competitions


In this post, I explore a different, emerging genre of competitions: shared tasks and data challenges organized as part of conferences and their workshops.

Recent data competitions at AI and Machine Learning conferences


These competitions cover a wide range of computer science domains, such as information retrieval, recommender systems, predictive modeling, and time series forecasting.

  1. SemEval – International Workshop on Semantic Evaluation
  2. Kaggle competitions
  3. Online Reputation Monitoring – RepLab 2014
  4. Digital Text Forensics – CLEF PAN 2019


This is the domain in which my personal research interests lie. In my experience, shared tasks here also address a crucial problem: obtaining sound and comprehensive medical data, without which it is hard to design a feasible data-driven approach.


The following three tasks were proposed in 2019:

  1. a caption analysis task;
  2. a tuberculosis task;
  3. a visual question answering (VQA) task.

TREC Precision Medicine / Clinical Decision Support Track

The 2018 track focused on an important use case in clinical decision support: providing useful precision medicine-related information to clinicians treating cancer patients. The track considers two sets of results:

  1. The first set represents the retrieval of existing knowledge in the scientific literature.
  2. The second set represents the potential for connecting patients with experimental treatments if existing treatments have been ineffective.

You can find more details on their website.

CLEF eHealth

CLEF eHealth aims to bring together researchers working on related information access topics and to provide them with datasets to work with and validate their outcomes. In 2018, the event comprised three tasks:

  1. Multilingual Information Extraction
  2. Technologically Assisted Reviews in Empirical Medicine
  3. Patient-centered information retrieval

The vision for the Lab is two-fold:

  1. To develop tasks that potentially impact patient understanding of medical information.
  2. To provide the community with a sophisticated dataset of clinical narratives, with links to evidence-based care guidelines and systematic reviews, in order to advance the state of the art in multilingual information extraction and information retrieval in health care.


It is hosted as a shared task at the ACL-BioNLP workshop, a reputable publication venue targeting digital health and the core medicine domain. The task covers Natural Language Inference (NLI), Recognizing Question Entailment (RQE), and their applications in medical Question Answering (QA). You can join their mailing list to follow updates.

3. Online Social media

This field has seen a lot of research interest in the last few years.

3.1 WSDM Cup competitions

  1. Fake news classification:
  2. App user retention prediction:


FEVER stands for Fact Extraction and VERification, a workshop held at EMNLP 2018. The aim of the task is to evaluate the ability of a system to verify information using evidence from Wikipedia.

  1. Given a factual claim involving one or more entities (resolvable to Wikipedia pages), the system must extract textual evidence (sets of sentences from Wikipedia pages) that support or refute the claim.
  2. Using this evidence, the system must label the claim as Supported or Refuted given the evidence, or as NotEnoughInfo (if there isn’t sufficient evidence to either support or refute it).
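The two steps above can be sketched as a small data model plus a scoring function. This is only a minimal illustration, not the official scorer of the Fact Extraction and Verification task: the class names, the (page title, sentence index) evidence format, and the rule that a Supported or Refuted prediction counts only when it contains at least one complete gold evidence set are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Set, Tuple


class Label(Enum):
    SUPPORTED = "Supported"
    REFUTED = "Refuted"
    NOT_ENOUGH_INFO = "NotEnoughInfo"


# Hypothetical evidence format: (Wikipedia page title, sentence index).
Sentence = Tuple[str, int]


@dataclass
class Prediction:
    claim: str
    label: Label
    evidence: Set[Sentence] = field(default_factory=set)


@dataclass
class Gold:
    claim: str
    label: Label
    # Each inner set is one complete group of sentences justifying the label.
    evidence_sets: List[Set[Sentence]] = field(default_factory=list)


def is_correct(pred: Prediction, gold: Gold) -> bool:
    """Assumed rule: the label must match, and for Supported/Refuted the
    predicted evidence must fully contain at least one gold evidence set."""
    if pred.label != gold.label:
        return False
    if gold.label is Label.NOT_ENOUGH_INFO:
        return True  # no evidence is required for this label
    return any(gold_set <= pred.evidence for gold_set in gold.evidence_sets)


def strict_accuracy(preds: List[Prediction], golds: List[Gold]) -> float:
    """Fraction of claims judged correct under the assumed rule above."""
    if not golds:
        return 0.0
    return sum(is_correct(p, g) for p, g in zip(preds, golds)) / len(golds)
```

Requiring the evidence as well as the label means a system cannot score well by guessing labels alone; it has to retrieve the sentences that justify each verdict.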

4. Acoustics

This field has recently drawn attention, with a number of dedicated workshops promoting it.

4.1 DCASE 2018

DCASE is the IEEE AASP Challenge on Detection and Classification of Acoustic Scenes and Events. It includes multiple tasks, such as:

  1. Acoustic Scene classification
  2. General-purpose audio tagging of Freesound content with AudioSet labels
  3. Bird audio detection
  4. Large-scale weakly labeled semi-supervised sound event detection in domestic environments
  5. Monitoring of domestic activities based on multi-channel acoustics

Shared Tasks can be directly used as your starting problem statement

Shared tasks are usually referred to as “Data Challenges” or “Data Competitions”. They are a welcome development, as they provide three major advantages up front.

Main advantages for a researcher

  1. a sound problem statement with significant impact,
  2. a well-annotated dataset (often a benchmark in itself),
  3. sound and well-established evaluation criteria.

These three points save researchers a great deal of struggle and let them channel all their acumen and energy into developing the methodology and steadily climbing the public leaderboard.

This brings us to the end of this article.

I hope it gave you pointers to shared tasks and data competitions organized as part of a conference or workshop. We also saw how such shared tasks can serve as a good starting point for your research journey.

What is your take on this topic?