Top 10 Conference Shared Tasks / Data Competitions

Article Summary

In this post, I am going to explore a distinct and growing genre of competitions hosted at certain conference workshops, usually referred to as “Data Challenges” or “Data Competitions”. This is a welcome development, as it offers three major advantages from the outset.

Main advantages for a researcher

  1. a sound problem statement with significant impact,
  2. a well-annotated data set (which often becomes a benchmark itself), and
  3. sound, well-established evaluation criteria.

These three points, in fact, save the researcher a great deal of struggle and hardship, and let him or her channel all of his or her acumen and energy into developing the methodology and gradually climbing the public leaderboard.

Image by StockSnap from Pixabay

If you want to share any more such shared tasks or data competitions, kindly comment below.

I will try to cover the entire life-cycle of such a data challenge in a separate post. For now, I list well-known data competitions from the last year and the current one. Some of them have been organized consistently for three to four years, while others are fairly new and still growing.

Recent data competitions or data challenges

1. Miscellaneous:

This covers a vast range of computer science domains like information retrieval, recommender systems, predictive models, time series forecasting and so on.

  1. SemEval – International Workshop on Semantic Evaluation
  2. Kaggle competitions
  3. Online Reputation Monitoring – RepLab 2014
  4. Digital Text Forensics – CLEF PAN 2019

2. Health :

This is the domain in which my personal research interests lie. In my experience, these challenges also solve a crucial problem: obtaining sound and comprehensive medical data on which to build a feasible data-driven approach.

2.1 ImageCLEF:

The following three tasks are proposed in 2019:

  1. a caption analysis task;
  2. a tuberculosis task;
  3. a visual question answering (VQA) task.

2.2 TREC Precision Medicine / Clinical Decision Support Track:

The 2018 track focused on an important use case in clinical decision support: providing useful precision-medicine-related information to clinicians treating cancer patients.

  1. The first set of results represents the retrieval of existing knowledge in the scientific literature,
  2. while the second represents the potential for connecting patients with experimental treatments when existing treatments have been ineffective.

You can find more details on their website.

2.3 CLEF eHealth

CLEF eHealth aims to bring together researchers working on related information access topics and to provide them with datasets on which to work and validate their outcomes. In 2018, the event comprised three tasks:

  1. Multilingual Information Extraction
  2. Technologically Assisted Reviews in Empirical Medicine
  3. Patient-centred information retrieval

The vision for the Lab is two-fold:

  1. to develop tasks that potentially impact patients' understanding of medical information, and
  2. to provide the community with sophisticated datasets of clinical narratives, links to evidence-based care guidelines, and systematic reviews, in order to advance the state of the art in multilingual information extraction and information retrieval in health care.

2.4 MediQA

It is hosted as a shared task at the ACL-BioNLP workshop, a reputed publication venue for digital health and core medicine. The task covers Natural Language Inference (NLI), Recognizing Question Entailment (RQE), and their applications in medical Question Answering (QA). You may join their mailing list.

3. Online Social media

This field has seen a great deal of research interest in the last few years.

3.1 WSDM Cup competitions

  1. Fake news classification
  2. App user retention prediction


3.2 FEVER

FEVER stands for Fact Extraction and Verification, a workshop held at EMNLP 2018. The aim of the task is to evaluate the ability of a system to verify information using evidence from Wikipedia.

  1. Given a factual claim involving one or more entities (resolvable to Wikipedia pages), the system must extract textual evidence (sets of sentences from Wikipedia pages) that support or refute the claim.
  2. Using this evidence, label the claim as Supported or Refuted given the evidence, or NotEnoughInfo (if there isn’t sufficient evidence to either support or refute it).
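The two steps above can be sketched as a tiny aggregation function. This is only an illustrative sketch, not the official FEVER baseline: the function name and the per-sentence stance strings are hypothetical, and the label spellings follow the wording above.

```python
# Hypothetical sketch of the three-way FEVER verdict: stances of the
# retrieved evidence sentences are aggregated into one task label.

def label_claim(evidence_stances):
    """evidence_stances: per-sentence stances ("support" / "refute")
    for the textual evidence retrieved from Wikipedia."""
    if not evidence_stances:
        return "NotEnoughInfo"  # no usable evidence was retrieved
    supports = sum(s == "support" for s in evidence_stances)
    refutes = sum(s == "refute" for s in evidence_stances)
    if supports > refutes:
        return "Supported"
    if refutes > supports:
        return "Refuted"
    return "NotEnoughInfo"      # conflicting or inconclusive evidence

print(label_claim(["support", "support"]))  # Supported
print(label_claim([]))                      # NotEnoughInfo
```

In the real shared task the hard part is, of course, retrieving the right evidence sentences in the first place; the final labelling is only the last step of the pipeline.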

4. Acoustics

This field has recently drawn attention, with a number of dedicated workshops promoting it.

4.1 DCASE 2018

IEEE AASP Challenge on Detection and Classification of Acoustic Scenes and Events. It comprises multiple tasks, such as:

  1. Acoustic Scene classification
  2. General-purpose audio tagging of Freesound content with AudioSet labels
  3. Bird audio detection
  4. Large-scale weakly labelled semi-supervised sound event detection in domestic environments
  5. Monitoring of domestic activities based on multi-channel acoustics


What is your take on this topic?