Top 10 Conference Shared Tasks and Data Competitions for AI Researchers

The fastest path to your first research publication — curated shared tasks for medical AI and NLP, with a 4-month timeline to go from registration to paper.

Updated 2025 — originally published December 2018



Shared tasks are the fastest on-ramp into research for new AI researchers — they provide clean datasets, clear evaluation metrics, and a proceedings paper for every participating team. Here’s a curated guide to the best ones for medical AI and NLP researchers.

I’ve participated in shared tasks as both a competitor and an organizer. From a career perspective, they offer an unusual opportunity: you can publish your first research paper by solving a clearly-defined problem on a provided dataset, without needing to collect data yourself or define the evaluation from scratch.

This guide focuses on competitions directly relevant to medical AI, NLP, and data science — organized by domain.


Table of Contents

  • Why Shared Tasks Are Underused by Industry Researchers
  • How to Evaluate a Task Before Committing
  • Healthcare and Medical AI Tasks
  • NLP and Text Mining Tasks
  • Multimodal and Vision Tasks
  • General ML Competitions
  • How to Write a Strong System Description Paper
  • Timeline: A Shared Task in 4 Months

Why Shared Tasks Are Underused by Industry Researchers

The research community organizes shared tasks primarily for academia — PhD students who need publishing opportunities and research experience. But the structure is just as valuable, if not more so, for industry practitioners:

  • You don’t need to source data. The organizers provide it.
  • You don’t need to define evaluation. The metric is given.
  • You have direct comparisons. A leaderboard tells you exactly how good your system is.
  • You get a publication. Most tasks publish system description papers in workshop proceedings.

The system description paper format (4–6 pages) is far more achievable for a practitioner working in limited time than a full research paper requiring novel contributions.


How to Evaluate a Task Before Committing

Before investing time in a shared task:

  1. Is the associated workshop at a respected venue? Workshops at ACL, EMNLP, NeurIPS, and MICCAI are publishable venues; random web competitions are not.
  2. What is the publication model? Confirm that system description papers are published in the proceedings.
  3. How many participating teams are expected? 10–30 teams is comfortable. 100+ teams makes it harder to distinguish your system.
  4. What compute does it require? Some tasks require fine-tuning large models; others run on CPU.
  5. Is the timeline realistic for you? Most tasks run for 8–12 weeks. Assess whether you can commit 10+ hours/week during that period.

Healthcare and Medical AI Tasks

1. MEDIQA (Medical QA and NLI)

Host: ACL-BioNLP Workshop
Tasks: Medical question answering, natural language inference, answer summarization
Why enter: Directly relevant to clinical decision support. Strong publication venue. Clean, well-annotated data from medical literature.
Access: https://sites.google.com/view/mediqa2024

2. BioASQ

Host: Annual challenge, associated with CLEF
Tasks: Biomedical semantic indexing, QA from biomedical literature
Why enter: One of the longest-running medical NLP benchmarks. Strong community, large dataset.
Access: bioasq.org

3. ClinIQLink

Host: ACL Workshop
Task: Test whether LLMs give accurate, safe healthcare information
Why enter: Timely — liability of LLMs in healthcare is a pressing research problem
Access: Check recent ACL workshops

4. ArchEHR-QA

Host: Clinical NLP Workshop
Task: Answer patient questions using information from electronic health records
Why enter: Directly addresses a real clinical use case. EHR access is simulated through the provided dataset.

5. CLEF eHealth

Host: CLEF (Conference and Labs of the Evaluation Forum)
Tasks vary yearly; typical tasks include:

  • Multilingual clinical document classification
  • Patient-centered information retrieval
  • Multilingual health entity extraction

Why enter: Long-running, well-organized, covers non-English languages (relevant for Indian researchers)
Access: clef-initiative.eu

6. ImageCLEF Medical

Host: CLEF
Tasks: Medical image captioning, visual QA, tuberculosis detection
Why enter: One of the only shared tasks combining medical images and text — ideal for multimodal medical AI
Access: imageclef.org

7. TREC Precision Medicine

Host: NIST TREC
Task: Retrieve relevant clinical trials and literature for cancer treatment decisions
Why enter: Real clinical decision support problem. TREC has a long history of high-quality IR evaluation.
Access: trec.nist.gov/data/precmed.html


NLP and Text Mining Tasks

8. SemEval

Host: Annual workshop at ACL or NAACL
Why it’s essential: 10–20 tasks per year covering the breadth of NLP. Every task has a proceedings paper slot. Runs every year, so you can enter repeatedly.
Access: semeval.github.io

Notable recent tasks:

  • SemEval Task 10 (2022): Structured Sentiment Analysis (includes Bengali)
  • SemEval Task 4: Propaganda detection
  • SemEval Task 6: Figurative language understanding

9. FIRE (Forum for Information Retrieval Evaluation)

Host: ACM India
Why enter for Indian researchers: Specializes in Indian language NLP and information retrieval. Bengali, Hindi, Tamil datasets regularly available. Proceedings are published.
Access: fire.irsi.res.in


General ML Competitions

10. Kaggle Competitions

Link: kaggle.com/competitions

Not all Kaggle competitions are academically publishable, but several categories are:

  • RSNA (Radiology Society of North America) medical imaging challenges — publishable as medical AI papers
  • Grand Challenge medical image segmentation competitions
  • Jane Street market prediction competitions

Publication angle for Kaggle: Even if the competition itself doesn’t produce a paper, your post-competition analysis — error analysis, ablation study, what the top solutions had in common — can become a conference paper or blog post. Many top Kaggle competitors have turned their winning solutions into publications.


How to Write a Strong System Description Paper

Most shared task system description papers follow this structure:

Title: Our System for [Task Name] at [Workshop] [Year]

Abstract (150 words):
Brief description of your approach and key results.

Introduction (0.5 page):
Task description, your approach in 2 sentences, paper structure.

System Description (1–1.5 pages):
Your method in detail. Include a system diagram if appropriate.

Experimental Setup (0.5 page):
Dataset split used, preprocessing, hyperparameters, compute environment.

Results (0.5–1 page):
Main results table + key ablations. Compare to baseline and top systems.

Analysis (0.5 page):
Error analysis — what types of examples was your system good/bad at? Why?

Conclusion (0.25 page):
One-sentence summary + future work.

The key differentiator between forgettable system papers and memorable ones: the analysis section. Papers that explain why their system works (or doesn’t) are cited; papers that just report numbers are not.
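Error analysis can start from something as simple as bucketing mistakes by confusion pair and example type. A minimal stdlib-only sketch — the labels, categories, and predictions below are hypothetical, illustrating the habit rather than any specific task:

```python
from collections import Counter

def error_breakdown(examples, gold, pred):
    """Count errors per (gold, pred) label pair and per example category."""
    confusions = Counter()
    by_category = Counter()
    for ex, g, p in zip(examples, gold, pred):
        if g != p:
            confusions[(g, p)] += 1
            # 'category' is whatever metadata you attach, e.g. question type
            by_category[ex.get("category", "unknown")] += 1
    return confusions, by_category

# Hypothetical predictions on a three-label medical NLI task
examples = [{"category": "negation"}, {"category": "numeric"}, {"category": "negation"}]
gold = ["entailment", "contradiction", "neutral"]
pred = ["contradiction", "contradiction", "entailment"]

confusions, by_category = error_breakdown(examples, gold, pred)
print(confusions.most_common())   # which label pairs are confused most often
print(by_category.most_common())  # which example types drive the errors
```

Tables built from counts like these are exactly what turns a numbers-only system paper into one with a citable analysis section.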


Timeline: A Shared Task in 4 Months

Month 1: Register, download data, run the baseline model, understand the evaluation metric
Month 2: Develop your approach, iterate, track experiments with wandb
Month 3: Final system, ablation study, identify your best configuration
Month 4: Write the system description paper, submit, prepare for presentation
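The experiment tracking in month 2 doesn't require heavy infrastructure — wandb is one option, but even a stdlib-only log of configs and scores keeps runs comparable across months. A minimal sketch; the file name, config fields, and metric name are assumptions for illustration:

```python
import json
import time
from pathlib import Path

LOG = Path("experiments.jsonl")  # hypothetical log file, one JSON record per run

def log_run(config, metrics):
    """Append one experiment record so every run stays comparable later."""
    record = {
        "time": time.strftime("%Y-%m-%d %H:%M:%S"),
        "config": config,
        "metrics": metrics,
    }
    with LOG.open("a") as f:
        f.write(json.dumps(record) + "\n")
    return record

def best_run(metric="f1"):
    """Return the logged run with the highest score on the chosen metric."""
    runs = [json.loads(line) for line in LOG.read_text().splitlines()]
    return max(runs, key=lambda r: r["metrics"].get(metric, float("-inf")))

log_run({"model": "baseline"}, {"f1": 0.61})
log_run({"model": "baseline+augmentation"}, {"f1": 0.67})
print(best_run("f1")["config"])  # the best configuration so far
```

By month 3, `best_run` (or its wandb-dashboard equivalent) is what identifies the configuration you write up.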

Most shared tasks publish results and accept system description papers within 4–6 weeks of the submission deadline. The full timeline from starting the task to having a published paper is typically 5–7 months.


Shared Tasks Can Be Your Starting Problem Statement

Shared tasks are sometimes called "data challenges" or "data competitions". Their structure is a genuine gift to a new researcher, because it provides three major advantages up front.

Main advantages for a researcher

  1. a sound problem statement with significant impact,
  2. a well-annotated dataset (often a benchmark in its own right), and
  3. a sound, well-established evaluation criterion.

These three points spare you a great deal of struggle and hardship, and let you channel all your acumen and energy into developing the methodology and climbing the public leaderboard.

Learning Outcomes

This guide collected pointers to shared tasks and data competitions organized as part of conferences and workshops, and showed how such tasks can serve as a strong starting point for a research journey.

