How to Get Started with Research in AI for Medicine: A Comprehensive Guide

With the continuous stream of LinkedIn updates and new preprints related to AI in medicine, it is easy and understandable to feel overwhelmed about where to start as a researcher.

In this article, I will give suggestions that focus on the “AI for Medicine” research area. This was my research area during my Ph.D., and I have published a few papers at computer science (CS) conferences. More importantly, I’ve learned from my mistakes, celebrated small wins, and navigated the unique challenges that come with working at the intersection of AI and healthcare.

Photo by Marcelo Leal on Unsplash

Getting Started with AI/ML Research in General

Before diving into the medical AI specifics, let’s establish a foundation. I previously created a video explaining the typical lifecycle of an AI and ML research project, with the ultimate goal of publishing the research.

Life cycle of an AI or ML Research Paper: Watch from 15 mins 6 secs onwards

The research lifecycle I’ve observed typically follows this pattern:

  1. Problem Identification: What clinical challenge are you addressing?
  2. Literature Review: What has been tried before, and where are the gaps?
  3. Data Collection/Preparation: This is often 70% of your work in medical AI
  4. Method Development: Adapting or creating AI approaches for your specific problem
  5. Evaluation: Beyond accuracy — does it actually help clinicians?
  6. Validation: Testing in real-world or clinically relevant scenarios. In academic papers, we perform human evaluation on a random subset of datapoints.
  7. Publication: Sharing your findings with the community
Typical AI research lifecycle in academia (proposed by me)
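
For the human evaluation in step 6, one way to draw the random subset reproducibly is to fix a seed, so the same datapoints can be identified again later. This is a minimal sketch, assuming every datapoint has an identifier; the function name `sample_for_human_eval`, the seed, and the ID format are illustrative choices of mine, not part of any established protocol.

```python
import random


def sample_for_human_eval(example_ids, k, seed=13):
    """Return k example IDs drawn uniformly at random without replacement.

    A fixed seed makes the draw reproducible, so annotators and reviewers
    can be pointed at exactly the same subset later.
    """
    ids = sorted(example_ids)      # sort first so input order does not affect the draw
    if k > len(ids):
        raise ValueError("k cannot exceed the number of available examples")
    rng = random.Random(seed)      # local RNG; leaves the global random state alone
    return rng.sample(ids, k)


# Hypothetical IDs for 500 model outputs; pick 50 for expert review.
all_ids = [f"case-{i:04d}" for i in range(500)]
subset = sample_for_human_eval(all_ids, k=50)
```

Sorting before sampling is a small design choice: it means the subset depends only on the set of IDs and the seed, not on how the data happened to be loaded.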

Below, I share what I have learned over the past five years, in the hope that it helps anyone interested in starting work in this exciting and emerging area.

What Makes AI for Medicine Research Unique?

The uniqueness of this research area arises from several factors that I wish someone had explained to me when I started:

1. Highly Interdisciplinary Nature

The problem statement, the design objective, the nature of input, and the desired output require significant involvement with domain (medical) experts and stakeholders. You’re not just building an AI system — you’re building a bridge between two worlds that often speak different languages.

What this means in practice:

  • You’ll spend considerable time learning medical terminology and clinical workflows
  • Your success depends heavily on establishing strong collaborations with healthcare professionals
  • You need to understand not just what the data says, but what it means in a clinical context

2. Beyond Benchmark Performance

Simple metrics like accuracy are not enough, and neither is an improvement on benchmark datasets: you also need to demonstrate an improvement in clinical utility. Typically, this requires human evaluation by medical experts, to confirm that a gain in metrics also translates into a gain in clinical usefulness. This matters because only a few of these AI models make it into clinical practice.

The reality check:

  • A 95% accuracy model might be clinically useless if it fails on the 5% of cases that matter most
  • You need to understand concepts like sensitivity, specificity, positive/negative predictive values in clinical contexts
  • Statistical significance doesn’t always translate to clinical significance
Number of studies published according to their level of readiness and year of publication on AI in the ICU. Source: van de Sande, D., van Genderen, M.E., Huiskens, J. et al. Moving from bytes to bedside: a systematic review on the use of artificial intelligence in the intensive care unit. Intensive Care Med 47, 750–760 (2021)
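
To make the accuracy point above concrete, here is a small sketch that computes sensitivity, specificity, and the predictive values from confusion-matrix counts. The cohort numbers are made up for illustration, and the helper name `diagnostic_metrics` is my own.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Compute accuracy, sensitivity, specificity, PPV and NPV from counts.

    Ratios with a zero denominator are returned as None rather than guessed.
    """
    def ratio(num, den):
        return num / den if den else None

    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": ratio(tp, tp + fn),   # share of diseased patients caught
        "specificity": ratio(tn, tn + fp),   # share of healthy patients cleared
        "ppv": ratio(tp, tp + fp),           # chance that a positive call is right
        "npv": ratio(tn, tn + fn),           # chance that a negative call is right
    }


# Made-up cohort: 1000 patients, 50 of whom have the disease.
# A model that always predicts "healthy" misses every diseased patient.
always_negative = diagnostic_metrics(tp=0, fp=0, tn=950, fn=50)
```

In this toy cohort the always-negative model reaches 95% accuracy with a sensitivity of zero, and its PPV is undefined because it never makes a positive call.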

3. Regulatory and Ethical Considerations

Unlike other AI applications, medical AI systems often need to meet regulatory standards (FDA approval, CE marking, etc.) and address complex ethical considerations around patient privacy, algorithmic bias, and clinical decision-making.

4. Data Challenges

Medical data comes with unique challenges:

  • Privacy constraints: HIPAA, GDPR, and other regulations limit data sharing
  • Data quality issues: Missing values, inconsistent formats, human errors in documentation
  • Temporal dependencies: Patient conditions evolve over time
  • Multi-modal complexity: Text, images, time-series, and structured data all need to work together
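
As a first pass at the data quality issues listed above, it can help to measure how much of each field is actually missing before any modeling. The sketch below assumes records are plain dictionaries; the field names and values are invented for illustration, and the definition of "missing" (absent key, `None`, or blank string) is an assumption you would adapt to your own data.

```python
def missing_fraction(records, fields):
    """Return, per field, the fraction of records where the value is absent.

    A value counts as missing if the key is absent, the value is None,
    or it is an empty or whitespace-only string.
    """
    def is_missing(value):
        return value is None or (isinstance(value, str) and not value.strip())

    total = len(records)
    if total == 0:
        return {field: None for field in fields}
    return {
        field: sum(is_missing(r.get(field)) for r in records) / total
        for field in fields
    }


# Made-up records, purely for illustration.
records = [
    {"age": 64, "lactate": 2.1, "discharge_note": "Stable."},
    {"age": None, "lactate": 1.4, "discharge_note": ""},
    {"age": 71, "discharge_note": "Transferred to ICU."},
    {"age": 58, "lactate": None, "discharge_note": "  "},
]
report = missing_fraction(records, ["age", "lactate", "discharge_note"])
```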

Expert articles worth reading!

Curtis Langlotz: Dr. Langlotz is Professor of Radiology, Medicine, and Biomedical Data Science and Senior Associate Vice Provost for Research at Stanford University. His laboratory investigates the use of deep neural networks and other machine learning technologies to detect disease and eliminate diagnostic errors through analysis of medical images and clinical notes. (Quoted from Stanford profile https://profiles.stanford.edu/curtis-langlotz)

Getting Started in Medical AI: A Guide to Online Learning Resources for Clinicians

Your Roadmap to Getting Started

Step 1: Build Your Foundation and Network

Target Top-tier Conferences: Focus on AI and NLP conferences with strong “NLP Applications” or “AI for Healthcare” tracks. These venues value both methodological contributions and real-world impact. You may go through my Google Scholar profile to get a list of relevant conferences.

Essential conferences to follow:

  • NeurIPS (Machine Learning for Healthcare workshop)
  • ICML (Healthcare track)
  • ACL/EMNLP (BioNLP workshops)
  • AAAI (AI for Social Good)

Follow These Research Leaders and Labs:

1. van der Schaar lab (University of Cambridge)

  • Research focus: Time-series analysis, Synthetic data generation
  • Why follow: Pioneering work in causal inference for healthcare
  • Link: https://www.vanderschaar-lab.com/

2. Zitnik Lab (Harvard)

  • Research focus: Graph neural networks, Multimodal learning, Biological foundation models
  • Why follow: Cutting-edge work in drug discovery and biological understanding
  • Link: https://zitniklab.hms.harvard.edu/

3. Boussard Lab (Stanford Medicine)

4. Butte Lab (UCSF)

5. Rajpurkar Lab (Harvard)

6. Johns Hopkins University:

7. Data Science and Text-based Information Systems (DATEXIS) of Berliner Hochschule für Technik, Germany www.datexis.com

8. CAIMed — Lower Saxony Center for AI & Causal Methods in Medicine, Germany https://caimed.de/en/

9. Biomedical Data Science Lab of ETH Zurich, Switzerland (https://bmds.ethz.ch/), Prof. Michael Moor (https://michaelmoor.me/) of ETH Zurich

10. Prof. Mary-Anne Hartley of Laboratory for Intelligent Global Health & Humanitarian Technologies at EPFL, Switzerland (https://www.light-laboratory.org/) — created one of the popular open-source medical LLMs called MEDITRON

11. Vector Institute, Toronto (https://vectorinstitute.ai/research/health-research/)

Step 2: Understand the Conference Landscape

Listen to cutting-edge research talks and symposiums

RAISE Health Symposium from Stanford HAI and Stanford Medicine: https://med.stanford.edu/raisehealth/events/symposium.html

Stanford Medicine YouTube channel (https://www.youtube.com/@StanfordMedicine)

Free newsletters:

Doctor Penguin (https://doctorpenguin.com/about)

Healthcare AI Guy Weekly (https://www.linkedin.com/newsletters/healthcare-ai-guy-weekly-7229262910038888448/)

Healthcare-Focused Venues:

  • IEEE ICHI (International Conference on Healthcare Informatics)
  • CHIL (Conference on Health, Inference, and Learning)
  • ACM Transactions on Computing for Healthcare

Essential Workshops:

  • ACL BioNLP: Natural language processing for biomedical texts
  • ML4H: Machine Learning for Health (NeurIPS workshop)
  • ICLR AI4PH: AI for Public Health

Step 3: Master the Technical Infrastructure

Data and Dataset Considerations: Before diving into any project, ask yourself:

  • Are relevant datasets available and accessible?
  • Do you understand the data collection process and potential biases?
  • Are there benchmark datasets for comparison?

Essential Tools for Medical Text Processing: Structured Data Extraction from Unstructured Medical Text

MetaMap: Maps text to UMLS concepts

QuickUMLS: Faster alternative to MetaMap

SemRep: Extracts semantic relationships

Python Packages for Medical NLP:

  • clinspacy: Clinical text processing
  • medspacy: Medical-specific spaCy extensions
  • scispacy: Scientific and biomedical text processing

Additional Technical Resources:

  • FHIR (Fast Healthcare Interoperability Resources): Understanding healthcare data standards
  • OHDSI (Observational Health Data Sciences and Informatics): Standardized healthcare databases
  • Hugging Face Medical Models: Pre-trained models for medical text
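
To give a feel for what FHIR data looks like, here is a sketch that flattens a single Observation resource into a flat row using only the standard library. The JSON is hand-written for illustration, the helper `flatten_observation` is my own, and it only handles observations that carry a `valueQuantity`; real resources are often richer than this.

```python
import json

# A hand-written, minimal Observation resource for illustration only.
observation_json = """
{
  "resourceType": "Observation",
  "status": "final",
  "code": {
    "coding": [
      {"system": "http://loinc.org", "code": "8867-4", "display": "Heart rate"}
    ]
  },
  "subject": {"reference": "Patient/example"},
  "effectiveDateTime": "2024-01-15T08:30:00Z",
  "valueQuantity": {"value": 72, "unit": "beats/minute"}
}
"""


def flatten_observation(resource):
    """Pull a few commonly used fields out of a FHIR Observation dict.

    Only a single valueQuantity is handled; other value types
    (valueString, valueCodeableConcept, ...) come back as None.
    """
    if resource.get("resourceType") != "Observation":
        raise ValueError("expected an Observation resource")
    codings = resource.get("code", {}).get("coding", [])
    first = codings[0] if codings else {}
    quantity = resource.get("valueQuantity", {})
    return {
        "patient": resource.get("subject", {}).get("reference"),
        "code_system": first.get("system"),
        "code": first.get("code"),
        "display": first.get("display"),
        "value": quantity.get("value"),
        "unit": quantity.get("unit"),
        "time": resource.get("effectiveDateTime"),
    }


row = flatten_observation(json.loads(observation_json))
```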

Step 4: Assess Your Resources and Skills

Resource Management Questions:

  • Do you have enough computing resources to feasibly experiment with various scenarios?
  • Can you access cloud computing platforms if needed?
  • Do you have partnerships with institutions that have medical data?

Technical Skills Inventory:

  • Coding Knowledge: How easily can you learn new tools, understand and debug codebases, and identify gaps in papers?
  • Domain Knowledge: Do you understand basic medical terminology and clinical workflows?
  • Statistical Skills: Can you properly evaluate clinical relevance beyond standard ML metrics?

Critical Thinking Skills:

  • Error Categorization: Can you systematically analyze where and why your model fails?
  • Literature Review: Are you conversant with the relevant literature to propose feasible solutions?
  • Clinical Intuition: Can you explain your model’s decisions to healthcare professionals?

Common Pitfalls and How to Avoid Them

1. The “Accuracy Trap”

Many researchers focus solely on improving accuracy without considering clinical relevance. A model that’s 99% accurate in detecting common conditions but misses rare but serious diseases isn’t clinically useful.

Solution: Always collaborate with clinicians to understand what types of errors matter most.

2. Data Leakage in Temporal Settings

Medical data often has temporal dependencies that can lead to data leakage if not handled properly.

Solution: Always use proper temporal splits and understand the clinical timeline.
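
A minimal sketch of such a temporal split follows, assuming each record carries an admission date and a patient ID. The field names, the cutoff date, and the policy of dropping returning patients from the training set are my assumptions; other policies may suit your clinical question better.

```python
from datetime import date


def temporal_split(records, cutoff, time_key="admit_date", patient_key="patient_id"):
    """Split records into train/test by a calendar cutoff.

    Records before the cutoff go to train, the rest to test. Patients who
    appear on both sides are dropped from train, so that no individual
    contributes to both sets.
    """
    train = [r for r in records if r[time_key] < cutoff]
    test = [r for r in records if r[time_key] >= cutoff]
    test_patients = {r[patient_key] for r in test}
    train = [r for r in train if r[patient_key] not in test_patients]
    return train, test


# Made-up admissions; patient A returns after the cutoff.
records = [
    {"patient_id": "A", "admit_date": date(2019, 3, 1)},
    {"patient_id": "B", "admit_date": date(2019, 11, 20)},
    {"patient_id": "A", "admit_date": date(2021, 2, 14)},
    {"patient_id": "C", "admit_date": date(2020, 6, 5)},
    {"patient_id": "D", "admit_date": date(2021, 8, 30)},
]
train, test = temporal_split(records, cutoff=date(2021, 1, 1))
```

Compared with a random split, this keeps every training record strictly earlier than every test record, which is closer to how a deployed model would meet new patients.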

3. Overfitting to Single Institutions

Models trained on data from one hospital often fail to generalize to other institutions due to different practices, populations, and equipment.

Solution: Seek multi-site validation whenever possible.
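
When you do have data from several institutions, one common way to approximate multi-site validation is a leave-one-site-out split: train on all sites but one, and evaluate on the held-out site. The sketch below assumes each record has a `site` field; the site names and labels are invented.

```python
from collections import defaultdict


def leave_one_site_out(records, site_key="site"):
    """Yield (held_out_site, train_records, test_records) for each site.

    Each site is held out once; a model would be trained on all other
    sites and evaluated on the held-out one.
    """
    by_site = defaultdict(list)
    for record in records:
        by_site[record[site_key]].append(record)

    for held_out in sorted(by_site):
        train = [r for site, rows in by_site.items() if site != held_out for r in rows]
        yield held_out, train, by_site[held_out]


# Made-up records from three hypothetical hospitals.
records = [
    {"site": "hospital_a", "label": 1},
    {"site": "hospital_a", "label": 0},
    {"site": "hospital_b", "label": 0},
    {"site": "hospital_c", "label": 1},
    {"site": "hospital_c", "label": 1},
]
folds = list(leave_one_site_out(records))
```

A large gap between the held-out-site score and the within-site score is a useful early warning that the model has learned institution-specific patterns.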

4. Ignoring Regulatory Requirements

Focusing purely on technical performance without considering regulatory pathways can limit real-world impact.

Future Trends to Watch

1. Foundation Models in Healthcare

Large language models and vision models specifically trained for medical applications are emerging rapidly.

2. Federated Learning

Enabling model training across institutions without sharing sensitive data.
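
As a rough illustration, the sketch below shows only the aggregation step of federated averaging (FedAvg), one widely used approach: each institution sends model parameters and a sample count, and the server takes a weighted average. The numbers are made up, and a real system would add multiple training rounds, secure communication, and privacy protections that are not shown here.

```python
def federated_average(client_updates):
    """Combine client parameter vectors, weighting each by its sample count.

    client_updates: list of (num_samples, parameters) pairs, where
    parameters is a list of floats of equal length across clients.
    Only parameters and counts are exchanged, not raw patient records.
    """
    if not client_updates:
        raise ValueError("need at least one client update")
    total = sum(n for n, _ in client_updates)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    dim = len(client_updates[0][1])
    if any(len(params) != dim for _, params in client_updates):
        raise ValueError("all clients must send the same number of parameters")

    averaged = [0.0] * dim
    for n, params in client_updates:
        for i, value in enumerate(params):
            averaged[i] += (n / total) * value
    return averaged


# Made-up updates from three hypothetical hospitals after one local training round.
updates = [
    (100, [0.2, -1.0]),
    (300, [0.4, 0.0]),
    (600, [0.1, 1.0]),
]
global_params = federated_average(updates)
```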

3. Explainable AI for Healthcare

Making AI decisions interpretable for clinical use.

4. Digital Therapeutics

AI systems that directly provide therapeutic interventions.

5. Personalized Medicine

AI approaches for tailoring treatments to individual patients.

Final Words and Personal Reflections

In this article, we’ve covered how to get started with research in AI for Medicine. The journey isn’t always smooth: you’ll face rejections, encounter datasets that don’t behave as expected, and grapple with the complexity of healthcare systems. But the potential impact makes every challenge worthwhile.

My key takeaways after five years in this field:

  1. Patience is crucial: Medical AI research takes longer than traditional AI research due to the need for clinical validation
  2. Collaboration is everything: Your medical collaborators are your most valuable asset
  3. Start small: Begin with well-defined problems before tackling broader challenges
  4. Think beyond the paper: Consider the real-world pathway for your research from day one

If you want me to cover specific topics in more detail, such as regulatory pathways, specific technical approaches, or career advice for medical AI researchers, please let me know in the comments or email me.

To learn more about my research portfolio and connect with me, please visit https://roysoumya.github.io/

Thank you for taking the time to go through this comprehensive guide. I hope it serves as a valuable roadmap for your journey in AI for Medicine research. Remember, every expert was once a beginner, and every breakthrough started with someone asking the right questions.

Breathe and build. — My motto in life

