With the continuous stream of LinkedIn updates and new preprints related to AI in medicine, it is easy and understandable to feel overwhelmed about where to start as a researcher.
In this article, I offer suggestions focused on the “AI for Medicine” research area. This was my research area during my Ph.D., and I have published a few papers at computer science (CS) conferences. More importantly, I’ve learned from my mistakes, celebrated small wins, and navigated the unique challenges that come with working at the intersection of AI and healthcare.

Getting Started with AI/ML Research in General
Before diving into the medical AI specifics, let’s establish a foundation. I previously made a video explaining the typical lifecycle of an AI/ML research project whose ultimate goal is publication.
Life cycle of an AI or ML Research Paper: Watch from 15 mins 6 secs onwards
The research lifecycle I’ve observed typically follows this pattern:
- Problem Identification: What clinical challenge are you addressing?
- Literature Review: What has been tried before, and where are the gaps?
- Data Collection/Preparation: This is often 70% of your work in medical AI
- Method Development: Adapting or creating AI approaches for your specific problem
- Evaluation: Beyond accuracy — does it actually help clinicians?
- Validation: Testing in real-world or clinically relevant scenarios. In academic papers, we perform human evaluation on a random subset of datapoints.
- Publication: Sharing your findings with the community
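As a minimal sketch of the human-evaluation step in the list above, the snippet below draws a reproducible random subset of model outputs to hand to expert annotators. The function name `sample_for_human_eval`, the record fields, and the subset size are assumptions for illustration, not a protocol taken from any specific paper.

```python
import random

def sample_for_human_eval(records, k, seed=0):
    """Return k records drawn uniformly at random, without replacement.

    A fixed seed makes the subset reproducible, so the same items can be
    reported in the paper and re-checked later.
    """
    if k > len(records):
        raise ValueError("k cannot exceed the number of records")
    rng = random.Random(seed)  # local RNG; leaves the global random state alone
    return rng.sample(records, k)

# Hypothetical model outputs awaiting expert review
outputs = [{"id": i, "prediction": f"summary {i}"} for i in range(100)]
subset = sample_for_human_eval(outputs, k=10, seed=42)
print([r["id"] for r in subset])
```

Fixing the seed is a small design choice that pays off at review time: annotators, co-authors, and reviewers can all refer to the same set of examples.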

With that foundation in place, I want to share what I have learned over the past five years, in the hope that it helps anyone interested in starting work in this exciting and emerging area.
What Makes AI for Medicine Research Unique?
The uniqueness of this research area arises from several factors that I wish someone had explained to me when I started:
1. Highly Interdisciplinary Nature
The problem statement, the design objective, the nature of input, and the desired output require significant involvement with domain (medical) experts and stakeholders. You’re not just building an AI system — you’re building a bridge between two worlds that often speak different languages.
What this means in practice:
- You’ll spend considerable time learning medical terminology and clinical workflows
- Your success depends heavily on establishing strong collaborations with healthcare professionals
- You need to understand not just what the data says, but what it means in a clinical context
2. Beyond Benchmark Performance
Simple metrics like accuracy are not enough: you need to demonstrate improved clinical utility, and gains on benchmark datasets alone do not show that. Typically, human evaluation by medical experts is required to show that an improvement in metrics also translates into an improvement in clinical utility. This matters because only a few of these AI models ever make it into clinical practice.
The reality check:
- A 95% accuracy model might be clinically useless if it fails on the 5% of cases that matter most
- You need to understand concepts like sensitivity, specificity, positive/negative predictive values in clinical contexts
- Statistical significance doesn’t always translate to clinical significance
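To make the metrics in the list above concrete, here is a minimal sketch that computes sensitivity, specificity, and positive/negative predictive values from confusion-matrix counts. The counts describe a made-up screening scenario (1000 patients, 50 with the disease) chosen only to show how a model can reach 95% accuracy while missing most of the patients who matter.

```python
def clinical_metrics(tp, fp, tn, fn):
    """Compute common clinical metrics from confusion-matrix counts."""
    def ratio(num, den):
        # Return NaN instead of raising when a denominator is zero
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": ratio(tp, tp + fn),   # share of diseased patients caught
        "specificity": ratio(tn, tn + fp),   # share of healthy patients cleared
        "ppv": ratio(tp, tp + fp),           # chance a positive call is right
        "npv": ratio(tn, tn + fn),           # chance a negative call is right
    }

# Hypothetical screening result: 1000 patients, 50 with the disease
m = clinical_metrics(tp=10, fp=10, tn=940, fn=40)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

In this invented example the accuracy is 0.95, yet the sensitivity is only 0.20: four out of five diseased patients are missed, which is exactly the kind of gap a single headline number hides.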

3. Regulatory and Ethical Considerations
Unlike other AI applications, medical AI systems often need to meet regulatory standards (FDA approval, CE marking, etc.) and address complex ethical considerations around patient privacy, algorithmic bias, and clinical decision-making.
4. Data Challenges
Medical data comes with unique challenges:
- Privacy constraints: HIPAA, GDPR, and other regulations limit data sharing
- Data quality issues: Missing values, inconsistent formats, human errors in documentation
- Temporal dependencies: Patient conditions evolve over time
- Multi-modal complexity: Text, images, time-series, and structured data all need to work together
Expert articles worth reading!
Curtis Langlotz: Dr. Langlotz is Professor of Radiology, Medicine, and Biomedical Data Science and Senior Associate Vice Provost for Research at Stanford University. His laboratory investigates the use of deep neural networks and other machine learning technologies to detect disease and eliminate diagnostic errors through analysis of medical images and clinical notes. (Quoted from Stanford profile https://profiles.stanford.edu/curtis-langlotz)
Getting Started in Medical AI: A Guide to Online Learning Resources for Clinicians
Your Roadmap to Getting Started
Step 1: Build Your Foundation and Network
Target Top-tier Conferences: Focus on AI and NLP conferences with strong “NLP Applications” or “AI for Healthcare” tracks. These venues value both methodological contributions and real-world impact. You may go through my Google Scholar profile to get a list of relevant conferences.
Essential conferences to follow:
- NeurIPS (Machine Learning for Healthcare workshop)
- ICML (Healthcare track)
- ACL/EMNLP (BioNLP workshops)
- AAAI (AI for Social Good)
Follow These Research Leaders and Labs:
1. van der Schaar lab (University of Cambridge)
- Research focus: Time-series analysis, Synthetic data generation
- Why follow: Pioneering work in causal inference for healthcare
- Link: https://www.vanderschaar-lab.com/
2. Zitnik Lab (Harvard)
- Research focus: Graph neural networks, Multimodal learning, Biological foundation models
- Why follow: Cutting-edge work in drug discovery and biological understanding
- Link: https://zitniklab.hms.harvard.edu/
3. Boussard Lab (Stanford Medicine)
- Research focus: Electronic Health Records, NLP, Bioethics
- Why follow: Practical applications with strong ethical considerations
- Link: https://med.stanford.edu/boussard-lab.html
4. Butte Lab (UCSF)
- Research focus: Translational bioinformatics, precision medicine
- Link: https://profiles.ucsf.edu/atul.butte
5. Rajpurkar Lab (Harvard)
- Research focus: Computer vision for medical imaging
- Why follow: Excellent work in medical image analysis and benchmarking
- Link: https://www.rajpurkarlab.hms.harvard.edu/
6. Johns Hopkins University:
- Prof. Mark Dredze: NLP applications in public health
- Prof. Suchi Saria: Machine learning for clinical decision support
- Links: https://www.cs.jhu.edu/~mdredze/, https://suchisaria.jhu.edu/
7. Data Science and Text-based Information Systems (DATEXIS) of Berliner Hochschule für Technik, Germany (www.datexis.com)
8. CAIMed, the Lower Saxony Center for AI & Causal Methods in Medicine, Germany (https://caimed.de/en/)
9. Biomedical Data Science Lab of ETH Zurich, Switzerland (https://bmds.ethz.ch/), Prof. Michael Moor (https://michaelmoor.me/) of ETH Zurich
10. Prof. Mary-Anne Hartley of Laboratory for Intelligent Global Health & Humanitarian Technologies at EPFL, Switzerland (https://www.light-laboratory.org/) — created one of the popular open-source medical LLMs called MEDITRON
11. Vector Institute, Toronto (https://vectorinstitute.ai/research/health-research/)
Step 2: Understand the Conference Landscape
Listen to cutting-edge research talks and symposiums
RAISE Health Symposium from Stanford HAI and Stanford Medicine: https://med.stanford.edu/raisehealth/events/symposium.html
Stanford Medicine YouTube channel (https://www.youtube.com/@StanfordMedicine)
Free newsletters:
Doctor Penguin (https://doctorpenguin.com/about)
Healthcare AI Guy Weekly (https://www.linkedin.com/newsletters/healthcare-ai-guy-weekly-7229262910038888448/)
Healthcare-Focused Venues:
- IEEE ICHI (International Conference on Healthcare Informatics)
- CHIL (Conference on Health, Inference, and Learning)
- ACM Transactions on Computing for Healthcare
Essential Workshops:
- ACL BioNLP: Natural language processing for biomedical texts
- ML4H: Machine Learning for Health (NeurIPS workshop)
- ICLR AI4PH: AI for Public Health
Step 3: Master the Technical Infrastructure
Data and Dataset Considerations: Before diving into any project, ask yourself:
- Are relevant datasets available and accessible?
- Do you understand the data collection process and potential biases?
- Are there benchmark datasets for comparison?
Essential Tools for Medical Text Processing: Structured Data Extraction from Unstructured Medical Text
MetaMap: Maps text to UMLS concepts
- Link: https://lhncbc.nlm.nih.gov/ii/tools/MetaMap.html
- Use case: Identifying medical concepts in clinical notes
QuickUMLS: Faster alternative to MetaMap
- Link: https://github.com/Georgetown-IR-Lab/QuickUMLS
- Advantage: Better performance for real-time applications
SemRep: Extracts semantic relationships
- Example: “Magnetic Resonance Imaging diagnoses Lacunar Infarction”
- Link: https://lhncbc.nlm.nih.gov/ii/tools/SemRep_SemMedDB_SKR/SemRep.html
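To give a feel for what "mapping text to concepts" means, here is a toy sketch of exact dictionary lookup over a clinical sentence. It does not call MetaMap, QuickUMLS, or SemRep, and it is far simpler than what those tools do. The vocabulary and the `CONCEPT-...` identifiers are invented placeholders, not real UMLS concept identifiers.

```python
import re

# Invented placeholder vocabulary: surface form -> (concept id, preferred name)
TOY_VOCAB = {
    "heart attack": ("CONCEPT-001", "Myocardial Infarction"),
    "myocardial infarction": ("CONCEPT-001", "Myocardial Infarction"),
    "high blood pressure": ("CONCEPT-002", "Hypertension"),
    "hypertension": ("CONCEPT-002", "Hypertension"),
}

def map_concepts(text, vocab=TOY_VOCAB):
    """Return (start, end, concept_id, preferred_name) for each exact match."""
    matches = []
    lowered = text.lower()
    for term, (cid, name) in vocab.items():
        # \b keeps a term from matching inside a longer word
        for m in re.finditer(r"\b" + re.escape(term) + r"\b", lowered):
            matches.append((m.start(), m.end(), cid, name))
    return sorted(matches)  # ordered by position in the text

note = "Patient has a history of high blood pressure and a prior heart attack."
for start, end, cid, name in map_concepts(note):
    print(note[start:end], "->", cid, name)
```

Real tools add what this sketch lacks: a full vocabulary, handling of spelling variants and abbreviations, and disambiguation between concepts that share a surface form.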
Python Packages for Medical NLP:
- clinspacy: Clinical text processing
- medspacy: Medical-specific spaCy extensions
- scispacy: Scientific and biomedical text processing
Additional Technical Resources:
- FHIR (Fast Healthcare Interoperability Resources): Understanding healthcare data standards
- OHDSI (Observational Health Data Sciences and Informatics): Standardized healthcare databases
- Hugging Face Medical Models: Pre-trained models for medical text
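As a small illustration of why a data standard like FHIR helps, the sketch below reads a hand-written JSON document shaped like a FHIR Patient resource and flattens a few fields. The patient is invented, and `summarize_patient` is a hypothetical helper, not part of any FHIR library; production code would normally use a dedicated FHIR client and validate the resource.

```python
import json

# Hand-written example in the shape of a FHIR Patient resource (not real data)
patient_json = """
{
  "resourceType": "Patient",
  "id": "example-1",
  "name": [{"family": "Doe", "given": ["Jane", "Q"]}],
  "gender": "female",
  "birthDate": "1980-04-12"
}
"""

def summarize_patient(resource):
    """Flatten a few Patient fields into a plain dict."""
    if resource.get("resourceType") != "Patient":
        raise ValueError("expected a Patient resource")
    names = resource.get("name") or [{}]
    first = names[0]  # a Patient may carry several names; take the first
    return {
        "id": resource.get("id"),
        "family": first.get("family"),
        "given": " ".join(first.get("given", [])),
        "gender": resource.get("gender"),
        "birth_date": resource.get("birthDate"),
    }

summary = summarize_patient(json.loads(patient_json))
print(summary)
```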
Step 4: Assess Your Resources and Skills
Resource Management Questions:
- Do you have enough computing resources to feasibly experiment with various scenarios?
- Can you access cloud computing platforms if needed?
- Do you have partnerships with institutions that have medical data?
Technical Skills Inventory:
- Coding Knowledge: How easily can you learn new tools, understand and debug codebases, and identify gaps in papers?
- Domain Knowledge: Do you understand basic medical terminology and clinical workflows?
- Statistical Skills: Can you properly evaluate clinical relevance beyond standard ML metrics?
Critical Thinking Skills:
- Error Categorization: Can you systematically analyze where and why your model fails?
- Literature Review: Are you conversant with the relevant literature to propose feasible solutions?
- Clinical Intuition: Can you explain your model’s decisions to healthcare professionals?
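For the error-categorization question above, one lightweight habit is to label each failure by hand and then tally the labels. The sketch below assumes you have already done the manual labeling; the category names and the `error_breakdown` helper are illustrative, not a standard taxonomy.

```python
from collections import Counter

# Hypothetical manually annotated failures: one category per error
annotated_errors = [
    {"id": 1, "category": "negation missed"},
    {"id": 2, "category": "abbreviation misread"},
    {"id": 3, "category": "negation missed"},
    {"id": 4, "category": "label noise"},
    {"id": 5, "category": "negation missed"},
]

def error_breakdown(errors):
    """Return (category, count, share) tuples, most frequent first."""
    counts = Counter(e["category"] for e in errors)
    total = sum(counts.values())
    return [(cat, n, n / total) for cat, n in counts.most_common()]

breakdown = error_breakdown(annotated_errors)
for category, count, share in breakdown:
    print(f"{category}: {count} ({share:.0%})")
```

A breakdown like this tells you where to spend effort next, and it is also an easy table to discuss with clinical collaborators.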
Common Pitfalls and How to Avoid Them
1. The “Accuracy Trap”
Many researchers focus solely on improving accuracy without considering clinical relevance. A model that’s 99% accurate in detecting common conditions but misses rare but serious diseases isn’t clinically useful.
Solution: Always collaborate with clinicians to understand what types of errors matter most.
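One way to act on this advice is to score models by the cost of their errors rather than by accuracy alone. The sketch below is an illustration under assumptions: the cost values are placeholders that would have to come from your clinical collaborators, and the two "models" are hard-coded prediction lists.

```python
# Hypothetical costs agreed with clinicians: a missed serious disease is
# treated as far worse than a false alarm. The numbers are placeholders.
COSTS = {"false_negative": 20.0, "false_positive": 1.0}

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def weighted_error_cost(y_true, y_pred, costs=COSTS):
    """Sum the assigned cost over all errors (1 = disease present)."""
    total = 0.0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 0:
            total += costs["false_negative"]
        elif truth == 0 and pred == 1:
            total += costs["false_positive"]
    return total

y_true = [0] * 98 + [1] * 2               # 2 serious cases in 100 patients
model_a = [0] * 100                       # never flags the disease
model_b = [0] * 93 + [1] * 5 + [1] * 2    # 5 false alarms, catches both cases

for name, preds in [("A", model_a), ("B", model_b)]:
    print(name, accuracy(y_true, preds), weighted_error_cost(y_true, preds))
```

Here model A has the higher accuracy but the higher cost, because both of its errors are missed serious cases. Which model is preferable depends entirely on the costs, which is why they should be set with clinicians and not guessed.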
2. Data Leakage in Temporal Settings
Medical data often has temporal dependencies that can lead to data leakage if not handled properly.
Solution: Always use proper temporal splits and understand the clinical timeline.
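A minimal sketch of a temporal split follows, assuming each record carries a `patient_id` and a `date` (an invented schema). Records before a cutoff go to training and later ones to testing. Dropping test records from patients already seen in training is an extra design choice made here to avoid patient-level leakage; whether it fits depends on your clinical question.

```python
from datetime import date

def temporal_split(records, cutoff):
    """Split by time, then drop test patients already seen in training.

    records: list of dicts with 'patient_id' and 'date' keys (assumed schema).
    """
    train = [r for r in records if r["date"] < cutoff]
    train_patients = {r["patient_id"] for r in train}
    test = [
        r for r in records
        if r["date"] >= cutoff and r["patient_id"] not in train_patients
    ]
    return train, test

# Hypothetical visit records
visits = [
    {"patient_id": "p1", "date": date(2019, 3, 1)},
    {"patient_id": "p1", "date": date(2021, 6, 1)},   # same patient, later visit
    {"patient_id": "p2", "date": date(2020, 1, 15)},
    {"patient_id": "p3", "date": date(2021, 2, 10)},
    {"patient_id": "p4", "date": date(2022, 8, 5)},
]
train, test = temporal_split(visits, cutoff=date(2021, 1, 1))
print([r["patient_id"] for r in train], [r["patient_id"] for r in test])
```

A random shuffle of the same records could place the 2021 visit of `p1` in training and the 2019 visit in testing, letting the model "predict the past" from the future.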
3. Overfitting to Single Institutions
Models trained on data from one hospital often fail to generalize to other institutions due to different practices, populations, and equipment.
Solution: Seek multi-site validation whenever possible.
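When you do have data from several institutions, one common way to probe generalization is leave-one-site-out evaluation: train on all sites but one and test on the held-out site. The sketch below only builds the splits; the `site` field and the hospital names are assumptions for illustration, and model training is left out.

```python
from collections import defaultdict

def leave_one_site_out(records):
    """Yield (held_out_site, train_records, test_records) for each site."""
    by_site = defaultdict(list)
    for r in records:
        by_site[r["site"]].append(r)
    for site in sorted(by_site):
        # Training data is everything that did not come from the held-out site
        train = [r for s, rs in by_site.items() if s != site for r in rs]
        yield site, train, by_site[site]

# Hypothetical records from three hospitals
site_records = [
    {"site": "hospital_a", "x": 1}, {"site": "hospital_a", "x": 2},
    {"site": "hospital_b", "x": 3},
    {"site": "hospital_c", "x": 4}, {"site": "hospital_c", "x": 5},
]
folds = list(leave_one_site_out(site_records))
for site, train_recs, test_recs in folds:
    print(site, len(train_recs), len(test_recs))
```

A large gap between within-site and held-out-site performance is a useful early warning, even before a true external validation is possible.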
4. Ignoring Regulatory Requirements
Focusing purely on technical performance without considering regulatory pathways can limit real-world impact.
Future Trends to Watch
1. Foundation Models in Healthcare
Large language models and vision models specifically trained for medical applications are emerging rapidly.
2. Federated Learning
Enabling model training across institutions without sharing sensitive data.
3. Explainable AI for Healthcare
Making AI decisions interpretable for clinical use.
4. Digital Therapeutics
AI systems that directly provide therapeutic interventions.
5. Personalized Medicine
AI approaches for tailoring treatments to individual patients.
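To make the federated learning trend above a little more concrete, here is a minimal sketch of the aggregation step used in federated averaging (FedAvg): each institution trains locally and shares only model parameters, which a server averages weighted by local dataset size. The hospital parameters and sizes are invented, and the sketch omits local training, communication, and privacy mechanisms such as secure aggregation.

```python
def federated_average(client_weights, client_sizes):
    """Average per-client parameter vectors, weighted by local dataset size."""
    if len(client_weights) != len(client_sizes):
        raise ValueError("need one dataset size per client")
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]

# Hypothetical parameters from three hospitals after one local training round
hospital_weights = [[0.2, 1.0], [0.4, 0.0], [0.6, 2.0]]
hospital_sizes = [100, 300, 100]
global_weights = federated_average(hospital_weights, hospital_sizes)
print(global_weights)
```

No patient record leaves a hospital in this scheme; only the parameter vectors do, which is what makes the approach attractive under the privacy constraints discussed earlier.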
Final Words and Personal Reflections
In this article, we have surveyed what it takes to get started with research in AI for Medicine. The journey isn’t always smooth: you’ll face rejections, encounter datasets that don’t behave as expected, and grapple with the complexity of healthcare systems. But the potential impact makes every challenge worthwhile.
My key takeaways after five years in this field:
- Patience is crucial: Medical AI research takes longer than traditional AI research due to the need for clinical validation
- Collaboration is everything: Your medical collaborators are your most valuable asset
- Start small: Begin with well-defined problems before tackling broader challenges
- Think beyond the paper: Consider the real-world pathway for your research from day one
If you want me to cover specific topics in more detail, such as regulatory pathways, specific technical approaches, or career advice for medical AI researchers, please let me know in the comments or email me.
To learn more about my research portfolio and connect with me, please visit https://roysoumya.github.io/
Thank you for taking the time to go through this comprehensive guide. I hope it serves as a valuable roadmap for your journey in AI for Medicine research. Remember, every expert was once a beginner, and every breakthrough started with someone asking the right questions.
Breathe and build. — My motto in life