Vocabulary Adaptation for Medical Language Models: From IJCAI 2024 to ACL 2025
A deep dive into why pre-trained LLMs fragment medical terms like "erythromycin" into five tokens, and how the MEDVOC framework fixes it. Companion article to the Microsoft Research India talk.