A Self-help guide to starting your own Machine Learning research project

During my research journey, I learned new things and faced a lot of challenges. I believe a structured orientation for students starting out as researchers would be useful, and it would definitely have helped my younger self.

As I talked to more and more grad students, I began to realize that their experiences vary significantly from one another.

Eureka! A self-help guide assimilating advice from past researchers and professors.

I believe the knowledge contained in this blog series will help make the research experiences more equitable and enjoyable.

This is a part of the “Research for All” initiative, which aims to promote research awareness and make machine learning research more accessible.

First, to introduce myself: I am a 3rd-year Ph.D. student at IIT Kharagpur, India, and have been a part of the research fraternity for close to 5 years, first as an intern, then as a Master’s student, and now as a Ph.D. student.

What will you get to learn by the end of the self-help research blog series?

Here, I will try to share my own research project life-cycle, based on first-hand practical experience, along with a list of research practices I observed and picked up along the way. I had the honor of working with a talented team of researchers and seniors, and with two research labs, each with a unique work culture of its own.

Secondly, I have tried to collect articles from eminent researchers and provide them as a reading list associated with each subtopic.

I believe this will provide a balanced view, combining my personal, first-hand experience with the guidance of more experienced, eminent researchers on the same topic.

Your feedback is crucial to help me further improve this article, so don’t hold back and let me know about your views in the comments or by simply mailing me.

Specifically, you will learn about …

Part 1: Motivation, research from a self-improvement perspective, life-cycle of a research project, problem ideation, reading list (current article)

Part 2: Problem verification, baseline setup, novelty (will be out soon!)

Part 3: Soft skills including delivering a technical presentation, collaboration, and work productivity (another blog article, so do give it a read. 8 min. reading time)

Related Works

Interestingly, and contrary to my starting hypothesis, I stumbled upon a treasure trove of learning resources, tutorials, and how-tos by eminent professors and researchers that can guide us at every step of the way.

https://mentorship.aclweb.org/ webpage, last seen on July 1, 2022

Along similar lines, to make the research experience more equitable across institutions for students pursuing a Ph.D. in natural language processing, Zhijing Jin (Ph.D. student in NLP at Max Planck Institute, co-organizer of the ACL Year-Round Mentorship Program) has made available an open-source GitHub repo containing reading material on research sub-topics like:

What Is Weekly Meeting with Advisors like?

Coming Up with Good Research Ideas

How to Read Papers.

I definitely recommend giving it a thorough read: it provides a deep, comprehensive overview of the resources already available and will help new researchers start off on the right foot.

How is this blog series any different?

If you go through the reading list provided at the end, you will find that hardly any material is written in the context of Indian academia, let alone from a student’s perspective.

I tried to put together whatever I learned from my first-hand experience and what worked for me.

So, you can expect this piece to be highly opinionated, focusing on “what worked” instead of “how it should have worked ideally”.

Now that the stage is set, let’s start!

Life-cycle of a Research Project

The Life-cycle of a Machine Learning Research Project

Prerequisites

Choice of broad domain — Text, Vision, Graphs, Speech
Basics of Machine Learning and Deep Learning
Python, PyTorch
Optional: R, other DL frameworks
Git commands

Let’s talk about the first and perhaps the most challenging step — Problem Ideation.

Deep dive into Problem Ideation

Let’s break it down; I will present my take on tackling each step.

Selecting top conferences

CORE Rankings — A*, A
Google Scholar Metrics — choose your subcategory

List of top-tier conferences in AI and ML

General: WebConf, CIKM, AAAI, KDD, ICDM
Health: BioKDD, CHIL, ACL BioNLP workshop, ACM Transactions on Computing for Healthcare
Natural Language Processing: ACL, EMNLP
Computer Vision: CVPR, ECCV, MICCAI
Information Retrieval: SIGIR, ECIR, WSDM
Recommender Systems: RecSys
Social: ICWSM, JCDL
Very top tier: NeurIPS, ICLR, AISTATS, KDD
Conference deadlines (https://twitter.com/_ConferenceList)

Accessing research papers

Papers from conferences such as ACL, EMNLP, and NAACL are freely available in the ACL Anthology
Non-commercial preprint servers: arXiv, bioRxiv
Rxivist combines biology preprints from bioRxiv and medRxiv with data from Twitter to help you find the papers being discussed in your field
Unpaywall
Search for the paper in Google Scholar, select the article, click on ‘All [number] versions,’ and check if any one of them has a PDF version available
Check whether the first or last author has a personal homepage; an author’s copy is usually available there.
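The Unpaywall step above can also be scripted. Here is a minimal sketch, assuming you already have the paper's DOI and that Unpaywall's public v2 REST endpoint returns a JSON record with a `best_oa_location` entry when an open copy exists. The function names and the example email are made up for illustration; please check the current Unpaywall documentation before relying on the field names.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

# Assumed base URL of Unpaywall's public v2 API.
UNPAYWALL_API = "https://api.unpaywall.org/v2/"


def unpaywall_url(doi: str, email: str) -> str:
    """Build the lookup URL for one DOI (the service asks for a contact email)."""
    query = urllib.parse.urlencode({"email": email})
    return UNPAYWALL_API + urllib.parse.quote(doi, safe="/") + "?" + query


def pick_pdf_url(record: dict) -> Optional[str]:
    """Return a link to an open copy from one record, or None if there is none."""
    best = record.get("best_oa_location") or {}
    # Prefer a direct PDF link; fall back to the landing page of the open copy.
    return best.get("url_for_pdf") or best.get("url")


def find_open_copy(doi: str, email: str) -> Optional[str]:
    """Query the service over the network and return a link to an open copy."""
    with urllib.request.urlopen(unpaywall_url(doi, email), timeout=10) as response:
        record = json.load(response)
    return pick_pdf_url(record)
```

Calling `find_open_copy("<doi of the paper>", "you@example.org")` needs network access; if it returns `None`, fall back to the Google Scholar and author-homepage tricks listed above.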

Some heuristics to filter Machine Learning papers

Published within the last 3 years, since ML is a very fast-growing field
Reasonable citation count
Authors from a reputed institution (first or principal author)
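If you keep your candidate papers in a spreadsheet or a small script, the three heuristics above are easy to apply mechanically. The sketch below is one possible reading of them, not a fixed rule: the `Paper` fields, the threshold of 5 citations per year of age, and the hand-labelled `reputed_institution` flag are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Paper:
    title: str
    year: int                  # publication year
    citations: int             # e.g. copied by hand from Google Scholar
    reputed_institution: bool  # your own judgement of the first or principal author


def passes_filter(paper: Paper, current_year: int,
                  max_age_years: int = 3,
                  min_citations_per_year: int = 5) -> bool:
    """Apply the three heuristics: recent, reasonably cited, reputed authors."""
    age = current_year - paper.year
    if age < 0 or age > max_age_years:
        return False
    # Assumption: "reasonable" grows with age, so brand-new papers need no citations yet.
    if paper.citations < min_citations_per_year * age:
        return False
    return paper.reputed_institution


def shortlist(papers: list, current_year: int) -> list:
    """Return the titles of the papers that pass all three heuristics."""
    return [p.title for p in papers if passes_filter(p, current_year)]
```

Treat the output as a first pass only; a paper that fails the filter can still be the one that matters for your problem.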

Finding Candidate Research Topics — Talks/Tutorials

Tutorials from top conferences (from the conference website)
Slides, and sometimes videos are also available
ACM Conference on Health, Inference, and Learning Tutorials
YouTube playlists covering paper presentations and invited talks of KDD 2021, as well as CHI 2021, UIST, and DIS on the SIGCHI YouTube channel
CHI channel (other HCI videos)
Freely accessible videos
SlidesLive library
Follow the Twitter handle of the conference (@CIKM2021, @emnlpmeeting, @pakdd_social)
Live-streamed on Youtube

Reading papers — exploration phase

This is perhaps the most time-consuming part, but for me it was the exciting part of the process. First, start by reading the abstract and introduction section of the paper. Then write a paper summary covering the following aspects:

The problem statement, key takeaways, and limitations or scope.

Read figures and tables along with their captions.

The ideal output of the problem ideation stage

Start with some keywords (NLP, medical, summarization)
Identify 10–15 papers — create paper summaries
Brainstorm with collaborators or colleagues or yourself
Cluster problem statements
List research challenges
Resources required: domain knowledge, labeled data availability, the skillset of authors, the time required, and server requirements of deep learning experiments
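To make the "paper summaries" and "cluster problem statements" steps concrete, here is a small sketch. It assumes you tag each summary with a few keywords by hand, and it groups papers that share a keyword, which is a deliberately simple stand-in for clustering rather than a real clustering algorithm. The `PaperSummary` fields mirror the aspects listed earlier; the class and function names are made up for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class PaperSummary:
    title: str
    problem_statement: str
    key_takeaways: list
    limitations: str
    keywords: set = field(default_factory=set)  # hand-assigned tags


def group_by_keyword(summaries: list) -> dict:
    """Map each keyword to the titles of the papers tagged with it."""
    groups = defaultdict(list)
    for summary in summaries:
        for keyword in sorted(summary.keywords):
            groups[keyword].append(summary.title)
    return dict(groups)


def crowded_topics(summaries: list, min_papers: int = 2) -> list:
    """Keywords shared by at least `min_papers` papers, most papers first."""
    groups = group_by_keyword(summaries)
    shared = [k for k, titles in groups.items() if len(titles) >= min_papers]
    return sorted(shared, key=lambda k: (-len(groups[k]), k))
```

A keyword that shows up across several of your 10–15 summaries is a reasonable starting point for the brainstorming session; a paper can appear under more than one keyword.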


A crucial resource that helped me at all stages of the research process

Twitter.

Yes, this may sound new to some, but believe me: once you start using Twitter as a learning resource, it is a gold mine.

It has been my go-to and is still one of the best platforms to stay up-to-date with the research in my domain.

I started following some well-known researchers and research labs, and that gave me access to:

Details about their recently published research papers
Their commentary and opinion of high-impact research articles. If you follow the discussion thread, you will get to learn a lot!
Job notifications — internship, postdoc, and Ph.D. positions
Work productivity and mental health issues and solutions

I started following well-known Twitter handles and hashtags like #AcademicTwitter, @PhDVoice, and @jenheemstra (there are many more now), where issues faced by students are discussed and useful tips or perspectives are shared from time to time.

If you do not find an exact match for what you are looking for, feel free to post your question; someone will surely get back to you.

Parts to come under the “Research for All” initiative

Part 2 containing details regarding Baseline Setup and Novelty will be out soon.

Part 3 contains a guide to improving the soft skills and day-to-day skills of a researcher (separate article already published, 8 min. reading time).

Reading List for Ph.D. and Research Scholars

I am quite pleased to announce that a lot of resources and advice from eminent researchers and institutions are freely available. As a believer in positive realism, I will say that you just need to read, read and continue reading …

The Missing Semester of your CS Education https://missing.csail.mit.edu/
How to be successful as a Ph.D. student: This document also has its own reading list at the end
Stanford CS Ph.D. Orientation 2021
Stanford CS Red Book: Please read section 1.4 on “Advisors: Choosing advisors, Communicating with advisors”, and section 1.7 on “How to do research”
Lessons from my Ph.D. — Austin Z. Henley
Newsletters, WordPress, and Quora: DoctoralWriting SIG, The Hidden Rules of Academia, by Bianca Pereira (Medium), and Dr. Doctorate (Quora)
Advice to pre-PhD self https://twitter.com/FromPhDtoLife/status/1514338255822639115
ACL is a premier conference on Natural Language Processing, and its organizers have run mentoring sessions on research topics such as how to choose your NLP project, building collaborations, and more. Please go through the recorded videos available on their YouTube channel
Choosing between a Ph.D. and industry for new computer science graduates by Shreya Shankar (Blog)
Balancing Teaching and Research by Emily M. Bender (Slides)
Job positions: Twitter threads at @jobRxiv
PostGradual: The Ph.D. Careers Blog
The Ultimate UG Research Manual by Scholar’s Avenue, IIT Kharagpur, India
Highlights of mentoring sessions of EMNLP 2020 (Blog)

Concluding the self-help guide

I hope this article helps spread awareness about the free and open-source resources available for researchers and brings about a self-improvement perspective toward research in academia.

I wish you a unique, exciting but informed Ph.D. journey!

Disclaimer for the research self-help guide

The article is based on my opinion and experience alone and does not reflect the views of any researchers I have met or collaborated with. I am a Ph.D. student trying to navigate the long Ph.D. journey and am in no way an expert. This article aims to present what has worked for me till now (in specific domains of machine learning and natural language processing) and aggregate the public views and experiences of more experienced and eminent researchers on the same topic.

💚30+ free articles already available at datanalytics101.com

💚 Your feedback is critical to improving the content, so please feel free to share your take on this topic

💚Follow me on Twitter @roysoumya1 for getting updates on “AI in Healthcare”

💚I plan to write one post a month on Medium. To get updates directly to your email, please subscribe at https://medium.com/subscribe/@soumyadeeproy
