Python for Natural Language Processing with Pandas and NLTK

Python is the most popular programming language for Machine Learning and Natural Language Processing (NLP). Its compact syntax, along with an enormous collection of packages such as pandas and nltk and deep learning frameworks such as PyTorch and TensorFlow, has made it a go-to language for data science enthusiasts and programming newcomers alike.

Complete tutorial on how to use Python for NLP

The corresponding IPython notebook used in the above lecture can be downloaded from here.

Learning resources to go through

A fun way to dive directly into Python is Automate the Boring Stuff with Python, which is available as an online book (a paperback version is also available). The projects are small and fun. I tried some of them, along with some of my own, and they are available on my GitHub page. I also found Codecademy very useful and usually went with the free plan.

Apart from Python, R has also established itself as a very useful language for data science projects. It makes it easy to rapidly prototype with your dataset, especially the data cleaning part. I have also written an article on how to get started with R, which may prove useful to you.

In my experience, using Python and R hand in hand brings out the best of both languages.

Nowadays, knowing Python has become a necessary skill

Whether for front-end development, back-end development, or data science, and whether for internships or your own research project, almost everyone I have encountered either uses only Python or has Python as their primary language. Its many well-crafted packages, active open-source community, and minimalist coding style make it easy to learn, grasp, and maintain as your codebase grows in both size and complexity.

Setting up your Python environment

Follow the official documentation to set up your Python environment. I will cover the list of important packages that you will need in another blog post.

A strongly recommended way to avoid the hassle of installing packages and fixing broken dependencies, especially if you are trying ML in practice for the first time, is Anaconda. Follow the official documentation and you should not have any problems.
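Whichever route you take, it helps to confirm that the packages are actually visible to your interpreter. Below is a minimal sketch that uses only the standard library. The helper name `check_packages` and the list of import names are my own choices for illustration (PyTorch is imported as `torch`), so adjust them to whatever you have installed.

```python
from importlib.util import find_spec


def check_packages(names):
    """Map each top-level import name to True if Python can locate it, else False."""
    # find_spec only locates the package; it does not import it,
    # so the check stays fast even for heavy frameworks.
    return {name: find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Import names for the packages mentioned in this post (an illustrative list).
    wanted = ["pandas", "nltk", "torch", "tensorflow"]
    for name, found in check_packages(wanted).items():
        print(f"{name:<12} {'installed' if found else 'missing'}")
```

Run it from the environment you intend to work in (for example, an activated Anaconda environment). Anything reported as missing can then be installed by following that tool's documentation.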

In terms of IDEs, you can choose from PyCharm, Spyder, or a plugin for Eclipse. I personally prefer Visual Studio Code because it is lightweight, free, and has useful extensions to work with. Moreover, it provides great support for working remotely on a server.

Relevant references and websites to look out for

There are a lot of well-documented tutorials for setting up your Python environment. I suggest the following websites as dependable sources of reference:

  1. Analytics Vidhya
  2. Machine Learning Mastery
  3. Tutorialspoint
  4. Stack Overflow (:P)

Related articles that may be of interest to you

You can find a comprehensive list of academic conferences in the field of AI and Machine Learning in another article I have written.

If you are new to writing papers for academic conferences using LaTeX, you can visit the following articles:

  1. How to set up a TeX environment on your local machine (article link)
  2. Conference or journal paper template – individual files and how to use them (article link)
  3. How to correctly write references or perform cross-referencing while writing your paper (article link)

Please comment below if you would like to add something.