Software Installation and Prior Study Materials

Please bring your laptop to the hands-on sessions, with Python and/or R installed as described below so that you can follow along. A working laptop with this software installed is a prerequisite for the hands-on sessions.

Software Installation Instructions

For Python users: Install Anaconda from . All libraries required for the hands-on sessions are already included in Anaconda except gensim. Install gensim from the command line with the following command:

pip install gensim
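As an optional check, you can confirm from Python that gensim and a few Anaconda-bundled libraries can be found. This is a minimal sketch, not part of the official setup. The helper name missing_modules is hypothetical. The entries numpy, pandas, and sklearn are assumed examples of libraries that ship with Anaconda, because the course's full library list is not given here.

```python
import importlib.util

# gensim is the extra library named above; numpy, pandas, and sklearn are
# assumed examples of libraries bundled with Anaconda (not an official list).
REQUIRED = ["numpy", "pandas", "sklearn", "gensim"]


def missing_modules(names):
    """Return the top-level module names that cannot be found on this Python's path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All checked libraries were found.")
```

Run it with the same Python interpreter that Anaconda installed. If gensim is listed as missing, the pip command above most likely ran against a different Python installation.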

For R users: Install R from
Next, install RStudio from . (RStudio requires an existing R installation.) Then run the following command in RStudio to install the libraries required for the hands-on sessions:

install.packages(c("ggplot2", "randomForest", "caret", "rpart", "plyr", "gbm", "rpart.plot", "reshape2", "naivebayes", "corrplot", "e1071", "tm", "topicmodels", "lda", "MASS", "NLP", "R.utils", "stringdist", "dplyr", "openNLP", "rJava", "RWeka", "qdap", "magrittr", "data.table"))

To install the package openNLPmodels.en, use the following command:

install.packages("openNLPmodels.en", repos = "", type = "source")

The rJava package requires an existing Java installation, so install Java before installing the R packages above.

Prior Study Materials

The reading and viewing materials below are suggested as basic background for the topics we plan to discuss in the Data Science Master Class.

The links to complete online courses on Coursera and MIT OpenCourseWare (OCW) provide additional external resources to rely on.