Getting Started With R


In this post, I describe the things I have learnt to do in R and the resources that I used. There is no particular path to follow when learning a programming language, but in this article I will sequence the learning in a way that makes sense in retrospect.

Part 1: Setting Up

We learn programming by doing. At first, you copy what someone else is doing; with time, you write code from memory; eventually, you combine the things you know to solve your own specific problems. There are two main ways to learn R by writing code:

1. Using online platforms that let you type code and see the results. All you need is an internet connection and a user account.

One such platform is Datacamp, which offers beginner courses, including an introduction to R course. A few courses and topics are free, but you need a paid subscription to access all of them. Another online learning platform, Codecademy, offers its own introduction to R course. You can certainly find other online learning platforms on the web.

2. Installing and using R on your computer. This is a good choice for someone without a reliable internet connection or who prefers unlimited freedom when learning. This is the method I used and still use.

To install R, all you need is a basic computer. I started out with a laptop with 2GB of RAM and limited storage, and I had a smooth learning experience: R is a lightweight program, particularly when running a few lines of code on a small dataset. Use what you have.

R comes with its own interface/editor called R GUI. However, I recommend installing RStudio, a more beautiful interface with a much better user experience, immediately after installing R. To install R, go to the official R download page, and to install RStudio, go to the RStudio download page. If you need step-by-step instructions on the installation processes, use Datacamp's installation guide. Once R and RStudio are successfully installed, you are ready to code as you learn.

Part 2: R Basics

Below is a list of some basics to cover:

  • Data types

  • Loading data

  • Quickly understanding the data you have

  • Subsetting data

  • Plotting

  • Saving data

  • Importing packages

  • Functions

Some good resources to cover the above basics include:

  1. CRAN's An Introduction to R (PDF): chapters 1 to 10 and chapter 13

  2. Swirl is a good learning package that teaches you as you go along (an offline alternative to Datacamp and the like). To get started with swirl, use this guide.

  3. For those who prefer video lectures, Data Science: Foundations using R Specialization is a great resource that starts from basics.

  4. For those who prefer online learning platforms like Datacamp and Codecademy, you can find various introductory courses on your preferred platforms.
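To make the list of basics concrete, here is a minimal base-R sketch that touches most of them without any extra packages. The names, heights and teams are made up for illustration, and the file paths come from tempfile() so nothing is written to your working directory. The plot goes to a PDF file so the script also runs on a machine without a screen.

```r
# --- Data types ---
height_cm <- c(160, 172, 181, 158)                # numeric vector
person <- c("Amina", "Brian", "Chebet", "Daudi")  # character vector
likes_r <- c(TRUE, FALSE, TRUE, TRUE)             # logical vector
team <- factor(c("a", "b", "a", "b"))             # factor (categorical)

# Vectors of equal length combine into a data frame
people <- data.frame(person, height_cm, likes_r, team)

# --- Saving and loading data ---
csv_path <- tempfile(fileext = ".csv")
write.csv(people, csv_path, row.names = FALSE)
loaded <- read.csv(csv_path, stringsAsFactors = TRUE)

# --- Quickly understanding the data ---
str(loaded)      # column types and first values
summary(loaded)  # per-column summaries
head(loaded, 2)  # first rows

# --- Subsetting ---
tall <- loaded[loaded$height_cm > 165, ]           # rows matching a condition
just_names <- loaded[["person"]]                   # a single column as a vector
subset(loaded, likes_r, select = c(person, team))  # base R's subset()

# --- Functions ---
cm_to_inches <- function(cm) {
  cm / 2.54
}
loaded$height_in <- round(cm_to_inches(loaded$height_cm), 1)

# --- Plotting (written to a PDF file) ---
plot_path <- tempfile(fileext = ".pdf")
pdf(plot_path)
barplot(loaded$height_cm, names.arg = loaded$person, ylab = "Height (cm)")
invisible(dev.off())

# --- Importing packages ---
library(stats)  # ships with R; install others with install.packages("name")
```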

Part 3: R packages to take you beyond the basics

R packages are add-ons that extend what R can do: bundles of functions, data and documentation that you install and load as needed. In this section, I cover the packages I almost always use in data analysis projects, along with links to some helpful resources:

  • Package to read CSV, TSV and fixed-width (fwf) files: readr

  • Package to read .xls/.xlsx files: readxl

  • Package that offers a modern alternative to the data frame: tibble

  • Package to manipulate data (filter, select, mutate, summarise) during analysis: dplyr

  • Package to tidy and reshape data: tidyr

  • Package to create beautiful visualizations: ggplot2 (read through all three blog posts)

  • Package to work with date columns: lubridate

  • Package to work with text data: stringr
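As a rough illustration of how these packages fit together, here is a sketch that reads a small, hypothetical sales table with readr, cleans and reshapes it with stringr, lubridate, tidyr and dplyr, and saves a chart with ggplot2. It assumes readr 2.0 or later (for reading literal text wrapped in I()) and dplyr 1.0 or later (for the .groups argument). readxl is left out because it needs a real spreadsheet; with one, readxl::read_excel("your_file.xlsx") (a hypothetical file name) would take the place of read_csv() here.

```r
library(readr)     # read_csv()
library(tibble)    # read_csv() returns a tibble
library(dplyr)     # mutate(), group_by(), summarise(), %>%
library(tidyr)     # pivot_longer()
library(lubridate) # month()
library(stringr)   # str_to_title()
library(ggplot2)   # ggplot(), ggsave()

# Hypothetical sales data; I() tells read_csv() this is literal text, not a path
sales <- read_csv(I("region,date,product_a,product_b
nairobi,2021-01-15,10,4
mombasa,2021-01-20,7,9
nairobi,2021-02-03,5,6"), show_col_types = FALSE)

totals <- sales %>%
  mutate(region = str_to_title(region),  # "nairobi" -> "Nairobi"
         month = month(date)) %>%        # month number from the date column
  pivot_longer(c(product_a, product_b),  # one row per region/date/product
               names_to = "product", values_to = "units") %>%
  group_by(region, month) %>%
  summarise(total_units = sum(units), .groups = "drop")

glimpse(totals)

# Bar chart of units per month and region, saved to a temporary PDF
p <- ggplot(totals, aes(x = factor(month), y = total_units, fill = region)) +
  geom_col(position = "dodge") +
  labs(x = "Month", y = "Units sold", fill = "Region")
plot_path <- tempfile(fileext = ".pdf")
ggsave(plot_path, p, width = 6, height = 4)
```

Each step does one small thing and hands a tibble to the next, which is the main design idea behind these packages.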

The aforementioned Intro to R swirl course also covers some of these packages.

readr, readxl, tibble, dplyr, tidyr, ggplot2, lubridate and stringr all belong to a collection of packages known as the tidyverse.

I use the tidyverse, but data.table is a popular alternative package. Some resources to learn data.table include:

You can learn either tidyverse, data.table or both.
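For comparison, here is a sketch of a small aggregation in data.table syntax, using hypothetical sales figures typed in directly. melt() reshapes the table to long format, and the DT[i, j, by] form does the grouping in a single expression.

```r
library(data.table)

# Hypothetical sales data
sales <- data.table(
  region = c("nairobi", "mombasa", "nairobi"),
  date = as.IDate(c("2021-01-15", "2021-01-20", "2021-02-03")),
  product_a = c(10, 7, 5),
  product_b = c(4, 9, 6)
)

# Wide to long: one row per region/date/product
long <- melt(sales, id.vars = c("region", "date"),
             measure.vars = c("product_a", "product_b"),
             variable.name = "product", value.name = "units")

# j computes the summary, by defines the groups (month() is data.table's helper)
totals <- long[, .(total_units = sum(units)), by = .(region, month = month(date))]
setorder(totals, region, month)  # sort in place by region, then month
print(totals)
```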

Part 4: More advanced and task specific packages

R Markdown allows you to combine normal text and code in one document. It comes in handy when you want to add explanations to your code, visualizations and analysis outputs. It is what I used to write this blog post. RStudio offers great R Markdown learning resources here.

Shiny allows you to create web applications with limited knowledge of HTML and JavaScript. It comes in handy when you want others to interact with the product of your code without them having to know or install R. The most common use of Shiny I have seen is creating interactive dashboards. RStudio provides Shiny examples here and great Shiny learning resources here.
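A minimal Shiny sketch, assuming only the shiny package: the ui object describes the page (a slider and a plot), and the server function redraws a histogram whenever the slider moves. The title and input/output names are illustrative. Assigning shinyApp() to a variable builds the app without launching it; runApp(app), or printing app in the console, opens it in a browser.

```r
library(shiny)

# Page layout: a title, one input and one output
ui <- fluidPage(
  titlePanel("Hypothetical demo dashboard"),
  sliderInput("n", "Number of observations:", min = 10, max = 500, value = 100),
  plotOutput("hist")
)

# Reactive logic: renderPlot() re-runs whenever input$n changes
server <- function(input, output, session) {
  output$hist <- renderPlot({
    hist(rnorm(input$n), main = paste(input$n, "random normal values"))
  })
}

app <- shinyApp(ui = ui, server = server)
# runApp(app)  # uncomment in an interactive session to launch the app
```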

Rvest allows you to scrape data from the web. A great rvest learning resource is here.
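A small rvest sketch. To keep it runnable offline, it parses a hypothetical HTML snippet passed as a string; for a real site you would pass the page's URL to read_html() instead (and check that the site permits scraping). html_element() selects elements with CSS selectors, html_text2() extracts their text, and html_table() turns an HTML table into a tibble. These function names assume rvest 1.0 or later.

```r
library(rvest)

# Hypothetical page content; a URL string would also work with read_html()
html <- '<html><body>
  <h1>Hypothetical price list</h1>
  <table>
    <tr><th>item</th><th>price</th></tr>
    <tr><td>maize</td><td>50</td></tr>
    <tr><td>beans</td><td>120</td></tr>
  </table>
</body></html>'

page <- read_html(html)

# CSS selector "h1" picks out the heading
title <- html_text2(html_element(page, "h1"))

# The <table> becomes a tibble; the price column is converted to numbers
prices <- html_table(html_element(page, "table"))
print(prices)
```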

Blogdown allows you to create blogs and blog posts like this one in R. It works with GitHub to store your code and with hosting services such as Netlify to publish your blog. This article shows you how to create your blog with blogdown. It helps to be familiar with GitHub before learning blogdown.

Part 5: Git and GitHub/GitLab

Git is a version control system that keeps track of the changes you make to your code. GitHub and GitLab are platforms that host Git repositories, allowing you to store and share your code online. To learn how to use RStudio with Git and GitHub, use the following resources:

Part 6: Statistics

Part 7: Machine Learning

1. edX: The Analytics Edge

This is the course that got me hooked on R. It might seem odd that I started with a course that taught machine learning concepts, but I had just come across the terms “Data Science” and “Machine Learning”. The course provided a simple and comprehensive introduction to machine learning alongside R basics such as loading data, data types, summarizing data and subsetting data. It covered Linear Regression, Logistic Regression, Decision Trees, Random Forests, Clustering, and Linear and Integer Optimization. The lectures were easy to understand, with code examples to follow along and short questions after each lecture. The assignments at the end of each topic were the best part of the course, as they required learners to actually work with the provided data to answer the questions. The questions came with code examples and were ordered so that learners started by loading the data and picked up one new piece of code with each question they answered. This course gave me a good understanding of R and machine learning.
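To give a flavour of the kinds of models listed above, here is a base-R sketch using datasets that ship with R (mtcars and iris): lm() for linear regression, glm() with family = binomial for logistic regression, and kmeans() for clustering. The variable choices are arbitrary illustrations, not examples taken from the course.

```r
# Linear regression: fuel efficiency (mpg) from weight and horsepower
linear_model <- lm(mpg ~ wt + hp, data = mtcars)
summary(linear_model)

# Logistic regression: probability of a manual transmission (am = 1)
logistic_model <- glm(am ~ wt, data = mtcars, family = binomial)
# Predicted probability for a car weighing 3 (i.e. 3000 lbs, as wt is in 1000s)
pred_prob <- predict(logistic_model, newdata = data.frame(wt = 3),
                     type = "response")
pred_prob

# k-means clustering on the four standardised iris measurements
set.seed(42)
clusters <- kmeans(scale(iris[, 1:4]), centers = 3, nstart = 20)
# Compare the clusters with the known species labels
table(cluster = clusters$cluster, species = iris$Species)
```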

2. Machine Learning with R by Brett Lantz

This book covers a wide range of machine learning algorithms with simple explanations and step by step code examples. I always refer to this book whenever I need to refresh my knowledge on a machine learning algorithm. The datasets used in the book can be found here.
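A pattern that recurs in most supervised machine learning work is to split the data into training and test sets, fit a model on one and evaluate it on the other. The sketch below does this with a classification tree from the rpart package (a recommended package that usually ships with R) on the built-in iris data; it is not code from the book, and the 70/30 split and the seed are arbitrary choices.

```r
library(rpart)

set.seed(123)
# Hold out 30% of the rows for testing
train_rows <- sample(nrow(iris), size = round(0.7 * nrow(iris)))
train_set <- iris[train_rows, ]
test_set <- iris[-train_rows, ]

# Classification tree predicting Species from all four measurements
tree <- rpart(Species ~ ., data = train_set, method = "class")

# Evaluate on rows the model has not seen
predicted <- predict(tree, newdata = test_set, type = "class")
accuracy <- mean(predicted == test_set$Species)
table(predicted = predicted, actual = test_set$Species)
accuracy
```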

3. Machine Learning Projects

Now that you know your way around R and the foundations of machine learning, you need to do projects to grow your skills and to build a portfolio. Some great sources of datasets, tutorials and other people's code include:

It is also good to come up with your own project ideas, find your own datasets and implement your ideas.

Part 8: Complementary Tools

As your R career progresses, you will need other tools and products to extend your work. Here are some of the tools I regularly use with R:

1. Relational databases beyond .xls, .xlsx and .csv files

As your datasets grow and the number of people using them grows, you may need to look beyond .xls, .xlsx and .csv files. I have been using relational databases, particularly Postgres (PostgreSQL). Alternatives to Postgres include MySQL, Microsoft SQL Server and Oracle.

2. SQL

SQL is a language for querying and managing relational databases, including Postgres. To learn SQL for Postgres, use this PostgreSQL tutorial.

3. R SQL packages

To work with relational databases in R, you need packages that sit between R and the database. DBI provides a common interface, while RPostgreSQL and RPostgres are drivers for Postgres. Other driver packages exist for other types of relational databases.
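A sketch of the DBI workflow: connect, write a table, query it with SQL, disconnect. So that it runs without a database server, it swaps in an in-memory SQLite database through the RSQLite package; against Postgres, mainly the dbConnect() call changes (the commented-out RPostgres call shows the idea, with a hypothetical database name and user). The table and its values are made up.

```r
library(DBI)

# In-memory SQLite stand-in for a real database server
con <- dbConnect(RSQLite::SQLite(), ":memory:")
# For Postgres, something like (hypothetical dbname/user):
# con <- dbConnect(RPostgres::Postgres(), dbname = "mydb", host = "localhost",
#                  user = "me", password = Sys.getenv("PGPASSWORD"))

# Copy an R data frame into a database table
dbWriteTable(con, "sales", data.frame(
  region = c("nairobi", "mombasa", "nairobi"),
  units = c(14, 16, 11)
))

# Plain SQL; the result comes back as a data frame
totals <- dbGetQuery(con, "
  SELECT region, SUM(units) AS total_units
  FROM sales
  GROUP BY region
  ORDER BY region
")

# Parameterised query: values are bound safely instead of pasted into the SQL
big <- dbGetQuery(con, "SELECT * FROM sales WHERE units > ?", params = list(12))

dbDisconnect(con)
print(totals)
```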

4. Cloud platforms as virtual working environments, database hosts and Shiny application hosts

You can use cloud platforms such as Digital Ocean and AWS EC2 (what I use) to create virtual machines, particularly when collaborating with others, so that you write code on the virtual machine rather than on your local PC. To learn more about this, use this link.

RStudio offers a free service to host Shiny applications, but it has limited capabilities. It also offers a premium service called RStudio Connect, which I have yet to try. You can also use cloud platforms such as Digital Ocean and AWS EC2 to host your application. This is my go-to tutorial on hosting a Shiny app on AWS EC2.

Part 9: More Resources

Additional R Learning Resources

The web will almost always have the answers you are looking for. Whenever you are stuck or you get an error, search the error or task you want to accomplish and you will most likely find an answer. Sites that are extremely valuable and will most likely have the solutions you are looking for include:

Non-programming Data Science Resources

Global Data Science Communities

Kenyan Data Science Communities

Nairobi Women in Machine Learning offers a community-based learning program dubbed the R Master Cohort Class.

Non-programming tools

  • Flux (f.lux): To automatically adjust your screen's color according to the time of day, in line with your waking and sleeping times

  • Pocket: To save links to resources you like for later reading

  • Cold Turkey: To block distracting websites during your learning/working time.

  • Alarm clock/timer of your choice: To remind you to take regular physical breaks and keep neck and back aches away

The end

In this post, I have shared the resources that I found valuable in my R learning journey. However, there are many resources and many learning paths; find what works for you and stick with it. All the best in learning R!

