Florence Muriuki

Everyday Data Science

Getting Started With R

Part 1: Setting Up
Part 2: R Basics
Part 3: R packages to take you beyond the basics
Part 4: More advanced and task specific packages
Part 5: Git and Github/Gitlab
Part 6: Statistics
Part 7: Machine Learning
Part 8: Complementary Tools
Part 9: More Resources

Introduction

In this post, I describe the things I have learnt to do in R and the resources that I used.

How to read and combine many similar files into a single dataset in R

Summary

In this post, we look at how to read many similar files at once and combine them into a single dataset in R. Suppose you have many .xlsx, .xls or .csv files holding similar data in the same (or almost the same) format. Rather than reading one file at a time, you will want to read all of them and combine the data in a single step. Scenarios in which such files may arise include, but are not limited to:
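The idea summarised in this excerpt can be sketched in a few lines of base R. This is only a minimal illustration, not necessarily the approach the full post takes: the folder, the file names, the columns and the helper `read_one` are all made up for the example, and only .csv files are handled (Excel files would need a reader from a package such as readxl).

```r
# Create a few small example files in a temporary folder.
# The folder name, file names and columns are invented for this sketch.
data_dir <- file.path(tempdir(), "similar_files")
dir.create(data_dir, showWarnings = FALSE)
for (month in c("jan", "feb", "mar")) {
  example <- data.frame(id = 1:2, amount = c(10, 20))
  write.csv(example,
            file.path(data_dir, paste0("sales_", month, ".csv")),
            row.names = FALSE)
}

# List every .csv file in the folder, with full paths.
csv_files <- list.files(data_dir, pattern = "\\.csv$", full.names = TRUE)

# Hypothetical helper: read one file and record where its rows came from.
read_one <- function(path) {
  df <- read.csv(path, stringsAsFactors = FALSE)
  df$source_file <- basename(path)
  df
}

# Read all files and stack them into a single data frame.
# rbind() requires every file to have the same column names.
combined <- do.call(rbind, lapply(csv_files, read_one))
```

Keeping the file name in a `source_file` column makes it possible to trace each row back to its origin after the files have been combined.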

Flagging Toxic Comments Part 2

In the previous post, we used words to classify Wikipedia comments as harmful or harmless. In this post, we will create a few features from the comments and build another classification model. To explore the features, we will use R, as I prefer R's ggplot2 for plotting. We will start by reading Kaggle’s training dataset, creating a “harmful” column and then selecting the “comment_text” and “harmful” columns.

library(dplyr)
library(caTools)
library(ggplot2)
library(gridExtra)
library(stringr)
library(ngram)
library(tm)
train <- readRDS("Kaggle-Toxic-Comment-Challenge/Data/train.
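As a rough sketch of the step described in this excerpt, the base R code below builds a “harmful” column, keeps the two columns of interest and derives a few simple features. Several things here are assumptions rather than the post's own code: the toy rows stand in for the Kaggle data, the label column names are assumed to match the Kaggle file, “harmful” is assumed to mean that at least one label is set, and the three features are only illustrative choices.

```r
# Toy stand-in for the Kaggle training data (made-up rows).
train <- data.frame(
  comment_text  = c("Thanks for the edit!", "You are an idiot", "SHUT UP!!!"),
  toxic         = c(0, 1, 1),
  severe_toxic  = c(0, 0, 0),
  obscene       = c(0, 0, 0),
  threat        = c(0, 0, 0),
  insult        = c(0, 1, 0),
  identity_hate = c(0, 0, 0),
  stringsAsFactors = FALSE
)

# Assumed label columns.
label_cols <- c("toxic", "severe_toxic", "obscene",
                "threat", "insult", "identity_hate")

# Assumption: a comment is "harmful" if at least one label is set.
train$harmful <- as.integer(rowSums(train[, label_cols]) > 0)

# Keep only the text and the new target column.
comments <- train[, c("comment_text", "harmful")]

# Illustrative features: character count, word count, exclamation marks.
comments$n_chars <- nchar(comments$comment_text)
comments$n_words <- lengths(strsplit(comments$comment_text, "\\s+"))
comments$n_exclaim <- lengths(
  regmatches(comments$comment_text,
             gregexpr("!", comments$comment_text, fixed = TRUE))
)
```

Features like these can then be plotted against `harmful` to see whether they separate the two classes before they are fed to a model.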

Flagging Toxic Comments Part 1

Summary

In this post, we describe Bayes' theorem and Naive Bayes. We then use Naive Bayes to classify Wikipedia comments as harmful or harmless. The resulting model detects 58% of the harmful comments in the test data. In future posts, we improve the model by using more features and different classification models.

Introduction

Most of us find the internet entertaining and resourceful, but sometimes we come across pejoratives in various forums.
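To make the idea behind Naive Bayes concrete, here is a minimal from-scratch sketch in base R. It is not the model built in the post: the four training comments are made up, the function names are hypothetical, and add-one (Laplace) smoothing is an assumption of this example. It scores each class as log P(class) plus the sum of log P(word | class) over the words in a comment, which is Bayes' theorem combined with the "naive" assumption that words are independent given the class.

```r
# Made-up training comments and labels.
docs   <- c("you are stupid", "stupid idiot",
            "thanks for the help", "great help thanks")
labels <- c("harmful", "harmful", "harmless", "harmless")

tokens  <- strsplit(docs, " ", fixed = TRUE)
vocab   <- unique(unlist(tokens))
classes <- c("harmful", "harmless")

# log P(word | class) with add-one smoothing (an assumption of this sketch).
word_log_prob <- function(cl) {
  words  <- unlist(tokens[labels == cl])
  counts <- as.vector(table(factor(words, levels = vocab)))
  setNames(log((counts + 1) / (sum(counts) + length(vocab))), vocab)
}
log_lik <- sapply(classes, word_log_prob)  # rows: words, columns: classes

# log P(class), estimated from the class frequencies.
log_prior <- sapply(classes, function(cl) log(mean(labels == cl)))

# Hypothetical helper: pick the class with the highest posterior score.
classify <- function(text) {
  words  <- strsplit(text, " ", fixed = TRUE)[[1]]
  words  <- words[words %in% vocab]  # ignore words never seen in training
  scores <- log_prior + colSums(log_lik[words, , drop = FALSE])
  names(which.max(scores))
}

classify("you stupid idiot")
classify("thanks great help")
```

Working with logarithms avoids multiplying many small probabilities together, which would otherwise underflow for long comments.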

Past 10 years in headlines: Uhuru's Friends and Enemies, Corruption Web

In the information age, we have access to almost every piece of information imaginable in a matter of seconds. However, at any given moment, we only need a very specific and small portion of it. A good example is the news: when reading or watching it, we are only interested in particular segments and in particular details from those segments. For some it’s sports, and within the sports category some people will pick up information only on football matches whilst others follow athletics.

Food Production in Kenya

The purpose of this project is to showcase the use of R Shiny in creating simple interactive dashboards. I decided to use a food production dataset because in future I would like to dig deeper into why we experience food shortages in some parts of Kenya. We all have our favourite foods. Some people can’t wait for their favourite fruit season and are willing to pay exorbitant amounts during the off season.

Overview of Kenya's Job Market

There were 37,034 jobs posted between 2016-09-07 and 2019-06-14 on jobmag Kenya. According to the chart below, 2019 has witnessed a decrease in jobs posted. Unsurprisingly, most of the top posters were recruitment agencies. Here are the recruitment and non-recruitment firms with the highest number of posts: In terms of number of jobs posted, Safaricom led the pack of non-recruitment organizations, followed by Public Service Commission and Save the Children. A quick look at Safaricom posts shows that ICT/Engineering, Sales & Marketing, Administration, Finance/Accounting and Customer Care formed the bulk of its job posts.