
Teaching Data Science for Beginners

By Osama Mahmoud


dsEssex is an R package containing data examples and helpful tools for teaching Data Science to beginners.

Overview

This R package provides datasets, case studies, functions, and exercises for teaching Data Science to students with little or no background in statistics and/or programming. It was originally created to support the delivery of the MSc Applied Data Science and MSc Data Science and its Applications programmes at the University of Essex, United Kingdom. However, it can also be used to teach R programming and data science to undergraduate and postgraduate students on other Data Science programmes.

System Requirements

Hardware requirements

The dsEssex R package should install and run smoothly on most standard computers.

Software requirements

The dsEssex R package is supported on Windows, Linux and macOS. It has been tested in R on the following systems:

Linux: Ubuntu 16.04 (R 3.6.1)
macOS: Mojave 10.14.6 (R 3.6.1)
Windows: Windows 10 (R 3.6.3)

Installation Guide

The dsEssex R package includes a variety of data examples, case studies, R package dependencies and practical sheets that can support teaching data science in lectures, labs, workshops, and classes. The easiest way to install the dsEssex R package is to run the following lines in your R session:

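# required only once per machine!
# install the remotes package first if it is not already available
if (!requireNamespace("remotes", quietly = TRUE)) install.packages("remotes")
# install dsEssex from its GitHub repository
remotes::install_github("statcourses/dsEssex")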

R Dependencies

This software requires R (>= 3.5.0). If an older version of R is installed on your machine, you will need to upgrade to the latest version of R before installing dsEssex.
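If you are unsure which version of R you have, the following minimal sketch compares the running version against the 3.5.0 minimum stated above. It uses only base R's getRversion(); the variable names are illustrative.

```r
# Minimum R version required by dsEssex, as stated in this README
min_r_version <- "3.5.0"

# getRversion() returns the running R version as a comparable version object
r_is_recent_enough <- getRversion() >= min_r_version

if (r_is_recent_enough) {
  message("R ", getRversion(), " meets the requirement (>= ", min_r_version, ").")
} else {
  message("R ", getRversion(), " is older than ", min_r_version,
          "; please upgrade R before installing dsEssex.")
}
```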

Installing the dsEssex R package will automatically install the following dependencies that are required for most Data Science labs, classes and workshops:

tidyverse
dslabs
dplyr
stringr
ggplot2
tidytext
textdata
english
tidyr
jsonlite
lubridate
scales
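To see which of these dependencies are already available on your machine, for example before a lab session, a sketch like the one below may help. It assumes the list above is complete; the authoritative list of dependencies is the one declared in the package's DESCRIPTION file. requireNamespace() checks whether each package can be loaded without attaching it.

```r
# Dependencies as listed in this README (not read from the package DESCRIPTION)
deps <- c("tidyverse", "dslabs", "dplyr", "stringr", "ggplot2", "tidytext",
          "textdata", "english", "tidyr", "jsonlite", "lubridate", "scales")

# TRUE for each package that can be loaded, FALSE otherwise (named by package)
installed <- vapply(deps, requireNamespace, logical(1), quietly = TRUE)

# Report any packages that are not yet available
missing_deps <- deps[!installed]
if (length(missing_deps) > 0) {
  message("Missing: ", paste(missing_deps, collapse = ", "))
  # install.packages(missing_deps)  # uncomment to install them from CRAN
} else {
  message("All listed dependencies are installed.")
}
```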

Testing the Package

Get started by loading a few data sets with the following code:

# load the package into your R session
library(dsEssex)

# load tweets from Donald Trump's Twitter account, 2009 to 2021
data(Trump_tweets)

# display the first few rows of the data
head(Trump_tweets)

# display description of the data
help(Trump_tweets)

# load the index page that lists all the components of the package
help(package = dsEssex)
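As a complement to the index page, base R's data() function can also list the data sets a package ships with. The sketch below is guarded so it runs even before dsEssex is installed; in that case it simply returns an empty vector. The variable name available_datasets is illustrative.

```r
# Start with an empty result so the code runs even if dsEssex is not installed
available_datasets <- character(0)

if (requireNamespace("dsEssex", quietly = TRUE)) {
  # data(package = ...) returns an object whose $results matrix has an "Item" column
  available_datasets <- data(package = "dsEssex")$results[, "Item"]
}

print(available_datasets)
```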

Data example for string processing and text analytics

For simple string processing and text analytics exercises, you can load the daily mortality data for Puerto Rico, a US territory, for the month of October in each year from 2015 to 2018, extracted from this PDF file. The file was obtained from the dslabs R package by Rafael A. Irizarry.

# load the package into your R session
library(dsEssex)

# load the raw daily mortality data for October extracted from the pdf file
data(PR_Oct_Deaths)

# display the data
PR_Oct_Deaths

License

This project is licensed under the GNU General Public License, version 3.0 (GPL-3.0).

Contact

This project is developed by Dr. Osama Mahmoud, Department of Mathematical Sciences, University of Essex, United Kingdom. For bug reports, feature requests, and technical questions about using the dsEssex R package, please open an issue in the statcourses/dsEssex GitHub repository. If you would like to contact the author directly, please feel free to email him at o.mahmoud@essex.ac.uk.



statcourses/dsEssex documentation built on Jan. 10, 2024, 4:32 p.m.