After attending lecture/labs and completing this problem set you should be able to do the following:
You are to complete and turn in this problem set by Thursday September 20 at 9:44 am. The final problem will show you how to turn in your problem set.
All code should be included in the submission and saved in an R Markdown file. Any answers should be included as comments immediately below the code responsible, and should be labelled as follows:
# CODE 2a Put your code here and if you need another line, put it here, and also here!
When plotting, remember to label your axes appropriately.
Data Exploration with the mtcars dataset.
The built-in dataset mtcars includes data from Motor Trend magazine on eleven different variables for thirty-two automobiles (1973-74 models).
head(mtcars)
(5 pts) Create a basic histogram with x- and y-labels to show the distribution of fuel efficiency (mpg) in the data set. Normalize the mpg data so that the frequency falls on a scale from 0 to 1.
(3 pts) Which range of mpg is the most frequent? The least frequent?
# Create Histogram
# ANSWER 1.1
#
Create a new data.frame, cars_new, that has two new columns, make and pow, where make stores the make of the car and pow stores the power-to-weight ratio (i.e. hp to weight) for each car.
Note: The dataset stores the make of each car as the data frame's row names.
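As a minimal sketch of the note above, row names can be copied into an ordinary column with rownames(). The data frame toy, its values, and the column names label and ratio are made up for illustration and are not part of the problem set.

```r
# Hypothetical toy data frame whose row names carry the labels
toy <- data.frame(hp = c(110, 93, 175),
                  wt = c(2.6, 2.3, 3.4),
                  row.names = c("Car A", "Car B", "Car C"))

# Copy the row names into a regular column and derive a ratio column
toy_new <- toy
toy_new$label <- rownames(toy)
toy_new$ratio <- toy$hp / toy$wt

print(toy_new)
```

Working on a copy (toy_new) leaves the original data frame untouched, which makes it easy to compare the two.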
# Create the new data.frame with new fields
(2 pts) Sort the dataframe by horsepower-to-weight ratio from lowest to highest.
(5 pts) Create a bar plot showing the make of each car vs. the power-to-weight ratio for each car using the ordered data.
Hint: Look into the barplot arguments "las" and "cex.names" to make the x-axis more legible.
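The sketch below shows what the two arguments in the hint do, using a hypothetical named vector (ratios) rather than the assignment data. The margin setting is an extra assumption added so that the rotated labels fit.

```r
# Hypothetical named vector: the names become the bar labels
ratios <- c("Long label one" = 3.2,
            "Long label two" = 1.5,
            "Long label three" = 2.4)

# order() returns the indices that sort the values from lowest to highest
sorted <- ratios[order(ratios)]

# Widen the bottom margin so rotated labels are not cut off
par(mar = c(9, 4, 2, 1))

# las = 2 draws axis labels perpendicular to the axis;
# cex.names below 1 shrinks the bar labels
barplot(sorted, las = 2, cex.names = 0.8, ylab = "Value")
```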
# Sort by power to weight
# Create bar plot
Using the function quantile(), consider the dataset iris and create a new field that color codes the elements of the dataframe as:
Write pseudocode to display a scatterplot comparing the petal lengths and widths. Show the steps necessary to color the petals based on the classification described above.
--- WRITE PSEUDO CODE HERE ---
Create a scatterplot comparing the petal lengths and widths. Show the steps necessary to color the three categories of petals as follows:
In your actual code, use filled points (i.e. pch=19), e.g. plot(arg, arg, ..., pch = 19). Add a proper title, axis labels, and a legend for the three classes at the bottom-right corner of the figure.
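Using ?plot will tell you how to add a plot title and axis labels. You can add the legend using the function legend as follows: legend("bottomright", legend = [vector of data labels], col = [vector of color codes], pch = 19). Name the legend and col arguments explicitly: the second positional argument of legend is y and the third is legend, so passing the two vectors positionally would show the color codes as labels.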
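A minimal sketch of quantile-based coloring with a legend, on made-up data. The vectors x and y, the quartile thresholds, the class names, and the colors are all assumptions chosen for illustration; they are not the classification this problem asks for.

```r
# Hypothetical measurements
x <- c(1.0, 1.5, 2.2, 3.1, 4.0, 4.8, 5.5, 6.3)
y <- c(0.2, 0.4, 0.5, 1.1, 1.3, 1.6, 1.9, 2.4)

# Illustrative thresholds only: lower and upper quartiles of x
q <- quantile(x, probs = c(0.25, 0.75))

# Assign one color per point according to which band x falls in
cols <- ifelse(x < q[1], "red",
               ifelse(x > q[2], "blue", "grey40"))

plot(x, y, col = cols, pch = 19,
     main = "Toy example", xlab = "x", ylab = "y")

# Naming the arguments keeps labels and colors from being mixed up
legend("bottomright",
       legend = c("low", "middle", "high"),
       col = c("red", "grey40", "blue"),
       pch = 19)
```

Storing the colors in a vector with one entry per observation lets plot() color each point individually.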
# Create column for small/normal/large colors
# Create scatterplot
# add legend
The iris dataset has 4 numeric columns. Using a for loop, create a simple scatter plot for every possible pair of numeric columns in the dataset (6 plots in total). Do not plot a variable against itself, and plot each pair only once (X vs. Y, but not Y vs. X).
Display the plots as a 2 x 3 panel and name the axes appropriately for every pair. Add a non-linear trend line with line width 2 (see ?lowess). The main title of each plot should report the p-value of the Pearson correlation test. If the test result is not significant at alpha = 0.05, draw the trend line in blue; otherwise, draw it in red.
Hint 1: The functions colnames(), setdiff(), and combn() may be helpful in answering this question.
Hint 2: The function lowess() can be used to perform the fitting.
Hint 3: The function cor.test()$[desired component] can be used to perform a correlation test and extract its results.
Hint 4: Use the function par(mfrow=[vector of rows and columns of figure]) to create a figure with subplots.
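The functions named in the hints can be tried in isolation first. The sketch below uses a hypothetical data frame (toy) with random values and invented column names, so it illustrates the calls only and is not a solution to the question.

```r
# Hypothetical data frame: three numeric columns and one label column
set.seed(1)
toy <- data.frame(x1 = rnorm(30), x2 = rnorm(30), x3 = rnorm(30),
                  grp = rep(c("g1", "g2", "g3"), each = 10))

# setdiff() drops the non-numeric column name from the full set of names
num_cols <- setdiff(colnames(toy), "grp")

# combn() lists each unordered pair once; each column of the result is one pair
col_pairs <- combn(num_cols, 2)
print(col_pairs)

# cor.test() returns a list; $p.value extracts the p-value
p <- cor.test(toy$x1, toy$x2, method = "pearson")$p.value

# par(mfrow = c(rows, cols)) splits the device into a grid, filled row by row
old_par <- par(mfrow = c(1, 2))
plot(toy$x1, toy$x2, xlab = "x1", ylab = "x2",
     main = paste("p =", signif(p, 3)))
# lowess() returns smoothed x/y coordinates that lines() can draw
lines(lowess(toy$x1, toy$x2), lwd = 2)
plot(toy$x1, toy$x3, xlab = "x1", ylab = "x3")
par(old_par)  # restore the previous layout
```

Because combn() returns the pairs as matrix columns, a loop over seq_len(ncol(col_pairs)) visits each pair exactly once.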
a. (6 pts) Write a list of instructions to solve this problem step-by-step in English for the first column pair, Sepal Length and Sepal Width.
<!-- List of instructions for single pair -->
b. (6 pts) Now translate your instructions from part a into pseudocode.
<!-- Pseudo code for single pair -->
c. (6 pts) Now translate your pseudocode into actual running code.
# Follow instructions for single pair
d. (18 pts) Repeat steps a - c for the first two column pairs: (Sepal Length, Sepal Width) and (Sepal Length, Petal Length).
<!-- List of instructions for two pairs -->
<!-- Pseudo code for two pairs -->
# Follow instructions for first two pairs
e. (18 pts) Now, repeat steps a - c for all six combinations of numeric columns.
# Get all combinations
# Set up subplot
# Plot every combination
This question will take you through the process of sharing results via Git. Much of it can be done with GitHub Desktop, but parts will require a little bit of work in the command line.
Note: if you are already familiar with git, you may use the command line to perform the same steps. Also, GitHub Desktop only works on Mac or Windows; if your computer uses a different OS, you will need to use the command line and take a screenshot.
To provide screenshots, use the following function in the code blocks below, but remove eval = F.
# Screenshot of cloned repo
`knitr::include_graphics("mypic.png")`
Log in to GitHub.com and create a public repository with a README file as we did in lecture on 9/11.
Clone your new repo to your computer.
i. Working in the master branch of your repository, edit the README file to contain a new line with the number sequence: 1234567890.
When you have made some or all necessary changes, commit and push them to the online repository. First, ensure all changes you would like to make are checked in GitHub Desktop's left window, and be sure to add a summary of your changes.
As we did in lecture, create a new branch of your repository and name it Second Branch.
Add a text file to your second branch and name it README2.md.
Now, merge Second Branch into the master branch.
Tag your commit on GitHub.com as Final Draft.
Copy and paste the link to your repo here
<!-- Link to Repo -->
How long did this problem set take you to complete? X hours
Please knit your markdown file and submit both the markdown file and the knitted PDF to Canvas.