Home

/

GitHub

/

NiklasTR/bmi713

/

In NiklasTR/bmi713:

Overview:

Learning Goals

After attending lecture/labs and completing this problem set you should be able to do the following:

write a complete R program with conditionals and iterations, functions, including plotting but no reading/writing of files, etc. which will be covered later
understand relationship between R, R studio, package installation, package/library include in code, R console output, R script and R markdown script
understand and apply algorithmic thinking to write pseudo-code
know how to use the help functions and find commands etc.
initialize a git repo, add files, remove files, commit, push, pull, and tag

Instructions:

You are to complete and turn in this problem set by Thursday September 20 at 9:44 am. The final problem will show you how to turn in your problem set.

All code should be included in the submission, and should be saved in an r markdown file. Any answers should be included as comments immediately below the code responsible, and should be labelled as follows:

# CODE 2a
Put your code here
and if you need another line, put it here,
and also here!

When plotting remember to appropriately label your axes

Exercise 1 (18 points)

Data Exploration with the mtcars dataset. The built-in dataset, mtcars includes data from the Motor Trend magazine for 1973-74 models of thirty-two automobiles on eleven different variables.

head(mtcars)

1.1 (8 pts)

(5 pts) Create a basic historgram with x- and y-labels to show the distribution of fuel efficiency (mpg) in the data set. Normalize the mpg data so that the frequency falls on a scale from 0 to 1.

(3 pts) Which range of mpg is the most frequent? The least frequent?

# Create Histogram

# ANSWER 1.1
#

1.2 (3 pts)

Create a new data.frame, cars_new, that has two new columns, make and pow, where make stores the make of the car and pow stores the power-to-weight ratio (i.e. hp to weight) for each car

Note The dataset stores the make of the car as the data frame's row names

# Create the new data.frame with new fields

1.3 (7 pts)

(2 pts) Sort the dataframe by horsepower-to-weight ratio from lowest to highest.

(5 pts) Create a bar plot showing the make of each car vs. the power-to-weight ratio for each car using the ordered data.

Hint look into the barplot arguement "las" and "cex.names" to make the x-axis more legible

# Sort by power to weight

# Create bar plot

Exercise 2 (22 pts)

2.1 (11 pts)

Using the function quantile(), consider the dataset iris and create a new field that color codes the elements of the dataframe as:

'blue': small elements showing both petal length and width below their first quartiles (i.e. quantile 0.25 calculated for each)
'red': large elements showing both petal length and width above their third quartiles (i.e. quantile 0.75 calculated for each)
'grey': normal elments showing either petal length or width between their first and third quartiles

Write pseudo code to display a scatterplot comparing the petal lengths and widths. Show the steps necessary to color the petals based on the classification described above

--- WRITE PSEUDO CODE HERE ---

2.2 (11 pts)

Create a scatterplot comparing the petal lengths and widths. Show the steps necessary to color the three categories of petals as follows:

blue for 'small' elements
red for 'large' elements
grey for 'normal' elements

In your actual code, use filled points (i.e. pch=19), e.g. plot(arg, arg, ..., pch = 19). Add proper title, axes labels and legend at the bottom-right corner of the figure for the three classes.

Using ?plot will tell you how to add a plot title and axis labels. You can add the legend using the function, legend as follows: legend("bottomright",[vector of data labels], [vector of color codes], pch=19)

# Create column for small/normal/large colors

# Create scatterplot

# add legend

Exercise 3 (54 pts)

The iris dataset has a set of 4 numeric columns. Create a simple scatter plot for every possible pair of numeric columns in the dataset (6 plots in total) using a for loop. Do not plot pairs composed by the same variable on X and Y and plot X VS Y only once (do not plot Y VS X).

Display the plot as a 2X3 panel and name the axes appropriately for every pair. Add a non-linear trend line with line width 2 (see ?lowess). The main title of each plot should report the p-value of the pearson correlation test. If the test result is not significant at alpha 0.05, draw the trend line in blue, otherwise red.

Hint 1: The functions colnames, setdiff(), combnmaybe helpful in answering this question. Hint 2: The function, lowess can be used to perform the fitting. Hint 3: The function, cor.test()$[desired component] can be used to perform and extract results of a correlation test. Hint 4: Use the function, par(mfrow=[vector of rows and columns of figure]) to create a figure with subplots.

a. (6 pts) Write a list of instructions to solve this problem step-by-step in English for the first column pair, Sepal Length and Sepal Width.

<!-- List of instructions for single pair -->

b. (6 pts) Now translate your instructions from part a into pseudocode

<!-- Pseudo code for single pair -->

c. (6 pts) Now translate your pseudocode into actual running code

# Follow instructions for single pair

d. (18 pts) Repeat steps a - c for the first two column pairs, Sepal Length and Sepal Width and Sepal Length and Petal Length.

<!-- List of instructions for two pairs -->

<!-- Pseudo code for two pairs -->

# Follow instructions for first two pairs

e. (18 pts) Now, repeat steps a - c for all six combinations of numeric columns.

# Get all combinations

# Set up subplot

# Plot every combination

Question 4 (10 pts)

Version control with Git

This question will take you through the process of sharing results via Git. Much of it can be done with GitHub desktop, but parts will require a little bit of work in the command line.

Note: if you are already familiar with git, you may use the command line to do the same functionality. Also, Github Desktop only works on Mac or Windows; if your computer uses a different OS, you will need to use the command line and take a screen shot

To provide screenshots use the following function in the code blocks below but remove eval = F

# Screenshot of cloned repo

`knitr::include_graphics("mypic.png")`

4a. Create your repo (1 pt)

Login to Github.com and create a public repository with a README file as we did in lecture on 9/11.

4b. Clone your repo (1 pt)

Clone your new repo to your computer.

4c. Edit files (1 pt)

i. Working in the master branch of your repository, edit the README file to contain a new line with the number sequence: 1234567890.

4d. Commit and Push (1 pt)

When you have made some or all necessary changes, commit and push them to the online repository. First, ensure all changes you would like to make are checked in GD's left window and be sure to add a summary of your changes.

4e. Create a new branch (1 pt)

As we did in lecture, create a new branch of your repository and name it, Second Branch

4f. Add a .txt file to the second branch (1 pt)

Add to your second branch a text file and name it README2.md.

4g. Publish your branch to the web (1 pt)

4h. Merge your new branch into the master branch (1 pt)

Now, merge Second Branch Into the Master Branch

4i. Tag your commit (1 pt)

Tag your commit on Github.com as Final Draft

4j. Provide us with the link to your repo (1 pt)

Copy and paste the link to your repo here

<!-- Link to Repo -->

Question 5 (1 pts)

How long did this problem set take you to complete? X hours

Submission

Please knit your markdown file and submit both the markdown file and the knitted pdf to canvas

NiklasTR/bmi713 documentation built on May 29, 2019, 11:01 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.