$\$
This is a short practice homework that will not be turned in. The purpose of this homework is to review concepts that we covered in class, to practice some basic analyses in R, and to make sure you have all the software installed that you will need for this class.
In order to complete this homework you will need to have the SDS100 package installed. You should also try to get LaTeX installed on your computer so that you can knit this homework to a pdf. Instructions for installing the SDS100 package and LaTeX are at: https://github.com/emeyers/SDS100 .
I recommend getting started on this homework early so that you can solve any technical difficulties you run into prior to starting on the first real homework next week. Some functions you will find useful to complete the worksheet are: sqrt(), c(), sum(), table(), prop.table(), barplot(), and pie(). You will also use the get_sprinkle_sample() function that is part of the SDS100 package.
For all homework assignments, it is important to print the results of your calculations in the R chunks by putting each object that contains a result on a line by itself, which causes it to be printed in the R Markdown output. However, only print meaningful results, and do not print pages of data. It would be good practice to do this on this homework so that you get into the habit!
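As a small illustration of the printing convention described above, the sketch below uses a made-up object name (`example_result` is hypothetical and not part of the homework). An assignment on its own prints nothing, while the object's name on a line by itself prints its value in the knitted output.

```r
# an assignment stores the value but prints nothing
example_result <- sum(c(1, 2, 3))

# putting the object on a line by itself prints its value
example_result
```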
If you need help with any of the homework assignments, please attend the TA office hours, which are listed on Canvas, and/or ask questions on the Ed Discussions forum. Also, if you have completed the homework, please help others out by answering questions on Ed Discussions. Finally, if you have conceptual questions, reviewing the class slides and videos will be helpful.
$\$
library(knitr)
opts_chunk$set(tidy.opts=list(width.cutoff=60))

# set the random number generator to always give the same sequence of random numbers
set.seed(100)

# In order to complete this homework you will need to install the SDS100 package.
# Instructions for installing the SDS100 package are at: https://github.com/emeyers/SDS100
library(SDS100)
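To see what setting the seed does, here is a minimal sketch using base R's runif(). The object names `first_draw` and `second_draw` are made up for illustration. Resetting the seed to the same value before each draw should give the same sequence of random numbers.

```r
# setting the same seed before each draw gives the same "random" numbers
set.seed(100)
first_draw <- runif(3)

set.seed(100)
second_draw <- runif(3)

# TRUE when both draws produced exactly the same values
identical(first_draw, second_draw)
```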
Please answer the following questions to make sure you understand the concepts covered in class. We will build on these concepts over the course of the semester so it is important you have a solid understanding now. Note: some symbols that might be useful for answering the questions include $\hat{p}$, $\mu$, $\bar{x}$ and $\pi$.
$\$
In class we discussed two types of numbers: 1) parameters and 2) statistics. Please describe:
Which type of number is associated with a sample and which is associated with a population? (parameter or statistic?)
Which type of number is associated with Plato and which with the shadows?
Which type of number are we most interested in to get at the Truth?
Which type of number is associated with a categorical distribution and which is associated with a bar chart?
Which type of number varies depending on the particular sample of data collected?
When discussing proportions, what is the symbol used to denote the parameter and which one is used to denote the statistic?
Answers
$\$
Please answer the following questions to get practice using a few basic R functions. Make sure you have a clear understanding of how to use this code since future class work will build on this knowledge.
Exercise 2.1: Let's get started by using R as a calculator. Use R to calculate the square root of 21.32, and then divide this number by 2.71.
# delete the below lines and replace with the correct math
(2 + 3)^2
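As a hedged illustration of using R as a calculator, the sketch below applies sqrt() and division to different numbers than the ones in the exercise. The object name `calc_example` is hypothetical.

```r
# square root of 16, then divided by 2
calc_example <- sqrt(16) / 2

# print the result
calc_example
```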
$\$
Exercise 2.2: Create a vector with the numbers 7, 15, 18, 3, 5, 12, and 20 in it and assign this vector to an object called my_vec. Multiply this vector by 2 and assign it to the object my_vec2. Finally, use the sum() function to sum all the values in the vector my_vec2.
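The same pattern can be sketched with a small made-up vector. The values and the names `example_vec` and `example_vec_doubled` are assumptions for illustration only. Arithmetic on a vector is applied to every element, and sum() then adds the elements together.

```r
# create a small vector (values are made up for illustration)
example_vec <- c(1, 2, 3)

# arithmetic on a vector is applied to each element
example_vec_doubled <- example_vec * 2
example_vec_doubled

# sum() adds up all the elements
sum(example_vec_doubled)
```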
$\$
Exercise 2.3: The code below will use the get_sprinkle_sample() function to get a sample of n = 242 sprinkles and will save them in an object called my_sample. Use the table() function to convert this sample into a table that has the count of the number of sprinkles of each color and assign the result to an object called sprinkle_table. Below report how many sprinkles there are of the color that had the most sprinkles. Also answer the question, if another sample of sprinkles was taken, do you think the same color would have the most sprinkles? Try it and find out!
my_sample <- get_sprinkle_sample(242)

# use the table() function to create a table that has the count of sprinkles
# of each color and assign the results to an object called sprinkle_table.

# get a second sample
Answer: [Report which color had the most sprinkles and how many sprinkles were of that color. Also try this a second time and report how many sprinkles were in the category with the most sprinkles.]
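For a sense of how table() behaves, here is a minimal sketch on a made-up vector of colors rather than real sprinkle data. The names `example_colors` and `example_table` are hypothetical. The base R functions max() and which.max() are one possible way to find the largest count and its category.

```r
# a made-up vector of colors (not real sprinkle data)
example_colors <- c("red", "blue", "red", "green", "red", "blue")

# table() counts how many times each value occurs
example_table <- table(example_colors)
example_table

# max() gives the largest count; which.max() identifies the category that has it
max(example_table)
names(which.max(example_table))
```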
$\$
Exercise 2.4: Now use the function prop.table() to convert the counts of sprinkle colors in sprinkle_table into the proportion of sprinkles of each color. Save the results to an object called sprinkle_proportions. Report the proportion of pink sprinkles.
$\$
Answer: [Report the proportion of sprinkles that were pink.]
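The sketch below shows how prop.table() turns counts into proportions, again on made-up colors rather than real sprinkle data. The names `example_counts` and `example_props` are hypothetical.

```r
# counts from a made-up vector of colors (not real sprinkle data)
example_counts <- table(c("red", "blue", "red", "green"))

# prop.table() divides each count by the total, giving proportions that sum to 1
example_props <- prop.table(example_counts)
example_props

# a single category can be pulled out by name
example_props["red"]
```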
$\$
Exercise 2.5: Above we created samples that had n = 242 sprinkles. Now repeat the process several times by calculating the proportion of red sprinkles ($\hat{p}_{red}$) using sample sizes of n = 1, 10, 100, 1,000, 10,000, and 100,000; i.e., use the functions get_sprinkle_sample(), table(), prop.table(), etc. once for each sample size. Do the estimates of the proportion of red sprinkles ($\hat{p}_{red}$) change a lot depending on the sample size n? Which sample size (n) seems to give you the most accurate estimate?
$\$
Answer:
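The general idea of repeating an estimate at several sample sizes can be sketched with a simulation. This sketch swaps in base R's sample() for get_sprinkle_sample(), and it assumes a made-up population of four equally likely colors, so the proportions here say nothing about the real sprinkles. The helper `estimate_red()` and all object names are hypothetical.

```r
set.seed(1)

# made-up population: four colors, each equally likely (an assumption)
example_colors <- c("red", "blue", "green", "pink")

# estimate the proportion of "red" in a simulated sample of a given size
estimate_red <- function(n) {
  simulated_sample <- sample(example_colors, size = n, replace = TRUE)
  mean(simulated_sample == "red")
}

# one estimate per sample size
sample_sizes <- c(10, 100, 1000, 10000)
red_estimates <- sapply(sample_sizes, estimate_red)
red_estimates
```

Comparing the entries of `red_estimates` with the assumed true proportion of 0.25 is one way to judge how the sample size relates to the accuracy of the estimate.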
$\$
Please reflect on how the homework went by going to Canvas, going to the Quizzes link, and clicking on Reflection on homework 0.