In rpruim/ISIwithR: Data sets for "Introduction to Statistical Investigations"

require(mosaic)
require(Tintle1)

Chapter 1: Significance

How should I name chunks? The Examples don't have clear ways to organize--sometimes I name them like "Figure1.2". The Explorations, I named "Exploration1.2.1" because they are organized as ordered questions. Any way to consistently name chunks?
Tintle uses simulations for every example in the text and I was wondering if we should change the default size for figures. I've been switching many to "fig4" but should we try something else?
Because Tintle uses simulations for every example, I've been trying to organize the examples by putting them in ordered lists (you can see this starting from chapter 1). Within the ordered list, I don't directly follow Tintle's order. Is this something I should continue doing or would you rather get rid of it?
Tintle focuses on introducing simulations first and then using theory in each of the chapters so I've been doing every example with both simulations and theory (doing 1000 repetitions vs. binom.test).
Is there any way to fit normal curves to dotPlots such as the one in "Figure1.20"?

Chapter 2: Generalization

I skipped all of Exploration 2.1A with dataset "GettysburgAddress". The dataset is in character form but the questions ask about the number of letters in each word and the inclusion of the letter "e" in each word. I thought it might be better just to have the dataset in a frame with two new columns that include this information. If not, how should I approach Exploration 2.1A questions?
Here is one way:

G <- data.frame(word=GettysburgAddress, stringsAsFactors=FALSE)
G %>% mutate( nchar = nchar(word), e=grepl("e", word)) -> G
histogram(~nchar, data=G, width=1)

I was trying to make the histogram on page 2-45 Figure2.9 and fitting a t-distribution like so:

histogram(~ result, data = simulation.time, groups = (result <= 6.29 | result >= 13.71),
  nint = 20, center = 10, fit = "t")

Is this not the way to fit a t-distribution on a histogram? Should I add a plot instead?

Chapter 3: Estimation

As I was working through 95% confidence intervals, I was wondering exactly what cdata calulated. So if I was trying to find the 95% CI of a simulation of 1000 proportions:

cdata(0.95, prop, data = simulation.amer)

Is this using a distribution of some type (i.e. normal)? or is it counting 950/1000 of the proportions in the middle of the simulation and giving the lower and higher limit values? Or is it something else completely?

I also realized I may want to go back to Lock5 to catch the nuances of the confidence intervals (probably after I finish the draft of Tintle).

Chapter 4: Causation

There's really not much here to do in R. Should I even include what's been done?

Chapter 5: Comparing Two Proportions

I've been making segmented bargraphs for this chapter and I couldn't figure out how to change the y-axis to show proportions or percentages instead of frequencies (this is important to show conditional proportions). Please see Figure5.1
Tintle works compares proportions through their differences but also relative risks. How do you find realtive risk in R and do simulations? The example is on page 5-28
There aren't datasets for many of the examples in chapter 5. I've been making them myself with the way you did in Lock5. Did you want to make them datasets?
I was also wondering if we could reorganize the datasets such that the default levels will match the hypotheses in the book. I think the clearest example is in Exploration 5.3: Donating Blood. The book states the null as "pi_2002 - pi_2004 = 0" but I did "pi_2004 - pi_2002" because that's the default I get when I do the diff in prop. I also had to change the level to "donated" because the default is "did.not". I can easily how this won't make a difference in the analysis but then Tintle asks about this in question 14 (Exploration5.3.14 on page 5-63). Further, if I do this:

prop.test(Response ~ Year, data = Blood)

I cannot seem to change the level to "donated" (nor can I change the alternative to one-sided tests and the conf.levels for the CIs)

So then, my next question is whether I should be doing this at all:

prop.test(Response ~ Year, data = Blood)

and instead just doing this for all of the difference in prop tests:

success <- c(230, 210)
n <- c(1336, 1362)
prop.test(success, n)

so that I can easily manipulate the levels, one-sided tests, and conf.levels (although for the simulations I would continue to have to use specified levels?).

(Those last two questions are a handful so if you didn't understand me, I can ask you when we talk in person. Also, chapter 5 is a mess right now because I wasn't checking whether the hypotheses matched the defaults so I wouldn't look too closely that all of the examples.)

Updated questions

Preliminaries

Is there a way to replicate Figure P.6 (pp.15)? This is what I want to graph:

game <- do(1000) * rflip(n = 10, prob = 1/3)    # 1000 samples, each of size 200 and proportion 1/3
head(game)

type = "l" - How would you do the Monty Hall problem in R? The above is a game picking one out of three doors. Tintle does the actual problem where after you pick one out of three doors, a goat is revealed in one of the two you don't pick. Then I'd simulate the success after the switch. (pp.15)

Chapter 1

For Figure1.20, in the third histogram, I wanted to group values more or as extreme as the observed statistic but the histogram is being colored in badly. How to fix?

Chapter 2

How do you get an ordered list of a data set? See Table 2.5 (pp.38) data is "TimeEstimate"
Part of Exploration 2.2 (pp.60, questions 24+) requires data sets called "Pop1", "Pop2", and "Pop3". They are not data sets in Tintle1. Could we add them? http://www.rossmanchance.com/ISIapplets.html under the "One Mean" section.

Chapter 6

I'm skipping Exploration 6.1B: Cancer Pamphlets because of the type of data and questions. Please let me know if you disagree.

Chapter 7

How do you do a simulation of differences in means? Specifically by "swapping" paired data? Like Table 7.3 and then 1000 repetitions? I've never done this with paired data (the data set is "FirstBase" in chapter 7).

Chapter 8

For simulations to compare multiple proportions, Tintle uses something called the "mean absolute difference" or "MAD". I haven't run across this statistic yet so I'm not quite sure how to simulate (pp.6, 10-12). But then later Tintle uses chi-sq as the theory based approach.
Example 8.2 uses a data set called "Acupuncture" and one of the variables is "Acupunture" without the c. I couldn't tell if it was a typo or if was used to differentiate the variable name from name of the data set.

Questions on Chapter 8-10

Chapter 8

For some of the data sets in this section (Towels and NightLight2) because they were in table format I made data frames out of them in order to do the shuffling for the simulations. (See Exploration 8.2 and Exploration 8.2b)
Example 8.2: How do you find the confidence interval for the difference in proportions for the three pairs of groups (last section of Example 8.2)?

acu.table <- tally(~ Improvement + Acupunture, data = Acupuncture)
acu.table
chisq.test(acu.table)

Exploration 8.2.17: Again, how do you find the confidence interval for the pairwise differences in proportions.

xchisq.test(Towels)

Chapter 9

Exploration 9.1.13: The MAD calculated by the online applet is 0.448 while the MAD function is returning 0.672. MAD for other data sets have been correct and the only difference between those data sets and this data set is that other data sets have four categories for its qualitative variable and "Brain" has four categories. (subsequent simualation of null dist of MAD should also then be incorrect)

MAD(mean(BrainChange ~ Treatment, data = Brain))

Follow-up analysis to Example 9.2: I'm trying to compute the confidence intervals for the differences in means

confint(lm(Recall ~ Condition, data = Recall))

results in the CI of Before-After and None-After. Is there any way to get the CI of Before-None?

Chapter 10

Figure 10.8: How do you replicate Figure 10.8, plotting residuals for best-fit line?

xyplot(size ~ year, data = PlateSize, type = c("p", "r"))
resid(lm(size ~ year, data = PlateSize))

Exploration 10.3.13: How do you find the the SSE if we were to use the average of the response (y bar) as the predicted vale for every x? I know that the SSE of the least squares line is:

deviance(lm(height ~ footlength, data = FootHeight))

Figure 10.15: I am trying to simulate a null distribution of t-statistics

sim.ratet <- do(1000) * coef(summary(lm(shuffle(HeartRate) ~ BodyTemp, data = TempHeart)))
head(sim.ratet, 10)

This is what I tried but the problem is that R is also giving me the t-stat for the interval. Is there any other way to simulate the t-stat or to do analysis (make a dotplot and find the p-val) with the t-stat of just the slope?

rpruim/ISIwithR documentation built on May 28, 2019, 1:30 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

rpruim/ISIwithR
Data sets for "Introduction to Statistical Investigations"

In rpruim/ISIwithR: Data sets for "Introduction to Statistical Investigations"

Chapter 1: Significance

Chapter 2: Generalization

Chapter 3: Estimation

Chapter 4: Causation

Chapter 5: Comparing Two Proportions

Updated questions

Preliminaries

Chapter 1

Chapter 2

Chapter 6

Chapter 7

Chapter 8

More questions

Chapter 2

Chapter 5

Chapter 7

Questions on Chapter 8-10

Chapter 8

Chapter 9

Chapter 10

R Package Documentation

Browse R Packages

We want your feedback!

rpruim/ISIwithR Data sets for "Introduction to Statistical Investigations"

In rpruim/ISIwithR: Data sets for "Introduction to Statistical Investigations"

Chapter 1: Significance

Chapter 2: Generalization

Chapter 3: Estimation

Chapter 4: Causation

Chapter 5: Comparing Two Proportions

Updated questions

Preliminaries

Chapter 1

Chapter 2

Chapter 6

Chapter 7

Chapter 8

More questions

Chapter 2

Chapter 5

Chapter 7

Questions on Chapter 8-10

Chapter 8

Chapter 9

Chapter 10

R Package Documentation

Browse R Packages

We want your feedback!

rpruim/ISIwithR
Data sets for "Introduction to Statistical Investigations"