require(mosaic)
require(Tintle1)
How should I name chunks? The Examples don't have a clear way to organize them, so sometimes I name them like "Figure1.2". I named the Explorations "Exploration1.2.1" because they are organized as ordered questions. Is there a way to name chunks consistently?
Tintle uses simulations for every example in the text and I was wondering if we should change the default size for figures. I've been switching many to "fig4" but should we try something else?
Because Tintle uses simulations for every example, I've been trying to organize the examples by putting them in ordered lists (you can see this starting from chapter 1). Within the ordered list, I don't directly follow Tintle's order. Is this something I should continue doing or would you rather get rid of it?
Tintle focuses on introducing simulations first and then using theory in each of the chapters so I've been doing every example with both simulations and theory (doing 1000 repetitions vs. binom.test).
Is there any way to fit normal curves to dotPlots such as the one in "Figure1.20"?
I skipped all of Exploration 2.1A with dataset "GettysburgAddress". The dataset is in character form but the questions ask about the number of letters in each word and the inclusion of the letter "e" in each word. I thought it might be better just to have the dataset in a frame with two new columns that include this information. If not, how should I approach Exploration 2.1A questions?
Here is one way:
G <- data.frame(word=GettysburgAddress, stringsAsFactors=FALSE)
G %>% mutate( nchar = nchar(word), e=grepl("e", word)) -> G
histogram(~nchar, data=G, width=1)
histogram(~ result, data = simulation.time, groups = (result <= 6.29 | result >= 13.71), nint = 20, center = 10, fit = "t")
Is this not the way to fit a t-distribution on a histogram? Should I add a plot instead?
cdata(0.95, prop, data = simulation.amer)
Is this using a distribution of some type (i.e. normal)? or is it counting 950/1000 of the proportions in the middle of the simulation and giving the lower and higher limit values? Or is it something else completely?
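As far as I understand it, cdata() reports the endpoints of the central portion of the observed values (empirical quantiles) rather than fitting a normal or any other distribution. A minimal base-R sketch of that idea follows; `sim_props` is a made-up stand-in for the `prop` column of simulation.amer, and the assumption that cdata() matches quantile() this closely should be checked against the mosaic documentation.

```r
set.seed(1)
# hypothetical stand-in for simulation.amer$prop: 1000 simulated sample proportions
sim_props <- rbinom(1000, size = 100, prob = 0.5) / 100

# central 95%: cut off 2.5% of the simulated values in each tail
central <- quantile(sim_props, probs = c(0.025, 0.975))
central

# share of simulated values that fall inside those two endpoints
inside <- mean(sim_props >= central[1] & sim_props <= central[2])
inside
```

So the interval comes from counting off the simulated values themselves, which is the "middle 950 of 1000" reading of the question.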
I've been making segmented bargraphs for this chapter and I couldn't figure out how to change the y-axis to show proportions or percentages instead of frequencies (this is important to show conditional proportions). Please see Figure5.1
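One possible way to get proportions on the y-axis is to convert the counts to conditional proportions before plotting. The sketch below uses base R (prop.table() and barplot()) rather than the lattice/mosaic plotting functions, and the counts are hypothetical, not taken from Figure5.1.

```r
# hypothetical two-way counts (rows = response, columns = group); not from the book
counts <- matrix(c(15, 5, 8, 12), nrow = 2,
                 dimnames = list(Response = c("yes", "no"),
                                 Group = c("g1", "g2")))

# conditional proportions within each group: every column sums to 1
props <- prop.table(counts, margin = 2)
props

# stacked (segmented) bars with proportions on the y-axis
barplot(props, ylab = "Proportion", legend.text = TRUE)
```

Multiplying `props` by 100 before plotting would give percentages instead.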
Tintle compares proportions through their differences but also through relative risks. How do you find the relative risk in R and run simulations with it? The example is on page 5-28.
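A rough base-R sketch of one way to do this: compute the ratio of the two conditional proportions, then shuffle the outcomes across groups to build a null distribution. The counts are the blood-donation counts that appear later in these notes (not the page 5-28 data), the group labels `g1` and `g2` are placeholders, and `rel_risk()` is a helper written for this sketch, not a mosaic function.

```r
set.seed(2)
# counts borrowed from the blood-donation example in these notes
success <- c(230, 210)
n <- c(1336, 1362)

# one row per person: group label and 0/1 outcome
group <- rep(c("g1", "g2"), times = n)
outcome <- c(rep(c(1, 0), times = c(success[1], n[1] - success[1])),
             rep(c(1, 0), times = c(success[2], n[2] - success[2])))

# relative risk: proportion of successes in g1 divided by that in g2
rel_risk <- function(x, g) {
  p <- tapply(x, g, mean)
  unname(p[1] / p[2])
}

obs <- rel_risk(outcome, group)

# null distribution: shuffle outcomes across groups and recompute the ratio
sims <- replicate(1000, rel_risk(sample(outcome), group))

# two-sided p-value on the log scale, where "no association" is log(1) = 0
p_value <- mean(abs(log(sims)) >= abs(log(obs)))
```

Working on the log scale treats a ratio of 2 and a ratio of 1/2 as equally extreme, which is one reasonable choice for a two-sided test of a ratio.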
There aren't datasets for many of the examples in chapter 5. I've been making them myself with the way you did in Lock5. Did you want to make them datasets?
I was also wondering if we could reorganize the datasets so that the default levels match the hypotheses in the book. I think the clearest example is in Exploration 5.3: Donating Blood. The book states the null as "pi_2002 - pi_2004 = 0" but I used "pi_2004 - pi_2002" because that's the default I get when I do the diff in prop. I also had to change the level to "donated" because the default is "did.not". I can easily see how this won't make a difference in the analysis, but then Tintle asks about this in question 14 (Exploration5.3.14 on page 5-63). Further, if I do this:
prop.test(Response ~ Year, data = Blood)
I cannot seem to change the level to "donated" (nor can I change the alternative to a one-sided test or the conf.level for the CIs). I am thinking about dropping
prop.test(Response ~ Year, data = Blood)
and instead just doing this for all of the difference in prop tests:
success <- c(230, 210)
n <- c(1336, 1362)
prop.test(success, n)
so that I can easily manipulate the levels, one-sided tests, and conf.levels (although for the simulations I would continue to have to use specified levels?).
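For the counts version, the one-sided alternative and the confidence level are ordinary arguments of prop.test() in base R's stats package. A small sketch using the counts above (which group corresponds to which year is not assumed here):

```r
success <- c(230, 210)
n <- c(1336, 1362)

# one-sided alternative (first proportion greater) with a 90% interval
res <- prop.test(success, n, alternative = "greater", conf.level = 0.90)
res$estimate
res$conf.int

# reversing the group order flips the sign of the difference,
# so the matching one-sided alternative becomes "less"
res_rev <- prop.test(rev(success), rev(n), alternative = "less", conf.level = 0.90)
res_rev$p.value
```

The order in which the counts are supplied controls the direction of the difference, which is the same issue as the default factor levels in the formula version.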
(Those last two questions are a handful, so if you didn't understand me, I can ask you when we talk in person. Also, chapter 5 is a mess right now because I wasn't checking whether the hypotheses matched the defaults, so I wouldn't look too closely at all of the examples.)
game <- do(1000) * rflip(n = 10, prob = 1/3) # 1000 samples, each of size 10 and proportion 1/3
head(game)
How would you do the Monty Hall problem in R? The above is a game of picking one out of three doors. Tintle does the actual problem where, after you pick one out of three doors, a goat is revealed behind one of the two you didn't pick. Then I'd simulate the success rate after the switch. (p. 15)
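One way to simulate the full game, sketched in base R: play a single round inside a function, then repeat it many times for each strategy. `play_monty()` is a hypothetical helper written for this sketch, not something from mosaic or Tintle1.

```r
set.seed(3)

# one round of the game; returns TRUE if the player ends up with the car
play_monty <- function(switch_door) {
  doors <- 1:3
  car <- sample(doors, 1)
  pick <- sample(doors, 1)
  # the host opens a goat door that is neither the player's pick nor the car
  candidates <- setdiff(doors, c(pick, car))
  opened <- if (length(candidates) == 1) candidates else sample(candidates, 1)
  if (switch_door) {
    # switch to the one door that is neither picked nor opened
    pick <- setdiff(doors, c(pick, opened))
  }
  pick == car
}

stay_wins <- mean(replicate(10000, play_monty(FALSE)))
switch_wins <- mean(replicate(10000, play_monty(TRUE)))
c(stay = stay_wins, switch = switch_wins)
```

The `length(candidates) == 1` check is there because sample() treats a single number `k` as `1:k`, which would otherwise let the host open the wrong door.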
How do you get an ordered list of a data set? See Table 2.5 (pp.38) data is "TimeEstimate"
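A short base-R sketch of two options: sort() for a single column of values and order() to reorder the whole data frame. The data frame `est` and its column name `estimate` are made up here, since the actual column names in TimeEstimate are not shown in these notes.

```r
# hypothetical stand-in for TimeEstimate; the real column name may differ
est <- data.frame(estimate = c(12, 5, 30, 8, 15))

# sorted vector of the values only
sorted_values <- sort(est$estimate)
sorted_values

# whole data frame with rows reordered by that column
est_sorted <- est[order(est$estimate), , drop = FALSE]
est_sorted
```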
Part of Exploration 2.2 (pp.60, questions 24+) requires data sets called "Pop1", "Pop2", and "Pop3". They are not data sets in Tintle1. Could we add them? http://www.rossmanchance.com/ISIapplets.html under the "One Mean" section.
For simulations to compare multiple proportions, Tintle uses something called the "mean absolute difference" or "MAD". I haven't run across this statistic yet so I'm not quite sure how to simulate (pp.6, 10-12). But then later Tintle uses chi-sq as the theory based approach.
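Assuming MAD here means the average of the absolute differences between every pair of group proportions, a shuffle-based simulation might look like the base-R sketch below. The three groups and their responses are randomly generated for illustration, and `mad_stat()` is a helper written for this sketch.

```r
set.seed(4)

# mean of the absolute pairwise differences between group statistics
mad_stat <- function(stats) {
  stats <- as.numeric(stats)
  mean(abs(combn(stats, 2, FUN = diff)))
}

# hypothetical data: three groups of 50 with a 0/1 response
group <- rep(c("A", "B", "C"), each = 50)
response <- rbinom(150, size = 1, prob = rep(c(0.3, 0.5, 0.7), each = 50))

# observed MAD of the three group proportions
obs <- mad_stat(tapply(response, group, mean))

# null distribution: shuffle the responses across groups and recompute
sims <- replicate(1000, mad_stat(tapply(sample(response), group, mean)))

# MAD is never negative, so only the upper tail counts as extreme
p_value <- mean(sims >= obs)
```

The same function works for group means instead of proportions, since it only sees the vector of group statistics.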
Example 8.2 uses a data set called "Acupuncture" and one of the variables is "Acupunture" without the c. I couldn't tell if it was a typo or if it was used to differentiate the variable name from the name of the data set.
To get it into the package, it needs to go into Package/data/. I've added it:
head(Pop)
relrisk(tally(~ Perception + Wording, data = GoodandBad))
does not return the correct relative risk: "the proportion with a positive perception is 2.49 times higher in the 'good year' group than the 'bad year' group" (Example 5.1). I can't figure out what value the function is returning because no ratio of conditional proportions matches the one given by the function. Am I doing this incorrectly?
tally(~ Perception + Wording, data = GoodandBad, margins=TRUE)
4/19 / (8/11)
relrisk(tally(~ Perception + Wording, data = GoodandBad))
3/18 / (8/12)
relrisk(tally(~ Wording + Perception, data = GoodandBad))
relrisk(tally( Wording ~ Perception, data=GoodandBad, format="count") )
15/18 / (4/12)
Example 7.2 includes a simulation of the difference in means by shuffling (not paired). How do I do this with data organized for a paired analysis? (Please see dataset "FirstBase".)
Two solutions:
require(tidyr)
FirstBase %>% gather(angle, time) %>% head()
t.test(time ~ angle, data = FirstBase %>% gather(angle, time))
t.test(FirstBase$narrow, FirstBase$wide)
I guess I like the first one better.
For some of the data sets in this section (Towels and NightLight2), because they were in table format, I made data frames out of them in order to do the shuffling for the simulations. (See Exploration 8.2 and Exploration 8.2b)
Example 8.2: How do you find the confidence interval for the difference in proportions for the three pairs of groups (last section of Example 8.2)?
acu.table <- tally(~ Improvement + Acupunture, data = Acupuncture)
acu.table
chisq.test(acu.table)
xchisq.test(Towels)
MAD(mean(BrainChange ~ Treatment, data = Brain))
confint(lm(Recall ~ Condition, data = Recall))
results in the CI of Before-After and None-After. Is there any way to get the CI of Before-None?
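One possible approach is to change the reference level of the factor so that the comparison you want becomes a model coefficient. The sketch below uses base R's relevel() on a made-up data frame `recall_df` (the values are random, and the level names After, Before and None are assumed from the question), with TukeyHSD() shown as an alternative that reports all three pairs at once.

```r
set.seed(5)
# hypothetical stand-in for the Recall data: three conditions, made-up scores
recall_df <- data.frame(
  Condition = factor(rep(c("After", "Before", "None"), each = 10)),
  Recall = rnorm(30, mean = rep(c(3, 5, 4), each = 10))
)

# make "None" the reference level so the Before coefficient is Before - None
recall_df$Condition <- relevel(recall_df$Condition, ref = "None")
fit <- lm(Recall ~ Condition, data = recall_df)
ci <- confint(fit)
ci["ConditionBefore", ]

# all three pairwise intervals in one table, with Tukey's adjustment
tukey <- TukeyHSD(aov(Recall ~ Condition, data = recall_df))
tukey
```

The relevel() intervals are unadjusted, while the TukeyHSD() intervals are widened for multiple comparisons, so the two will not match exactly.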
xyplot(size ~ year, data = PlateSize, type = c("p", "r"))
resid(lm(size ~ year, data = PlateSize))
deviance(lm(height ~ footlength, data = FootHeight))
sim.ratet <- do(1000) * coef(summary(lm(shuffle(HeartRate) ~ BodyTemp, data = TempHeart)))
head(sim.ratet, 10)
This is what I tried, but the problem is that R also gives me the t-stat for the intercept. Is there any other way to simulate the t-stat, or to do the analysis (make a dotplot and find the p-value) with the t-stat of just the slope?
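One way to keep only the slope's t-statistic is to index the coefficient table by row and column name before repeating the shuffle. The sketch below is base R with sample() and replicate() in place of shuffle() and do(); `temp_heart` is a randomly generated stand-in for TempHeart and `slope_t()` is a helper written for this sketch.

```r
set.seed(6)
# hypothetical stand-in for TempHeart with made-up values
temp_heart <- data.frame(BodyTemp = rnorm(40, mean = 98.2, sd = 0.7))
temp_heart$HeartRate <- 70 + 2 * (temp_heart$BodyTemp - 98.2) + rnorm(40, sd = 6)

# t-statistic for the slope only: row "x", column "t value" of the coefficient table
slope_t <- function(y, x) {
  coef(summary(lm(y ~ x)))["x", "t value"]
}

obs_t <- slope_t(temp_heart$HeartRate, temp_heart$BodyTemp)

# null distribution: shuffle the response and refit each time
sim_t <- replicate(1000, slope_t(sample(temp_heart$HeartRate), temp_heart$BodyTemp))

# two-sided p-value from the simulated t-statistics
p_value <- mean(abs(sim_t) >= abs(obs_t))
```

`sim_t` is a plain numeric vector, so it can go straight into a dot plot or histogram.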