misc-functions: Add confidence interval and relative frequency variables

ci R Documentation

Add confidence interval and relative frequency variables

Description

Using prop.test(), ci adds three columns to a data frame:

  1. relative frequency (f)

  2. lower bound of a confidence interval (conf.low)

  3. upper bound of a confidence interval (conf.high)
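
As a rough illustration of the three columns listed above, the following base-R sketch calls prop.test() once per row of a toy table. The helper name add_ci_sketch and the toy data are assumptions made for this example; the package's own ci may well be implemented differently.

```r
# Hypothetical toy table; column names follow the defaults shown under Usage.
df <- data.frame(
  query        = c("A", "B"),
  totalResults = c(120, 45),   # observed absolute frequencies (x)
  total        = c(1e6, 1e6)   # total frequencies, e.g. corpus sizes (N)
)

# add_ci_sketch() is an illustrative helper, not part of RKorAPClient.
add_ci_sketch <- function(df, conf.level = 0.95) {
  # prop.test() returns the interval as a length-2 vector (lower, upper);
  # mapply() collects one such vector per row into a 2 x nrow matrix.
  bounds <- mapply(
    function(x, n) prop.test(x, n, conf.level = conf.level)$conf.int,
    df$totalResults, df$total
  )
  df$f         <- df$totalResults / df$total  # relative frequency
  df$conf.low  <- bounds[1, ]                 # lower bound
  df$conf.high <- bounds[2, ]                 # upper bound
  df
}

res <- add_ci_sketch(df)
print(res)
```

In this sketch each relative frequency lies between its own conf.low and conf.high, which is the property the plotting examples rely on.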

ipm is a convenience function for converting frequency tables to instances per million.

percent is a convenience function for converting frequency tables of alternative variants (generated with as.alternatives=TRUE) to percent.

queryStringToLabel converts a vector of query or vc strings to labels that are usually suitable for plot legends, by clipping off prefixes and suffixes that are common to all query strings.

geom_freq_by_year_ci is an experimental convenience function for plotting typical frequency-by-year graphs with confidence intervals using ggplot2. Warning: This function may be moved to a new package.

Usage

ci(df, x = totalResults, N = total, conf.level = 0.95)

ipm(df)

percent(df)

queryStringToLabel(data, pubDateOnly = FALSE, excludePubDate = FALSE)

geom_freq_by_year_ci(mapping = aes(ymin = conf.low, ymax = conf.high), ...)

Arguments

df

table returned from frequencyQuery()

x

column with the observed absolute frequency.

N

column with the total frequencies

conf.level

confidence level of the returned confidence interval. Must be a single number between 0 and 1.

data

string or vector of query or vc definition strings

pubDateOnly

discard all but the publication date

excludePubDate

discard publication date constraints

mapping

Set of aesthetic mappings created by aes() or aes_(). If specified and inherit.aes = TRUE (the default), it is combined with the default mapping at the top level of the plot. You must supply mapping if there is no plot mapping.

...

Other arguments passed to geom_ribbon, geom_line, and geom_click_point.

Details

Given a table with columns f, conf.low, and conf.high, ipm adds a column ipm and multiplies conf.low and conf.high by 10^6.
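
The conversion can be pictured with a few lines of base R. This is only a sketch: ipm_sketch and the toy values are made up for illustration, and it assumes that the new ipm column is f multiplied by 10^6, which the text above implies but does not state outright.

```r
# Hypothetical input with relative frequencies and interval bounds.
freq <- data.frame(
  f         = c(1.2e-4, 4.5e-5),
  conf.low  = c(1.0e-4, 3.3e-5),
  conf.high = c(1.4e-4, 6.0e-5)
)

# ipm_sketch() is an illustrative stand-in, not the package's ipm().
ipm_sketch <- function(df) {
  df$ipm       <- df$f * 10^6          # assumed: instances per million
  df$conf.low  <- df$conf.low * 10^6   # rescale bounds to the same unit
  df$conf.high <- df$conf.high * 10^6
  df
}

out <- ipm_sketch(freq)
print(out)
```

Rescaling the bounds together with the frequency keeps all three columns in the same unit, so they can be plotted on one axis.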

Value

ipm: original table with additional column ipm and converted columns conf.low and conf.high

percent: original table with converted columns f, conf.low and conf.high

queryStringToLabel: string or vector of strings with clipped off common prefixes and suffixes
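
To make the idea of clipping common prefixes and suffixes concrete, here is a minimal character-level sketch in base R. The helper clip_common_sketch is hypothetical and not the package's implementation; queryStringToLabel may choose different cut points and handles the pubDateOnly and excludePubDate options, which this sketch ignores.

```r
# clip_common_sketch() is an illustrative helper, not part of RKorAPClient.
clip_common_sketch <- function(x) {
  if (length(x) < 2) return(x)  # nothing to compare against
  chars   <- strsplit(x, "")
  min_len <- min(lengths(chars))

  # Count leading characters shared by all strings.
  pre <- 0
  while (pre < min_len &&
         length(unique(vapply(chars, function(ch) ch[pre + 1], ""))) == 1) {
    pre <- pre + 1
  }

  # Count trailing characters shared by all strings,
  # without running into the prefix already counted.
  suf <- 0
  while (suf < min_len - pre &&
         length(unique(vapply(chars, function(ch) ch[length(ch) - suf], ""))) == 1) {
    suf <- suf + 1
  }

  substr(x, pre + 1, nchar(x) - suf)
}

labels <- clip_common_sketch(c("[marmot/m=mood:subj]", "[marmot/m=mood:ind]"))
print(labels)
```

For these two query strings only the parts that differ remain, which is what makes the result usable as a legend label.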

See Also

ci is already included in frequencyQuery()

Examples

## Not run: 

library(RKorAPClient)  # provides KorAPConnection, corpusQuery(), corpusStats() and ci()
library(ggplot2)
kco <- new("KorAPConnection", verbose=TRUE)
expand_grid(year=2015:2018, alternatives=c("Hate Speech", "Hatespeech")) %>%
  bind_cols(corpusQuery(kco, .$alternatives, sprintf("pubDate in %d", .$year))) %>%
  mutate(total=corpusStats(kco, vc=vc)$tokens) %>%
  ci() %>%
  ggplot(aes(x=year, y=f, fill=query, color=query, ymin=conf.low, ymax=conf.high)) +
    geom_point() + geom_line() + geom_ribbon(alpha=.3)

## End(Not run)
## Not run: 

new("KorAPConnection") %>% frequencyQuery("Test", paste0("pubDate in ", 2000:2002)) %>% ipm()

## End(Not run)
## Not run: 

new("KorAPConnection") %>%
  frequencyQuery(c("Tollpatsch", "Tolpatsch"),
                 vc = paste0("pubDate in ", 2000:2002),
                 as.alternatives = TRUE) %>%
  percent()

## End(Not run)
queryStringToLabel(paste("textType = /Zeit.*/ & pubDate in", c(2010:2019)))
queryStringToLabel(c("[marmot/m=mood:subj]", "[marmot/m=mood:ind]"))
queryStringToLabel(c("wegen dem [tt/p=NN]", "wegen des [tt/p=NN]"))

## Not run: 
library(RKorAPClient)  # provides frequencyQuery(), ipm() and geom_freq_by_year_ci()
library(ggplot2)
kco <- new("KorAPConnection", verbose=TRUE)

expand_grid(condition = c("textDomain = /Wirtschaft.*/", "textDomain != /Wirtschaft.*/"),
            year = (2005:2011)) %>%
  cbind(frequencyQuery(kco, "[tt/l=Heuschrecke]",
                            paste0(.$condition," & pubDate in ", .$year)))  %>%
  ipm() %>%
  ggplot(aes(year, ipm, fill = condition, color = condition)) +
  geom_freq_by_year_ci()

## End(Not run)

KorAP/RKorAPClient documentation built on Feb. 6, 2024, 2:28 p.m.