knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
library(ggplot2)
library(lehmansociology)

This vignette shows you how to do descriptive statistics for interval variables. There are many ways to do each of these in R. This example is going to show one simple way using the poverty.states dataset. In a real data analysis you would never run all of these statistics at once.

You can see the actual poverty.states data set by typing View(poverty.states) in the console.

For this example we will use the variable PCTPOVALL_2013. We will refer to this variable as poverty.states$PCTPOVALL_2013.

This assumes you have no missing values, which is true for the poverty.states data.

`````r

# This gives us the number of observations.
length(poverty.states$PCTPOVALL_2013)
max(poverty.states$PCTPOVALL_2013)
min(poverty.states$PCTPOVALL_2013)
mean(poverty.states$PCTPOVALL_2013)
median(poverty.states$PCTPOVALL_2013)
sd(poverty.states$PCTPOVALL_2013)
var(poverty.states$PCTPOVALL_2013)
range(poverty.states$PCTPOVALL_2013)
sum(poverty.states$PCTPOVALL_2013)
#summary(poverty.states$PCTPOVALL_2013)
fivenum(poverty.states$PCTPOVALL_2013)
### Calculates the numbers associated to defined percentiles
quantile(poverty.states$PCTPOVALL_2013, c(.25, .5, .75, 1))
If you have missing values, for many of these the result will be missing (NA).
You really cannot count on having no missing values, so you are better off most of the time saying to ignore the missing values.

`````r

# This gives us the number of observations.

    max(poverty.states$PCTPOVALL_2013, na.rm = TRUE)
    min(poverty.states$PCTPOVALL_2013, na.rm = TRUE)
    mean(poverty.states$PCTPOVALL_2013, na.rm = TRUE)
    median(poverty.states$PCTPOVALL_2013, na.rm = TRUE)
    sd(poverty.states$PCTPOVALL_2013, na.rm = TRUE)
    var(poverty.states$PCTPOVALL_2013, na.rm = TRUE)
    range(poverty.states$PCTPOVALL_2013, na.rm = TRUE)
    sum(poverty.states$PCTPOVALL_2013, na.rm = TRUE)
    #summary(poverty.states$PCTPOVALL_2013, na.rm = TRUE)
    fivenum(poverty.states$PCTPOVALL_2013, na.rm = TRUE)
 ### Calculates the numbers associated to defined percentiles
    quantile(poverty.states$PCTPOVALL_2013, c(.25, .5, .75, 1))

Let's see how this works.

Here's the mean of PCTPOVALL_2013 for all states

    mean(poverty.states$PCTPOVALL_2013)

Now let's set the PCTPOVALL_2013 in Wisconsin to missing.

    temp <- poverty.states$PCTPOVALL_2013
    temp[50]<- NA

Now if we calculate the mean.

    mean(temp)

So we try with na.rm = TRUE.

   mean(temp, na.rm = TRUE)

Overall, it is good to always be conscious of how you are handling missing values.

This summary is based on the work here: http://www.stats4stem.org/r-numerical-summaries---statistical.html



lehmansociology/lehmansociology documentation built on May 21, 2022, 9:06 p.m.