README.md
In CDEager/nauf: Regression with NA Values in Unordered Factors

nauf 1.1.1

To install the nauf package, call:

install.packages("nauf")

It is often the case that a factor only makes sense in a subset of a dataset (i.e. for some observations a factor may simply not be meaningful or contrastive), or that with observational datasets there are no observations in some levels of an interaction term. There are also cases where a random effects grouping factor is only applicable in a subset the data, and it is desireable to model the noise introduced by the repeated measures on the group members within the subset of the data where the repeated measures exist. The nauf package allows unordered factors and random effects grouping factors to be coded as NA in the subsets of the data where they are not applicable or otherwise not contrastive. Sum contrasts are used for all unordered factors (using named_contr_sum in the standardize package), and then NA values are set to 0. This allows all of the data to be modeled together without creating collinearity or making the output difficult to interpret.

For example, in the fricatives dataset, the factor uvoi (underlying voicing) is not contrastive in Spanish, and in Catalan it can only be contrastive in for certain word positions, leading to an imbalanced distribution:

library(nauf)
#> Loading required package: standardize
#> Loading required package: lme4
#> Loading required package: Matrix
#> Loading required package: rstanarm
#> Loading required package: Rcpp
#> rstanarm (Version 2.15.3, packaged: 2017-04-29 06:18:44 UTC)
#> - Do not expect the default priors to remain the same in future rstanarm versions.
#> Thus, R scripts should specify priors explicitly, even if they are just the defaults.
#> - For execution on a local, multicore CPU with excess RAM we recommend calling
#> options(mc.cores = parallel::detectCores())

summary(fricatives)
#>       dur              pvoi             lang        wordpos   
#>  Min.   : 29.71   Min.   :0.0625   Catalan:974   Final  :296  
#>  1st Qu.: 71.14   1st Qu.:0.5263   Spanish:648   Initial:397  
#>  Median : 86.29   Median :0.6471                 Medial :929  
#>  Mean   : 90.35   Mean   :0.6937                              
#>  3rd Qu.:105.53   3rd Qu.:0.9231                              
#>  Max.   :250.78   Max.   :1.0000                              
#>                                                               
#>           uvoi         speaker    
#>  Neutralized: 160   s12    : 142  
#>  Voiced     : 327   s09    : 141  
#>  Voiceless  :1135   s10    : 102  
#>                     s20    :  71  
#>                     s36    :  70  
#>                     s19    :  64  
#>                     (Other):1032

dat <- fricatives

u <- unique(dat[, c("lang", "wordpos", "uvoi")])
u <- u[order(u$lang, u$wordpos, u$uvoi), ]
rownames(u) <- NULL
u
#>      lang wordpos        uvoi
#> 1 Catalan   Final Neutralized
#> 2 Catalan Initial   Voiceless
#> 3 Catalan  Medial      Voiced
#> 4 Catalan  Medial   Voiceless
#> 5 Spanish   Final   Voiceless
#> 6 Spanish Initial   Voiceless
#> 7 Spanish  Medial   Voiceless

With nauf, we can code uvoi as NA when it is not contrastive, and include uvoi slopes only for speakers where it is contrastive by creating language-specific speaker columns set to NA for the opposite language:

u$uvoi[!(u$lang == "Catalan" & u$wordpos == "Medial")] <- NA
u
#>      lang wordpos      uvoi
#> 1 Catalan   Final      <NA>
#> 2 Catalan Initial      <NA>
#> 3 Catalan  Medial    Voiced
#> 4 Catalan  Medial Voiceless
#> 5 Spanish   Final      <NA>
#> 6 Spanish Initial      <NA>
#> 7 Spanish  Medial      <NA>

dat$uvoi[!(dat$lang == "Catalan" & dat$wordpos == "Medial")] <- NA
dat$c_speaker <- dat$s_speaker <- dat$speaker
dat$c_speaker[dat$lang != "Catalan"] <- NA
dat$s_speaker[dat$lang != "Spanish"] <- NA

sobj <- standardize(pvoi ~ lang * wordpos + uvoi +
  (1 + wordpos + uvoi | c_speaker) + (1 + wordpos | s_speaker),
  dat)

mod <- nauf_lmer(sobj$formula, sobj$data)

summary(mod)
#> Linear mixed model fit by REML ['nauf.lmerMod']
#> Formula: 
#> pvoi ~ lang * wordpos + uvoi + (1 + wordpos + uvoi | c_speaker) +  
#>     (1 + wordpos | s_speaker)
#>    Data: sobj$data
#> 
#> REML criterion at convergence: 3584.7
#> 
#> Scaled residuals: 
#>     Min      1Q  Median      3Q     Max 
#> -4.4721 -0.6148  0.0511  0.5195  3.1344 
#> 
#> Random effects:
#>  Groups    Name           Variance Std.Dev. Corr             
#>  c_speaker (Intercept)    0.154115 0.39258                   
#>            wordposFinal   0.042090 0.20516  -0.67            
#>            wordposInitial 0.034222 0.18499   0.36 -0.91      
#>            uvoiVoiced     0.028117 0.16768  -0.21  0.76 -0.73
#>  s_speaker (Intercept)    0.123763 0.35180                   
#>            wordposFinal   0.011092 0.10532   0.92            
#>            wordposInitial 0.006788 0.08239  -0.38  0.01      
#>  Residual                 0.482529 0.69464                   
#> Number of obs: 1622, groups:  c_speaker, 26; s_speaker, 16
#> 
#> Fixed effects:
#>                            Estimate Std. Error t value
#> (Intercept)                 0.02215    0.06339   0.349
#> langCatalan                 0.28058    0.06339   4.426
#> wordposFinal                0.46883    0.04183  11.208
#> wordposInitial             -0.40194    0.03791 -10.603
#> uvoiVoiced                  0.64890    0.04716  13.761
#> langCatalan:wordposFinal    0.28252    0.04183   6.754
#> langCatalan:wordposInitial -0.39017    0.03791 -10.292
#> 
#> Correlation of Fixed Effects:
#>             (Intr) lngCtl wrdpsF wrdpsI uvoVcd lngC:F
#> langCatalan -0.150                                   
#> wordposFinl  0.105 -0.416                            
#> wordposIntl  0.045  0.155 -0.669                     
#> uvoiVoiced  -0.098 -0.098  0.282 -0.236              
#> lngCtln:wrF -0.416  0.105  0.076 -0.184  0.282       
#> lngCtln:wrI  0.155  0.045 -0.184  0.057 -0.236 -0.669

anova(mod, method = "S")
#> Mixed Model Anova Table (Type III tests, S-method)
#> 
#> Model: pvoi ~ lang * wordpos + uvoi + (1 + wordpos + uvoi | c_speaker) + 
#> Model:     (1 + wordpos | s_speaker)
#> Data: $
#> Data: sobj
#> Data: data
#>              num Df den Df       F    Pr(>F)    
#> lang              1 29.382  19.591 0.0001215 ***
#> wordpos           2 30.293  71.535 3.349e-12 ***
#> uvoi              1 20.331 189.361 9.136e-12 ***
#> lang:wordpos      2 30.293  52.983 1.285e-10 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Predicted marginal means can be calculated for specific subsets where a factor is contrastive. For example, uvoi is only contrastive for word-medial Catalan fricatives, so we could call:

rg <- nauf_ref.grid(mod)

nauf_pmmeans(rg, "uvoi", pairwise = TRUE,
  subset = list(lang = "Catalan", wordpos = "Medial")
)
#> 
#> Predicted marginal means for 'uvoi'
#> NA not considered a level for: 'uvoi'
#> 
#> Factors conditioned on: 'lang' 'wordpos' 
#> 
#> See the 'subset' element of the 'specs' attribute for subsetted groups
#> 
#> $pmmeans
#>  uvoi          pmmean        SE    df   lower.CL    upper.CL
#>  Voiced     0.9923974 0.1005598 25.74  0.7855905  1.19920437
#>  Voiceless -0.3054125 0.1163456 23.48 -0.5446837 -0.06614136
#> 
#> Confidence level used: 0.95 
#> 
#> $contrasts
#>  contrast           estimate         SE    df t.ratio p.value
#>  Voiced - Voiceless  1.29781 0.09431177 20.33  13.761  <.0001

CDEager/nauf documentation built on May 6, 2019, 9:24 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

Tweet to @rdrrHQ

GitHub issue tracker

ian@mutexlabs.com