**NOTE:** 

This `pdf` was generated from the vignette
`R4SPISE2018::cheeseMCA` from the `R` Package 
`R4SPISE2018`. Check the help for the 
very last version of this document.

Prelude

If you want to make sure that you have a clean start, you can execute the following commands:

rm(list = ls())
graphics.off()

Or, better (see below, preamble), you can use an Rproject for this project.

# Important: Remember 
#     build the vignettes with devtools::build_vignettes()
knitr::opts_chunk$set( collapse = TRUE, comment = "#>")
knitr::opts_knit$get()

Preambule

Make sure that you start this analysis as a new Rproject so that the default directory will be correctly set.

Before we start the analysis, we need to have our standard packages installed (some from Github) and the corresponding libraries loaded:

and all their extensions:

# Decomment all/some these lines if the packages are not installed
# devtools::install_github('HerveAbdi/PTCA4CATA')
# devtools::install_github('HerveAbdi/DistatisR')
# devtools::install_github('HerveAbdi/R4SPISE2018') # of course!
#  install.packages(prettyGraphs)
#  install.packages('Matrix')
#  install.packages('dplyr')
#  install.packages('gridExtra')
#  install.packages('grid')
#  install.packages('gtable')
#  install.packages('stringi')
#  load the libraries that we will need
suppressMessages(library(Matrix))
suppressMessages(library(DistatisR))
suppressMessages(library(PTCA4CATA))
suppressMessages(library(prettyGraphs))
suppressMessages(library(ExPosition))
suppressMessages(library(dplyr))
suppressMessages(library(gridExtra)) # to save a table as a graph
suppressMessages(library(grid))      # that will be saved in the
suppressMessages(library(gtable))    # powerpoint with the figures

Introduction

The data set can be found from the package R4SPISE2018. The data are stored in an excel file called miniCheeseSurvey4MCA.xlsx whose location can be found and stored in the variable path2file using the R function system.functionfile() as shown in the following command: ```{R findDataPath} path2file <- system.file("extdata", "miniCheeseSurvey4MCA.xlsx", package = "R4SPISE2018")

```r", fig.height=3, fig.width=4, include=TRUE, out.width='70%'}

knitr::include_graphics('../man/figures/maroilesMCA.png')

Now the variable path2file contains the name (and location) of the excel data file. If you open this excel file, it will look like the figure above. The sorting data are stored in the sheet labeled Data. The first column gives the ID of the respondents and each column represents a question asked to the respondents. At the intersection of a row and a column we have the response of a respondent (row) to a question (column).

When you record your own data, make sure that you follow the same format, this way the script described in this vignette will apply to your own analysis with minimal change.

We will first compute the results of the analysis, then create the graphics, and finally save everything into a powerpoint.

Run the statistical analysis

Read the data

Recall that, the excel file name and location (i.e., path) are stored in the variable path2file. To read the data we will use the function DistatisR::read.df.excel() (based upon the function readxl::read_excel()).

rawData <- read.df.excel(path = path2file, sheet = 'Data')$df.data

The data are now read and stored in the data frame called rawData.

To save the file

To save the data file (under the---not very---original name of myDataFile or maybe a more informative name) in a directory (say Downloads). use the following command

saveFile <- file.copy(from = path2file, to = '~/Downloads/myDataFile.xlsx')

Analysis plan

In this example we will analyze the answers to two sets of questions.

The first set evaluates the knowledge of the respondent: and it comprises the questions Q01_Know1 to Q08_Know8 (or 1 to 8); these questions have 5 different options one correct and 4 incorrect options. Here we have decided to code these questions with only two values: correct or incorrect.

the second set comprises the questions Q15_C01 to Q38_C24 (or 32 to 55); these questions evaluate the behaviors, opinions, or attitudes of the respondents toward cheese that are either farm-made or industrial. These questions are answered with a 4 point Likert scale (from 1 meaning "I totally agree" to 4 meaning "I totally disagree")

In addition we have some information about the respondents: Sex (Q09_Sex), Age (Q10_Age), the city where they live (Q11_City).

Selection and recoding

In MCA, before proceeding to analysis per se, all variables need to be nominal variables with the constraint that the different levels of a variable are roughly balanced. When a variable is already nominal, the process is straightforward but some levels may need to be fused to have the levels roughly balanced. For ordinal or quantitative data the procedure is to "bin" the data so that the bins are roughly balanced

From one dataframe to the other

From the dataframe rawData we know create the new dataframe cleanData. It will have the same rows but the columns are obtained by recoding the columns of rawData.

The first step is to create the new data frame with only the variables we want to use; we also re-order the variables to have the respondent variables (i.e., Sex, Age, city) first. This is done with this line of code

temp.df <- dplyr::select(rawData, Q09_Sex:Q11_City, Q01_Know1:Q08_Know8, 
                   Q15_C01:Q38_C24)

Participant description

We use car::recode to recode the variable sex from 1/2 to m/f

# We recode sex as m/f
sex  =  car::recode(temp.df[,'Q09_Sex'], " 1 = 'm';2 = 'f'")

Knowledge

The questions Q01_Know1 to Q08_Know8 are knowledge questions their sum could go from 0 to 8:

# correct responses
correct.Know = c(3,3,1,2,1,1,1,3)
# where are the knowledge questions
knowQuestions = (substr( colnames(temp.df),5,8)   == 'Know') 
# Get the number of correct amswers
correct.answers =  rowSums(matrix(correct.Know, nrow = nrow(temp.df), 
                   ncol = length(correct.Know), byrow = TRUE) ==
           temp.df[,which(knowQuestions)])

But in this sample the sum goes from 1 to 6 with a mode at 4/5 as can be seen from the histogram of the number of correct answers.

hist(correct.answers, main = '', breaks = 0:8,  xlab = 'Number of Correct Answers')

we will concatenate them.and re-evaluate them from 1 to 3

# Make only three categories use recode from package car
know =  car::recode(correct.answers, "0 = 1; 1 = 1; 2 = 1; 3 = 1; 4 = 2; 5 = 3; 6 = 3")

We will also recode the variable Age into four levels

oriAge <- temp.df$Q10_Age
newAge  <- car::recode(oriAge, "1 = 1; 2 = 1; 3 = 2; 4 = 3; 5 = 4; 6 = 4")

Attitudes

The first step when analyzing Likert scales is to look at the distribution of the scores. Here we will create a table with the questions as rows and the columns as levels (from 1 to 4). ```r', outwidth = '100%'} knitr::kable( t(apply(rawData[,32:55],2,function(x){summary(as.factor(x))})))

This table shows that the levels of the responses are not very balanced,
so we decided to binarize the answers and to keep only the subset
of questions that can be binarized as {1,2}|{3,4} 
(other choices would have been acceptable of course).
The set that we keep is in the variable `Lick2Keep`
```r
Lick2Keep = c('Q15_C01','Q19_C05','Q21_C07',
               'Q24_C10','Q25_C11','Q32_C18')
Lickert <- temp.df[, colnames(temp.df) %in% Lick2Keep]
recl <- function(x){y = car::recode(x, "1 = 1; 2 = 1; 3 = 2; 4 = 2") }
rec.Lickert <- apply(Lickert,2,recl)

New cleaned data set

We can now put together the new data set

cleanData <- cbind(sex,newAge,rawData$Q11_City,know,rec.Lickert)
colnames(cleanData) <- c('Sex','Age','City','Know',
                          'C01','C05','C07','C10','C11','C18')
# make sure that there is no NA left
cleanData <- cleanData[complete.cases(cleanData),]

The dataframe is now ready to be analyzed with multiple correspondence analysis

Multiple Correspondence Analysis

Complete analysis

Use the function epMCA from the package ExPosition

resMCA <- epMCA(cleanData, graphs = FALSE) 

Subset analysis

Because Dimension 1 mostly separates the inhabitants of Lille from the inhabitants of Angers, we decided to perform a subset analysis and analyze separately the data from each city.

Lille

Create the "Lille" data set by subsetting the whole data set.

 cleanData.Lille = cleanData[cleanData[,'City'] == 'Lille',]
 cleanData.Lille <- cleanData.Lille[, (colnames(cleanData.Lille) != 'City') ]

MCA Lille

resMCA.Lille <- epMCA(cleanData.Lille, graphs = FALSE) 

Angers

Create the "Angers" data set by subsetting the whole data set.

 cleanData.Angers <- cleanData[cleanData[,'City'] == 'Angers',]
 cleanData.Angers <- cleanData.Angers[, (colnames(cleanData.Angers) != 'City') ]

MCA Angers

 resMCA.Angers <- epMCA(cleanData.Angers, graphs = FALSE) 

Graphics

Make some graphs.

First, get the colors for the variables

cJ <- resMCA$ExPosition.Data$cj
color4Var <- prettyGraphs::prettyGraphsColorSelection(ncol(cleanData))

Variable contributions

# Extract the root names before the "."
varNames <- stringi::stri_extract(rownames(cJ),regex = '[^.]*')
varCtr.tmp <- aggregate(cJ ~ varNames, (cbind(varNames,cJ)),sum)
varCtr <- varCtr.tmp[,-1]
rownames(varCtr)    <- varCtr.tmp[,1]
rownames(color4Var) <- varCtr.tmp[,1]

The variable contributions show the variables important for a given dimension.

nFact <- min(5, ncol(cJ) - 1)
# knitr::kable(round( varCtr[,1:nFact]*1000 ) )
# save table as a graph
ctrTable <- tableGrob(round(varCtr[,1:nFact]*1000))
h <- grobHeight(ctrTable)
w <- grobWidth(ctrTable)
title <- textGrob("Variable Contributions",
                  y = unit(0.5,"npc") + 1.2*h, 
                  vjust = 0,
                  gp = gpar(fontsize=15))
TableWithTitle <- gTree(children = gList(ctrTable, title))
grid.draw(TableWithTitle)
a00.1.ctrTable  <- recordPlot()

To create graphics we will color the labels of a given variable with the same color. First we create the vector storing the colors.

nM   <- nrow(cJ)
nVar <- nrow(color4Var)
col4Labels <- rep("",nM)

for (i in 1:nVar){
  lindex <- varNames %in% rownames(color4Var)[i]
  col4Labels[lindex] <- color4Var[i]
}

Scree plot for the MCA

# 5.A. A scree plot for the RV coef. Using standard plot (PTCA4CATA)
scree.mca <- PlotScree(ev = resMCA$ExPosition.Data$eigs, 
                   p.ev = NULL, max.ev = NULL, alpha = 0.05,
                   col.ns = "#006D2C", col.sig = "#54278F",
                   title = "MCA. Explained Variance per Dimension", 
                   plotKaiser = FALSE,
                   color4Kaiser = "darkorchid4", 
                   lwd4Kaiser = 2.5)
a1.Scree <- recordPlot() # Save the plot

Variable Map

axis1 = 1
axis2 = 2
Fj <- resMCA$ExPosition.Data$fj
# generate the set of maps
BaseMap.Fj <- createFactorMap(X = Fj , # resMCA$ExPosition.Data$fj,
                              axis1 = axis1, axis2 = axis2,
                              constraints = NULL,
                              title = 'MCA. Variables', 
                              col.points = col4Labels,
                              display.points = TRUE,
                              pch = 19, cex = 1,
                              display.labels = TRUE,
                              col.labels = col4Labels,
                              text.cex = 2.5, font.face = "bold",
                              font.family = "sans",
                              col.axes = "darkorchid",
                              alpha.axes = 0.2,
                              width.axes = 1.1,
                              col.background = adjustcolor("lavender",
                                                       alpha.f = 0.2),
                              force = 2, segment.size = 0)
# add labels
labels4MCA <- createxyLabels.gen(x_axis = axis1,
                                   y_axis = axis2,
               lambda = resMCA$ExPosition.Data$eigs,
               tau = resMCA$ExPosition.Data$t)
# make the maps
aa.1.BaseMap.Fj <- BaseMap.Fj$zeMap +  labels4MCA 
aa.2.BaseMapNoDot.Fj  <- BaseMap.Fj$zeMap_background +
                          BaseMap.Fj$zeMap_text + labels4MCA 
print(aa.1.BaseMap.Fj)

Map for the observations

Here we color the observations by City, to confirm that this is, indeed, the main source of variation for this data set.

Fi <- resMCA$ExPosition.Data$fi
colCity <- c('darkblue', 'red4')
nI <- nrow(Fi)
col4I.City <- rep("",nI)
for (i in 1:length(colCity) ){
  lindex <- cleanData[,'City'] %in% unique(cleanData[,'City'])[i]
  col4I.City[lindex] <- colCity[i]
}
# generate the set of maps
BaseMap.Fi <- createFactorMap(X = Fi , # resMCA$ExPosition.Data$fj,
                              axis1 = axis1, axis2 = axis2,
                              constraints = NULL,
                              title = 'MCA. Variables', 
                              col.points = col4I.City,
                              alpha.points = .2,
                              display.points = TRUE,
                              pch = 19, cex = .8,
                              display.labels = TRUE,
                              col.labels = col4I.City,
                              text.cex = 2.5, font.face = "bold",
                              font.family = "sans",
                              col.axes = "darkorchid",
                              alpha.axes = 0.2,
                              width.axes = 1.1,
                              col.background = adjustcolor("lavender",
                                                       alpha.f = 0.2),
                              force = 2, segment.size = 0)
# make the maps
aa.5.BaseMapNoLabels.Fi  <- BaseMap.Fi$zeMap_background +
                          BaseMap.Fi$zeMap_dots + labels4MCA 
print(aa.5.BaseMapNoLabels.Fi)

Graphs for the sub-analyses

Graph of the Variables for Angers

col4Labels.sub <- col4Labels[varNames != 'City']
axis1 = 1
axis2 = 2
Fj.Angers <- resMCA.Angers$ExPosition.Data$fj
# generate the set of maps
BaseMap.Fj.Angers <- createFactorMap(X = Fj.Angers , # resMCA$ExPosition.Data$fj,
                              axis1 = axis1, axis2 = axis2,
                              constraints = NULL,
                              title = 'Angers. MCA. Variables', 
                              col.points = col4Labels.sub,
                              display.points = TRUE,
                              pch = 19, cex = 1,
                              display.labels = TRUE,
                              col.labels = col4Labels.sub,
                              text.cex = 2.5, font.face = "bold",
                              font.family = "sans",
                              col.axes = "darkorchid",
                              alpha.axes = 0.2,
                              width.axes = 1.1,
                              col.background = adjustcolor("lavender",
                                                       alpha.f = 0.2),
                              force = 2, segment.size = 0)
# add labels
labels4MCA.Angers <- createxyLabels.gen(x_axis = axis1,
                                   y_axis = axis2,
               lambda = resMCA.Angers$ExPosition.Data$eigs,
               tau = resMCA.Angers$ExPosition.Data$t)
# make the maps
ca.1.BaseMap.Fj.Angers       <- BaseMap.Fj.Angers$zeMap +  labels4MCA.Angers 
ca.2.BaseMapNoDot.Fj.Angers  <- BaseMap.Fj.Angers$zeMap_background +
                          BaseMap.Fj.Angers$zeMap_text + labels4MCA.Angers 
print(ca.1.BaseMap.Fj.Angers)

Graph of the Variables for Lille

col4Labels.sub <- col4Labels[varNames != 'City']
axis1 = 1
axis2 = 2
Fj.Lille <- resMCA.Lille$ExPosition.Data$fj
# generate the set of maps
BaseMap.Fj.Lille <- createFactorMap(X = Fj.Lille , # resMCA$ExPosition.Data$fj,
                              axis1 = axis1, axis2 = axis2,
                              constraints = NULL,
                              title = 'Lille. MCA. Variables', 
                              col.points = col4Labels.sub,
                              display.points = TRUE,
                              pch = 19, cex = 1,
                              display.labels = TRUE,
                              col.labels = col4Labels.sub,
                              text.cex = 2.5, font.face = "bold",
                              font.family = "sans",
                              col.axes = "darkorchid",
                              alpha.axes = 0.2,
                              width.axes = 1.1,
                              col.background = adjustcolor("lavender",
                                                       alpha.f = 0.2),
                              force = 2, segment.size = 0)
# add labels
labels4MCA.Lille <- createxyLabels.gen(x_axis = axis1,
                                   y_axis = axis2,
               lambda = resMCA.Lille$ExPosition.Data$eigs,
               tau    = resMCA.Lille$ExPosition.Data$t)
# make the maps
cb.1.BaseMap.Fj.Lille       <- BaseMap.Fj.Lille$zeMap + labels4MCA.Lille 
cb.2.BaseMapNoDot.Fj.Lille  <- BaseMap.Fj.Lille$zeMap_background +
                          BaseMap.Fj.Lille$zeMap_text + labels4MCA.Lille 
print(cb.1.BaseMap.Fj.Lille)

Save the graphics as a powerpoint

The graphics are saved as a powerpoint called 'cheeseFrom2Cities.pptx' with the following command

name4Graphs = 'cheeseFrom2Cities.pptx'
list2Graphs <- PTCA4CATA::saveGraph2pptx(file2Save.pptx = name4Graphs, 
                 title = 'Attitudes toward Cheese', 
                 addGraphNames = TRUE)

Note that we could also have created a powerpoint with Rmarkdown by using the following options in the preamble:

output:
      powerpoint_presentation:
           slide_level: 4

instead of (for example):

output:
       rmarkdown::html_vignette:
          toc: true
          number_sections: true


HerveAbdi/R4SPISE2018 documentation built on Sept. 3, 2020, 6:40 a.m.