# Important: Remember 
#     build the vignettes with devtools::build_vignettes()
knitr::opts_chunk$set(
  collapse = TRUE,
  fig.width = 9,
  comment = "#>"
)
knitr::opts_knit$get()


```r


**NOTE:** 

This `pdf` was generated from the vignette
`R4SPISE2018::cheeseMCA` from the `R` Package 
`R4SPISE2018`. Check the help for the 
very last version of this document.

Prelude

If you want to make sure that you have a clean start, you can execute the following commands:

rm(list = ls())
graphics.off()

Or, better, you can use an Rproject for this project (see preamble below).

Preamble

Rproject is best

Make sure that you start this analysis as a new Rproject so that the default directory will be correctly set and the memory has been cleared.

(Install the packages if not already done, and then) Load the libraries

Before we start the analysis, we need to have our three standard packages installed (from Github) and the corresponding libraries loaded:

We also need some other packages namely:

# Decomment all/some these lines if the packages are not installed
# devtools::install_github('HerveAbdi/PTCA4CATA')
# devtools::install_github('HerveAbdi/DistatisR')
#  install.packages(prettyGraphs)
#  install.packages('Matrix')
#
#  load the libraries that we will need
suppressMessages(library(Matrix))
suppressMessages(library(factoextra))
suppressMessages(library(DistatisR))
suppressMessages(library(PTCA4CATA))
suppressMessages(library(prettyGraphs))
suppressMessages(library(ExPosition))

Introduction

The data set can be found from the package R4SPISE2018. The data are stored in an excel file called multiculturalSortingSpices.xlsx whose location can be found and stored in the variable path2file using the R function system.functionfile() as shown in the following command: ```{R findDataPath} path2file <- system.file("extdata", "multiculturalSortingSpices.xlsx", package = "R4SPISE2018")

```r", fig.height=3, fig.width=4, include=TRUE, out.width='70%'}
# label.spicesxl = capFigNo
# ![Toto ](Screen Shot 2018-07-09 at 13.51.37.png)
knitr::include_graphics('../man/figures/SpiceXls.png')

if you open this excel file, it will look like the Figure above. The sorting data are stored in the sheet DataSort. In this sheet, the first column gives the names of the products (here spices) and the following columns give how each Judge sorted the products: The products that were sorted together are assigned the same number (arbitrarily chosen).

When you record you own data, make sure that you follow the same format, this way the script described in this vignette will apply to your own analysis with minimal change.

We will first compute the results of the analysis, then create the graphics, and finally save everything into a powerpoint.

Run the statistical analysis

Read the data

The excel file name and location (i.e., path) are stored in the variable path2file. To read the data we will use the function DistatisR::read.df.excel() (based upon the function readxl::read_excel()).

multiSort.list <- read.df.excel(path = path2file, sheet = 'DataSort')
multiSort <- multiSort.list$df.data

The raw data are now stored into the list multiSort.list where the sorting data are in the dataframe called multiSort.list$df.data. (this structure is used so that when the assessors describe their sort with words, the vocabulary is stored in the same list here the vocabulary would have been stored in multiSort.list$df.voc). The sorting data are stored in the dataframe multiSort.

To save the file

To save the data file (under the original name of myDataFile or maybe a under a more informative name) in a directory (called, for example, Downloads), use the following command:

saveFile <- file.copy(from = path2file, to = '~/Downloads/myDataFile.xlsx')

Check (eye-balling) the data

To make sure that we have read the correct file we can peek at the dataframe multiSort and look at the data for the first 5 spices of the first 10 assessors:.

knitr::kable(multiSort[1:5,1:10])

The nationality of the assessors is indicated by the first letter of their name and and so we can create a vector (called descJudges) giving the nationality of the assessors by extracting the first letter of their names:

descJudges <- substr(colnames(multiSort),1,1)

Also, we are going to associate a color to each country for the Judges, using the function createColorVectorsByDesign() from the package prettyGraphs.

# Create a 0/1 group matrix with ExPosition::makeNominalData()
nominal.Judges <- makeNominalData(as.data.frame(descJudges))
# get the colors
color4Judges.list <- prettyGraphs::createColorVectorsByDesign(
              nominal.Judges)
# color4Judges.list

Get the brick of distance

The first step of the analysis is to transform the sorting data into a brick of distance matrices. This is done with the function DistanceFromSort()'

DistanceCube <- DistanceFromSort(multiSort)

Run plain DISTATIS

The brick of distance matrices (multiSort) is used as the argument of the function distatis that will compute the plain DISTATIS for the sorting task.

resDistatis <- distatis(DistanceCube)

The list resDistatis contains the results of the analysis; This list,, in turnn contains two other lists: The first list, called resDistatis$res4Cmat, contains the results for the $R_V$ analysis (the $R_V$ matrix is also called the C matrix); The second list, called resDistatis$res4Splus, contains the results for the analysis of the compromise (called the S matrix) . If you have forgotten the output format of distatis, or want to have more information: have a look at the help for the functiondistatis(): in the console type ?distatis, or print resDistatis.

Get group means in the RV space

We want to find if there are differences between the different national groups of judges. The first step is to compute the mean of each of the groups and their bootstrap confidence intervals

# Get the factors from the Cmat analysis
G <- resDistatis$res4Cmat$G 
countryMeans.tmp <- aggregate(G, list(descJudges), mean) 
countryMeans <- countryMeans.tmp[,2:ncol(countryMeans.tmp )] 
rownames(countryMeans) <- countryMeans.tmp[,1]
# countryMeans[,1:4]
BootCube <- PTCA4CATA::Boot4Mean(G, design = descJudges,
                       niter = 100,
                       suppressProgressBar = TRUE)
# head(BootCube)

Compute partial map (countries)

The compromise is obtained as a weighted sum of the $\alpha_j\mathbf{S}_j$ with each of the $J$ judges belonging to one of $K$ groups (i.e., country). In plain DISTATIS, each observation can be projected onto the compromise; in a similar manner, when the assessors are nested into another group factor (e.g., countries), we can project these groups onto the compromise. The easiest way to do so is to use the partial projections and then compute the weighted sum corresponding to each group.

F_j     <- resDistatis$res4Splus$PartialF
alpha_j <- resDistatis$res4Cmat$alpha
# create the groups of Judges
groupsOfJudges <- substr(names(alpha_j),1,1)
code4Groups <- unique(groupsOfJudges)
nK <- length(code4Groups)
# initialize F_K and alpha_k
F_k <- array(0, dim = c(dim(F_j)[[1]], dim(F_j)[[2]],nK))
dimnames(F_k) <- list(dimnames(F_j)[[1]], 
                         dimnames(F_j)[[2]], code4Groups)
alpha_k <- rep(0, nK)
names(alpha_k) <- code4Groups
Fa_j <- F_j
# A horrible loop
for (j in 1:dim(F_j)[[3]]){ Fa_j[,,j]  <- F_j[,,j] * alpha_j[j] }
# Another horrible loop
for (k in 1:nK){
  lindex <- groupsOfJudges == code4Groups[k]
  alpha_k[k] <- sum(alpha_j[lindex])
  F_k[,,k] <- (1/alpha_k[k])*apply(Fa_j[,,lindex],c(1,2),sum)
}

Graphics

Most of the graphics will be created with either prettyGraphs or with PTCA4CATA, The graphs will be saved in a powerpoint file.

The RV analysis

Scree plot for RV

In the first map we create the RV map

# 5.A. A scree plot for the RV coef. Using standard plot (PTCA4CATA)
scree.rv.out <- PlotScree(ev = resDistatis$res4Cmat$eigValues, 
                   p.ev = NULL, max.ev = NULL, alpha = 0.05,
                   col.ns = "#006D2C", col.sig = "#54278F",
                   title = "RV-map: Explained Variance per Dimension", 
                   plotKaiser = FALSE,
                   color4Kaiser = "darkorchid4", 
                   lwd4Kaiser = 2.5)
a1.Scree.RV <- recordPlot() # Save the plot

Factor Map for RV

# Create the layers of the map
gg.rv.graph.out <- createFactorMap(X = resDistatis$res4Cmat$G, 
                            axis1 = 1, axis2 = 2, 
                            title = "Judges: RVMap", 
                            col.points = color4Judges.list$oc, 
                            alpha.points = 0.5,
                            display.points = TRUE, 
                            pch = 19, cex = 2.5, 
                            display.labels = TRUE,
                            col.labels = color4Judges.list$oc, 
                            alpha.labels = 1, text.cex = 4,
                            font.face = "bold", font.family = "sans", 
                                   col.axes = "darkorchid",
                                   alpha.axes = 0.2, width.axes = 1.1,
                            col.background = adjustcolor("lavender", 
                                                       alpha.f = 0.2), 
                            force = 1,
                            segment.size = 0)
# # Create the map from the layers
a2a.gg.RVmap <- gg.rv.graph.out$zeMap + 
     createxyLabels.gen(
       lambda = resDistatis$res4Cmat$eigValues , 
       tau    = resDistatis$res4Cmat$tau,
       axisName = "Dimension ")
# NB could be of interest to color the Assessors by condition 
#        rather than using their names. Done below
a2b.gg.RVmap <- gg.rv.graph.out$zeMap_background +
                gg.rv.graph.out$zeMap_dots +
                createxyLabels.gen(
                  lambda = resDistatis$res4Cmat$eigValues , 
                  tau    = resDistatis$res4Cmat$tau,
                  axisName = "Dimension ")

Print the RV map

To print the RV map we simply use the function print() as described below:

print(a2a.gg.RVmap )

An RV map with group means and confidence intervals

# First the means
# A tweak for colors
in.tmp    <- sort(rownames(color4Judges.list$gc), index.return = TRUE)$ix
col4Group <- color4Judges.list$gc[in.tmp]
#
gg.rv.means <- PTCA4CATA::createFactorMap(countryMeans,
axis1 = 1, axis2 = 2, 
constraints = gg.rv.graph.out$constraints, title = NULL,
col.points =  col4Group ,
alpha.points = 1, # no transparency
display.points = TRUE,
pch = 19, cex = 5,
display.labels = TRUE,
col.labels = col4Group , 
  text.cex = 4,
font.face = "bold",
font.family = "sans", col.axes = "darkorchid", 
alpha.axes = 0.2, width.axes = 1.1, 
col.background = adjustcolor("lavender", alpha.f = 0.2), 
force = 1, segment.size = 0)
#
 dimnames(BootCube$BootCube)[[2]] <- 
                    paste0('dim ',1: dim(BootCube$BootCube)[[2]])
  #c('Dim1','Dim2') 
GraphElli.rv <- MakeCIEllipses(BootCube$BootCube[,1:2,],
                 names.of.factors = c("dim 1","dim 2"), 
                 col = col4Group, 
                 p.level = .95)
a2d.gg.RVMap.CI <- a2b.gg.RVmap + gg.rv.means$zeMap_dots + GraphElli.rv 

# countryMeans[,1:2]

The means of the assessors from the 5 countries differs as can be seen in the table of mans for the first three dimension

knitr::kable(countryMeans[,1:3])

To evaluate the significance of these differences, we plot the means and their bootstrapped derived confidence intervals on the RV map. This map pinpoints one significant difference between the Vietnamese and the American group (see below).

print(a2d.gg.RVMap.CI )

Post-Hoc Analysis: Exploring the individual participants between participants

Here we conduct a hierarchical cluster analysis on the factor scores of the RV matrix.

 D <- dist(resDistatis$res4Cmat$G, method = "euclidean")
 fit <- hclust(D, method = "ward.D2")
 a05.tree4participants <- fviz_dend(fit,  k = 1, 
                        k_colors = 'burlywood4', 
                        label_cols = color4Judges.list$oc[fit$order],
                        cex = .7, xlab = 'Participants',
                        main = 'Cluster Analysis: Participants') 
 print(a05.tree4participants)

The cluster analysis suggests that there are groups of participants, but how many groups maybe more difficult to evaluate. Also, the cluster analysis confirms that nationalities is not a strongly segmenting factor (because the colors do not strongly cluster on the tree).

The compromise

Scree for the compromise

First a scree plot of the compromise

#---------------------------------------------------------------------
# A scree plot for the Compromise.
S.eig <- eigen(resDistatis$res4Splus$Splus, 
               symmetric = TRUE, 
               only.values = TRUE)$values
scree.S.out <- PlotScree(
              ev = S.eig, 
              p.ev = NULL, max.ev = NULL, alpha = 0.05,
              col.ns = "#006D2C", col.sig = "#54278F",
              title = "Compromise: Explained Variance per Dimension", 
              plotKaiser = FALSE,
              color4Kaiser = 
              "darkorchid4", lwd4Kaiser = 2.5)
b1.Scree.S <- recordPlot()
#---------------------------------------------------------------------

The compromise

# 4.1 Get the bootstrap factor scores (with default 1000 iterations)
BootF <- BootFactorScores(resDistatis$res4Splus$PartialF)
# 5.2 a compromise plot
# General title for the compromise factor plots:
genTitle4Compromise = 'Spices: Compromise.'
# To get graphs with axes 1 and 2:
h_axis = 1
v_axis = 2
# To get graphs with say 2 and 3 
# change the values of v_axis and h_axis
color4Spices <- #  Create color for the Products from prettyGraph
 prettyGraphsColorSelection(n.colors = nrow(resDistatis$res4Splus$F))
gg.compromise.graph.out <- createFactorMap(resDistatis$res4Splus$F,
                                    axis1 = h_axis, 
                                    axis2 = v_axis,
                                    title = genTitle4Compromise,
                                    col.points = color4Spices ,
                                    col.labels = color4Spices 
) 

b2.gg.Smap <-  gg.compromise.graph.out$zeMap + createxyLabels.gen(
  x_axis = h_axis, y_axis = v_axis,
  lambda = S.eig , 
  tau    = round(100* S.eig / sum(S.eig)),
  axisName = "Dimension ")
#  
# 5.4 a bootstrap confidence interval plot 
# 5.3  create the ellipses
gg.boot.graph.out.elli <- MakeCIEllipses(
                              data = BootF[,c(h_axis,v_axis),],
                              names.of.factors = 
                                c(paste0('Factor ',h_axis),
                                  paste0('Factor ',v_axis)),
                              col = color4Spices,
)  
# Add ellipses to compromise graph
b3.gg.map.elli <- gg.compromise.graph.out$zeMap + 
  gg.boot.graph.out.elli +
  createxyLabels.gen(
    x_axis = h_axis, y_axis = v_axis,
    lambda = S.eig , 
    tau    = round(100* S.eig / sum(S.eig)) ,
    axisName = "Dimension ")
#

The compromise map

# Print the product graph
print(b2.gg.Smap)

The compromise map with confidence ellipsoids

# Print the product graph
print(b3.gg.map.elli)

Map of compromise with partial factor scores

# get the partial map
map4PFS <- createPartialFactorScoresMap(
  factorScores = resDistatis$res4Splus$F,      
  partialFactorScores = F_k,  
  axis1 = 1,
  axis2 = 2,
  colors4Items = as.vector(color4Spices), # default is NULL
  colors4Blocks = NULL,  #  c('blue','red')
  names4Partial = dimnames(F_k)[[3]], # Short names default is NULL
  alpha.lines  = .5,
  size.lines   = .75,
  type.lines = 1,
  arrow.length = 0,
  alpha.points = .7,
  shape.points = 23,
  size.points  = 1,
  alpha.labels = .7,
  font.labels = 'bold',
  family.labels = 'sans',
  size.labels  = 2)

d1.partialFS.map.byProducts <- gg.compromise.graph.out$zeMap + map4PFS$mapColByItems
d2.partialFS.map.byCategories  <- gg.compromise.graph.out$zeMap + map4PFS$mapColByBlocks

Map of the partial factor scores: Colored by products

print(d1.partialFS.map.byProducts )

Map of the partial factor scores: Colored by Countries

print(d2.partialFS.map.byCategories)

Save the graphics as a powerpoint

name4Graphs = 'SortingSpices.pptx'

The graphics are saved as a powerpoint with the following command r name4Graphs

toto <- PTCA4CATA::saveGraph2pptx(file2Save.pptx = name4Graphs, 
                 title = '5 Cultures and 16 Spices ', 
                 addGraphNames = TRUE)

Note that we could also have created a powerpoint with Rmarkdown by using the following options in the preamble:

output:
      powerpoint_presentation:
           slide_level: 4

instead of (for example):

output:
       rmarkdown::html_vignette:
          toc: true
          number_sections: true


HerveAbdi/R4SPISE2018 documentation built on Sept. 3, 2020, 6:40 a.m.