A PCA can be performed on the exposures of the exposome dataset. To do so, there's the ds.exposome_pca. The exposures should be standardized in order to perform the PCA properly, to do so, there's the arguments standar and method, which standardize the Exposome Set before performing the PCA following the specified method. The available methods are normal (default method), which scales the exposures using the mean as the center and the standard variation as dispersion; the robust method, which uses the median and median absolute deviation respectively; and, interquartile range, which uses the median as the center and the coeficient between the interquartile range of the exposure and the normal range between the percentile 75 and 25 as variance. It is important noting that this function is sensitive to be disclosive, specially for very rectangular data frames (similar number of variables as individuals). To illustrate this problem, let's try to perform a PCA on the whole exposures test data.

ds.exposome_pca("exposome_object", datasources = conns)

If that is the case, one option is to reduce the families of exposures of the Exposome Set. The ds.exposome_pca function has the argument fam to select the families to subset the Exposome Set to perform the PCA.

ds.exposome_pca("exposome_object", fam = c("Metals", "Noise"))

The PCA function saves the results on the study server to prevent any dislosures, the default variable they take is called "ds.exposome_pca.Results", which has to be passed to the visualization function. To visualize the results of the PCA there is the function ds.exposome_pca_plot, this function relies on the visualization methods already implemented on rexposome for the PCA analysis, it does it however on a non-disclosive way, by passing the scatter plot points through an anonimization process, hence the arguments k, method and noise. The visualization is controlled with the set argument, which takes "all" (mosaic of plots of the PCA), "exposures" (plot of the exposures space on the first two principal components, color coded by family), "samples" (plot of the individuals space on the first two principal components, this plot can take the phenotype argument to color code the individuals by phenotypes), "variance" and "variance_explained", the two variance plots are quite self explanatory, the color code on the "variance" highlights the first two principal components as they are the ones shown on the other drawings.

ds.exposome_pca_plot("ds.exposome_pca.Results", set = "all", method = 1, k = 3, noise = 5)
ds.exposome_pca_plot("ds.exposome_pca.Results", set = "samples", phenotype = "sex", method = 1, k = 3, noise = 5)
ds.exposome_pca_plot("ds.exposome_pca.Results", set = "exposures", method = 1, k = 3, noise = 5)
ds.exposome_pca_plot("ds.exposome_pca.Results", set = "variance", method = 1, k = 3, noise = 5)
ds.exposome_pca_plot("ds.exposome_pca.Results", set = "variance_explained", method = 1, k = 3, noise = 5)

Furthermore, the ds.exposome_pca_plot function can plot the correlations betweeen the principal components and the exposures and the association of the phenotypes with the principal components. This two visualizations can be obtained by setting the set argument to "exposures_correlation" and "phenotypes_correlation" respectively.

ds.exposome_pca_plot("ds.exposome_pca.Results", set = "exposures_correlation")
ds.exposome_pca_plot("ds.exposome_pca.Results", set = "phenotypes_correlation")

By default, the ds.exposome_pca function only uses the numeric exposures, as a principal component analysis is by definition an analysis applied to numeric variables. If our exposures data contains non-numeric variable (categorical) we may be interested on including them on the analysis. A principal components method that contemplates mixed data is the factor analysis of mixed data (FAMD). To use this method instead of a traditional PCA, there is an argument on the ds.exposome_pca function called pca, set it to FALSE to perform a FAMD.

ds.exposome_pca("exposome_object", fam = c("Metals", "Noise"), pca = FALSE)

The same functions used to plot the regular PCA can be used to plot the results of the FAMD.



isglobal-brge/dsExposomeClient documentation built on March 5, 2024, 12:26 p.m.