dssPrincomp: PCA on a distributed dataset

dssPrincomp R Documentation

PCA on a distributed dataset

Description

This function is similar to the R function princomp applied to the covariance matrix of the distributed dataset. As a side effect, it creates a scores dataframe on each node, which can be used by subsequent calls to 'biplot'.
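As a local, single-machine analogue of the idea above, the following base R sketch runs princomp on a covariance matrix alone and then computes the scores separately from the row-level data. It uses the built-in USArrests dataset as a stand-in and is only an illustration, not the package's internal implementation.

```r
# Stand-in data; in the distributed setting the rows would live on remote nodes.
x <- USArrests

# PCA from the covariance matrix only (princomp's 'covmat' argument).
# No row-level data is passed, so no scores are returned here.
pca <- princomp(covmat = cov(x))

# Scores are computed where the rows are: centre the columns,
# then project onto the loadings.
centered <- scale(x, center = colMeans(x), scale = FALSE)
scores <- centered %*% unclass(pca$loadings)

head(scores)
```

The scores obtained this way should match those of princomp(x) up to the sign of each component, since rescaling the covariance matrix does not change its eigenvectors.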

Usage

dssPrincomp(
  df,
  type = "combine",
  center = TRUE,
  scale = FALSE,
  scores.suffix = "_scores",
  async = TRUE,
  datasources = NULL
)

Arguments

df

a character, the name of the dataframe. The dataframe can contain character or factor columns, in which case only the numeric columns are considered.

type

a character, the type of analysis to carry out. If type is 'combine', global column means are calculated; if type is 'split', the column means are calculated separately for each node.

center

a logical, should the columns be centered? Default TRUE.

scale

a logical, should the columns be scaled? Default FALSE.

scores.suffix

a character. The name of the scores dataframe will be the concatenation of df and scores.suffix.

async

a logical, see datashield.aggregate.

datasources

a list of opal objects obtained after logging into the opal servers (see datashield.login).
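The difference between the two values of type can be illustrated locally with plain R. In this sketch, node1 and node2 are hypothetical dataframes standing in for the data held on two nodes, and pooling sums and row counts is an assumption made for illustration; it is not taken from the package source.

```r
# Hypothetical per-node data (not part of the package).
node1 <- data.frame(a = c(1, 2, 3), b = c(10, 20, 30))
node2 <- data.frame(a = c(7, 9),    b = c(50, 70))

# type = 'split': each node is centred on its own column means.
split_means <- lapply(list(node1 = node1, node2 = node2), colMeans)

# type = 'combine': a single set of column means over all rows.
# Assumption: only per-node sums and row counts need to be pooled.
sums <- colSums(node1) + colSums(node2)
n <- nrow(node1) + nrow(node2)
global_means <- sums / n

split_means
global_means
```

With 'split', the centring differs per node, so the resulting components describe each node separately; with 'combine', all nodes are centred on the same pooled means.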

Value

a list with one element for each node (or a single $global element if type = 'combine'). Each element contains a stripped-down princomp object in which the 'scores' element is replaced by the name of the scores dataframe on the remote nodes.
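A call might look like the sketch below. The wrapper run_pca and its arguments are hypothetical names introduced for illustration; opals is assumed to be the list returned by datashield.login, and df the name of a dataframe that already exists on the nodes. Only the arguments documented above are used.

```r
# Hypothetical helper, not part of dsSwissKnifeClient.
# 'opals': assumed to be the result of datashield.login().
# 'df': name of a dataframe already present on every node.
run_pca <- function(opals, df, suffix = "_scores") {
  res <- dsSwissKnifeClient::dssPrincomp(
    df,
    type = "combine",       # one pooled analysis, returned under $global
    center = TRUE,
    scale = FALSE,
    scores.suffix = suffix,
    datasources = opals
  )
  list(
    pca = res$global,                  # stripped-down princomp object
    scores_name = paste0(df, suffix)   # name of the scores dataframe on the nodes
  )
}
```

The scores themselves stay on the nodes; the returned scores_name only identifies the remote dataframe for later calls such as a biplot.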


sib-swiss/dsSwissKnifeClient documentation built on July 16, 2025, 6:25 p.m.