load_rcc: Produce a "nacho" object from RCC NanoString files
In NACHO: NanoString Quality Control Dashboard

load_rcc

R Documentation

Produce a "nacho" object from RCC NanoString files

Description

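This function reads RCC files from NanoString nCounter, matches them to a sample sheet, and preprocesses the data into a "nacho" object containing the counts and quality-control metrics.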

Usage

load_rcc(
  data_directory,
  ssheet_csv,
  id_colname = NULL,
  housekeeping_genes = NULL,
  housekeeping_predict = FALSE,
  housekeeping_norm = TRUE,
  normalisation_method = "GEO",
  n_comp = 10
)

Arguments

`data_directory`	[character] A character string of the directory where the data are stored.
`ssheet_csv`	[character] or [data.frame] Either a string with the name of the sample sheet CSV file or the sample sheet as a `data.frame`. It should contain a column whose values match the file names in `data_directory`.
`id_colname`	[character] Character string of the column in `ssheet_csv` that matches the file names in `data_directory`.
`housekeeping_genes`	[character] A vector of names of the miRNAs/mRNAs that should be used as housekeeping genes. Default is `NULL`.
`housekeeping_predict`	[logical] Boolean to indicate whether the housekeeping genes should be predicted (`TRUE`) or not (`FALSE`). Default is `FALSE`.
`housekeeping_norm`	[logical] Boolean to indicate whether the housekeeping normalisation should be performed. Default is `TRUE`.
`normalisation_method`	[character] Either `"GEO"` or `"GLM"`. Character string to indicate normalisation using the geometric mean (`"GEO"`) or a generalized linear model (`"GLM"`). Default is `"GEO"`.
`n_comp`	[numeric] Number of principal components to compute. Cannot exceed n - 1, where n is the number of samples. Default is `10`.
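
The limit on `n_comp` can be checked before calling `load_rcc()`. The following is a minimal sketch only: `cap_n_comp()` is a hypothetical helper that is not part of NACHO, the file names are made up, and the sketch assumes the number of samples equals the number of rows in the sample sheet.

```r
# Hypothetical helper (not part of NACHO): cap the requested number of
# principal components at n - 1, where n is the number of samples.
cap_n_comp <- function(n_samples, n_comp = 10) {
  stopifnot(is.numeric(n_samples), length(n_samples) == 1, n_samples >= 2)
  min(n_comp, n_samples - 1)
}

# Toy sample sheet with six samples (file names are invented).
ssheet <- data.frame(IDFILE = sprintf("sample_%02d.RCC", 1:6))

n_comp_small <- cap_n_comp(nrow(ssheet)) # six samples: at most five components
n_comp_large <- cap_n_comp(50)           # fifty samples: the default of 10 is kept
```

The capped value could then be passed as `n_comp` in the call to `load_rcc()`.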

Value

[list] A list object of class "nacho":

access: [character] Value passed to load_rcc() in id_colname.
housekeeping_genes: [character] Value passed to load_rcc().
housekeeping_predict: [logical] Value passed to load_rcc().
housekeeping_norm: [logical] Value passed to load_rcc().
normalisation_method: [character] Value passed to load_rcc().
remove_outliers: [logical] FALSE.
n_comp: [numeric] Value passed to load_rcc().
data_directory: [character] Value passed to load_rcc().
pc_sum: [data.frame] A data.frame with n_comp rows and four columns: "Standard deviation", "Proportion of Variance", "Cumulative Proportion" and "PC".
nacho: [data.frame] A data.frame with all columns from the sample sheet ssheet_csv and all computed columns, i.e., quality-control metrics and counts, with one sample per row.
outliers_thresholds: [list] A list of the (default) quality-control thresholds used.
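
Because the returned value is a list, its elements can be read by name. The sketch below uses a mock list that only mimics part of the structure described above; every value in it is invented for illustration and does not come from a real `load_rcc()` call, and the class "nacho" is deliberately not set on the mock.

```r
# Mock object mimicking part of the documented structure (invented values).
mock_nacho <- list(
  access = "IDFILE",
  n_comp = 2,
  pc_sum = data.frame(
    "Standard deviation" = c(3.1, 1.2),
    "Proportion of Variance" = c(0.87, 0.13),
    "Cumulative Proportion" = c(0.87, 1.00),
    PC = c("PC1", "PC2"),
    check.names = FALSE # keep the column names with spaces as documented
  ),
  nacho = data.frame(IDFILE = c("a.RCC", "b.RCC"), stringsAsFactors = FALSE)
)

# The "access" element names the identifier column of the "nacho" data.frame.
id_column <- mock_nacho[["access"]]
sample_ids <- mock_nacho[["nacho"]][[id_column]]

# Smallest number of components reaching at least 80% cumulative variance.
cumulative <- mock_nacho[["pc_sum"]][["Cumulative Proportion"]]
n_needed <- which(cumulative >= 0.8)[1]
```

With a real object, the same pattern applies, for example `nacho[["pc_sum"]]` or `nacho[["outliers_thresholds"]]`.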

Examples


if (interactive()) {
  library(GEOquery)
  library(NACHO)

  # Import data from GEO
  gse <- GEOquery::getGEO(GEO = "GSE74821")
  targets <- Biobase::pData(Biobase::phenoData(gse[[1]]))
  GEOquery::getGEOSuppFiles(GEO = "GSE74821", baseDir = tempdir())
  utils::untar(
    tarfile = file.path(tempdir(), "GSE74821", "GSE74821_RAW.tar"),
    exdir = file.path(tempdir(), "GSE74821")
  )
  targets$IDFILE <- list.files(
    path = file.path(tempdir(), "GSE74821"),
    pattern = "\\.RCC\\.gz$" # escape the dots so they match literally
  )
  targets[] <- lapply(X = targets, FUN = iconv, from = "latin1", to = "ASCII")
  utils::write.csv(
    x = targets,
    file = file.path(tempdir(), "GSE74821", "Samplesheet.csv")
  )

  # Read RCC files and format
  nacho <- load_rcc(
    data_directory = file.path(tempdir(), "GSE74821"),
    ssheet_csv = file.path(tempdir(), "GSE74821", "Samplesheet.csv"),
    id_colname = "IDFILE"
  )
}

NACHO documentation built on May 29, 2024, 2:05 a.m.