buildData: buildData builds the data included in this package

View source: R/buildAll.R

buildData    R Documentation

buildData builds the data included in this package

Description

This function generates the data set for this package. All parameters are optional. By default, the function downloads the accessions listed in 'inst/hsapiens_colData_transitions_v3.5.csv' for species "hsapiens", builds the DESeq2, t-SNE and rank normalisations, and writes the results to files in the current directory named with the prefix 'homosapienDEE2Data' and the suffix '.csv'.

Usage

buildData(
  species = "hsapiens",
  name_prefix = "homosapienDEE2Data",
  name_suffix = ".csv",
  build_raw = FALSE,
  build_srx_agg = FALSE,
  build_deseq2 = TRUE,
  build_tsne = TRUE,
  build_rank = TRUE,
  generate_qc_pass = TRUE,
  generate_qc_warn = TRUE,
  base = getwd(),
  quiet = TRUE,
  metadata = if (!(build_raw || build_srx_agg || build_deseq2 || build_tsne ||
    build_rank) || !(generate_qc_pass || generate_qc_warn)) {
    return(list())
  } else {
    getDEE2Metadata(species, quiet = quiet)
  },
  counts.cutoff = 10,
  accessions = as.list(unique(cols$SRR_accession)),
  in_data = if (!(build_raw || build_srx_agg || build_deseq2 || build_tsne ||
    build_rank) || !(generate_qc_pass || generate_qc_warn)) {
    return(list())
  } else {
    buildRaw(species = species, accessions = accessions, quiet = quiet,
      metadata = metadata)
  },
  dds_design = ~1,
  write_files = TRUE
)

Arguments

species

The species to fetch data for; default is "hsapiens".

name_prefix

The output file name prefix; default is "homosapienDEE2Data".

name_suffix

The output file name suffix; default is ".csv".

build_raw

Whether to build the raw normalisation; default is FALSE.

build_srx_agg

Whether to build the SRX aggregation normalisation; default is FALSE.

build_deseq2

Whether to build the DESeq2 normalisation; default is TRUE.

build_tsne

Whether to build the t-SNE normalisation; default is TRUE.

build_rank

Whether to build the rank normalisation; default is TRUE.

generate_qc_pass

Whether to generate output from the input data that passed quality control; default is TRUE.

generate_qc_warn

Whether to generate output from the combination of input data that passed quality control and input data that had warnings in quality control; default is TRUE.

base

The directory to output the file to; default is the current working directory.

quiet

Whether to suppress notification output where possible; default TRUE.

metadata

If you have already downloaded metadata for the species (for example with getDEE2::getDEE2Metadata()), you can pass it in here. Otherwise the metadata will be downloaded.

counts.cutoff

Cutoff value for minimum gene expression; default is 10.

accessions

Which sample IDs (which we refer to as accessions) to download from DEE2; default is derived from 'hsapiens_colData.csv' in this package. For subsets, see the 'SRR_accession' member of the internal 'cols' object.

in_data

If you have already downloaded the accession data from DEE2, you can pass it through here. Otherwise this data will be downloaded.

dds_design

The design formula used as part of DESeq2 normalisation. Default is '~ 1'. See the documentation for 'DESeq2::DESeqDataSetFromMatrix' for more details.

write_files

Whether to write the normalised data out to files; default is TRUE. If FALSE, the function does not write any files and only returns the normalised data.

Value

A named list of SummarizedExperiment objects. The exact set depends on the options you select when calling the function.
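The build options also decide whether anything is fetched at all. Going by the Usage above, the default values of 'metadata' and 'in_data' only call getDEE2Metadata() and buildRaw() when at least one normalisation (build_raw, build_srx_agg, build_deseq2, build_tsne, build_rank) and at least one quality-control group (generate_qc_pass, generate_qc_warn) are selected; otherwise those defaults evaluate return(list()). The sketch below restates that condition as a standalone predicate. The helper name needs_download() is hypothetical and not part of the package; it only mirrors the condition shown in Usage.

```r
# Hypothetical helper, not exported by homosapienDEE2CellScore.
# It mirrors the guard used in the default values of `metadata` and `in_data`:
# data is fetched only if some normalisation AND some QC group is requested.
needs_download <- function(build_raw = FALSE, build_srx_agg = FALSE,
                           build_deseq2 = TRUE, build_tsne = TRUE,
                           build_rank = TRUE, generate_qc_pass = TRUE,
                           generate_qc_warn = TRUE) {
  any_normalisation <- build_raw || build_srx_agg || build_deseq2 ||
    build_tsne || build_rank
  any_qc_group <- generate_qc_pass || generate_qc_warn
  # Equivalent (by De Morgan) to the negation of the guard in Usage:
  # !(any_normalisation) || !(any_qc_group)
  any_normalisation && any_qc_group
}

needs_download()  # TRUE with the defaults
needs_download(build_deseq2 = FALSE, build_tsne = FALSE, build_rank = FALSE)  # FALSE
needs_download(generate_qc_pass = FALSE, generate_qc_warn = FALSE)  # FALSE
```

In other words, switching off every normalisation, or both quality-control groups, should make the call skip the DEE2 downloads.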

See Also

downloadAllTheData

Examples

# To build the default, full dataset, and write it out to several csv files:
#homosapienDEE2CellScore::buildData()

# To build a restricted set of data, with previously downloaded metadata,
# only running deseq2 normalisation, to "data_PASS_deseq2.csv" and "data_WARN_deseq2.csv"
metadata <- getDEE2::getDEE2Metadata("hsapiens", quiet=TRUE)
homosapienDEE2CellScore::buildData(
  metadata=metadata, accessions=as.list(unique(cols$SRR_accession)[c(1,3)]),
  build_deseq2=TRUE, build_tsne=FALSE, build_rank=FALSE, name_prefix="data")

# Process a subset of the data, but do not write it out into files
processed_data <- homosapienDEE2CellScore::buildData(
  metadata=metadata, accessions=as.list(unique(cols$SRR_accession)[c(1,3)]),
  build_deseq2=TRUE, build_tsne=FALSE, write_files=FALSE)

# Compute a PCA of the deseq2-normalised data that passed quality control
pca_form <- prcomp(t(SummarizedExperiment::assay(processed_data$qc_pass_deseq2, "counts")))

flaviusb/homosapienDEE2CellScore documentation built on April 11, 2024, 1:47 p.m.