SaveFullinfRepDat: Save the datasets that contain data across all inferential...

View source: R/utility_functions.R

SaveFullinfRepDat    R Documentation

Save the datasets that contain data across all inferential replicates and samples for a particular subset of genes

Description

Save the datasets that contain data across all inferential replicates and samples for a particular subset of genes

Usage

SaveFullinfRepDat(
  SalmonFilesDir,
  QuantSalmon,
  abDatasetsFiltered,
  save_dir,
  GibbsSamps,
  filteredgenenames,
  cntGene,
  key,
  nparts,
  curr_part_num
)

Arguments

SalmonFilesDir

is the directory the Salmon quantification results are saved in

QuantSalmon

is the Salmon quantification object output using tximport (see file (1)DataProcessing.R in the package's SampleCode folder for example code)

abDatasetsFiltered

is a list of data.frames that contain filtered TPM measurements for the genes/transcripts that pass filtering. It is output by DRIMSeqFilter

save_dir

is the outer directory to save the full inferential replicate datasets in. Datasets can get quite large with a large number of samples or a small number of parts, so choose a directory with plenty of free space. The specified directory should be the same in SaveFullinfRepDat, SaveWithinSubjCovMatrices, and SaveGeneLevelFiles.

GibbsSamps

is TRUE if the inferential replicates are Gibbs samples and FALSE if the replicates are bootstrap samples

filteredgenenames

is a character vector of all gene names that pass filtering

cntGene

is the data.frame of counts and lengths for each sample saved by sumToGene

key

is a data.frame with columns "Sample" (corresponding to the unique biological identifier for the analysis), "Condition" (giving the condition/treatment effect variables for the data), and "Identifier", whose entries should be "Sample1", "Sample2", ... up to the number of rows of key. This "Identifier" needs to be created like this even if the observations don't correspond to unique biological samples.

nparts

is the total number of parts to split the filteredgenenames list into when saving the datasets. See details.

curr_part_num

is the current part number that is being run. The gene list specified in filteredgenenames is split into nparts equally sized chunks. See the example in (3)SaveNecessaryDatasetsForCompDTUReg.R within the package's SampleCode folder.
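As an illustration of the key argument described above, the following is a minimal sketch of building such a data.frame. The sample names and conditions are hypothetical, and it assumes the "Identifier" column simply holds the strings "Sample1", "Sample2", ... in row order.

```r
# Hypothetical subjects and conditions, for illustration only
key <- data.frame(
  Sample = c("SubjA", "SubjB", "SubjC", "SubjD"),
  Condition = c("Control", "Control", "Treated", "Treated"),
  stringsAsFactors = FALSE
)

# One identifier per row of key: "Sample1", "Sample2", ...
key$Identifier <- paste0("Sample", seq_len(nrow(key)))
```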

Details

This function saves the necessary inferential replicate datasets. These files can get quite large with a large number of genes and/or a large number of biological samples or inferential replicates. The parameter nparts controls how many chunks the full data is split into, by splitting the gene name list filteredgenenames into nparts chunks. Increasing nparts can help mitigate issues with the resulting files becoming too large. For example, for a ten-sample analysis with 100 bootstrap samples, we set nparts to 10, which results in files that are around 150 MB each. The files saved by this function will be used by SaveWithinSubjCovMatrices and SaveGeneLevelFiles. See the file (3)SaveNecessaryDatasetsForCompDTUReg.R in the package's SampleCode folder for example code.
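To make the role of nparts and curr_part_num concrete, here is a rough sketch of one way to split a gene name vector into nparts contiguous, roughly equal chunks. The gene names are made up, and the package's internal splitting rule may differ from this one; it is only meant to show the idea.

```r
# Hypothetical gene names, for illustration only
filteredgenenames <- paste0("gene", 1:25)
nparts <- 10

# Assign each gene to one of nparts contiguous chunks of near-equal size
# (an assumed splitting rule, not necessarily the one used by the package)
part_id <- ceiling(seq_along(filteredgenenames) * nparts / length(filteredgenenames))
gene_parts <- split(filteredgenenames, part_id)

# Genes that would be handled when running a given part
curr_part_num <- 3
genes_in_part <- gene_parts[[curr_part_num]]
```

With more parts, each chunk holds fewer genes, which is why increasing nparts reduces the size of each saved file.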

Value

The function saves three separate .RData files that contain all bootstrap or Gibbs replicates for all samples for the genes in the current part of the data (specified by curr_part_num). Specifically, the abDataset and cntDataset files containing all bootstrap/Gibbs samples for the current part are saved in the subdirectories "infRepsabDatasets/" and "infRepscntDatasets/", respectively. See the documentation for the function sumToGene for more information on the abDataset and cntDataset files. Additionally, a data.frame that contains TPM and count information for all genes in the current part for all samples is saved in the subdirectory "infRepsFullinfRepDat/".
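Because each call handles a single part, the function is typically run once for every value of curr_part_num from 1 to nparts. The sketch below shows one way to drive that loop. The wrapper save_all_parts and its save_fun argument are hypothetical helpers introduced here so the loop can be tried with a stub; they are not part of the package.

```r
# Sketch: call the saving function once per part.
# `save_fun` is a hypothetical hook; its default is evaluated lazily, so the
# CompDTUReg package is only needed when no stub is supplied.
save_all_parts <- function(nparts, ..., save_fun = CompDTUReg::SaveFullinfRepDat) {
  for (curr_part_num in seq_len(nparts)) {
    save_fun(..., nparts = nparts, curr_part_num = curr_part_num)
  }
  invisible(nparts)
}

# Dry run with a stub that only records which parts were requested
parts_seen <- integer(0)
stub <- function(..., nparts, curr_part_num) {
  parts_seen <<- c(parts_seen, curr_part_num)
}
save_all_parts(nparts = 3, save_dir = tempdir(), save_fun = stub)

# Real use (requires CompDTUReg and the objects created in the earlier steps):
# save_all_parts(nparts = 10, SalmonFilesDir = SalmonFilesDir,
#                QuantSalmon = QuantSalmon,
#                abDatasetsFiltered = abDatasetsFiltered, save_dir = save_dir,
#                GibbsSamps = FALSE, filteredgenenames = filteredgenenames,
#                cntGene = cntGene, key = key)
```

On a cluster, the same effect can be had by launching one job per part number instead of looping, since the parts are saved to separate files.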


skvanburen/CompDTUReg documentation built on Jan. 23, 2025, 9:01 a.m.