View source: R/utility_functions.R
SaveFullinfRepDat | R Documentation |
Save the datasets that contain data across all inferential replicates and samples for a particular subset of genes
SaveFullinfRepDat(
SalmonFilesDir,
QuantSalmon,
abDatasetsFiltered,
save_dir,
GibbsSamps,
filteredgenenames,
cntGene,
key,
nparts,
curr_part_num
)
SalmonFilesDir |
is the directory the Salmon quantification results are saved in |
QuantSalmon |
is the Salmon quantification object output using tximport (see file (1)DataProcessing.R in the package's SampleCode folder for example code) |
abDatasetsFiltered |
is a list of data frames containing filtered TPM measurements for genes/transcripts that pass filtering. Is output by |
save_dir |
is the outer directory in which to save the full inferential replicate datasets. Datasets can get quite large with a large number of samples or a small number of parts, so choose a directory with plenty of free space.
Specified directory should be the same in |
GibbsSamps |
is TRUE if the inferential replicates are Gibbs samples and FALSE if the replicates are bootstrap samples |
filteredgenenames |
is a character vector of all gene names that pass filtering |
cntGene |
is the data.frame of counts and lengths for each sample saved by |
key |
is a data.frame with columns "Sample" (corresponding to the unique biological identifier for the analysis), "Condition" (giving the condition/treatment effect variables for the data), and "Identifier", which should be named "Sample1", "Sample2", ... up to the number of rows of key. This "Identifier" needs to be created like this even if the observations don't correspond to unique biological samples. |
nparts |
is the total number of parts to split the |
curr_part_num |
is the current part number that is being run. The genelist specified in |
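As a minimal sketch of the layout the key argument expects, the following builds a small data.frame with the three required columns. The sample names and conditions are hypothetical placeholders and are not taken from the package.

```r
# Hypothetical sample names and conditions, for illustration only
key <- data.frame(
  Sample = c("SubjA", "SubjB", "SubjC", "SubjD"),
  Condition = factor(c("Control", "Control", "Treated", "Treated")),
  stringsAsFactors = FALSE
)

# "Identifier" must be "Sample1", "Sample2", ... up to the number of rows of key
key$Identifier <- paste0("Sample", seq_len(nrow(key)))

key
```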
This function is used to save the necessary inferential replicate datasets. These files can get quite large with a large number of genes and/or a large number of biological samples or inferential replicates. The parameter nparts controls how many chunks the full data is split into, based on splitting the gene name list filteredgenenames into nparts chunks. Increasing nparts can help mitigate issues with the resulting files becoming too large.
For example, for a ten-sample analysis with 100 bootstrap samples, we set nparts to 10, which results in files of around 150 MB each. The files saved by this function will be used by SaveWithinSubjCovMatrices and SaveGeneLevelFiles.
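To illustrate the idea of splitting filteredgenenames into nparts chunks, here is a rough sketch in base R. The contiguous, roughly equal-sized split shown here is an assumption made for illustration; the function's internal splitting may differ. The gene names are hypothetical.

```r
# Hypothetical gene names; in practice this would be filteredgenenames
filteredgenenames <- paste0("gene", 1:25)
nparts <- 10

# Assign each gene to one of nparts contiguous, roughly equal-sized chunks
# (an assumed scheme for illustration, not necessarily the package's own)
part_id <- ceiling(seq_along(filteredgenenames) * nparts / length(filteredgenenames))
gene_parts <- split(filteredgenenames, part_id)

# Genes that would be processed for a given part number
curr_part_num <- 3
gene_parts[[curr_part_num]]
```

With a fixed number of genes, a larger nparts means fewer genes per chunk and therefore smaller files per part.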
See the file (3)SaveNecessaryDatasetsForCompDTUReg.R in the package's SampleCode folder for example code.
The function will save three separate .RData files that contain all bootstrap or Gibbs replicates for all samples for the genes in the current part of the data (specified by curr_part_num). Specifically, the abDataset and cntDataset files containing all bootstrap/Gibbs samples for the current part will be saved in the subdirectories "infRepsabDatasets/" and "infRepscntDatasets/", respectively. See the documentation for the function sumToGene for more information on the abDataset and cntDataset files. Additionally, a data.frame that contains TPM and count information for all genes in the current part for all samples is saved in the subdirectory "infRepsFullinfRepDat/".
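The following sketch shows one way to build the paths of the three output subdirectories and check whether they are present after a run. The save_dir location is a hypothetical placeholder; only the subdirectory names come from the description above.

```r
# Hypothetical output location; a temporary directory is used for illustration
save_dir <- file.path(tempdir(), "CompDTURegOutput")

# Subdirectory names as given in the description above
sub_dirs <- c("infRepsabDatasets", "infRepscntDatasets", "infRepsFullinfRepDat")
out_dirs <- file.path(save_dir, sub_dirs)

# Report which subdirectories exist; all FALSE until the function has been run
# with this save_dir
setNames(dir.exists(out_dirs), sub_dirs)
```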