In tgac-vumc/ACE: Absolute Copy Number Estimation from Low-coverage Whole Genome Sequencing

postanalysisloop

R Documentation

Batch analysis of samples in a QDNAseq-object for which models have been chosen

Description

When models have been chosen for all (or just multiple) samples in a QDNAseq-object, this function can be used to perform a batch analysis on those samples. This encompasses printing segment data, printing copy number plots, and linking mutation data.

Usage

postanalysisloop(copyNumbersSegmented, modelsfile, variantdata, 
                 prefix="", postfix="", trncname=FALSE, inputdir=FALSE,
                 hetSNPs=FALSE, chrindex=1, posindex=2, freqindex,
                 altreadsindex, totalreadsindex, refreadsindex,
                 confidencelevel=FALSE, append=TRUE,
                 dontmatchnames=FALSE, printsegmentfiles=TRUE,
                 printnewplots=TRUE, imagetype='pdf',
                 onlyautosomes=TRUE, outputdir="./", log=FALSE,
                 segext='tsv', genderci)

Arguments

`copyNumbersSegmented`	QDNAseq-object with segmented data or file path of an rds-file containing a QDNAseq-object
`modelsfile`	Character string or data frame. When a character string, it specifies the file path of a tab-delimited text file containing the model variables of the samples. The file is expected to have a header row and at least two columns: the first specifying the sample names and the second specifying the cellularity. The optional third column is the ploidy of the samples; when omitted, ploidy is assumed to be 2. The optional fourth column is the standard of the samples; when omitted, it is calculated from the data in the object. The fitpicker.tsv file created by `runACE` can be used as modelsfile after the cellularity of the likely fit is specified in the second column.
`variantdata`	Character string. Specifies the directory containing variant data of samples. Optional. When argument inputdir is used, the function first checks whether this argument is specified; if not, it checks whether the directory inputdir/variantdata exists; if that does not exist either, it looks for the variant files in inputdir itself. When inputdir is not used and this argument is omitted, the function will not link variant data. All variant files must have the same file extension, which can be .csv, .tsv, .txt, or .xls.
`prefix`	Character string. Used when a uniform character string precedes the sample name in the file name. E.g. "mutations_sample1.csv" has prefix "mutations_". Default = ""
`postfix`	Character string. As `prefix`, but then after the sample name. E.g. "sample1_somatics.csv" has postfix "_somatics". Default = ""
`trncname`	Logical or character string. When TRUE, truncates the sample names of the QDNAseq-object starting from the first "_". Alternatively, specify a character string with a regular expression of your choice (`trncname` uses the `gsub` function). NOTE: use only when this will provide matches with the mutation files and the sample names in the modelsfile. Default = FALSE
`inputdir`	Character string. Specifies the directory that contains the files to be analyzed. Convenience argument that reduces the number of arguments required when all data are available in the same directory: the QDNAseq-object, a file named "models.tsv" with the model parameters, and the variant data (either in the inputdir itself or in a subdirectory "variantdata", as described for `variantdata`). Explicitly specified arguments (copyNumbersSegmented, modelsfile, variantdata) take priority; when any of them is missing, the function looks in inputdir. When multiple rds-files are present in the inputdir, it will try the first one. Note: the path specified has no consequences for the location of the output. Default = FALSE
`hetSNPs`	Logical. If TRUE, half of the germline copies are assumed to be variant. Default = FALSE
`chrindex`	Integer. Column index in input file specifying the chromosome associated with the genomic location. Default = 1
`posindex`	Integer. Column index in input file specifying the position on the chromosome associated with the genomic location. Default = 2
`freqindex`	Integer. Column index in input file specifying the frequency (as a percentage) of the variant
`altreadsindex`	Integer. Column index in input file specifying the number of variant-supporting reads
`totalreadsindex`	Integer. Column index in input file specifying the read depth at the genomic location of the variant
`refreadsindex`	Integer. Column index in input file specifying the number of reference-supporting reads
`confidencelevel`	Numeric or logical. If read depth information is available, calculate the upper and lower bounds of this confidence level for the frequency and the number of variant copies of each variant. Will be skipped if FALSE. Default = FALSE
`append`	Logical. When TRUE, appends the output columns to the columns of the original mutation input file and saves the result in a new file; the input file itself is not modified. When FALSE, the output file will only contain the columns "Chromosome", "Position", "Frequency", "Copynumbers", and "Mutant_copies" (plus the upper and lower bounds of the confidence intervals for the frequency and the variant copies, when applicable). Default = TRUE
`dontmatchnames`	Logical. When TRUE, the model variables are called by the index of the sample in the QDNAseq-object. This will only work if the order of samples in the object exactly matches the order of samples in the modelsfile. Use with caution! This is somewhat of an emergency option if for some reason the name matching is not working. I recommend trying to get the name matching to work. Default = FALSE
`printsegmentfiles`	Logical. When TRUE, prints a tab-delimited text file for each sample into a "segmentfiles" folder. Default = TRUE
`printnewplots`	Logical. When TRUE, prints plots into a "newplots" folder in the specified image type. Default = TRUE
`imagetype`	Character string specifying the image type graphics device. Default = "pdf"
`onlyautosomes`	Logical or integer. Specifies whether only autosomes are plotted (logical) or which autosomes are plotted (integer). For more documentation, see `singleplot`. Default = TRUE
`outputdir`	Character string. Save output into this custom directory. Default = "./"
`log`	Logical or integer. Use log conversion for creating segments output. Default = FALSE
`segext`	Character string specifying the extension for the segments output. Default = "tsv"
`genderci`	Integer. Column index in modelsfile or data frame specifying the gender of the corresponding sample. See note
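
As a rough illustration of the `modelsfile` layout described above, the following sketch builds a models table in base R and writes it as a tab-delimited file with a header. The sample names, values, and column headers are invented for illustration; only the column order (sample name, cellularity, ploidy, standard) comes from the argument description.

```r
# Hypothetical model parameters; names and values are made up for illustration.
models <- data.frame(
  sample      = c("sample1", "sample2"),
  cellularity = c(0.79, 0.38),
  ploidy      = c(2, 3),
  standard    = c(1, 1),
  stringsAsFactors = FALSE
)

# Write a tab-delimited file with a header row and no row names.
models_path <- file.path(tempdir(), "models.tsv")
write.table(models, models_path, sep = "\t", quote = FALSE, row.names = FALSE)

# Read the file back to confirm the layout survives the round trip.
models_check <- read.table(models_path, header = TRUE, sep = "\t",
                           stringsAsFactors = FALSE)
```

Either the data frame `models` or the file path `models_path` could then be supplied as `modelsfile`.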

Details

If your input is tailored for this function, you could run it without any arguments! Most arguments help with matching sample names in the QDNAseq-object, the modelsfile, and the names and columns of the files containing variant data. You can "trim" the name of the file with variant data using the prefix (everything before the name) and postfix (everything after the name, but before the file extension) arguments to match your sample names. trncname might help with trimming the sample names in the QDNAseq-object, but be sure they still match the sample names in the modelsfile (and the mutation data file names when applicable).
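
The name matching described above can be checked before running the batch. The sketch below is a base R approximation, not the code used inside the function: the helper `strip_affixes`, the file names, and the sample names are hypothetical, and the pattern `"_.*"` is an assumption about what truncating "starting from the first "_"" amounts to.

```r
# Hypothetical helper: remove the extension, then a fixed prefix and postfix.
strip_affixes <- function(filenames, prefix = "", postfix = "") {
  stems <- tools::file_path_sans_ext(basename(filenames))
  has_pre <- startsWith(stems, prefix)
  stems[has_pre] <- substring(stems[has_pre], nchar(prefix) + 1)
  has_post <- endsWith(stems, postfix)
  stems[has_post] <- substring(stems[has_post], 1,
                               nchar(stems[has_post]) - nchar(postfix))
  stems
}

# Invented variant file names and QDNAseq-object sample names.
variant_files <- c("mutations_sample1_somatics.csv",
                   "mutations_sample2_somatics.csv")
object_names  <- c("sample1_L001", "sample2_L001")

file_samples <- strip_affixes(variant_files,
                              prefix = "mutations_", postfix = "_somatics")

# Assumed equivalent of trncname = TRUE: drop everything from the first "_".
truncated_names <- gsub("_.*", "", object_names)

# TRUE when every truncated sample name has a matching variant file.
all_matched <- all(truncated_names %in% file_samples)
```

If `all_matched` is FALSE, adjusting `prefix`, `postfix`, or the `trncname` pattern in this sketch is a quick way to see which names fail to line up.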

Value

Prints the specified output to an indicated directory. Returns a list of copy number plots.

Note

The use of inputdir and outputdir should be fairly robust. However, using irregular file paths might cause problems. If you suspect problems with file paths, try setting the working directory to the intended inputdir.
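
One way to rule out path problems is to normalize the directories and check for the expected contents before calling the function. This is a minimal sketch in base R using temporary directories; the directory names are hypothetical, and the expected contents ("models.tsv" and a "variantdata" subdirectory) are taken from the argument descriptions above.

```r
# Hypothetical input directory with a "variantdata" subdirectory.
inputdir <- file.path(tempdir(), "ace_input")
dir.create(file.path(inputdir, "variantdata"),
           recursive = TRUE, showWarnings = FALSE)

# Resolve to an absolute path; errors early if the directory does not exist.
inputdir <- normalizePath(inputdir, winslash = "/", mustWork = TRUE)

# Create the output directory up front so its location is unambiguous.
outputdir <- file.path(tempdir(), "loop_output")
dir.create(outputdir, showWarnings = FALSE)

# Report which expected inputs are present (file.exists also accepts directories).
expected <- c(models   = file.path(inputdir, "models.tsv"),
              variants = file.path(inputdir, "variantdata"))
present <- file.exists(expected)
```

Here `present` flags the missing "models.tsv", which is the kind of mismatch worth resolving before starting a batch run.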

If you intend to plot or analyze variant data on sex chromosomes, make sure you specify the gender of each individual using the genderci option. The function will look for the gender in the indicated column number of the modelsfile (or data frame). Suggested indication within this column is "M" for male and "F" for female. When missing, the function defaults to "F".
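
A gender column can be added to the models table and its index looked up by name, so that `genderci` does not have to be counted by hand. The sketch below uses invented sample data and a hypothetical column name `gender`; the "M"/"F" coding follows the suggestion above.

```r
# Hypothetical models table with a gender column after the four model columns.
models <- data.frame(
  sample      = c("sample1", "sample2", "sample3"),
  cellularity = c(0.79, 0.38, 0.55),
  ploidy      = c(2, 3, 2),
  standard    = c(1, 1, 1),
  gender      = c("M", "F", "F"),
  stringsAsFactors = FALSE
)

# Column index to pass as genderci.
genderci <- match("gender", names(models))

# Flag any values other than the suggested "M" and "F" codes.
unexpected <- setdiff(unique(models[[genderci]]), c("M", "F"))
if (length(unexpected) > 0) {
  stop("Unexpected gender codes: ", paste(unexpected, collapse = ", "))
}
```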

Author(s)

Jos B. Poell

Examples

## see the vignette for examples
## Not run: 
  data("copyNumbersSegmented")
  postanalysisloop(copyNumbersSegmented, "models.tsv", "variantdata",
                   outputdir = "loop_output")
  
## End(Not run)
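
The column index arguments can be derived from the column names of a variant table rather than hard-coded. The following sketch uses an invented variant table (column names, positions, and read counts are all hypothetical) and computes the frequency as a percentage, as `freqindex` expects. The call to `postanalysisloop` is left commented out because it requires the ACE package and matching input files.

```r
# Invented variant table; column names and values are for illustration only.
variants <- data.frame(
  chr       = c("1", "7", "X"),
  pos       = c(1000000L, 2000000L, 3000000L),
  ref_reads = c(60L, 45L, 30L),
  alt_reads = c(40L, 15L, 30L),
  stringsAsFactors = FALSE
)

# Read depth and variant frequency as a percentage of total reads.
variants$depth    <- variants$ref_reads + variants$alt_reads
variants$freq_pct <- 100 * variants$alt_reads / variants$depth

# Look up the column indices by name.
idx <- match(c("chr", "pos", "freq_pct"), names(variants))
names(idx) <- c("chrindex", "posindex", "freqindex")

# Not run (requires the ACE package and matching input data):
# postanalysisloop(copyNumbersSegmented, "models.tsv", "variantdata",
#                  chrindex = idx[["chrindex"]], posindex = idx[["posindex"]],
#                  freqindex = idx[["freqindex"]], outputdir = "loop_output")
```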

tgac-vumc/ACE documentation built on Nov. 29, 2022, 12:15 a.m.