read_results
Takes the results of a CNV calling pipeline and returns them in a
standardized object.
read_results(
DT_path,
res_type,
DT_type,
pref = NA,
suff = NA,
sample_list,
markers,
chr_col,
start_col,
end_col,
CN_col,
samp_ID_col,
end_vcf = "END",
CN_vcf = "CN",
do_merge = TRUE,
merge_prop = 0.5,
method_ID
)
DT_path: path to the directory containing the individual files, if
res_type: can be either "directory" or "file"; indicates whether the function must expect a single file for all samples or one file per sample.
DT_type: can be either "VCF" or "TSV/CSV"; indicates the file type.
pref, suff: optional prefix and suffix (e.g. ".txt") of the files to be used when
sample_list: minimal cohort metadata, a
markers: a
chr_col: name of the column containing the chromosome information in the input data.
start_col: name of the column containing the start information in the input data.
end_col: name of the column containing the end information in the input data.
CN_col: name of the column containing the copy number information in the input data.
samp_ID_col: name of the column containing the sample ID information in the input file, required if
end_vcf: name of the field containing the segment end information in the VCF file(s), passed to the function
CN_vcf: name of the field containing the segment copy number information in the VCF file(s), passed to the function
do_merge: logical, indicates whether the function
merge_prop: minimum reciprocal overlap proportion required in order to merge.
method_ID: character identifying the method (algorithm/pipeline); a one-letter code is strongly encouraged (e.g. "P" for PennCNV and "M" for GATK ModSeg). Numeric values are converted to character.
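To make the merge_prop threshold more concrete, the following base R sketch computes a reciprocal overlap between two segments under one common definition: the shared length divided by each segment's length, taking the smaller of the two fractions. The helper `reciprocal_overlap` is hypothetical and is not part of the package; whether read_results computes the quantity in exactly this way is an assumption.

```r
# Hypothetical helper, NOT part of CNVgears: reciprocal overlap of two
# segments on the same chromosome, assuming inclusive start/end coordinates.
reciprocal_overlap <- function(start1, end1, start2, end2) {
  # Length of the shared region (0 if the segments do not overlap)
  shared <- max(0, min(end1, end2) - max(start1, start2) + 1)
  len1 <- end1 - start1 + 1
  len2 <- end2 - start2 + 1
  # The smaller of the two overlap fractions
  min(shared / len1, shared / len2)
}

merge_prop <- 0.5  # default value shown in the usage above
ro <- reciprocal_overlap(100, 199, 150, 249)
# Under this definition the pair passes the threshold when ro >= merge_prop
passes <- ro >= merge_prop
```

With a threshold of 0.5, two segments qualify only if the shared region covers at least half of each of them.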
This function converts the results of a variety of CNV calling/segmentation pipelines and algorithms into a standardized format that integrates easily with the other functions in this package. Currently two main file types and two main file-organization structures are considered, for a total of four generic situations:
VCF files, one per sample (e.g. the results of the GATK gCNV pipeline);
VCF file, all samples of a cohort in the same file (not yet fully implemented);
TSV/CSV files, one file per sample (e.g. the results of the GATK ModSeg pipeline, or the results of running PennCNV manually);
TSV/CSV file, all samples of a cohort in the same file (e.g. the results of EnsembleCNV).
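For the one-file-per-sample situations, it can help to picture how a per-sample file name might be assembled from the directory, the sample ID, and the optional prefix and suffix. The sketch below is purely illustrative: `build_sample_paths` is a hypothetical helper, and the naming scheme `<pref><sample_ID><suff>` is an assumption rather than something stated on this page.

```r
# Hypothetical helper, NOT part of CNVgears.
# ASSUMPTION: each per-sample file is named <pref><sample_ID><suff>
# and lives directly inside the directory given as DT_path.
build_sample_paths <- function(DT_path, sample_IDs, pref = NA, suff = NA) {
  # Treat NA (the default in the usage above) as "no prefix/suffix"
  pref <- if (is.na(pref)) "" else pref
  suff <- if (is.na(suff)) "" else suff
  file.path(DT_path, paste0(pref, sample_IDs, suff))
}

# "results" and the sample IDs are made-up values for illustration
paths <- build_sample_paths("results", c("s1", "s2"), suff = ".txt")
```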
If multiple files each contain results for multiple samples (e.g. the
results of PennCNV joint calling on trios), it is currently recommended that
the user concatenate those individual files into a single one before loading
them with read_results.
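A minimal base R sketch of that concatenation step is given below. The file names, column names, and values are made up for illustration, and the files are written to a temporary directory so that the example is self-contained; it assumes all files share the same columns.

```r
# Illustrative only: file names, columns and values are hypothetical.
tmp_dir <- file.path(tempdir(), "cnv_calls")
dir.create(tmp_dir, showWarnings = FALSE)

# Two small multi-sample result files, standing in for real pipeline output
trio1 <- data.frame(Sample_ID = c("s1", "s2"), chr = c("14", "22"),
                    posStart = c(100L, 500L), posEnd = c(200L, 900L),
                    CN = c(1L, 3L))
trio2 <- data.frame(Sample_ID = "s4", chr = "14",
                    posStart = 300L, posEnd = 450L, CN = 1L)
write.table(trio1, file.path(tmp_dir, "trio1.txt"),
            sep = "\t", quote = FALSE, row.names = FALSE)
write.table(trio2, file.path(tmp_dir, "trio2.txt"),
            sep = "\t", quote = FALSE, row.names = FALSE)

# Read every per-trio file and stack the rows into one table
files <- list.files(tmp_dir, pattern = "^trio.*\\.txt$", full.names = TRUE)
calls <- do.call(rbind, lapply(files, read.delim, stringsAsFactors = FALSE))

# Write a single tab-separated file containing all samples
combined_path <- file.path(tmp_dir, "all_calls.tsv")
write.table(calls, combined_path, sep = "\t", quote = FALSE, row.names = FALSE)
```

The combined file could then be passed as DT_path with res_type = "file" and DT_type = "TSV/CSV".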
Note that any lines occurring before the column header are automatically
skipped by fread.
A CNVresults object.
DT <- read_results(DT_path = system.file("extdata", "chrs_14_22_cnvs_penn.txt",
    package = "CNVgears"), res_type = "file", DT_type = "TSV/CSV", chr_col = "chr",
    start_col = "posStart", end_col = "posEnd", CN_col = "CN", samp_ID_col = "Sample_ID",
    sample_list = cohort_examples, markers = markers_examples, method_ID = "P")