read_results: Read CNV calling or segmentation results

View source: R/read_results.R

Description

read_results takes the results of a CNV calling pipeline and returns them in a standardized object.

Usage

read_results(
  DT_path,
  res_type,
  DT_type,
  pref = NA,
  suff = NA,
  sample_list,
  markers,
  chr_col,
  start_col,
  end_col,
  CN_col,
  samp_ID_col,
  end_vcf = "END",
  CN_vcf = "CN",
  do_merge = TRUE,
  merge_prop = 0.5,
  method_ID
)

Arguments

DT_path,

path to the directory containing the individual files if res_type is set to "directory", or to the single file if res_type is set to "file".

res_type,

either "directory" or "file"; indicates whether the function should expect one file per sample ("directory") or a single file for all samples ("file").

DT_type,

either "VCF" or "TSV/CSV"; indicates the file type.

pref, suff,

optional prefix and suffix (e.g. ".txt") of the file names, used when res_type is set to "directory". Must be set to NA if not needed.

sample_list,

minimal cohort metadata, a data.table produced by the function read_metadt.

markers,

a data.table containing the marker list, i.e. the output of read_finalreport_snps with DT_type set to "markers", or of read_NGS_intervals.

chr_col,

name of the column containing the chromosome information in the input data.

start_col,

name of the column containing the start information in the input data.

end_col,

name of the column containing the end information in the input data.

CN_col,

name of the column containing the Copy Number information in the input data.

samp_ID_col,

name of the column containing the sample ID information in the input file, required if res_type is set to "file".

end_vcf,

name of the field containing the segment end information in the VCF file(s), passed to the function read_vcf.

CN_vcf,

name of the field containing the segment copy number information in the VCF file(s), passed to the function read_vcf.

do_merge,

logical, indicates whether the function merge_calls should be automatically called for each sample (strongly suggested).

merge_prop,

minimum reciprocal overlap proportion required for calls to be merged.

method_ID,

character identifying the method (algorithm/pipeline); a one-letter code is strongly encouraged (e.g. "P" for PennCNV and "M" for GATK ModSeg). Numeric values are converted to character.
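
As an illustration of the merge_prop argument, here is a minimal sketch of what "reciprocal overlap" conventionally means for two segments: the shared region must cover at least the given proportion of both segments. The helper reciprocal_overlap is hypothetical and not part of CNVgears. This page does not describe how merge_calls applies merge_prop internally, so read it as a sketch of the concept only.

```r
# Hypothetical helper (not part of CNVgears): TRUE when two segments overlap
# by at least `prop` of the length of BOTH segments (inclusive coordinates).
reciprocal_overlap <- function(start1, end1, start2, end2, prop = 0.5) {
  overlap <- min(end1, end2) - max(start1, start2) + 1
  if (overlap <= 0) return(FALSE)  # segments do not overlap at all
  len1 <- end1 - start1 + 1
  len2 <- end2 - start2 + 1
  overlap / len1 >= prop && overlap / len2 >= prop
}

# 50 shared positions out of 100 in each segment
r1 <- reciprocal_overlap(100, 199, 150, 249)
# 10 shared positions: 10% of the first segment, 5% of the second
r2 <- reciprocal_overlap(100, 199, 190, 389)
```

With the default proportion of 0.5, the first pair meets the threshold and the second does not.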

Details

This function aims to convert the results of a variety of CNV calling/segmentation pipelines and algorithms into a standardized format, so that they integrate easily with the other functions in this package. Currently two main file types ("VCF" and "TSV/CSV") and two main file-organization structures (a single file for all samples, or one file per sample in a directory) are considered, for a total of four generic situations, one for each combination of file type and organization.
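
The four situations can be enumerated by crossing the two DT_type values with the two res_type values. The sketch below also shows how per-sample file paths might be assembled in "directory" mode. The helper sample_paths is hypothetical, and the naming scheme (pref, then sample ID, then suff) is an assumption that this page does not confirm.

```r
# The two file types and the two file organizations, crossed
situations <- expand.grid(
  DT_type  = c("VCF", "TSV/CSV"),
  res_type = c("file", "directory"),
  stringsAsFactors = FALSE
)
print(situations)

# Hypothetical helper (not part of CNVgears): build the per-sample paths
# expected in "directory" mode, ASSUMING files are named pref + sample ID + suff.
sample_paths <- function(DT_path, sample_IDs, pref = NA, suff = NA) {
  pref <- if (is.na(pref)) "" else pref  # NA means "no prefix"
  suff <- if (is.na(suff)) "" else suff  # NA means "no suffix"
  file.path(DT_path, paste0(pref, sample_IDs, suff))
}

# Made-up directory name and sample IDs, for illustration only
paths <- sample_paths("results", c("S1", "S2"), suff = ".txt")
```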

If multiple files, each containing results for multiple samples, are present (e.g. the results of PennCNV joint calling on trios), it is currently recommended that the user concatenate those individual files into a single one before loading them with read_results. Note that any lines occurring before the column header are automatically skipped by fread.
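
The concatenation step recommended above could be done as in the following sketch. It assumes tab-separated files that all share the same columns; the file names and contents are made up for the demonstration. Base R is used to keep the sketch self-contained, and the same result could be obtained with data.table's fread and rbindlist.

```r
# Create two small tab-separated result files in a temporary directory
# (made-up content, standing in for per-family result files).
tmp_dir <- file.path(tempdir(), "cnv_results_demo")
dir.create(tmp_dir, showWarnings = FALSE)
writeLines(c("Sample_ID\tchr\tposStart\tposEnd\tCN",
             "S1\t14\t1000\t5000\t1"),
           file.path(tmp_dir, "trio_A.txt"))
writeLines(c("Sample_ID\tchr\tposStart\tposEnd\tCN",
             "S2\t22\t2000\t9000\t3"),
           file.path(tmp_dir, "trio_B.txt"))

# Read every file and stack the rows into one table
files <- list.files(tmp_dir, pattern = "^trio_.*\\.txt$", full.names = TRUE)
tables <- lapply(files, read.delim, stringsAsFactors = FALSE)
combined <- do.call(rbind, tables)

# Write the single file that would then be passed as DT_path
# together with res_type = "file"
combined_path <- file.path(tempdir(), "all_samples.txt")
write.table(combined, combined_path, sep = "\t", quote = FALSE, row.names = FALSE)
```

The sample ID column must be present in every input file, since samp_ID_col is required when res_type is set to "file".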

Value

a CNVresults object.

Examples

# Load the PennCNV calls for all samples from a single file shipped with the package
DT <- read_results(
  DT_path = system.file("extdata", "chrs_14_22_cnvs_penn.txt", package = "CNVgears"),
  res_type = "file", DT_type = "TSV/CSV", method_ID = "P",
  chr_col = "chr", start_col = "posStart", end_col = "posEnd",
  CN_col = "CN", samp_ID_col = "Sample_ID",
  sample_list = cohort_examples, markers = markers_examples
)

SinomeM/CNVgears documentation built on Nov. 21, 2021, 5:34 a.m.