inter_comp: Inter results CNV calls comparison

Description

inter_comp compares results from different methods to find the common calls, or finds the calls unique to one group in a case/control scenario.

Usage

inter_comp(in.a, in.b, markers, threshold = 0.5, met.a = "methodA",
  met.b = "methodB", comp.type = "inter", min.markers = 10,
  n.cores = 4, keepCols = T)

Arguments

in.a

First dataset or Case(s)

markers

Data-frame containing the SNP information: "chr", "pos" (chromosomal coordinate), "snp" (name of the marker)

threshold

Desired threshold for the comparison, e.g. if 0.5, two calls will be treated as the same if they share 50% or more of the markers

met.a

Name of the algorithm/pipeline of the first dataset, useful to keep the information after the merge

met.b

Name for the second dataset

comp.type

Type of comparison, can be either "inter", "matched" or "case/control"

min.markers

Minimum number of markers a call needs to contain

n.cores

Number of CPU cores to use

keepCols

Set it to F in order to discard the intermediate columns from inter_comp and locus.

in.b

Second dataset or Control(s)
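
To make the threshold and min.markers arguments concrete, here is a minimal sketch in base R. It is not the package's implementation: the helpers call_markers, shared_fraction and same_call are hypothetical, the toy data are invented, and measuring the shared fraction against the smaller of the two calls is an assumption (this page does not state which denominator inter_comp uses). The marker table uses the "chr", "pos", "snp" columns listed above.

```r
# Toy marker table: one chromosome, one SNP every 1000 bp (illustrative data only).
markers <- data.frame(
  chr = rep("1", 40),
  pos = seq(1000, 40000, by = 1000),
  snp = paste0("rs", 1:40),
  stringsAsFactors = FALSE
)

# Names of the markers that fall inside a call (hypothetical helper).
call_markers <- function(chr, start, end, markers) {
  markers$snp[markers$chr == chr & markers$pos >= start & markers$pos <= end]
}

# Fraction of shared markers, relative to the smaller call (an assumption).
shared_fraction <- function(snps.a, snps.b) {
  if (length(snps.a) == 0 || length(snps.b) == 0) return(0)
  length(intersect(snps.a, snps.b)) / min(length(snps.a), length(snps.b))
}

# Two calls count as the same if both pass min.markers and share enough markers.
same_call <- function(snps.a, snps.b, threshold = 0.5, min.markers = 10) {
  if (length(snps.a) < min.markers || length(snps.b) < min.markers) return(FALSE)
  shared_fraction(snps.a, snps.b) >= threshold
}

a <- call_markers("1", 1000, 20000, markers)       # 20 markers (rs1 to rs20)
b <- call_markers("1", 11000, 30000, markers)      # 20 markers (rs11 to rs30)
c.small <- call_markers("1", 1000, 5000, markers)  # 5 markers (rs1 to rs5)

same_call(a, b)        # TRUE: 10 of 20 markers are shared
same_call(a, c.small)  # FALSE: the second call has fewer than min.markers markers
```

Comparing marker names rather than base-pair coordinates means that two calls with slightly different boundaries but the same underlying probes still count as the same call.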

Details

This function compares the results of CNV calling methods. It can be useful when merging results from different pipelines on the same data, in order to highlight the common calls (of higher confidence) and avoid duplicate calls. It can also handle case/control situations (selecting the calls present in cases only) and family-based studies.

The function is specifically designed to work on SNP array data and requires SNP position information to process calls. In particular, the actual comparison is made on the markers rather than on the raw genomic coordinates. As an example, with default settings, two calls will be treated as the same if they share at least 50% of their markers. Inside the function there is a filter based on the number of markers; the default behavior is to eliminate calls with fewer than 10 markers. If this is undesired, simply set min.markers to 0.

There are three possible modes, which can be set with the parameter comp.type. If set to "inter", the function scans in.a and in.b for replicated calls, adds a new column, "uniq" (0 if the call is replicated), and then eliminates the replicated calls from in.b before merging the two datasets. In contrast, "matched" and "case/control" assume that in.a contains the calls for the case(s) and in.b those for the control(s). The function then attempts to select the calls of the case that are not replicated in the control. The difference between the two is that "case/control" assumes one sample per object, while "matched" accounts for family ID and can handle more than one sample per object.

Required inputs are: markers, a data-frame containing information about the SNP markers of the array used (required columns: "chr", "position", "snp"); and in.a and in.b, two data-frames containing the actual CNV calls (required columns: "chr", "start", "end", "CN", "loc.start", "loc.end").

The function uses a for loop, and this is its major bottleneck. In order to speed up the process, the input dataset is split according to the n.cores parameter and the splits are processed in parallel. The default number of cores is 4, so it should work with default parameters even on a laptop.
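
The split-and-process-in-parallel idea can be sketched as follows. This is an illustration only: it uses mclapply from the parallel package shipped with R, which may or may not be the backend inter_comp uses, and process_chunk is a hypothetical placeholder for the per-call loop. The toy calls table is invented.

```r
library(parallel)

# Toy calls table (illustrative data only).
calls <- data.frame(
  chr = "1",
  start = seq(1, 91, by = 10),
  end = seq(10, 100, by = 10),
  stringsAsFactors = FALSE
)

n.cores <- 2

# Assign each row to one of n.cores contiguous chunks of roughly equal size.
chunk.id <- sort(rep(seq_len(n.cores), length.out = nrow(calls)))
chunks <- split(calls, chunk.id)

# Placeholder per-chunk work; a real comparison would loop over the calls here.
process_chunk <- function(chunk) {
  chunk$length <- chunk$end - chunk$start + 1
  chunk
}

# mclapply forks workers on Unix-like systems; on Windows only mc.cores = 1 is supported.
cores <- if (.Platform$OS.type == "windows") 1L else n.cores
res.list <- mclapply(chunks, process_chunk, mc.cores = cores)

# Bind the processed chunks back into a single data frame.
res <- do.call(rbind, res.list)
rownames(res) <- NULL
```

Because each chunk is processed independently, the speed-up is roughly bounded by the number of chunks; setting n.cores higher than the number of available cores brings no further benefit.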

Value

res

Author(s)

Simone Montalbano simone.montalbano@protonmail.com


SinomeM/cnv_geaRs documentation built on Dec. 4, 2020, 3:06 a.m.