inter_comp
Compare results from different methods to find the common calls, or find
the calls unique to one group in a case/control scenario.
inter_comp(in.a, in.b, markers, threshold = 0.5, met.a = "methodA",
met.b = "methodB", comp.type = "inter", min.markers = 10,
n.cores = 4, keepCols = T)
in.a: First dataset, or case(s).
in.b: Second dataset, or control(s).
markers: Data frame containing the SNP information: "chr", "pos" (chromosomal coordinate), "snp" (name of the marker).
threshold: Desired threshold for the comparison; e.g. with 0.5, two calls are treated as the same if they share 50% or more of their markers.
met.a: Name of the algorithm/pipeline of the first dataset, useful to keep this information after the merge.
met.b: Name of the algorithm/pipeline of the second dataset.
comp.type: Type of comparison, one of "inter", "matched" or "case/control".
min.markers: Minimum number of markers a call must contain.
n.cores: Number of CPU cores to use.
keepCols: Set to F (FALSE) to discard the intermediate columns.
This function compares the results of CNV calling methods. It can be
useful when merging results from different pipelines on the same data,
to highlight the common calls (which have higher confidence) and to
avoid duplicate calls. It can also handle case/control situations
(selecting the calls present in cases only) and family-based studies.

The function is specifically designed to work on SNP array data and
requires SNP position information to process calls. In particular, the
comparison is made on the markers rather than on the raw genomic
coordinates: with the default settings, two calls are treated as the
same if they share at least 50% of their markers.

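A minimal sketch of a marker-based comparison, in R. It assumes the markers data frame has the columns "chr", "position" and "snp", and it takes the shared fraction relative to the call with more markers, so both calls must share at least the threshold fraction of their own markers; the exact rule inter_comp applies is not stated here. markers_in_call() and shared_fraction() are hypothetical helpers, not part of the package.

```r
# Hypothetical helpers illustrating a marker-based comparison; they are not
# part of the package and may differ from what inter_comp does internally.

# SNP names of the markers that fall inside one call (inclusive bounds).
markers_in_call <- function(markers, chr, start, end) {
  inside <- markers$chr == chr &
    markers$position >= start & markers$position <= end
  markers$snp[inside]
}

# Fraction of shared markers, relative to the call with more markers.
# Assumption: a value >= threshold means each call shares at least that
# fraction of its own markers with the other.
shared_fraction <- function(snps_a, snps_b) {
  n_max <- max(length(snps_a), length(snps_b))
  if (n_max == 0) return(0)
  length(intersect(snps_a, snps_b)) / n_max
}

# Toy array: 20 markers on chromosome 1, one every 1 kb (made-up data).
markers <- data.frame(chr = 1, position = seq(1000, 20000, by = 1000),
                      snp = paste0("rs", 1:20))

a <- markers_in_call(markers, chr = 1, start = 1000, end = 10000)  # rs1..rs10
b <- markers_in_call(markers, chr = 1, start = 5000, end = 14000)  # rs5..rs14
frac <- shared_fraction(a, b)  # 6 shared markers out of 10
frac >= 0.5                    # the two calls would count as the same
```

Comparing marker sets rather than base pairs means two calls with slightly different breakpoints still match when no array marker lies between those breakpoints.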
Inside the function there is a filter based on the number of markers: by
default, calls with fewer than 10 markers are discarded. If this is
undesired, simply set min.markers to 0.

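The marker-count filter can be sketched as follows. filter_min_markers() is a hypothetical helper, not the package's code, and it assumes the same "chr"/"position" marker columns and "chr"/"start"/"end" call columns used elsewhere on this page.

```r
# Hypothetical sketch of a min.markers-style filter (not the package code).
# Counts the markers inside each call and keeps calls with at least
# `min_markers` of them; min_markers = 0 keeps every call.
filter_min_markers <- function(calls, markers, min_markers = 10) {
  n_markers <- vapply(seq_len(nrow(calls)), function(i) {
    sum(markers$chr == calls$chr[i] &
          markers$position >= calls$start[i] &
          markers$position <= calls$end[i])
  }, integer(1))
  calls[n_markers >= min_markers, , drop = FALSE]
}

# Made-up toy data: 20 markers on chromosome 1, one every 1 kb.
markers <- data.frame(chr = 1, position = seq(1000, 20000, by = 1000),
                      snp = paste0("rs", 1:20))
calls <- data.frame(chr = c(1, 1), start = c(1000, 5000),
                    end = c(10000, 8000))  # 10 and 4 markers

filter_min_markers(calls, markers)                   # keeps the first call
filter_min_markers(calls, markers, min_markers = 0)  # keeps both
```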
There are three possible modes, set with the parameter comp.type. If
set to "inter", the function scans in.a and in.b for replicated calls,
adds a new column, "uniq" (0 if the call is replicated), and then
removes the replicated calls from in.b before merging the two datasets.
In contrast, "matched" and "case/control" assume that in.a contains the
calls for the case(s) and in.b those for the control(s). The function
then attempts to select the calls of the cases that are not replicated
in the controls. The difference between the two is that "case/control"
assumes one sample per object, while "matched" accounts for family ID
and can handle more than one sample per object.

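A rough sketch of the "inter" and "case/control" logic described above, in R. It is not the package implementation: all helper names are hypothetical, calls are matched on shared markers only (copy number, "CN", is ignored), the shared fraction is taken relative to the larger call, and "matched" mode is not sketched because the family ID column is not described here.

```r
# Hypothetical sketch of the "inter" and "case/control" modes; it is not the
# package implementation. Calls need "chr", "start", "end"; markers need
# "chr", "position", "snp". Copy number (CN) is ignored here.

# List of SNP names inside each call.
call_snps <- function(calls, markers) {
  lapply(seq_len(nrow(calls)), function(i) {
    inside <- markers$chr == calls$chr[i] &
      markers$position >= calls$start[i] & markers$position <= calls$end[i]
    markers$snp[inside]
  })
}

# TRUE for each call in `x` that shares at least `threshold` of the markers
# of the larger call with some call in `y`.
is_replicated <- function(x, y, markers, threshold = 0.5) {
  sx <- call_snps(x, markers)
  sy <- call_snps(y, markers)
  vapply(sx, function(a) {
    any(vapply(sy, function(b) {
      n_max <- max(length(a), length(b))
      n_max > 0 && length(intersect(a, b)) / n_max >= threshold
    }, logical(1)))
  }, logical(1))
}

# "inter": flag replicated calls (uniq = 0), drop the replicated calls from
# in_b, then merge the two datasets.
inter_sketch <- function(in_a, in_b, markers, threshold = 0.5) {
  in_a$uniq <- as.integer(!is_replicated(in_a, in_b, markers, threshold))
  in_b$uniq <- as.integer(!is_replicated(in_b, in_a, markers, threshold))
  rbind(in_a, in_b[in_b$uniq == 1, , drop = FALSE])
}

# "case/control": keep the case calls not replicated in any control call.
case_control_sketch <- function(cases, controls, markers, threshold = 0.5) {
  cases[!is_replicated(cases, controls, markers, threshold), , drop = FALSE]
}

# Made-up toy data: 20 markers on chromosome 1, one every 1 kb.
markers <- data.frame(chr = 1, position = seq(1000, 20000, by = 1000),
                      snp = paste0("rs", 1:20))
in_a <- data.frame(chr = c(1, 1), start = c(1000, 15000), end = c(10000, 20000))
in_b <- data.frame(chr = c(1, 1), start = c(5000, 11000), end = c(14000, 13000))

inter_sketch(in_a, in_b, markers)         # 3 calls: the shared one appears once
case_control_sketch(in_a, in_b, markers)  # only the call at 15000-20000
```

The pairwise loop over calls is quadratic, which matches the remark below that the for loop is the function's main bottleneck.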
Required inputs are:
markers, a data frame with information about the SNP markers of the
array used (required columns: "chr", "position", "snp");
in.a and in.b, two data frames containing the actual CNV calls
(required columns: "chr", "start", "end", "CN", "loc.start", "loc.end").

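Before calling inter_comp, it can help to check that the inputs carry the required columns listed above. The sketch uses a hypothetical helper, check_columns(), and toy data whose values are made up.

```r
# Hypothetical helper (not part of the package): stop with an informative
# error if a data frame lacks any of the required columns.
check_columns <- function(df, required, what) {
  absent <- setdiff(required, names(df))
  if (length(absent) > 0) {
    stop(what, " is missing column(s): ", paste(absent, collapse = ", "),
         call. = FALSE)
  }
  invisible(TRUE)
}

# Toy inputs with the required columns; all values are made up.
markers <- data.frame(chr = c(1, 1, 1), position = c(1000, 2000, 3000),
                      snp = c("rs1", "rs2", "rs3"))
in_a <- data.frame(chr = 1, start = 1000, end = 3000, CN = 1,
                   loc.start = 1000, loc.end = 3000)

check_columns(markers, c("chr", "position", "snp"), "markers")
check_columns(in_a, c("chr", "start", "end", "CN", "loc.start", "loc.end"),
              "in.a")
```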
The function uses a for loop, which is its main bottleneck. To speed up
the process, the input dataset is split according to the n.cores
parameter and the splits are processed in parallel. The default number
of cores is 4, so the function should run with default parameters even
on a laptop.
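The split-and-process-in-parallel strategy described above can be sketched with the base parallel package. The mechanism here is an assumption, not the package's actual code: process_in_chunks() is hypothetical, it splits rows into contiguous chunks (at most one per core) and uses parallel::mclapply(), which relies on forking and therefore runs on a single core on Windows.

```r
library(parallel)

# Hypothetical sketch (not the package code): split the rows of `calls` into
# at most `n_cores` contiguous chunks, apply `fun` to every row of each chunk
# in parallel, and return the results in the original row order.
process_in_chunks <- function(calls, fun, n_cores = 4) {
  n_chunks <- max(1L, min(n_cores, nrow(calls)))
  # e.g. 10 rows, 4 chunks -> 1 1 1 2 2 2 3 3 4 4
  chunk_id <- sort(rep_len(seq_len(n_chunks), nrow(calls)))
  chunks <- split(calls, chunk_id)
  # mclapply forks worker processes; Windows supports only mc.cores = 1.
  cores <- if (.Platform$OS.type == "windows") 1L else n_chunks
  out <- mclapply(chunks, function(chunk) {
    vapply(seq_len(nrow(chunk)),
           function(i) fun(chunk[i, , drop = FALSE]), numeric(1))
  }, mc.cores = cores)
  unlist(out, use.names = FALSE)
}

# Toy usage: call length for 10 made-up calls, split across 2 cores.
calls <- data.frame(chr = 1, start = seq(1, 91, by = 10),
                    end = seq(6, 96, by = 10))
process_in_chunks(calls, function(row) row$end - row$start, n_cores = 2)
```

Contiguous chunks keep the output in input order after unlist(); a round-robin split would balance load similarly but would require reordering the results.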
res
Simone Montalbano simone.montalbano@protonmail.com