qc.samples: Join and QC the reference samples


View source: R/qc.samples.R

Description

Joins the bin counts of the samples to analyze and computes QC metrics to help define the set of reference samples.

Usage

qc.samples(files.df, bin.df, outfile.prefix, ref.samples = NULL,
  nb.ref.samples = NULL, plot = TRUE, appendIndex.outfile = TRUE,
  chunk.size = 1e+05, col.bc = "bc.gc.gz", nb.cores = 1,
  median.norm = TRUE)
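
A minimal sketch of how the inputs might be assembled before the call. The sample names, file paths, bin coordinates and output prefix are made up for illustration; in practice 'files.df' would come from 'initFileNames' and the bin count files from 'correct.GC'. The call itself is guarded so that it only runs when PopSV is installed and the bin count files exist.

```r
# Hypothetical inputs: sample names, paths and bins are illustrative only.
files.df <- data.frame(
  sample = c("s1", "s2", "s3"),
  bc.gc.gz = file.path("bc", c("s1-bc-gc.tsv.bgz", "s2-bc-gc.tsv.bgz",
                               "s3-bc-gc.tsv.bgz")),
  stringsAsFactors = FALSE
)
bin.df <- data.frame(
  chr = "1",
  start = seq(1, 9001, by = 1000),
  end = seq(1000, 10000, by = 1000),
  stringsAsFactors = FALSE
)

# Check the columns that the argument descriptions list as required.
col.bc <- "bc.gc.gz"
stopifnot(all(c("sample", col.bc) %in% colnames(files.df)))
stopifnot(all(c("chr", "start", "end") %in% colnames(bin.df)))

# Only attempt the call if PopSV is available and the bin count files exist.
can.run <- requireNamespace("PopSV", quietly = TRUE) &&
  all(file.exists(files.df[[col.bc]]))
if (can.run) {
  qc.res <- PopSV::qc.samples(files.df, bin.df,
                              outfile.prefix = "ref-bc",  # hypothetical prefix
                              ref.samples = files.df$sample,
                              nb.cores = 1)
  str(qc.res, max.level = 1)
}
```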

Arguments

files.df

a data.frame with the information about the files to use. The column 'sample' and the bin count column named by 'col.bc' ('bc.gc.gz' by default) are required; both should be present after running the 'initFileNames' function. The files should exist if 'correct.GC' was run.

bin.df

a data.frame with the information about the bins. Columns 'chr', 'start' and 'end' are required.

outfile.prefix

the prefix of the output file name. The suffix '.bgz' is appended when the output is compressed ('appendIndex.outfile=TRUE').

ref.samples

a vector with the names of the samples to use as reference.

nb.ref.samples

the number of reference samples desired. If NULL, the size of 'ref.samples' is used.

plot

should PCA graphs be output? Default is TRUE.

appendIndex.outfile

if TRUE (default), the results are appended regularly to the output file, which is ultimately compressed and indexed. This is recommended when a large number of bins are analyzed. If FALSE, a data.frame with the bin counts is returned and no file is created.

chunk.size

the number of bins to analyze at a time (for memory optimization). Default is 100,000 ('1e+05'). Reduce this number if memory problems arise.

col.bc

the column from 'files.df' defining the bin count file names.

nb.cores

the number of cores to use. If higher than 1, the 'parallel' package is used to parallelize the counting.

median.norm

Should the merged bin counts be median-normalized? Default is TRUE.
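
The 'chunk.size' and 'median.norm' arguments can be illustrated with a small base R sketch. This is not PopSV's implementation: the bin counts are simulated, the chunk size is deliberately tiny, and median normalization is assumed here to mean scaling each sample so that its median bin count matches the median of the sample medians.

```r
set.seed(1)
# Simulated bin counts: 10 bins (rows) x 3 samples (columns), illustrative only.
bc <- matrix(rpois(30, lambda = rep(c(50, 100, 200), each = 10)), nrow = 10,
             dimnames = list(NULL, c("s1", "s2", "s3")))

# chunk.size: split the bin indices into consecutive chunks of at most 4 bins.
chunk.size <- 4
chunk.id <- ceiling(seq_len(nrow(bc)) / chunk.size)
chunks <- split(seq_len(nrow(bc)), chunk.id)

# Process one chunk at a time (here: per-chunk column sums), then combine.
chunk.sums <- do.call(rbind, lapply(chunks, function(ii) {
  colSums(bc[ii, , drop = FALSE])
}))

# median.norm (assumed definition): scale each sample to a common median.
med.samp <- apply(bc, 2, median)
med.target <- median(med.samp)
bc.norm <- sweep(bc, 2, med.target / med.samp, "*")
```

Working chunk by chunk keeps only 'chunk.size' bins in memory at once, which is why lowering the value helps when memory is limited.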

Value

a list with

bc

the name of the file with the joined bin counts (if 'appendIndex.outfile=TRUE') or a data.frame with these bin counts (if 'appendIndex.outfile=FALSE').

ref.samples

a vector with the names of the reference samples.

cont.sample

the name of the sample to use as control among the reference samples (for normalization).

pc.all.df

a data.frame with the first 3 principal components for all input reference samples.

pc.ref.df

a data.frame with the first 3 principal components for the final reference samples.
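
To make the 'pc.all.df' and 'pc.ref.df' elements more concrete, here is a sketch of how the first 3 principal components of a samples-by-bins matrix could be computed with 'prcomp'. It is an illustration on simulated counts, not the code used by 'qc.samples'; the column names of the resulting data.frame are assumptions.

```r
set.seed(2)
# Simulated bin counts: 6 samples (rows) x 50 bins (columns), illustrative only.
samples <- paste0("s", 1:6)
bc <- matrix(rpois(6 * 50, lambda = 100), nrow = 6,
             dimnames = list(samples, NULL))

# PCA across samples, centering each bin; keep the first 3 components.
pca <- prcomp(bc, center = TRUE, scale. = FALSE)
pc.df <- data.frame(sample = samples,
                    PC1 = pca$x[, 1],
                    PC2 = pca$x[, 2],
                    PC3 = pca$x[, 3],
                    stringsAsFactors = FALSE)

# One row per sample, with its coordinates on the first 3 components.
print(pc.df)
```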

Author(s)

Jean Monlong


jmonlong/PopSV documentation built on Sept. 15, 2019, 9:29 p.m.