spector_qc: Compute QC (spector LAS) for bam files.


Description

Wavelet-based technique to compute a quality score (LAS) for regions across the genome. spector_qc() is the recommended entry point to spector because it performs several input checks and is more flexible than some of the downstream functions.

Usage

spector_qc(f_bam = NULL, f_bed = NULL, regions = "giab",
  region_size = NULL, file_type = "bam", out_F = NULL, save_out = FALSE,
  silent = FALSE, smr_var = "rms", file_cores = 1, chr_cores = 1,
  bed_header = FALSE, region_overlap = 0, genome = "hg19",
  res_verbose = TRUE)

Arguments

f_bam

a string with the path to the BAM file(s). It can point to a single *.bam file (a full path, or one relative to the current working directory, is required), a folder containing *.bam files, or a parameter file whose first column lists BAM paths (see Details).

f_bed

character. File path to a BED file that overrides the default giab regions.

regions

character. The type of regions used to process the BAM file. The default value is "giab"; other options are "full.genome", "genome" or "full" for regions spanning the full genome, and "custom" for custom BED files (requires f_bed).

region_size

integer. Size of the regions over which LAS values are calculated. The default, NULL, sets region_size to the largest power of 2 that fits in the smallest region.

file_type

character (optional). Type of input passed to f_bam, which ensures the automated checks detect the correct format. Possible values are "list", "bam" (default) and "dir".

out_F

(Optional) Folder path in which to save the output. If omitted, results are returned but not saved.

save_out

logical. Indicates whether the output from spector_qc() should be saved. Default FALSE.

silent

logical. If TRUE, progress updates are suppressed. Default FALSE.

smr_var

deprecated. Variable to use to compute summary.

file_cores

integer (optional). Number of cores used to compute the QC in parallel across input files. Default value is 1.

chr_cores

integer (optional). Number of cores used to compute the QC in parallel across chromosomes. Default value is 1.

bed_header

logical. TRUE if the BED file has a header. Default FALSE.

region_overlap

numeric. Used when computing full-genome regions; it sets the fractional overlap between neighbouring regions. Default 0 (no overlap).

genome

character. The genome version of the BAM file. Unless f_bed is provided, the only supported options are "hg19" and "hg38".

res_verbose

logical. TRUE for verbose output including per-region information; FALSE for summary results only.
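The region_size and region_overlap descriptions above can be illustrated with a minimal R sketch. The helpers default_region_size() and tile_chromosome() are hypothetical and not part of spector; they assume BED-style 0-based, half-open coordinates and that a trailing partial window is dropped, which may differ from what the package actually does.

```r
# Hypothetical helpers (not exported by spector) illustrating the documented
# behaviour of region_size and region_overlap. Coordinates are BED-style:
# 0-based, half-open, so a region's width is end - start.

# Default region_size: the largest power of 2 that fits in the smallest region.
default_region_size <- function(starts, ends) {
  smallest <- min(ends - starts)
  2^floor(log2(smallest))
}

# Tile one chromosome into windows of width `size`; neighbouring windows
# overlap by the fraction `overlap` (0 means adjacent, non-overlapping windows).
# Assumption: a trailing partial window is dropped.
tile_chromosome <- function(chr_length, size, overlap = 0) {
  stopifnot(overlap >= 0, overlap < 1)
  step <- max(1, floor(size * (1 - overlap)))
  if (chr_length < size) {
    return(data.frame(start = numeric(0), end = numeric(0)))
  }
  starts <- seq(0, chr_length - size, by = step)
  data.frame(start = starts, end = starts + size)
}

# Example: the smallest region is 20 kb wide, so the default size is 2^14.
regions <- data.frame(chr = c("chr1", "chr2"),
                      start = c(0, 5000),
                      end = c(40000, 25000))
size <- default_region_size(regions$start, regions$end)
print(size)  # 16384

# Tile a 100 kb chromosome with 50% overlap between neighbouring windows.
tiles <- tile_chromosome(100000, size, overlap = 0.5)
print(head(tiles, 3))
```

With overlap = 0.5 each window starts half a window after the previous one, so a 100 kb chromosome yields 11 windows of 2^14 bp instead of 6 without overlap.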

Details

The spector_qc function is the main function for QC in the spector package. It computes a quality-control LAS for specific regions across the genome. The default regions, supplied with the package, are based on the reliable regions of the Genome in a Bottle project (giab), calculated using ReliableGenome (RG). Custom regions can also be supplied as a BED file or a data.frame object.

It is important to supply full paths to f_bam and f_bed, although paths relative to the current working directory (which can be set with base::setwd()) also work. The same applies to the first column of a parameter file supplied to f_bam.
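A minimal sketch of building full paths for f_bam and f_bed using only base R. to_full_path() is a hypothetical helper, and the data/ file names are placeholders; the spector_qc() call is commented out because it needs the package and real data.

```r
# Hypothetical helper (not part of spector): turn possibly relative paths
# into absolute ones before passing them to spector_qc().
to_full_path <- function(path) {
  path <- path.expand(path)                               # expand a leading "~"
  is_absolute <- grepl("^(/|[A-Za-z]:[/\\\\])", path)     # Unix or Windows root
  path <- ifelse(is_absolute, path, file.path(getwd(), path))  # anchor at working dir
  normalizePath(path, mustWork = FALSE)   # resolve symlinks and ".." when the file exists
}

# Placeholder file names; replace with your own data.
f_bam <- to_full_path("data/sample1.bam")
f_bed <- to_full_path("data/regions.bed")
# spector_qc(f_bam = f_bam, f_bed = f_bed, regions = "custom")
```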

Value

Output is a tbl_df object with a LAS value for each region. Optionally, the output can also be saved to file, but only if out_F is provided.

Examples

## Not run: 
# Compute QC on sample1.bam with default options
spector_qc(f_bam = "sample1.bam")

## End(Not run)

s1_path <- spector_sample("sample1.bam")
basic_path <- spector_sample("basic.bed")

# Compute QC on sample1.bam with custom region size
spector_qc(f_bam = s1_path, region_size = 2^14)

# Compute QC on sample1.bam with custom bed file
spector_qc(f_bam = s1_path, f_bed = basic_path)

# Compute QC and save output results
spector_qc(f_bam = s1_path, f_bed = basic_path, out_F = "~/",
 save_out = TRUE)

anasrana/spector documentation built on May 14, 2019, 2:36 p.m.