FishHook: title

Description

Stores Events, Hypotheses, Eligible, Covariates.

Usage

Fish(hypotheses = NULL, events = NULL, covariates = NULL,
  eligible = NULL, out.path = NULL, use_local_mut_density = FALSE,
  local_mut_density_bin = 1e+06, mc.cores = 1, na.rm = TRUE,
  pad = 0, verbose = TRUE, max.slice = 1e+05, ff.chunk = 1e+06,
  max.chunk = 1e+12, idcol = NULL, idcap = 1, weightEvents = FALSE,
  nb = TRUE)

FishHook

Arguments

hypotheses

Examples of hypotheses are genes, enhancers, or even 1kb tiles of the genome that we can then convert into a rolling/tiled window. This param must be of class "GRanges".

events

Events are the given mutational regions and must be of class "GRanges". Examples of events are SNVs (e.g. C->G), somatic copy number alterations (SCNAs), fusion events, etc.

covariates

Covariates are genomic features that you believe influence the rate of your given type of event (mutations, CNVs, fusions, case-control samples) but that are not linked to the process you are investigating (e.g. cancer drivers). In the case of cancer drivers, we are looking for regions that are mutated as part of cancer progression. Regions that are more susceptible to random mutagenesis, such as late-replicating or non-expressed regions (transcription-coupled repair), could therefore appear as false positives. Including covariates for these biological processes will reduce their visible effect in the final data. This param must be of type "Covariate".

eligible

Eligible regions are the regions of the genome that have enough statistical power to score. For example, in exome sequencing, where not all regions are equally represented, eligible can be the set of regions that meet an arbitrary exome eligibility threshold. Another example is whole-genome data where your hypotheses are 1kb tiles: here you would want to exclude highly repetitive regions such as centromeres, telomeres, and satellite repeats. This param must be of class "GRanges".

out.path

A character that will indicate a system path in which to save the results of the analysis.

use_local_mut_density

A logical that, when TRUE, creates a covariate representing the mutational density along the genome, with a bin size determined by local_mut_density_bin. This covariate can be used when you have no other covariates, as a way to correct for variation in mutation rate along the genome, under the assumption that driver mutations cluster in local regions rather than across broad regions. By analogy: if the town of foo has a crime rate of X, which we take as the local baseline, and a neighborhood within foo has a crime rate Y such that Y >> X, we can say that this neighborhood has a higher crime rate than we would expect.

local_mut_density_bin

A numeric value that indicates the size of the genomic bins to use if use_local_mut_density = TRUE. Note that this parameter should be a few orders of magnitude greater than the size of your targets.

For example, if your hypotheses are 1e5 bp long, you may want a local_mut_density_bin of 1e7 or even 1e8.
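
The binning idea can be illustrated with a small base R sketch. This is not fishHook's internal implementation: the event positions and the variable names (event_pos, bin_size, events_per_bin) are made up for illustration, and a single chromosome is assumed.

```r
# Hypothetical event positions (bp) on one chromosome; illustrative only.
event_pos <- c(1.2e5, 3.4e5, 9.9e5, 1.5e6, 2.1e6, 2.2e6, 2.3e6, 2.9e6)

# Bin width, playing the role of local_mut_density_bin.
bin_size <- 1e6

# Assign each event to a bin (bin 0 covers [0, 1e6), bin 1 covers [1e6, 2e6), ...).
bin_index <- floor(event_pos / bin_size)

# Count events per bin, keeping empty bins by fixing the factor levels.
events_per_bin <- table(factor(bin_index, levels = 0:2))

# Events per base pair in each bin: a crude local density estimate.
local_density <- as.numeric(events_per_bin) / bin_size

print(events_per_bin)
print(local_density)
```

A bin much larger than the hypotheses averages over many targets, which is why the bin size should exceed the target size by a few orders of magnitude.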

mc.cores

A numeric value that indicates the number of computing cores to use when running fishHook. This is mainly used during the annotation step of the analysis, or during initial instantiation of the object if use_local_mut_density = TRUE.

na.rm

A logical indicating how NAs in your data are handled. Mainly used in fftab and gr.val; see the documentation of these functions for more information.

pad

A numeric indicating how far each covariate range should be extended; see Covariate for more information. Note that this will only be used if at least one of the Covariates has pad = NA.

max.slice

An integer giving the max slice of intervals to evaluate with gr.val (default = 1e5).

ff.chunk

An integer giving the max chunk to evaluate with fftab (default = 1e6).

max.chunk

An integer passed as the max.chunk parameter of gr.findoverlaps (default = 1e12).

idcol

A character that indicates the name of the column containing the patient IDs, for use in conjunction with idcap. If idcap is specified and the column referenced by idcol exists, the contribution of each patient to each target is limited to idcap. For example, if Patient A has 3 events in target A, Patient B has 1 event in target A, and idcap is set to 2 with the ID column specified, target A will have a count of 3: 2 from Patient A and 1 from Patient B.

idcap

A numeric that indicates the max number of events any given patient can contribute to a given target. For use in conjunction with idcol; see idcol for more info.
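
The capping rule described for idcol and idcap can be worked through in base R. This is only a sketch of the arithmetic in the example above, not the package's own code; the data frame and its column names (patient, target) are hypothetical.

```r
# Hypothetical events: Patient A has 3 events in targetA, Patient B has 1.
events <- data.frame(
  patient = c("A", "A", "A", "B"),
  target  = c("targetA", "targetA", "targetA", "targetA"),
  stringsAsFactors = FALSE
)

# Max number of events one patient may contribute to one target.
idcap <- 2

# Count events per (patient, target) pair.
per_patient <- aggregate(n ~ patient + target,
                         data = cbind(events, n = 1),
                         FUN = sum)

# Cap each patient's contribution at idcap.
per_patient$capped <- pmin(per_patient$n, idcap)

# Sum the capped contributions per target.
target_counts <- tapply(per_patient$capped, per_patient$target, sum)

print(per_patient)
print(target_counts)
```

Here Patient A's 3 events are capped at 2 and Patient B contributes 1, so targetA receives a count of 3, matching the description of idcol.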

weightEvents

A logical that indicates whether events should be weighted by their overlap with the hypotheses. For example, if we have an SCNA spanning 0:1000 and a target spanning 500:10000, the overlap of the SCNA and the target is 500:1000, which is half of the original width of the SCNA event. Thus, if weightEvents = TRUE, we credit a count of 0.5 to the target for this SCNA. This deviates from the expected input for the gamma-Poisson model, which measures whole event counts.
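
The weight in this example can be reproduced with a few lines of base R. This is a sketch of the arithmetic only; it assumes widths are computed as end minus start (so that 0:1000 has width 1000, as in the text), and the variable names are illustrative rather than part of fishHook.

```r
# Event and target intervals from the example above.
event  <- c(start = 0,   end = 1000)
target <- c(start = 500, end = 10000)

# Width of the intersection, or 0 if the intervals do not overlap.
overlap_width <- max(0,
                     min(event[["end"]], target[["end"]]) -
                       max(event[["start"]], target[["start"]]))

# Width of the original event (assumption: width = end - start).
event_width <- event[["end"]] - event[["start"]]

# Fraction of the event that falls inside the target.
event_weight <- overlap_width / event_width

print(event_weight)
```

With weightEvents = FALSE the same event would instead count as 1 for the target.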

nb

A logical; if TRUE, use a negative binomial model, and if FALSE, use a Poisson model.
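
As background for choosing between the two models, the following base R simulation contrasts Poisson counts with negative binomial (gamma-Poisson) counts of the same mean. It is an illustration of overdispersion under made-up parameters (mean 5, size 1), not a description of how fishHook fits its model.

```r
set.seed(1)
n <- 10000

# Poisson counts: variance is expected to equal the mean.
pois_counts <- rpois(n, lambda = 5)

# Negative binomial counts with the same mean; a small 'size' gives
# strong overdispersion (expected variance = mu + mu^2 / size).
nb_counts <- rnbinom(n, mu = 5, size = 1)

# Variance-to-mean ratio: close to 1 for Poisson, well above 1 when overdispersed.
dispersion <- function(x) var(x) / mean(x)

print(dispersion(pois_counts))
print(dispersion(nb_counts))
```

When event counts across hypotheses show a variance well above their mean, a Poisson model tends to understate uncertainty, which is the usual reason to prefer the negative binomial setting.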

genome

A character value indicating which build of the human genome to use; set to hg19 by default.

verbose

A logical indicating whether or not to print information to the console when running FishHook.

Format

An object of class R6ClassGenerator of length 24.

Value

FishHook object ready for annotation/scoring.

Author(s)

Zoran Z. Gajic


mskilab/fish.hook documentation built on Oct. 4, 2020, 11:42 a.m.