getRegions: Generate list of regions, classify each as differentially...

View source: R/getRegions.R

Description

Using one of three methods, getRegions divides the genome (or a single chromosome) into regions. It assigns each nucleotide a state and groups contiguous nucleotides in the same state into "regions." Regions in states 3 (overexpressed) and 4 (underexpressed) are considered differentially expressed.
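For illustration only, the grouping step can be sketched with base R's rle(): each run of consecutive nucleotides sharing a state becomes one region. statesToRegions() and its output column names are hypothetical, not derfinder code. The sketch also assumes pos is sorted, so that neighbouring entries are neighbouring nucleotides.

```r
# Hypothetical helper (not part of derfinder): collapse per-nucleotide states
# into contiguous regions. Assumes pos is sorted by genomic position.
statesToRegions <- function(chromosome, pos, state) {
  stopifnot(length(pos) == length(state))
  runs <- rle(state)                 # lengths and values of runs of equal states
  ends <- cumsum(runs$lengths)       # index of the last nucleotide in each run
  starts <- ends - runs$lengths + 1  # index of the first nucleotide in each run
  data.frame(
    chr    = chromosome,
    start  = pos[starts],
    end    = pos[ends],
    length = runs$lengths,
    state  = runs$values
  )
}

# Usage: ten nucleotides containing one overexpressed stretch (state 3)
statesToRegions("chr22", pos = 101:110,
                state = c(1, 1, 2, 2, 3, 3, 3, 2, 1, 1))
```

The name states.norle suggests that the per-nucleotide output table is the version without this run-length encoding step.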

Usage

getRegions(method, chromosome, pos, tstats, transprobs = c(0.999, 1e-12),
  stateprobs = NULL, params = NULL, K = 25, tcut = 2, includet = F,
  includefchange = F, fchange = NULL)

Arguments

method

One of "HMM" (hidden Markov model), "CBS" (circular binary segmentation), or "smoothcut" (moderated t statistics are smoothed with a running median, and those with large enough absolute values are called differentially expressed).

chromosome

Name of the chromosome being analyzed; it is printed in the output table.

pos

Vector of genomic positions corresponding to the provided t statistics; must have the same length as tstats. pos is returned by getLimmaInput.

tstats

Vector of moderated t statistics, ordered by genomic position (in the same order as pos).

transprobs

Vector of transition probabilities between states, used by the "HMM" method. It should have length 2. The first element is the probability of remaining in the same state (should be large). The second element is the probability of moving directly between a differentially expressed state and the equally expressed state, in either direction, or between the overexpressed and underexpressed states, in either direction (should be very small). Defaults to c(0.999, 1e-12).

stateprobs

Marginal probabilities of being in each of the four hidden states, used by the "HMM" method. Usually the stateprobs element of the getParams output.

params

Parameters of the normal distributions representing the four hidden states in the "HMM" method. Usually the params element of the getParams output.

K

Smoothing parameter for the "smoothcut" method: the width of the running-median window used to smooth the t statistics. Default 25.

tcut

Cutoff used by the "smoothcut" method to call differential expression: the minimum absolute value a moderated t statistic must have for its nucleotide to be classified as differentially expressed. Default 2.

includet

If TRUE, the table in the output will include the average t statistic for each region.

includefchange

If TRUE, the table in the output will include the average estimated log2 fold change (from the linear models) for each region.

fchange

Required if includefchange = TRUE. Vector of estimated log2 fold changes from the linear models; must have the same length as tstats. Usually the logfchange element of the getTstats output.
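The interplay of K and tcut in the "smoothcut" method can be sketched with stats::runmed(). This is a hypothetical helper, not the getRegions code. It assumes the cutoff is applied to the smoothed statistics and that non-differential nucleotides are labelled 2 ("equally expressed"), because t statistics alone cannot identify state 1 ("not expressed").

```r
# Hypothetical sketch of the "smoothcut" idea (not derfinder code).
# ASSUMPTION: the cutoff is applied to the smoothed t statistics, and
# nucleotides that are not called differential get state 2.
smoothcutStates <- function(tstats, K = 25, tcut = 2) {
  # stats::runmed expects an odd window width no larger than length(tstats)
  smoothed <- stats::runmed(tstats, k = K, endrule = "median")
  state <- rep(2L, length(tstats))
  state[smoothed > tcut] <- 3L    # overexpressed
  state[smoothed < -tcut] <- 4L   # underexpressed
  state
}

# Usage: an isolated spike at position 20 is smoothed away,
# while sustained shifts are called overexpressed / underexpressed
tstats <- c(rep(0, 30), rep(5, 30), rep(-5, 30))
tstats[20] <- 10
table(smoothcutStates(tstats))
```

A running median rather than a running mean is what lets a single extreme statistic fall below the cutoff while sharp boundaries between sustained shifts are preserved.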

Details

States are labeled numerically in the output as follows: 1="not expressed," 2="equally expressed," 3="overexpressed," 4="underexpressed."
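Under the "HMM" method these four codes are the hidden states. The sketch below decodes them with the Viterbi algorithm, a standard way to find the most likely state sequence. It is an illustration under stated assumptions, not the getRegions implementation. hmmTransitions() and viterbiStates() are hypothetical names. Three layouts are guesses based on the argument descriptions: stateprobs is treated as the initial distribution, params as a 4 x 2 matrix with columns "mean" and "sd", and probability left over by transprobs is routed through state 1.

```r
# Hypothetical HMM sketch (not derfinder code). States: 1 = not expressed,
# 2 = equally expressed, 3 = overexpressed, 4 = underexpressed.
# ASSUMPTIONS: stateprobs = initial distribution; params = 4 x 2 matrix with
# columns "mean" and "sd"; probability not covered by transprobs goes to
# state 1 (from states 2-4) or is split evenly over states 2-4 (from state 1).
hmmTransitions <- function(transprobs) {
  stay <- transprobs[1]
  jump <- transprobs[2]
  tm <- matrix(0, 4, 4)
  diag(tm) <- stay
  tm[2, 3:4] <- jump        # equally expressed -> over / under
  tm[3, c(2, 4)] <- jump    # over -> equally expressed / under
  tm[4, 2:3] <- jump        # under -> equally expressed / over
  tm[2:4, 1] <- 1 - stay - 2 * jump
  tm[1, 2:4] <- (1 - stay) / 3
  tm
}

viterbiStates <- function(tstats, stateprobs, params,
                          transprobs = c(0.999, 1e-12)) {
  n <- length(tstats)
  logT <- log(hmmTransitions(transprobs))
  # Normal emission log-densities: rows = nucleotides, columns = states
  logE <- matrix(sapply(1:4, function(s)
    dnorm(tstats, params[s, "mean"], params[s, "sd"], log = TRUE)), nrow = n)
  delta <- matrix(-Inf, n, 4)  # best log-probability of a path ending in each state
  back <- matrix(0L, n, 4)     # best predecessor state, for traceback
  delta[1, ] <- log(stateprobs) + logE[1, ]
  for (i in seq_len(n)[-1]) {
    for (s in 1:4) {
      cand <- delta[i - 1, ] + logT[, s]
      back[i, s] <- which.max(cand)
      delta[i, s] <- max(cand) + logE[i, s]
    }
  }
  # Trace the most likely path backwards from the best final state
  path <- integer(n)
  path[n] <- which.max(delta[n, ])
  for (i in rev(seq_len(n - 1))) path[i] <- back[i + 1, path[i + 1]]
  path
}

# Usage with made-up parameters
params <- cbind(mean = c(0, 0, 4, -4), sd = c(0.5, 1, 1, 1))
tstats <- c(0.1, -0.2, 0.3, 4.2, 3.8, 4.5, 0.2, -0.1, -4.1, -3.9)
viterbiStates(tstats, stateprobs = rep(0.25, 4), params = params)
```

With the default transprobs, a direct switch from state 3 to state 4 is so improbable that the decoded path passes through state 1 between the overexpressed and underexpressed stretches.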

Value

A list with elements

states.norle

data frame with one row per nucleotide, giving its genomic location and predicted hidden state

states

data frame with one row per region, giving its genomic location, length, predicted hidden state, and (if applicable) average t statistic and/or fold change.
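The optional average columns amount to a per-region mean. The sketch below computes the mean t statistic and, optionally, the mean log2 fold change within each run of identical states, then keeps only the differentially expressed regions (states 3 and 4). regionAverages() and its column names are hypothetical; the real states table may be laid out differently.

```r
# Hypothetical helper (not derfinder code): per-region averages of the
# t statistics and, optionally, of the log2 fold changes.
regionAverages <- function(state, tstats, fchange = NULL) {
  stopifnot(length(state) == length(tstats))
  runs <- rle(state)
  region <- rep(seq_along(runs$lengths), runs$lengths)  # region index per nucleotide
  out <- data.frame(state  = runs$values,
                    length = runs$lengths,
                    mean.t = as.vector(tapply(tstats, region, mean)))
  if (!is.null(fchange)) {
    stopifnot(length(fchange) == length(tstats))
    out$mean.fchange <- as.vector(tapply(fchange, region, mean))
  }
  out
}

# Usage: keep only the differentially expressed regions (states 3 and 4)
avg <- regionAverages(state   = c(2, 2, 3, 3, 3, 2, 4, 4),
                      tstats  = c(0.5, -0.5, 3, 4, 5, 0, -3, -5),
                      fchange = c(0, 0, 1, 2, 3, 0, -1, -2))
subset(avg, state %in% c(3, 4))
```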

Author(s)

Alyssa Frazee

See Also

getTstats, getParams


leekgroup/derfinder documentation built on May 20, 2019, 11:30 p.m.