sharpr2: sharpr2
In sharpr2: Estimating Regulatory Scores and Identifying ATAC-STARR Data



For a HiDRA dataset on a single chromosome, this function calls tiled regions (regions covered by at least one fragment) and calculates regulatory scores for each tiled region. The regulatory scores are based on the standardized log(RNA/PLASMID) of each fragment.
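To make the two quantities above concrete, here is a minimal sketch of how tiled regions and standardized log ratios could be computed from fragment data. The helpers `tile_regions()` and `std_log_ratio()` are hypothetical and not part of sharpr2; the sketch assumes that two fragments belong to the same tiled region when they overlap, and that RNA and PLASMID values are strictly positive (no pseudocount is added).

```r
# Hypothetical helper (not part of sharpr2): merge fragments into tiled
# regions, i.e. maximal intervals covered by at least one fragment.
tile_regions <- function(start, end) {
  ord <- order(start, end)
  s <- start[ord]
  e <- end[ord]
  reg_start <- s[1]
  reg_end <- e[1]
  out_start <- numeric(0)
  out_end <- numeric(0)
  for (i in seq_along(s)[-1]) {
    if (s[i] <= reg_end) {
      # Overlapping fragment: extend the current tiled region.
      reg_end <- max(reg_end, e[i])
    } else {
      # Gap in coverage: close the current region and open a new one.
      out_start <- c(out_start, reg_start)
      out_end <- c(out_end, reg_end)
      reg_start <- s[i]
      reg_end <- e[i]
    }
  }
  data.frame(start = c(out_start, reg_start), end = c(out_end, reg_end))
}

# Hypothetical helper: center and scale per-fragment log(RNA/PLASMID).
# Assumes rna and plasmid are strictly positive.
std_log_ratio <- function(rna, plasmid) {
  as.numeric(scale(log(rna / plasmid)))
}

frags <- data.frame(start = c(100, 250, 900), end = c(400, 600, 1200))
tile_regions(frags$start, frags$end)
std_log_ratio(c(20, 40, 80), c(10, 10, 10))
```

The first two fragments overlap and merge into one tiled region, while the third starts after a coverage gap and forms its own region.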


sharpr2(data, l_min = 150, l_max = 600, f_rna = 10, f_dna = 0,
  s_a = 300, verbose = FALSE, auto = TRUE, sig = TRUE, len = FALSE, 
  alpha = 0.05, win = 5, mse = FALSE, max_t = 1)

`data`	A data.frame containing an ATAC-STARR dataset for one chromosome. The data.frame must contain four columns: 'start', 'end', 'PLASMID', 'RNA'. 'PLASMID' and 'RNA' hold the DNA (plasmid) and RNA values, which should be either non-negative real numbers (e.g., averages over multiple replicates) or non-negative integers (counts).
`l_min`	The fragments with a length smaller than l_min will not be processed. The default is 150.
`l_max`	The fragments with a length larger than l_max will not be processed. The default is 600.
`f_rna`	The fragments with an RNA count smaller than f_rna will not be processed. The default is 10.
`f_dna`	The fragments with a DNA count smaller than f_dna will not be processed. The default is 0.
`s_a`	A variance hyperparameter in the prior for the latent regulatory scores. The default is 300.
`verbose`	An indicator of whether to show processing information. The default is FALSE.
`auto`	An indicator of whether to automatically estimate the ridge coefficient λ from the data for each tiled region, using the data-driven approach described in the reference. The default is TRUE. If auto is TRUE, s_a is ignored and a ridge coefficient is estimated separately for each tiled region. If auto is FALSE, a global user-defined ridge coefficient (1/s_a) is used.
`sig`	An indicator of whether to identify significant motif regions for the estimated scores. Only valid if auto=TRUE. The default is TRUE.
`len`	An indicator of whether to model log(RNA/PLASMID) of each fragment as the average (TRUE) or the sum (FALSE) of the latent regulatory scores at the loci it covers. The default is FALSE, i.e., the sum.
`alpha`	A regional family-wise error rate (FWER) used to call high-resolution driver elements (significant regulatory regions). The default is 0.05.
`win`	A window size for removing sporadic significant regions. If a consecutive significant region is smaller than win, it is treated as a false signal. The default is 5.
`mse`	An indicator of whether mean square errors are included in the output results. The default is FALSE.
`max_t`	A value between 0 and 1, indicating the proportion of non-zero eigenvectors used to calculate λ when auto=TRUE. The default is 1.
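The required input columns and the four filtering arguments (l_min, l_max, f_rna, f_dna) can be illustrated with a small sketch. `filter_fragments()` is a hypothetical helper, not the package's own code. It assumes inclusive coordinates (fragment length = end - start + 1) and that fragments exactly at a threshold are kept, matching the "smaller than" / "larger than" wording above.

```r
# Hypothetical sketch of the fragment filters described by l_min, l_max,
# f_rna and f_dna; not sharpr2's own implementation.
filter_fragments <- function(data, l_min = 150, l_max = 600,
                             f_rna = 10, f_dna = 0) {
  required <- c("start", "end", "PLASMID", "RNA")
  missing_cols <- setdiff(required, names(data))
  if (length(missing_cols) > 0) {
    stop("missing columns: ", paste(missing_cols, collapse = ", "))
  }
  # Assumption: inclusive coordinates, so length = end - start + 1.
  frag_len <- data$end - data$start + 1
  keep <- frag_len >= l_min & frag_len <= l_max &
    data$RNA >= f_rna & data$PLASMID >= f_dna
  data[keep, , drop = FALSE]
}

# Toy input for one chromosome (made-up values for illustration only).
toy <- data.frame(
  start   = c(1000, 1100, 1300, 2000),
  end     = c(1299, 1200, 1799, 2999),
  PLASMID = c(12, 30, 8, 15),
  RNA     = c(40, 25, 5, 60)
)
filter_fragments(toy)
```

With the defaults, only the first fragment survives: the second is too short (101 bp), the third has too few RNA counts, and the fourth is too long (1000 bp).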

The default value of s_a is 300, which corresponds to a ridge coefficient of 1/300 ≈ 0.0033. This default was chosen as the median of the λ values estimated from the first library.
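The relationship between s_a, the ridge coefficient 1/s_a, and the `len` option can be sketched as an ordinary ridge regression of fragment-level log ratios on per-locus latent scores. This is an illustrative assumption, not the package's estimator: `ridge_scores()` is hypothetical, works at single-locus resolution, and only covers the fixed-λ case (auto = FALSE); it does not reproduce the data-driven λ estimation used when auto = TRUE.

```r
# Hypothetical sketch: ridge estimate of per-locus latent scores for one
# tiled region, with a fixed ridge coefficient lambda = 1 / s_a.
ridge_scores <- function(frag_start, frag_end, y, region_start, region_end,
                         s_a = 300, use_mean = FALSE) {
  loci <- region_start:region_end
  # Design matrix: X[j, i] = 1 if fragment j covers locus i.
  X <- outer(seq_along(y), seq_along(loci), function(j, i) {
    as.numeric(loci[i] >= frag_start[j] & loci[i] <= frag_end[j])
  })
  if (use_mean) {
    # Analogue of len = TRUE: y is the average, not the sum, of the scores.
    X <- X / rowSums(X)
  }
  lambda <- 1 / s_a
  A <- crossprod(X) + lambda * diag(ncol(X))
  est_a <- as.vector(solve(A, crossprod(X, y)))
  list(est_a = est_a, fitted = as.vector(X %*% est_a), lambda = lambda)
}

# Three overlapping fragments in a 10-locus region (made-up values).
fit <- ridge_scores(frag_start = c(1, 4, 3), frag_end = c(6, 10, 8),
                    y = c(1.2, 0.4, 0.9), region_start = 1, region_end = 10)
fit$est_a
```

A small λ (large s_a) lets the fitted fragment values track the observed log ratios closely, while a larger λ shrinks the per-locus scores toward zero.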

score: the regulatory scores for each tiled region. This list contains four components: est_a (the regulatory score at each locus), sd_e (the square root of the mean square error), var_nb (the variance of the estimate at each locus), and λ (the ridge coefficient).

region: the start and end positions for each tiled region.

n_reg: total number of tiled regions.

n_read: the number of reads in each tiled region.

sig_reg: identified high resolution driver elements based on the cutoff.

motif: predicted 20-bp motifs.

cutoff: the cutoff used to call high resolution driver elements for the tiled region.
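How a cutoff and the `win` argument could combine to produce sig_reg is sketched below. `call_sig_regions()` is hypothetical: it takes the cutoff as given (the package derives it from the regional FWER alpha), treats a locus as significant when its score exceeds the cutoff, and discards consecutive significant runs shorter than win. Positions are reported as region_start + index - 1, which is an assumed coordinate convention.

```r
# Hypothetical sketch: flag loci whose score exceeds a cutoff, then drop
# consecutive significant runs shorter than win.
call_sig_regions <- function(est_a, cutoff, win = 5, region_start = 1) {
  runs <- rle(est_a > cutoff)
  ends <- cumsum(runs$lengths)
  starts <- ends - runs$lengths + 1
  # Keep only runs of significant loci that are at least win long.
  keep <- runs$values & runs$lengths >= win
  data.frame(start = region_start + starts[keep] - 1,
             end   = region_start + ends[keep] - 1)
}

est_a <- c(0.1, 2, 2.5, 3, 2.2, 1.9, 0.2, 2.1, 2.4, 0.3)
call_sig_regions(est_a, cutoff = 1.5, win = 5, region_start = 1000)
```

The five-locus run starting at index 2 is reported as 1001 to 1005, while the two-locus run at indices 8 and 9 is discarded as sporadic.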

Xinchen Wang, Liang He, Sarah Goggin, Alham Saadat, Li Wang, Melina Claussnitzer, Manolis Kellis. High-resolution genome-wide functional dissection of transcriptional regulatory regions in human.