pA_multi_logit: pA_multi_logit

View source: R/Tests.R

pA_multi_logit    R Documentation

pA_multi_logit

Description

Compares the usage of alternative poly A (pA) site(s) with that of a reference (canonical) site.

Usage

pA_multi_logit(data, model, design = NULL, sample_ID = NULL,
  long_output = FALSE)

Arguments

data

Dataset containing poly A (pA) site read counts. The dataset must be in long format: read counts go in a single column, which MUST be named "count". The first four columns must be named "transcript", "pA.site", "sample" and "count", so each row of data holds the read count for one pA site, transcript and sample combination. Sample attributes other than the sample ID may be recorded in additional columns of this dataset, or supplied separately through a design matrix together with a key variable (e.g. sample ID) that links the data and design matrices.
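As a rough illustration of the long format described above, the following sketch builds a small counts table and checks the column requirements. The object name pA_counts and all values are made up for illustration; they are not part of the package.

```r
# Hypothetical toy counts: one transcript, two pA sites, four samples.
pA_counts <- data.frame(
  transcript = "tx1",
  pA.site    = rep(c("pA1", "pA2"), times = 4),
  sample     = rep(c("s1", "s2", "s3", "s4"), each = 2),
  count      = c(120, 30, 110, 40, 60, 90, 55, 95),
  stringsAsFactors = FALSE
)

# The first four columns must carry exactly these names, in this order.
required <- c("transcript", "pA.site", "sample", "count")
stopifnot(identical(names(pA_counts)[1:4], required))

# Long format: each pA site / transcript / sample combination appears once.
stopifnot(!anyDuplicated(pA_counts[, c("transcript", "pA.site", "sample")]))
```

Running checks like these before calling the function can catch a wide-format table (one count column per sample) early.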

model

Regression model describing the dependence of pA site usage on sample attribute(s).

design

(optional) Design matrix. A matrix describing sample attributes which can be used as predictors in the regression model.

sample_ID

(optional) A key variable connecting the counts dataset (data) and the design matrix.
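To show what a design matrix and a key variable look like in practice, here is a minimal sketch that attaches sample attributes to the counts with base R's merge(). How pA_multi_logit performs this join internally is not documented here, so the merge() call is an assumed equivalent; the objects counts and design_tbl are hypothetical.

```r
# Hypothetical long-format counts (same layout as the data argument).
counts <- data.frame(
  transcript = "tx1",
  pA.site    = rep(c("pA1", "pA2"), times = 4),
  sample     = rep(c("s1", "s2", "s3", "s4"), each = 2),
  count      = c(120, 30, 110, 40, 60, 90, 55, 95),
  stringsAsFactors = FALSE
)

# Hypothetical design table: one row per sample, attributes as columns.
design_tbl <- data.frame(
  sample    = c("s1", "s2", "s3", "s4"),
  cell_line = c("A", "A", "B", "B"),
  stringsAsFactors = FALSE
)

# Join on the key variable so every count row gains its sample attributes.
merged <- merge(counts, design_tbl, by = "sample")
```

After the join, cell_line is available as a predictor for every row, which is the situation the model argument assumes.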

long_output

Logical value selecting the output format. FALSE: only regression coefficients and p-values are reported. TRUE: standard errors of the regression coefficients and z scores are also included in the output. Default: FALSE.

Details

This function uses a multinomial logistic regression algorithm from the nnet package. For each transcript, one poly A (pA) site is set as the canonical (reference) site, and the usage of each alternative pA site is compared to this reference. By default, the pA site whose name comes first alphabetically is used as the reference. To choose a different reference for a transcript, add a prefix such as 0_ to that site's name so that it sorts first. If a transcript has n pA sites, n-1 comparisons are made. Transcripts with only one pA site should be removed from data before running this function.
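The preparation steps described above (dropping single-site transcripts and steering the reference choice with a 0_ prefix) can be sketched in base R as follows. The data and site names are invented, and sorting the unique site names is only an assumed stand-in for how the function picks its reference.

```r
# Hypothetical long-format counts: tx1 has three pA sites, tx2 has only one.
counts <- data.frame(
  transcript = c("tx1", "tx1", "tx1", "tx2"),
  pA.site    = c("pA_distal", "pA_proximal", "pA_middle", "pA_only"),
  sample     = "s1",
  count      = c(40, 100, 25, 70),
  stringsAsFactors = FALSE
)

# Drop transcripts with a single pA site, since nothing can be compared.
n_sites <- tapply(counts$pA.site, counts$transcript,
                  function(x) length(unique(x)))
counts_multi <- counts[counts$transcript %in% names(n_sites)[n_sites > 1], ]

# Alphabetical default: the first name in sorted order would be the reference.
default_ref <- sort(unique(counts_multi$pA.site))[1]

# Prefix the intended reference with "0_" so that it sorts first instead.
counts_multi$pA.site <- sub("^pA_proximal$", "0_pA_proximal",
                            counts_multi$pA.site)
chosen_ref <- sort(unique(counts_multi$pA.site))[1]

# n pA sites give n - 1 alternative-versus-reference comparisons.
n_comparisons <- length(unique(counts_multi$pA.site)) - 1
```

Here default_ref is "pA_distal", while chosen_ref becomes "0_pA_proximal" once the prefix is added, and the three remaining sites yield two comparisons.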

Value

Log ratios (multinomial logistic regression coefficients) and p-values describing the effect of the predictor(s) specified in the model on the usage ratio of each alternative pA site to the reference pA site, per transcript. If long output is requested, standard errors (SE) and z scores are also reported.
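For readers who want to see how coefficients, SE, z scores and p-values of this kind relate to each other, the sketch below fits a count-weighted multinomial model for a single hypothetical transcript with nnet::multinom and derives Wald z scores and two-sided p-values. Using a Wald test here is an assumption suggested by the SE and z columns of the long output; it is not confirmed to be the exact computation inside pA_multi_logit.

```r
library(nnet)

# Hypothetical counts for one transcript: three pA sites, four samples.
toy <- data.frame(
  pA.site   = rep(c("pA1", "pA2", "pA3"), times = 4),
  cell_line = rep(c("A", "A", "B", "B"), each = 3),
  count     = c(100, 40, 20, 90, 50, 25, 50, 80, 60, 45, 85, 70)
)
toy$pA.site <- factor(toy$pA.site)  # first level (pA1) acts as the reference

# Read counts enter as case weights; trace = FALSE silences the optimizer log.
fit <- multinom(pA.site ~ cell_line, data = toy, weights = count,
                trace = FALSE)
s <- summary(fit)

log_ratio <- s$coefficients              # rows: pA2, pA3 versus pA1
z <- s$coefficients / s$standard.errors  # Wald z scores
p <- 2 * pnorm(-abs(z))                  # two-sided p-values
```

Each row of log_ratio corresponds to one alternative site, and the cell_lineB column gives the change in the log usage ratio between the two cell lines.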

Examples

library(APAlog)
fit1_pA <- pA_multi_logit(pA.toy2, pA.site ~ cell_line, design = pA_design, sample_ID = "sample")

goodarzilab/APAlog documentation built on March 25, 2022, 3:40 p.m.