universegroup: Define Universe and Group of Genes Based on Expression Data


universegroup R Documentation

Define Universe and Group of Genes Based on Expression Data

Description

This function defines a "Universe" of genes and assigns universe members to groups such as "Attenuated" or "Outgroup" based on transcription data and thresholds. Universe membership is determined by thresholds on window size, missing-data count, mean transcription levels, and p-values. Group membership is then determined by thresholds on AUC values and KS-test p-values.

Usage

universegroup(completedf, expdf, controlname = "ctrl", stressname = "HS",
windsizethres = 50, countnathres = 20, meanctrlthres = 0.5,
meanstressthres = 0.5, pvaltheorythres = 0.1, aucctrlthreshigher = -10,
aucctrlthreslower = 15, aucstressthres = 15, attenuatedpvalksthres = 2,
outgrouppvalksthres = 0.2, showtime = FALSE, verbose = TRUE)

Arguments

completedf

A data frame obtained with the function attenuation.

expdf

A data frame containing experiment data that should have columns named 'condition', 'replicate', 'strand', and 'path'.

controlname

A string representing the control condition name. Default is "ctrl".

stressname

A string representing the stress condition name. Default is "HS".

windsizethres

A numeric threshold for the minimum window size. Default is 50.

countnathres

A numeric threshold for the maximum number of missing data points (NA values). Default is 20.

meanctrlthres

A numeric threshold for the minimum mean transcription value in the control condition. Default is 0.5.

meanstressthres

A numeric threshold for the minimum mean transcription value in the stress condition. Default is 0.5.

pvaltheorythres

A numeric threshold for the minimum p-value used to define the universe of genes. Default is 0.1.

aucctrlthreshigher

A numeric threshold for the lower bound of the control AUC value in the outgroup classification: the control AUC must be higher than this value. Default is -10.

aucctrlthreslower

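A numeric threshold for the upper bound of the control AUC value in the outgroup classification: the control AUC must be lower than this value. When only one condition is provided, it is also the value the control AUC must exceed for a gene to be classified as attenuated. Default is 15.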

aucstressthres

A numeric threshold for the minimum stress AUC value used to classify attenuated genes. Default is 15.

attenuatedpvalksthres

A numeric threshold for the negative log10 of the KS-test p-value used to define attenuated genes. Default is 2, which corresponds to a p-value below 0.01.

outgrouppvalksthres

A numeric threshold for the maximum KS p-value used to define the outgroup. Default is 0.2.

showtime

A logical value indicating whether to report the function's run time before it returns. Defaults to FALSE.

verbose

A logical flag indicating whether to print progress messages. Defaults to TRUE.

Details

A transcript belongs to "Universe" if: window_size > windsizethres & Count_NA < countnathres & meanctrl > meanctrlthres & meanstress > meanstressthres & pvaltheory > pvaltheorythres

If only one condition is provided, a transcript belongs to "Universe" if: window_size > windsizethres & Count_NA < countnathres & meanctrl > meanctrlthres & pvaltheory > pvaltheorythres
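The universe rules above can be sketched in base R on a toy data frame. This is only an illustration, not the package's implementation: `toydf`, its values, and the column names (taken from the shorthand used in the rules) are assumptions, and the actual columns produced by attenuation may be named differently.

```r
# Toy stand-in for the output of attenuation(); the column names follow the
# shorthand of the rules above and are assumptions, not the real ones.
toydf <- data.frame(
  transcript  = c("t1", "t2", "t3", "t4"),
  window_size = c(120, 40, 200, 150),
  Count_NA    = c(0, 5, 30, 2),
  meanctrl    = c(1.2, 0.9, 2.0, 0.9),
  meanstress  = c(0.8, 1.1, 1.5, 0.3),
  pvaltheory  = c(0.6, 0.5, 0.4, 0.7)
)

# Default thresholds from the Usage section
windsizethres <- 50
countnathres <- 20
meanctrlthres <- 0.5
meanstressthres <- 0.5
pvaltheorythres <- 0.1

# Two-condition rule: every criterion must hold
toydf$Universe <- toydf$window_size > windsizethres &
  toydf$Count_NA < countnathres &
  toydf$meanctrl > meanctrlthres &
  toydf$meanstress > meanstressthres &
  toydf$pvaltheory > pvaltheorythres

# One-condition rule: the same test without the stress mean
universeone <- toydf$window_size > windsizethres &
  toydf$Count_NA < countnathres &
  toydf$meanctrl > meanctrlthres &
  toydf$pvaltheory > pvaltheorythres

print(toydf[, c("transcript", "Universe")])
```

In this toy data, t2 fails the window size test, t3 has too many missing values, and t4 fails only on the stress mean, so t4 would enter the universe under the one-condition rule but not under the two-condition rule.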

A transcript belongs to the groups:

- Attenuated: if Universe == TRUE & aucstress > aucstressthres & -log10(pvalks) > attenuatedpvalksthres
- Outgroup: if Universe == TRUE & pvalks > outgrouppvalksthres & aucctrl > aucctrlthreshigher & aucctrl < aucctrlthreslower

If only one condition is provided:

- Attenuated: if Universe == TRUE & aucctrl > aucctrlthreslower
- Outgroup: if Universe == TRUE & aucctrl > aucctrlthreshigher & aucctrl < aucctrlthreslower
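As a rough sketch of the two-condition group rules, the following base R code classifies a toy data frame. `toydf`, its values, and the column names (`aucctrl`, `aucstress`, `pvalks`) are assumptions borrowed from the shorthand above. The order in which the package assigns groups is not stated here, so the assignment order below is also an assumption.

```r
# Toy stand-in with a precomputed Universe column; names are assumptions.
toydf <- data.frame(
  transcript = c("t1", "t2", "t3", "t4"),
  Universe   = c(TRUE, TRUE, TRUE, FALSE),
  aucctrl    = c(2, 5, 30, 1),
  aucstress  = c(25, 4, 40, 30),
  pvalks     = c(0.001, 0.5, 0.05, 0.0001)
)

# Default thresholds from the Usage section
aucctrlthreshigher <- -10
aucctrlthreslower <- 15
aucstressthres <- 15
attenuatedpvalksthres <- 2
outgrouppvalksthres <- 0.2

# Attenuated: in the universe, high stress AUC, significant KS p-value
attenuated <- toydf$Universe &
  toydf$aucstress > aucstressthres &
  -log10(toydf$pvalks) > attenuatedpvalksthres

# Outgroup: in the universe, non-significant KS p-value, control AUC in range
outgroup <- toydf$Universe &
  toydf$pvalks > outgrouppvalksthres &
  toydf$aucctrl > aucctrlthreshigher &
  toydf$aucctrl < aucctrlthreslower

# Everything else stays NA
toydf$Group <- NA_character_
toydf$Group[attenuated] <- "Attenuated"
toydf$Group[outgroup] <- "Outgroup"

print(toydf[, c("transcript", "Group")])
```

With the default thresholds the two groups cannot overlap: -log10(pvalks) > 2 requires pvalks < 0.01, while the outgroup requires pvalks > 0.2. A transcript such as t3, whose p-value falls between the two cut-offs, is left unclassified.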

This function is useful for classifying genes in transcriptomics data based on their transcriptional response to different experimental conditions.

Value

A modified data frame with two additional columns: Universe, a logical value indicating whether each gene is part of the universe, and Group, which classifies genes as "Attenuated" or "Outgroup" and is NA for genes that meet neither set of criteria.
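One possible way to summarize such a result is sketched below. `resdf` is a hypothetical stand-in built by hand, not real output, and the `transcript` identifier column is an assumption. Only the Universe and Group columns are described above.

```r
# Hypothetical stand-in for a universegroup() result
resdf <- data.frame(
  transcript = paste0("t", 1:5),
  Universe   = c(TRUE, TRUE, TRUE, FALSE, TRUE),
  Group      = c("Attenuated", "Outgroup", NA, NA, "Attenuated")
)

# Number of genes in the universe
nuniverse <- sum(resdf$Universe)

# Group sizes, counting unclassified genes as well
groupcounts <- table(resdf$Group, useNA = "ifany")

# Identifiers of attenuated genes; which() drops the NA comparisons
attenuatedids <- resdf$transcript[which(resdf$Group == "Attenuated")]

print(nuniverse)
print(groupcounts)
print(attenuatedids)
```

Using `which()` rather than a bare logical index avoids NA rows in the subset, since comparing an NA group to a string yields NA.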

See Also

attenuation

Examples

exppath <- system.file("extdata", "exptab.csv", package="tepr")
transpath <- system.file("extdata", "cugusi_6.tsv", package="tepr")
expthres <- 0.1

## Calculating necessary results
expdf <- read.csv(exppath)
transdf <- read.delim(transpath, header = FALSE)
avfilt <- averageandfilterexprs(expdf, transdf, expthres,
       showtime = FALSE, verbose = FALSE)
rescountna <- countna(avfilt, expdf, nbcpu = 1, verbose = FALSE)
## Named resecdflist rather than ecdf to avoid masking stats::ecdf
resecdflist <- genesECDF(avfilt, expdf, verbose = FALSE)
resecdf <- resecdflist[[1]]
nbwindows <- resecdflist[[2]]
resmeandiff <- meandifference(resecdf, expdf, nbwindows,
    verbose = FALSE)
bytranslistmean <- split(resmeandiff, factor(resmeandiff$transcript))
resknee <- kneeid(bytranslistmean, expdf, verbose = FALSE)
resauc <- allauc(bytranslistmean, expdf, nbwindows, verbose = FALSE)
resatt <- attenuation(resauc, resknee, rescountna, bytranslistmean, expdf,
        resmeandiff, verbose = FALSE)
## Testing universegroup
resug <- universegroup(resatt, expdf, verbose = FALSE)


tepr documentation built on June 8, 2025, 10:46 a.m.