makeTrainingAndPredictionData: All features from sequence, Riboseq and RNAseq

View source: R/PipelineParts.R

makeTrainingAndPredictionData    R Documentation

All features from sequence, Riboseq and RNAseq

Description

Step 5 of the uORFome pipeline.

Usage

makeTrainingAndPredictionData(
  df.rfp,
  df.rna,
  organism = get("organism", mode = "character", envir = .GlobalEnv),
  biomart = get("biomart_dataset", envir = .GlobalEnv),
  mode = "uORF",
  features = c("countRFP", "disengagementScores", "entropyRFP", "floss", "fpkmRFP",
    "ioScore", "ORFScores", "RRS", "RSS", "startCodonCoverage", "startRegionCoverage",
    "startRegionRelative"),
  max.artificial.length,
  requiredActiveCds = 30,
  BPPARAM = bpparam()
)
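The defaults for organism and biomart shown above are looked up in the global environment with get(), so calling the function without those arguments only works if the corresponding variables already exist. The following base R sketch illustrates that lookup in isolation; it does not call the package, and the values assigned ("Homo sapiens", "hsapiens_gene_ensembl") are illustrative assumptions, not settings taken from this documentation.

```r
# Illustrative values only (assumptions): define the globals that the
# default arguments look up.
assign("organism", "Homo sapiens", envir = .GlobalEnv)
assign("biomart_dataset", "hsapiens_gene_ensembl", envir = .GlobalEnv)

# The same expressions that appear as default arguments in Usage.
organism_default <- get("organism", mode = "character", envir = .GlobalEnv)
biomart_default <- get("biomart_dataset", envir = .GlobalEnv)

print(organism_default)
print(biomart_default)

# A quick guard before relying on the defaults: get() raises an error
# when the variable is missing, so checking first gives a clearer message.
globals_ready <- exists("organism", mode = "character", envir = .GlobalEnv) &&
  exists("biomart_dataset", envir = .GlobalEnv)
print(globals_ready)
```

Passing organism and biomart explicitly avoids depending on global state altogether.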

Arguments

df.rfp

ORFik experiment of Ribo-seq

df.rna

ORFik experiment of RNA-seq; set to NULL if you don't have RNA-seq

organism

scientific name of organism, like Homo sapiens, Danio rerio, etc.

biomart

character or NULL, default: get("biomart_dataset", envir = .GlobalEnv)

mode

character, default: "uORF". Alternatives: "aCDS" or "CDS". Selects whether to predict on uORFs or on artificial CDSs. If "aCDS", the step runs twice, once on full-length CDSs and once on truncated CDSs, to validate that the model works for short ORFs. "CDS" predicts on whole CDSs.

features

features to train the model on; any of the features created during ORFik::computeFeatures. Default: c("countRFP", "disengagementScores", "entropyRFP", "floss", "fpkmRFP", "ioScore", "ORFScores", "RRS", "RSS", "startCodonCoverage", "startRegionCoverage", "startRegionRelative")

max.artificial.length

integer, default: 100. Only applies if mode = "aCDS", so most users can ignore it. When creating artificial ORFs from CDSs, this controls how large the largest ORFs can be: the number is 1/6 of the maximum ORF size (maximum size 600 if max.artificial.length is 100). A random size is sampled from 6 up to that maximum. For example, if max.artificial.length is 2, you can get artificial ORFs of size 6, 9 or 12, that is 6, 6 + (3x1) and 6 + (3x2).

requiredActiveCds

numeric, default: 30. Number of CDSs that must be detected as active, i.e. the minimum size of the positive training set. The function aborts if the number of active CDSs is not bigger than this number.

BPPARAM

An instance of a BiocParallelParam class, e.g., MulticoreParam, SnowParam, DoparParam.
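The size rule described for max.artificial.length can be checked by hand for the documented case of 2. This is a minimal base R sketch under stated assumptions, not the package's own code: the variable names are hypothetical, the arithmetic is written only for the case spelled out in the description, and the use of sample() merely stands in for whatever sampling the package performs.

```r
# Documented case only: max.artificial.length = 2
max_artificial_length <- 2

# Sizes listed in the description: 6, 6 + (3x1), 6 + (3x2)
steps <- 0:2
sizes <- 6 + 3 * steps
print(sizes)  # 6 9 12

# The largest size is 6 times the parameter, matching the "1/6" statement.
largest_matches <- max(sizes) == 6 * max_artificial_length
print(largest_matches)

# Draw one size at random (illustrative; the package's sampling may differ).
set.seed(1)
picked <- sample(sizes, 1)
print(picked %in% sizes)
```

Every size is a multiple of 3, so each artificial ORF in this case covers a whole number of codons.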

Value

invisible(NULL)


Roleren/uORFomePipe documentation built on Jan. 14, 2024, 5:11 a.m.