makeSummarizedExperimentFromBam: Make a count matrix from a library or experiment
In Roleren/ORFik: Open Reading Frames in Genomics

Source: R/SummarizedExperiment_helpers.R

Description

Make a SummarizedExperiment / matrix object from BAM files or other library formats specified by the `lib.type` argument. Works like HTSeq, giving you count tables per library.

Usage

makeSummarizedExperimentFromBam(
  df,
  saveName = NULL,
  longestPerGene = FALSE,
  geneOrTxNames = "tx",
  region = "mrna",
  type = "count",
  lib.type = "ofst",
  weight = "score",
  forceRemake = FALSE,
  force = TRUE,
  library.names = bamVarName(df),
  BPPARAM = BiocParallel::SerialParam()
)

Arguments

`df`	an ORFik `experiment`
`saveName`	a character (default NULL). If set, save the experiment to the given path. It is always saved as .rds; adding the .rds extension is optional, as it is added for you if not present. Also used to load an existing file with that name.
`longestPerGene`	a logical (default FALSE). If FALSE, all transcript isoforms per gene are used; if TRUE, only the longest isoform per gene. Ignored if `region` is not one of the characters "mRNA", "tx", "cds", "leaders" or "trailers".
`geneOrTxNames`	a character vector (default "tx"). Should row names keep transcript names ("tx") or be changed to gene names ("gene")?
`region`	a character vector (default: "mrna"). Make raw count matrices of whole mRNAs or of one of "leaders", "cds" or "trailers". Can also be a `GRangesList`, which is then used directly as the region, for example uORFs or a subset of CDSs.
`type`	default: "count" (raw count matrix); alternatives are "fpkm", "log2fpkm" or "log10fpkm".
`lib.type`	a character (default: "ofst"). Use "default" to load the files in the experiment, or load a precomputed variant: "ofst", "bedo", "bedoc" or "pshifted". These are made with ORFik:::convertLibs() or shiftFootprintsByExperiment(). Can also be a custom user-made folder inside the experiment's bam folder.
`weight`	numeric or character, a column to score overlaps by. Default "score": checks for a metacolumn called "score" in the libraries; if it is not found, weights are not used.
`forceRemake`	logical, default FALSE. If TRUE, will not look for existing count table files.
`force`	logical, default TRUE. If TRUE, reload library files even if matching named variables are found in the environment used by the experiment (see `envExp`). A simple way to make sure the correct libraries are always loaded. FALSE is faster if the data is already loaded correctly.
`library.names`	character, default: bamVarName(df). Names under which the libraries are loaded into the environment, and names displayed in plots.
`BPPARAM`	how many cores/threads to use. Default: BiocParallel::SerialParam()
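The `type` options normalize the raw counts. As a rough illustration, the standard FPKM definition (fragments per kilobase of transcript per million mapped reads) is count * 1e9 / (transcript length * library size). The base-R sketch below assumes the library size is the column sum of the count matrix and uses a pseudocount of 1 for the log variant; both are assumptions, not confirmed details of ORFik's implementation, and `fpkm_sketch` is a hypothetical helper.

```r
# Sketch of FPKM normalization using the standard definition (not ORFik's code).
# Assumption: library size defaults to the column sums of the raw count matrix.
fpkm_sketch <- function(counts, lengths, lib_size = colSums(counts)) {
  # counts: numeric matrix, rows = transcripts, columns = libraries
  # lengths: transcript lengths in nucleotides, one per row of counts
  stopifnot(nrow(counts) == length(lengths), ncol(counts) == length(lib_size))
  per_length <- counts * 1e9 / lengths   # recycling divides row i by lengths[i]
  sweep(per_length, 2, lib_size, "/")    # divides column j by lib_size[j]
}

counts <- matrix(c(10, 0, 30,
                   20, 5, 15),
                 ncol = 2,
                 dimnames = list(c("tx1", "tx2", "tx3"), c("lib1", "lib2")))
lengths <- c(tx1 = 1000, tx2 = 500, tx3 = 3000)

fp <- fpkm_sketch(counts, lengths)
# Log variant; the pseudocount of 1 (to avoid log2(0)) is an assumption
log2fp <- log2(fp + 1)
```

Here tx3 has three times the reads of tx1 in lib1 but is also three times longer, so both get the same FPKM in that library.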

Details

If a txdb or GTF path is added, the result is a RangedSummarizedExperiment. NOTE: If the file called saveName exists, it is loaded instead of being remade (unless forceRemake = TRUE)!
There are different ways of counting hits on transcripts. ORFik counts them as pure coverage: if a single read aligns to a region shared by 2 genes, both get a count of 1 from that read. This is the safest way to avoid false negatives (genes with no assigned hits that actually have true hits).
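The saveName / forceRemake behaviour described in the NOTE is a load-or-compute cache. The base-R sketch below shows only that pattern; `load_or_make` and `make_counts` are hypothetical names, and the way ORFik actually checks for and names the file may differ.

```r
# Hypothetical helper sketching the saveName / forceRemake behaviour.
# Not ORFik's implementation; make_fun stands in for the real counting step.
load_or_make <- function(saveName, make_fun, forceRemake = FALSE) {
  if (is.null(saveName)) return(make_fun())   # nothing to cache
  if (!grepl("\\.rds$", saveName)) {
    saveName <- paste0(saveName, ".rds")      # ".rds" is appended if missing
  }
  if (file.exists(saveName) && !forceRemake) {
    return(readRDS(saveName))                 # an existing file wins
  }
  result <- make_fun()
  saveRDS(result, saveName)
  result
}

path <- tempfile("countTable")   # path given without extension
n_calls <- 0
make_counts <- function() {
  n_calls <<- n_calls + 1        # track how often counting really runs
  matrix(1:4, nrow = 2)
}

first  <- load_or_make(path, make_counts)                      # computes and saves
second <- load_or_make(path, make_counts)                      # loads from disk
third  <- load_or_make(path, make_counts, forceRemake = TRUE)  # recomputes
```

The second call never runs the counting step, which is why a stale file with the same saveName will silently be reused unless forceRemake = TRUE.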
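To make the pure-coverage rule concrete, the base-R sketch below counts reads on two overlapping genes using closed-interval overlaps. It also mirrors the `weight` argument: if the reads carry a "score" column, each read adds its score instead of 1, and if the column is missing, no weights are used. The data frames and the `count_pure_coverage` helper are hypothetical; ORFik itself works on GRanges objects, not data frames.

```r
# Base-R sketch of "pure coverage" counting (not ORFik's code).
# Each read adds to every gene it overlaps, so a read in a shared region counts twice.
genes <- data.frame(gene  = c("geneA", "geneB"),
                    start = c(100, 150),
                    end   = c(200, 300))
reads <- data.frame(start = c(120, 160, 250),
                    end   = c(140, 180, 270),
                    score = c(1, 3, 2))   # e.g. number of collapsed duplicate reads

count_pure_coverage <- function(genes, reads, weight = "score") {
  # Fall back to weight 1 per read when the weight column is absent (or weight = NULL)
  w <- if (!is.null(weight) && weight %in% names(reads)) reads[[weight]] else rep(1, nrow(reads))
  hits <- vapply(seq_len(nrow(genes)), function(i) {
    overlaps <- reads$start <= genes$end[i] & reads$end >= genes$start[i]
    sum(w[overlaps])
  }, numeric(1))
  setNames(hits, genes$gene)
}

unweighted <- count_pure_coverage(genes, reads, weight = NULL)
weighted   <- count_pure_coverage(genes, reads)   # uses the "score" column
```

The middle read (160-180) lies in the shared region 150-200, so it is counted for both genes: three reads give four unweighted counts in total.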

Value

a SummarizedExperiment object, or a data.table if "type" is not "count", with row names as transcript / gene names.

Examples

library(ORFik)
## Make experiment
df <- ORFik.template.experiment()
# makeSummarizedExperimentFromBam(df)
## Only cds (coding sequences):
# makeSummarizedExperimentFromBam(df, region = "cds")
## FPKM instead of raw counts on whole mrna regions
# makeSummarizedExperimentFromBam(df, type = "fpkm")
## Make count tables of pshifted libraries over uORFs
uorfs <- GRangesList(uorf1 = GRanges("chr23", 17599129:17599156, "-"))
# saveName <- file.path(dirname(df$filepath[1]), "uORFs", "countTable_uORFs")
# makeSummarizedExperimentFromBam(df, saveName, region = uorfs)
## To load the uORFs later
# countTable(df, region = "uORFs", count.folder = "uORFs")

Roleren/ORFik documentation built on April 12, 2025, 5:31 a.m.