matchGeneIDs: Match Gene IDs from query GTF/GFF3 file

View source: R/matchGeneIDs.R

Description

This function will match and correct Gene IDs from a query assembled transcript file, using a transcript annotation as reference.

By default, the correction relies on finding overlaps between transcripts in the query and transcripts in the reference. Used alone, this method can produce false-positive matches (a 19 percent false-positive rate). To improve on this, users can activate two additional layers of matching.

(1) Matching by Ensembl Gene IDs. If both the query and reference transcript annotations contain Ensembl-style Gene IDs, the function will try to match the two IDs in a less stringent manner. To invoke this correction, provide the 'primary_gene_id' argument.

(2) Matching by secondary Gene IDs. Depending on the transcript assembly program, GTF/GFF3 annotations may contain additional comments on the transcript information. These may include a distinct secondary Gene ID annotation that potentially matches the reference. To invoke this correction, provide both the 'primary_gene_id' and 'secondary_gene_id' arguments. To determine whether your transcript assembly contains possible secondary Gene IDs, import the query GTF file using the rtracklayer package and check its metadata columns.
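The metadata check for secondary Gene IDs can be sketched as follows. This is a minimal illustration, not part of the package: the single GTF record is made up, and the attribute name 'ref_gene_id' is taken from the example given for the 'secondary_gene_id' argument. With a real assembly, replace the temporary file with the path to your own query GTF.

```r
library(rtracklayer)

# Write one made-up GTF record to a temporary file (illustrative data only).
gtf_line <- paste(
  "chr1", "StringTie", "transcript", "100", "500", ".", "+", ".",
  'gene_id "MSTRG.1"; transcript_id "MSTRG.1.1"; ref_gene_id "ENSMUSG00000000001.4";',
  sep = "\t"
)
gtf_path <- tempfile(fileext = ".gtf")
writeLines(gtf_line, gtf_path)

# Import the file as a GRanges object and list its metadata column names.
query_gr <- import(gtf_path)
meta_cols <- colnames(S4Vectors::mcols(query_gr))
print(meta_cols)

# A column other than the primary 'gene_id' that holds gene identifiers
# (here 'ref_gene_id') is a candidate for 'secondary_gene_id'.
has_secondary <- "ref_gene_id" %in% meta_cols
```

If a column such as 'ref_gene_id' appears in the listing, its name is what would be passed as 'secondary_gene_id'.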

Usage

matchGeneIDs(
  inputGRanges,
  basicGRanges,
  primary_gene_id = NULL,
  secondary_gene_id = NULL
)
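
A minimal sketch of a call that enables both optional matching layers. It rests on several assumptions: that the function is exported by the ponder package, that 'inputGRanges' and 'basicGRanges' are GRanges objects imported with rtracklayer (the argument names suggest this, but it is not confirmed here), and that the query carries a 'ref_gene_id' column. The file names are hypothetical placeholders.

```r
# Hypothetical paths; replace with your own query and reference annotations.
query_path <- "query.gtf"
ref_path <- "reference.gtf"

# Run only if both packages and both files are available.
can_run <- requireNamespace("ponder", quietly = TRUE) &&
  requireNamespace("rtracklayer", quietly = TRUE) &&
  file.exists(query_path) && file.exists(ref_path)

if (can_run) {
  # Assumption: both inputs are GRanges objects.
  inputGRanges <- rtracklayer::import(query_path)
  basicGRanges <- rtracklayer::import(ref_path)

  matched <- ponder::matchGeneIDs(
    inputGRanges,
    basicGRanges,
    primary_gene_id = "gene_id",        # enables matching by Ensembl-style IDs
    secondary_gene_id = "ref_gene_id"   # also enables matching by secondary IDs
  )
}
```

Leaving both optional arguments as NULL keeps only the default overlap-based matching.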

Arguments

primary_gene_id

Name of the primary gene ID in the query file. Input to this argument is typically 'gene_id'.

secondary_gene_id

Name of the secondary gene ID in the query file. An example input to this argument is 'ref_gene_id'.

query

Mandatory. Path to query GTF/GFF3 transcript annotation file.

ref

Mandatory. Path to reference GTF/GFF3 transcript annotation file.

Value

The query GRanges object with matched Gene IDs.


fursham-h/ponder documentation built on Dec. 27, 2019, 12:15 a.m.