extractTranscriptModels: Extract transcript models from the GAF


View source: R/gafGeneModels.R

Description

This extracts a tab-delimited file of all canonical transcript models from the GAF. Each model (row) in the output file can be referred to uniquely by its transcript name. This is much simpler than extracting genes.

Usage

extractTranscriptModels(gaf, outFile = paste0(gaf, ".transcriptModels"),
  force = FALSE)

Arguments

gaf

The full path to the GAF file. Required.

outFile

The output filename. By default this is the GAF filename with ".transcriptModels" appended, so the file is created in the same directory as the GAF. An existing file will not be overwritten unless force = TRUE.

force

If the output file exists, setting this to TRUE allows it to be overwritten. Doing so generates a warning.

Details

This function is implemented using a Unix system command and requires the "grep" program, so it only works on Linux and Mac systems. [TODO - reimplement in pure R.]

The GAF version this works with is the one used for the TCGA RNAseq expression data files. It is available for download from the NCI either uncompressed (TCGA.hg19.June2011.gaf) or gzipped (TCGA.hg19.June2011.gaf.gz).
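Because the function depends on an external "grep" program, it can be useful to check for it before calling. The following is only a minimal sketch; the helper name canRunExtract is hypothetical and is not part of the package.

```r
# Hypothetical pre-flight check; not part of FusionExpressionPlot.
canRunExtract <- function() {
   # .Platform$OS.type is "unix" on Linux and Mac, "windows" on Windows
   isUnix <- identical(.Platform$OS.type, "unix")
   # Sys.which() returns "" when the program is not found on the PATH
   hasGrep <- unname(nzchar(Sys.which("grep")))
   isUnix && hasGrep
}

if (!canRunExtract()) {
   message("extractTranscriptModels() needs a Unix-like system with grep on the PATH.")
}
```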

Value

The number of transcripts written to the output file (should be 73,707).

GAF transcript names

Transcript names correspond to the UCSCgene.Dec2009 release. Transcript names are unique.

Errors

These errors are fatal and will terminate processing.

Unsafe character in GAF filename!

An invalid character was passed as part of the GAF filename. This matters because the filename is used as a parameter in a system command and could otherwise be used for command injection.

Can't find the specified GAF: "file"

The specified GAF doesn't seem to exist on the file system. You probably have the name wrong or are using a relative name from the wrong directory, but it could also be that permissions are hiding the file.

Unsafe character in output transcriptModel filename!

An invalid character was passed as part of the output filename. This matters because the filename is used as a parameter in a system command and could otherwise be used for command injection.

Output file already exists; use force= TRUE to overwrite: "file"

The specified GAF transcript extract output file already exists. You probably don't want to overwrite it. However, you can set force= TRUE to allow this. It will still generate a warning.
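The checks behind the errors above might look roughly like the sketch below. This is an illustration only: the helper name checkExtractArgs, the set of allowed characters, and the order of the checks are all assumptions, and the package's actual rules may differ.

```r
# Hypothetical re-creation of the documented checks; not the package's code.
checkExtractArgs <- function(gaf, outFile = paste0(gaf, ".transcriptModels"),
                             force = FALSE) {
   # Assumed whitelist: letters, digits, and the characters . _ / ~ -
   safe <- "^[A-Za-z0-9._/~-]+$"
   if (!grepl(safe, gaf)) {
      stop("Unsafe character in GAF filename!")
   }
   if (!file.exists(gaf)) {
      stop("Can't find the specified GAF: \"", gaf, "\"")
   }
   if (!grepl(safe, outFile)) {
      stop("Unsafe character in output transcriptModel filename!")
   }
   if (file.exists(outFile)) {
      if (!force) {
         stop("Output file already exists; use force= TRUE to overwrite: \"",
              outFile, "\"")
      }
      # Overwriting was explicitly allowed, so warn rather than fail
      warning("Forcing overwrite of output file: \"", outFile, "\"")
   }
   invisible(TRUE)
}
```

Rejecting anything outside a small whitelist is generally safer than trying to list every dangerous shell character, which is presumably why the function fails on such names rather than attempting to quote them.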

Warnings

Forcing overwrite of output file: "file"

Just letting you know that an existing file is actually being overwritten. This won't happen unless explicitly allowed by setting force= TRUE. Having a warning makes it possible to distinguish between cases where an overwrite occurred and those where one was allowed but did not occur.

Various warnings from failed system commands

System commands are used for several things in this function. If they fail, error messages are returned as warnings.
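To tell programmatically whether an overwrite actually happened, the warnings can be collected rather than printed. The sketch below uses base R's withCallingHandlers; the helper name collectWarnings is hypothetical, and a stand-in expression replaces the real call so that the code runs without the package or a GAF.

```r
# Hypothetical helper: evaluate an expression, collecting any warnings
# instead of printing them, and return both the value and the messages.
collectWarnings <- function(expr) {
   msgs <- character(0)
   value <- withCallingHandlers(
      expr,
      warning = function(w) {
         msgs <<- c(msgs, conditionMessage(w))
         invokeRestart("muffleWarning")
      }
   )
   list(value = value, warnings = msgs)
}

# Stand-in for a real call; the value 42L is arbitrary.
res <- collectWarnings({
   warning("Forcing overwrite of output file: \"out.transcriptModels\"")
   42L
})

# TRUE only if the overwrite warning was actually raised
overwritten <- any(grepl("^Forcing overwrite", res$warnings))
```

With the package loaded, the stand-in block would be replaced by a real call such as extractTranscriptModels( 'path/to/GAF', force= TRUE ).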

Todo

Examples

## Not run: 

# Extract transcripts to default output file
count <- extractTranscriptModels( 'path/to/GAF' )

# Same, with all defaults made explicit
count <- extractTranscriptModels(
   gaf= 'path/to/GAF', outFile= 'path/to/GAF.transcriptModels', force= FALSE
)

# Extract transcripts to gaf.transcripts in run directory
count <- extractTranscriptModels( 'path/to/GAF', 'gaf.transcripts' )

# Overwrite outFile if it exists (here using the default name)
count <- extractTranscriptModels( 'path/to/GAF', force= TRUE )

## End(Not run)

jefferys/FusionExpressionPlot documentation built on May 19, 2019, 3:59 a.m.