This extracts a tab-delimited file of all canonical transcript models from the GAF. Each model (row) in the output file can be referred to uniquely by transcript name. This is much simpler than extracting genes.
The full-path name of the GAF [REQ]
The output filename. By default this is the same as the
GAF, with ".transcriptModels" appended (hence it will be created in the same
directory by default). This will not overwrite an existing file unless force= TRUE.
If the output file exists, setting this to TRUE allows it to be overwritten (a warning is still generated).
This function is implemented using a Unix system command and requires the "grep" program, so it only works on Linux/Mac systems. [TODO - reimplement in pure R.] The GAF version this works with is the version used for the TCGA RNAseq expression data files. It is available for download from the NCI, either uncompressed (TCGA.hg19.June2011.gaf) or gzipped (TCGA.hg19.June2011.gaf.gz).
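Because the function depends on an external "grep" program, it can help to confirm that one is on the PATH before calling it. The following is a minimal sketch, not part of the package: hasProgram is a hypothetical helper name, and it relies only on base R's Sys.which().

```r
# Hypothetical helper (not part of the package): TRUE if `program` is on the PATH.
hasProgram <- function(program) {
  # Sys.which() returns "" for a program it cannot find
  unname(nzchar(Sys.which(program)))
}

if (!hasProgram("grep")) {
  message("grep was not found on the PATH; ",
          "extractTranscriptModels() is unlikely to work on this system.")
}
```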
The number of transcripts in the output file (should be 73,707)
Transcript names correspond to the UCSCgene.Dec2009 release. Transcript names are unique.
These errors are fatal and will terminate processing.
Unsafe character in GAF filename!
An invalid character was passed as part of the GAF filename. This matters because the filename is used as a parameter in a system command and could be used for command injection.
Can't find the specified GAF: "file"
The specified GAF doesn't seem to exist on the file system. Probably have the name wrong or are using a relative name from the wrong directory, but could also be that permissions are hiding it.
Unsafe character in output transcriptModel filename!
An invalid character was passed as part of the output filename. This matters because the filename is used as a parameter in a system command and could be used for command injection.
Output file already exists; use force= TRUE to overwrite: "file"
The specified GAF transcript extract output file already exists. You
probably don't want to overwrite it. However, you can set
force= TRUE to allow this. It will still generate a warning.
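The "unsafe character" errors above guard against command injection through the filenames. The set of characters the function actually rejects is not documented here, so the following is only a sketch of one conservative approach: isSafeFilename is a hypothetical helper, and the whitelist (letters, digits, "_", ".", "/", "-") is an assumption, not the package's rule.

```r
# Hypothetical check (not the package's own code): accept only a single,
# non-empty string made of letters, digits, "_", ".", "/" and "-".
isSafeFilename <- function(filename) {
  is.character(filename) && length(filename) == 1 && !is.na(filename) &&
    grepl("^[A-Za-z0-9_./-]+$", filename)
}

isSafeFilename("path/to/TCGA.hg19.June2011.gaf")  # TRUE
isSafeFilename("gaf; rm -rf ~")                   # FALSE: contains ";", spaces and "~"
```

A whitelist like this is stricter than necessary (it rejects spaces, for example), but it is easier to reason about than a blacklist of shell metacharacters.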
Forcing overwrite of output file: "file"
Just letting you know an existing file is actually being overwritten.
This won't happen unless explicitly allowed by setting
force= TRUE. Having a warning allows distinguishing between the cases where
an overwrite occurred vs those where one was allowed but did not occur.
System commands are used for several things in this function. If they fail, error messages are returned as warnings.
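The interaction between an existing output file and force described above can be sketched as follows. This is an illustration of the documented behaviour, not the package's code; checkOutFile is a hypothetical helper that reuses the message texts from this page.

```r
# Hypothetical helper (not the package's own code) mirroring the documented
# error and warning for an existing output file.
checkOutFile <- function(outFile, force = FALSE) {
  if (file.exists(outFile)) {
    if (!force) {
      # Fatal: refuse to overwrite unless explicitly allowed
      stop('Output file already exists; use force= TRUE to overwrite: "',
           outFile, '"')
    }
    # Overwrite allowed and actually happening: warn so it is visible
    warning('Forcing overwrite of output file: "', outFile, '"')
  }
  invisible(outFile)
}
```

With this shape, force= TRUE on a file that does not yet exist produces no warning, which is what lets a caller distinguish an overwrite that occurred from one that was merely allowed.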
Add test for corner case: one-exon transcript.
## Not run: 
# Extract transcripts to default output file
count <- extractTranscriptModels( 'path/to/GAF' )

# Same, with all defaults made explicit
count <- extractTranscriptModels(
    gaf= 'path/to/GAF',
    outFile= 'path/to/GAF.transcriptModels',
    force= FALSE
)

# Extract transcripts to gaf.transcripts in run directory
count <- extractTranscriptModels( 'path/to/GAF', 'gaf.transcripts' )

# Overwrite outFile if it exists (here using the default name)
count <- extractTranscriptModels( 'path/to/GAF', force= TRUE )

## End(Not run)