toGRanges | R Documentation |
Transforms a file or an object containing a region set into a
GRanges
object.
toGRanges(A, ..., genome=NULL, sep=NULL, comment.char="#")
A |
a |
... |
further arguments to be passed to other methods. |
genome |
(character or BSgenome) The genome info to be attached to the created GRanges. If NULL no genome info will be attached. (defaults to NULL) |
sep |
(character) The field separator in the text file. If NULL it will be automatically guessed. Only used when reading some file formats. (Defaults to NULL) |
comment.char |
(character) The character marking comment lines. Only used when reading some file formats. (Defaults to "#") |
If A is already a GRanges
object, it will be returned untouched.
If A is a data frame, the function will assume the first three columns are
chromosome, start and end and create a GRanges
object. Any
additional column will be considered metadata and stored as such in the
GRanges
object. There are 2 special cases: 1) if A is a
data.frame with only 2 columns, it will assume the first one is the
chromosome and the second one the position and it will create a GRanges with
single base regions and 2) if the data.frame has the first 3 columns named
"SNP", "CHR" and "BP" it will shuffle the columns and repeat "BP" to build
a GRanges of single base regions (this is the standard ouput format of plink).
If A is not a data.frame and there are more parameters, it will try to build
a data.frame with all parameters and use that data.frame to build the
GRanges. This allows the user to call it like
toGRanges("chr1", 10, 20)
.
If A is a character or a character vector and it's not a file or a URL, it assumes it's a genomic position description in the form used by UCSC or IGV, "chr2:1000-2000". It will try to parse the character strings into chromosome, start and end and create a GRanges. The parser can deal with commas separating thousands (e.g. "chr2:1,000-2,000") and with the comma used as a start/end separator (e.g. "chr2:1000,2000"). These different variants can be mixed in the same character vector.
If A is a "SimpleRleList" it will be interpreted as the result from GenomicRanges::coverage and the function will return a GRanges with a single metadata column named "coverage".
If A is a file name (local or remote) or a connection to a file, it will try
to load it in different ways:
* BED files (identified by a "bed" extension): will be loaded using
rtracklayer::import
function. Coordinates are 0 based as
described in the BED specification (https://genome.ucsc.edu/FAQ/FAQformat.html#format1).
* PLINK assoc files (identified by ".assoc", ".assoc.fisher",
".assoc.dosage", ".assoc.linear", ".assoc.logistic"): will be loaded
as single-base ranges with all original columns present and the SNPs ids
as the ranges names
* Any other file: It assumes the file is a "generic" tabular file. To load
it it will ignore any header line starting with comment.char
,
autodetect the field separator (if not provided by the user),
autodetect if it has a header and read it accordingly.
The genome
parameter can be used to set the genome information of
the created GRanges. It can be either a BSgenome
object or a
character string defining a genome (e.g. "hg19", "mm10"...) as accepted
by the BSgenome::getBSgenome
function. If a valid genome is
given and the corresponding BSgenome package is installed, the genome
information will be attached to the GRanges. If the chromosome naming style
from the GRanges and the genome object are different, it will try to change
the GRanges styles to match those of the genome using
GenomeInfoDb::seqlevelsStyle
.
A GRanges
object with the regions in A
**IMPORTANT:** Regarding the coordinates, BED files are 0 based
while data.frames
and generic files are treated as 1 based. Therefore
reading a line "chr9 100 200" from a BED file will create a 99 bases wide
interval starting at base 101 and ending at 200 but reading it from a txt
file or from a data.frame
will create a 100 bases wide interval
starting at 100 and ending at 200. This is specially relevant in 1bp
intervals. For example, the 10th base of chromosome 1 would be
"chr1 9 10" in a BED file and "chr1 10 10" in a txt file.
toDataframe
A <- data.frame(chr=1, start=c(1, 15, 24), end=c(10, 20, 30), x=c(1,2,3), y=c("a", "b", "c"))
gr1 <- toGRanges(A)
#No need to give the data.frame columns any specific name
A <- data.frame(1, c(1, 15, 24), c(10, 20, 30), x=c(1,2,3), y=c("a", "b", "c"))
gr2 <- toGRanges(A)
#We can pass the data without building the data.frame
gr3 <- toGRanges("chr9", 34229289, 34982376, x="X")
#And each argument can be a vector (they will be recycled as needed)
gr4 <- toGRanges("chr9", c(34229289, 40000000), c(34982376, 50000000), x="X", y=c("a", "b"))
#toGRanges will automatically convert the second and third argument into numerics
gr5 <- toGRanges("chr9", "34229289", "34982376")
#It can be a file from disk
bed.file <- system.file("extdata", "my.special.genes.txt", package="regioneR")
gr6 <- toGRanges(bed.file)
#Or a URL to a valid file
#gr7 <- toGRanges("http://path.to/myfile.bed")
#It can also parse genomic location strings
gr8 <- toGRanges("chr9:34229289-34982376")
#more than one
gr9 <- toGRanges(c("chr9:34229289-34982376", "chr10:1000-2000"))
#even with strange and mixed syntaxes
gr10 <- toGRanges(c("chr4:3873-92928", "chr4:3873,92928", "chr5:33,444-45,555"))
#if the genome is given it is used to annotate the resulting GRanges
gr11 <- toGRanges(c("chr9:34229289-34982376", "chr10:1000-2000"), genome="hg19")
#and the genome is added to the GRanges even if A is a GRanges
gr12 <- toGRanges(gr6, genome="hg19")
#And it will change the chromosome naming of the GRanges to match that of the
#genome if it is possible (using GenomeInfoDb::seqlevelsStyle)
gr2
gr13 <- toGRanges(gr2, genome="hg19")
#in addition, it can convert other objects into GRanges such as the
#result of GenomicRanges::coverage
gr14 <- toGRanges(c("1:1-20", "1:5-25", "1:18-40"))
cover <- GenomicRanges::coverage(gr14)
gr15 <- toGRanges(cover)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.