Convert a data.frame with positional information to RangedData/GRanges

Description

Convert a data.frame containing chromosome and position information to a RangedData or GRanges object. The position information is assumed to be in columns named 'chr', 'start' and 'end' (not case sensitive), although alternative column names can be supplied for each as parameters. 'seqnames' is detected automatically as an alternative to 'chr' if present. If there is a column 'pos' but no 'start' or 'end' columns, it is detected automatically without changing the default parameters, and start and end are both set to pos (i.e., SNPs). Columns whose names match default GRanges slot names, such as 'seqnames', 'ranges', 'strand', 'seqlevels', etc., are removed during conversion, so rename them if you want them carried over into the resulting object.
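The column detection rules described above can be sketched in base R. This is only an illustration of the documented behaviour, not the package's actual implementation; the helper names find_col and resolve_position_cols are hypothetical.

```r
# Hypothetical helper: return the first column of 'dat' whose name matches
# one of 'candidates', ignoring case (NA if none match).
find_col <- function(dat, candidates) {
  hit <- match(tolower(candidates), tolower(colnames(dat)))
  hit <- hit[!is.na(hit)]
  if (length(hit) == 0) return(NA_character_)
  colnames(dat)[hit[1]]
}

# Hypothetical helper: resolve the chr/start/end columns as described above.
# 'seqnames' is accepted as a fallback for 'chr', and 'pos' is used for both
# start and end when neither a start nor an end column is found.
resolve_position_cols <- function(dat, chr = "chr", start = "start", end = "end") {
  chr_col   <- find_col(dat, c(chr, "seqnames"))
  start_col <- find_col(dat, start)
  end_col   <- find_col(dat, end)
  if (is.na(start_col) && is.na(end_col)) {
    start_col <- end_col <- find_col(dat, "pos")
  }
  c(chr = chr_col, start = start_col, end = end_col)
}

# SNP-style input: mixed-case names and a single position column
snps <- data.frame(CHR = c(1, 2), Pos = c(100, 200))
resolve_position_cols(snps)
```

With the SNP-style input, both the start and end entries resolve to the 'Pos' column, which corresponds to the start = end = pos case.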

Usage

df.to.ranged(dat, ids = NULL, start = "start", end = "end",
  width = NULL, chr = "chr", exclude = NULL, build = NULL,
  GRanges = FALSE, fill.missing = TRUE)

Arguments

dat

a data.frame with chromosome and position information

ids

character string, an optional column name containing ids which will be used for rownames in the new object, as long as the ids are unique. If not, this option is overridden and the ids will simply be a normal column in the new object.
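The 'ids' rule can be illustrated with a short base R sketch. The helper apply_ids is hypothetical, and dropping the id column once it has become the rownames is an assumption, not something this page confirms.

```r
# Hypothetical helper mirroring the 'ids' rule: use the column as rownames
# only when its values are unique, otherwise leave it as an ordinary column.
apply_ids <- function(dat, ids = NULL) {
  if (is.null(ids) || !ids %in% colnames(dat)) return(dat)
  id_values <- as.character(dat[[ids]])
  if (!anyNA(id_values) && anyDuplicated(id_values) == 0) {
    rownames(dat) <- id_values
    dat[[ids]] <- NULL  # assumption: the ids now live only in the rownames
  }
  dat
}

unique_ids <- data.frame(chr = c(1, 2), id = c("a", "b"), stringsAsFactors = FALSE)
rownames(apply_ids(unique_ids, "id"))

dup_ids <- data.frame(chr = c(1, 2), id = c("a", "a"), stringsAsFactors = FALSE)
colnames(apply_ids(dup_ids, "id"))  # 'id' stays as a normal column
```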

start

character, the name of a column in the data.frame containing the start point of each range. Not case sensitive. In the case of SNP data, a column called 'pos' will also be automatically detected without modifying 'start' or 'end', and will be used for both start and end.

end

character, the name of a column in the data.frame containing the end point of each range. 'width' can be used as an alternative specifier, in which case 'end' should be set to NULL. Not case sensitive. In the case of SNP data, a column called 'pos' will also be automatically detected without modifying 'start' or 'end', and will be used for both start and end.

width

character, the name of a column in the data.frame containing the 'width' of each range, e.g., SNPs would have width=0. This is optional, with 'start' and 'end' being the default way to specify an interval. If using 'width' you must also set 'end' to NULL. Not case sensitive.
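As a rough illustration of the width convention, the sketch below derives an end point from start and width. It assumes end = start + width, which is inferred from the statement that SNPs have width 0; the helper name ranges_from_width is hypothetical and the package's internal arithmetic may differ.

```r
# Hypothetical helper: derive end coordinates from start and width,
# assuming the convention implied above (a SNP has width 0, so end == start).
ranges_from_width <- function(start, width) {
  data.frame(start = start, end = start + width)
}

# A SNP at position 100 and a 25-base interval starting at 500
ranges_from_width(start = c(100, 500), width = c(0, 25))
```

This differs from the IRanges convention, where width = end - start + 1 and a single-base range has width 1, so check the result when moving widths between the two.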

chr

character, the name of the column in the data.frame containing chromosome values. The default is 'chr' but 'seqnames' will also be detected automatically even when chr='chr'. Not case sensitive.

exclude

character string, any column names from the data.frame to NOT include in the resulting S4 object.

build

the UCSC build for the resulting object, which will be applied to the 'universe' slot (RangedData) or the 'genome' slot (GRanges) of the new object.

GRanges

logical, whether the resulting object should be GRanges (TRUE), or RangedData (FALSE)

fill.missing

logical. GRanges/RangedData objects cannot handle missing chromosomes or positions, so if TRUE, missing values are replaced with chr99 and start=end=1; if FALSE, any row with a missing value is excluded from the resulting object.
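The two fill.missing behaviours can be sketched in base R as follows. The helper handle_missing is hypothetical, it expects columns already named 'chr', 'start' and 'end', and it assumes that a row with any missing positional value has all three fields replaced; the package may treat partially missing rows differently.

```r
# Hypothetical helper illustrating fill.missing.
handle_missing <- function(dat, fill.missing = TRUE) {
  bad <- is.na(dat$chr) | is.na(dat$start) | is.na(dat$end)
  if (fill.missing) {
    dat$chr <- as.character(dat$chr)  # so the placeholder label can be stored
    dat$chr[bad]   <- "chr99"
    dat$start[bad] <- 1
    dat$end[bad]   <- 1
    dat
  } else {
    dat[!bad, , drop = FALSE]         # drop incomplete rows instead
  }
}

d <- data.frame(chr = c("chr1", NA, "chr3"),
                start = c(10, 20, NA),
                end = c(15, 25, 40),
                stringsAsFactors = FALSE)
handle_missing(d, fill.missing = TRUE)   # rows 2 and 3 become chr99:1-1
handle_missing(d, fill.missing = FALSE)  # only row 1 is kept
```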

Value

A RangedData or GRanges object. If 'dat' doesn't use the default column names 'chr', 'start'/'end' or 'pos', specify the names using the parameters 'chr', 'start', and 'end' or 'width'. 'exclude' prevents the specified columns of 'dat' from being carried over to the returned object. 'build' specifies the 'genome' slot of the resulting object. 'ids' allows specification of a column to be converted to the rownames of the new object.

See Also

ranged.to.data.frame, df.to.ranged

Examples

# simulate 10 SNPs: one chromosome and one position each (start == end)
chr <- sample(1:22,10)
start <- end <- sample(1000000,10)
# build data.frames (cbind() would return a matrix, but 'dat' should be a data.frame)
df1 <- data.frame(CHR=chr,Start=start,enD=end)
print(df1)
df.to.GRanges(df1) # not case sensitive!
width <- rep(0,10)
df2 <- data.frame(chr,start,width)
df.to.GRanges(df2,end=NULL,width="width") # define ranges with start and width
id.col <- paste0("ID",1:10)
rs.id <- paste0("rs",sample(10000,10))
df3 <- data.frame(chr,start,end,id.col,rs.id)
df.to.GRanges(df3) # additional columns kept
df4 <- data.frame(chr,start,end,id.col,rs.id, ranges=1:10)
df.to.GRanges(df4) # 'ranges' column excluded as illegal name
df.to.GRanges(df4, exclude="rs.id") # manually exclude column
df5 <- data.frame(chr,start,end,rs.id)
rownames(df5) <- paste0("ID",1:10)
df.to.GRanges(df5) # rownames are kept
df.to.GRanges(df4,ids="id.col") # use column of 'dat' for rownames