Functionality to create SQLite databases from several types of VCF files, including those that represent the genetic diversity of populations (e.g. 1000 Genomes) or inbred strains (Sanger Mouse Genomes Project).
populate.db.tbl.schema.list(db.con, db.schema, ins.vals, use.tables = NULL, should.debug = FALSE)
make.vcf.table(db.schema, window.size, vcf.name, db.con, probe.grange, vcf.type="SNV", use.tables=NULL, limit=NULL, should.debug=FALSE, vcf.param=NULL, filter.func=NULL, filter.params=list())
create.sanger.mouse.vcf.db(vcf.files, vcf.labels, probe.tab.file, strain.names, bs.genome, db.schema, db.name="test.db", keep.category="main", window.size=1000, max.mismatch=1, limit.chr=NULL, should.debug=FALSE, package.info=NULL)
db.con
A connection to an SQLite database.
db.schema
An object of class
ins.vals
List or other type of data to be inserted into the database using the function in the
use.tables
The subset of tables the procedure should be limited to, or NULL.
should.debug
Logical indicating whether additional messages should be displayed to the user.
window.size
The number of aligned probes to be processed at a given time.
vcf.name
Name of the VCF file.
strain.names
A character vector containing the genotype columns to be used.
probe.grange
GRanges object containing the probe alignment information.
vcf.type
Label associated with the VCF file, usually 'SNV' or 'INDEL'.
limit
Number of iterations to limit processing to, for testing purposes.
vcf.param
An object of class
filter.func
A function to be used to filter a list as returned by
filter.params
A list containing named elements with any additional values to be passed to
vcf.files
A character vector containing the path to one or more VCF files.
vcf.labels
A character vector containing the desired label for each VCF file (e.g. 'SNV' or 'INDEL').
probe.tab.file
A character vector containing the path to the tab-delimited probe sequence file distributed by Affymetrix.
bs.genome
An object of class
db.name
A character vector containing the name of the output database.
keep.category
Categories of probes to keep from the
max.mismatch
Maximum number of mismatches to allow in the realignments.
limit.chr
A character vector containing the chromosomes to consider, or NULL, in which case all available chromosomes from the specified genome will be used.
package.info
Either NULL or a named list containing the following elements: "AUTHOR", "AUTHOREMAIL", "BOWTIE_PATH", "GENOME_PATH", "VCF_QUERY_CMD", "VCF_TYPE". If NULL is supplied, a database will be generated at the path specified in db.name. If the list is provided, a package will be generated containing the database along with other relevant metadata supplemented by the values supplied in package.info. AUTHOR and AUTHOREMAIL give the author of the package and their email address. BOWTIE_PATH, GENOME_PATH and VCF_QUERY_CMD give the path to a bowtie executable, the genome used to create the database, and the path to a vcf-query tool from vcftools or htslib. These three are not necessary unless the package is to be tested afterwards; they can be set to an arbitrary string and added later. VCF_TYPE is simply a description of the type of VCF database created, for instance NODvB6 or CC.
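As an illustration of the package.info argument, the following is a minimal sketch of a list with the six element names given above. Every value is a placeholder, and the name check at the end (including the variable names required.names and missing.names) is an assumption added for this sketch; it is not known whether create.sanger.mouse.vcf.db performs a similar check itself.

```r
# All values are placeholders. Per the description above, the path and command
# entries may be arbitrary strings unless the package is to be tested afterwards.
package.info <- list(
    AUTHOR = "First Last",
    AUTHOREMAIL = "first.last@example.org",
    BOWTIE_PATH = "/path/to/bowtie",
    GENOME_PATH = "/path/to/genome",
    VCF_QUERY_CMD = "/path/to/vcf-query",
    VCF_TYPE = "NODvB6"
)

# Hypothetical sanity check (not part of the package): confirm that every
# documented element name is present before passing the list on.
required.names <- c("AUTHOR", "AUTHOREMAIL", "BOWTIE_PATH",
                    "GENOME_PATH", "VCF_QUERY_CMD", "VCF_TYPE")
missing.names <- setdiff(required.names, names(package.info))
if (length(missing.names) > 0)
{
    stop("package.info is missing: ", paste(missing.names, collapse = ", "))
}
```

The list would then be supplied as the package.info argument; leaving it at its default of NULL produces only the database at db.name.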
Daniel Bottomly
scanVcf, TableSchemaList, ScanVcfParam, BSgenome
if (require(BSgenome.Mmusculus.UCSC.mm9))
{
    #example VCF and probe sequence files
    vcf.files <- om.vcf.file()
    vcf.labels <- "SNV"
    probe.tab.file <- om.tab.file()
    #seven strains besides the reference strain that are used in the CC
    strain.names <- c("CASTEiJ", "AJ", "PWKPhJ", "129S1", "NZO", "NODShiLtJ", "WSBEiJ")
    bs.genome <- BSgenome.Mmusculus.UCSC.mm9
    #probably shouldn't do this in real analyses unless you have to
    seqlevels(bs.genome) <- sub("chr", "", seqlevels(bs.genome))
    db.schema <- SangerTableSchemaList()
    #write the database to a temporary file
    db.name <- tempfile()
    keep.category <- "main"
    window.size <- 1000
    max.mismatch <- 0
    should.debug <- TRUE
    #duplications in this file, remove and recreate...
    tab.aln <- read.delim(om.lo.file(), sep="\t", header=TRUE, stringsAsFactors=FALSE)
    #restrict processing to a region of chromosome 19
    limit.chr <- GRanges(seqnames="19", ranges=IRanges(start=min(tab.aln$start), end=max(tab.aln$end)/4), strand="*")
    create.sanger.mouse.vcf.db(vcf.files, vcf.labels, probe.tab.file, strain.names, bs.genome, db.schema, db.name, keep.category, window.size, max.mismatch, limit.chr, should.debug)
    #inspect the first few probe/variant overlaps in the new database
    db.con <- dbConnect(SQLite(), db.name)
    var.ovls <- dbGetQuery(db.con, "SELECT * FROM probe_to_snp NATURAL JOIN reference NATURAL JOIN probe_align NATURAL JOIN probe_info limit 5")
    var.ovls
    dbDisconnect(db.con)
}
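The query in the example above chains several NATURAL JOIN clauses, which match rows on every column name the joined tables have in common. A small sketch on an in-memory SQLite database may make that behaviour easier to see. The two toy tables reuse table names from the query, but their columns (probe_id, probe_name, snp_pos) and contents are hypothetical and are not the schema produced by create.sanger.mouse.vcf.db. The sketch assumes the DBI and RSQLite packages are installed.

```r
# Toy illustration only: column names and data are hypothetical.
if (requireNamespace("DBI", quietly = TRUE) &&
    requireNamespace("RSQLite", quietly = TRUE))
{
    toy.con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")

    probe.info <- data.frame(probe_id = 1:3,
                             probe_name = c("p1", "p2", "p3"),
                             stringsAsFactors = FALSE)
    probe.to.snp <- data.frame(probe_id = c(1L, 1L, 3L),
                               snp_pos = c(5L, 12L, 20L))

    DBI::dbWriteTable(toy.con, "probe_info", probe.info)
    DBI::dbWriteTable(toy.con, "probe_to_snp", probe.to.snp)

    # The only shared column name is probe_id, so rows are matched on it.
    # Probe 2 has no variant row and therefore drops out of the result.
    toy.ovls <- DBI::dbGetQuery(toy.con,
        "SELECT * FROM probe_to_snp NATURAL JOIN probe_info")
    print(toy.ovls)

    DBI::dbDisconnect(toy.con)
}
```

Because NATURAL JOIN relies on column names alone, it is worth checking which columns the real tables share before adapting the query.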