get_vcf: This function is from the 'proxysnps' package available at...


View source: R/extraction_1KG.R

Description

Get data for a genomic region from a remote VCF file.

Usage

get_vcf(chrom, start, end, pop = NA)

Arguments

chrom

a chromosome name (1-22 or X) without the "chr" prefix

start

a positive integer indicating the start of a genomic region

end

a positive integer indicating the end of a genomic region

pop

the name of a 1000 Genomes population or super-population (for example AFR, AMR, EAS, EUR, SAS); see Details for the full list

Details

Returns a list with three dataframes for individuals, SNPs, and genotypes.

Currently, the function is hard-coded to access 1000 Genomes Phase 3 data hosted by Brian Browning (author of BEAGLE):

http://bochet.gcc.biostat.washington.edu/beagle/1000_Genomes_phase3_v5a/

This implementation discards multi-allelic markers that have a "," in the ALT column.
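The filtering rule above can be illustrated with a minimal sketch. It is not the package's own code: the `meta` data frame below is a made-up stand-in for the VCF columns, and the variable names are hypothetical.

```r
# Toy stand-in for some of the VCF meta columns (all values are invented).
meta <- data.frame(
  CHROM = c("12", "12", "12"),
  POS   = c(533100L, 533200L, 533300L),
  ID    = c("rs1", "rs2", "rs3"),
  REF   = c("A", "G", "C"),
  ALT   = c("G", "A,T", "T"),
  stringsAsFactors = FALSE
)

# A comma in ALT means the marker has more than one alternate allele.
is_multiallelic <- grepl(",", meta$ALT, fixed = TRUE)

# Keep only the bi-allelic markers.
meta_biallelic <- meta[!is_multiallelic, , drop = FALSE]
```

Here the second marker, whose ALT value is "A,T", is dropped and the other two are kept.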

The pop argument can be any of these populations: ACB, ASW, BEB, CDX, CEU, CHB, CHS, CLM, ESN, FIN, GBR, GIH, GWD, IBS, ITU, JPT, KHV, LWK, MSL, MXL, PEL, PJL, PUR, STU, TSI, YRI. It can also be any of these super-populations: AFR, AMR, EAS, EUR, SAS.

Find more details here: http://www.1000genomes.org/faq/which-populations-are-part-your-study
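Because pop accepts either a population or a super-population code, selecting individuals presumably amounts to matching against both columns of the ind data frame. The sketch below shows one way this could work. The helper `select_individuals` is hypothetical, the sample IDs are invented, and treating `NA` as "no filtering" is an assumption, not documented behaviour.

```r
# Toy individuals table using column names from the 'ind' data frame
# (sample IDs S1..S3 are invented).
ind <- data.frame(
  Individual.ID   = c("S1", "S2", "S3"),
  Population      = c("GBR", "LWK", "CHB"),
  SuperPopulation = c("EUR", "AFR", "EAS"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep individuals whose population or
# super-population code equals 'pop'.
select_individuals <- function(ind, pop = NA) {
  # Assumption: NA means "do not filter by population".
  if (is.na(pop)) {
    return(ind)
  }
  keep <- ind$Population == pop | ind$SuperPopulation == pop
  ind[keep, , drop = FALSE]
}

afr <- select_individuals(ind, "AFR")  # matches via SuperPopulation
gbr <- select_individuals(ind, "GBR")  # matches via Population
```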

Value

A list with three dataframes:

ind

A dataframe with information about individuals: Family.ID, Individual.ID, Paternal.ID, Maternal.ID, Gender, Population, Relationship, Siblings, Second.Order, Third.Order, Other.Comments, SuperPopulation

meta

First 8 columns of the VCF file: CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO

geno

Columns 10 onward of the VCF file. Each genotype is split into its two alleles, and each allele is coded as 0 (REF) or 1 (ALT). This dataframe therefore has two columns for each individual.
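To make the two-columns-per-individual layout concrete, here is a minimal sketch of the conversion on toy data. It assumes phased genotype strings of the form "0|1"; the sample names, the column order, and the "_1"/"_2" suffixes are illustrative assumptions and may differ from what get_vcf actually produces.

```r
# Toy phased genotypes: rows are markers, columns are individuals
# (marker and sample names are invented).
gt <- matrix(
  c("0|0", "0|1",
    "1|0", "1|1"),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("rs1", "rs3"), c("S1", "S2"))
)

# First and second allele of every genotype, as integer matrices.
hap1 <- matrix(as.integer(substr(gt, 1, 1)), nrow = nrow(gt))
hap2 <- matrix(as.integer(substr(gt, 3, 3)), nrow = nrow(gt))

# Interleave so that each individual contributes two adjacent columns.
geno <- matrix(NA_integer_, nrow = nrow(gt), ncol = 2 * ncol(gt))
geno[, seq(1, by = 2, length.out = ncol(gt))] <- hap1
geno[, seq(2, by = 2, length.out = ncol(gt))] <- hap2

# Assumed naming scheme: <sample>_1 and <sample>_2.
dimnames(geno) <- list(
  rownames(gt),
  paste(rep(colnames(gt), each = 2), c(1, 2), sep = "_")
)
```

With two individuals and two markers, the result is a 2 x 4 integer matrix containing only 0s and 1s.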

Examples

vcf <- get_vcf(chrom = "12", start = 533090, end = 623090, pop = "AFR")
names(vcf)

vincela/VarExp documentation built on May 29, 2019, 12:42 p.m.