get_vcf: Get data for a genomic region from a remote VCF file.

Description

Returns a list with three dataframes describing the individuals, SNPs, and genotypes in the requested genomic region.

Usage

get_vcf(chrom, start, end, pop = NA)

Arguments

chrom

a chromosome name (1-22 or X) without the "chr" prefix

start

a positive integer indicating the start of a genomic region

end

a positive integer indicating the end of a genomic region

pop

the name of a 1000 Genomes population or super-population (AFR, AMR, EAS, EUR, ...); see Details for the full list

Details

Currently, this function is hard-coded to access 1000 Genomes Phase 3 data hosted by Brian Browning (author of BEAGLE):

http://bochet.gcc.biostat.washington.edu/beagle/1000_Genomes_phase3_v5a/

This implementation discards multi-allelic markers, which it identifies by a "," in the ALT column.
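
The filtering rule above can be illustrated with a minimal sketch. It is not the package's actual code: the `meta` dataframe below is made-up toy data, and the names `is_biallelic` and `meta_biallelic` are hypothetical.

```r
# Toy stand-in for a few VCF metadata columns; all values are invented.
meta <- data.frame(
  CHROM = c("12", "12", "12"),
  POS   = c(533100L, 533200L, 533300L),
  ID    = c("snp1", "snp2", "snp3"),
  REF   = c("A", "G", "T"),
  ALT   = c("G", "A,C", "C"),
  stringsAsFactors = FALSE
)

# A comma in ALT means more than one alternate allele, so drop those rows.
is_biallelic <- !grepl(",", meta$ALT, fixed = TRUE)
meta_biallelic <- meta[is_biallelic, , drop = FALSE]

print(meta_biallelic$ID)
```

In this toy input only the second marker has two ALT alleles, so it is the only row removed.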

The pop argument can be any of these populations: ACB, ASW, BEB, CDX, CEU, CHB, CHS, CLM, ESN, FIN, GBR, GIH, GWD, IBS, ITU, JPT, KHV, LWK, MSL, MXL, PEL, PJL, PUR, STU, TSI, YRI. It can also be any of these super-populations: AFR, AMR, EAS, EUR, SAS.

Find more details here: http://www.1000genomes.org/faq/which-populations-are-part-your-study
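
Because pop accepts only the codes listed above, it can help to check a value before making a remote request. The sketch below is one way to do that. The helper `is_valid_pop` is hypothetical and not part of proxysnps, and treating `NA` (the default) as acceptable is an assumption about how the function handles a missing population.

```r
# Population and super-population codes copied from the list above.
populations <- c(
  "ACB", "ASW", "BEB", "CDX", "CEU", "CHB", "CHS", "CLM", "ESN", "FIN",
  "GBR", "GIH", "GWD", "IBS", "ITU", "JPT", "KHV", "LWK", "MSL", "MXL",
  "PEL", "PJL", "PUR", "STU", "TSI", "YRI"
)
super_populations <- c("AFR", "AMR", "EAS", "EUR", "SAS")

# Hypothetical helper: TRUE for NA (assumed to mean "no population given")
# or for any single code in the two vectors above.
is_valid_pop <- function(pop) {
  is.na(pop) || pop %in% c(populations, super_populations)
}

is_valid_pop("AFR")
is_valid_pop("ASN")
```

Note that "ASN" is rejected here because it does not appear in the list above, while "EAS" and "SAS" do.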

Value

A list with three dataframes:

ind

A dataframe with information about individuals: Family.ID, Individual.ID, Paternal.ID, Maternal.ID, Gender, Population, Relationship, Siblings, Second.Order, Third.Order, Other.Comments, SuperPopulation

meta

First 8 columns of the VCF file: CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO

geno

Columns 10 onward of the VCF file, which hold the per-sample genotypes. All genotypes are converted to 0s and 1s representing REF and ALT alleles. This dataframe has two columns for each individual.
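
To make the shape of geno concrete, here is a minimal sketch of how genotype strings could be expanded into two 0/1 columns per individual. It is not the package's actual code. It assumes phased genotype strings of the form "0|1", and the helper `split_haplotypes`, the sample names, and the `_1`/`_2` column suffixes are all hypothetical.

```r
# Toy genotype strings for two made-up individuals and two markers.
gt <- data.frame(
  ind1 = c("0|1", "1|1"),
  ind2 = c("0|0", "1|0"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: split each "a|b" string into two integer columns,
# one per allele, so every individual contributes two columns.
split_haplotypes <- function(gt) {
  out <- list()
  for (id in names(gt)) {
    alleles <- strsplit(gt[[id]], "|", fixed = TRUE)
    out[[paste0(id, "_1")]] <- as.integer(vapply(alleles, `[`, character(1), 1))
    out[[paste0(id, "_2")]] <- as.integer(vapply(alleles, `[`, character(1), 2))
  }
  as.data.frame(out)
}

geno <- split_haplotypes(gt)
print(geno)
```

With two individuals in the toy input, the result has four columns, where 0 stands for the REF allele and 1 for the ALT allele.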

Examples

vcf <- get_vcf(chrom = "12", start = 533090, end = 623090, pop = "AFR")
names(vcf)
 

slowkow/proxysnps documentation built on May 30, 2019, 3:06 a.m.