read_vcf: Data Input VCF

View source: R/read_mappoly_vcf.R

read_vcf    R Documentation

Data Input VCF

Description

Reads an external VCF file and creates an object of class mappoly.data.

Usage

read_vcf(
  file.in,
  parent.1,
  parent.2,
  ploidy = NA,
  filter.non.conforming = TRUE,
  thresh.line = 0.05,
  min.gt.depth = 0,
  min.av.depth = 0,
  max.missing = 1,
  elim.redundant = TRUE,
  verbose = TRUE,
  read.geno.prob = FALSE,
  prob.thres = 0.95
)

Arguments

file.in

a character string with the name of (or full path to) the input file which contains the data (VCF format)

parent.1

a character string containing the name of parent 1, as it appears among the sample names in the VCF file

parent.2

a character string containing the name of parent 2, as it appears among the sample names in the VCF file

ploidy

the species ploidy (optional; if not provided, it is detected automatically)

filter.non.conforming

if TRUE (default), converts data points with unexpected genotypes (i.e., genotypes not expected under a model without double reduction) to NA. See function segreg_poly for information on the expected classes and their respective frequencies.
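The idea behind "expected classes" can be illustrated with a minimal base R sketch, assuming random chromosome segregation and no double reduction: a parent with dosage d at ploidy m transmits k copies of the allele with hypergeometric probability, and the offspring distribution is the convolution of both parents' gamete distributions. The helpers gamete_probs and offspring_probs are hypothetical and are not part of mappoly; segreg_poly is the package's own function for this.

```r
## Hypothetical helper (not part of mappoly): gamete dosage distribution for a
## parent with dosage d at ploidy m, assuming no double reduction.
## Drawing m/2 chromosomes out of m, d of which carry the allele, is hypergeometric.
gamete_probs <- function(d, m) {
  k <- 0:(m / 2)
  dhyper(k, d, m - d, m / 2)
}

## Hypothetical helper: offspring dosage distribution as the convolution
## of both parents' gamete distributions (offspring dosages 0..m).
offspring_probs <- function(d1, d2, m) {
  g1 <- gamete_probs(d1, m)
  g2 <- gamete_probs(d2, m)
  out <- numeric(m + 1)
  for (i in seq_along(g1)) {
    for (j in seq_along(g2)) {
      out[i + j - 1] <- out[i + j - 1] + g1[i] * g2[j]
    }
  }
  names(out) <- 0:m
  out
}

## Hexaploid cross: parent 1 has dosage 2, parent 2 has dosage 1
p <- offspring_probs(2, 1, 6)
conforming <- as.integer(names(p)[p > 0])

## Observed offspring dosages (hypothetical); non-conforming calls become NA
obs <- c(0, 1, 3, 5, 2)
obs[!obs %in% conforming] <- NA
```

Here dosage 5 cannot arise from a 2 x 1 hexaploid cross without double reduction, so it is set to NA, mirroring what filter.non.conforming = TRUE is described as doing.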

thresh.line

p-value threshold used in the segregation test (default = 0.05)
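As a rough illustration of how a segregation p-value relates to thresh.line, the base R sketch below runs a chi-squared goodness-of-fit test of hypothetical offspring counts against the 1:4:4:1 ratio expected for a hexaploid 2 x 1 cross without double reduction. The counts and the variable passes are assumptions for illustration only; how mappoly acts on the p-value beyond this comparison is not shown here.

```r
## Expected offspring ratios for dosages 0, 1, 2, 3 (hexaploid, 2 x 1 cross)
expected <- c(0.1, 0.4, 0.4, 0.1)

## Hypothetical observed counts of offspring in each dosage class (n = 100)
observed <- c(9, 42, 38, 11)

## Chi-squared goodness-of-fit test of Mendelian segregation
test <- chisq.test(observed, p = expected)

## A p-value below the threshold would flag the marker as distorted
thresh.line <- 0.05
passes <- test$p.value >= thresh.line
```

With these counts the statistic is small, so the marker would not be flagged as showing segregation distortion.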

min.gt.depth

minimum genotype depth required to keep a genotype call. Calls with depth below min.gt.depth are replaced with NA (default = 0)
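The masking rule for min.gt.depth can be sketched in base R as follows, assuming a dosage matrix and a matching per-genotype depth (DP) matrix with markers in rows and individuals in columns. Both matrices are hypothetical data; this is not mappoly's internal code.

```r
## Hypothetical dosage matrix (markers x individuals)
dose <- matrix(c(0, 1, 2,
                 3, 2, 1), nrow = 2, byrow = TRUE,
               dimnames = list(c("m1", "m2"), c("ind1", "ind2", "ind3")))

## Hypothetical per-genotype read depth (DP) with the same layout
depth <- matrix(c(12,  3, 25,
                   8, 15,  0), nrow = 2, byrow = TRUE,
                dimnames = dimnames(dose))

## Calls supported by fewer reads than the threshold are set to NA
min.gt.depth <- 5
dose[depth < min.gt.depth] <- NA
```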

min.av.depth

minimum average depth required to keep a marker (default = 0)

max.missing

maximum proportion of missing data allowed to keep a marker (range = 0 to 1; default = 1)
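The two marker-level filters, min.av.depth and max.missing, can be sketched together in base R. This assumes the average depth is taken over all individuals and that both comparisons are inclusive; mappoly's exact boundary handling may differ. The matrices are hypothetical.

```r
## Hypothetical dosage matrix (markers x individuals); NA = missing call
dose <- matrix(c( 0,  1, NA, 2,
                 NA, NA, NA, 1,
                  3,  2,  1, 2), nrow = 3, byrow = TRUE,
               dimnames = list(c("m1", "m2", "m3"), NULL))

## Hypothetical read depth for the same markers and individuals
depth <- matrix(c(10, 12,  0,  9,
                   1,  0,  2,  6,
                  30, 25, 28, 27), nrow = 3, byrow = TRUE,
                dimnames = dimnames(dose))

min.av.depth <- 5
max.missing <- 0.25

av.depth <- rowMeans(depth)            # average depth per marker
prop.missing <- rowMeans(is.na(dose))  # proportion of missing calls per marker

## Keep markers that pass both filters
keep <- av.depth >= min.av.depth & prop.missing <= max.missing
dose.filtered <- dose[keep, , drop = FALSE]
```

Marker m2 is dropped by both criteria, while m1 sits exactly at the missing-data limit and is kept under the inclusive comparison assumed here.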

elim.redundant

logical. If TRUE (default), removes redundant markers during map construction, keeping them annotated so that they can be exported to the final map.
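A minimal base R sketch of redundancy elimination follows, assuming that "redundant" means markers whose dosage vectors across individuals are identical; mappoly's actual criterion may be more elaborate. It produces a vector of kept markers and a table mapping each removed marker to the kept marker it duplicates, loosely analogous to the kept and elim.correspondence components described under Value. All names and data are hypothetical.

```r
## Hypothetical dosage matrix (markers x individuals)
dose <- matrix(c(0, 1, 2, 1,
                 0, 1, 2, 1,
                 3, 2, 1, 2,
                 0, 1, 2, 1), nrow = 4, byrow = TRUE,
               dimnames = list(c("m1", "m2", "m3", "m4"), NULL))

## One string key per marker summarizing its dosage profile
key <- apply(dose, 1, paste, collapse = "-")

## The first marker with each profile is kept
kept <- rownames(dose)[!duplicated(key)]

## Map every marker to the first marker sharing its profile
first.of <- setNames(rownames(dose)[match(key, key)], rownames(dose))
redundant <- first.of != names(first.of)
correspondence <- data.frame(kept = unname(first.of[redundant]),
                             removed = names(first.of)[redundant],
                             stringsAsFactors = FALSE)
```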

verbose

if TRUE (default), the current progress is shown; if FALSE, no output is produced

read.geno.prob

if TRUE and genotype probabilities are available (PL field), generates a probability-based data frame (default = FALSE).
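In the VCF format, PL holds Phred-scaled genotype likelihoods, so a likelihood is recovered as 10^(-PL/10). The sketch below turns one PL string into normalized probabilities, assuming a biallelic site (where genotypes are ordered by increasing count of the alternative allele) and a flat prior. The PL values are hypothetical, and this is not necessarily how mappoly computes its probabilities.

```r
## Hypothetical PL string for one hexaploid individual at a biallelic site
pl <- "60,30,0,20,50,80,100"

## Undo the Phred scaling: likelihood = 10^(-PL / 10)
phred <- as.numeric(strsplit(pl, ",", fixed = TRUE)[[1]])
lik <- 10^(-phred / 10)

## Normalize to probabilities (flat prior over dosages assumed)
prob <- lik / sum(lik)
names(prob) <- 0:(length(prob) - 1)  # dosage of the alternative allele
```

The number of PL entries is ploidy + 1, here seven values for a hexaploid.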

prob.thres

probability threshold for assigning a marker call to a dosage. Marker calls whose maximum genotype probability is smaller than prob.thres are treated as missing data for dosage calling (default = 0.95)
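The prob.thres rule can be sketched in base R: take the most probable dosage, but record the call as missing (coded as ploidy + 1, the convention described for geno.dose) when its probability falls below the threshold. The probability matrix and the helper call_dose are hypothetical.

```r
## Hypothetical genotype probabilities: rows = individuals, columns = dosages 0..6
probs <- rbind(ind1 = c(0.00, 0.01, 0.97, 0.02, 0, 0, 0),
               ind2 = c(0.00, 0.40, 0.55, 0.05, 0, 0, 0))
ploidy <- 6
prob.thres <- 0.95

## Hypothetical helper: most probable dosage, or the missing code ploidy + 1
call_dose <- function(p, thres, ploidy) {
  if (max(p) < thres) ploidy + 1 else which.max(p) - 1
}

dose <- apply(probs, 1, call_dose, thres = prob.thres, ploidy = ploidy)
```

Individual ind1 is called as dosage 2, while ind2 has no dosage reaching 0.95 and is therefore coded as missing (7).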

Details

This function handles VCF files of version 4.0 or higher. The ploidy can be detected automatically, but it is highly recommended that you specify it so that mismatches can be detected. All individual and marker names are kept as they appear in the .vcf file.
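One way ploidy can be inferred from a VCF file is by counting the alleles in each GT string, which are separated by "/" (unphased) or "|" (phased). The sketch below does this and compares the result with a user-supplied value, which is the kind of mismatch check the recommendation above is about. The GT strings are hypothetical, and mappoly's own detection logic may differ.

```r
## Hypothetical GT strings from a hexaploid VCF (unphased, phased, missing)
gt <- c("0/0/1/1/1/0", "0|1|1|1|1|1", "./././././.")

## Number of alleles per call = ploidy of that call
ploidy_per_call <- lengths(strsplit(gt, "[/|]"))
detected <- unique(ploidy_per_call)

## Compare against the ploidy supplied by the user
supplied <- 6
if (length(detected) != 1 || detected != supplied) {
  warning("ploidy in VCF does not match the supplied value")
}
```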

Value

An object of class mappoly.data which contains a list with the following components:

ploidy

ploidy level

n.ind

number of individuals

n.mrk

total number of markers

ind.names

the names of the individuals

mrk.names

the names of the markers

dosage.p1

a vector containing the dosage in parent P for all n.mrk markers

dosage.p2

a vector containing the dosage in parent Q for all n.mrk markers

chrom

a vector indicating the sequence to which each marker belongs. Zero indicates that the marker was not assigned to any sequence

genome.pos

physical position of each marker within its sequence

seq.ref

reference base used for each marker (e.g., A, T, C, or G)

seq.alt

alternative base used for each marker (e.g., A, T, C, or G)

prob.thres

(unused field)

geno.dose

a matrix containing the dosage of each marker (rows) for each individual (columns). Missing data are represented by ploidy_level + 1
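Because missing data in geno.dose are coded as ploidy_level + 1 rather than NA, that code should be converted before computing summaries, otherwise it is averaged in as if it were a real dosage. A minimal base R sketch with a hypothetical matrix laid out like geno.dose:

```r
ploidy <- 6

## Hypothetical matrix shaped like geno.dose: markers (rows) x individuals (columns)
geno.dose <- matrix(c(0, 3, 7,
                      7, 2, 6), nrow = 2, byrow = TRUE,
                    dimnames = list(c("m1", "m2"), c("ind1", "ind2", "ind3")))

## Replace the missing-data code (ploidy + 1) with NA
dose.na <- geno.dose
dose.na[dose.na == ploidy + 1] <- NA

## Per-marker summaries that now ignore missing calls
missing.per.marker <- rowMeans(is.na(dose.na))
mean.dose <- rowMeans(dose.na, na.rm = TRUE)
```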

geno

a data frame containing the genotype probability columns for each marker-individual combination (rows). Missing data are represented by ploidy_level + 1

nphen

(unused field)

phen

(unused field)

all.mrk.depth

DP (read depth) information for all markers in the VCF file

chisq.pval

a vector containing p-values related to the chi-squared test of Mendelian segregation performed for all markers

kept

if elim.redundant = TRUE, holds all non-redundant markers

elim.correspondence

if elim.redundant = TRUE, holds all non-redundant markers and their correspondence to the redundant ones

Author(s)

Gabriel Gesteira, gdesiqu@ncsu.edu

References

Mollinari, M., Olukolu, B. A., Pereira, G. da S., Khan, A., Gemenet, D., Yencho, G. C., and Zeng, Z-B. (2020) Unraveling the Hexaploid Sweetpotato Inheritance Using Ultra-Dense Multilocus Mapping, _G3: Genes, Genomes, Genetics_. doi: 10.1534/g3.119.400620

Mollinari, M., and Garcia, A. A. F. (2019) Linkage analysis and haplotype phasing in experimental autopolyploid populations with high ploidy level using hidden Markov models, _G3: Genes, Genomes, Genetics_. doi: 10.1534/g3.119.400378

Examples


## Hexaploid sweetpotato: subset of chromosome 3
library(mappoly)
fl <- "https://github.com/mmollina/MAPpoly_vignettes/raw/master/data/sweet_sample_ch3.vcf.gz"
tempfl <- tempfile(pattern = "chr3_", fileext = ".vcf.gz")
## mode = "wb" keeps the gzipped file intact on Windows
download.file(fl, destfile = tempfl, mode = "wb")
## Specifying the ploidy is recommended so that mismatches can be detected
dat.dose.vcf <- read_vcf(file.in = tempfl, parent.1 = "PARENT1",
                         parent.2 = "PARENT2", ploidy = 6)
print(dat.dose.vcf)
plot(dat.dose.vcf)



mappoly documentation built on Jan. 6, 2023, 1:16 a.m.