read_geno_prob: Data Input

View source: R/read_mappoly_prob.R

Description

Reads an external data file whose format is described in the Details section and creates an object of class mappoly.data.

Usage

read_geno_prob(
  file.in,
  prob.thres = 0.95,
  filter.non.conforming = TRUE,
  elim.redundant = TRUE,
  verbose = TRUE
)

Arguments

file.in

a character string with the name of (or full path to) the input file which contains the data to be read

prob.thres

probability threshold used to assign a dosage to a marker call. Marker calls whose maximum genotype probability is smaller than prob.thres are treated as missing data for dosage calling (default = 0.95)

filter.non.conforming

if TRUE (default), converts data points with genotypes that are unexpected under the assumption of no double reduction to 'NA'. See function segreg_poly for information on expected classes and their respective frequencies.

elim.redundant

logical. If TRUE (default), removes redundant markers during map construction, keeping them annotated to export to the final map.

verbose

if TRUE (default), the current progress is shown; if FALSE, no output is produced
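The role of prob.thres can be illustrated with a minimal sketch. The helper call_dosage below is hypothetical and is not part of mappoly; it only mirrors the rule stated for prob.thres, assuming that each probability vector is ordered from dosage 0 to dosage ploidy. The geno.dose component described under Value stores missing calls as ploidy + 1, whereas this sketch returns NA for readability.

```r
# Hypothetical helper, not part of mappoly: call a dosage from one
# probability vector p, where p[k] is the probability of dosage k - 1.
call_dosage <- function(p, prob.thres = 0.95) {
  # A call is treated as missing when the probabilities are missing or
  # when the most likely dosage does not reach the threshold.
  if (anyNA(p) || max(p) < prob.thres) {
    return(NA_integer_)
  }
  which.max(p) - 1L
}

# Tetraploid example: five possible dosages (0 to 4)
confident <- call_dosage(c(0.01, 0.97, 0.02, 0, 0))
uncertain <- call_dosage(c(0.40, 0.55, 0.05, 0, 0))
print(confident)  # 1
print(uncertain)  # NA
```

Lowering prob.thres in this sketch (for example to 0.5) would turn the second call into dosage 1, which shows the trade-off between the amount of missing data and the confidence of the calls.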

Details

The input file consists of a header followed by a table of genotype probabilities.

The first line of the input file contains the string ploidy followed by the ploidy level of the parents. The second and third lines contain the strings n.ind and n.mrk followed by the number of individuals in the dataset and the total number of markers, respectively. Lines 4 and 5 contain the strings mrk.names and ind.names followed by the names of the markers and the names of the individuals, respectively. Lines 6 and 7 contain the strings dosageP and dosageQ followed by the dosages of all markers in parents P and Q, respectively.

Line 8 contains the string seq followed by a sequence of integers indicating the chromosome to which each marker belongs. It can be any 'a priori' information regarding the physical distance between markers. For example, these numbers could refer to chromosomes, scaffolds or even contigs in which the markers are positioned. If this information is not available for a particular marker, NA should be used. If this information is not available for any of the markers, the string seq should be followed by a single NA. Line 9 contains the string seqpos followed by the physical position of the markers within the sequence. The physical position can be given in any unit of physical genomic distance (base pairs, for instance). However, the user should be able to make decisions based on these values, for example about the occurrence of crossovers.

Line 10 should contain the string nphen followed by the number of phenotypic traits. Line 11 is skipped (it is usually used as a spacer). Each of the next lines contains the name of a phenotypic trait, with no space characters, followed by the phenotypic values. The number of these lines should equal the number of phenotypic traits. NA represents missing values. Line 12 + nphen is skipped.

Finally, the last element is a table containing the probability distribution for each combination of marker and offspring. The first two columns represent the marker and the offspring, respectively. The remaining columns represent the probability associated with each of the possible dosages. NA represents missing data.
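As an illustration of this layout, the following sketch writes a tiny file using base R only. All values (marker names, individual names, dosages, positions, probabilities and the two spacer strings) are made up for the example, and the file has not been checked against read_geno_prob itself, so treat it as a reading aid for the description above, not as a validated input.

```r
# Hypothetical toy data: tetraploid, 2 markers, 3 individuals, no phenotypes
ploidy <- 4
mrk <- c("M1", "M2")
ind <- c("Ind1", "Ind2", "Ind3")

header <- c(
  paste("ploidy", ploidy),                          # line 1
  paste("n.ind", length(ind)),                      # line 2
  paste("n.mrk", length(mrk)),                      # line 3
  paste("mrk.names", paste(mrk, collapse = " ")),   # line 4
  paste("ind.names", paste(ind, collapse = " ")),   # line 5
  "dosageP 1 2",                                    # line 6
  "dosageQ 0 1",                                    # line 7
  "seq 1 NA",                                       # line 8: M2 has no sequence
  "seqpos 1500 NA",                                 # line 9
  "nphen 0",                                        # line 10
  "pheno---",                                       # line 11: skipped spacer
  "geno---"                                         # line 12 + nphen: skipped
)

# One table row per marker/offspring pair; individuals vary fastest
grid <- expand.grid(ind = ind, mrk = mrk, stringsAsFactors = FALSE)

# ploidy + 1 probability columns (dosages 0 to 4); put 0.96 on an
# arbitrary "called" dosage and 0.01 on each of the other four
called <- c(1, 0, 1, 2, 1, 3)
probs <- matrix(0.01, nrow = nrow(grid), ncol = ploidy + 1)
probs[cbind(seq_len(nrow(grid)), called + 1)] <- 0.96

table.lines <- paste(grid$mrk, grid$ind,
                     apply(probs, 1, paste, collapse = " "))

toy.file <- tempfile()
writeLines(c(header, table.lines), toy.file)
cat(readLines(toy.file), sep = "\n")
```

With nphen equal to 0 there are no phenotype lines, so the two skipped lines are adjacent and the probability table starts on line 13.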

Value

an object of class mappoly.data which contains a list with the following components:

ploidy

ploidy level

n.ind

number of individuals

n.mrk

total number of markers

ind.names

the names of the individuals

mrk.names

the names of the markers

dosage.p1

a vector containing the dosage in parent P for all n.mrk markers

dosage.p2

a vector containing the dosage in parent Q for all n.mrk markers

chrom

a vector indicating the sequence to which each marker belongs. Zero indicates that the marker was not assigned to any sequence

genome.pos

physical position of the markers within the sequence

seq.ref

NULL (unused in this type of data)

seq.alt

NULL (unused in this type of data)

all.mrk.depth

NULL (unused in this type of data)

prob.thres

probability threshold to associate a marker call to a dosage. Markers with maximum genotype probability smaller than 'prob.thres' were considered as missing data in the 'geno.dose' matrix

geno.dose

a matrix containing the dosage of each marker (rows) for each individual (columns). Missing data are represented by ploidy_level + 1

geno

a data.frame containing the probability distribution for each combination of marker and offspring. The first two columns represent the marker and the offspring, respectively. The remaining columns represent the probability associated with each of the possible dosages. Missing data are converted from NA to the expected segregation ratio using function segreg_poly

n.phen

number of phenotypic traits

phen

a matrix containing the phenotypic data. The rows correspond to the traits and the columns correspond to the individuals

chisq.pval

a vector containing p-values related to the chi-squared test of Mendelian segregation performed for all markers

kept

if elim.redundant = TRUE, holds all non-redundant markers

elim.correspondence

if elim.redundant = TRUE, holds all non-redundant markers and their equivalence to the redundant ones
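Because geno.dose codes missing data as ploidy + 1, not as NA, downstream summaries usually need a recoding step. The sketch below uses a hand-built list with made-up values that imitates only the ploidy and geno.dose components; it is not an object returned by read_geno_prob.

```r
# Hypothetical stand-in for two components of a mappoly.data object
mock <- list(
  ploidy = 4,
  geno.dose = matrix(c(1, 0, 5,
                       2, 5, 3),
                     nrow = 2, byrow = TRUE,
                     dimnames = list(c("M1", "M2"),
                                     c("Ind1", "Ind2", "Ind3")))
)

# Recode the missing-data value (ploidy + 1) as NA
dose <- mock$geno.dose
dose[dose == mock$ploidy + 1] <- NA

# Proportion of missing calls per marker (rows are markers)
miss.rate <- rowMeans(is.na(dose))
print(miss.rate)
```

The same recoding could be applied to the geno.dose component of a real object, assuming it follows the layout described above.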

Author(s)

Marcelo Mollinari, mmollin@ncsu.edu

References

Mollinari, M., Olukolu, B. A., Pereira, G. da S., Khan, A., Gemenet, D., Yencho, G. C., Zeng, Z-B. (2020) Unraveling the Hexaploid Sweetpotato Inheritance Using Ultra-Dense Multilocus Mapping, _G3: Genes, Genomes, Genetics_. doi:10.1534/g3.119.400620

Mollinari, M., and Garcia, A. A. F. (2019) Linkage analysis and haplotype phasing in experimental autopolyploid populations with high ploidy level using hidden Markov models, _G3: Genes, Genomes, Genetics_. doi:10.1534/g3.119.400378

Examples


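#### Example: reading a file of genotype probabilities
#### (the file name 'hexa_sample' suggests hexaploid data)
library(mappoly)
ft <- "https://raw.githubusercontent.com/mmollina/MAPpoly_vignettes/master/data/hexa_sample"
tempfl <- tempfile()
# Download the sample file to a temporary location (requires internet access)
download.file(ft, destfile = tempfl)
SolCAP.dose.prob <- read_geno_prob(file.in = tempfl)
print(SolCAP.dose.prob, detailed = TRUE)
plot(SolCAP.dose.prob)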



mappoly documentation built on May 29, 2024, 6:05 a.m.