merge_datasets: Merge datasets

View source: R/utils.R

merge_datasets    R Documentation

Merge datasets

Description

This function merges two datasets of class mappoly.data. This can be useful when individuals of a population were genotyped using two or more techniques and the resulting datasets are in different files or formats. Note that both datasets should contain the same number of individuals, and each individual must be named identically in both (e.g. Ind_1 in both datasets, not Ind_1 in one dataset and ind_1 or Ind.1 in the other).
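Because individuals must be named identically in both datasets, it can help to compare the names before merging. The following is a minimal sketch in base R, not part of mappoly: check_ind_names is a hypothetical helper, and the two toy lists only mimic the ind.names component of a mappoly.data object.

```r
# Hypothetical helper (not part of mappoly): compares the individual names
# of two dataset-like lists that carry an `ind.names` component.
check_ind_names <- function(d1, d2) {
  only.1 <- setdiff(d1$ind.names, d2$ind.names)  # names found only in d1
  only.2 <- setdiff(d2$ind.names, d1$ind.names)  # names found only in d2
  same.n <- length(d1$ind.names) == length(d2$ind.names)
  list(same.n = same.n,
       only.in.1 = only.1,
       only.in.2 = only.2,
       ok = same.n && length(only.1) == 0 && length(only.2) == 0)
}

# Toy stand-ins for two datasets (illustrative names only)
toy.1 <- list(ind.names = c("Ind_1", "Ind_2", "Ind_3"))
toy.2 <- list(ind.names = c("Ind_1", "ind_2", "Ind.3"))

res <- check_ind_names(toy.1, toy.2)
res$ok         # FALSE: two names differ in case or separator
res$only.in.1  # "Ind_2" "Ind_3"
res$only.in.2  # "ind_2" "Ind.3"
```

With real data, the same comparison could be run on dat.1$ind.names and dat.2$ind.names, and any mismatched labels fixed in the input files before calling merge_datasets.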

Usage

merge_datasets(dat.1 = NULL, dat.2 = NULL)

Arguments

dat.1

the first dataset of class mappoly.data to be merged

dat.2

the second dataset of class mappoly.data to be merged (default = NULL); if dat.2 = NULL, the function returns dat.1 only

Value

An object of class mappoly.data which contains all markers from both datasets. It will be a list with the following components:

ploidy

ploidy level

n.ind

number of individuals

n.mrk

total number of markers

ind.names

the names of the individuals

mrk.names

the names of the markers

dosage.p1

a vector containing the dosage in parent P for all n.mrk markers

dosage.p2

a vector containing the dosage in parent Q for all n.mrk markers

chrom

a vector indicating which sequence each marker belongs to. Zero indicates that the marker was not assigned to any sequence

genome.pos

physical position of the markers in the sequence

seq.ref

if one or both datasets originated from read_vcf, it keeps the reference alleles from the sequencing platform; otherwise it is NULL

seq.alt

if one or both datasets originated from read_vcf, it keeps the alternative alleles from the sequencing platform; otherwise it is NULL

all.mrk.depth

if one or both datasets originated from read_vcf, it keeps the marker read depths from sequencing; otherwise it is NULL

prob.thres

(unused field)

geno.dose

a matrix containing the dosage for each marker (rows) for each individual (columns). Missing data are represented by ploidy_level + 1

geno

if both datasets contain genotype distribution information, the final object will contain 'geno'. This is set to NULL otherwise

nphen

(0)

phen

(NULL)

chisq.pval

a vector containing p-values related to the chi-squared test of Mendelian segregation performed for all markers in both datasets

kept

if elim.redundant = TRUE when reading any dataset, holds all non-redundant markers

elim.correspondence

if elim.redundant = TRUE when reading any dataset, holds all non-redundant markers and their correspondence to the redundant ones
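As noted for geno.dose above, missing genotypes are coded as ploidy_level + 1 rather than NA. The following is a minimal sketch in base R of how that coding could be handled. The toy list only mimics the ploidy and geno.dose components of a mappoly.data object, and its values are made up for illustration.

```r
# Toy stand-in for a mappoly.data object (values are illustrative only)
toy <- list(
  ploidy = 4,
  geno.dose = matrix(c(0, 1, 5,
                       2, 5, 5,
                       4, 3, 1),
                     nrow = 3, byrow = TRUE,
                     dimnames = list(c("M1", "M2", "M3"),
                                     c("Ind_1", "Ind_2", "Ind_3")))
)

# Missing entries are those equal to ploidy + 1
is.miss <- toy$geno.dose == toy$ploidy + 1

# Proportion of missing data per marker (rows) and per individual (columns)
miss.mrk <- rowMeans(is.miss)
miss.ind <- colMeans(is.miss)

# Copy of the dosage matrix with missing entries recoded as NA
dose.na <- toy$geno.dose
dose.na[is.miss] <- NA
```

Recoding to NA is only a convenience for summaries outside mappoly; the sketch leaves the original matrix untouched so the ploidy_level + 1 coding is preserved.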

Author(s)

Gabriel Gesteira, gdesiqu@ncsu.edu

References

Mollinari, M., and Garcia, A. A. F. (2019) Linkage analysis and haplotype phasing in experimental autopolyploid populations with high ploidy level using hidden Markov models, _G3: Genes, Genomes, Genetics_. doi: 10.1534/g3.119.400378

Examples


library(mappoly)

## Loading a subset of SNPs from chromosomes 3 and 12 of the sweetpotato dataset
## (SNPs anchored to the Ipomoea trifida genome)
dat <- NULL
for(i in c(3, 12)){
  cat("Loading chromosome", i, "...\n")
  ## Download the VCF file for chromosome i to a temporary file
  tempfl <- tempfile(pattern = paste0("ch", i), fileext = ".vcf.gz")
  x <- "https://github.com/mmollina/MAPpoly_vignettes/raw/master/data/sweet_sample_ch"
  address <- paste0(x, i, ".vcf.gz")
  download.file(url = address, destfile = tempfl)
  dattemp <- read_vcf(file = tempfl, parent.1 = "PARENT1", parent.2 = "PARENT2",
                      ploidy = 6, verbose = FALSE)
  ## On the first iteration dat is NULL, so it is simply replaced by dattemp;
  ## afterwards each new chromosome is merged into the accumulated dataset
  dat <- merge_datasets(dat, dattemp)
  cat("\n")
}
dat
plot(dat)



mappoly documentation built on Jan. 6, 2023, 1:16 a.m.