prepare_base_and_mix_for_mixed_MCMC: Prepares a baseline and mixture file of genotypes for mixed...


Description

This takes two data frames, B (baseline) and M (mixture), that should be in the same format: two columns for every locus, where the first column of each locus is named with the locus name and the second column can have any name (as long as it is consistent between the mixture and baseline samples). Missing data at a locus should be NA in both columns of that locus. Note that the columns of locus data must be the last columns in the data frame, i.e. no columns that are not genetic data should be to the right of any columns that are genetic data! Both B and M should have rownames which are the IDs of each individual. The function does this:

  1. Extract the loci in the mixture file from the baseline file. Locus names must therefore be consistent between the two files; an error occurs if any locus in the mixture file does not appear in the baseline file.

  2. Code the alleles at each locus as 0s and 1s, return the counts of 0s and 1s in each baseline population, and also return the mixture as an L x N matrix of 0s, 1s, and 2s.
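The two steps above can be sketched in base R as follows. This is a hypothetical illustration, not the package's source code: the toy data, the helper `locus_names`, and the coding rule (the alphabetically first allele at a locus becomes "0") are all assumptions made for the example.

```r
# Hypothetical base-R sketch of steps 1 and 2; NOT the package's own code.
# ASSUMPTIONS: genotype columns start at column 1 of the toy data frames, and
# at each locus the alphabetically first allele is coded "0", the other "1".

# Toy baseline (3 fish, 2 populations) and mixture (2 fish), made-up genotypes.
B <- data.frame(L1 = c("A", "G", "G"), L1.1 = c("A", "A", "G"),
                L2 = c("C", "C", "T"), L2.1 = c("T", "T", "T"),
                row.names = c("b1", "b2", "b3"), stringsAsFactors = FALSE)
B_pops <- factor(c("popA", "popA", "popB"))
M <- data.frame(L2 = c("T", NA), L2.1 = c("T", NA),
                L1 = c("A", "G"), L1.1 = c("G", "G"),
                row.names = c("m1", "m2"), stringsAsFactors = FALSE)

# Hypothetical helper: a locus name is the name of the first column of its pair.
locus_names <- function(df) names(df)[c(TRUE, FALSE)]

# Step 1: every mixture locus must exist in the baseline.
m_loci <- locus_names(M)
absent <- setdiff(m_loci, locus_names(B))
if (length(absent) > 0) {
  stop("Mixture loci not in baseline: ", paste(absent, collapse = ", "))
}

# Step 2: recode to 0/1, count alleles per population, score mixture fish.
n <- nrow(M)
zeros <- NULL; ones <- NULL; mixmat <- NULL
for (loc in m_loci) {                # loci are kept in mixture order
  bi <- match(loc, names(B)); mi <- match(loc, names(M))
  b <- c(B[[bi]], B[[bi + 1]])       # both gene copies of every baseline fish
  m <- c(M[[mi]], M[[mi + 1]])       # both gene copies of every mixture fish
  x <- c(b, m)
  alleles <- sort(unique(x[!is.na(x)]))
  stopifnot(length(alleles) == 2)    # only loci with exactly two alleles
  b01 <- match(b, alleles) - 1L      # 0 or 1; NA stays NA
  m01 <- match(m, alleles) - 1L
  p <- rep(B_pops, 2)                # population of each baseline gene copy
  zeros <- rbind(zeros, as.vector(tapply(b01 == 0L, p, sum, na.rm = TRUE)))
  ones  <- rbind(ones,  as.vector(tapply(b01 == 1L, p, sum, na.rm = TRUE)))
  mixmat <- rbind(mixmat, m01[1:n] + m01[n + 1:n])  # number of "1" alleles
}
dimnames(zeros)  <- list(m_loci, levels(B_pops))
dimnames(ones)   <- list(m_loci, levels(B_pops))
dimnames(mixmat) <- list(m_loci, rownames(M))
```

Each mixture fish ends up as the number of "1" alleles it carries (0, 1, or 2), and a fish with NA in both columns of a locus stays NA, which matches the coding described for the returned mixture matrix.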

Usage

prepare_base_and_mix_for_mixed_MCMC(B, B_locstart, B_pops, M, M_locstart)

Arguments

B

Baseline data frame

B_locstart

The index of the first column of genetic data in the Baseline.

B_pops

A factor giving the population of origin of each individual in the Baseline. The order of its levels determines the order of the populations in the output.

M

Mixture data frame

M_locstart

The index of the first column of genetic data in the Mixture.
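The following sketch shows, on a made-up baseline, how the expected layout relates to `B_locstart` and `B_pops`. The column names, population labels, and genotypes are hypothetical; only the layout rules (two columns per locus, genetic columns last, IDs as rownames, level order of `B_pops`) come from the description above.

```r
# Toy baseline following the layout described above (made-up genotypes).
# Non-genetic columns come first; locus columns are last.
B <- data.frame(
  Pop    = c("popB", "popA", "popB"),
  Loc1   = c("A", "G", NA),  Loc1.1 = c("G", "G", NA),  # fish3 missing at Loc1
  Loc2   = c("C", "T", "T"), Loc2.1 = c("C", "T", "C"),
  row.names = c("fish1", "fish2", "fish3"),             # individual IDs
  stringsAsFactors = FALSE
)
B_locstart <- 2   # index of the first column of genetic data

# Locus names are the names of the first column in each pair.
geno_cols <- seq(B_locstart, ncol(B))
stopifnot(length(geno_cols) %% 2 == 0)   # two columns per locus
loci <- names(B)[geno_cols[c(TRUE, FALSE)]]
loci                                     # "Loc1" "Loc2"

# B_pops: the level order fixes the population order of the output.
default_levels <- levels(factor(B$Pop))  # alphabetical: "popA" "popB"
B_pops <- factor(B$Pop, levels = c("popB", "popA"))
levels(B_pops)                           # "popB" "popA"
```

Supplying `levels` explicitly is the usual way in R to control the order of a factor's levels instead of accepting the default alphabetical order.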

Value

This returns a list. Letting P be the number of populations in the baseline, N the number of individuals in the mixture, and L the number of SNP loci that have exactly two alleles, this list contains the following components:

zeros

An L x P matrix giving the number of "0" alleles at each locus in each population. The order of loci is as given in the data frame M, and the order of populations is determined by the levels of the factor B_pops.

one

Same as above but for the "1" alleles.

mixmat

An L x N matrix of 0s, 1s, 2s, or NAs, giving the genotypes of the fish in the mixture.
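One plausible downstream use of these components is turning the counts into per-population allele frequencies. The sketch below builds a hand-made stand-in for the returned list (made-up counts, hypothetical locus, population, and fish names) using the component names listed above; it is not output from the function itself.

```r
# Hand-made stand-in for the documented return value: L = 2 loci,
# P = 2 populations, N = 3 mixture fish. All numbers are made up.
loci <- c("Loc1", "Loc2"); pops <- c("popA", "popB")
prepped <- list(
  zeros  = matrix(c(3, 0, 1, 4), nrow = 2, dimnames = list(loci, pops)),
  one    = matrix(c(1, 2, 3, 0), nrow = 2, dimnames = list(loci, pops)),
  mixmat = matrix(c(1, 2, NA, 0, 0, 1), nrow = 2,
                  dimnames = list(loci, c("fish1", "fish2", "fish3")))
)

# Shapes implied by the documentation: zeros and one are L x P, mixmat is L x N.
stopifnot(identical(dim(prepped$zeros), dim(prepped$one)),
          nrow(prepped$mixmat) == nrow(prepped$zeros))

# Sample frequency of the "1" allele at each locus in each population.
# A locus with no observed gene copies in a population would give NaN here.
freq1 <- prepped$one / (prepped$zeros + prepped$one)
# Loc1: 0.25 (popA), 0.75 (popB); Loc2: 1 (popA), 0 (popB)
```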

Examples

# first make baseline and mixture samples
set.seed(5)
grab <- sample(1:nrow(swfsc_chinook_baseline), 400)  # grab these as a mixture
Base <- swfsc_chinook_baseline[-grab, ]
Mix <- swfsc_chinook_baseline[grab, ]

# then prep em
prepped <- prepare_base_and_mix_for_mixed_MCMC(B = Base, B_locstart = 5, B_pops = Base$Pop, M = Mix, M_locstart = 5)

names(prepped)

eriqande/SNPcontam documentation built on May 16, 2019, 8:44 a.m.