parent.assign.fun: parent.assign.fun


View source: R/parent.assign.fun.R

Description

This function assigns parents to pooled samples using one of seven approaches:

Usage

parent.assign.fun(
  method,
  beta.min.ss = FALSE,
  discrete.method = "geno.probs",
  threshold.indiv = NULL,
  threshold.pools = NULL,
  snp.dat.indiv,
  snp.dat.pools,
  n.in.pools,
  min.intensity = 0,
  snp.error.assumed = NULL,
  snp.error.underlying = NULL,
  snp.param.indiv = NULL,
  snp.param.pools = NULL,
  min.sd = 0,
  fams,
  fam.set.combns = NULL,
  fam.set.combns.by.pool = NULL,
  skip.checks = FALSE
)

Arguments

method

is a vector of methods to be implemented (e.g. c("Quantitative", "Discrete", "Exclusion", "Least_squares"))

beta.min.ss

is a logical variable applicable to the "Least_squares" method only (default = FALSE). If TRUE, the sums of squares of all parental combinations are computed and the combination with the minimum value is identified. Refer to Hamilton 2020.

discrete.method

is a character variable applicable to the "Discrete" or "Exclusion" methods only (default = "geno.probs"). It must equal either:

  • "geno.probs" in which case discrete genotypes for parents and pools are derived from genotype probabilities.

  • "assigned.genos" in which case discrete genotypes for parents and pools are obtained directly from the snp.dat.indiv and snp.dat.pools inputs.

threshold.indiv

is a numeric variable between 0 and 1 inclusive applicable to the "Discrete" or "Exclusion" methods only when discrete.method = "geno.probs" (default = NULL). A discrete genotype is assigned to the most likely genotype in the quantitative ordered genotype probability matrix Gij if its probability is greater than threshold.indiv (or threshold.indiv / 2 for the two heterozygous genotypes). Otherwise the genotype is deemed missing (refer to the left-hand side of page 5 of Henshall et al. 2014).
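The thresholding rule described above can be sketched in base R. This is only an illustration of the rule as worded here, not the package's internal code: the helper name call.discrete.geno and the ordered-genotype labels AA, AB, BA and BB are assumptions made for the example.

```r
# Hypothetical helper (not part of SNPpools): apply the threshold rule to the
# ordered genotype probabilities of one sample at one SNP.
call.discrete.geno <- function(probs, threshold) {
  # probs: named numeric vector of ordered genotype probabilities
  best <- names(probs)[which.max(probs)]
  # The two heterozygous ordered genotypes are compared with threshold / 2
  cutoff <- if (best %in% c("AB", "BA")) threshold / 2 else threshold
  # Genotypes that do not exceed the cutoff are treated as missing
  if (probs[[best]] > cutoff) best else NA_character_
}

probs.hom <- c(AA = 0.99, AB = 0.005, BA = 0.005, BB = 0.00)
probs.het <- c(AA = 0.01, AB = 0.495, BA = 0.495, BB = 0.00)
probs.amb <- c(AA = 0.60, AB = 0.20, BA = 0.20, BB = 0.00)

geno.hom <- call.discrete.geno(probs.hom, threshold = 0.98)
geno.het <- call.discrete.geno(probs.het, threshold = 0.98)
geno.amb <- call.discrete.geno(probs.amb, threshold = 0.98)
```

In this toy input the first two samples receive a discrete genotype and the third is left missing because no ordered genotype exceeds its cutoff.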

threshold.pools

is a numeric variable between 0 and 1 inclusive applicable to the "Discrete" or "Exclusion" methods only when discrete.method = "geno.probs" (default = NULL). Equivalent to threshold.indiv for pooled DNA samples.

snp.dat.indiv

is a data frame with the following headings (class in parentheses):

  • 'SAMPLE_ID' is the sample identifier. Samples must be from diploid individuals (i.e. not pools) (integer).

  • 'SNP_ID' is the SNP identifier (character).

  • 'INTENSITY_A' is the area/intensity for allele A (numeric).

  • 'INTENSITY_B' is the area/intensity for allele B (numeric).

  • 'A_ALLELE' is the base represented by allele A (i.e. 'A', 'C', 'G' or 'T') (character).

  • 'B_ALLELE' is the base represented by allele B (i.e. 'A', 'C', 'G' or 'T') (character).

  • 'GENOTYPE' is the SNP genotype call (e.g. 'AT', 'TT'). NA if missing (character).
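As a rough guide to the expected layout, a toy snp.dat.indiv could be built as follows. All values are invented for illustration; only the column names and classes come from the list above.

```r
# Toy input with the documented columns and classes (values are made up)
snp.dat.indiv.example <- data.frame(
  SAMPLE_ID   = c(1L, 1L, 2L, 2L),
  SNP_ID      = c("snp1", "snp2", "snp1", "snp2"),
  INTENSITY_A = c(1.20, 0.05, 0.60, 0.55),
  INTENSITY_B = c(0.04, 1.10, 0.58, 0.61),
  A_ALLELE    = c("A", "C", "A", "C"),
  B_ALLELE    = c("T", "G", "T", "G"),
  GENOTYPE    = c("AA", "GG", "AT", NA),  # NA marks a missing genotype call
  stringsAsFactors = FALSE
)

# Inspect the column classes before passing the data frame on
str(snp.dat.indiv.example)
```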

snp.dat.pools

is a data frame with the following headings (class in parentheses). Note that all pooled DNA samples in this data frame must comprise DNA from the same number of individuals (see n.in.pools):

  • SAMPLE_ID is the pooled sample identifier (integer).

  • SNP_ID is the SNP identifier (character).

  • INTENSITY_A is the signal intensity for allele A. Not required if method does not include 'Quantitative' or 'Least_squares' and discrete.method = "assigned.genos" (numeric).

  • INTENSITY_B is the signal intensity for allele B. Not required if method does not include 'Quantitative' or 'Least_squares' and discrete.method = "assigned.genos" (numeric).

  • GENOTYPE is the assigned unordered genotype. Not required if discrete.method = "geno.probs" (character).

n.in.pools

is an integer variable representing the number of individuals that contributed DNA to each sample in snp.dat.pools.

min.intensity

is a numeric variable (default = 0). If the square root of the sum of INTENSITY_A squared and INTENSITY_B squared in snp.dat.indiv or snp.dat.pools is less than min.intensity then this record is excluded. That is, observations that fall into an arc with a radius equal to min.intensity in the lower left of signal intensity scatter plots are excluded.
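The exclusion rule above amounts to a filter on the distance of each observation from the origin of the intensity scatter plot. A minimal sketch, assuming no missing intensities; the helper name drop.low.intensity is hypothetical and not part of the package:

```r
# Hypothetical helper (not part of SNPpools): drop records whose intensity
# radius sqrt(A^2 + B^2) is less than min.intensity.
drop.low.intensity <- function(snp.dat, min.intensity = 0) {
  radius <- sqrt(snp.dat$INTENSITY_A^2 + snp.dat$INTENSITY_B^2)
  snp.dat[radius >= min.intensity, , drop = FALSE]
}

toy <- data.frame(
  SAMPLE_ID   = 1:3,
  SNP_ID      = "snp1",
  INTENSITY_A = c(0.03, 0.90, 0.30),
  INTENSITY_B = c(0.04, 0.10, 0.40)
)

# The first record lies inside the arc of radius 0.1 and is excluded
kept <- drop.low.intensity(toy, min.intensity = 0.1)
```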

snp.error.assumed

Must be one of (default = NULL):

  • NULL. Note that if snp.error.assumed is NULL then snp.error.underlying must not be NULL.

  • a numeric variable between 0 and 1, in which case the 'assumed error rate' (see Henshall et al 2014) is the same across all SNP.

  • a data frame with columns SNP_ID and SNP_ERROR_TILDE (see Henshall et al 2014).
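For illustration, the two non-NULL forms of snp.error.assumed might look like this. The SNP identifiers and error rates are invented; only the column names SNP_ID and SNP_ERROR_TILDE come from the description above.

```r
# Form 1: a single assumed error rate shared by every SNP
snp.error.common <- 0.01

# Form 2: one assumed error rate per SNP (values are made up)
snp.error.by.snp <- data.frame(
  SNP_ID          = c("snp1", "snp2", "snp3"),
  SNP_ERROR_TILDE = c(0.010, 0.020, 0.005),
  stringsAsFactors = FALSE
)

# Basic sanity check: every rate must lie between 0 and 1
rates.ok <- all(snp.error.by.snp$SNP_ERROR_TILDE >= 0 &
                snp.error.by.snp$SNP_ERROR_TILDE <= 1)
```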

snp.error.underlying

is not used if snp.error.assumed is not NULL (default = NULL). Must be either:

  • NULL.

  • a numeric variable between 0 and 1 inclusive. Used to compute SNP_ERROR_TILDE from SNP_ERROR_HAT according to the approach outlined on the left of page 5 of Henshall et al. 2014 using individual (i.e. not pooled) data only. If snp.error.underlying = 0 then SNP_ERROR_TILDE = SNP_ERROR_HAT.

snp.param.indiv

is the output of snp.param.indiv.fun (default = NULL). That is, it is a data frame with the following headings (class in parentheses):

  • 'SNP_ID' is the SNP identifier (character).

  • 'N_AA' is the count of homozygous A (AA) genotypes (integer).

  • 'MEAN_P_AA' is the mean of allelic proportion for homozygous A genotypes (numeric).

  • 'SD_P_AA' is the standard deviation of allelic proportion for homozygous A genotypes (numeric).

  • 'N_AB' is the count of heterozygous (AB) genotypes (integer).

  • 'MEAN_P_AB' is the mean of allelic proportion for heterozygous (AB) genotypes (numeric).

  • 'SD_P_AB' is the standard deviation of allelic proportion for heterozygous (AB) genotypes (numeric).

  • 'N_BB' is the count of homozygous B (BB) genotypes (integer).

  • 'MEAN_P_BB' is the mean of allelic proportion for homozygous B genotypes (numeric).

  • 'SD_P_BB' is the standard deviation of allelic proportion for homozygous B genotypes (numeric).

  • 'WELCH_A' is the Welch statistic for the interval between MEAN_P_AA and MEAN_P_AB (numeric).

  • 'WELCH_B' is the Welch statistic for the interval between MEAN_P_AB and MEAN_P_BB (numeric).

  • 'A_ALLELE_FREQ' is the A allele frequency computed from genotype counts (numeric).

  • 'B_ALLELE_FREQ' is the B allele frequency computed from genotype counts (numeric).

  • 'A_ALLELE' is the base represented by allele A (i.e. 'A', 'C', 'G' or 'T') (character).

  • 'B_ALLELE' is the base represented by allele B (i.e. 'A', 'C', 'G' or 'T') (character).

snp.param.pools

is the output of snp.param.pools.fun (default = NULL). That is, it is a data frame with the following headings (class in parentheses):

  • 'SNP_ID' is the SNP identifier (character).

  • 'MEAN_P_AAAA' is the mean of allelic proportion for homozygous A genotypes (numeric).

  • 'SD_P_AAAA' is the standard deviation of allelic proportion for homozygous A genotypes (numeric).

  • 'MEAN_P_AAAB' is the mean of allelic proportion for unordered AAAB genotypes (numeric).

  • 'SD_P_AAAB' is the standard deviation of allelic proportion for unordered AAAB genotypes (numeric).

  • 'MEAN_P_AABB' is the mean of allelic proportion for unordered AABB genotypes (numeric).

  • 'SD_P_AABB' is the standard deviation of allelic proportion for unordered AABB genotypes (numeric).

  • 'MEAN_P_ABBB' is the mean of allelic proportion for unordered ABBB genotypes (numeric).

  • 'SD_P_ABBB' is the standard deviation of allelic proportion for unordered ABBB genotypes (numeric).

  • 'MEAN_P_BBBB' is the mean of allelic proportion for homozygous B genotypes (numeric).

  • 'SD_P_BBBB' is the standard deviation of allelic proportion for homozygous B genotypes (numeric).

  • 'A_ALLELE' is the base represented by allele A (i.e. 'A', 'C', 'G' or 'T') (character).

  • 'B_ALLELE' is the base represented by allele B (i.e. 'A', 'C', 'G' or 'T') (character).

min.sd

is a numeric variable defining a lower bound to be applied to estimates of the standard deviation of allelic proportion for genotypes in snp.param.indiv and snp.param.pools (default = 0).
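One way to picture the lower bound is as a floor applied to every SD_P_ column. The sketch below is an assumption about how such a bound could be applied, not the package's code, and the helper name apply.min.sd is hypothetical.

```r
# Hypothetical helper (not part of SNPpools): raise any standard deviation
# estimate below min.sd up to min.sd.
apply.min.sd <- function(snp.param, min.sd = 0) {
  sd.cols <- grep("^SD_P_", names(snp.param), value = TRUE)
  snp.param[sd.cols] <- lapply(snp.param[sd.cols],
                               function(x) pmax(x, min.sd))
  snp.param
}

toy.param <- data.frame(
  SNP_ID  = c("snp1", "snp2"),
  SD_P_AA = c(0.001, 0.020),
  SD_P_AB = c(0.030, 0.004),
  SD_P_BB = c(0.015, 0.000),
  stringsAsFactors = FALSE
)

bounded <- apply.min.sd(toy.param, min.sd = 0.01)
```

A non-zero floor of this kind keeps a SNP with a near-zero standard deviation estimate from producing degenerate densities.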

fams

is a data frame with the following headings (class in parentheses):

  • 'FAMILY_ID' is the family identifier (integer).

  • 'SIRE_ID' is the sire identifier (integer).

  • 'DAM_ID' is the dam identifier (integer).

fam.set.combns

is a data frame with the following headings (class in parentheses) (default = NULL). Note: if fam.set.combns = NULL (see the 'pooling by phenotype' example below), FAMILY_ID is taken from fams and duplicated n.in.pools times, with FAM_SET_ID = 1 for the first duplication of FAMILY_IDs, 2 for the second and so on, and FAM_SET_COMBN_ID = 1:

  • 'FAM_SET_COMBN_ID' is the family set combination identifier (integer). A 'family set combination' is a combination of 'family sets'. Each pooled sample must be associated with only one family set combination, but a family set combination may be associated with multiple pooled samples using the fam.set.combns.by.pool input below.

  • 'FAM_SET_ID' is the family set identifier (integer). A 'family set' is a group of families of which one is known to be the true family of one of the individuals in a pooled sample. Within each 'family set combination' there must be a 'family set' for each individual in a pooled sample (i.e. if n.in.pools = 2 there must be two family sets in each family set combination).

  • 'FAMILY_ID' is the family identifier (integer).
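The default described in the note (used when fam.set.combns = NULL) can be reproduced by hand. This is a sketch of the documented behaviour using invented family data, not the package's internal code.

```r
# Toy fams input (identifiers are made up)
fams.example <- data.frame(
  FAMILY_ID = 1:3,
  SIRE_ID   = c(11L, 12L, 13L),
  DAM_ID    = c(21L, 22L, 23L)
)
n.in.pools <- 2L

# Every family appears once in each of n.in.pools family sets, and all family
# sets belong to a single family set combination
default.fam.set.combns <- data.frame(
  FAM_SET_COMBN_ID = 1L,
  FAM_SET_ID       = rep(seq_len(n.in.pools), each = nrow(fams.example)),
  FAMILY_ID        = rep(fams.example$FAMILY_ID, times = n.in.pools)
)
```

With three families and n.in.pools = 2 this gives six rows: families 1 to 3 in family set 1, then the same families again in family set 2.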

fam.set.combns.by.pool

is a data frame linking pooled samples with family set combinations (default = NULL). It has the following headings (class in parentheses). Note: if fam.set.combns is NULL (see the 'pooling by phenotype' example below), fam.set.combns.by.pool is set to NULL. If fam.set.combns.by.pool = NULL, FAM_SET_COMBN_ID = 1 and SAMPLE_ID is taken from the snp.dat.pools input:

  • SAMPLE_ID is the pooled sample identifier (integer).

  • 'FAM_SET_COMBN_ID' is the family set combination identifier (integer).
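The default linkage described in the note (every pooled sample mapped to family set combination 1) can be sketched as follows. The pooled data are invented and the code is an illustration of the documented behaviour rather than the package's implementation.

```r
# Toy snp.dat.pools input with two pooled samples (values are made up)
snp.dat.pools.example <- data.frame(
  SAMPLE_ID   = c(101L, 101L, 102L, 102L),
  SNP_ID      = c("snp1", "snp2", "snp1", "snp2"),
  INTENSITY_A = c(0.80, 0.30, 0.55, 0.10),
  INTENSITY_B = c(0.25, 0.70, 0.50, 0.95),
  stringsAsFactors = FALSE
)

# One row per pooled sample, all linked to family set combination 1
default.by.pool <- data.frame(
  SAMPLE_ID        = unique(snp.dat.pools.example$SAMPLE_ID),
  FAM_SET_COMBN_ID = 1L
)
```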

skip.checks

is a logical variable (default = FALSE). If TRUE, parent.assign.fun data checks are not undertaken.

Value


Primary outputs

most.like.parents.quant

Applicable when method = "Quantitative". Identifies the most likely parental combination and delta LODs for individual parents. Second most likely (alternative) parents are also presented. Refer to Hamilton 2020. Example fields for n.in.pools = 2:


most.like.parents.discrete

Applicable when method = "Discrete". Identifies the most likely parental combination and delta LODs for individual parents. Second most likely (alternative) parents are also presented. Refer to Hamilton 2020. Example fields for n.in.pools = 2:


most.like.parents.excl

Applicable when method = "Exclusion". Identifies the most likely parental combination. Refer to Hamilton 2020. Second most likely (alternative) parental combination is also presented. Example fields for n.in.pools = 2:


most.like.parents.excl.non.dup

Applicable when method = "Exclusion". Identifies the most likely parental combination - simplified output with multiple combinations with the same number of mismatches (duplicated SAMPLE_IDs) removed. Refer to Hamilton 2020. Example fields for n.in.pools = 2:


beta

Applicable when method = "Least_squares". Identifies the most likely parental combination:


Primary plots

bar.png

discrete.png

quantitative.png


Intermediate outputs

'Dij' Applicable when method = "Discrete" or "Exclusion". Refer to Henshall et al. 2014.

'dkj' Applicable when method = "Discrete" or "Exclusion". Refer to Hamilton 2020.

'dklj.adj' Applicable when method = "Discrete" or "Exclusion". Refer to Hamilton 2020 (dkj.star).

'fkj.and.weight' Applicable when method = "Least_squares". Refer to Henshall et al. 2014.

'Gij' Applicable when method = "Discrete" or "Quantitative". Refer to Henshall et al. 2014.

'gkj' Applicable when method = "Discrete" or "Quantitative". Refer to Hamilton 2020.

'gklj.adj' Applicable when method = "Discrete" or "Quantitative". Refer to Hamilton 2020 (gkj.star).

'flj.probs' Applicable when discrete.method = "geno.probs". Refer to Hamilton 2020 (fj).

'flj.geno' Applicable when discrete.method = "assigned.genos". Refer to Hamilton 2020 (fj).

'lambda.kj' Applicable when method = "Discrete" or "Quantitative". Refer to Hamilton 2020. Example output for n.in.pools = 2:

'lod.duos.discrete' Applicable when method = "Discrete". Refer to Hamilton 2020.

'lod.duos.quant' Applicable when method = "Quantitative". Refer to Hamilton 2020.

'logl.duos.discrete' Applicable when method = "Discrete". Refer to Hamilton 2020. Only output for the final SAMPLE_ID.

'logl.duos.quant' Applicable when method = "Quantitative". Refer to Hamilton 2020. Only output for the final SAMPLE_ID.

'mismatches' Applicable when method = "Exclusion". Refer to Hamilton 2020.

'mismatches.by.snp' Applicable when method = "Exclusion". Refer to Hamilton 2020. Example output for n.in.pools = 2:

'nlj.probs' Applicable when discrete.method = "geno.probs". Refer to Hamilton 2020 (nj). Example output for n.in.pools = 2:

'nlj.geno' Applicable when discrete.method = "assigned.genos". Refer to Hamilton 2020 (nj). Example output for n.in.pools = 2:

'parent.combns':

'phi.ij' Applicable when method = "Discrete" or "Quantitative". Refer to Henshall et al. 2014.

'snp.error.probs' Applicable when discrete.method = "geno.probs". Refer to Henshall et al. 2014.

'snp.error.geno' Applicable when discrete.method = "assigned.genos". Refer to Henshall et al. 2014.

'tclj.adj.quant' Applicable when method = "Quantitative". Refer to Hamilton 2020 (tcj.star). Example output for n.in.pools = 2:

'tclj.adj.discrete' Applicable when method = "Discrete". Refer to Hamilton 2020 (tcj.star). Example output for n.in.pools = 2:

'tclj.discrete' Applicable when method = "Discrete". Refer to Hamilton 2020 (tcj). Example output for n.in.pools = 2:

'tclj.ls' Applicable when method = "Least_squares". Refer to Hamilton 2020 (tcj).

'tclj.quant' Applicable when method = "Quantitative". Refer to Hamilton 2020 (tcj). Example output for n.in.pools = 2:

'Xl.mat' A list. Applicable when method = "Quantitative". Refer to Henshall et al. 2014 (X).

References

Henshall JM, Dierens L, Sellars MJ (2014) Quantitative analysis of low-density SNP data for parentage assignment and estimation of family contributions to pooled samples. Genetics Selection Evolution 46, 51. https://doi.org/10.1186/s12711-014-0051-y

Hamilton MG (2020) Maximum likelihood parentage assignment using quantitative genotypes

Examples

#Retrieve data for 'pooling by phenotype' example from Hamilton 2020
data(shrimp.snp.dat.indiv)
data(shrimp.snp.dat.pools)
data(shrimp.fams)

#Compute SNP parameters
shrimp.snp.param.indiv <- snp.param.indiv.fun(shrimp.snp.dat.indiv)
shrimp.snp.param.pools <- snp.param.pools.fun(shrimp.snp.param.indiv, n.in.pools = 2)

#Assign parentage using the quantitative maximum likelihood method
parent.assign.fun(method= "Quantitative",
                  snp.dat.indiv = shrimp.snp.dat.indiv, 
                  snp.dat.pools = shrimp.snp.dat.pools,
                  n.in.pools = 2,
                  snp.error.assumed = 0.01,
                  snp.param.indiv = shrimp.snp.param.indiv,
                  snp.param.pools = shrimp.snp.param.pools,                  
                  fams = shrimp.fams)  

#Retrieve data for 'pooling for individual parentage assignment' example from Hamilton 2020
data(ab.snp.dat.indiv)
data(ab.snp.dat.pools)
data(ab.fams)
data(ab.fam.set.combns)
data(ab.fam.set.combns.by.pool)

#Compute SNP parameters
ab.snp.param.indiv <- snp.param.indiv.fun(ab.snp.dat.indiv)
ab.snp.param.pools <- snp.param.pools.fun(ab.snp.param.indiv, n.in.pools = 3)

#Assign parentage using the quantitative maximum likelihood method
parent.assign.fun(method= "Quantitative",
                  snp.dat.indiv = ab.snp.dat.indiv, 
                  snp.dat.pools = ab.snp.dat.pools,
                  n.in.pools = 3,
                  snp.error.assumed = 0.01,
                  snp.param.indiv = ab.snp.param.indiv,
                  snp.param.pools = ab.snp.param.pools,                  
                  fams = ab.fams,
                  fam.set.combns = ab.fam.set.combns,
                  fam.set.combns.by.pool = ab.fam.set.combns.by.pool) 
                  
#Retrieve data for small worked example from Hamilton 2020
data(Ham.snp.dat.indiv)
data(Ham.snp.dat.pools)
data(Ham.fams)
data(Ham.fam.set.combns)
data(Ham.fam.set.combns.by.pool)

#Compute SNP parameters
Ham.snp.param.indiv <- snp.param.indiv.fun(Ham.snp.dat.indiv)
Ham.snp.param.pools <- snp.param.pools.fun(Ham.snp.param.indiv, n.in.pools = 2)

#Assign parentage using the least squares method
parent.assign.fun(method = "Least_squares",
                  beta.min.ss = TRUE, 
                  snp.dat.indiv = Ham.snp.dat.indiv, 
                  snp.dat.pools = Ham.snp.dat.pools,
                  n.in.pools = 2,
                  snp.error.assumed = 0.01,
                  snp.param.indiv = Ham.snp.param.indiv,
                  snp.param.pools = Ham.snp.param.pools,                  
                  fams = Ham.fams,
                  fam.set.combns = Ham.fam.set.combns,
                  fam.set.combns.by.pool = Ham.fam.set.combns.by.pool)
                  
#Assign parentage using the quantitative maximum likelihood method
parent.assign.fun(method= "Quantitative",
                  threshold.indiv = 0.98,         
                  threshold.pools = 0.98,         
                  snp.dat.indiv = Ham.snp.dat.indiv, 
                  snp.dat.pools = Ham.snp.dat.pools,
                  n.in.pools = 2,
                  snp.error.assumed = 0.01,
                  snp.param.indiv = Ham.snp.param.indiv,
                  snp.param.pools = Ham.snp.param.pools,                  
                  fams = Ham.fams,
                  fam.set.combns = Ham.fam.set.combns,
                  fam.set.combns.by.pool = Ham.fam.set.combns.by.pool)  
                  
#Assign parentage using the discrete maximum likelihood method 
#(discrete.method = "geno.probs")
parent.assign.fun(method= "Discrete",
                  discrete.method = "geno.probs",
                  threshold.indiv = 0.98,         
                  threshold.pools = 0.98,         
                  snp.dat.indiv = Ham.snp.dat.indiv, 
                  snp.dat.pools = Ham.snp.dat.pools,
                  n.in.pools = 2,
                  snp.error.assumed = 0.01,
                  snp.param.indiv = Ham.snp.param.indiv,
                  snp.param.pools = Ham.snp.param.pools,                  
                  fams = Ham.fams,
                  fam.set.combns = Ham.fam.set.combns,
                  fam.set.combns.by.pool = Ham.fam.set.combns.by.pool)  
                  
#Assign parentage using the discrete maximum likelihood method 
#(discrete.method = "assigned.genos")
parent.assign.fun(method= "Discrete",
                  discrete.method = "assigned.genos",
                  snp.dat.indiv = Ham.snp.dat.indiv, 
                  snp.dat.pools = Ham.snp.dat.pools,
                  n.in.pools = 2,
                  snp.error.assumed = 0.01,
                  fams = Ham.fams,
                  fam.set.combns = Ham.fam.set.combns,
                  fam.set.combns.by.pool = Ham.fam.set.combns.by.pool)    
  
#Assign parentage using the exclusion method 
#(discrete.method = "geno.probs")
parent.assign.fun(method= "Exclusion",
                  discrete.method = "geno.probs",
                  threshold.indiv = 0.98,         
                  threshold.pools = 0.98,         
                  snp.dat.indiv = Ham.snp.dat.indiv, 
                  snp.dat.pools = Ham.snp.dat.pools,
                  n.in.pools = 2,
                  snp.error.assumed = 0.01,
                  snp.param.indiv = Ham.snp.param.indiv,
                  snp.param.pools = Ham.snp.param.pools,                  
                  fams = Ham.fams,
                  fam.set.combns = Ham.fam.set.combns,
                  fam.set.combns.by.pool = Ham.fam.set.combns.by.pool)   
                                  
#Assign parentage using the exclusion method 
#(discrete.method = "assigned.genos")
parent.assign.fun(method= "Exclusion",
                  discrete.method = "assigned.genos",
                  snp.dat.indiv = Ham.snp.dat.indiv, 
                  snp.dat.pools = Ham.snp.dat.pools,
                  n.in.pools = 2,
                  fams = Ham.fams,
                  fam.set.combns = Ham.fam.set.combns,
                  fam.set.combns.by.pool = Ham.fam.set.combns.by.pool)   
 
#Assign parentage using multiple methods
#(discrete.method = "geno.probs")
parent.assign.fun(method = c("Least_squares", "Quantitative", "Discrete", "Exclusion"),
                  beta.min.ss = TRUE, 
                  discrete.method = "geno.probs",
                  threshold.indiv = 0.98,         
                  threshold.pools = 0.98,         
                  snp.dat.indiv = Ham.snp.dat.indiv, 
                  snp.dat.pools = Ham.snp.dat.pools,
                  n.in.pools = 2,
                  snp.error.assumed = 0.01,
                  snp.param.indiv = Ham.snp.param.indiv,
                  snp.param.pools = Ham.snp.param.pools,                  
                  fams = Ham.fams,
                  fam.set.combns = Ham.fam.set.combns,
                  fam.set.combns.by.pool = Ham.fam.set.combns.by.pool)  
                  

mghamilton/SNPpools documentation built on Feb. 13, 2021, 12:52 a.m.