haplo.em: EM Computation of Haplotype Probabilities, with Progressive...

Description Usage Arguments Details Value Note See Also Examples

View source: R/haplo.em.q

Description

For genetic marker phenotypes measured on unrelated subjects, with linkage phase unknown, compute maximum likelihood estimates of haplotype probabilities. Because linkage phase is unknown, there may be more than one pair of haplotypes that are consistent with the oberved marker phenotypes, so posterior probabilities of pairs of haplotypes for each subject are also computed. Unlike the usual EM which attempts to enumerate all possible pairs of haplotypes before iterating over the EM steps, this "progressive insertion" algorithm progressively inserts batches of loci into haplotypes of growing lengths, runs the EM steps, trims off pairs of haplotypes per subject when the posterior probability of the pair is below a specified threshold, and then continues these insertion, EM, and trimming steps until all loci are inserted into the haplotype. The user can choose the batch size. If the batch size is chosen to be all loci, and the threshold for trimming is set to 0, then this algorithm reduces to the usual EM algorithm.

Usage

1
2
haplo.em(geno, locus.label=NA, miss.val=c(0, NA), weight, control= 
           haplo.em.control())

Arguments

geno

matrix of alleles, such that each locus has a pair of adjacent columns of alleles, and the order of columns corresponds to the order of loci on a chromosome. If there are K loci, then ncol(geno) = 2*K. Rows represent the alleles for each subject.

locus.label

vector of labels for loci.

miss.val

vector of values that represent missing alleles in geno.

weight

weights for observations (rows of geno matrix).

control

list of control parameters. The default is constructed by the function haplo.em.control. The default behavior of this function results in the following parameter settings: loci.insert.order=1:n.loci, insert.batch.size=min(4,n.loci), min.posterior= 0.0001, tol=0.00001, max.iter=500, random.start=0 (no random start), iseed=NULL (no saved seed to start random start), verbose=0 (no printout during EM iterations). See haplo.em.control for more details.

Details

The basis of this progressive insertion algorithm is from the sofware snphap by David Clayton. Although some of the features and control parameters of this S-PLUS version are modeled after snphap, there are substantial differences, such as extension to allow for more than two alleles per locus, and some other nuances on how the alogrithm is implemented.

Value

list with components:

converge

indicator of convergence of the EM algorithm (1 = converge, 0 = failed).

lnlike

value of lnlike at last EM iteration (maximum lnlike if converged).

lnlike.noLD

value of lnlike under the null of linkage equilibrium at all loci.

lr

likelihood ratio statistic to test the final lnlike against the lnlike that assumes complete linkage equilibrium among all loci (i.e., haplotype frequencies are products of allele frequencies).

df.lr

degrees of freedom for likelihood ratio statistic. The df for the unconstrained final model is the number of non-zero haplotype frequencies minus 1, and the df for the null model of complete linkage equilibrium is the sum, over all loci, of (number of alleles - 1). The df for the lr statistic is df[unconstrained] - df[null]. This can result in negative df, if many haplotypes are estimated to have zero frequency, or if a large amount of trimming occurs, when using large values of min.posterior in the list of control parameters.

hap.prob

vector of mle's of haplotype probabilities. The ith element of hap.prob corresponds to the ith row of haplotype.

locus.label

vector of labels for loci, of length K (see definition of input values).

subj.id

vector of id's for subjects used in the analysis, based on row number of input geno matrix. If subjects are removed, then their id will be missing from subj.id.

rows.rem

now defunct, but set equal to a vector of length 0, to be compatible with other functions that check for rows.rem.

indx.subj

vector for row index of subjects after expanding to all possible pairs of haplotypes for each person. If indx.subj=i, then i is the ith row of geno. If the ith subject has n possible pairs of haplotypes that correspond to their marker genotype, then i is repeated n times.

nreps

vector for the count of haplotype pairs that map to each subject's marker genotypes.

max.pairs

vector of maximum number of pairs of haplotypes per subject that are consistent with their marker data in the matrix geno. The length of max.pairs = nrow(geno). This vector is computed by geno.count.pairs.

hap1code

vector of codes for each subject's first haplotype. The values in hap1code are the row numbers of the unique haplotypes in the returned matrix haplotype.

hap2code

similar to hap1code, but for each subject's second haplotype.

post

vector of posterior probabilities of pairs of haplotypes for a person, given their marker phenotypes.

haplotype

matrix of unique haplotypes. Each row represents a unique haplotype, and the number of columns is the number of loci.

control

list of control parameters for algorithm. See haplo.em.control

Note

Sorted order of haplotypes with character alleles is system-dependent, and can be controlled via the LC_COLLATE locale environment variable. Different locale settings can cause results to be non-reproducible even when controlling the random seed.

See Also

setupGeno, haplo.em.control

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
data(hla.demo)
attach(hla.demo)
geno <- hla.demo[,c(17,18,21:24)]
label <-c("DQB","DRB","B")
keep <- !apply(is.na(geno) | geno==0, 1, any)

save.em.keep <- haplo.em(geno=geno[keep,], locus.label=label)

# warning: output will not exactly match

print.haplo.em(save.em.keep)

Example output

================================================================================ 
                                   Haplotypes                                    
================================================================================ 
    DQB DRB  B hap.freq
1    21   1  8  0.00235
2    21   2  7  0.00229
3    21   2 18  0.00229
4    21   3  8  0.10515
5    21   3 18  0.00462
6    21   3 35  0.00571
7    21   3 44  0.00363
8    21   3 49  0.00229
9    21   3 57  0.00240
10   21   3 70  0.00229
11   21   4 62  0.00459
12   21   7  7  0.01430
13   21   7 13  0.01086
14   21   7 18  0.00250
15   21   7 35  0.00236
16   21   7 44  0.02189
17   21   7 45  0.00229
18   21   7 50  0.00459
19   21   7 57  0.00229
20   21   7 62  0.00773
21   21   8 18  0.00229
22   21   8 63  0.00229
23   21   9 51  0.00229
24   21  10  8  0.00229
25   31   1 27  0.00459
26   31   2 14  0.00229
27   31   2 44  0.00230
28   31   3  8  0.00459
29   31   4  7  0.00000
30   31   4 13  0.00520
31   31   4 27  0.00499
32   31   4 35  0.00459
33   31   4 41  0.00229
34   31   4 44  0.02876
35   31   4 57  0.00229
36   31   4 60  0.00461
37   31   4 61  0.00229
38   31   7  7  0.00000
39   31   7 44  0.00459
40   31   7 61  0.00229
41   31   8 35  0.00229
42   31   8 39  0.00229
43   31   8 45  0.00229
44   31   8 52  0.00229
45   31   8 60  0.00459
46   31  11  7  0.00229
47   31  11 18  0.00459
48   31  11 27  0.00608
49   31  11 35  0.01725
50   31  11 37  0.00688
51   31  11 38  0.00688
52   31  11 44  0.01011
53   31  11 49  0.00459
54   31  11 51  0.01095
55   31  11 56  0.00229
56   31  11 60  0.00233
57   31  11 61  0.00459
58   31  11 62  0.00603
59   31  13  8  0.00469
60   31  13 14  0.00229
61   31  13 41  0.00459
62   31  13 57  0.00219
63   31  14 58  0.00229
64   31  14 63  0.00229
65   32   1  7  0.00229
66   32   1 35  0.00224
67   32   1 62  0.00000
68   32   2 44  0.00291
69   32   3 35  0.00235
70   32   3 51  0.00229
71   32   4  7  0.01690
72   32   4  8  0.00473
73   32   4 14  0.00459
74   32   4 35  0.00368
75   32   4 44  0.00241
76   32   4 45  0.00167
77   32   4 51  0.00229
78   32   4 55  0.00251
79   32   4 60  0.03092
80   32   4 62  0.02371
81   32   7 14  0.00229
82   32   7 39  0.00229
83   32   7 57  0.00459
84   32   8  7  0.00688
85   32   9 51  0.00229
86   32  10 39  0.00229
87   33   7  7  0.00459
88   33   7 57  0.00688
89   33   9  8  0.00229
90   33   9 46  0.00229
91   33   9 48  0.00229
92   33   9 60  0.00688
93   33  10 51  0.00229
94   42   4 35  0.00229
95   42   8 18  0.00229
96   42   8 55  0.00229
97   42   8 60  0.00229
98   51   1  8  0.00461
99   51   1 18  0.00238
100  51   1 27  0.01416
101  51   1 35  0.02969
102  51   1 39  0.00459
103  51   1 44  0.01743
104  51   1 47  0.00229
105  51   1 48  0.00229
106  51   1 51  0.00740
107  51   1 58  0.00229
108  51   1 60  0.00231
109  51   2 51  0.00229
110  51   4 27  0.00229
111  51   8  7  0.00459
112  51  10  7  0.00229
113  51  10 14  0.00229
114  51  10 18  0.00229
115  51  10 35  0.00229
116  51  10 37  0.00459
117  51  10 62  0.00000
118  51  14 56  0.00229
119  52   2 27  0.00229
120  52   2 42  0.00229
121  52   2 56  0.00459
122  52   2 62  0.00459
123  52   4 39  0.00229
124  52   8 60  0.00229
125  53   8 57  0.00229
126  53   9 51  0.00229
127  53  14 35  0.00229
128  53  14 38  0.00459
129  53  14 44  0.00229
130  53  14 51  0.00229
131  53  14 55  0.00459
132  61   2 44  0.00229
133  61   2 60  0.00229
134  61   2 61  0.00459
135  61   2 62  0.00229
136  61   4 52  0.00229
137  61   8 44  0.00229
138  61  13 39  0.00229
139  61  13 44  0.00229
140  62   2  7  0.05139
141  62   2  8  0.00462
142  62   2 18  0.01573
143  62   2 27  0.00459
144  62   2 35  0.00759
145  62   2 44  0.01369
146  62   2 57  0.00229
147  62   2 60  0.00498
148  62   4  7  0.00230
149  62   4 45  0.00291
150  62   7 57  0.00229
151  62   8 35  0.00229
152  62   8 37  0.00229
153  62  10 51  0.00229
154  62  11 55  0.00229
155  62  13 51  0.00229
156  62  14 51  0.00229
157  63   2  7  0.01346
158  63   2 35  0.00031
159  63   2 44  0.00229
160  63   4 48  0.00229
161  63   8 18  0.00229
162  63   8 58  0.00229
163  63  10 44  0.00229
164  63  13  7  0.01605
165  63  13 35  0.00000
166  63  13 38  0.00688
167  63  13 44  0.01606
168  63  13 55  0.00437
169  63  13 60  0.00558
170  63  13 62  0.00840
171  64   2 51  0.00229
172  64  11 38  0.00229
173  64  11 51  0.00000
174  64  13  7  0.00945
175  64  13 14  0.00459
176  64  13 35  0.00911
177  64  13 44  0.00236
178  64  13 51  0.00229
179  64  13 57  0.00000
180  64  13 60  0.00660
181  64  13 61  0.00000
182  64  13 63  0.00688
================================================================================ 
                                    Details                                      
================================================================================ 
lnlike =  -1837.219 
lr stat for no LD =  627.1025 , df =  125 , p-val =  0 

haplo.stats documentation built on May 29, 2017, 7:54 p.m.

Search within the haplo.stats package
Search all R packages, documentation and source code