crossEntropy: Cross-entropy criterion for snmf runs

Description Usage Arguments Value Author(s) See Also Examples

Description

Return the cross-entropy criterion for runs of snmfcwith K ancestral populations. The cross-entropy criterion is based on the prediction of masked genotypes to evaluate the fit of a model with K populations. The cross-entropy criterion helps choosing the number of ancestral populations or a best run for a fixed value of K. A smaller value of cross-entropy means a better run in terms of prediction capability. The cross-entropy criterion is computed by the snmf function when the entropy Boolean option is TRUE.

Usage

1
cross.entropy(object, K, run)

Arguments

object

A snmfProject object.

K

The number of ancestral populations.

run

A vector of run labels.

Value

res

A matrix containing the cross-entropy criterion for runs with K ancestral populations.

Author(s)

Eric Frichot

See Also

geno snmf G Q

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
### Example of analyses using snmf ###

# creation of a genotype file: genotypes.geno.
# The data contains 400 SNPs for 50 individuals.
data("tutorial")
write.geno(tutorial.R, "genotypes.geno")

################
# running snmf #
################

# Runs with K = 3 populations 
# cross-entropy is computed for 2 runs.
project = NULL
project = snmf("genotypes.geno", 
                K = 3, 
                entropy = TRUE, 
                repetitions = 2, 
                project = "new")

# get the cross-entropy for all runs for K = 3 
ce = cross.entropy(project, K = 3)

# get the cross-entropy for the 2nd run for K = 3
ce = cross.entropy(project, K = 3, run = 2)

Example output

[1] "genotypes.geno"
The project is saved into :
 genotypes.snmfProject 

To load the project, use:
 project = load.snmfProject("genotypes.snmfProject")

To remove the project, use:
 remove.snmfProject("genotypes.snmfProject")

[1] 1135602856
[1] "*************************************"
[1] "*          create.dataset            *"
[1] "*************************************"
summary of the options:

        -n (number of individuals)                 50
        -L (number of loci)                        400
        -s (seed random init)                      1135602856
        -r (percentage of masked data)             0.05
        -x (genotype file in .geno format)         /work/tmp/genotypes.geno
        -o (output file in .geno format)           /work/tmp/genotypes.snmf/masked/genotypes_I.geno

 Write genotype file with masked data, /work/tmp/genotypes.snmf/masked/genotypes_I.geno:		OK.

[1] "*************************************"
[1] "* sNMF K = 3  repetition 1      *"
[1] "*************************************"
summary of the options:

        -n (number of individuals)             50
        -L (number of loci)                    400
        -K (number of ancestral pops)          3
        -x (input file)                        /work/tmp/genotypes.snmf/masked/genotypes_I.geno
        -q (individual admixture file)         /work/tmp/genotypes.snmf/K3/run1/genotypes_r1.3.Q
        -g (ancestral frequencies file)        /work/tmp/genotypes.snmf/K3/run1/genotypes_r1.3.G
        -i (number max of iterations)          200
        -a (regularization parameter)          10
        -s (seed random init)                  215883967656
        -e (tolerance error)                   1E-05
        -p (number of processes)               1
        - diploid

Read genotype file /work/tmp/genotypes.snmf/masked/genotypes_I.geno:		OK.


Main algorithm:
	[                                                                           ]
	[==================]
Number of iterations: 48

Least-square error: 5714.510057
Write individual ancestry coefficient file /work/tmp/genotypes.snmf/K3/run1/genotypes_r1.3.Q:		OK.
Write ancestral allele frequency coefficient file /work/tmp/genotypes.snmf/K3/run1/genotypes_r1.3.G:	OK.

[1] "*************************************"
[1] "*    cross-entropy estimation       *"
[1] "*************************************"
summary of the options:

        -n (number of individuals)         50
        -L (number of loci)                400
        -K (number of ancestral pops)      3
        -x (genotype file)                 /work/tmp/genotypes.geno
        -q (individual admixture)          /work/tmp/genotypes.snmf/K3/run1/genotypes_r1.3.Q
        -g (ancestral frequencies)         /work/tmp/genotypes.snmf/K3/run1/genotypes_r1.3.G
        -i (with masked genotypes)         /work/tmp/genotypes.snmf/masked/genotypes_I.geno
        - diploid

Cross-Entropy (all data):	 0.478363
Cross-Entropy (masked data):	 0.600713
The project is saved into :
 genotypes.snmfProject 

To load the project, use:
 project = load.snmfProject("genotypes.snmfProject")

To remove the project, use:
 remove.snmfProject("genotypes.snmfProject")

[1] 1260507948
[1] "*************************************"
[1] "*          create.dataset            *"
[1] "*************************************"
summary of the options:

        -n (number of individuals)                 50
        -L (number of loci)                        400
        -s (seed random init)                      1260507948
        -r (percentage of masked data)             0.05
        -x (genotype file in .geno format)         /work/tmp/genotypes.geno
        -o (output file in .geno format)           /work/tmp/genotypes.snmf/masked/genotypes_I.geno

 Write genotype file with masked data, /work/tmp/genotypes.snmf/masked/genotypes_I.geno:		OK.

[1] "*************************************"
[1] "* sNMF K = 3  repetition 2      *"
[1] "*************************************"
summary of the options:

        -n (number of individuals)             50
        -L (number of loci)                    400
        -K (number of ancestral pops)          3
        -x (input file)                        /work/tmp/genotypes.snmf/masked/genotypes_I.geno
        -q (individual admixture file)         /work/tmp/genotypes.snmf/K3/run2/genotypes_r2.3.Q
        -g (ancestral frequencies file)        /work/tmp/genotypes.snmf/K3/run2/genotypes_r2.3.G
        -i (number max of iterations)          200
        -a (regularization parameter)          10
        -s (seed random init)                  1260507948
        -e (tolerance error)                   1E-05
        -p (number of processes)               1
        - diploid

Read genotype file /work/tmp/genotypes.snmf/masked/genotypes_I.geno:		OK.


Main algorithm:
	[                                                                           ]
	[====================================]
Number of iterations: 95

Least-square error: 5733.362921
Write individual ancestry coefficient file /work/tmp/genotypes.snmf/K3/run2/genotypes_r2.3.Q:		OK.
Write ancestral allele frequency coefficient file /work/tmp/genotypes.snmf/K3/run2/genotypes_r2.3.G:	OK.

[1] "*************************************"
[1] "*    cross-entropy estimation       *"
[1] "*************************************"
summary of the options:

        -n (number of individuals)         50
        -L (number of loci)                400
        -K (number of ancestral pops)      3
        -x (genotype file)                 /work/tmp/genotypes.geno
        -q (individual admixture)          /work/tmp/genotypes.snmf/K3/run2/genotypes_r2.3.Q
        -g (ancestral frequencies)         /work/tmp/genotypes.snmf/K3/run2/genotypes_r2.3.G
        -i (with masked genotypes)         /work/tmp/genotypes.snmf/masked/genotypes_I.geno
        - diploid

Cross-Entropy (all data):	 0.477734
Cross-Entropy (masked data):	 0.641464
The project is saved into :
 genotypes.snmfProject 

To load the project, use:
 project = load.snmfProject("genotypes.snmfProject")

To remove the project, use:
 remove.snmfProject("genotypes.snmfProject")

LEA documentation built on Nov. 8, 2020, 8:19 p.m.