struct2geno: Conversion from the STRUCTURE format to the geno format.



Description

The function converts a multiallelic genotype file in the STRUCTURE format into two files: one in the 'geno' format for snmf and one in the 'lfmm' format for lfmm.

Usage

struct2geno (input.file, ploidy, FORMAT, extra.row, extra.column)

Arguments

input.file

A character string. A path to a STRUCTURE or TESS input file of multiallelic markers (e.g., microsatellites) for haploid or diploid individuals. Missing data must be encoded as "-9" or as any other negative value. Each individual's genotype is encoded using either one or two rows of data.

ploidy

An integer value (1 or 2). Value 2 for diploids and 1 for haploids.

FORMAT

An integer value equal to 1 for markers encoded using one row of data for each individual, and 2 for markers encoded using two rows of data for each individual.

extra.row

An integer value indicating the number of extra rows in the header of the input file (e.g., marker ids).

extra.column

An integer value indicating the number of extra columns in the input file. Extra columns can include individual ids, population ids, geographic coordinates, etc.
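The two row layouts selected by FORMAT can be illustrated with a small base R sketch. It is not part of the package: the object names (copy1, copy2, one_row, two_row) and the toy allele values are made up for illustration, and it assumes the usual STRUCTURE convention that, in the one-row layout, the two alleles of a marker sit in adjacent columns.

```r
# Toy data (hypothetical): 3 diploid individuals, 2 markers.
# copy1/copy2 hold the first and second allele copy; -9 marks a missing allele.
n <- 3
L <- 2
copy1 <- matrix(c(101, 103, 102,    # marker 1
                  201, 201,  -9),   # marker 2
                nrow = n)
copy2 <- matrix(c(102, 103, 102,    # marker 1
                  203, 201, 202),   # marker 2
                nrow = n)

# One row per individual (FORMAT = 1): 2 * L columns,
# with the two alleles of each marker side by side (assumed layout).
one_row <- matrix(NA_real_, nrow = n, ncol = 2 * L)
one_row[, seq(1, 2 * L, by = 2)] <- copy1
one_row[, seq(2, 2 * L, by = 2)] <- copy2

# Two rows per individual (FORMAT = 2): 2 * n rows, L columns.
two_row <- matrix(NA_real_, nrow = 2 * n, ncol = L)
two_row[seq(1, 2 * n, by = 2), ] <- copy1
two_row[seq(2, 2 * n, by = 2), ] <- copy2

# An individual-id column in front counts as one extra column.
ids <- rep(paste0("ind", seq_len(n)), each = 2)
two_row_ids <- data.frame(id = ids, two_row)

# Write the two-row layout to a temporary file, without header or quotes.
path <- tempfile(fileext = ".str")
write.table(two_row_ids, file = path,
            col.names = FALSE, row.names = FALSE, quote = FALSE)
```

Under these assumptions, a file like the one written above would presumably be converted with ploidy = 2, FORMAT = 2, extra.row = 0 and extra.column = 1, while one_row written without the id column would correspond to FORMAT = 1 with no extra columns.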

Value

NULL. Output files in the 'geno' and the 'lfmm' format record individual genotypes for each allele at each marker.
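To make "genotypes for each allele at each marker" concrete, the sketch below shows one plausible per-allele coding for a single diploid marker: one column per observed allele, holding the number of copies (0, 1 or 2) that each individual carries. This is an illustration written in base R, not the code used by struct2geno. The function name allele_counts is hypothetical, and marking the whole row as 9 when either copy is missing is an assumption.

```r
# Hypothetical helper: per-allele copy counts for one diploid marker.
# a1, a2: first and second allele copy for each individual;
# negative values denote missing data.
allele_counts <- function(a1, a2, missing_out = 9) {
  observed <- sort(unique(c(a1, a2)))
  observed <- observed[observed >= 0]          # drop missing-data codes
  counts <- sapply(observed, function(al) (a1 == al) + (a2 == al))
  counts <- matrix(counts, nrow = length(a1))  # keep matrix shape for one allele
  # Assumption: an individual with any missing copy is coded as missing.
  counts[a1 < 0 | a2 < 0, ] <- missing_out
  colnames(counts) <- observed
  counts
}

# Three individuals: heterozygote 101/102, homozygote 103/103, one missing copy.
m <- allele_counts(c(101, 103, 102), c(102, 103, -9))
print(m)
```

With this coding a marker with k observed alleles yields k columns, which is consistent with the example output further down the page reporting more loci than the 10 markers in the input.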

Author(s)

Olivier Francois

See Also

lfmm.data, geno, lfmm, snmf

Examples

### Example of conversion from a STRUCTURE format ###
### Artificial data with 10 diploid individuals and 10 STR markers 
### FORMAT = 1
### Input file: 'dat.str'

library(LEA)  # provides struct2geno, snmf and barchart

dat.str  <- matrix(sample(c(101:105,-9), 
                  200, prob = c(rep(1,5), 0.1),
                  replace = TRUE), 
                  nrow = 10, ncol = 20)
write.table(dat.str, 
            file = "dat.str", 
            col.names = FALSE, 
            row.names = FALSE, 
            quote = FALSE)

### Conversion 
struct2geno("dat.str", ploidy = 2, FORMAT = 1)

### snmf run and barplot
s  <- snmf("dat.str.geno", K = 2, project = "new")
barchart(s, K = 2, run = 1, xlab = "Individuals")

Example output

Input file in the STRUCTURE format. The genotypic matrix has 10 individuals and 10 markers. 
The number of extra rows is 0 and the number of extra columns is 0 .
Missing alleles are encoded as -9 , converted as 9.
Output files: dat.str.geno  .lfmm. 
The project is saved into :
 dat.str.snmfProject 

To load the project, use:
 project = load.snmfProject("dat.str.snmfProject")

To remove the project, use:
 remove.snmfProject("dat.str.snmfProject")

[1] "*************************************"
[1] "* sNMF K = 2  repetition 1      *"
[1] "*************************************"
summary of the options:

        -n (number of individuals)             10
        -L (number of loci)                    48
        -K (number of ancestral pops)          2
        -x (input file)                        /work/tmp/dat.str.geno
        -q (individual admixture file)         /work/tmp/dat.str.snmf/K2/run1/dat.str_r1.2.Q
        -g (ancestral frequencies file)        /work/tmp/dat.str.snmf/K2/run1/dat.str_r1.2.G
        -i (number max of iterations)          200
        -a (regularization parameter)          10
        -s (seed random init)                  94313925559369
        -e (tolerance error)                   1E-05
        -p (number of processes)               1
        - diploid

Read genotype file /work/tmp/dat.str.geno:		OK.


Main algorithm:
	[                                                                           ]
	[==================]
Number of iterations: 47

Least-square error: 173.639311
Write individual ancestry coefficient file /work/tmp/dat.str.snmf/K2/run1/dat.str_r1.2.Q:		OK.
Write ancestral allele frequency coefficient file /work/tmp/dat.str.snmf/K2/run1/dat.str_r1.2.G:	OK.

The project is saved into :
 dat.str.snmfProject 

To load the project, use:
 project = load.snmfProject("dat.str.snmfProject")

To remove the project, use:
 remove.snmfProject("dat.str.snmfProject")

$order
 [1]  1  2  3  6  8  4  5  7  9 10

LEA documentation built on Nov. 8, 2020, 8:19 p.m.