convert_plink: Converts PLINK binary format to SNP formatted file.

View source: R/converters.R

Description

Converts PLINK binary format to SNP formatted file.

Usage

convert_plink(bfile, outfn, na = 9, newID = 0, nlines = NULL,
  fam = NULL, bim = NULL, bed = NULL, countminor = TRUE, maf = 0,
  chr = NULL, extract = NULL, exclude = NULL, extract_chr = NULL,
  keep = NULL, remove = NULL, method = "simple", fragments = "chr",
  remerge = TRUE, fragmentfns = NULL)

Arguments

bfile

Filename of PLINK binary files, i.e. without extension.

outfn

Filename of new file.

na

Missing value.

newID

Integer scalar (default 0) for automatically assigning new IDs. See section 'Assigning new IDs' for more.

nlines

Number of lines to process.

fam

If binary files have different stems, specify each of them with fam, bim, bed, and set bfile=NULL.

bim

See fam.

bed

See fam.

countminor

Logical: Should the output count minor allele (default), or major allele as plink --recode A.

maf

Numeric, restrict SNPs to SNPs with this frequency.

chr

Vector of chromosomes to limit output to.

extract

Extract only these SNPs, see Details.

exclude

Do not extract these SNPs, see Details.

extract_chr

Extract only these chromosomes, see Details.

keep

Keep only these samples, see Details.

remove

Removes these samples from output, see Details.

method

Character, which of the following methods to use: simple, lowmem, or dryrun. See Details.

fragments

"chr" or integer vector. Only used when method='lowmem'.

remerge

Logical, whether to re-merge fragmented blocks. Only used when method='lowmem'.

fragmentfns

Character vector or function for producing filenames.

Details

Method simple stores the entire genotype matrix in memory. PLINK binary files are stored in locus-major mode, i.e. the genotypes of all n animals at the first locus come first, followed by those at the second locus, and so on. Since we are interested in writing out all m loci for each animal, for efficiency we need to read the entire file. lowmem breaks the loci into smaller chunks (e.g. by chromosome), writes each chunk to a file, and merges them back as with cbind_SNPs. dryrun does not call the Fortran subroutine, but returns the processed arguments that would have been sent to the subroutine.
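The difference between the two layouts can be illustrated with a toy matrix in base R. This is only a sketch of the idea, not the package's Fortran implementation; the genotype codes and the block assignment are made up for illustration.

```r
# Toy genotypes stored locus-major: 3 loci (rows) x 4 animals (columns).
n_loci <- 3
locus_major <- matrix(c(0, 1, 2, 0,
                        1, 1, 0, 2,
                        2, 0, 0, 1),
                      nrow = n_loci, byrow = TRUE)

# The output has one row per animal holding all loci, i.e. the transpose.
# Writing the first animal already needs a value from every locus.
animal_major <- t(locus_major)

# 'lowmem' idea: transpose one block of loci at a time, then bind the
# blocks column-wise. The block assignment below is an arbitrary example.
block <- c(1, 1, 2)
pieces <- lapply(split(seq_len(n_loci), block),
                 function(idx) t(locus_major[idx, , drop = FALSE]))
remerged <- do.call(cbind, unname(pieces))
```

Each block only needs its own loci in memory, and the merged result equals the full transpose.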

For method='lowmem', use the argument fragments to indicate how the loci are subdivided. When fragments='chr' (case-insensitive), loci are split according to the 1st column of the .bim file. If fragments is a scalar integer, loci are split into this number of blocks. If it is an integer vector with one element per locus (length ncol), it directly specifies which block each locus is sent to, and max(fragments) gives the number of blocks.
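The three forms can be sketched as a mapping from loci to block numbers. The helper assign_blocks below is hypothetical and not part of the package; in particular, how a scalar is turned into blocks is an assumption (contiguous blocks of near-equal size).

```r
# Hypothetical helper, not part of Siccuracy: map each locus to a block number.
# 'bim_chr' stands for the 1st column of the .bim file.
assign_blocks <- function(fragments, bim_chr) {
  m <- length(bim_chr)
  if (is.character(fragments) && tolower(fragments[1]) == "chr") {
    # One block per chromosome, numbered by first appearance.
    return(match(bim_chr, unique(bim_chr)))
  }
  if (length(fragments) == 1) {
    # Assumption: a scalar means contiguous blocks of near-equal size.
    return(as.integer(ceiling(seq_len(m) * fragments / m)))
  }
  # Otherwise one block number per locus, used as given.
  stopifnot(length(fragments) == m)
  as.integer(fragments)
}

bim_chr <- c(1, 1, 1, 2, 2, 3)
by_chr    <- assign_blocks("CHR", bim_chr)
by_count  <- assign_blocks(2, bim_chr)
by_vector <- assign_blocks(c(1, 2, 1, 2, 3, 3), bim_chr)
```

In all three cases the largest block number is the number of blocks.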

Assigning new IDs

The new integer IDs can be supplied. If not, they will be made for you. If newID is an integer vector, it is used as is. If it is a data.frame with columns famID, sampID, and newID, its rows are reordered to match the input file.
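The reordering of a data.frame lookup can be pictured with base R. This is a sketch of the idea on toy data, not the package's own code; the object names fam and ids are made up for illustration.

```r
# Toy .fam content: 1st column famID, 2nd column sampID.
fam <- data.frame(famID  = c("F1", "F1", "F2"),
                  sampID = c("a", "b", "a"),
                  stringsAsFactors = FALSE)

# Lookup table given in a different order than the .fam file.
ids <- data.frame(famID  = c("F2", "F1", "F1"),
                  sampID = c("a", "b", "a"),
                  newID  = c(30L, 20L, 10L),
                  stringsAsFactors = FALSE)

# Match on the (famID, sampID) pair so that the new IDs follow .fam order.
key <- function(d) paste(d$famID, d$sampID, sep = "\t")
newID <- ids$newID[match(key(fam), key(ids))]
```

Here newID ends up in the order of the samples in the .fam file, regardless of the row order of the lookup table.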

Filtering loci or samples

Filters on loci or samples can be employed in a number of ways; filtering on loci and on samples is handled independently. Inclusion criteria (extract and keep) reduce the output to only those loci or samples that pass the criteria. Exclusion criteria (exclude and remove) are applied after the inclusion criteria and reduce the output further.

extract and exclude can be any combination of:

Logical

Vector of same length as loci in input file.

Integer or numeric

Indicates by position which loci to include or exclude. Numeric vectors are coerced to integer vectors.

Character

Matched against probe IDs, i.e. 2nd column of .bim file.

For restricting the output to certain chromosomes, use extract_chr. The output is the intersection of extract and extract_chr.
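The order of operations for loci can be sketched in base R: inclusion, then the intersection with the chromosome filter, then exclusion. The helper as_mask is hypothetical and handles one type per argument only; it is meant to illustrate the rules above, not to reproduce the package's internals.

```r
# Hypothetical helper, not part of Siccuracy: turn one filter into a logical mask.
as_mask <- function(x, ids) {
  if (is.null(x)) return(NULL)
  if (is.logical(x)) {
    stopifnot(length(x) == length(ids))
    return(x)
  }
  if (is.numeric(x)) return(seq_along(ids) %in% as.integer(x))  # positions
  ids %in% x                                                    # SNP names
}

# Toy .bim content: 1st column chromosome, 2nd column SNP name.
bim <- data.frame(chr = c(1, 1, 2, 2, 3),
                  snp = c("rs1", "rs2", "rs3", "rs4", "rs5"),
                  stringsAsFactors = FALSE)

extract     <- c("rs1", "rs2", "rs3", "rs4")  # character: matched on names
exclude     <- 2                              # numeric: position in the file
extract_chr <- c(1, 2)

selected <- rep(TRUE, nrow(bim))
inc <- as_mask(extract, bim$snp)
if (!is.null(inc)) selected <- selected & inc
selected <- selected & bim$chr %in% extract_chr  # intersection with extract
exc <- as_mask(exclude, bim$snp)
if (!is.null(exc)) selected <- selected & !exc   # exclusion comes last
selected_snps <- bim$snp[selected]
```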

keep and remove work like extract and exclude above, accept the same combinations, and can additionally be:

Character

Matched against either famID or sampID, i.e. the 1st or 2nd column of the .fam file.

List with named elements famID and/or sampID

The named elements are matched against, respectively, the 1st and 2nd column of the .fam file.
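The difference between a plain character vector and a named list can be sketched in base R. This assumes that a character value selects a sample when it is found in either column; the toy data and variable names are made up for illustration.

```r
# Toy .fam content where the label "F1" occurs both as a family and as a sample.
fam <- data.frame(famID  = c("F1", "F1", "F2", "F3"),
                  sampID = c("a", "b", "c", "F1"),
                  stringsAsFactors = FALSE)

# Character vector: assumed to match if found in either column.
keep_chr <- c("F1")
by_character <- fam$famID %in% keep_chr | fam$sampID %in% keep_chr

# Named list: the element is matched against its own column only.
keep_list <- list(famID = "F1")
by_list <- fam$famID %in% keep_list$famID
```

With the named list, the sample called "F1" in family F3 is not selected, whereas the plain character vector picks it up.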

Fragment file names

The argument fragmentfns is used for method 'lowmem', providing filenames (absolute or relative) for producing the final converted files and intermediate .bim files. When remerge=TRUE, the argument outfn is ignored.

fragmentfns defaults to temporary files, created with tempfile. If a character vector, the first $n_f$ elements are filenames for $n_f$ fragments (e.g. chromosomes). The following elements, $n_f + 1, \ldots, 2 n_f$, are for the intermediate .bim files. The vector is automatically padded with temporary files to the required length.

If fragmentfns is a function, it will be called with 0, 1, or 2 arguments. The first argument is a running number for the fragments, the second is the maximum number of fragments.
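A function that copes with 0, 1, or 2 arguments could look like the sketch below. The name fragment_name and the naming scheme are assumptions for illustration, as is the choice to fall back on tempfile when called without arguments.

```r
# Hypothetical filename generator; the naming scheme is an assumption.
# i: running number of the fragment, n: maximum number of fragments.
fragment_name <- function(i = NULL, n = NULL) {
  # No arguments: fall back to a temporary file.
  if (is.null(i)) return(tempfile(pattern = "fragment_"))
  # Running number only.
  if (is.null(n)) return(sprintf("fragment_%d.txt", as.integer(i)))
  # Both: zero-pad the running number so the files sort naturally.
  width <- nchar(as.character(n))
  paste0("fragment_", formatC(as.integer(i), width = width, flag = "0"),
         "_of_", n, ".txt")
}

name_one  <- fragment_name(3)
name_both <- fragment_name(3, 12)
name_none <- fragment_name()
```

Such a function would be passed as fragmentfns = fragment_name together with method = 'lowmem'.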

See Also

convert_plink is a direct conversion that does not rely on PLINK. See the alternate convert_plinkA which re-formats the output from plink --recode A.


stefanedwards/Siccuracy documentation built on Dec. 14, 2017, 7:41 p.m.