Converts PLINK binary files to an SNP-formatted file.
bfile |
Filename of PLINK binary files, i.e. without extension. |
outfn |
Filename of new file. |
na |
Missing value. |
newID |
Integer scalar (default |
nlines |
Number of lines to process. |
fam |
If binary files have different stems, specify each of them with fam, bim, and bed. |
bim |
See fam. |
bed |
See fam. |
countminor |
Logical: Should the output count minor allele (default), or major allele as |
maf |
Numeric, restrict SNPs to SNPs with this frequency. |
chr |
Vector of chromosomes to limit output to. |
extract |
Extract only these SNPs, see Details. |
exclude |
Do not extract these SNPs, see Details. |
extract_chr |
Extract only these chromosomes, see Details. |
keep |
Keep only these samples, see Details. |
remove |
Removes these samples from output, see Details. |
method |
Character, which of the following methods to use: simple, lowmem, or dryrun; see Details. |
fragments |
|
remerge |
Logical, whether to re-merge fragmented blocks. Only used when method='lowmem'. |
fragmentfns |
Character vector or function for producing filenames. |
The argument method selects how the conversion is carried out.
simple stores the entire genotype matrix in memory. PLINK binary files are stored in locus-major mode, i.e. the genotypes of the first locus for all n animals come first, then those of the second locus, and so on.
Since we are interested in writing out all m loci for each animal, for efficiency we need to read the entire file.
lowmem breaks the loci into smaller chunks (e.g. by chromosome), writes each chunk to a file, and merges them back as with cbind_SNPs.
dryrun does not call the Fortran subroutine, but returns the treated arguments that would have been sent to the subroutine.
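To make the locus-major layout concrete, the following is a minimal base R sketch of decoding the bytes of one locus and then transposing to one row per animal. It follows the two-bit coding described in the PLINK BED format documentation; it is not the package's Fortran routine. The name decode_locus and the missing-value code 9 are hypothetical, and counting copies of the first allele is an assumption about the output coding.

```r
# Decode the bytes of one locus from a PLINK .bed file (locus-major mode).
# Each byte packs four genotypes, two bits each, lowest-order bits first.
# `na` is a hypothetical missing-value code, not the package default.
decode_locus <- function(bytes, n, na = 9L) {
  b <- as.integer(bytes)
  # Split every byte into its four 2-bit codes (sample order within the byte).
  codes <- as.vector(sapply(b, function(x) (x %/% c(1L, 4L, 16L, 64L)) %% 4L))
  # Codes 0, 1, 2, 3 mean: homozygous first allele, missing, heterozygous,
  # homozygous second allele. Here we count copies of the first allele.
  lookup <- c(2L, na, 1L, 0L)
  lookup[codes[seq_len(n)] + 1L]
}

# Two loci for five animals; each locus occupies ceiling(5 / 4) = 2 bytes.
locus1 <- decode_locus(as.raw(c(0xdc, 0x0f)), n = 5)
locus2 <- decode_locus(as.raw(c(0x1b, 0x03)), n = 5)

# Locus-major: one row per locus. Transposing gives one row per animal,
# which is why the whole file has to be read before any animal is complete.
locus_major <- rbind(locus1, locus2)
animal_major <- t(locus_major)
```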
For method='lowmem', use the argument fragment to indicate how the loci are subdivided.
When fragment='chr' (case-insensitive), loci are split according to the 1st column of the .bim file.
If fragment is a scalar integer, loci are split into this number of blocks.
If it is an integer vector of same length as ncol, it directly specifies which block each locus is sent to; max(fragment) specifies the number of blocks.
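The three forms of fragment can be sketched in base R as a mapping from loci to block numbers. The helper fragment_blocks is hypothetical, and splitting into contiguous, roughly equal blocks for the scalar case is an assumption; the package may divide the loci differently.

```r
# Map each locus to a block number, given the chromosome column of the .bim file.
# Hypothetical helper; the scalar case assumes contiguous, near-equal blocks.
fragment_blocks <- function(fragment, chr) {
  m <- length(chr)
  if (is.character(fragment) && length(fragment) == 1 &&
      tolower(fragment) == "chr") {
    # One block per chromosome, numbered in order of first appearance.
    return(match(chr, unique(chr)))
  }
  if (length(fragment) == 1) {
    k <- as.integer(fragment)
    return(as.integer(ceiling(seq_len(m) * k / m)))
  }
  # Otherwise: one block number per locus, used as is.
  stopifnot(length(fragment) == m)
  as.integer(fragment)
}

chr <- c(1, 1, 2, 2, 2, 3)
by_chr    <- fragment_blocks("CHR", chr)
by_count  <- fragment_blocks(2, chr)
by_vector <- fragment_blocks(c(1, 2, 1, 2, 1, 2), chr)
n_blocks  <- max(by_vector)
```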
New integer IDs can be supplied. If not, they will be made for you.
If newID is an integer vector, it is used as is.
If it is a data.frame with columns famID, sampID, and newID, the rows are reordered to match the input file.
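The reordering of a newID data.frame can be pictured with a small base R sketch. The helper reorder_newID and the example data are hypothetical; it only illustrates matching on the famID and sampID pair, not the package's internal code.

```r
# Reorder supplied IDs so they follow the sample order of the .fam file.
# Hypothetical helper; `fam` stands in for the first two .fam columns.
reorder_newID <- function(fam, ids) {
  key <- function(d) paste(d$famID, d$sampID, sep = "\t")
  ids$newID[match(key(fam), key(ids))]
}

fam <- data.frame(famID = c("F1", "F1", "F2"),
                  sampID = c("a", "b", "c"),
                  stringsAsFactors = FALSE)
ids <- data.frame(famID = c("F2", "F1", "F1"),
                  sampID = c("c", "a", "b"),
                  newID = c(30L, 10L, 20L),
                  stringsAsFactors = FALSE)

newID <- reorder_newID(fam, ids)
```

Samples in the .fam file that have no row in ids would get NA in this sketch; how the package treats that case is not stated here.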
Filters on loci or samples can be employed in a number of ways; filtering on loci and filtering on samples are handled independently.
Inclusion criteria (extract and keep) reduce the output to only those loci or samples that pass the criteria.
Exclusion criteria (exclude and remove) are applied after the inclusion criteria and reduce the output further.
extract and exclude can be any combination of:
- A vector of the same length as the number of loci in the input file.
- A vector indicating by position which loci to include or exclude. Numeric vectors are coerced to integer vectors.
- A vector matched against probe IDs, i.e. the 2nd column of the .bim file.
For restricting the output to certain chromosomes, use extract_chr. The output is the intersection of extract and extract_chr.
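The order of operations for loci (inclusion, intersection with chromosomes, then exclusion) can be sketched in base R as follows. The helper select_loci is hypothetical, and it assumes the three accepted forms are logical, integer (positional), and character vectors; combinations of several forms are not handled in this sketch.

```r
# Return the positions of loci that survive the filters.
# Hypothetical helper; `snp_ids` and `chr` stand in for .bim columns 2 and 1.
select_loci <- function(snp_ids, chr, extract = NULL, exclude = NULL,
                        extract_chr = NULL) {
  to_logical <- function(x) {
    if (is.null(x)) return(NULL)
    if (is.logical(x)) return(x)                      # one flag per locus
    if (is.character(x)) return(snp_ids %in% x)       # matched on probe ID
    seq_along(snp_ids) %in% as.integer(x)             # positional
  }
  pass <- rep(TRUE, length(snp_ids))
  inc <- to_logical(extract)
  if (!is.null(inc)) pass <- pass & inc
  if (!is.null(extract_chr)) pass <- pass & chr %in% extract_chr
  # Exclusion is applied after inclusion and only narrows the result.
  exc <- to_logical(exclude)
  if (!is.null(exc)) pass <- pass & !exc
  which(pass)
}

snp_ids <- c("rs1", "rs2", "rs3", "rs4")
chr <- c(1, 1, 2, 2)
selected <- select_loci(snp_ids, chr, extract = c(1, 2, 3),
                        exclude = "rs2", extract_chr = 1)
```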
keep and remove work like extract and exclude above, can be a combination of the same forms, and can additionally be:
- Values matched against famID or sampID, i.e. the 1st and 2nd column of the .fam file.
- Named elements famID and/or sampID, which are matched against, respectively, the 1st and 2nd column of the .fam file.
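For samples, the two additional forms might be sketched like this in base R. The helper match_samples is hypothetical. It assumes that a plain character vector matches a sample when it hits either column, and that when both famID and sampID are named, a sample must match both; the package may combine them differently.

```r
# Flag samples that match a keep/remove specification.
# Hypothetical helper; `fam` stands in for the first two .fam columns.
match_samples <- function(fam, x) {
  if (is.character(x)) {
    # Plain character vector: a hit in either column counts (assumption).
    return(fam$famID %in% x | fam$sampID %in% x)
  }
  # Named elements: each one is matched against its own column.
  hit <- rep(TRUE, nrow(fam))
  if (!is.null(x$famID)) hit <- hit & fam$famID %in% x$famID
  if (!is.null(x$sampID)) hit <- hit & fam$sampID %in% x$sampID
  hit
}

fam <- data.frame(famID = c("F1", "F1", "F2"),
                  sampID = c("a", "b", "c"),
                  stringsAsFactors = FALSE)

keep_flag   <- match_samples(fam, list(famID = "F1"))
remove_flag <- match_samples(fam, "b")
# Inclusion first, exclusion second.
kept_samples <- which(keep_flag & !remove_flag)
```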
The argument fragmentfns is used for method 'lowmem', providing filenames (absolute or relative) for producing the final converted files and intermediate .bim files.
When remerge=TRUE, the argument outfn is ignored.
fragmentfns defaults to temporary files, created with tempfile.
If a character vector, the first $n_f$ elements are filenames for the $n_f$ fragments (e.g. chromosomes).
The following elements, $n_f + 1, \dots, 2 n_f$, are for the intermediate .bim files.
The vector is automatically padded with temporary files to the required length.
If fragmentfns is a function, it will be called with 0, 1, or 2 arguments. The first argument is a running number for the fragments, the second is the maximum number of fragments.
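As an illustration, a function suitable for fragmentfns has to tolerate being called with zero, one, or two arguments, and a character vector is padded up to $2 n_f$ entries. Both helpers below are hypothetical base R sketches with made-up file name patterns; they are not taken from the package.

```r
# A filename generator that works with 0, 1, or 2 arguments.
# i: running number of the fragment, n: maximum number of fragments.
my_fragmentfns <- function(i, n) {
  if (missing(i)) return(tempfile("fragment_"))
  if (missing(n)) return(sprintf("fragment_%d.txt", i))
  sprintf("fragment_%d_of_%d.txt", i, n)
}

# Pad a character vector with temporary files to length 2 * n_f:
# n_f fragment files followed by n_f intermediate .bim files.
pad_fragmentfns <- function(fns, n_f) {
  short_by <- 2 * n_f - length(fns)
  if (short_by > 0) fns <- c(fns, tempfile(rep("fragment_", short_by)))
  fns[seq_len(2 * n_f)]
}

padded <- pad_fragmentfns(c("chr1.txt", "chr2.txt"), n_f = 2)
```

In this sketch the two supplied names are kept for the fragments and the two intermediate .bim files fall back to temporary files.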
PLINK v. 1.07 BED file format: https://www.cog-genomics.org/plink/1.9/formats#bed
Shaun Purcell and Christopher Chang. PLINK v. 1.90. https://www.cog-genomics.org/plink2
Chang CC, Chow CC, Tellier LCAM, Vattikuti S, Purcell SM, Lee JJ (2015) Second-generation PLINK: rising to the challenge of larger and richer datasets. GigaScience, 4. doi: 10.1186/s13742-015-0047-8
convert_plink is a direct conversion that does not rely on PLINK.
See the alternate convert_plinkA, which re-formats the output from plink --recode A.