View source: R/process_plink.R
process_plink | R Documentation |
Preprocess PLINK files using the bigsnpr package
process_plink(
data_dir,
data_prefix,
rds_dir = data_dir,
rds_prefix,
logfile = NULL,
impute = TRUE,
impute_method = "mode",
id_var = "IID",
parallel = TRUE,
quiet = FALSE,
overwrite = FALSE,
...
)
data_dir |
The path to the bed/bim/fam data files, without a trailing "/" (e.g., use |
data_prefix |
The prefix (as a character string) of the bed/fam data files (e.g., |
rds_dir |
The path to the directory in which you want to create the new '.rds' and '.bk' files. Defaults to |
rds_prefix |
String specifying the user's preferred filename for the to-be-created .rds file (will be created inside |
logfile |
Optional: the prefix (character string) of the log file to be written in 'rds_dir'. Defaults to NULL (no log file written). Note: supply only the prefix, not a file path; a file path in this argument causes a "file not found" error. For example, if you want my_log.log, supply 'my_log', and the my_log.log file will appear in rds_dir. |
impute |
Logical: should data be imputed? Defaults to TRUE. |
impute_method |
If impute = TRUE, this argument specifies the kind of imputation desired. Options are:
* mode (default): Imputes the most frequent call. See |
id_var |
String specifying which column of the PLINK |
parallel |
Logical: should the computations within this function be run in parallel? Defaults to TRUE. See |
quiet |
Logical: should messages printed to the console be silenced? Defaults to FALSE. |
overwrite |
Logical: if existing |
... |
Optional: additional arguments to |
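The path conventions described above for data_dir (no trailing "/") and logfile (a bare prefix, not a file path) can be checked before calling the function. The following is a minimal base R sketch; the helper names strip_trailing_slash() and is_bare_prefix() are hypothetical and are not part of the package.

```r
# Hypothetical helpers (not part of the package) for checking arguments
# before they are passed to process_plink().

# Drop trailing "/" characters so the path matches the form `data_dir`
# expects. A lone "/" (filesystem root) is left unchanged.
strip_trailing_slash <- function(path) {
  sub("(.)/+$", "\\1", path)
}

# TRUE when `x` is a single string with no directory component, which is
# the form `logfile` expects (e.g. "my_log", not "logs/my_log").
is_bare_prefix <- function(x) {
  is.character(x) && length(x) == 1L && identical(basename(x), x)
}

strip_trailing_slash("data/plink/")  # hypothetical path
is_bare_prefix("my_log")
is_bare_prefix("logs/my_log")
```

Checking these two arguments up front gives a clearer failure than the "file not found" error noted for logfile.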
Three files are created in the location specified by rds_dir:
'rds_prefix.rds': This is a list with three items:
(1) X: the filebacked bigmemory::big.matrix object pointing to the imputed genotype data. This matrix has type 'double', which is important for downstream operations in create_design()
(2) map: a data.frame with the PLINK 'bim' data (i.e., the variant information)
(3) fam: a data.frame with the PLINK 'fam' data (i.e., the pedigree information)
'prefix.bk': This is the backingfile that stores the numeric data of the genotype matrix
'rds_prefix.desc': This is the description file, as needed by the
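To see which of the files listed above already exist, their paths can be built from rds_dir and rds_prefix. This is a sketch only: expected_outputs() is a hypothetical helper, the directory and prefix values are placeholders, and it assumes the '.bk' file shares rds_prefix with the other two files, which the list above does not state explicitly.

```r
# Hypothetical helper (not part of the package): build the paths of the
# files that process_plink() is documented to create.
# Assumption: the '.bk' file uses the same prefix as the '.rds' and
# '.desc' files.
expected_outputs <- function(rds_dir, rds_prefix) {
  exts <- c(rds = ".rds", bk = ".bk", desc = ".desc")
  paths <- file.path(rds_dir, paste0(rds_prefix, exts))
  names(paths) <- names(exts)
  paths
}

# Placeholder directory and prefix, for illustration only.
outputs <- expected_outputs("processed", "my_study")

# Report which of the expected files are already on disk.
present <- file.exists(outputs)
names(present) <- names(outputs)
print(present)
```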
Note that process_plink() need only be run once for a given set of PLINK files; in subsequent data analysis/scripts, get_data() will access the '.rds' file.
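One way to follow the run-once advice in a script is to call process_plink() only when the '.rds' file is missing. The sketch below uses placeholder paths and prefixes, and it guards the call so that it does nothing if process_plink() is not available in the session; treat it as an illustration under those assumptions, not as the package's recommended workflow.

```r
# Placeholder locations; replace with your own.
data_dir   <- "data/plink"
rds_dir    <- "processed"
rds_prefix <- "my_study"

# Where the '.rds' file is expected to be written.
rds_path <- file.path(rds_dir, paste0(rds_prefix, ".rds"))

if (file.exists(rds_path)) {
  message("Found ", rds_path, "; skipping preprocessing.")
} else if (exists("process_plink", mode = "function")) {
  # process_plink() returns the filepath to the '.rds' object it creates.
  rds_path <- process_plink(
    data_dir    = data_dir,
    data_prefix = "my_study",  # placeholder prefix of the PLINK files
    rds_dir     = rds_dir,
    rds_prefix  = rds_prefix
  )
} else {
  message("process_plink() is not available; load the package first.")
}
```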
For an example, see the vignette on processing PLINK files.
The filepath to the '.rds' object created; see details for explanation.