Splits the data file named file.name in dir.file into TRAIN and TEST files, based on train.percent, the percentage of the data that should go into the TRAIN file.
pre8.split.train.test(file.name, dir.file, dir.out, train.percent = 80,
  separ = "\t", index.prefix = "index", file.has.ext = TRUE, resample = FALSE)
file.name |
The name of the geno file. This file is expected to have the disease status as its last column (1 for CASE and 0 for CONTROL). |
dir.file |
The name of directory where |
dir.out |
The name of the directory into which the TRAIN and TEST output files should go. |
train.percent |
The percentage (0 to 100) of the data (rows) that should go into the TRAIN file; the rest will go into the TEST file. Ex: for 1000 entries, if |
separ |
The separator used in the |
index.prefix |
The name of the index file to use for the separation of train from test entries. This file may already exist in |
file.has.ext |
Flag whether or not |
resample |
Additional file beginning with the name |
The file file.name is expected to hold the CASE/CONTROL status in its last column; this is needed so that train.percent of the CASE entries and train.percent of the CONTROL entries go into the TRAIN file, giving an even sample of both types of entries.
If the data is saved in many files (for example, one file per chromosome), this function first randomly samples the individuals for the TRAIN file on the first file it is run on. It then reuses this sampling for all other chromosomes on subsequent runs (if resample=FALSE), so that the individuals in the TRAIN files correspond to one another across all chromosome files (the same holds for the TEST files). The index file is also useful for processing the family .fam file after the data has been split.
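The stratified sampling described above can be sketched in base R as follows. This is only an illustration, not the package's actual code: the helper sample.train.index and the toy data frames chr1 and chr2 are hypothetical, and rounding the group sizes with round() is an assumption.

```r
# Hypothetical helper (not part of the package): choose TRAIN row indices so
# that train.percent of CASE rows and train.percent of CONTROL rows are taken.
# 'status' stands for the last column of the geno file (1 = CASE, 0 = CONTROL).
sample.train.index <- function(status, train.percent = 80) {
  case.rows    <- which(status == 1)
  control.rows <- which(status == 0)
  # Assumption: group sizes are rounded to the nearest whole number of rows.
  n.case    <- round(length(case.rows) * train.percent / 100)
  n.control <- round(length(control.rows) * train.percent / 100)
  # Sample positions with sample.int() so a group of size 1 is handled safely.
  train.case    <- case.rows[sample.int(length(case.rows), n.case)]
  train.control <- control.rows[sample.int(length(control.rows), n.control)]
  sort(c(train.case, train.control))
}

set.seed(1)
status <- c(rep(1, 10), rep(0, 20))   # 10 CASE and 20 CONTROL individuals

# Sample once, as on the first chromosome file.
train.idx <- sample.train.index(status, train.percent = 80)
test.idx  <- setdiff(seq_along(status), train.idx)

# Reuse the same indices for every other chromosome (the resample = FALSE case),
# so the same individuals end up in TRAIN for each file.
chr1 <- data.frame(snp = seq_along(status), status = status)
chr2 <- data.frame(snp = rev(seq_along(status)), status = status)
train1 <- chr1[train.idx, ]
train2 <- chr2[train.idx, ]
```

Keeping the sampled indices separate from any single data file is what allows the same split to be applied to every chromosome file and, later, to the .fam file.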
The following files will be output:
- <file.name>.train.<train.percent>.<ext> - the output TRAIN file containing
  train.percent percent of the original data;
  will appear in dir.out directory.
    * <file.name> here is the name without extension;
    * <ext> is the extension part of <file.name> (i.e. the section that
      follows the last "." symbol)
    * <train.percent> specifies the percentage that was used to generate
      the file.
- <file.name>.test.<train.percent>.<ext> - the entries for the TEST file, containing
  the remaining (100 - train.percent) percent of the data. Similar to the TRAIN file above.
- <index.prefix>.<train.percent>.txt - the file containing the indices of the
  entries corresponding to the TRAIN file; this file will be generated if it
  does not already exist in dir.out, or if resample=TRUE.
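As a rough illustration of the naming scheme listed above, the sketch below builds the three output names for a given input file. The helper output.names and the example file name chr1.geno.txt are hypothetical. It assumes the file has an extension (the file.has.ext = TRUE case) and that the names are joined to dir.out with file.path(); the package itself may assemble them differently.

```r
# Hypothetical helper (not part of the package): derive the output file names
# following the pattern <file.name>.train.<train.percent>.<ext> and so on.
output.names <- function(file.name, dir.out, train.percent = 80,
                         index.prefix = "index") {
  base <- tools::file_path_sans_ext(file.name)  # name without the extension
  ext  <- tools::file_ext(file.name)            # part after the last "."
  list(
    train = file.path(dir.out, paste0(base, ".train.", train.percent, ".", ext)),
    test  = file.path(dir.out, paste0(base, ".test.", train.percent, ".", ext)),
    index = file.path(dir.out, paste0(index.prefix, ".", train.percent, ".txt"))
  )
}

out <- output.names("chr1.geno.txt", dir.out = "out", train.percent = 80)
print(out$train)
print(out$test)
print(out$index)
```

Because train.percent is part of every name, splits made with different percentages can sit side by side in the same dir.out without overwriting one another.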
out$train |
The FULL name of the output TRAIN file |
out$test |
The FULL name of the output TEST file |
Olia Vesselova
pre6.merge.genos, pre7.add.conf.var, pre8.split.train.test.batch
print("See the demo 'gendemo'.")
[1] "See the demo 'gendemo'."