pre8.split.train.test: Split dataset into TRAIN and TEST files
In genMOSSplus: Application of MOSS algorithm to genome-wide association study (GWAS)

Description

Splits the data file `file.name`, located in `dir.file`, into TRAIN and TEST files according to `train.percent`, the percentage of the data that should go into the TRAIN file.

Usage

pre8.split.train.test(file.name, dir.file, dir.out, train.percent = 80, separ = "\t",
    index.prefix = "index", file.has.ext = TRUE, resample = FALSE)

Arguments

`file.name`	The name of the geno file. This file is expected to have the disease status as its last column (1 for CASE and 0 for CONTROL).
`dir.file`	The name of the directory in which `file.name` can be found.
`dir.out`	The name of the directory into which the TRAIN and TEST output files are written.
`train.percent`	The percentage (0 to 100) of the data (rows) that should go into the TRAIN file; the rest goes into the TEST file. For example, for 1000 entries with `train.percent=80`, 800 entries go into the TRAIN file (<file.name>.train.80.<ext>) and 200 entries go into the TEST file (<file.name>.test.80.<ext>).
`separ`	The separator used in `file.name` to separate entries.
`index.prefix`	The prefix of the name of the index file used to separate TRAIN entries from TEST entries. This file may already exist in `dir.out` if it was created by a previous run of this function.
`file.has.ext`	Logical flag indicating whether `file.name` has a filename extension (e.g. ".txt", ".ped", ".mlgeno").
`resample`	An index file whose name begins with `index.prefix` is saved in the `dir.out` directory for the given `train.percent`; it contains the indices of the entries taken into the TRAIN file. If `resample`=FALSE, all subsequent runs of this function on other files with the same `train.percent` (for example, other chromosomes of the same dataset) reuse that saved file, so that the same individuals go into the TRAIN file across all chromosomes. If `resample`=TRUE, a new random resampling takes place and a new index file is generated and saved to `dir.out`. Note that the resulting split no longer matches the splits produced by earlier runs with the previous index file, so for consistency re-run all chromosomes with `resample` set to FALSE.
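The index-file behaviour controlled by `index.prefix`, `train.percent` and `resample` can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's implementation: `get.train.index` is a hypothetical helper, the index file is assumed to hold one row index per line, and the sampling is a plain random sample of rows (the package additionally stratifies by CASE/CONTROL, as described under Details).

```r
# Hypothetical helper (not part of genMOSSplus): return the row indices for the
# TRAIN file, reusing a saved index file unless resampling is requested.
get.train.index <- function(n.rows, dir.out, train.percent = 80,
                            index.prefix = "index", resample = FALSE) {
  # Index file named <index.prefix>.<train.percent>.txt inside dir.out
  index.file <- file.path(dir.out,
                          paste0(index.prefix, ".", train.percent, ".txt"))
  if (!resample && file.exists(index.file)) {
    # Reuse the split saved by an earlier run (e.g. a previous chromosome)
    return(scan(index.file, what = integer(), quiet = TRUE))
  }
  # Draw a fresh random sample of rows (assumption: no stratification here)
  n.train <- round(n.rows * train.percent / 100)
  idx <- sort(sample.int(n.rows, n.train))
  # Assumption: one index per line on disk
  writeLines(as.character(idx), index.file)
  idx
}

# Usage: the second call reuses the first split; resample = TRUE redraws it
out.dir <- tempfile("split-")
dir.create(out.dir)
set.seed(1)
first  <- get.train.index(100, out.dir, train.percent = 80)
second <- get.train.index(100, out.dir, train.percent = 80)
identical(first, second)   # TRUE
```

Caching the indices on disk, rather than re-seeding the random number generator, is what lets separate runs on different chromosome files agree on the same individuals.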

Details

The last column of `file.name` is expected to encode the disease status (CASE or CONTROL). The function uses it to stratify the split: `train.percent` percent of the CASE entries and `train.percent` percent of the CONTROL entries go into the TRAIN file, so that both types of entries are evenly represented. If the data are stored in several files (for example, one file per chromosome), the function randomly samples the individuals for the TRAIN file on the first file it is run on, and then reuses this sampling on all subsequent runs (if `resample`=FALSE), so that the individuals in the TRAIN files correspond to one another across all chromosome files (the same holds for the TEST files). The index file is also useful for processing the family (.fam) file after the data have been split.
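The stratified split described above can be sketched in base R. This is an illustrative sketch, not the package's code: `stratified.train.index` and the toy data frame `geno` are hypothetical, and rounding the TRAIN count separately within each class is an assumption (the package may round differently).

```r
# Hypothetical sketch: sample train.percent of the CASE rows and
# train.percent of the CONTROL rows separately, using the last column.
stratified.train.index <- function(data, train.percent = 80) {
  status <- data[[ncol(data)]]            # 1 = CASE, 0 = CONTROL
  rows.by.class <- split(seq_len(nrow(data)), status)
  idx <- lapply(rows.by.class, function(rows) {
    k <- round(length(rows) * train.percent / 100)  # assumption: per-class rounding
    rows[sample.int(length(rows), k)]     # sample within this class only
  })
  sort(unlist(idx, use.names = FALSE))
}

# Toy data: 300 CASE and 700 CONTROL rows, status in the last column
set.seed(42)
geno <- data.frame(snp1 = sample(0:2, 1000, replace = TRUE),
                   status = rep(c(1, 0), times = c(300, 700)))
train.idx <- stratified.train.index(geno, train.percent = 80)
train <- geno[train.idx, ]
test  <- geno[-train.idx, ]
table(train$status)   # 560 CONTROL (0), 240 CASE (1)
```

Sampling within each class keeps the CASE:CONTROL ratio of both output files equal to that of the full data, which a single unstratified draw would only approximate.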

The following files are written:

 - <file.name>.train.<train.percent>.<ext> - the output TRAIN file, containing
      train.percent percent of the original data; written to the dir.out
      directory.
      * <file.name> here is the name without its extension;
      * <ext> is the extension of <file.name> (i.e. the part that
          follows the last "." symbol);
      * <train.percent> is the percentage that was used to generate
          the file.
 - <file.name>.test.<train.percent>.<ext> - the output TEST file, containing
      the remaining (100 - train.percent) percent of the data; named and
      placed like the TRAIN file above.
 - <index.prefix>.<train.percent>.txt - the file containing the indices of the
      entries that went into the TRAIN file; it is generated if it
      does not already exist in dir.out, or if resample=TRUE.
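The naming scheme above can be sketched with base R's `tools::file_path_sans_ext` and `tools::file_ext`. `output.names` is a hypothetical helper, and its `file.has.ext = FALSE` branch (no extension appended) is an assumption, since the description does not say how such files are named.

```r
# Hypothetical helper: build the output paths described above inside dir.out.
output.names <- function(file.name, dir.out, train.percent = 80,
                         index.prefix = "index", file.has.ext = TRUE) {
  if (file.has.ext) {
    base <- tools::file_path_sans_ext(file.name)    # name without extension
    ext  <- paste0(".", tools::file_ext(file.name)) # part after the last "."
  } else {
    base <- file.name
    ext  <- ""  # assumption: no extension is appended
  }
  list(train = file.path(dir.out, paste0(base, ".train.", train.percent, ext)),
       test  = file.path(dir.out, paste0(base, ".test.", train.percent, ext)),
       index = file.path(dir.out,
                         paste0(index.prefix, ".", train.percent, ".txt")))
}

nm <- output.names("chr1.mlgeno", dir.out = "out", train.percent = 80)
nm$train   # "out/chr1.train.80.mlgeno"
nm$index   # "out/index.80.txt"
```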

Value

`out$train`	The FULL name of the output TRAIN file
`out$test`	The FULL name of the output TEST file

Author(s)

Olia Vesselova

See Also

pre6.merge.genos, pre7.add.conf.var, pre8.split.train.test.batch

Examples

print("See the demo 'gendemo'.")

[1] "See the demo 'gendemo'."

genMOSSplus documentation built on May 1, 2019, 10:31 p.m.
