split_data_prep: Function to split a data.prep analysis object into multiple...

View source: R/split_data_prep.r

split_data_prep    R Documentation

Function to split a data.prep analysis object into multiple smaller objects in preparation for parallelized analysis.

Description

Function to split a data.prep analysis object into multiple smaller objects in preparation for parallelized analysis.

Usage

split_data_prep(data.prep.object, splitBy, keepN)

Arguments

data.prep.object

Character string. Name of the data.prep object created by the data.prep function.

splitBy

Character string. Name of the column by which to split the data.prep object. Typically ‘INDLABEL’ for hybrid index estimation, or ‘locus’ for genomic cline estimation.

keepN

Numeric scalar. The number of test subjects desired in each resulting file.

Details

split_data_prep splits the data.prep analysis object into multiple files, each containing keepN test subjects; the final file contains fewer unless the total number of test subjects is a multiple of keepN. A ‘test subject’ is a unique value of the splitBy column: e.g. ‘INDLABEL’ for hybrid index estimation on each individual, or ‘locus’ for genomic cline estimation on individual loci. The resulting objects are written to the working directory as csv files and removed from the workspace. These files can then be loaded into R and used as the data.prep.object in downstream functions. csv files are quite memory-heavy but load quickly.
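
The grouping described above can be illustrated with a minimal base R sketch. This is not the package's implementation: the toy table, the variable names and the use of split() are assumptions made purely for illustration. It only shows how test subjects fall into chunks of size keepN, with a smaller final chunk when the total is not a multiple of keepN.

```r
# Toy stand-in for a data.prep table (hypothetical data):
# 5 individuals, each with rows for 2 loci.
toy <- data.frame(
  INDLABEL = rep(paste0("ind", 1:5), each = 2),
  locus    = rep(c("L1", "L2"), times = 5),
  stringsAsFactors = FALSE
)

keepN <- 2  # desired number of test subjects per chunk

# Assign each unique test subject to a chunk of size keepN.
subjects <- unique(toy$INDLABEL)
chunk_id <- ceiling(seq_along(subjects) / keepN)

# Split the rows according to the chunk of their test subject.
chunks <- split(toy, chunk_id[match(toy$INDLABEL, subjects)])

# Count the test subjects in each chunk; the last chunk is smaller
# because 5 is not a multiple of 2.
subjects_per_chunk <- vapply(
  chunks,
  function(x) length(unique(x$INDLABEL)),
  integer(1)
)
```

Here 5 individuals with keepN = 2 give three chunks, the last holding a single individual.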

Value

No output is returned in the workspace. The resulting objects are written to the working directory as csv files.

Author(s)

Richard Ian Bailey richardianbailey@gmail.com

Examples


## Not run: 
#Make a set of data.prep files, one for each individual.
split_data_prep(
 data.prep.object=prepdata$data.prep,   #The data analysis table#
 splitBy="INDLABEL",                    #The planned test subject (usually "INDLABEL" for hybrid index estimation, "locus" for genomic cline estimation)#
 keepN=1                                #The number of test subjects you want in each resulting file#
)

#Make another set of files, one for each set of 10 loci.

split_data_prep(
 data.prep.object=prepdata$data.prep,   #The data analysis table#
 splitBy="locus",                       #The planned test subject (usually "INDLABEL" for hybrid index estimation, "locus" for genomic cline estimation)#
 keepN=10                               #The number of test subjects you want in each resulting file 
                                        #(the final file will contain fewer if the total is not a multiple of this number)#
)

#The entry for the 'splitBy' option will be included in the filename for each resulting csv file. 
#Therefore, to create a list of files for analysis e.g. of the 'locus' files above:
files=list.files(pattern="_locus_")

## End(Not run)
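
As a follow-on to the list.files call in the examples, the sketch below shows one way the resulting files might be read back in and processed one at a time. The file names, the temporary directory and the placeholder analysis step are all hypothetical; only the fact that the splitBy entry appears in each file name is taken from the examples. Base R's read.csv is used here to keep the sketch self-contained.

```r
# Hypothetical setup: write two small csv files whose names contain "_locus_",
# standing in for files produced by split_data_prep.
out_dir <- tempfile("split_demo_")
dir.create(out_dir)
write.csv(data.frame(locus = c("L1", "L2"), x = 1:2),
          file.path(out_dir, "demo_locus_1.csv"), row.names = FALSE)
write.csv(data.frame(locus = "L3", x = 3L),
          file.path(out_dir, "demo_locus_2.csv"), row.names = FALSE)

# List the files by the splitBy entry embedded in their names.
files <- list.files(out_dir, pattern = "_locus_", full.names = TRUE)

# Load each file in turn and apply a per-file step.
results <- lapply(files, function(f) {
  chunk <- read.csv(f, stringsAsFactors = FALSE)
  nrow(chunk)  # placeholder for the real downstream analysis
})
```

Because each file is processed independently, the lapply call could be swapped for a parallel equivalent such as parallel::mclapply on Unix-alike systems.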

ribailey/gghybrid documentation built on Feb. 2, 2024, 12:53 a.m.