View source: R/getbestchunksize.R
Reads in a small portion of the data, measures the amount of memory that portion occupies in R, and then calculates the best size for each chunk based on the available memory and the additional overhead needed for calculations.
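The calculation is a simple proportion: if TestedRows rows occupy a measured number of bytes, then the allowed memory supports proportionally more rows, scaled down by the adjustment factor. A minimal sketch of the arithmetic with illustrative numbers (the 2 MB sample size is assumed, not taken from the package):

```r
# Illustrative arithmetic only: suppose 1000 sampled rows occupy 2 MB in R.
tested_rows  <- 1000
sample_bytes <- 2e6      # memory used by the sampled rows (assumed value)
memory_gb    <- 0.5      # MemoryAllowed, in gigabytes
adj_factor   <- 0.095    # overhead adjustment, the function's default

# Rows that fit in the allowed memory, reduced by the adjustment factor:
chunk_size <- floor((memory_gb * 1e9 / sample_bytes) * tested_rows * adj_factor)
chunk_size
# 0.5e9 / 2e6 = 250 memory multiples; 250 * 1000 * 0.095 = 23750 rows per chunk
```

With these numbers, each chunk would be read in blocks of 23,750 lines.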
getbestchunksize(filename, MemoryAllowed = 0.5, TestedRows = 1000, AdjFactor = 0.095, silent = TRUE)
filename
    The name of the file being chunked.
MemoryAllowed
    The maximum amount of memory, in gigabytes, that the R process may use on the current system or OS. The recommended setting is 0.5-1.0 Gb. Please see the CRAN website for inherent limits to memory on various versions of R.
TestedRows
    Number of rows to read in for determining the optimal chunk size. One thousand is set by default.
AdjFactor
    Adjustment factor to account for the overhead of processes during fitting. Increase the factor to increase the memory used. By default, the factor is 0.095.
silent
    Set silent=TRUE to suppress most messages from the function.
Returns the optimal chunk size as the number of lines to read on each iteration.
Alan Lee alanlee@stanfordalumni.org
#Get external data. For your own data skip this next line and replace all
#instances of SampleData with "YourFile.csv".
SampleData=system.file("extdata","SampleDataFile.csv", package = "allan")
#To get optimal chunksize for up to 1 Gb of allowable ram use for R while
#testing memory use by reading 1000 rows of current dataset and suppressing
#some output.
currentchunksize<-getbestchunksize(SampleData,MemoryAllowed=1,TestedRows=1000,silent=FALSE)
## The function is currently defined as
getbestchunksize<-function(filename,MemoryAllowed=0.5,TestedRows=1000,AdjFactor=0.095,silent=TRUE){
  #Function that tests data size and adjusts memory for best chunking of a large dataset.
  #This is done by reading in a number of rows (1000 by default) and then measuring the
  #size of the memory used. Memory allowed is specified in Gb. AdjFactor is a factor
  #used to adjust memory for overhead in the biglm fitting functions.
  #get column names
  columnnames<-names(read.csv(filename, nrows=2, header=TRUE))
  #read in rows and test size
  datapreview<-read.csv(filename, nrows=TestedRows, header=TRUE)
  datamemsize<-object.size(datapreview)
  optimalchunksize<-floor(((MemoryAllowed*1000000000)/datamemsize[1])*TestedRows*AdjFactor)
  if (silent!=TRUE){
    print(paste("Total memory usage for",TestedRows,"lines:"))
    print(datamemsize)
    print("Chunksize for dataframe after adjustment factor:")
    print(optimalchunksize)
  }
  return(optimalchunksize)
}