startSpiderSeqR: Prepare the environment to run SpiderSeqR
In ss-lab-cancerunit/SpiderSeqR: A Tool for Integration of Big Bio Data


View source: R/startSpiderSeqR.R

Description

startSpiderSeqR prepares the environment so that other SpiderSeqR functions can be used. Run startSpiderSeqR every time you begin with a clear environment and want to use any of the other SpiderSeqR functions. In particular, the function does the following:

Ensures that SRAmetadb.sqlite is downloaded and up to date
Ensures that GEOmetadb.sqlite is downloaded and up to date
Ensures that SRR_GSM.sqlite is created and up to date
Sets up database connections to the above files in the .GlobalEnv
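
The last step, creating a connection object in the global environment, can be illustrated with the DBI and RSQLite packages. This is only a minimal sketch and not the package's own code: it uses a temporary SQLite file with a toy table, and the object name `demo_con` is hypothetical.

```r
# Sketch only: needs the DBI and RSQLite packages to be installed.
if (requireNamespace("DBI", quietly = TRUE) &&
    requireNamespace("RSQLite", quietly = TRUE)) {

  # A temporary file stands in for a real database such as SRAmetadb.sqlite
  db_file <- tempfile(fileext = ".sqlite")
  con <- DBI::dbConnect(RSQLite::SQLite(), db_file)

  # Toy table so that there is something to list
  DBI::dbWriteTable(con, "demo", data.frame(id = 1:3))

  # Make the connection available in the global environment
  # ("demo_con" is a hypothetical name chosen for this example)
  assign("demo_con", con, envir = .GlobalEnv)

  tables <- DBI::dbListTables(demo_con)
  print(tables)

  # Close the connection when it is no longer needed
  DBI::dbDisconnect(demo_con)
}
```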

Usage

startSpiderSeqR(
  path,
  general_expiry = 90,
  sra_expiry = NULL,
  geo_expiry = NULL,
  srr_gsm_expiry = NULL
)

Arguments

`path`	Directory where database files will be stored
`general_expiry`	Maximum number of days since creation of all database files
`sra_expiry`	Maximum number of days since creation of SRAmetadb.sqlite file
`geo_expiry`	Maximum number of days since creation of GEOmetadb.sqlite file
`srr_gsm_expiry`	Maximum number of days since creation of SRR_GSM.sqlite file
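
One way to read the defaults is that a file-specific expiry, when supplied, takes precedence over `general_expiry`, and that `NULL` means "fall back to the general value". That precedence is an assumption based on the defaults and the examples, not something confirmed from the package source. A minimal sketch with a hypothetical helper, `effective_expiry`:

```r
# Hypothetical helper (not part of SpiderSeqR): choose the expiry, in days,
# that applies to a single database file.
# Assumption: a non-NULL file-specific value overrides general_expiry.
effective_expiry <- function(general_expiry = 90, specific_expiry = NULL) {
  if (is.null(specific_expiry)) general_expiry else specific_expiry
}

effective_expiry(90)        # no file-specific value: general expiry applies
effective_expiry(90, 365)   # file-specific value applies instead
```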

Details

Depending on the contents of the specified directory, startSpiderSeqR may download/create the database files. It will always create the database connections in the global environment.

It is necessary to fulfil all the above requirements; without them the package will not work. The first two database files are relatively large (32 GB and 9 GB at the time of writing), so please ensure that you have an adequate internet connection and sufficient disk space.

It is recommended that the newest versions of the databases are used. However, it is possible to ignore this requirement by manually setting the expiry period of the files (or by running the function with default settings and selecting the option not to download the newer files).
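
The idea of comparing a file's age with an expiry period can be sketched in base R. The helper below is hypothetical and is not how SpiderSeqR is known to implement the check; in particular, it assumes that the file modification time is an acceptable stand-in for the creation date.

```r
# Hypothetical helper: classify a database file as "missing", "outdated"
# or "current" for a given expiry period (in days).
# Assumption: modification time (mtime) approximates the creation date.
file_status <- function(file, expiry_days, now = Sys.time()) {
  if (!file.exists(file)) {
    return("missing")
  }
  age_days <- as.numeric(difftime(now, file.info(file)$mtime, units = "days"))
  if (age_days > expiry_days) "outdated" else "current"
}

demo_file <- tempfile(fileext = ".sqlite")
file_status(demo_file, expiry_days = 90)   # file has not been created yet

file.create(demo_file)
file_status(demo_file, expiry_days = 90)   # freshly created file

# Pretend that 100 days have passed since the file was created
file_status(demo_file, expiry_days = 90,
            now = Sys.time() + 100 * 24 * 3600)
```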

Value

Nothing is returned. If necessary, the function downloads/creates the database files. It also sets up database connections in the global environment.

When to run `startSpiderSeqR`?

Run startSpiderSeqR every time you start with a fresh environment. There is no harm in running it too many times.

Which `startSpiderSeqR` menu options to choose?

Should you have any missing or outdated files in the specified directory, startSpiderSeqR will offer to download/create the files.

Please note that all three database files are required; if you choose not to download/create them, it will not be possible to run other SpiderSeqR functions.

However, it is not required for all the files to be up-to-date; startSpiderSeqR will suggest re-downloading/re-creating the files, but you can still use SpiderSeqR, even if you do not agree to re-download/re-create the files.

Time and space requirements

The following files must be present in order to run other SpiderSeqR functions; startSpiderSeqR will download/create them if necessary:

SRAmetadb.sqlite (from SRAdb package)
GEOmetadb.sqlite (from GEOmetadb package)
SRR_GSM.sqlite (custom-made at the time of running startSpiderSeqR)

They take approximately 40 GB of disk space (32 GB, 9 GB and 100 MB respectively at the time of writing), so please ensure that you have an adequate internet connection and sufficient disk space. The two downloaded files are transferred in compressed form, so the download size is smaller than the final file size. In order to save disk space, the previous files will be overwritten when downloading a newer version, so if you would like to keep them, please rename them before updating.
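
Renaming an existing file before an update can be done in base R. The following is a minimal sketch with a hypothetical helper, `backup_db_file`, and a date-suffix naming scheme chosen purely for illustration; it is demonstrated on an empty placeholder file in a temporary directory.

```r
# Hypothetical helper: rename an existing .sqlite file by adding a date
# suffix, so that a newer download does not overwrite it.
# Expects a file name ending in ".sqlite".
backup_db_file <- function(file, stamp = format(Sys.Date(), "%Y%m%d")) {
  if (!file.exists(file)) {
    return(NA_character_)   # nothing to back up
  }
  backup <- sub("\\.sqlite$", paste0("_", stamp, ".sqlite"), file)
  if (file.exists(backup)) {
    stop("Backup already exists: ", backup)
  }
  if (!file.rename(file, backup)) {
    stop("Could not rename ", file)
  }
  backup
}

# Demonstration on an empty placeholder file
db_dir <- tempfile("spiderseqr_demo_")
dir.create(db_dir)
db_file <- file.path(db_dir, "GEOmetadb.sqlite")
file.create(db_file)

backup_db_file(db_file, stamp = "20200101")
list.files(db_dir)
```

Note that renaming keeps the old copy on disk, so the space needed during an update is the sum of the old and new file sizes.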

Running startSpiderSeqR for the first time will inevitably take some time, because large database files need to be downloaded and the custom database needs to be created. However, once all the files are present, the function should take on the order of seconds to complete (it is only a matter of setting up database connections).

See Also

Other Setup functions: startSpiderSeqRDemo()

Examples

## Database files are stored (or will be downloaded)
##    in the working directory
# startSpiderSeqR(path = getwd()) 

## Use the following if you would like to download 
##   the newest database files
# startSpiderSeqR(path = getwd(), general_expiry = 0) 

## Use the following if you have old database files
##   that you do not wish to re-download on this occasion
# startSpiderSeqR(path = getwd(), general_expiry = 365) 

## Use the following if you only wish to ignore 
##    an old SRAmetadb.sqlite file, 
##    but get reminders to re-download the other files
# startSpiderSeqR(path = getwd(), sra_expiry = 365) 

## Use the following if you would like to locate
##   the database files in a few directory levels above
# startSpiderSeqR(path = getwd(), recurse_levels = 4)