testFuns: Query and database build testing functions

testFuns R Documentation

Query and database build testing functions

Description

This group of testing functions can be used to test the entirety of sitadela annotation building capabilities from known resources or custom GTF/GFF files. They are useful for testing the particular annotation the user wishes to build before building the final database, in order to avoid failures during the longer build. In all cases, informative messages are also displayed.

Usage

    testEnsembl(level = c("normal", "long", "short"),
        versioned = FALSE)
    testEnsemblSimple(orgs, types, versioned = FALSE)

    testUcsc(orgs, refdbs, types, versioned = FALSE)
    testUcscAll()
    
    testUcscUtr(orgs, refdbs, versioned = FALSE)
    testUcscUtrAll()
    
    testCustomGtf(gtf)
    
    testKnownBuild(org, refdb, ver = NULL, tv = FALSE)
    testCustomBuild(gtf, metadata)
    

Arguments

level

how many Ensembl versions from the supported organisms should be checked. It can be "normal" (default), "long" or "short". See also Details.

orgs

a vector of sitadela supported organisms. See also addAnnotation.

refdbs

a vector of sitadela supported annotation sources. See also addAnnotation.

versioned

use versioned genes/transcripts where available.

types

a vector of sitadela annotation types. See also loadAnnotation.

org

as orgs above but only one organism.

refdb

as refdbs above but only one source.

ver

specific annotation version, see also addAnnotation.

tv

retrieve versioned genes and transcripts when possible, see also addAnnotation.

gtf

a valid GTF or GFF file.

metadata

additional information on the contents of GTF/GFF file. See also addCustomAnnotation.

Details

Regarding testEnsembl and its arguments, when level="normal", only the last one or two (depending on availability with Biomart) supported Ensembl versions are checked for fetching availability. If level="long", all available versions are checked for fetching availability (use with care, it can run for some time!). If level="short", only the last version of each supported organism is checked. Simpler tests with Ensembl (single organisms, types) can be performed with testEnsemblSimple, which uses only the latest version for the requested organism(s).
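The choice between the two Ensembl testers described above can be sketched as follows. This is a minimal illustration, not part of the package: the wrapper name runEnsemblCheck and its quick argument are hypothetical, the organism "hg19" and type "gene" are only example values, and the calls assume that sitadela is installed and that the Ensembl servers are reachable.

```r
# Hypothetical wrapper (not part of sitadela): pick a quick or a broader
# Ensembl test. Nothing is contacted until the function is called.
runEnsemblCheck <- function(quick = TRUE) {
    if (!requireNamespace("sitadela", quietly = TRUE))
        stop("The sitadela package is required for this check")
    if (quick) {
        # Latest Ensembl version only, one organism, one annotation type
        sitadela::testEnsemblSimple(orgs = "hg19", types = "gene")
    } else {
        # Latest version of every supported organism
        sitadela::testEnsembl(level = "short")
    }
}
```

Calling runEnsemblCheck() runs the quick variant; runEnsemblCheck(quick = FALSE) runs the level="short" test, which takes longer because it covers every supported organism.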

The function testUcsc can be used to test the queries used with the UCSC databases for the given organisms and databases. testUcscAll will test queries for all supported organisms and databases and may take a while to finish.

Similarly, testUcscUtr and testUcscUtrAll will test the queries and building of 3' UTR regions from UCSC databases. 3' UTR construction is not part of the other UCSC testing functions, as the process is different and can be tested only on Unix/Linux machines.
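Because the 3' UTR tests are restricted to Unix/Linux, a script that runs on several platforms may want to guard the call. The sketch below is one possible way to do so; the helper names canTestUcscUtr and runUcscUtrCheck are hypothetical, and requiring RMySQL is an assumption carried over from the guard used in the Examples section rather than a documented requirement of testUcscUtr.

```r
# Hypothetical helper (not part of sitadela): TRUE only on Unix-like systems
# where both sitadela and RMySQL (assumed to be needed) are available.
canTestUcscUtr <- function() {
    .Platform$OS.type == "unix" &&
        requireNamespace("sitadela", quietly = TRUE) &&
        requireNamespace("RMySQL", quietly = TRUE)
}

# Hypothetical wrapper: run the UTR test when possible, otherwise return NA
# so that a skipped test is not mistaken for a successful one (NULL).
runUcscUtrCheck <- function(org = "hg19", refdb = "refseq") {
    if (!canTestUcscUtr()) {
        message("Skipping 3' UTR test: unsupported platform or missing package")
        return(NA)
    }
    sitadela::testUcscUtr(orgs = org, refdbs = refdb)
}
```

Returning NA for the skipped case is a design choice of this sketch, made because NULL already signals success for the query testing functions.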

The function testCustomGtf will simply test whether the provided GTF/GFF file can be parsed and used to extract the sitadela annotation types. In the rare cases where this is not possible, the test will fail. If you wish to test complete database building with a custom GTF/GFF file, use testCustomBuild.

Finally, testKnownBuild will test database building and querying (add/remove annotation) for a single organism.

Value

These functions return either a vector of logical values showing success or failure of the conducted tests, or a list of test failure reasons, or NULL if all tests are successful. Specifically, testKnownBuild and testCustomBuild return logicals, while all the rest return NULL if the tests are successful or a list of failure reasons (and the respective test) otherwise.
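Since the return type differs between the build tests and the query tests, calling code may find it convenient to reduce any result to a single logical. The helper below is a hypothetical sketch written only from the description above (it is not part of sitadela) and assumes that a non-empty list always denotes failures.

```r
# Hypothetical helper (not part of sitadela): TRUE if a test result
# denotes success under the conventions described in this section.
testPassed <- function(res) {
    if (is.null(res))
        return(TRUE)                   # query tests: NULL means success
    if (is.logical(res))
        return(isTRUE(all(res)))       # build tests: all checks must be TRUE
    if (is.list(res))
        return(length(res) == 0L)      # query tests: list of failure reasons
    stop("Unexpected result type: ", class(res)[1])
}
```

For example, testPassed(testCustomGtf(gtf)) and testPassed(testCustomBuild(gtf, metadata)) could then be used interchangeably in a pre-build check.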

Author(s)

Panagiotis Moulos

Examples

    # Test a dummy GTF file
    gtf <- file.path(system.file(package="sitadela"),
        "dummy.gtf.gz")
    chromInfo <- data.frame(length=c(1000L,2000L,1500L),
        row.names=c("A","B","C"))
    metadata <- list(
        organism="dummy",
        source="dummy_db",
        version=1,
        chromInfo=chromInfo
    )
    
    testResult <- testCustomBuild(gtf,metadata)
    # For this case, just testResult <- testCustomBuild()
    # would also work
    
    # More real tests
    if (require(RMySQL))
        f <- testUcsc("hg19","refseq","gene",TRUE)
    
    # Test a complete build for Ensembl mm9
    # testResult <- testKnownBuild()
    
    # Test a complete build for UCSC dm6
    # testResult <- testKnownBuild("dm6","ucsc")

pmoulos/sitadela documentation built on May 19, 2024, 3:52 a.m.