R/import-methods.R

Defines functions .returnImport .resource `.origResource<-` .origResource .localOrRemoteFile .importMTXSidecar .getFunction .formatToFileClass .alertImport

#' @name import
#' @inherit AcidGenerics::import
#' @note Updated 2023-12-15.
#'
#' @details
#' `import()` supports automatic loading of common file types, by wrapping
#' popular importer functions. It intentionally designed to be simple, with few
#' arguments. Remote URLs and compressed files are supported. If you need more
#' complex import settings, just call the wrapped importer directly instead.
#'
#' @section Row and column names:
#'
#' **Row names.** Row name handling has become an inconsistent mess in R because
#' of differential support in base R, tidyverse, data.table, and Bioconductor.
#' To maintain sanity, `import()` attempts to handle row names automatically.
#' The function checks for a `rowname` column in delimited data, and moves these
#' values into the object's row names, if supported by the return type (e.g.
#' `data.frame`, `DFrame`). Note that `tbl_df` (tibble) and `data.table`
#' intentionally do not support row names. When returning in this format, no
#' attempt to assign the `rowname` column into the return object's row names is
#' made. Note that `import()` is strict about this matching and only checks for
#' a `rowname` column, similar to the default syntax recommended in
#' `tibble::rownames_to_column()`. To disable this behavior, set `rownames =
#' FALSE`, and no attempt will be made to set the row names.
#'
#' **Column names.** `import()` assumes that delimited files always contain
#' column names. If you are working with a file that doesn't contain column
#' names, either set `colnames = FALSE` or pass the names in as a `character`
#' vector. It's strongly recommended to always define column names in a
#' supported file type.
#'
#' @section FASTA and FASTQ files:
#'
#' FASTA and FASTQ files are currently managed internally by the Biostrings
#' package. Refer to `readDNAStringSet` and `readRNAStringSet` for details.
#' Import of these files will return `DNAStringSet` or `RNAStringSet` depending
#' on the input, defined by `moleculeType` argument.
#'
#' @section General feature format (GFF, GTF):
#'
#' The GFF (General Feature Format) format consists of one line per feature,
#' each containing 9 columns of data, plus optional track definition lines. The
#' GTF (General Transfer Format) is identical to GFF version 2.
#'
#' See also:
#'
#' - [Ensembl spec](https://useast.ensembl.org/info/website/upload/gff.html)
#' - [GENCODE spec](https://www.gencodegenes.org/pages/data_format.html)
#'
#' @section Gene cluster text format (GCT):
#'
#' Refer to the [IGV website][] for details.
#'
#' [IGV website]: https://igv.org/doc/desktop/
#'
#' @section GSEA gene set files:
#'
#' Refer to the Broad Institute [GSEA wiki][] for details.
#'
#' [GSEA wiki]: https://goo.gl/3ZkDPb
#'
#' @section Matrix Market Exchange:
#'
#' Reading a Matrix Market Exchange file requires `ROWNAMES` and `COLNAMES`
#' sidecar files containing the corresponding row and column names of the sparse
#' matrix.
#'
#' @section bcbio-nextgen count matrix:
#'
#' [bcbio][] count matrix (e.g. generated from featureCounts) and related
#' sidecar files are natively supported.
#'
#' - `COUNTS`: Counts table (e.g. RNA-seq aligned counts).
#' - `COLNAMES`: Sidecar file containing column names.
#' - `ROWNAMES`: Sidecar file containing row names.
#'
#' [bcbio]: https://bcbio-nextgen.readthedocs.io/
#'
#' @section Denylisted extensions:
#'
#' These file formats are intentionally not supported:
#' `DOC`, `DOCX`, `PDF`, `PPT`, `PPTX`.
#'
#' @section Duplicate methods:
#'
#' `GMTFile` and `OBOFile` are also supported by BiocSet package.
#'
#' @inheritParams AcidRoxygen::params
#'
#' @param format `character(1)` or `NULL`.
#' An optional file format type, which can be used to override the file format
#' inferred from `con`. Only recommended for file and URL paths that don't
#' contain an extension.
#'
#' @param colnames `logical(1)` or `character`.
#' Automatically assign column names, using the first header row.
#' Applies to file types that return `data.frame` only.
#' Pass in a `character` vector to define the column names manually.
#'
#' @param comment `character(1)`.
#' Comment character to detect at beginning of line, which will skip when
#' parsing file. Use `""` to disable interpretation of comments, which is
#' particularly
#' useful when parsing lines.
#' *Applies to plain text delimited and source code lines only.*
#'
#' @param engine `character(1)`.
#' Engine (package) to use for import.
#' Currently supported:`"base"`, `"data.table"`, or `"readr"`.
#'
#' @param makeNames `function`.
#' Apply syntactic naming function to (column) names.
#' Function is never applied to row names, when they are defined in object.
#'
#' @param metadata `logical(1)`.
#' Slot useful metadata about the import into the object.
#'
#' @param moleculeType `character(1)`.
#' Molecule type, either DNA or RNA.
#' Most RNA-seq FASTQ files contain complementary DNA (cDNA) sequences, not
#' direct sequencing of the RNA molecules.
#'
#' @param nMax `integer(1)` or `Inf`.
#' Maximum number of lines to parse.
#' *Applies to plain text delimited, Excel, and source code lines only.*
#'
#' @param naStrings `character`.
#' Character strings to reformat as `NA`.
#' Refer to `pipette::naStrings` for defaults.
#'
#' @param quote `character(1)`.
#' The set of quoting characters.
#' To disable quoting altogether, use `quote = ""` (not generally recommended).
#' *Applies to plain text delimited files only.*
#'
#' @param removeBlank `logical(1)`.
#' Remove blank lines.
#' *Applies to source code lines*.
#'
#' @param return `character(1)`.
#' Object class to return.
#'
#' @param rownameCol `NULL`, `character(1)`, or `integer(1)`.
#' *Applies only when `rownames = TRUE`.*
#' Column name to use for row names assignment.
#' If left `NULL` (default), the function will call `matchRownameCol()`
#' internally to attempt to automatically match the row name column (e.g.
#' `"rowname"` or `"rn"`).
#' Otherwise, can manually define using a scalar argument, either the name
#' directly or position in the column names.
#'
#' @param rownames `logical(1)`.
#' Automatically assign row names, if `rowname` column is defined.
#' Applies to file types that return a data frame only.
#'
#' @param rownamesFile,colnamesFile `character(1)` or `NULL`.
#' Row names and/or column names sidecare file.
#' Applies primarily to MatrixMarket Exchange files (e.g. `MTXFile`).
#'
#' @param sheet `character(1)` or `integer(1)`.
#' Sheet to read. Either a string (the name of a sheet), or an integer (the
#' position of the sheet). Defaults to the first sheet.
#' *Applies to Excel Workbook, Google Sheet, or GraphPad Prism file.*
#'
#' @param skip `integer(1)`.
#' Number of lines to skip.
#' *Applies to delimited file (CSV, TSV), Excel Workbook, or lines.*
#'
#' @param stripWhitespace `logical(1)`.
#' Strip leading and/or trailing whitespace.
#' *Applies to source code lines*.
#'
#' @return Varies, depending on the file type (format):
#'
#' - **R data serialized** (`RDS`):
#' *variable*.\cr
#' Currently recommend over RDA, if possible.\cr
#' Imported by `readRDS()`.
#' - **R data** (`RDA`, `RDATA`):
#' *variable*.\cr
#' Must contain a single object.
#' Doesn't require internal object name to match, unlike `loadData()`.\cr
#' Imported by `load()`.
#' - **Plain text delimited** (`CSV`, `TSV`, `TXT`):
#' `data.frame`.\cr
#' Data separated by commas, tabs, or visual spaces.\cr
#' Note that TXT structure is amgibuous and actively discouraged.\cr
#' Refer to `Data frame return` section for details on how to change the
#' default return type to `DFrame`, `tbl_df` or `data.table`.\cr
#' Imported by `readr::read_delim()` by default.
#' - **Excel workbook** (`XLSB`, `XLSX`):
#' `data.frame`.\cr
#' Resave in plain text delimited format instead, if possible.\cr
#' Imported by `readxl::read_excel()`.
#' - **Legacy Excel workbook (pre-2007)** (`XLS`):
#' `data.frame`.\cr
#' Resave in plain text delimited format instead, if possible.\cr
#' Note that import of files in this format is slow.\cr
#' Imported by `readxl::read_excel()`.
#' - **GraphPad Prism project** (`PZFX`):
#' `data.frame`.\cr
#' Experimental. Consider resaving in CSV format instead.\cr
#' Imported by `pzfx::read_pzfx()`.
#' - **General feature format** (`GFF`, `GFF1`, `GFF2`, `GFF3`, `GTF`):
#' `GRanges`.\cr
#' Imported by `rtracklayer::import()`.
#' - **Gene Ontology (GO) annotation file** (`GAF`):
#' `data.frame` with 17 columns.\cr
#' Imported by `base::read.table()`.
#' - **MatrixMarket exchange sparse matrix** (`MTX`):
#' `sparseMatrix`.\cr
#' Imported by `Matrix::readMM()`.
#' - **Sequence alignment/map format (`SAM`, `BAM`, `CRAM`):
#' `list`.\cr
#' Imported by `Rsamtools::scanBam`.
#' - **Mutation annotation format** (`MAF`):
#' `MAF`.\cr
#' Imported by `maftools::read.maf()`.
#' - **Variant annotation format** (`VCF`, `BCF`):
#' `list`.\cr
#' Imported by `Rsamtools::scanBcf`.
#' - **Gene cluster text** (`GCT`):
#' `matrix` or `data.frame`.\cr
#' Imported by `readr::read_delim()`.
#' - **Gene sets (for GSEA)** (`GMT`, `GMX`):
#' `character`.
#' - **Browser extensible data** (`BED`, `BED15`, `BEDGRAPH`, `BEDPE`):
#' `GRanges`.\cr
#' Imported by `rtracklayer::import()`.
#' - **ChIP-seq peaks** (`BROADPEAK`, `NARROWPEAK`):
#' `GRanges`.\cr
#' Imported by `rtracklayer::import()`.
#' - **Wiggle track format** (`BIGWIG`, `BW`, `WIG`):
#' `GRanges`.\cr
#' Imported by `rtracklayer::import()`.
#' - **JSON serialization data** (`JSON`):
#' `list`.\cr
#' Imported by `jsonlite::read_json()`.
#' - **YAML serialization data** (`YAML`, `YML`):
#' `list`.\cr
#' Imported by `yaml::yaml.load_file()`.
#' - **Lines** (`LOG`, `MD`, `PY`, `R`, `RMD`, `SH`):
#' `character`.\cr
#' Source code or log files.\cr
#' Imported by `readr::read_delim()` by default.
#' - **Infrequently used rio-compatible formats** (`ARFF`, `DBF`, `DIF`, `DTA`,
#' `MAT`, `MTP`, `ODS`, `POR`, `SAS7BDAT`, `SAV`, `SYD`, `REC`, `XPT`):
#' *variable*.\cr
#' Imported by `rio::import()`.
#'
#' @seealso
#' Packages:
#'
#' - [BiocIO](https://bioconductor.org/packages/BiocIO/).
#' - [data.table](https://r-datatable.com/).
#' - [readr](https://readr.tidyverse.org/).
#' - [vroom](https://vroom.r-lib.org/).
#' - [readxl](https://readxl.tidyverse.org/).
#' - [rtracklayer](https://bioconductor.org/packages/rtracklayer/).
#' - [maftools](https://bioconductor.org/packages/maftools/).
#' - [Rsamtools](https://bioconductor.org/packages/Rsamtools/).
#' - [rio](https://cran.r-project.org/package=rio).
#'
#' Import functions:
#'
#' - `BiocIO::import()`.
#' - `Rsamtools::scanBam()`.
#' - `Rsamtools::scanBcf()`.
#' - `data.table::fread()`.
#' - `maftools::read.maf()`.
#' - `readr::read_delim()`.
#' - `rio::import()`.
#' - `rtracklayer::import()`.
#' - `utils::read.table()`.
#' - `vroom::vroom()`.
#'
#' @examples
#' con <- system.file("extdata", "example.csv", package = "pipette")
#'
#' ## Row and column names enabled.
#' x <- import(con = con)
#' print(head(x))
#'
#' ## Row and column names disabled.
#' x <- import(con = con, rownames = FALSE, colnames = FALSE)
#' print(head(x))
NULL



## Internal functions ==========================================================

#' Inform the user about start of file import
#'
#' @note Updated 2023-09-20.
#' @noRd
.alertImport <-
    function(con,
             whatPkg,
             whatFun) {
        assert(
            is(con, "PipetteFile"),
            isString(whatPkg),
            isString(whatFun)
        )
        file <- .origResource(con)
        if (is.null(file)) {
            file <- .resource(con)
        }
        fileType <- ifelse(test = isAUrl(file), yes = "url", no = "file")
        ## Handle edge case of cleaning Google Sheets URL.
        if (identical(fileType, "url")) {
            file <- sub(pattern = "\\#.+$", replacement = "", x = file)
        }
        alert(sprintf(
            "Importing {.%s %s} using {.pkg %s}::{.fun %s}.",
            fileType, file,
            whatPkg, whatFun
        ))
        invisible(TRUE)
    }



#' Map file format extension to corresponding S4 file class
#'
#' @note Updated 2023-12-13.
#' @noRd
.formatToFileClass <-
    function(format) {
        class <- switch(
            EXPR = sub(
                pattern = compressExtPattern,
                replacement = "",
                x = tolower(format),
                fixed = FALSE,
                ignore.case = FALSE
            ),
            "arff" = "Rio",
            "bam" = "Bam",
            "bash" = "Lines",
            "bcbio-counts" = "BcbioCounts",
            "bcf" = "Bcf",
            "bed" = "Rtracklayer",
            "bed15" = "Rtracklayer",
            "bedgraph" = "Rtracklayer",
            "bedpe" = "Rtracklayer",
            "bigwig" = "Rtracklayer",
            "broadpeak" = "Rtracklayer",
            "bw" = "Rtracklayer",
            "counts" = "BcbioCounts",
            "cram" = "Cram",
            "csv" = "Csv",
            "dbf" = "Rio",
            "dif" = "Rio",
            "dta" = "Rio",
            "excel" = "Excel",
            "fa" = "Fasta",
            "fasta" = "Fasta",
            "fastq" = "Fastq",
            "fq" = "Fastq",
            "fwf" = "Rio",
            "gaf" = "Gaf",
            "gct" = "Gct",
            "gff" = "Rtracklayer",
            "gff1" = "Rtracklayer",
            "gff2" = "Rtracklayer",
            "gff3" = "Rtracklayer",
            "gmt" = "Gmt",
            "gmx" = "Gmx",
            "grp" = "Grp",
            "gsheet" = "Rio",
            "gtf" = "Rtracklayer",
            "json" = "Json",
            "lines" = "Lines",
            "log" = "Lines",
            "maf" = "Maf",
            "mat" = "Rio",
            "md" = "Lines",
            "mtp" = "Rio",
            "mtx" = "Mtx",
            "narrowpeak" = "Rtracklayer",
            "obo" = "Obo",
            "ods" = "Rio",
            "por" = "Rio",
            "psv" = "Rio",
            "py" = "Lines",
            "pzfx" = "Pzfx",
            "r" = "Lines",
            "rda" = "RData",
            "rdata" = "RData",
            "rds" = "Rds",
            "rec" = "Rio",
            "rio" = "Rio",
            "rmd" = "Lines",
            "rtracklayer" = "Rtracklayer",
            "sam" = "Sam",
            "sas7bdat" = "Rio",
            "sav" = "Rio",
            "sh" = "Lines",
            "syd" = "Rio",
            "table" = "Table",
            "txt" = {
                abort(sprintf(
                    fmt = paste(
                        "Automatic import of {.var %s} file is not supported.",
                        "Specify using {.arg %s} (e.g. {.var %s}, {.var %s}).",
                        sep = "\n"
                    ),
                    "txt",
                    "format",
                    "lines", "table"
                ))
            },
            "tsv" = "Tsv",
            "vcf" = "Vcf",
            "wig" = "Rtracklayer",
            "xls" = "Excel",
            "xlsb" = "Excel",
            "xlsx" = "Excel",
            "xpt" = "Rio",
            "yaml" = "Yaml",
            "yml" = "Yaml",
            "zsh" = "Lines",
            abort(sprintf(
                "{.pkg %s} does not support {.var %s} extension.",
                .pkgName, format
            ))
        )
        class <- paste0("Pipette", class, "File")
        class
    }



#' Get function
#'
#' @note Updated 2021-08-24.
#' @noRd
#'
#' @param f `character(1)`.
#' Function name.
#' @param pkg `character(1)`.
#' Package name.
#'
#' @return `function`.
.getFunction <-
    function(f, pkg) {
        assert(
            isString(f),
            isString(pkg),
            requireNamespaces(pkg)
        )
        x <- get(x = f, envir = asNamespace(pkg), inherits = TRUE)
        assert(is.function(x))
        x
    }



#' Internal importer for a sparse matrix sidecar file (e.g. `.rownames`)
#'
#' @note Updated 2021-09-25.
#' @noRd
.importMTXSidecar <-
    function(file, quiet) {
        assert(
            isString(file),
            isFlag(quiet)
        )
        if (isFALSE(quiet)) {
            alert(sprintf("Importing sidecar {.file %s}.", file))
        }
        object <- import(
            con = as.character(file),
            format = "lines",
            quiet = quiet
        )
        object
    }



#' Dynamically handle a local or remote file path
#'
#' @section Vectorization:
#'
#' This function is vectorized and supports mixed local and remote paths. Remote
#' files are downloaded locally to a temporary directory.
#'
#' @section Compressed files:
#'
#' Compressed files will automatically be decompressed. Currently, these file
#' extensions are natively supported: `BZ2`, `GZ`, `XZ`, and `ZIP`.
#'
#' @note Updated 2022-09-13.
#' @noRd
#'
#' @inheritParams AcidRoxygen::params
#' @param file `character(1)`.
#' Local file path or remote URL.
#' @param tempPrefix `character(1)`.
#' Prefix to use for temporary file basename.
#'
#' @return `character`.
#' Local file path(s). Stops on a missing file.
#'
#' @seealso
#' - `tempfile()`.
#' - `tempdir()`.
#'
#' @examples
#' ## Local
#' file <- system.file("extdata", "example.csv", package = "pipette")
#' x <- localOrRemoteFile(file)
#' basename(x)
#'
#' ## Remote
#' file <- AcidBase::pasteUrl(
#'     pipetteTestsUrl,
#'     "hgnc.txt.gz",
#'     protocol = "none"
#' )
#' x <- .localOrRemoteFile(file)
#' print(x)
.localOrRemoteFile <-
    function(file, quiet) {
        assert(
            isString(file),
            isFlag(quiet)
        )
        tmpDir <- tempdir2()
        tmpPrefix <- paste0(.pkgName, "-")
        fileExt <- fileExt(file)
        tmpFileExt <- ifelse(
            test = is.na(fileExt),
            yes = "",
            no = paste0(".", fileExt)
        )
        if (isAUrl(file)) {
            assert(isAnExistingUrl(file))
            url <- file
            file <- tempfile(
                pattern = tmpPrefix,
                tmpdir = tmpDir,
                fileext = tmpFileExt
            )
            if (isSubset(
                x = fileExt(path = url, pattern = "\\.([a-zA-Z0-9]+)$"),
                y = c("bz2", "gz", "rda", "rds", "xls", "xlsx", "xz", "zip")
            )) {
                ## Write binary.
                mode <- "wb"
            } else {
                ## Write (default).
                mode <- "w"
            }
            download(
                url = url,
                destfile = file,
                quiet = quiet,
                mode = mode
            )
        }
        assert(isAFile(file))
        if (isMatchingRegex(pattern = compressExtPattern, x = file)) {
            if (!isMatchingFixed(
                pattern = file.path(tmpDir, tmpPrefix),
                x = file
            )) {
                tmpFile <- tempfile(
                    pattern = tmpPrefix,
                    tmpdir = tmpDir,
                    fileext = tmpFileExt
                )
                file.copy(from = file, to = tmpFile)
                file <- tmpFile
            }
            file <- decompress(
                file = file,
                remove = TRUE,
                overwrite = TRUE
            )
        }
        realpath(file)
    }



#' Original resource
#'
#' @note Updated 2023-09-20.
#' @noRd
.origResource <- function(object) {
    assert(is(object, "PipetteFile"))
    x <- slot(object, "origResource")
    assert(isString(x, nullOk = TRUE))
    x
}



#' Assign original resource
#'
#' @note Updated 2023-09-20.
#' @noRd
`.origResource<-` <- function(object, value) {
    assert(is(object, "PipetteFile"))
    slot(object, "origResource") <- value
    object
}



#' Resource
#'
#' @note Updated 2023-09-20.
#' @noRd
.resource <- function(object) {
    assert(is(object, "PipetteFile"))
    x <- slot(object, "resource")
    assert(isString(x))
    x
}



#' Return standardized import object
#'
#' @note Updated 2021-09-24.
#' @noRd
.returnImport <-
    function(object,
             con,
             rownames = FALSE,
             rownameCol = NULL,
             colnames = FALSE,
             makeNames = FALSE,
             metadata = FALSE,
             whatPkg = NULL,
             whatFun = NULL,
             quiet = getOption(
                 x = "acid.quiet",
                 default = FALSE
             )) {
        validObject(object)
        assert(
            is(con, "PipetteFile"),
            isFlag(rownames),
            isScalar(rownameCol) || is.null(rownameCol),
            isFlag(colnames) || isCharacter(colnames),
            is.function(makeNames) ||
                is.null(makeNames) ||
                isFALSE(makeNames),
            isFlag(metadata),
            isString(whatPkg, nullOk = TRUE),
            isString(whatFun, nullOk = TRUE),
            isFlag(quiet)
        )
        file <- .origResource(con)
        if (is.null(file)) {
            file <- .resource(con)
        }
        if (!is.null(rownameCol)) {
            rownames <- TRUE
        }
        ## Check that manual column names are correct.
        if (isCharacter(colnames)) {
            assert(identical(colnames(object), colnames))
        }
        ## Attempt to set row names automatically for data frames.
        if (
            is.data.frame(object) &&
                isTRUE(rownames) &&
                !hasRownames(object)
        ) {
            if (is.null(rownameCol)) {
                rownameCol <- matchRownameColumn(object)
            }
            if (!is.null(rownameCol)) {
                assert(isScalar(rownameCol))
                if (!isString(rownameCol)) {
                    rownameCol <- colnames(object)[[rownameCol]]
                }
                assert(
                    isString(rownameCol),
                    isSubset(rownameCol, colnames(object))
                )
                if (isFALSE(quiet)) {
                    alertInfo(sprintf(
                        "Setting row names from {.var %s} column.",
                        rownameCol
                    ))
                }
                rownames(object) <- object[[rownameCol]]
                object[[rownameCol]] <- NULL
            }
        }
        if (hasRownames(object)) {
            assert(hasNoDuplicates(rownames(object)))
        }
        if (hasNames(object)) {
            if (hasDuplicates(names(object))) {
                dupes <- sort(names(object)[duplicated(names(object))])
                alertWarning(sprintf(
                    "Duplicate names: {.var %s}.",
                    toInlineString(dupes, n = 5L)
                ))
            }
            ## Ensure names are syntactically valid, when applicable.
            if (is.function(makeNames)) {
                ## Harden against any object classes that don't support names
                ## assignment, to prevent unwanted error on this step.
                tryCatch(
                    expr = {
                        names(object) <- makeNames(names(object))
                    },
                    error = function(e) NULL
                )
                if (isFALSE(hasValidNames(object))) {
                    alertWarning("Invalid names detected.")
                }
            }
            assert(hasNoDuplicates(names(object)))
        }
        if (isTRUE(metadata)) {
            metadata2(object, which = "import") <-
                SimpleList(
                    "date" = Sys.Date(),
                    "file" = ifelse(
                        test = isTRUE(isAFile(file)),
                        yes = realpath(file),
                        no = file
                    ),
                    "importerName" = paste0(whatPkg, "::", whatFun),
                    "importerVersion" = packageVersion(whatPkg),
                    "packageName" = .pkgName,
                    "packageVersion" = .pkgVersion
                )
        }
        object
    }



## Primary S4 methods ==========================================================

#' Primary `import` method, that hands off to classed file-extension variants
#'
#' @details
#' We're supporting remote files, so don't check using `isAFile()` here.
#'
#' Allow Google Sheets import using rio, by matching the URL.
#' Otherwise, coerce the file extension to uppercase, for easy matching.
#'
#' @note Updated 2023-09-20.
#' @noRd
`import,character` <- # nolint
    function(con, format = NULL, ...) {
        dots <- list(...)
        if (isSubset("quiet", names(dots))) {
            quiet <- dots[["quiet"]]
        } else {
            quiet <- FALSE
        }
        assert(
            isString(format, nullOk = TRUE),
            isFlag(quiet)
        )
        if (isMatchingRegex(
            pattern = "^https://docs\\.google\\.com/spreadsheets",
            x = con
        )) {
            format <- "gsheet"
        }
        if (is.null(format)) {
            format <- fileExt(con)
        }
        assert(
            isString(format),
            msg = sprintf(
                fmt = paste(
                    "{.arg %s} ({.file %s}) doesn't contain extension.",
                    "Set the file format manually using {.arg %s}.",
                    "Refer to {.pkg %s}::{.fun %s} for details.",
                    sep = "\n"
                ),
                "con", basename(con),
                "format",
                "pipette", "import"
            )
        )
        class <- .formatToFileClass(format)
        assert(
            hasMethod(
                f = "import",
                signature = signature(con = class),
                where = asNamespace(.pkgName)
            ),
            msg = sprintf("Unsupported class: {.cls %s}.", class)
        )
        switch(
            EXPR = format,
            "gsheet" = {
                con <- new(Class = class, resource = con)
            },
            {
                origResource <- con
                if (isAFile(origResource)) {
                    origResource <- realpath(origResource)
                }
                resource <- .localOrRemoteFile(con, quiet = quiet)
                con <- new(Class = class, resource = resource)
                .origResource(con) <- origResource
            }
        )
        assert(
            is(con, "PipetteFile"),
            validObject(con)
        )
        out <- import(con = con, ...)
        resource <- .resource(con)
        if (isATempFile(resource)) {
            file.remove(resource)
        }
        out
    }



## Updated 2023-11-03.
`import,textConnection` <- # nolint
    function(con,
             format = c("csv", "tsv", "json", "yaml"),
             colnames = TRUE,
             quote = "\"",
             naStrings = pipette::naStrings,
             quiet = FALSE) {
        assert(
            is(con, "textConnection"),
            isString(format),
            isString(quote),
            is.character(naStrings),
            isFlag(quiet)
        )
        format <- match.arg(format)
        if (isSubset(format, c("json", "yaml"))) {
            switch(
                EXPR = format,
                "json" = {
                    whatPkg <- "jsonlite"
                    whatFun <- "fromJSON"
                    args <- list(
                        "txt" = readLines(con),
                        "simplifyVector" = FALSE,
                        "flatten" = FALSE
                    )
                },
                "yaml" = {
                    whatPkg <- "yaml"
                    whatFun <- "read_yaml"
                    args <- list("text" = readLines(con))
                }
            )
            what <- .getFunction(f = whatFun, pkg = whatPkg)
            object <- do.call(what = what, args = args)
            assert(is.list(object))
        } else {
            whatPkg <- "base"
            whatFun <- "read.table"
            args <- list(
                "file" = con,
                "blank.lines.skip" = FALSE,
                "fill" = FALSE,
                "na.strings" = naStrings,
                "quote" = quote,
                "sep" = switch(
                    EXPR = format,
                    "csv" = ",",
                    "tsv" = "\t"
                )
            )
            if (isCharacter(colnames)) {
                args[["header"]] <- FALSE
                args[["col.names"]] <- colnames
            } else {
                args[["header"]] <- colnames
            }
            if (isFALSE(quiet)) {
                alert(sprintf(
                    "Importing text connection with {.pkg %s}::{.fun %s}.",
                    whatPkg, whatFun
                ))
            }
            what <- .getFunction(f = whatFun, pkg = whatPkg)
            object <- do.call(what = what, args = args)
            assert(is.data.frame(object))
        }
        object
    }



## R data importers ============================================================

#' Import an R data file containing multiple objects (`.rda`)
#'
#' @note Updated 2023-09-20.
#' @noRd
`import,PipetteRDataFile` <- # nolint
    function(con, quiet = FALSE) {
        assert(isFlag(quiet))
        file <- .resource(con)
        whatPkg <- "base"
        whatFun <- "load"
        if (isFALSE(quiet)) {
            .alertImport(
                con = con,
                whatPkg = whatPkg,
                whatFun = whatFun
            )
        }
        saveEnv <- new.env()
        args <- list(
            "file" = file,
            "envir" = saveEnv
        )
        what <- .getFunction(f = whatFun, pkg = whatPkg)
        object <- do.call(what = what, args = args)
        if (isFALSE(hasLength(saveEnv, n = 1L))) {
            abort(sprintf(
                "{.file %s} does not contain a single object.",
                basename(file)
            ))
        }
        object <- get(object, envir = saveEnv, inherits = FALSE)
        if (isFALSE(quiet)) {
            tryCatch(
                expr = {
                    validObject(object)
                },
                error = function(e) {
                    conditionMessage(e)
                }
            )
        }
        object
    }



#' Import an R data serialized file (`.rds`)
#'
#' @note Updated 2023-09-20.
#' @noRd
`import,PipetteRdsFile` <- # nolint
    function(con, quiet = FALSE) {
        assert(isFlag(quiet))
        file <- .resource(con)
        whatPkg <- "base"
        whatFun <- "readRDS"
        if (isFALSE(quiet)) {
            .alertImport(
                con = con,
                whatPkg = whatPkg,
                whatFun = whatFun
            )
        }
        args <- list("file" = file)
        what <- .getFunction(f = whatFun, pkg = whatPkg)
        object <- do.call(what = what, args = args)
        if (isFALSE(quiet)) {
            tryCatch(
                expr = {
                    validObject(object)
                },
                error = function(e) {
                    conditionMessage(e)
                }
            )
        }
        object
    }



## Array importers =============================================================

#' Import a delimited file (e.g. `.csv`, `.tsv`).
#'
#' @note Updated 2023-10-06.
#' @noRd
`import,PipetteDelimFile` <- # nolint
    function(con,
             rownames = TRUE,
             rownameCol = NULL,
             colnames = TRUE,
             quote = "\"",
             naStrings = pipette::naStrings,
             comment = "",
             skip = 0L,
             nMax = Inf,
             engine = c("base", "data.table", "readr"),
             makeNames = syntactic::makeNames,
             metadata = FALSE,
             quiet = FALSE) {
        assert(
            isFlag(rownames),
            isScalar(rownameCol) || is.null(rownameCol),
            isFlag(colnames) || isCharacter(colnames),
            is.character(quote) && length(quote) <= 1L,
            is.character(naStrings),
            is.character(comment) && length(comment) <= 1L,
            isInt(skip), isNonNegative(skip),
            isPositive(nMax),
            is.function(makeNames) ||
                is.null(makeNames) ||
                isFALSE(makeNames),
            isFlag(metadata),
            isFlag(quiet)
        )
        file <- .resource(con)
        ext <- switch(
            EXPR = class(con),
            "PipetteCsvFile" = "csv",
            "PipetteTsvFile" = "tsv",
            "PipetteTableFile" = "table",
            "table"
        )
        whatPkg <- match.arg(engine)
        if (identical(ext, "table")) {
            assert(
                identical(whatPkg, "base"),
                msg = "Only base engine is supported for table."
            )
        }
        switch(
            EXPR = whatPkg,
            "base" = {
                whatFun <- "read.table"
                args <- list(
                    "file" = file,
                    "blank.lines.skip" = TRUE,
                    "comment.char" = comment,
                    "fill" = FALSE,
                    "na.strings" = naStrings,
                    "nrows" = nMax,
                    "quote" = quote,
                    "sep" = switch(
                        EXPR = ext,
                        "csv" = ",",
                        "table" = "",
                        "tsv" = "\t"
                    ),
                    "skip" = skip,
                    "stringsAsFactors" = FALSE,
                    "strip.white" = TRUE
                )
                if (isCharacter(colnames)) {
                    args[["header"]] <- FALSE
                    args[["col.names"]] <- colnames
                } else {
                    args[["header"]] <- colnames
                }
            },
            "data.table" = {
                whatFun <- "fread"
                if (isString(comment)) {
                    abort(sprintf(
                        paste0(
                            "{.pkg %s}::{.fun %s} does not support ",
                            "comment exclusion.\n",
                            "See also: {.url %s}."
                        ),
                        whatPkg, whatFun,
                        "https://github.com/Rdatatable/data.table/issues/856"
                    ))
                }
                args <- list(
                    "file" = file,
                    "blank.lines.skip" = TRUE,
                    "check.names" = TRUE,
                    "data.table" = FALSE,
                    "fill" = FALSE,
                    "na.strings" = naStrings,
                    "nrows" = nMax,
                    "quote" = quote,
                    "skip" = skip,
                    "showProgress" = FALSE,
                    "stringsAsFactors" = FALSE,
                    "strip.white" = TRUE,
                    "verbose" = FALSE
                )
                if (isCharacter(colnames)) {
                    args[["header"]] <- FALSE
                    args[["col.names"]] <- colnames
                } else if (isTRUE(colnames)) {
                    ## Usage of "auto" instead of TRUE will attempt to handle
                    ## malformed columns, similar to readr engine.
                    args[["header"]] <- "auto"
                } else {
                    args[["header"]] <- colnames
                }
            },
            "readr" = {
                whatFun <- "read_delim"
                args <- list(
                    "file" = file,
                    "col_names" = colnames,
                    "col_types" = readr::cols(),
                    "comment" = comment,
                    "delim" = switch(
                        EXPR = ext,
                        "csv" = ",",
                        "tsv" = "\t"
                    ),
                    "lazy" = FALSE,
                    "na" = naStrings,
                    "name_repair" = make.names,
                    "n_max" = nMax,
                    "progress" = FALSE,
                    "quote" = quote,
                    "show_col_types" = FALSE,
                    "skip" = skip,
                    "skip_empty_rows" = TRUE,
                    "trim_ws" = TRUE
                )
            }
        )
        if (isFALSE(quiet)) {
            .alertImport(
                con = con,
                whatPkg = whatPkg,
                whatFun = whatFun
            )
        }
        what <- .getFunction(f = whatFun, pkg = whatPkg)
        object <- do.call(what = what, args = args)
        assert(is.data.frame(object))
        if (!identical(class(object), "data.frame")) {
            object <- as.data.frame(
                x = object,
                optional = TRUE,
                make.names = FALSE,
                stringsAsFactors = FALSE
            )
        }
        assert(allAreAtomic(object))
        .returnImport(
            object = object,
            con = con,
            rownames = rownames,
            rownameCol = rownameCol,
            colnames = colnames,
            makeNames = makeNames,
            metadata = metadata,
            whatPkg = whatPkg,
            whatFun = whatFun,
            quiet = quiet
        )
    }



#' Import a Microsoft Excel worksheet (`.xlsx`)
#'
#' @note Updated 2023-09-20.
#' @noRd
`import,PipetteExcelFile` <- # nolint
    function(con,
             sheet = 1L,
             rownames = TRUE,
             rownameCol = NULL,
             colnames = TRUE,
             skip = 0L,
             nMax = Inf,
             naStrings = pipette::naStrings,
             makeNames = syntactic::makeNames,
             metadata = FALSE,
             quiet = FALSE) {
        assert(
            isScalar(sheet),
            isFlag(rownames) || isCharacter(rownames),
            isScalar(rownameCol) || is.null(rownameCol),
            isFlag(colnames) || isCharacter(colnames),
            isInt(skip), isNonNegative(skip),
            isPositive(nMax),
            is.character(naStrings),
            is.function(makeNames) ||
                is.null(makeNames) ||
                isFALSE(makeNames),
            isFlag(metadata),
            isFlag(quiet)
        )
        file <- .resource(con)
        whatPkg <- "readxl"
        whatFun <- "read_excel"
        if (isFALSE(quiet)) {
            .alertImport(
                con = con,
                whatPkg = whatPkg,
                whatFun = whatFun
            )
        }
        warn <- getOption(x = "warn")
        args <- list(
            "path" = file,
            "col_names" = colnames,
            "n_max" = nMax,
            "na" = naStrings,
            "progress" = FALSE,
            "sheet" = sheet,
            "skip" = skip,
            "trim_ws" = TRUE,
            ".name_repair" = make.names
        )
        what <- .getFunction(f = whatFun, pkg = whatPkg)
        options(warn = 2L) # nolint
        object <- do.call(what = what, args = args)
        options(warn = warn) # nolint
        object <- as.data.frame(
            x = object,
            optional = TRUE,
            make.names = FALSE,
            stringsAsFactors = FALSE
        )
        object <- removeNa(object)
        .returnImport(
            object = object,
            con = con,
            rownames = rownames,
            rownameCol = rownameCol,
            colnames = colnames,
            makeNames = makeNames,
            metadata = metadata,
            whatPkg = whatPkg,
            whatFun = whatFun,
            quiet = quiet
        )
    }



#' Import a sparse matrix file (`.mtx`)
#'
#' @note Updated 2023-09-20.
#' @noRd
`import,PipetteMtxFile` <- # nolint
    function(con,
             rownamesFile,
             colnamesFile,
             metadata = FALSE,
             quiet = FALSE) {
        file <- .resource(con)
        origFile <- .origResource(con)
        if (is.null(origFile)) {
            origFile <- file
        }
        if (missing(rownamesFile)) {
            rownamesFile <- paste0(origFile, ".rownames")
        }
        if (missing(colnamesFile)) {
            colnamesFile <- paste0(origFile, ".colnames")
        }
        assert(
            isString(rownamesFile, nullOk = TRUE),
            isString(colnamesFile, nullOk = TRUE),
            isFlag(metadata),
            isFlag(quiet)
        )
        whatPkg <- "Matrix"
        whatFun <- "readMM"
        if (isFALSE(quiet)) {
            .alertImport(
                con = con,
                whatPkg = whatPkg,
                whatFun = whatFun
            )
        }
        args <- list("file" = file)
        what <- .getFunction(f = whatFun, pkg = whatPkg)
        object <- do.call(what = what, args = args)
        ## Add the rownames automatically using `.rownames` sidecar file.
        rownamesFile <- tryCatch(
            expr = .localOrRemoteFile(
                file = rownamesFile,
                quiet = quiet
            ),
            error = function(e) {
                NULL
            }
        )
        if (isAFile(rownamesFile)) {
            rownames(object) <-
                .importMTXSidecar(file = rownamesFile, quiet = quiet)
        }
        ## Add the colnames automatically using `.colnames` sidecar file.
        colnamesFile <- tryCatch(
            expr = .localOrRemoteFile(
                file = colnamesFile,
                quiet = quiet
            ),
            error = function(e) {
                NULL
            }
        )
        if (isAFile(colnamesFile)) {
            colnames(object) <-
                .importMTXSidecar(file = colnamesFile, quiet = quiet)
        }
        .returnImport(
            object = object,
            con = con,
            metadata = metadata,
            whatPkg = whatPkg,
            whatFun = whatFun,
            quiet = quiet
        )
    }



#' Import a GraphPad Prism file (`.pzfx`)
#'
#' @note Updated 2023-09-20.
#' @noRd
#'
#' @note This function doesn't support optional column names.
`import,PipettePzfxFile` <- # nolint
    function(con,
             sheet = 1L,
             makeNames = syntactic::makeNames,
             metadata = FALSE,
             quiet = FALSE) {
        assert(
            isScalar(sheet),
            is.function(makeNames) ||
                is.null(makeNames) ||
                isFALSE(makeNames),
            isFlag(metadata),
            isFlag(quiet)
        )
        file <- .resource(con)
        whatPkg <- "pzfx"
        whatFun <- "read_pzfx"
        if (isFALSE(quiet)) {
            .alertImport(
                con = con,
                whatPkg = whatPkg,
                whatFun = whatFun
            )
        }
        args <- list("path" = file, "table" = sheet)
        what <- .getFunction(f = whatFun, pkg = whatPkg)
        object <- do.call(what = what, args = args)
        object <- removeNa(object)
        .returnImport(
            object = object,
            con = con,
            rownames = FALSE,
            colnames = TRUE,
            makeNames = makeNames,
            metadata = metadata,
            whatPkg = whatPkg,
            whatFun = whatFun,
            quiet = quiet
        )
    }



## Non-array importers =========================================================

#' Import source code lines
#'
#' @note Updated 2023-09-28.
#' @noRd
`import,PipetteLinesFile` <- # nolint
    function(con,
             comment = "",
             skip = 0L,
             nMax = Inf,
             stripWhitespace = FALSE,
             removeBlank = FALSE,
             metadata = FALSE,
             engine = c("base", "data.table", "readr"),
             quiet = FALSE) {
        assert(
            is.character(comment) && length(comment) <= 1L,
            isInt(skip),
            isInt(skip), isNonNegative(skip),
            isPositive(nMax),
            isFlag(stripWhitespace),
            isFlag(removeBlank),
            isFlag(quiet)
        )
        if (isString(comment) || isTRUE(removeBlank)) {
            assert(
                identical(nMax, eval(formals()[["nMax"]])),
                identical(skip, eval(formals()[["skip"]])),
                msg = sprintf(
                    fmt = paste0(
                        "'%s' or '%s' arguments are not supported when ",
                        "either '%s' or '%s' are enabled."
                    ),
                    "nMax", "skip",
                    "comment", "removeBlank"
                )
            )
        }
        file <- .resource(con)
        whatPkg <- match.arg(engine)
        switch(
            EXPR = whatPkg,
            "base" = {
                whatFun <- "readLines"
                args <- list(
                    "con" = file,
                    "warn" = FALSE
                )
            },
            "data.table" = {
                whatFun <- "fread"
                args <- list(
                    "file" = file,
                    "blank.lines.skip" = removeBlank,
                    "fill" = FALSE,
                    "header" = FALSE,
                    "nrows" = nMax,
                    "sep" = "\n",
                    "skip" = skip,
                    "strip.white" = stripWhitespace,
                    "verbose" = FALSE
                )
            },
            "readr" = {
                whatFun <- "read_lines"
                args <- list(
                    "file" = file,
                    "lazy" = FALSE,
                    "n_max" = nMax,
                    "progress" = FALSE,
                    "skip" = skip,
                    ## This setting considers lines containing spaces empty.
                    "skip_empty_rows" = FALSE
                )
            }
        )
        if (isFALSE(quiet)) {
            .alertImport(
                con = con,
                whatPkg = whatPkg,
                whatFun = whatFun
            )
        }
        if (isTRUE(file.size(file) == 0L)) {
            return(character())
        }
        what <- .getFunction(f = whatFun, pkg = whatPkg)
        object <- do.call(what = what, args = args)
        if (identical(whatPkg, "data.table")) {
            object <- object[[1L]]
        }
        assert(is.character(object))
        if (isTRUE(stripWhitespace)) {
            object <- gsub(
                pattern = "^[[:space:]]+",
                replacement = "",
                x = object,
                fixed = FALSE
            )
            object <- gsub(
                pattern = "[[:space:]]+$",
                replacement = "",
                x = object,
                fixed = FALSE
            )
        }
        if (isTRUE(removeBlank)) {
            object <- strRemoveEmpty(object, naOk = TRUE, spacesOk = TRUE)
        }
        if (isString(comment)) {
            keep <- !grepl(pattern = paste0("^", comment), x = object)
            object <- object[keep]
        }
        if (identical(whatPkg, "base")) {
            if (isTRUE(skip > 0L)) {
                assert(skip < length(object))
                start <- skip + 1L
                end <- length(object)
                object <- object[start:end]
            }
            if (isTRUE(nMax < length(object))) {
                object <- object[1L:nMax]
            }
        }
        .returnImport(
            object = object,
            con = con,
            metadata = metadata,
            whatPkg = whatPkg,
            whatFun = whatFun,
            quiet = quiet
        )
    }



#' Import a JSON file (`.json`)
#'
#' @note Updated 2023-09-20.
#' @noRd
`import,PipetteJsonFile` <- # nolint
    function(con, metadata = FALSE, quiet = FALSE) {
        assert(
            isFlag(metadata),
            isFlag(quiet)
        )
        file <- .resource(con)
        whatPkg <- "jsonlite"
        whatFun <- "read_json"
        if (isFALSE(quiet)) {
            .alertImport(
                con = con,
                whatPkg = whatPkg,
                whatFun = whatFun
            )
        }
        args <- list("path" = file)
        what <- .getFunction(f = whatFun, pkg = whatPkg)
        object <- do.call(what = what, args = args)
        .returnImport(
            object = object,
            con = con,
            metadata = metadata,
            whatPkg = whatPkg,
            whatFun = whatFun,
            quiet = quiet
        )
    }



#' Import a YAML file (`.yaml`, `.yml`)
#'
#' @note Updated 2023-11-03.
#' @noRd
`import,PipetteYamlFile` <- # nolint
    function(con, metadata = FALSE, quiet = FALSE) {
        assert(
            isFlag(metadata),
            isFlag(quiet)
        )
        file <- .resource(con)
        whatPkg <- "yaml"
        whatFun <- "read_yaml"
        if (isFALSE(quiet)) {
            .alertImport(
                con = con,
                whatPkg = whatPkg,
                whatFun = whatFun
            )
        }
        args <- list("file" = file)
        what <- .getFunction(f = whatFun, pkg = whatPkg)
        object <- do.call(what = what, args = args)
        .returnImport(
            object = object,
            con = con,
            metadata = metadata,
            whatPkg = whatPkg,
            whatFun = whatFun,
            quiet = quiet
        )
    }



## Bioinformatics importers ====================================================

#' Import a binary sequencing alignment file (`.bam`)
#'
#' @note Updated 2023-09-20.
#' @noRd
`import,PipetteBamFile` <- # nolint
    function(con, quiet = FALSE) {
        assert(isFlag(quiet))
        file <- .resource(con)
        whatPkg <- "Rsamtools"
        whatFun <- "scanBam"
        if (isFALSE(quiet)) {
            .alertImport(
                con = con,
                whatPkg = whatPkg,
                whatFun = whatFun
            )
        }
        args <- list("file" = file)
        what <- .getFunction(f = whatFun, pkg = whatPkg)
        object <- do.call(what = what, args = args)
        assert(
            is.list(object),
            hasLength(object, n = 1L),
            is.list(object[[1L]])
        )
        object <- object[[1L]]
        object
    }



#' Import a binary variant call file (`.bcf`)
#'
#' @note Updated 2023-09-20.
#' @noRd
`import,PipetteBcfFile` <- # nolint
    function(con, quiet = FALSE) {
        assert(isFlag(quiet))
        file <- .resource(con)
        whatPkg <- "Rsamtools"
        whatFun <- "scanBcf"
        if (isFALSE(quiet)) {
            .alertImport(
                con = con,
                whatPkg = whatPkg,
                whatFun = whatFun
            )
        }
        origFile <- .origResource(con)
        indexFile <- .localOrRemoteFile(
            file = paste0(origFile, ".csi"),
            quiet = quiet
        )
        ## Ensure that a decompressed BCF file in tempdir contains index file.
        indexFile2 <- paste0(file, ".csi")
        if (!isAFile(indexFile2)) {
            file.copy(from = indexFile, to = indexFile2, overwrite = FALSE)
        }
        indexFile <- indexFile2
        args <- list("file" = file)
        what <- .getFunction(f = whatFun, pkg = whatPkg)
        object <- do.call(what = what, args = args)
        assert(
            is.list(object),
            identical(
                x = names(object),
                y = c(
                    "CHROM",
                    "POS",
                    "ID",
                    "REF",
                    "ALT",
                    "QUAL",
                    "FILTER",
                    "INFO",
                    "FORMAT",
                    "GENO",
                    "RecordsPerRange"
                )
            )
        )
        if (isATempFile(indexFile)) {
            file.remove(indexFile)
        }
        object
    }



#' Import bcbio count matrix generated by featureCounts
#'
#' @details
#' Internal importer for a bcbio count matrix file (`.counts`).
#' These files contain an `"id"` column that we need to coerce to row names.
#'
#' @note Updated 2023-09-20.
#' @noRd
`import,PipetteBcbioCountsFile` <- # nolint
    function(con, metadata = FALSE, quiet = FALSE) {
        assert(
            isFlag(metadata),
            isFlag(quiet)
        )
        object <- import(
            con = .resource(con),
            format = "tsv",
            rownames = FALSE,
            colnames = TRUE,
            makeNames = FALSE,
            metadata = metadata,
            quiet = quiet
        )
        assert(
            is.data.frame(object),
            isSubset("id", colnames(object)),
            hasNoDuplicates(object[["id"]])
        )
        if (isTRUE(metadata)) {
            m <- metadata2(object, which = "import")
        }
        rownames(object) <- object[["id"]]
        object[["id"]] <- NULL
        ## Don't attempt to coerce `"annotated_combined.counts"` file to matrix.
        if (isSubset("symbol", colnames(object))) {
            if (isFALSE(quiet)) {
                alertInfo("Annotated counts detected.")
            }
            object <- object[
                ,
                c("symbol", setdiff(colnames(object), "symbol")),
                drop = FALSE
            ]
        } else {
            object <- as.matrix(object)
            mode(object) <- "integer"
        }
        if (isTRUE(metadata)) {
            metadata2(object, which = "import") <- m
        }
        object
    }



#' Import a compressed reference-oriented alignment map file (`.cram`)
#'
#' @note Updated 2023-09-20.
#' @noRd
`import,PipetteCramFile` <- # nolint
    function(con, quiet = FALSE) {
        assert(
            requireNamespaces("Rsamtools"),
            isFlag(quiet)
        )
        file <- .resource(con)
        whatPkg <- "Rsamtools"
        whatFun <- "scanBam"
        if (isFALSE(quiet)) {
            .alertImport(
                con = con,
                whatPkg = whatPkg,
                whatFun = whatFun
            )
        }
        tmpBamFile <- Rsamtools::asBam(
            file = file,
            destination = tempfile(),
            overwrite = FALSE,
            indexDestination = TRUE
        )
        args <- list("file" = tmpBamFile)
        what <- .getFunction(f = whatFun, pkg = whatPkg)
        object <- do.call(what = what, args = args)
        file.remove(tmpBamFile)
        assert(
            is.list(object),
            hasLength(object, n = 1L),
            is.list(object[[1L]])
        )
        object <- object[[1L]]
        object
    }



#' Import a FASTA file
#'
#' @note Updated 2023-09-20.
#' @noRd
#'
#' @seealso
#' - `Biostrings::readDNAStringSet()`.
#' - `Biostrings::readRNAStringSet()`.
#' - `Biostrings::readAAStringSet()`.
#'
#' @return Varies, depending on the `moleculeType` argument:
#' - `"DNA"`: `DNAStringSet`.
#' - `"RNA"`: `RNAStringSet`.
#' - `"AA"`: `AAStringSet`.
`import,PipetteFastaFile` <- # nolint
    function(con,
             moleculeType = c("DNA", "RNA", "AA"),
             metadata = FALSE,
             quiet = FALSE) {
        assert(
            isFlag(metadata),
            isFlag(quiet)
        )
        moleculeType <- match.arg(moleculeType)
        file <- .resource(con)
        whatPkg <- "Biostrings"
        whatFun <- paste0("read", moleculeType, "StringSet")
        if (isFALSE(quiet)) {
            .alertImport(
                con = con,
                whatPkg = whatPkg,
                whatFun = whatFun
            )
        }
        args <- list(
            "filepath" = file,
            "format" = "fasta",
            "nrec" = -1L,
            "skip" = 0L,
            "seek.first.rec" = TRUE,
            "use.names" = TRUE
        )
        what <- .getFunction(f = whatFun, pkg = whatPkg)
        tryCatch(
            expr = {
                object <- do.call(what = what, args = args)
            },
            warning = function(w) {
                msg <- w[["message"]]
                if (isMatchingFixed(
                    x = msg,
                    pattern = "invalid one-letter sequence codes"
                )) {
                    msg <- paste(
                        msg,
                        "Ensure that 'moleculeType' argument is correct.",
                        sep = "\n"
                    )
                }
                abort(msg)
            }
        )
        assert(is(object, paste0(moleculeType, "StringSet")))
        if (hasNames(object)) {
            if (allAreMatchingRegex(
                pattern = "\\bMI(MAT)?[0-9]+\\b",
                x = names(object)
            )) {
                alertInfo("miRBase FASTA file detected.")
                spl <- strsplit(
                    x = names(object),
                    split = " ",
                    fixed = TRUE
                )
                names <- vapply(
                    X = spl,
                    FUN = `[[`,
                    1L,
                    FUN.VALUE = character(1L),
                    USE.NAMES = FALSE
                )
                mat <- do.call(what = rbind, args = spl)
                assert(identical(ncol(mat), 5L))
                attributes <- DataFrame(
                    "id" = mat[, 1L],
                    "accession" = mat[, 2L],
                    "organism" = paste(mat[, 3L], mat[, 4L]),
                    "name" = mat[, 5L]
                )
                names(object) <- names
                rownames(attributes) <- names
                metadata(object)[["attributes"]] <- attributes
            } else if (allAreMatchingFixed(
                pattern = "|", x = names(object)
            )) {
                alertInfo(sprintf(
                    "Splitting attributes by {.var %s} separator.", "|"
                ))
                attributes <- strsplit(
                    x = names(object),
                    split = "|",
                    fixed = TRUE
                )
                attributes <- SimpleList(attributes)
                names <- vapply(
                    X = attributes,
                    FUN = `[[`,
                    1L,
                    FUN.VALUE = character(1L),
                    USE.NAMES = FALSE
                )
                names(object) <- names
                names(attributes) <- names
                metadata(object)[["attributes"]] <- attributes
            }
        }
        .returnImport(
            object = object,
            con = con,
            metadata = metadata,
            whatPkg = whatPkg,
            whatFun = whatFun,
            quiet = quiet
        )
    }



#' Import a FASTQ file
#'
#' @note Updated 2023-09-20.
#' @noRd
#'
#' @seealso
#' - `Biostrings::readDNAStringSet()`.
#'
#' @return Varies, depending on the `moleculeType` argument:
#' - `"DNA"`: `DNAStringSet`.
#' - `"RNA"`: `RNAStringSet`.
`import,PipetteFastqFile` <- # nolint
    function(con,
             moleculeType = c("DNA", "RNA"),
             metadata = FALSE,
             quiet = FALSE) {
        assert(
            isFlag(metadata),
            isFlag(quiet)
        )
        moleculeType <- match.arg(moleculeType)
        file <- .resource(con)
        whatPkg <- "Biostrings"
        whatFun <- paste0("read", moleculeType, "StringSet")
        if (isFALSE(quiet)) {
            .alertImport(
                con = con,
                whatPkg = whatPkg,
                whatFun = whatFun
            )
        }
        args <- list(
            "filepath" = file,
            "format" = "fastq",
            "nrec" = -1L,
            "skip" = 0L,
            "seek.first.rec" = TRUE,
            "use.names" = metadata,
            "with.qualities" = metadata
        )
        what <- .getFunction(f = whatFun, pkg = whatPkg)
        object <- do.call(what = what, args = args)
        assert(is(object, paste0(moleculeType, "StringSet")))
        .returnImport(
            object = object,
            con = con,
            metadata = metadata,
            whatPkg = whatPkg,
            whatFun = whatFun,
            quiet = quiet
        )
    }



#' Import a Gene Ontology (GO) annotation file (`.gaf`)
#'
#' @note Updated 2023-12-15.
#' @noRd
#'
#' @seealso
## - https://geneontology.org/docs/go-annotation-file-gaf-format-2.2/
#' - http://current.geneontology.org/products/pages/downloads.html
#' - https://www.ebi.ac.uk/GOA/
#' - `BaseSet::getGAF()`
`import,PipetteGafFile` <- # nolint
    function(con, metadata = FALSE, quiet = FALSE) {
        assert(
            isFlag(metadata),
            isFlag(quiet)
        )
        object <- import(
            con = .resource(con),
            format = "tsv",
            colnames = c(
                "db",
                "dbObjectId",
                "dbObjectSymbol",
                "qualifier",
                "goId",
                "dbReference",
                "evidenceCode",
                "withFrom",
                "aspect",
                "dbObjectName",
                "dbObjectSynonym",
                "dbObjectType",
                "taxon",
                "date",
                "assignedBy",
                "annotationExtension",
                "geneProductFormId"
            ),
            comment = "!",
            metadata = metadata,
            quiet = quiet
        )
        assert(is.data.frame(object))
        object
    }



#' Import a gene cluster text file (`.gct`)
#'
#' @note Updated 2023-12-15.
#' @noRd
#'
#' @seealso
#' - https://igv.org/doc/desktop/
`import,PipetteGctFile` <- # nolint
    function(con,
             metadata = FALSE,
             quiet = FALSE,
             return = c("matrix", "data.frame")) {
        assert(
            isFlag(metadata),
            isFlag(quiet)
        )
        return <- match.arg(return)
        object <- import(
            con = .resource(con),
            format = "tsv",
            engine = "readr",
            skip = 2L,
            metadata = metadata,
            quiet = quiet
        )
        assert(
            is.data.frame(object),
            identical(
                x = colnames(object)[c(1L, 2L)],
                y = c("Name", "Description")
            )
        )
        header <- import(
            con = .resource(con),
            format = "lines",
            nMax = 2L,
            quiet = TRUE
        )
        assert(
            is.character(header),
            hasLength(header, n = 2L)
        )
        header <- strsplit(header, split = "\t", fixed = TRUE)
        dim1 <- as.integer(header[[2L]][c(1L, 2L)])
        dim2 <- c(nrow(object), ncol(object) - 2L)
        assert(
            identical(dim1, dim2),
            msg = "Dimension mismatch."
        )
        switch(
            EXPR = return,
            "data.frame" = {
                out <- object
            },
            "matrix" = {
                out <- as.matrix(object[, seq(from = 3L, to = ncol(object))])
                assert(identical(dim(out), dim2))
                if (isTRUE(metadata)) {
                    attr(out, "import") <- attr(object, "import")
                }
            }
        )
        rownames(out) <- object[["Name"]]
        out
    }



#' Import a gene matrix transposed file (`.gmt`)
#'
#' @note Updated 2023-09-20.
#' @noRd
#'
#' @seealso
#' - `fgsea::gmtPathways()`.
`import,PipetteGmtFile` <- # nolint
    function(con, quiet = FALSE) {
        assert(isFlag(quiet))
        lines <- import(
            con = .resource(con),
            format = "lines",
            metadata = FALSE,
            quiet = quiet
        )
        lines <- strsplit(lines, split = "\t")
        object <- lapply(lines, tail, n = -2L)
        names(object) <- vapply(
            X = lines,
            FUN = head,
            FUN.VALUE = character(1L),
            n = 1L
        )
        object
    }



#' Import a gene matrix file (`.gmx`)
#'
#' @note Updated 2023-09-20.
#' @noRd
`import,PipetteGmxFile` <- # nolint
    function(con, quiet = FALSE) {
        assert(isFlag(quiet))
        lines <- import(
            con = .resource(con),
            format = "lines",
            metadata = FALSE,
            quiet = quiet
        )
        object <- list(tail(lines, n = -2L))
        names(object) <- lines[[1L]]
        object
    }



#' Import a gene set file (`.grp`)
#'
#' @note Updated 2023-07-07.
#' @noRd
`import,PipetteGrpFile` <- # nolint
    `import,PipetteGmxFile`



#' Import a mutation annotation format file (`.maf`)
#'
#' @note Updated 2023-09-20.
#' @noRd
`import,PipetteMafFile` <- # nolint
    function(con, quiet = FALSE) {
        assert(isFlag(quiet))
        file <- .resource(con)
        whatPkg <- "maftools"
        whatFun <- "read.maf"
        if (isFALSE(quiet)) {
            .alertImport(
                con = con,
                whatPkg = whatPkg,
                whatFun = whatFun
            )
        }
        args <- list(
            "maf" = file,
            "verbose" = FALSE
        )
        what <- .getFunction(f = whatFun, pkg = whatPkg)
        object <- do.call(what = what, args = args)
        assert(is(object, "MAF"))
        object
    }



#' Import an open biomedical ontologies file (`.obo`)
#'
#' @note Updated 2023-09-20.
#' @noRd
`import,PipetteOboFile` <- # nolint
    function(con, quiet = FALSE) {
        assert(isFlag(quiet))
        file <- .resource(con)
        whatPkg <- "ontologyIndex"
        whatFun <- "get_ontology"
        if (isFALSE(quiet)) {
            .alertImport(
                con = con,
                whatPkg = whatPkg,
                whatFun = whatFun
            )
        }
        args <- list(
            "file" = file,
            "propagate_relationships" = "is_a",
            "extract_tags" = "everything"
        )
        what <- .getFunction(f = whatFun, pkg = whatPkg)
        object <- do.call(what = what, args = args)
        assert(is(object, "ontology_index"))
        object
    }



#' Import a sequence alignment map file (`.sam`)
#'
#' @note Updated 2023-07-12.
#' @noRd
`import,PipetteSamFile` <- # nolint
    `import,PipetteCramFile`



#' Import a variant call file (`.vcf`)
#'
#' @note Updated 2023-07-12.
#' @noRd
`import,PipetteVcfFile` <- # nolint
    `import,PipetteBcfFile`



## Handoff methods =============================================================

#' Import a file using `rio::import()`
#'
#' @note Updated 2023-09-20.
#' @noRd
`import,PipetteRioFile` <- # nolint
    function(con,
             rownames = TRUE,
             rownameCol = NULL,
             colnames = TRUE,
             makeNames = syntactic::makeNames,
             metadata = FALSE,
             quiet = FALSE,
             ...) {
        assert(
            isFlag(rownames),
            isScalar(rownameCol) || is.null(rownameCol),
            isFlag(colnames) || isCharacter(colnames),
            is.function(makeNames) ||
                is.null(makeNames) ||
                isFALSE(makeNames),
            isFlag(metadata),
            isFlag(quiet)
        )
        file <- .resource(con)
        whatPkg <- "rio"
        whatFun <- "import"
        if (isFALSE(quiet)) {
            .alertImport(
                con = con,
                whatPkg = whatPkg,
                whatFun = whatFun
            )
        }
        args <- list("file" = file, ...)
        what <- .getFunction(f = whatFun, pkg = whatPkg)
        object <- do.call(what = what, args = args)
        .returnImport(
            object = object,
            con = con,
            rownames = rownames,
            rownameCol = rownameCol,
            colnames = colnames,
            makeNames = makeNames,
            metadata = metadata,
            whatPkg = whatPkg,
            whatFun = whatFun,
            quiet = quiet
        )
    }



#' Import file using `rtracklayer::import()`
#'
#' @note Updated 2023-09-20.
#' @noRd
#'
#' @note Using `tryCatch()` here to error if there are any warnings.
`import,PipetteRtracklayerFile` <- # nolint
    function(con,
             metadata = FALSE,
             quiet = FALSE,
             ...) {
        assert(
            isFlag(metadata),
            isFlag(quiet)
        )
        file <- .resource(con)
        whatPkg <- "rtracklayer"
        whatFun <- "import"
        if (isFALSE(quiet)) {
            .alertImport(
                con = con,
                whatPkg = whatPkg,
                whatFun = whatFun
            )
        }
        args <- list("con" = file, ...)
        assert(requireNamespaces(whatPkg))
        what <- methodFunction(
            f = whatFun,
            signature = signature(
                "con" = "character",
                "format" = "missing",
                "text" = "ANY"
            ),
            package = whatPkg
        )
        tryCatch(
            expr = {
                object <- do.call(what = what, args = args)
            },
            warning = function(w) {
                abort(w)
            }
        )
        .returnImport(
            object = object,
            con = con,
            metadata = metadata,
            whatPkg = whatPkg,
            whatFun = whatFun,
            quiet = quiet
        )
    }



## S4 method exports ===========================================================

#' @rdname import
#' @export
setMethod(
    f = "import",
    signature = signature(con = "character"),
    definition = `import,character`
)

#' @rdname import
#' @export
setMethod(
    f = "import",
    signature = signature(con = "textConnection"),
    definition = `import,textConnection`
)

#' @rdname import
#' @export
setMethod(
    f = "import",
    signature = signature(con = "PipetteRdsFile"),
    definition = `import,PipetteRdsFile`
)

#' @rdname import
#' @export
setMethod(
    f = "import",
    signature = signature(con = "PipetteRDataFile"),
    definition = `import,PipetteRDataFile`
)

#' @rdname import
#' @export
setMethod(
    f = "import",
    signature = signature(con = "PipetteDelimFile"),
    definition = `import,PipetteDelimFile`
)

#' @rdname import
#' @export
setMethod(
    f = "import",
    signature = signature(con = "PipetteLinesFile"),
    definition = `import,PipetteLinesFile`
)

#' @rdname import
#' @export
setMethod(
    f = "import",
    signature = signature(con = "PipetteExcelFile"),
    definition = `import,PipetteExcelFile`
)

#' @rdname import
#' @export
setMethod(
    f = "import",
    signature = signature(con = "PipetteBamFile"),
    definition = `import,PipetteBamFile`
)

#' @rdname import
#' @export
setMethod(
    f = "import",
    signature = signature(con = "PipetteBcfFile"),
    definition = `import,PipetteBcfFile`
)

#' @rdname import
#' @export
setMethod(
    f = "import",
    signature = signature(con = "PipetteCramFile"),
    definition = `import,PipetteCramFile`
)

#' @rdname import
#' @export
setMethod(
    f = "import",
    signature = signature(con = "PipetteFastaFile"),
    definition = `import,PipetteFastaFile`
)

#' @rdname import
#' @export
setMethod(
    f = "import",
    signature = signature(con = "PipetteFastqFile"),
    definition = `import,PipetteFastqFile`
)

#' @rdname import
#' @export
setMethod(
    f = "import",
    signature = signature(con = "PipetteGafFile"),
    definition = `import,PipetteGafFile`
)

#' @rdname import
#' @export
setMethod(
    f = "import",
    signature = signature(con = "PipetteGctFile"),
    definition = `import,PipetteGctFile`
)

#' @rdname import
#' @export
setMethod(
    f = "import",
    signature = signature(con = "PipetteGmtFile"),
    definition = `import,PipetteGmtFile`
)

#' @rdname import
#' @export
setMethod(
    f = "import",
    signature = signature(con = "PipetteGmxFile"),
    definition = `import,PipetteGmxFile`
)

#' @rdname import
#' @export
setMethod(
    f = "import",
    signature = signature(con = "PipetteGrpFile"),
    definition = `import,PipetteGrpFile`
)

#' @rdname import
#' @export
setMethod(
    f = "import",
    signature = signature(con = "PipetteJsonFile"),
    definition = `import,PipetteJsonFile`
)

#' @rdname import
#' @export
setMethod(
    f = "import",
    signature = signature(con = "PipetteMafFile"),
    definition = `import,PipetteMafFile`
)

#' @rdname import
#' @export
setMethod(
    f = "import",
    signature = signature(con = "PipetteMtxFile"),
    definition = `import,PipetteMtxFile`
)

#' @rdname import
#' @export
setMethod(
    f = "import",
    signature = signature(con = "PipetteOboFile"),
    definition = `import,PipetteOboFile`
)

#' @rdname import
#' @export
setMethod(
    f = "import",
    signature = signature(con = "PipettePzfxFile"),
    definition = `import,PipettePzfxFile`
)

#' @rdname import
#' @export
setMethod(
    f = "import",
    signature = signature(con = "PipetteSamFile"),
    definition = `import,PipetteSamFile`
)

#' @rdname import
#' @export
setMethod(
    f = "import",
    signature = signature(con = "PipetteVcfFile"),
    definition = `import,PipetteVcfFile`
)

#' @rdname import
#' @export
setMethod(
    f = "import",
    signature = signature(con = "PipetteYamlFile"),
    definition = `import,PipetteYamlFile`
)

#' @rdname import
#' @export
setMethod(
    f = "import",
    signature = signature(con = "PipetteBcbioCountsFile"),
    definition = `import,PipetteBcbioCountsFile`
)

#' @rdname import
#' @export
setMethod(
    f = "import",
    signature = signature(con = "PipetteRioFile"),
    definition = `import,PipetteRioFile`
)

#' @rdname import
#' @export
setMethod(
    f = "import",
    signature = signature(con = "PipetteRtracklayerFile"),
    definition = `import,PipetteRtracklayerFile`
)
acidgenomics/pipette documentation built on Jan. 31, 2024, 10:32 p.m.