import: Import data into R
In acidgenomics/pipette: Pipette Biological Data In and Out of R

import

R Documentation

Import data into R

Description

Import data into R

Usage

import(con, ...)

## S4 method for signature 'character'
import(con, format = NULL, ...)

## S4 method for signature 'textConnection'
import(
  con,
  format = c("csv", "tsv", "json", "yaml"),
  colnames = TRUE,
  quote = "\"",
  naStrings = pipette::naStrings,
  quiet = FALSE
)

## S4 method for signature 'PipetteRdsFile'
import(con, quiet = FALSE)

## S4 method for signature 'PipetteRDataFile'
import(con, quiet = FALSE)

## S4 method for signature 'PipetteDelimFile'
import(
  con,
  rownames = TRUE,
  rownameCol = NULL,
  colnames = TRUE,
  quote = "\"",
  naStrings = pipette::naStrings,
  comment = "",
  skip = 0L,
  nMax = Inf,
  engine = c("base", "data.table", "readr"),
  makeNames = syntactic::makeNames,
  metadata = FALSE,
  quiet = FALSE
)

## S4 method for signature 'PipetteLinesFile'
import(
  con,
  comment = "",
  skip = 0L,
  nMax = Inf,
  stripWhitespace = FALSE,
  removeBlank = FALSE,
  metadata = FALSE,
  engine = c("base", "data.table", "readr"),
  quiet = FALSE
)

## S4 method for signature 'PipetteExcelFile'
import(
  con,
  sheet = 1L,
  rownames = TRUE,
  rownameCol = NULL,
  colnames = TRUE,
  skip = 0L,
  nMax = Inf,
  naStrings = pipette::naStrings,
  makeNames = syntactic::makeNames,
  metadata = FALSE,
  quiet = FALSE
)

## S4 method for signature 'PipetteBamFile'
import(con, quiet = FALSE)

## S4 method for signature 'PipetteBcfFile'
import(con, quiet = FALSE)

## S4 method for signature 'PipetteCramFile'
import(con, quiet = FALSE)

## S4 method for signature 'PipetteFastaFile'
import(
  con,
  moleculeType = c("DNA", "RNA", "AA"),
  metadata = FALSE,
  quiet = FALSE
)

## S4 method for signature 'PipetteFastqFile'
import(con, moleculeType = c("DNA", "RNA"), metadata = FALSE, quiet = FALSE)

## S4 method for signature 'PipetteGafFile'
import(con, metadata = FALSE, quiet = FALSE)

## S4 method for signature 'PipetteGctFile'
import(
  con,
  metadata = FALSE,
  quiet = FALSE,
  return = c("matrix", "data.frame")
)

## S4 method for signature 'PipetteGmtFile'
import(con, quiet = FALSE)

## S4 method for signature 'PipetteGmxFile'
import(con, quiet = FALSE)

## S4 method for signature 'PipetteGrpFile'
import(con, quiet = FALSE)

## S4 method for signature 'PipetteJsonFile'
import(con, metadata = FALSE, quiet = FALSE)

## S4 method for signature 'PipetteMafFile'
import(con, quiet = FALSE)

## S4 method for signature 'PipetteMtxFile'
import(con, rownamesFile, colnamesFile, metadata = FALSE, quiet = FALSE)

## S4 method for signature 'PipetteOboFile'
import(con, quiet = FALSE)

## S4 method for signature 'PipettePzfxFile'
import(
  con,
  sheet = 1L,
  makeNames = syntactic::makeNames,
  metadata = FALSE,
  quiet = FALSE
)

## S4 method for signature 'PipetteSamFile'
import(con, quiet = FALSE)

## S4 method for signature 'PipetteVcfFile'
import(con, quiet = FALSE)

## S4 method for signature 'PipetteYamlFile'
import(con, metadata = FALSE, quiet = FALSE)

## S4 method for signature 'PipetteBcbioCountsFile'
import(con, metadata = FALSE, quiet = FALSE)

## S4 method for signature 'PipetteRioFile'
import(
  con,
  rownames = TRUE,
  rownameCol = NULL,
  colnames = TRUE,
  makeNames = syntactic::makeNames,
  metadata = FALSE,
  quiet = FALSE,
  ...
)

## S4 method for signature 'PipetteRtracklayerFile'
import(con, metadata = FALSE, quiet = FALSE, ...)

Arguments

`con`	`character(1)` or `connection`. Data connection. Most commonly, use `character(1)` to represent a file path or URL. Less commonly, can create a `textConnection` to a character vector of source code lines (text), which is useful for reformatting malformed files directly in R.
`format`	`character(1)` or `NULL`. An optional file format type, which can be used to override the file format inferred from `con`. Only recommended for file and URL paths that don't contain an extension.
`...`	Additional arguments.
`colnames`	`logical(1)` or `character`. Automatically assign column names, using the first header row. Applies to file types that return `data.frame` only. Pass in a `character` vector to define the column names manually.
`quote`	`character(1)`. The set of quoting characters. To disable quoting altogether, use `quote = ""` (not generally recommended). Applies to plain text delimited files only.
`naStrings`	`character`. Character strings to reformat as `NA`. Refer to `pipette::naStrings` for defaults.
`quiet`	`logical(1)`. Perform command quietly, suppressing messages.
`rownames`	`logical(1)`. Automatically assign row names, if `rowname` column is defined. Applies to file types that return a data frame only.
`rownameCol`	`NULL`, `character(1)`, or `integer(1)`. Applies only when `rownames = TRUE`. Column name to use for row names assignment. If left `NULL` (default), the function will call `matchRownameCol()` internally to attempt to automatically match the row name column (e.g. `"rowname"` or `"rn"`). Otherwise, can manually define using a scalar argument, either the name directly or position in the column names.
`comment`	`character(1)`. Comment character to detect at the beginning of a line. Lines starting with this character are skipped when parsing the file. Use `""` to disable interpretation of comments, which is particularly useful when parsing lines. Applies to plain text delimited and source code lines only.
`skip`	`integer(1)`. Number of lines to skip. Applies to delimited file (CSV, TSV), Excel Workbook, or lines.
`nMax`	`integer(1)` or `Inf`. Maximum number of lines to parse. Applies to plain text delimited, Excel, and source code lines only.
`engine`	`character(1)`. Engine (package) to use for import. Currently supported: `"base"`, `"data.table"`, or `"readr"`.
`makeNames`	`function`. Apply syntactic naming function to (column) names. Function is never applied to row names, when they are defined in object.
`metadata`	`logical(1)`. Slot useful metadata about the import into the object.
`stripWhitespace`	`logical(1)`. Strip leading and/or trailing whitespace. Applies to source code lines.
`removeBlank`	`logical(1)`. Remove blank lines. Applies to source code lines.
`sheet`	`character(1)` or `integer(1)`. Sheet to read. Either a string (the name of a sheet), or an integer (the position of the sheet). Defaults to the first sheet. Applies to Excel Workbook, Google Sheet, or GraphPad Prism file.
`moleculeType`	`character(1)`. Molecule type, either DNA or RNA (the FASTA method also accepts `"AA"`). Most RNA-seq FASTQ files contain complementary DNA (cDNA) sequences, not direct sequencing of the RNA molecules.
`return`	`character(1)`. Object class to return.
`rownamesFile`, `colnamesFile`	`character(1)` or `NULL`. Row names and/or column names sidecar file. Applies primarily to MatrixMarket Exchange files (e.g. `MTXFile`).
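
As a rough illustration of the `textConnection` use case and of what the parsing arguments control, here is a minimal sketch in base R. It does not call pipette and is not its implementation. The sample lines are made up, and the mapping of `colnames`, `quote`, `naStrings`, `comment`, `skip`, and `nMax` onto `utils::read.csv()` arguments is an assumption made for illustration only.

```r
## Made-up CSV content held in a character vector rather than a file.
lines <- c(
    "# comment line written by a hypothetical export tool",
    "rowname,genotype,count",
    "sample1,wildtype,10",
    "sample2,knockout,NA",
    "sample3,,30",
    "sample4,wildtype,40"
)

## A text connection lets a parser read from the character vector.
con <- textConnection(lines)
df <- utils::read.csv(
    file = con,
    header = TRUE,             # roughly `colnames = TRUE`
    quote = "\"",              # roughly `quote`
    na.strings = c("", "NA"),  # roughly `naStrings`
    comment.char = "#",        # roughly `comment`
    skip = 0L,                 # roughly `skip`
    nrows = 3L,                # roughly `nMax`
    stringsAsFactors = FALSE
)
close(con)

## Three data rows are kept; the empty genotype and the "NA" count
## are both read as missing values.
print(df)
```

With pipette installed, the Usage section suggests the corresponding call would be `import(con = textConnection(lines), format = "csv")`.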

Details

import() supports automatic loading of common file types by wrapping popular importer functions. It is intentionally designed to be simple, with few arguments. Remote URLs and compressed files are supported. If you need more complex import settings, call the wrapped importer directly instead.
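
As a hedged example of calling an importer directly, the sketch below reads a semicolon-delimited file that uses a decimal comma, which are settings that the `import()` signatures shown in Usage do not expose. It assumes `utils::read.table()` as the importer of interest. The file contents are made up.

```r
## Write a small semicolon-delimited file with decimal commas.
file <- tempfile(fileext = ".txt")
writeLines(
    text = c("gene;score", "TP53;1,5", "BRCA1;2,25"),
    con = file
)

## Call the importer directly to control separator and decimal mark.
df <- utils::read.table(
    file = file,
    header = TRUE,
    sep = ";",
    dec = ",",
    stringsAsFactors = FALSE
)
unlink(file)

print(df)
```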

Value

Varies, depending on the file type (format):

R data serialized (RDS): variable.
  Currently recommended over RDA, if possible.
  Imported by readRDS().
R data (RDA, RDATA): variable.
  Must contain a single object. Doesn't require internal object name to match, unlike loadData().
  Imported by load().
Plain text delimited (CSV, TSV, TXT): data.frame.
  Data separated by commas, tabs, or visual spaces.
  Note that TXT structure is ambiguous and actively discouraged.
  Refer to Data frame return section for details on how to change the default return type to DFrame, tbl_df or data.table.
  Imported by readr::read_delim() by default.
Excel workbook (XLSB, XLSX): data.frame.
  Resave in plain text delimited format instead, if possible.
  Imported by readxl::read_excel().
Legacy Excel workbook (pre-2007) (XLS): data.frame.
  Resave in plain text delimited format instead, if possible.
  Note that import of files in this format is slow.
  Imported by readxl::read_excel().
GraphPad Prism project (PZFX): data.frame.
  Experimental. Consider resaving in CSV format instead.
  Imported by pzfx::read_pzfx().
General feature format (GFF, GFF1, GFF2, GFF3, GTF): GRanges.
  Imported by rtracklayer::import().
Gene Ontology (GO) annotation file (GAF): data.frame with 17 columns.
  Imported by utils::read.table().
MatrixMarket exchange sparse matrix (MTX): sparseMatrix.
  Imported by Matrix::readMM().
Sequence alignment/map format (SAM, BAM, CRAM): list.
  Imported by Rsamtools::scanBam().
Mutation annotation format (MAF): MAF.
  Imported by maftools::read.maf().
Variant annotation format (VCF, BCF): list.
  Imported by Rsamtools::scanBcf().
Gene cluster text (GCT): matrix or data.frame.
  Imported by readr::read_delim().
Gene sets (for GSEA) (GMT, GMX): character.
Browser extensible data (BED, BED15, BEDGRAPH, BEDPE): GRanges.
  Imported by rtracklayer::import().
ChIP-seq peaks (BROADPEAK, NARROWPEAK): GRanges.
  Imported by rtracklayer::import().
Wiggle track format (BIGWIG, BW, WIG): GRanges.
  Imported by rtracklayer::import().
JSON serialization data (JSON): list.
  Imported by jsonlite::read_json().
YAML serialization data (YAML, YML): list.
  Imported by yaml::yaml.load_file().
Lines (LOG, MD, PY, R, RMD, SH): character.
  Source code or log files.
  Imported by readr::read_delim() by default.
Infrequently used rio-compatible formats (ARFF, DBF, DIF, DTA, MAT, MTP, ODS, POR, SAS7BDAT, SAV, SYD, REC, XPT): variable.
  Imported by rio::import().
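
To make the RDS versus RDA distinction concrete, here is a small base R sketch. The `importRda()` helper is hypothetical and is not part of pipette. It only illustrates one way to return the single object stored in an RDA file regardless of its internal name, which is the behavior described above.

```r
object <- data.frame(id = 1:3, value = c("a", "b", "c"))

## RDS stores exactly one object, without its name.
rdsFile <- tempfile(fileext = ".rds")
saveRDS(object, file = rdsFile)
fromRds <- readRDS(rdsFile)

## RDA stores named objects; load() restores them under those names.
rdaFile <- tempfile(fileext = ".rda")
save(object, file = rdaFile)

## Hypothetical helper: load into a private environment, require a
## single object, and return it whatever it was called when saved.
importRda <- function(file) {
    envir <- new.env(parent = emptyenv())
    objectNames <- load(file = file, envir = envir)
    stopifnot(length(objectNames) == 1L)
    get(objectNames, envir = envir, inherits = FALSE)
}
fromRda <- importRda(rdaFile)

unlink(c(rdsFile, rdaFile))
```

Loading into a private environment avoids overwriting objects in the calling environment, which is one reason RDS is often the simpler choice.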

Row and column names

Row names. Row name handling has become an inconsistent mess in R because of differential support in base R, tidyverse, data.table, and Bioconductor. To maintain sanity, import() attempts to handle row names automatically. The function checks for a rowname column in delimited data, and moves these values into the object's row names, if supported by the return type (e.g. data.frame, DFrame). Note that tbl_df (tibble) and data.table intentionally do not support row names. When returning either of these classes, no attempt is made to assign the rowname column into the return object's row names. Note that import() is strict about this matching and only checks for a rowname column, similar to the default syntax recommended in tibble::rownames_to_column(). To disable this behavior, set rownames = FALSE, and no attempt will be made to set the row names.

Column names. import() assumes that delimited files always contain column names. If you are working with a file that doesn't contain column names, either set colnames = FALSE or pass the names in as a character vector. It's strongly recommended to always define column names in a supported file type.
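
The row and column name behavior described above can be sketched in base R as follows. This is an illustration under assumptions, not pipette's code: `moveRownameCol()` is a hypothetical helper, and the input lines are made up.

```r
lines <- c("rowname,x,y", "gene1,1,2", "gene2,3,4")

## Read with a header row, keeping "rowname" as an ordinary column.
con <- textConnection(lines)
df <- utils::read.csv(file = con, stringsAsFactors = FALSE)
close(con)

## Hypothetical helper: move a "rowname" column into the row names,
## and leave the data frame untouched when no such column exists.
moveRownameCol <- function(df, col = "rowname") {
    if (!col %in% colnames(df)) {
        return(df)
    }
    rownames(df) <- df[[col]]
    df[[col]] <- NULL
    df
}
withRownames <- moveRownameCol(df)

## File without a header row: supply the column names manually.
con <- textConnection(lines[-1L])
noHeader <- utils::read.csv(
    file = con,
    header = FALSE,
    col.names = c("id", "x", "y"),
    stringsAsFactors = FALSE
)
close(con)

print(withRownames)
print(noHeader)
```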

FASTA and FASTQ files

FASTA and FASTQ files are currently managed internally by the Biostrings package. Refer to readDNAStringSet and readRNAStringSet for details. Import of these files will return DNAStringSet or RNAStringSet depending on the input, as defined by the moleculeType argument.

General feature format (GFF, GTF)

The GFF (General Feature Format) consists of one line per feature, each containing 9 columns of data, plus optional track definition lines. The GTF (General Transfer Format) is identical to GFF version 2.
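
To show the 9-column layout, the following base R sketch parses a few made-up GFF-style lines into a data frame. It is only an illustration of the file structure: pipette delegates to rtracklayer and returns GRanges, and the column names used here follow the common GFF version 2 convention rather than anything defined on this page.

```r
lines <- c(
    "##gff-version 2",
    "track name=example",
    "chr1\tsketch\texon\t100\t200\t.\t+\t.\tgene_id \"g1\";",
    "chr1\tsketch\texon\t300\t400\t.\t-\t.\tgene_id \"g2\";"
)

## Drop optional track definition lines. Lines starting with "#" are
## skipped by the parser via `comment.char`.
features <- lines[!grepl("^track", lines)]

con <- textConnection(features)
gff <- utils::read.delim(
    file = con,
    header = FALSE,
    comment.char = "#",
    quote = "",  # keep the quotes inside the attribute column
    col.names = c(
        "seqname", "source", "feature", "start", "end",
        "score", "strand", "frame", "attribute"
    ),
    stringsAsFactors = FALSE
)
close(con)

print(gff)
```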

Gene cluster text format (GCT)

Refer to the IGV website for details.

GSEA gene set files

Refer to the Broad Institute GSEA wiki for details.

Matrix Market Exchange

Reading a Matrix Market Exchange file requires ROWNAMES and COLNAMES sidecar files containing the corresponding row and column names of the sparse matrix.
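
A minimal sketch of the sidecar idea, using the Matrix package directly rather than pipette: the Matrix Market file stores only the sparse values, so row and column names have to be read from separate files and attached afterwards. The file names and the `.rownames` / `.colnames` suffixes below are assumptions made for illustration.

```r
library(Matrix)

dir <- tempfile()
dir.create(dir)
mtxFile <- file.path(dir, "counts.mtx")
rownamesFile <- paste0(mtxFile, ".rownames")  # assumed naming
colnamesFile <- paste0(mtxFile, ".colnames")  # assumed naming

## Write a small sparse matrix plus its two sidecar files.
counts <- Matrix::sparseMatrix(
    i = c(1L, 3L, 2L),
    j = c(1L, 1L, 2L),
    x = c(5, 2, 7),
    dims = c(3L, 2L)
)
Matrix::writeMM(counts, file = mtxFile)
writeLines(c("gene1", "gene2", "gene3"), con = rownamesFile)
writeLines(c("sample1", "sample2"), con = colnamesFile)

## Read the values, then attach the names from the sidecar files.
mat <- Matrix::readMM(mtxFile)
dimnames(mat) <- list(readLines(rownamesFile), readLines(colnamesFile))

unlink(dir, recursive = TRUE)
print(mat)
```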

bcbio-nextgen count matrix

A bcbio count matrix (e.g. generated from featureCounts) and its related sidecar files are natively supported.

COUNTS: Counts table (e.g. RNA-seq aligned counts).
COLNAMES: Sidecar file containing column names.
ROWNAMES: Sidecar file containing row names.

Denylisted extensions

These file formats are intentionally not supported: DOC, DOCX, PDF, PPT, PPTX.

Duplicate methods

GMTFile and OBOFile are also supported by the BiocSet package.

Note

Updated 2023-12-15.

Examples

con <- system.file("extdata", "example.csv", package = "pipette")

## Row and column names enabled.
x <- import(con = con)
print(head(x))

## Row and column names disabled.
x <- import(con = con, rownames = FALSE, colnames = FALSE)
print(head(x))

acidgenomics/pipette documentation built on June 9, 2025, 1:56 p.m.
