import: Import


Description

Read file by extension into R.

Usage

import(file, ...)

## S4 method for signature 'character'
import(file, format = "auto", ...)

## S4 method for signature 'DelimFile'
import(
  file,
  rownames = TRUE,
  rownameCol = NULL,
  colnames = TRUE,
  comment = "",
  skip = 0L,
  nMax = Inf,
  makeNames = getOption("acid.import.make.names", default = syntactic::makeNames),
  engine = getOption(x = "acid.import.engine", default = "data.table"),
  metadata = getOption("acid.import.metadata", default = FALSE),
  quiet = getOption("acid.quiet", default = FALSE)
)

## S4 method for signature 'LinesFile'
import(
  file,
  comment = "",
  skip = 0L,
  nMax = Inf,
  stripWhitespace = FALSE,
  removeBlank = FALSE,
  metadata = getOption("acid.import.metadata", default = FALSE),
  engine = getOption(x = "acid.import.engine", default = "base"),
  quiet = getOption("acid.quiet", default = FALSE)
)

## S4 method for signature 'ExcelFile'
import(
  file,
  sheet = 1L,
  rownames = TRUE,
  rownameCol = NULL,
  colnames = TRUE,
  skip = 0L,
  nMax = Inf,
  makeNames = getOption("acid.import.make.names", default = syntactic::makeNames),
  metadata = getOption("acid.import.metadata", default = FALSE),
  quiet = getOption("acid.quiet", default = FALSE)
)

## S4 method for signature 'FASTAFile'
import(
  file,
  moleculeType = c("DNA", "RNA"),
  metadata = getOption("acid.import.metadata", default = FALSE),
  quiet = getOption("acid.quiet", default = FALSE)
)

## S4 method for signature 'FASTQFile'
import(
  file,
  moleculeType = c("DNA", "RNA"),
  metadata = getOption("acid.import.metadata", default = FALSE),
  quiet = getOption("acid.quiet", default = FALSE)
)

## S4 method for signature 'GMTFile'
import(file, quiet = getOption("acid.quiet", default = FALSE))

## S4 method for signature 'GMXFile'
import(file, quiet = getOption("acid.quiet", default = FALSE))

## S4 method for signature 'GRPFile'
import(file, quiet = getOption("acid.quiet", default = FALSE))

## S4 method for signature 'JSONFile'
import(
  file,
  metadata = getOption("acid.import.metadata", default = FALSE),
  quiet = getOption("acid.quiet", default = FALSE)
)

## S4 method for signature 'MTXFile'
import(
  file,
  rownamesFile = paste0(file, ".rownames"),
  colnamesFile = paste0(file, ".colnames"),
  metadata = getOption("acid.import.metadata", default = FALSE),
  quiet = getOption("acid.quiet", default = FALSE)
)

## S4 method for signature 'PZFXFile'
import(
  file,
  sheet = 1L,
  makeNames = getOption("acid.import.make.names", default = syntactic::makeNames),
  metadata = getOption("acid.import.metadata", default = FALSE),
  quiet = getOption("acid.quiet", default = FALSE)
)

## S4 method for signature 'YAMLFile'
import(
  file,
  metadata = getOption("acid.import.metadata", default = FALSE),
  quiet = getOption("acid.quiet", default = FALSE)
)

## S4 method for signature 'RDSFile'
import(file, quiet = getOption("acid.quiet", default = FALSE))

## S4 method for signature 'RDataFile'
import(file, quiet = getOption("acid.quiet", default = FALSE))

## S4 method for signature 'RioFile'
import(
  file,
  rownames = TRUE,
  rownameCol = NULL,
  colnames = TRUE,
  makeNames = getOption("acid.import.make.names", default = syntactic::makeNames),
  metadata = getOption("acid.import.metadata", default = FALSE),
  quiet = getOption("acid.quiet", default = FALSE),
  ...
)

## S4 method for signature 'RtracklayerFile'
import(
  file,
  metadata = getOption("acid.import.metadata", default = FALSE),
  quiet = getOption("acid.quiet", default = FALSE),
  ...
)

## S4 method for signature 'BcbioCountsFile'
import(
  file,
  metadata = getOption("acid.import.metadata", default = FALSE),
  quiet = getOption("acid.quiet", default = FALSE)
)

Arguments

file

character(1). File path.

...

Additional arguments.

format

character(1). An optional file format type, which can be used to override the file format inferred from file. Only recommended for file and URL paths that don't contain an extension.

rownames

logical(1). Automatically assign row names, if rowname column is defined. Applies to file types that return a data frame only.

rownameCol

NULL, character(1), or integer(1). Applies only when rownames = TRUE. Column name to use for row name assignment. If left NULL (default), the function calls matchRownameCol() internally to attempt to match the row name column automatically (e.g. "rowname" or "rn"). Otherwise, define the column manually with a scalar argument, either the column name or its position in the column names.

colnames

logical(1) or character. Automatically assign column names, using the first header row. Applies to file types that return data.frame only. Pass in a character vector to define the column names manually.

comment

character(1). Comment character to detect at the beginning of a line; matching lines are skipped when parsing the file. Use "" to disable interpretation of comments, which is particularly useful when parsing lines. Applies to plain text delimited and source code lines only.

skip

integer(1). Number of lines to skip. Applies to delimited files (CSV, TSV), Excel Workbooks, or lines.

nMax

integer(1) or Inf. Maximum number of lines to parse. Applies to plain text delimited, Excel, and source code lines only.

makeNames

function. Apply syntactic naming function to (column) names. Function is never applied to row names, when they are defined in object.

engine

character(1). Engine (package) to use for import. Currently supported:

  • base

  • data.table

  • readr

  • vroom

metadata

list. Metadata.

quiet

logical(1). Perform command quietly, suppressing messages.

stripWhitespace

logical(1). Strip leading and/or trailing whitespace. Applies to source code lines.

removeBlank

logical(1). Remove blank lines. Applies to source code lines.

sheet

character(1) or integer(1). Sheet to read. Either a string (the name of a sheet), or an integer (the position of the sheet). Defaults to the first sheet. Applies to Excel Workbook, Google Sheet, or GraphPad Prism file.

moleculeType

character(1). Molecule type, either DNA or RNA. Most RNA-seq FASTQ files contain complementary DNA (cDNA) sequences, not direct sequencing of the RNA molecules.

rownamesFile, colnamesFile

character(1) or NULL. Row names and/or column names sidecar file. Applies primarily to Matrix Market Exchange files (e.g. MTXFile).

Details

import() supports automatic loading of common file types by wrapping popular importer functions. It is intentionally designed to be simple, with few arguments. Remote URLs and compressed files are supported. If you need more complex import settings, call the wrapped importer directly instead.
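As an illustration of calling an importer directly, here is a minimal sketch for a file whose separator and decimal mark are not exposed in the import() signatures listed under Usage. It assumes base R's utils::read.table() as the importer; which function import() actually wraps for a given engine is not stated here, so treat that choice as an assumption.

```r
## Sketch only: read a semicolon-delimited file with decimal commas by
## calling a base R importer directly (assumed importer, see note above).
file <- tempfile(fileext = ".txt")
writeLines(
    text = c(
        "# exported from an instrument",
        "id;value",
        "a;1,5",
        "b;2,5"
    ),
    con = file
)
df <- utils::read.table(
    file = file,
    header = TRUE,
    sep = ";",
    dec = ",",
    comment.char = "#",
    stringsAsFactors = FALSE
)
print(df)
```

The result is a plain data.frame with a character id column and a numeric value column.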

Value

Varies, depending on the file type (format):

Row and column names

Row names. Row name handling has become an inconsistent mess in R because of differential support in base R, tidyverse, data.table, and Bioconductor. To maintain sanity, import() attempts to handle row names automatically. The function checks for a rowname column in delimited data, and moves these values into the object's row names, if supported by the return type (e.g. data.frame, DataFrame). Note that tbl_df (tibble) and data.table intentionally do not support row names. When returning in this format, no attempt to assign the rowname column into the return object's row names is made. Note that import() is strict about this matching and only checks for a rowname column, similar to the default syntax recommended in tibble::rownames_to_column(). To disable this behavior, set rownames = FALSE, and no attempt will be made to set the row names.

Column names. import() assumes that delimited files always contain column names. If you are working with a file that doesn't contain column names, either set colnames = FALSE or pass the names in as a character vector. It's strongly recommended to always define column names in a supported file type.
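The row name behavior described above can be sketched in base R. This is an illustration only, not pipette's internal implementation: moveRownameCol() is a hypothetical helper written for this example, and the "rowname" column name follows the convention mentioned above.

```r
## Sketch only: move a "rowname" column into the row names of a data.frame.
csv <- tempfile(fileext = ".csv")
writeLines(
    text = c(
        "rowname,sample1,sample2",
        "gene1,1,2",
        "gene2,3,4"
    ),
    con = csv
)
df <- utils::read.csv(csv, stringsAsFactors = FALSE)

## Hypothetical helper (not part of pipette).
moveRownameCol <- function(df, col = "rowname") {
    ## Strict matching: leave the object untouched if the column is absent.
    if (!col %in% colnames(df)) {
        return(df)
    }
    rownames(df) <- df[[col]]
    df[[col]] <- NULL
    df
}

out <- moveRownameCol(df)
print(out)
```

With import() itself, the equivalent calls would be import(file) to assign row names and import(file, rownames = FALSE) to keep the rowname column as regular data.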

Data frame return

By default, import() returns a standard data.frame for delimited/column-formatted data. The desired return format can instead be set globally, e.g. options(acid.data.frame = "data.frame").

Supported return types:

Note that stringsAsFactors is always disabled for import.

FASTA and FASTQ files (FASTAFile, FASTQFile)

FASTA and FASTQ files are currently managed internally by the Biostrings package. Refer to readDNAStringSet and readRNAStringSet for details. Import of these files returns a DNAStringSet or RNAStringSet, as defined by the moleculeType argument.

General feature format (GFF, GTF; RtracklayerFile)

GFF (General Feature Format) consists of one line per feature, each containing 9 columns of data, plus optional track definition lines. GTF (General Transfer Format) is identical to GFF version 2.

basejump exports the specialized makeGRangesFromGFF() function that makes GFF loading simple.

See also:

GSEA gene set files (GMTFile, GMXFile, GRPFile)

Refer to the Broad Institute GSEA wiki for details.

Matrix Market Exchange (MTXFile)

Reading a Matrix Market Exchange file requires ROWNAMES and COLNAMES sidecar files containing the corresponding row and column names of the sparse matrix.
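The sidecar layout can be sketched with the Matrix package. This is a rough illustration under stated assumptions, not pipette's implementation: the ".rownames" and ".colnames" suffixes follow the rownamesFile and colnamesFile defaults shown under Usage, while the file name and matrix values are made up for the example.

```r
## Sketch only: write a sparse matrix with name sidecar files, then read it
## back and reattach the names manually.
library(Matrix)

mat <- Matrix::sparseMatrix(
    i = c(1L, 2L, 3L),
    j = c(1L, 2L, 2L),
    x = c(5, 7, 9),
    dims = c(3L, 2L)
)
file <- file.path(tempdir(), "counts.mtx")
Matrix::writeMM(mat, file = file)
writeLines(c("gene1", "gene2", "gene3"), con = paste0(file, ".rownames"))
writeLines(c("cell1", "cell2"), con = paste0(file, ".colnames"))

## The MTX file itself stores no names; they come from the sidecar files.
sparse <- Matrix::readMM(file)
rownames(sparse) <- readLines(paste0(file, ".rownames"))
colnames(sparse) <- readLines(paste0(file, ".colnames"))
print(sparse)
```

Given the defaults in the MTXFile method signature, import(file) on the same path would look for these two sidecar files next to the matrix.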

bcbio-nextgen count matrix (BcbioCountsFile)

bcbio count matrix (e.g. generated from featureCounts) and related sidecar files are natively supported.

Denylisted extensions

These file formats are intentionally not supported: DOC, DOCX, PDF, PPT, PPTX.

Note

Updated 2021-09-22.

See Also

Packages:

Import functions:

Examples

file <- system.file("extdata/example.csv", package = "pipette")

## Row and column names enabled.
x <- import(file)
print(head(x))

## Row and column names disabled.
x <- import(file, rownames = FALSE, colnames = FALSE)
print(head(x))

acidgenomics/pipette documentation built on Sept. 27, 2021, 9:10 a.m.