reader-package: Suite of Functions to Flexibly Read Data from Files


A set of functions to simplify reading data from files. The main function, reader(), should read most common R datafile types without needing any parameters except the filename. Other functions provide simple ways of handling file paths and extensions, and automatically detecting file format and structure.

Package:	reader
Type:	Package
Version:	1.0.6
Date:	2016-12-29
License:	GPL (>= 2)

The reader() function, for which the package is named, should be able to read most of the common types of datafiles used in R without needing any arguments other than the filename. The structure, header, file format and delimiter are determined automatically. Other functions provide similar flexibility, running contingent on data type and file format, or can look for an input file in multiple directory locations. The function cat.path() provides a simple interface to construct file paths using directories, suffixes, prefixes and file extensions. Functions in this package can be nested inside new functions, providing a flexible parameter format without having to use multiple if-statements to cope with contingencies. Supported types include delimited text files, R binary files, big.matrix files, text list files, and unstructured text. Note that the file type that reader() attempts to read is initially determined by the file extension, using the function classify.ext().
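To make the idea of extension-driven reading concrete, here is a minimal sketch in base R only. The helper read_by_ext() is hypothetical and is not part of the reader package; the extension-to-reader mapping is an assumption chosen for illustration, and the package's own classify.ext() and reader() may behave differently.

```r
# Hypothetical helper (not from the reader package): choose a base R
# reader from the file extension alone.
read_by_ext <- function(path) {
  ext <- tolower(tools::file_ext(path))
  switch(ext,
    csv = read.csv(path, stringsAsFactors = FALSE),
    txt = ,                       # falls through to the 'tab' branch
    tab = read.delim(path, stringsAsFactors = FALSE),
    rda = ,                       # falls through to the 'rdata' branch
    rdata = {
      env <- new.env()
      obj_names <- load(path, envir = env)  # load() returns object names
      get(obj_names[1], envir = env)        # return the first saved object
    },
    readLines(path)               # fallback: treat as unstructured text
  )
}

# Write the same small data frame in two formats, then read both back
# with one call signature.
df <- data.frame(ID = c("ID101", "ID102"), scores = c(42, 77))
tmp_csv <- tempfile(fileext = ".csv")
tmp_rda <- tempfile(fileext = ".rda")
write.csv(df, tmp_csv, row.names = FALSE)
save(df, file = tmp_rda)

from_csv <- read_by_ext(tmp_csv)
from_rda <- read_by_ext(tmp_rda)
```

A single switch() on the extension keeps the calling code free of nested if-statements, which is the design goal the paragraph above describes.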

List of key functions:

cat.path Simple and foolproof way to create full-path file names.
classify.ext Classify file types readable by standard R I/O functions.
column.salvage Rename a column that appears under a variant name to the desired name.
file.ncol Find the number of columns in a file.
file.nrow Find the number of rows (lines) in a file.
find.id.col Find which column in a dataframe contains a specified set of values.
shift.rownames Shift the first column of a dataframe to rownames().
force.frame Return a dataframe if 'unknown.data' can in any way relate to such.
force.vec Return a vector if 'unknown.data' can in any way relate to such.
get.delim Determine the delimiter for a text data file.
get.ext Get the file extension from a file-name.
is.file Test whether a file exists in a target directory.
make.fixed.width Convert a matrix or dataframe to fixed-width.
n.readLines Read 'n' lines (ignoring comments and header) from a file.
parse.args Function to collect arguments when running R from the command line.
reader Flexibly load from a text or binary file, accepts multiple file formats.
rmv.ext Remove the file extension from a file-name.
find.file Construct a path to a file, where multiple directories can be searched to find an existing file.
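As an illustration of what automatic delimiter detection can involve, the following base R sketch counts candidate delimiters in the first few lines of a file and picks the one that occurs the same non-zero number of times on every line. The function guess_delim() and its heuristic are assumptions made for this example; they are not taken from get.delim(), whose actual method may differ.

```r
# Hypothetical helper (not from the reader package): guess a delimiter
# by requiring a consistent, non-zero count on every sampled line.
guess_delim <- function(path, n = 10,
                        candidates = c("\t", ",", ";", "|", " ")) {
  lines <- readLines(path, n = n, warn = FALSE)
  lines <- lines[nzchar(lines)]          # ignore empty lines
  if (length(lines) == 0) return(NA_character_)
  score <- vapply(candidates, function(d) {
    # occurrences of delimiter 'd' on each line (fixed, not regex)
    per_line <- lengths(regmatches(lines, gregexpr(d, lines, fixed = TRUE)))
    if (per_line[1] > 0 && all(per_line == per_line[1])) per_line[1] else 0L
  }, integer(1))
  if (all(score == 0)) return(NA_character_)
  candidates[which.max(score)]
}

# Try it on a tab-delimited and a comma-delimited copy of the same data.
df <- data.frame(ID = paste0("ID", 101:103),
                 scores = c(42, 77, 93), age = c(13, 17, 14))
f_tab <- tempfile(fileext = ".txt")
f_csv <- tempfile(fileext = ".csv")
write.table(df, f_tab, sep = "\t", quote = FALSE, row.names = FALSE)
write.csv(df, f_csv, row.names = FALSE)

delim_tab <- guess_delim(f_tab)
delim_csv <- guess_delim(f_csv)
```

This heuristic is deliberately simple: it will fail on files where a delimiter character also appears inside quoted fields, so it should be read as a sketch of the idea rather than a robust detector.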

Nicholas Cooper

Maintainer: Nicholas Cooper <njcooper@gmx.co.uk>

NCmisc

library(reader)
mydir <- "/Documents"
cat.path(mydir,"temp.doc","NEW",suf=5)
## example for the reader() function ##
df <- data.frame(ID=paste("ID",101:110,sep=""),
                 scores=sample(70,10,TRUE)+30,age=sample(7,10,TRUE)+11)
test.files <- c("temp.txt","temp2.csv","temp3.rda")
write.table(df,file=test.files[1],col.names=TRUE,row.names=TRUE,sep="\t",quote=FALSE)
# file.nrow and file.ncol examples
file.nrow(test.files[1])
file.ncol(test.files[1])
write.csv(df,file=test.files[2])
save(df,file=test.files[3])
# use the same simple reader() function call to read in each file type
for(cc in seq_along(test.files)) {
    cat(test.files[cc],"\n")
    myobj <- reader(test.files[cc])  # add 'quiet=FALSE' to see some working
    print(myobj); cat("\n\n")
}
# inspect files before deleting if desired:
#  unlink(test.files)
#
# find id column in data frame
new.frame <- data.frame(day=c("M","T","W"),time=c(9,12,3),staff=c("Mary","Jane","John"))
staff.ids <- c("Mark","Jane","John","Andrew","Sally","Mary")
new.frame; find.id.col(new.frame,staff.ids)

Loading required package: NCmisc

Attaching package: 'reader'

The following objects are masked from 'package:NCmisc':

    cat.path, get.ext, rmv.ext

[1] "/Documents/NEWtemp.doc5"
temp.txt 
      11 
[1] 4
temp.txt 
      ID scores age
1  ID101     42  13
2  ID102     77  17
3  ID103     93  14
4  ID104     99  14
5  ID105     59  18
6  ID106     67  15
7  ID107     69  15
8  ID108     65  17
9  ID109     66  15
10 ID110     65  15


temp2.csv 
      ID scores age
1  ID101     42  13
2  ID102     77  17
3  ID103     93  14
4  ID104     99  14
5  ID105     59  18
6  ID106     67  15
7  ID107     69  15
8  ID108     65  17
9  ID109     66  15
10 ID110     65  15


temp3.rda 
      ID scores age
1  ID101     42  13
2  ID102     77  17
3  ID103     93  14
4  ID104     99  14
5  ID105     59  18
6  ID106     67  15
7  ID107     69  15
8  ID108     65  17
9  ID109     66  15
10 ID110     65  15


  day time staff
1   M    9  Mary
2   T   12  Jane
3   W    3  John
$col
[1] 3

$maxpc
[1] 0.5

$index
[1] NA  2  3 NA NA  1

$result
[1] <NA> Jane John <NA> <NA> Mary
Levels: Jane John Mary
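
The $col, $maxpc and $index values in the find.id.col() output above can be understood through a small base R sketch. The helper match_id_col() below is hypothetical and is not the package's implementation; it assumes that the "best" column is the one containing the largest proportion of the supplied IDs.

```r
# Hypothetical helper (not from the reader package): find the column
# that contains the largest share of a set of IDs.
match_id_col <- function(frame, ids) {
  # proportion of 'ids' present in each column (compared as text)
  pc <- vapply(frame, function(col) mean(ids %in% as.character(col)),
               numeric(1))
  best <- which.max(pc)
  list(col   = unname(best),                        # column position
       maxpc = unname(pc[best]),                    # share of ids matched
       index = match(ids, as.character(frame[[best]])))  # row of each id
}

new.frame <- data.frame(day = c("M", "T", "W"), time = c(9, 12, 3),
                        staff = c("Mary", "Jane", "John"))
staff.ids <- c("Mark", "Jane", "John", "Andrew", "Sally", "Mary")
res <- match_id_col(new.frame, staff.ids)
```

For this input the sketch selects column 3, with half of the six IDs matched and row indices NA, 2, 3, NA, NA, 1, in line with the output shown above.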