get.delim: Determine the delimiter for a text data file.


View source: R/reader.R

Description

Reads the first few lines of data in a text file and attempts to infer which delimiter is in use: the candidate from the 'delims' argument that yields the most consistent number of columns across the first 'n' lines of data. Searches preferentially for delimiters implying between 2 and 'large' columns, then for more than 'large' columns, and lastly for 1 column if nothing else gives a match.
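
The search order described above can be sketched in base R. This is a simplified illustration, not the code in R/reader.R: the function name guess_delim_sketch is hypothetical, "consistent" is taken to mean that every line implies exactly the same column count, and delimiters are matched as fixed strings rather than regular expressions.

```r
# Hypothetical helper for illustration only; not part of the 'reader' package.
guess_delim_sketch <- function(lines, delims = c("\t", " ", ";", ","), large = 10) {
  # Column count implied by each candidate delimiter on each line
  n_cols <- lapply(delims, function(d) lengths(strsplit(lines, d, fixed = TRUE)))
  # Assumption: a candidate is "consistent" if all lines give the same count
  consistent <- vapply(n_cols, function(x) length(unique(x)) == 1, logical(1))
  cols <- vapply(n_cols, function(x) x[1], integer(1))
  # Preference order: 2..large columns, then > large, then a single column
  tiers <- list(cols >= 2 & cols <= large, cols > large, cols == 1)
  for (tier in tiers) {
    hit <- which(consistent & tier)
    if (length(hit) > 0) return(delims[hit[1]])
  }
  NA_character_
}

guess_delim_sketch(c("a,b,c", "1,2,3", "4,5,6"))  # ","
```

Ties within a tier are resolved here by the order of 'delims'; how get.delim itself breaks ties is not stated in this page.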

Usage

get.delim(fn, n = 10, comment = "#", skip = 0, delims = c("\t",
  "\t| +", " ", ";", ","), large = 10, one.byte = TRUE)

Arguments

fn

name of the file to parse

n

the number of lines to read to make the inference

comment

comment symbol marking lines in the file that should be ignored

skip

number of lines to skip at top of file before processing

delims

the set of delimiters to test for

large

search first for delimiters that imply more than 1 column but fewer than 'large' columns; if none fall in this range, look next at delimiters implying more than 'large' columns.

one.byte

only check for one-byte delimiters (e.g., the whitespace regular expression is longer than one byte)
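
To make the 'one.byte' distinction concrete, the following base R sketch measures the byte length of each default delimiter. It only shows which defaults are one byte long; whether get.delim filters its candidates in exactly this way is an assumption.

```r
# Default candidates from the Usage section
delims <- c("\t", "\t| +", " ", ";", ",")

# Byte length of each candidate; the whitespace regular expression "\t| +"
# consists of a tab, "|", a space and "+", so it is longer than one byte
n_bytes <- nchar(delims, type = "bytes")

# Candidates that remain if only one-byte delimiters are considered
one_byte <- delims[n_bytes == 1]
```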

Value

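Returns the most likely delimiter as a character string. In the example output below, NA is returned with a warning for a file whose delimiter is not among 'delims'.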

Author(s)

Nicholas Cooper nick.cooper@cimr.cam.ac.uk

See Also

reader

Examples

library(reader)
orig.dir <- getwd(); setwd(tempdir()) # move to temporary dir
df <- data.frame(ID=paste("ID",101:110,sep=""),
  scores=sample(70,10,TRUE)+30,age=sample(7,10,TRUE)+11)
# save data to various file formats
test.files <- c("temp.txt","temp2.txt","temp3.csv")
# "|" is not among the default 'delims'; the example output reports NA for this file
write.table(df,file=test.files[1],col.names=FALSE,row.names=FALSE,sep="|",quote=TRUE)
write.table(df,file=test.files[2],col.names=TRUE,row.names=TRUE,sep="\t",quote=FALSE)
write.csv(df,file=test.files[3])
# report the delimiter inferred for each file
for (cc in seq_along(test.files)) {
  cat("\n",test.files[cc],": ")
  print(get.delim(test.files[cc]))
}
unlink(test.files) # remove the temporary files
setwd(orig.dir) # reset working dir to original

Example output

Loading required package: NCmisc

Attaching package: 'reader'

The following objects are masked from 'package:NCmisc':

    cat.path, get.ext, rmv.ext


 temp.txt : [1] NA

 temp2.txt : [1] "\t"

 temp3.csv : [1] ","
Warning message:
In get.delim(test.files[cc]) : not a delimited file, probably a vector file
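
Because the result can be NA, as for temp.txt above, calling code may want a fallback before passing the value on to a reader function. The sketch below uses only base R; read_with_delim and its 'fallback' argument are hypothetical names, and 'detected' stands in for a value returned by get.delim.

```r
# Hypothetical wrapper for illustration; not part of the 'reader' package.
# 'detected' is assumed to be a single-character delimiter or NA.
read_with_delim <- function(fn, detected, fallback = "|") {
  # Use the fallback separator when no delimiter was inferred
  sep <- if (is.na(detected)) fallback else detected
  read.table(fn, sep = sep, header = FALSE, stringsAsFactors = FALSE)
}

# Usage: a pipe-delimited file for which detection is assumed to have given NA
tmp <- tempfile(fileext = ".txt")
write.table(data.frame(a = 1:3, b = 4:6, c = 7:9), file = tmp, sep = "|",
            col.names = FALSE, row.names = FALSE)
out <- read_with_delim(tmp, NA)
unlink(tmp)
```

Note that read.table expects a single-character 'sep', so a regular-expression delimiter such as the whitespace pattern would need separate handling.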

reader documentation built on May 2, 2019, 9:27 a.m.