functions: Data as Text Arrays


Description

A set of functions that help build Vegsoup objects, as defined in the vegsoup package, from scanned vegetation tables.

The functions operate on CSV files, most likely output from OCR software, or on text arrays, which may themselves have been derived from OCR transcripts. In this context, a text array is a tabular (matrix-like) plain text file. Rows (lines) represent species. Positions along each line (the columns) represent plots and hold the abundance or presence of the species in each plot. Consequently, each species, together with its abundances across plots, occupies a single line. Every abundance value is exactly one character wide, so each plot aligns vertically. There may also be a header part containing information on the sampling units rather than the species. At a minimum, the header includes the plot names.
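
The layout described above can be sketched in base R. This is only an illustration with invented toy data, not code from the package: the taxon names, the plot names a, b and c, the name-block width of 18 characters and the use of "." for an absent species are all assumptions made for the example.

```r
# Hypothetical toy data; names and values are invented for illustration.
taxa  <- c("Carex flava", "Molinia caerulea", "Succisa pratensis")
abund <- c("2.1", "34+", ".1.")   # one character per plot; "." assumed to mean absent

# Left-justify the name block to 18 characters, then append the plot columns.
# The first line is a one-line header holding the plot names a, b, c.
x <- sprintf("%-18s%s", c("plot", taxa), c("abc", abund))

# Every line has the same width, so plots align vertically.
stopifnot(length(unique(nchar(x))) == 1)

# One plot is one character position: plot "b" sits at position 18 + 2.
plot_b <- substring(x, 20, 20)

# Split the abundance block into a species-by-plot character matrix.
m <- do.call(rbind, strsplit(substring(x[-1], 19), ""))
dimnames(m) <- list(trimws(substring(x[-1], 1, 18)),
                    strsplit(substring(x[1], 19), "")[[1]])
print(m)
```

Because each abundance is a single character, a plot can be addressed by its character position alone, which is why the alignment of the array matters.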

Usage

csv2txt(x, header.rows, merge.cols, sep = ";", width = 1, vertical = TRUE,
        collapse = " ", overwrite = FALSE, ...)

extractTaxon(x, col, row, blank = "blank")

replaceTaxon(x, y, z, schema = c("taxon", "abbr"), col, row = NULL,
             file, keywords = FALSE, overwrite = FALSE)

splitArray(x, col, row = NULL, blank = "blank")

trimTaxon(x, pad = "@hl", sep = ";")

Arguments

x

character. Text array with species abundances. (See ‘Details’).

y

data.frame. Two-column table of taxon matches (see linktaxa). The first column gives the source name, the second column the matched taxon.

z

data.frame. Taxonomic reference list; must contain a column literally named "abbr".

row, col

integer. Row (line) and column (position along a line) coordinates in a text array (cursor coordinates). col gives the number of characters (nchar) at which to split horizontally; row sets the vertical split.

header.rows

integer. Number of rows occupied by the header part of the table. If the table has no header data, specify header.rows = 0.

merge.cols

integer. Merge two or more columns into one (experimental!).

width

integer. Additional space to adjust column width.

vertical

logical. Orientation of the header. If the header is formatted horizontally, width must be defined.

collapse

character. Currently not used.

schema

character. Column names to be queried from argument z.

keywords

logical. If file is given and keywords = TRUE, wrap the header and taxa blocks in paired keywords (BEGIN HEAD, END HEAD, BEGIN TABLE, END TABLE).

blank

character. Value to be filled in instead of NAs.

sep

character. Separator for read.csv.

pad

character. Padding string to species names.

file

character. Output file name.

overwrite

logical. Overwrite existing files.

...

arguments passed to read.csv.

Details

Give summary.

splitArray

splits the taxa and abundance parts of a text array and returns them as lists. The function requires the argument col. If the row argument is also given, splitArray returns a list of lists: first the header part, then the taxa block. If only col is supplied, the function splits the text array into a species part and an abundances part, the left and right blocks of columns in the array, respectively. (See also argument n of the functions extractTaxon and csv2txt.)
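
The two splitting modes can be sketched in base R as follows. The helper split_sketch is hypothetical and is not the package's implementation. It assumes that col is the last character position of the species block and that row is the number of header lines; the names of the returned list elements are likewise invented.

```r
# Hypothetical helper illustrating the idea; NOT the vegit implementation.
split_sketch <- function(x, col, row = NULL) {
  # Horizontal split: characters 1..col on the left, the rest on the right.
  halves <- function(lines) {
    list(left = substring(lines, 1, col), right = substring(lines, col + 1))
  }
  # Only col given: one list with the species and abundance blocks.
  if (is.null(row)) return(halves(x))
  # col and row given: a list of lists, header part first, then taxa block.
  is_head <- seq_along(x) <= row   # also works for row = 0
  list(head = halves(x[is_head]), table = halves(x[!is_head]))
}

# Invented toy text array: one header line, three species lines.
x <- sprintf("%-18s%s",
             c("plot", "Carex flava", "Molinia caerulea", "Succisa pratensis"),
             c("abc", "2.1", "34+", ".1."))

two_parts  <- split_sketch(x, col = 18)
four_parts <- split_sketch(x, col = 18, row = 1)
str(four_parts)
```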

trimTaxon

trims and pads strings of species names in a CSV file. By default, the function selects the first column.

csv2txt

takes a CSV file and transforms it into a text array. Note that file encoding might be an issue; at least it is for German umlauts. Lines with non-ASCII characters will likely have wrong indentation. There is currently no way to handle decimals in the header. Fortunately, they are rarely present in printed sources.
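
The basic conversion from a semicolon-separated table to a text array can be sketched in base R. This is a simplified illustration under stated assumptions, not what csv2txt does internally: the CSV content is invented, the first CSV row is assumed to hold one-character plot names, every cell is assumed to be exactly one character wide, and header.rows, merge.cols and vertical are not modelled.

```r
# Invented CSV content; the first row holds the plot names (an assumption).
csv <- c(
  "taxon;a;b;c",
  "Carex flava;2;.;1",
  "Molinia caerulea;3;4;+"
)

# Read everything as character so that "+" and "." are kept as they are.
d <- read.csv(text = csv, sep = ";", colClasses = "character",
              check.names = FALSE)

# Width of the species block: longest name plus one separating blank.
w <- max(nchar(d$taxon)) + 1

# Paste the one-character cells of each row into one abundance string.
cells <- as.matrix(d[-1])
body  <- paste0(formatC(d$taxon, width = w, flag = "-"),
                apply(cells, 1, paste, collapse = ""))

# Header line: blanks over the species block, then the plot names.
head_line <- paste0(strrep(" ", w), paste(names(d)[-1], collapse = ""))

txt <- c(head_line, body)
writeLines(txt)
```

The padding here counts characters with nchar, which is adequate for ASCII names only.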

replaceTaxon

replaces the species names of the source with an abbr(eviation) queried against a reference list. To accomplish this, three objects need to be supplied (see ‘Arguments’). If keywords = TRUE, the function inserts the keywords that read.verbatim of package vegsoup searches for.
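
The two-step lookup implied by the arguments y and z can be sketched with match in base R. The data below are invented, the abbreviation format is hypothetical, and the column names "source" and "taxon" in y are assumptions; only the column name "abbr" in z comes from the argument description.

```r
# y: hypothetical table of matches, source name -> matched taxon.
y <- data.frame(source = c("Carex flava L.", "Molinia coerulea"),
                taxon  = c("Carex flava", "Molinia caerulea"),
                stringsAsFactors = FALSE)

# z: hypothetical reference list with the columns "abbr" and "taxon".
z <- data.frame(abbr  = c("care flav", "moli caer"),
                taxon = c("Carex flava", "Molinia caerulea"),
                stringsAsFactors = FALSE)

# Species names as they appear in the source text array.
src <- c("Molinia coerulea", "Carex flava L.")

# Step 1: source name -> matched taxon; step 2: matched taxon -> abbr.
matched <- y$taxon[match(src, y$source)]
abbr    <- z$abbr[match(matched, z$taxon)]
print(abbr)
```

Names without a match yield NA in either step, so checking anyNA(abbr) before writing any output is a reasonable safeguard.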

extractTaxon

extracts species from a text array. The function will remove keywords (see ‘Arguments’) if present.
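
A base R sketch of this kind of extraction follows. It is not the package's code: the toy array is invented, keyword lines are assumed to contain nothing but the keyword, and col and row are assumed to mean the width of the species block and the number of header lines remaining after the keywords are dropped.

```r
# Keywords as listed under the keywords argument.
keywords <- c("BEGIN HEAD", "END HEAD", "BEGIN TABLE", "END TABLE")

# Invented toy text array wrapped in keywords.
x <- c("BEGIN HEAD",
       sprintf("%-18s%s", "plot", "abc"),
       "END HEAD",
       "BEGIN TABLE",
       sprintf("%-18s%s", c("Carex flava", "Molinia caerulea"), c("2.1", "34+")),
       "END TABLE")

# Drop keyword lines, then skip the header rows and cut the species block.
body <- x[!(trimws(x) %in% keywords)]
col  <- 18   # assumed width of the species block
row  <- 1    # assumed number of header lines
taxa <- trimws(substring(body[seq_along(body) > row], 1, col))
print(taxa)
```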

Value

Depending on the function.

Author(s)

Roland Kaiser

Examples

# Nothing here yet
#	file <- file.path(path, "foo.txt")
#	extractTaxon(readLines(file), 30, 10)

kardinal-eros/vegit documentation built on Feb. 16, 2020, 9:20 p.m.