find_taxoname: Locate and extract taxonomic names from given input files

Description

find_taxoname locates and extracts taxonomic names from txt, docx, pdf or html files and reorganizes them into a standard order: genus, species, subspecies, author and year, distribution. The function can write the result to a txt file in which each row is one taxonomic name entry. This txt file can be further processed with the function parse_taxolist into a tabular csv format that contains more detailed information.

Usage

find_taxoname(filepath, filename, type, encoding = "unknown", output_name = "FALSE")

Arguments

filepath

Required. The path of the directory containing the file from which the data is to be read. If it is not an absolute path, it is interpreted relative to the current working directory.

filename

Required. The name of the file from which the data is to be read.

type

Required. The format of the input file. Currently accepts 'txt', 'docx', and 'pdf'.

encoding

Optional. The character encoding of the input file. Default value is 'unknown'.

output_name

Optional. The path and name of the file for writing. If it does not contain an absolute path, the file name is relative to the current working directory. Default value is "FALSE".
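
The arguments above can be checked before the call is made. The following is a minimal sketch, not part of bioparser: check_taxo_input is a hypothetical helper that derives type from the file extension, rejects formats not listed under type, and confirms that the input file exists. It assumes that filepath is a directory and filename a file inside it, as in the example on this page.

```r
# Hypothetical helper (not part of bioparser): validate the inputs
# before passing them to find_taxoname().
check_taxo_input <- function(filepath, filename) {
  supported <- c("txt", "docx", "pdf")        # formats listed under 'type'
  type <- tolower(tools::file_ext(filename))  # extension without the dot
  if (!type %in% supported) {
    stop("Unsupported file type: '", type, "'")
  }
  full_path <- file.path(filepath, filename)  # directory + file name
  if (!file.exists(full_path)) {
    stop("Input file not found: ", full_path)
  }
  list(filepath = filepath, filename = filename, type = type)
}

# Possible use (paths taken from the example on this page):
# args <- check_taxo_input("./Examples/input_data", "taxo01.txt")
# df <- find_taxoname(filepath = args$filepath,
#                     filename = args$filename,
#                     type = args$type,
#                     output_name = "./Examples/output_data/taxo01_output")
```

Deriving type from the extension keeps the two arguments from drifting apart, but it is only a convenience; type can still be passed explicitly when a file has no extension or a misleading one.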

Value

A data frame containing the taxonomic names found in the input file, reorganized into the standard format.

A TXT file written from this data frame, in which each line contains one taxonomic name entry.
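
Because the TXT output holds one entry per line, it can be read back with base R. The sketch below is an illustration under assumptions: read_taxo_entries is a hypothetical helper, and out_file must be the actual path of the file that find_taxoname wrote (this page does not state whether an extension is appended to output_name).

```r
# Hypothetical helper (not part of bioparser): read the TXT output back
# as a character vector, one element per taxonomic name entry.
read_taxo_entries <- function(out_file, encoding = "unknown") {
  entries <- readLines(out_file, encoding = encoding, warn = FALSE)
  entries <- trimws(entries)   # drop leading and trailing whitespace
  entries[nzchar(entries)]     # drop empty lines
}
```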

Examples

df <- find_taxoname(filepath = "./Examples/input_data",
                    filename = "taxo01.txt",
                    type = "txt",
                    output_name = "./Examples/output_data/taxo01_output")

qingyuexu/bioparser documentation built on May 19, 2019, 4:13 p.m.