parse_taxolist: Parse and extract taxonomic names from txt files


Description

parse_taxolist reads and parses every line of a text file in which each row contains a taxonomic name, its authors and its distribution. It then writes the tabular output to a CSV file, either automatically or according to the rules given in config.

Usage

parse_taxolist(input_file, output_file, location_detail, language, evaluation, config)

Arguments

input_file

Required. The path and name of the file which the data is to be read from. If it does not contain an absolute path, the file name is relative to the current working directory.

output_file

Required. The path and name of the file for writing. If it does not contain an absolute path, the file name is relative to the current working directory.

location_detail

Optional. A logical value indicating whether detailed distribution information (longitude, latitude and detailed location names) is to be exported. Defaults to TRUE.

language

Optional. The language of the detailed distribution information, which must be either "English" or "Chinese". Defaults to "English".

evaluation

Optional. A logical value indicating whether the evaluation of the parsing result is to be exported. Defaults to TRUE.

config

Optional. If omitted, the output is generated automatically from the structure of the input text. If supplied, the function parses the input text according to the rules given in config. Sample config values are provided in the "Examples" section. Note that: Author_Year must be treated as a single unit; the separator between the Author_Year part and the distribution part must be stated explicitly; if '\n' is present, it can only appear right after the genus part.
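To make the shape of a config string concrete, the following sketch splits one into its ordered parts and labels each as a named field or a quoted literal separator. The helper describe_config is hypothetical and is not part of bioparser; it only illustrates how the strings in the "Examples" section can be read, and says nothing about how the package parses them internally.

```r
# Hypothetical helper (an assumption, not a bioparser function):
# break a config string into ordered tokens and classify each one.
describe_config <- function(config) {
  # Tokens are separated by commas; surrounding spaces are not significant.
  tokens <- trimws(strsplit(config, ",", fixed = TRUE)[[1]])
  # A token wrapped in single quotes is a literal separator such as ':' or '\n'.
  is_literal <- startsWith(tokens, "'") & endsWith(tokens, "'") & nchar(tokens) >= 2
  data.frame(
    token = ifelse(is_literal, substr(tokens, 2, nchar(tokens) - 1), tokens),
    kind  = ifelse(is_literal, "separator", "field"),
    stringsAsFactors = FALSE
  )
}

cfg <- describe_config("Genus, Species, Author_Year, ':', distribution")
print(cfg)
```

Read this way, the config is a left-to-right template: Genus, Species and Author_Year are fields, ':' marks where the distribution part begins, and distribution is the final field.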

Value

A data frame containing the result of parsing taxonomic names in the input file and detailed distribution information about species. For those taxonomic names which have more than one distribution, if location_detail is TRUE, each row in the data frame will only contain one distribution. If location_detail is FALSE, all distributions for a single species will be written in one row.

A CSV file written from the above data frame.
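The difference between the two row layouts can be sketched with a small, made-up data frame. The column names below (genus, species, distribution) are assumptions chosen for illustration and may not match the columns the package actually writes.

```r
# Illustrative data only; column names are assumed, not bioparser's real schema.
# Layout comparable to location_detail = TRUE: one row per distribution.
long <- data.frame(
  genus        = "Pachliopta",
  species      = "pandiyana",
  distribution = c("Goa", "Karnataka", "Kerala"),
  stringsAsFactors = FALSE
)

# Layout comparable to location_detail = FALSE: one row per species,
# with all distributions joined into a single cell.
collapsed <- aggregate(
  distribution ~ genus + species,
  data = long,
  FUN = paste,
  collapse = ", "
)

print(long)
print(collapsed)
```

The long layout is easier to join against coordinates or gazetteer data, while the collapsed layout keeps a one-to-one correspondence with the lines of the input file.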

Examples

# example1:
parse_taxolist(input_file = "./Examples/input_data/test_example.txt",
               output_file = "./Examples/output_data/test_example_output.csv",
               location_detail = TRUE,
               language = "English",
               evaluation = TRUE,
               config = "")

# input example:
# Charmus indicus Chiang, Chen & Zhang 1996.Distribution: Andhra Pradesh, Kerala,Pondicherry and Tamil Nadu.
# Isometrus maculatus De Geer, 1778.Distribution: Kerala, Andhra Pradesh, Madhya Pradesh, Karnataka,Maharashtra, Meghalaya and Tamil Nadu.
# Lychas hendersoni Pocock, 1897) Distribution: Kerala and Tamil Nadu.


# example2:
parse_taxolist(input_file = "./Examples/input_data/test_example_config_1.txt",
               output_file = "./Examples/output_data/test_example_output_config_1.csv",
               location_detail = FALSE,
               language = "English",
               evaluation = TRUE,
               config = "Genus, Species, Author_Year, 'Distribution:', distribution")

# input example:
# Pachliopta pandiyana Moore, 1881 Distribution: Goa, Karnataka, Kerala


# example3:
parse_taxolist(input_file = "./Examples/input_data/test_example_config_2.txt",
               output_file = "./Examples/output_data/test_example_output_config_2.csv",
               location_detail = FALSE,
               language = "English",
               evaluation = FALSE,
               config = "Genus, Species, Author_Year, ':', distribution")

# input example:
# Pachliopta pandiyana Moore, 1881 : Goa, Karnataka, Kerala


# example4:
parse_taxolist(input_file = "./Examples/input_data/test_example_config_3.txt",
               output_file = "./Examples/output_data/test_example_output_config_3.csv",
               location_detail = TRUE,
               language = "English",
               evaluation = FALSE,
               config = "Genus, '\n', Species, Author_Year, ':',distribution")

# input example:
# Pachliopta
# pandiyana Moore, 1881 : Goa, Karnataka, Kerala
# aristolochiae Fabricius, 1775  : Meghalaya, Paschimbanga, Kerala, Karnataka, Arunachal Pradesh, Telangana, Andhra Pradesh, Maharashtra, Gujarat, Odisha, Chhattisgarh

qingyuexu/bioparser documentation built on May 19, 2019, 4:13 p.m.