getspecies: parseSpeciesList is a prototype function to parse a species...


Description

parseSpeciesList is a first prototype to parse beetle species lists as provided by the enthusiasts (coleopterists) of the beetle community.

This is a very raw and simple approach, and because it parses the text line by line it is not really in R style.

Unfortunately, the parsing has to be done line by line because some keywords are missing and the rules do not always match. As a first attempt we use "family", "genus" and "subgenus" as keywords; they are always placed at the beginning of a line. After "genus" or "subgenus" there is one line for each species. The species line contains a more or less systematic list of country codes indicating all countries in which that species is known to occur.
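The keyword-driven, line-by-line idea described above can be sketched in base R as follows. This is only an illustration under stated assumptions, not the code of getspecies: the function name parse_species_lines, the layout of the sample lines, and the patterns used to recognise location types ("A:", "N:") and two-letter country codes are all hypothetical.

```r
# Hypothetical sketch: NOT the package's implementation.
# Assumed layout: keyword lines "family X", "genus X", "subgenus X",
# then one line per species: "<species> <loctype:> <CC> ...".
parse_species_lines <- function(lines) {
  family <- genus <- subgenus <- NA_character_
  rows <- list()
  for (line in trimws(lines)) {
    if (!nzchar(line)) next                      # skip empty lines
    tokens <- strsplit(line, "[[:space:]]+")[[1]]
    key <- tolower(tokens[1])
    if (key == "family") {
      family <- tokens[2]
      genus <- subgenus <- NA_character_         # new family resets lower ranks
    } else if (key == "genus") {
      genus <- tokens[2]
      subgenus <- NA_character_                  # new genus resets subgenus
    } else if (key == "subgenus") {
      subgenus <- tokens[2]
    } else {
      # any other line is treated as a species line
      species <- tokens[1]
      loctype <- NA_character_
      for (tok in tokens[-1]) {
        if (grepl("^[[:upper:]]:$", tok)) {
          loctype <- tok                         # e.g. "A:" or "N:"
        } else if (grepl("^[[:upper:]]{2}$", tok) && !is.na(loctype)) {
          # one output row per country code
          rows[[length(rows) + 1]] <- data.frame(
            family = family, genus = genus, subgenus = subgenus,
            species = species, loctype = loctype, country = tok,
            stringsAsFactors = FALSE
          )
        }
      }
    }
  }
  do.call(rbind, rows)
}

# Invented sample input, shaped to reproduce the two rows shown in this page
sample_lines <- c(
  "family Carabidae",
  "genus Carabus",
  "subgenus Carabinae",
  "irregularis A: GE N: CZ"
)
parsed <- parse_species_lines(sample_lines)
print(parsed)
```

The current family, genus and subgenus are kept as state while walking through the lines, which is why the text cannot simply be processed in a vectorised way.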

The resulting data frame is a non-normalized relation (that is, one large table with mostly redundant information).

It looks like:

family; genus; subgenus; species; loctype; country
Carabidae; Carabus; Carabinae; irregularis; A:; GE
Carabidae; Carabus; Carabinae; irregularis; N:; CZ
. . .
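To make the redundancy concrete, the two sample rows above can be rebuilt as a data frame and split into a taxonomy table and an occurrence table. This is a minimal sketch in base R; the object names flat, taxa and occurrences are hypothetical, and using genus plus species as the join key is an assumption made for illustration.

```r
# The two sample rows from the table above, typed in by hand
flat <- data.frame(
  family   = c("Carabidae", "Carabidae"),
  genus    = c("Carabus", "Carabus"),
  subgenus = c("Carabinae", "Carabinae"),
  species  = c("irregularis", "irregularis"),
  loctype  = c("A:", "N:"),
  country  = c("GE", "CZ"),
  stringsAsFactors = FALSE
)

# The taxonomy is repeated on every row; unique() collapses it to one row
taxa <- unique(flat[, c("family", "genus", "subgenus", "species")])

# Only the occurrence columns differ from row to row
occurrences <- flat[, c("genus", "species", "loctype", "country")]

# Joining the two tables again restores the flat, redundant form
rebuilt <- merge(taxa, occurrences, by = c("genus", "species"))
print(rebuilt)
```

Keeping the flat form is convenient for quick filtering with subset(), at the cost of repeating the taxonomy on every row.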

Usage

getspecies(inputFile, short = TRUE)

Arguments

short

Logical. If TRUE (default), the function tries to extract only the names and country codes. If FALSE, the full text is put into the data frame.

inputTXT

Text in the format described above. In Usage this argument appears as inputFile.

Author(s)

Chris Reudenbach, Flo Detsch

References

Löbl, I. & A. Smetana (eds): Catalogue of Palaearctic Coleoptera: http://www.apollobooks.com/palaearcticcoleoptera.htm

Examples

 ### examples parseSpeciesList ###

 ### we need the stringr and foreach libs
 library(stringr)
 library(foreach)

 ### first the basic parsing
 inputFile <- system.file("extdata", "species.chunk", package = "parseSpeciesList")
 df <- getspecies(inputFile)

 ### all entries for CZ only
 cz <- subset(df, loc == "CZ")

 ### all entries for porculus
 porculus <- subset(df, species == "porculus")

 ######################################
 ###  now a basic mapping example  ####

 ###  we need some more libs ;)
   if (!require(devtools)) {install.packages("devtools")} # for installation from GitHub
   if (!require(maptools)) {install.packages("maptools")} # for reading shapefiles
   if (!require(sp)) {install.packages("sp")}             # for manipulating spatial data (sp objects)
   library(devtools)
   library(maptools)
   library(sp)
   if (!require(mapview)) {install_github("environmentalinformatics-marburg/mapview")}
   library(mapview)  # for modern mapping

   ###  load prepared mapdata (source: http://thematicmapping.org/downloads/TM_WORLD_BORDERS-0.3.zip)
   load("data/world.Rdata")

   ### now all findings of porculus (whatever it is ;))
   porculus <- subset(df, species == "porculus")

   ### join the world countries to our data
   ### (ISO2 codes fit most entries, but there is no code for the regions)
   joinSpdf <- joinData2Map(
     porculus,
     nameMap = sPDF,
     nameJoinIDMap = "ISO2",
     nameJoinColumnData = "loc")

   ### now we have to assign the coordinate reference system (WGS84, EPSG:4326)
   proj4string(joinSpdf) <- CRS("+init=epsg:4326")

   ### plot it with e.g. mapview (and have some colors and interactivity)
   mapView(joinSpdf, zcol = "species")

gisma/parseSpeciesList documentation built on May 17, 2019, 5:27 a.m.