geocoder: Title 'geocoder': a function to geocode tweets by approximate...



The function first tries to find exact matches for the full_names column of the corpus in the full_names column of the GeoNames_output_file, which is automatically enriched with a database of geocoded Twitter locations from an earlier project. For records without an exact match, it then performs approximate string matching based on Levenshtein distance. The first string in the GeoNames_output_file full_names column that matches with a distance of less than maxDistance is returned. This is done in multithreaded C++ code, so it should be reasonably fast even for larger vectors. Matching one million strings against one million candidates takes about thirty minutes on my MacBook Pro.
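The two-stage lookup described above can be sketched in plain R. This is only an illustration, not the package's multithreaded C++ code: the helper name match_locations and the sample strings are hypothetical, and base R's utils::adist() stands in for the package's own distance routine.

```r
# Hypothetical helper: exact matching first, then approximate matching.
# Assumption: utils::adist() (generalized Levenshtein distance) replaces
# the package's C++ implementation for the purpose of this sketch.
match_locations <- function(queries, candidates, maxDistance = 2) {
  # Stage 1: exact matches; NA where no candidate is identical
  idx <- match(queries, candidates)

  # Stage 2: for each unmatched query, take the first candidate whose
  # Levenshtein distance is less than maxDistance
  for (i in which(is.na(idx))) {
    d <- utils::adist(queries[i], candidates)[1, ]
    hits <- which(d < maxDistance)
    if (length(hits) > 0) idx[i] <- hits[1]
  }
  idx
}

candidates <- c("Madrid, Spain", "Brussels, Belgium", "Lima, Peru")
queries <- c("Lima, Peru", "Brusels, Belgium", "Nowhere at all")

# Exact hit, one-edit approximate hit, and no match at all
result <- match_locations(queries, candidates)
print(result)  # indices 3, 2, NA
```

The returned indices could then be used to copy coordinates from the matched candidate rows. Queries that stay NA have no candidate within the distance threshold.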


geocoder(filtered_corpus, GeoNames_output_file, maxDistance = 2,
  nthreads = parallel::detectCores())



filtered_corpus: Output of searchCorpus


GeoNames_output_file: csv file produced with the function GeoNames()


maxDistance: Maximum Levenshtein distance to use for approximate string matching. Defaults to 2 (i.e., at most 2 single-character insertions, deletions, or substitutions between the input string and the output string)


nthreads: Number of threads to use for the approximate string matching. Defaults to the number of CPUs available on your machine.
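As a small sketch of choosing a thread count explicitly rather than relying on the default: parallel::detectCores() is documented to return NA when the number of cores cannot be determined, so a fallback can be useful. The variable names and the choice to leave one core free are assumptions for illustration, not something the package prescribes.

```r
# Hypothetical way to pick a value for the nthreads argument.
n_cores <- parallel::detectCores()

# Fall back to a single thread if detection fails; otherwise leave
# one core free for the rest of the system (an arbitrary choice here).
n_threads <- if (is.na(n_cores)) 1L else max(1L, n_cores - 1L)
print(n_threads)
```

The resulting value could then be passed as nthreads = n_threads in the call shown under Usage.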


A data.frame with the lat and lon columns filled in based on the GeoNames_output_file.

jeroenclaes/tweetCorp documentation built on May 27, 2019, 4:50 a.m.