Installation

devtools::install_github("jonotuke/adParser", build_vignettes = TRUE)

This package downloads adverts for dog and cat sales and converts them to a data frame.

Downloading webpages

We start by downloading the adverts. First, go to the relevant listings page, for example:

http://www.gumtree.com.au/s-dogs-puppies/sa/puppy/page-01
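The trailing page-01 in the URL above suggests that results pages are numbered. As a rough sketch only, assuming the two-digit zero-padded pattern holds for later pages (this is not confirmed here), the page URLs could be generated as follows. The helper page_urls is hypothetical and is not part of adParser.

```r
# Hypothetical helper, not part of adParser.
# Assumption: every results page uses the same "page-NN" suffix,
# zero-padded to two digits, as in the example URL.
base_url <- "http://www.gumtree.com.au/s-dogs-puppies/sa/puppy/page-"

page_urls <- function(n) {
  # seq_len(n) gives 1..n; sprintf pads each number to two digits
  paste0(base_url, sprintf("%02d", seq_len(n)))
}

urls <- page_urls(3)
print(urls)
```

This only builds the address strings; it does not download anything.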

Find the total number of results pages, then use

library(adParser)
get_ads(n = 1, page = "gumtree", animal = "dog", dir = "~/Desktop/")

The possible values of page are

and the possible values of animal are

This saves the adverts in the folder ~/Desktop like so:

gumtree-dog-1-1-2015-10-08.html
gumtree-dog-1-2-2015-10-08.html
gumtree-dog-1-3-2015-10-08.html
gumtree-dog-1-4-2015-10-08.html
gumtree-dog-1-5-2015-10-08.html 
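The file names above appear to encode the page, the animal, two numeric indices, and the download date. The sketch below splits them into fields using base R only. The meaning of the two numeric indices is not stated here, so they are given neutral names; file_info and the column names are illustrative assumptions, not adParser output.

```r
# Two of the file names listed above
files <- c("gumtree-dog-1-1-2015-10-08.html",
           "gumtree-dog-1-5-2015-10-08.html")

# Drop the extension, then split each name on hyphens (7 parts each)
parts <- strsplit(sub("\\.html$", "", files), "-", fixed = TRUE)

# Assumed layout: page, animal, two indices, then year-month-day
file_info <- do.call(rbind, lapply(parts, function(p) {
  data.frame(
    page         = p[1],
    animal       = p[2],
    first_index  = as.integer(p[3]),
    second_index = as.integer(p[4]),
    date         = as.Date(paste(p[5:7], collapse = "-")),
    stringsAsFactors = FALSE
  )
}))

print(file_info)
```

In practice the files vector could come from list.files() pointed at the download folder.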

Parsing folder into data frame

Once you have a set of HTML pages, you can parse these into a data frame using

parse_folder("~/Desktop/", page = "gumtree", animal = "dog", date = "2015")
Source: local data frame [30 x 14]

                                                            title        dob            offered
                                                            (chr)      (chr)              (chr)
1  1 girl left Border collie Cross X Swiss Shepherd Cross Puppies 24/10/2015              Owner
2                                  Wanted: Small/Medium sized dog 17/03/2015              Owner
3                                              Cane Corso puppies 25/09/2015 Registered Breeder
4                                                     Dog trailer 19/11/2015              Owner
5                                                     Staffy pups 17/09/2015              Owner
6                              Urgently seeking a new loving home 01/09/2013              Owner
7                                                         Pig dog 01/02/2011              Owner
8            Chihuahua Longcoat Female Puppy - Registered Breeder 21/10/2015 Registered Breeder
9                             Wanted: Wanted non-shedding lap dog 19/11/2012              Owner
10                                       Give away maltese shitzu 19/06/2013              Owner
..                                                            ...        ...                ...
Variables not shown: price (chr), location (chr), phone (lgl), description (chr), lat (chr), long (chr),
  date_listed (chr), file (chr), page (chr), animal (chr), date (chr)

The result can be assigned to a variable for later use:

gumtree_2015 <- parse_folder("~/Desktop/", page = "gumtree", animal = "dog", date = "2015")
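The printed output shows every column except phone stored as character, including dob in day/month/year form. As a minimal sketch of one possible follow-up step, the code below converts dob to a Date and filters on it. It uses a small hand-built data frame with three rows copied from the output above, so it runs without adParser; the names ads and recent are illustrative.

```r
# Stand-in for part of the parse_folder() result (three rows from the output above)
ads <- data.frame(
  title   = c("Cane Corso puppies", "Staffy pups", "Pig dog"),
  dob     = c("25/09/2015", "17/09/2015", "01/02/2011"),
  offered = c("Registered Breeder", "Owner", "Owner"),
  stringsAsFactors = FALSE
)

# dob is character in day/month/year form; convert it to a Date
ads$dob <- as.Date(ads$dob, format = "%d/%m/%Y")

# Keep adverts whose dob falls in 2015 or later
recent <- ads[ads$dob >= as.Date("2015-01-01"), ]
print(recent)
```

The same conversion should apply to the dob column of gumtree_2015, provided all of its values follow the format shown.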


jonotuke/adParser documentation built on May 19, 2019, 8:34 p.m.