
sahibinden

This R package collects data from sahibinden.com, a popular classifieds website in Turkey.

Notice on accessing publicly available data

This package is intended to make it easy to access classifieds data on sahibinden.com through web scraping, and it is designed for personal use. It is the user's responsibility to check the legal aspects of accessing and using the data. As a reminder, all the services accessed by this package are publicly available on the internet and can be used by anyone.

What is included?

At the moment, the package gives easy access to housing classifieds for sale on sahibinden.com.

What is not included?

Other types of classifieds (cars, electronic devices, household goods, etc.) and rental housing classifieds are not supported.

Installation

devtools is your friend: devtools::install_github(repo = "bhakyuz/sahibinden")

Usage

Get classifieds

To download the HTML of classifieds listing pages along with metadata about them, use search_for_sale(). The downloaded HTML can then be converted into tabular data with parse_classifieds().

Get localities for filtering

The main filtering options for accessing data are location based. Location information can be retrieved with the following functions, ordered hierarchically from larger geographic areas to smaller ones:

* Get the list of countries:

get_countries()
# A tibble: 235 x 14
#       id name  abbreviation language displayOrder sortOrder phoneCode status detail_location… detail_lat detail_lon detail_zoom
#    <int> <chr> <chr>        <chr>           <int>     <int> <chr>     <chr>             <int>      <dbl>      <dbl>       <int>
#  1     1 Türk… TR           tr                  1         1 +90       ACTIVE                4       39.0      35.2            1
#  2   257 ABD … ""           tr                  2         2 +1340     ACTIVE              442       18.3     -64.9            1
#  3    61 Afga… ""           tr                  2         3 +93       ACTIVE               70       33.9      67.7            1
#  4     4 Alma… DE           tr                  2         4 +49       ACTIVE                6       51.2      10.5            1
#  5     3 Amer… ""           tr                  2         5 +1        ACTIVE                2       37.1     -95.7            1
#  6    62 Amer… ""           tr                  2         6 +1684     ACTIVE               72      -14.3    -170.             1
#  7    63 "And… AN           tr                  2         7 +376      ACTIVE               74       42.5       1.60           1
#  8    64 "Ang… ""           tr                  2         8 +244      ACTIVE               76      -11.2      17.9            1
#  9    65 "Ang… ""           tr                  2         9 +1264     ACTIVE               78       18.2     -63.1            1
# 10    66 Anta… ""           tr                  2        10 +672      ACTIVE               80      -82.9    -135              0
# … with 225 more rows, and 2 more variables: detail_countryId <int>, active <lgl>
get_cities(address_country = 1) # list of cities in Turkey 
# A tibble: 83 x 23
#       id name  tag   country_id country_name country_abbrevi… country_language country_display… country_sortOrd… country_phoneCo…
#    <int> <chr> <chr>      <int> <chr>        <chr>            <chr>                       <int>            <int> <chr>           
#  1    34 İsta… ista…          1 Türkiye      TR               tr                              1                1 +90             
#  2 10001 İsta… ista…          1 Türkiye      TR               tr                              1                1 +90             
#  3 10002 İsta… ista…          1 Türkiye      TR               tr                              1                1 +90             
#  4     6 Anka… anka…          1 Türkiye      TR               tr                              1                1 +90             
#  5    35 İzmir izmir          1 Türkiye      TR               tr                              1                1 +90             
#  6     1 Adana adana          1 Türkiye      TR               tr                              1                1 +90             
#  7     2 Adıy… adiy…          1 Türkiye      TR               tr                              1                1 +90             
#  8     3 Afyo… afyo…          1 Türkiye      TR               tr                              1                1 +90             
#  9     4 Ağrı  agri           1 Türkiye      TR               tr                              1                1 +90             
# 10    68 Aksa… aksa…          1 Türkiye      TR               tr                              1                1 +90             
# … with 73 more rows, and 13 more variables: country_status <chr>, country_active <lgl>, displayOrder <int>, sortOrder <int>,
#   status <chr>, kmlId <int>, detail_location_id <int>, detail_lat <dbl>, detail_lon <dbl>, detail_zoom <int>,
#   detail_cityId <int>, active <lgl>, label <chr>
get_towns(address_city = 34) # list of towns in Istanbul, Turkey
# A tibble: 39 x 27
#    sortOrder displayOrder kmlId name     id status city_sortOrder city_displayOrd… city_kmlId city_tag city_name city_id
#        <int>        <int> <int> <chr> <int> <chr>           <int>            <int>      <int> <chr>    <chr>       <int>
#  1         0            0  1103 Adal…   438 ACTIVE              1                1         34 istanbul İstanbul       34
#  2         0            0  2048 Arna…   420 ACTIVE              1                1         34 istanbul İstanbul       34
#  3         0            0  2049 Ataş…   447 ACTIVE              1                1         34 istanbul İstanbul       34
#  4         0            0  2003 Avcı…   429 ACTIVE              1                1         34 istanbul İstanbul       34
#  5         0            0  2004 Bağc…   432 ACTIVE              1                1         34 istanbul İstanbul       34
#  6         0            0  2005 Bahç…   431 ACTIVE              1                1         34 istanbul İstanbul       34
#  7         0            0  1166 Bakı…   416 ACTIVE              1                1         34 istanbul İstanbul       34
#  8         0            0  2050 Başa…   434 ACTIVE              1                1         34 istanbul İstanbul       34
#  9         0            0  1886 Bayr…   417 ACTIVE              1                1         34 istanbul İstanbul       34
# 10         0            0  1183 Beşi…   418 ACTIVE              1                1         34 istanbul İstanbul       34
# … with 29 more rows, and 15 more variables: city_country_abbreviation <chr>, city_country_sortOrder <int>,
#   city_country_displayOrder <int>, city_country_phoneCode <chr>, city_country_pinRequired <lgl>, city_country_name <chr>,
#   city_country_language <chr>, city_country_id <int>, city_country_status <chr>, city_status <chr>, detail_townId <int>,
#   detail_lat <dbl>, detail_lon <dbl>, detail_location_id <int>, detail_zoom <int>
get_districts(address_town = 438) # list of districts in Adalar, Istanbul, Turkey
# A tibble: 9 x 39
#   sortOrder displayOrder status name     id town_city_kmlId town_city_sortO… town_city_displ… town_city_tag town_city_status
#       <int>        <int> <chr>  <chr> <int>           <int>            <int>            <int> <chr>         <chr>           
# 1         1            0 ACTIVE Burg…  2096              34                1                1 istanbul      ACTIVE          
# 2         1            0 ACTIVE Burg…  2096              34                1                1 istanbul      ACTIVE          
# 3         2            0 ACTIVE Büyü…  2094              34                1                1 istanbul      ACTIVE          
# 4         2            0 ACTIVE Büyü…  2094              34                1                1 istanbul      ACTIVE          
# 5         5            0 ACTIVE Heyb…  2095              34                1                1 istanbul      ACTIVE          
# 6         6            0 ACTIVE Kına…  2097              34                1                1 istanbul      ACTIVE          
# 7         7            0 ACTIVE Merk…  5427              34                1                1 istanbul      ACTIVE          
# 8         7            0 ACTIVE Merk…  5427              34                1                1 istanbul      ACTIVE          
# 9         7            0 ACTIVE Merk…  5427              34                1                1 istanbul      ACTIVE          
# … with 29 more variables: town_city_name <chr>, town_city_id <int>, town_city_country_abbreviation <chr>,
#   town_city_country_phoneCode <chr>, town_city_country_sortOrder <int>, town_city_country_displayOrder <int>,
#   town_city_country_pinRequired <lgl>, town_city_country_status <chr>, town_city_country_name <chr>,
#   town_city_country_language <chr>, town_city_country_id <int>, town_kmlId <int>, town_sortOrder <int>, town_displayOrder <int>,
#   town_status <chr>, town_name <chr>, town_id <int>, quarter_kmlId <int>, quarter_sortOrder <int>, quarter_displayOrder <int>,
#   quarter_displayable <lgl>, quarter_status <chr>, quarter_name <chr>, quarter_id <int>, quarter_detail_quarterId <int>,
#   quarter_detail_location_id <int>, quarter_detail_zoom <int>, quarter_detail_lat <dbl>, quarter_detail_lon <dbl>
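
Since the search functions take numeric ids, it can be convenient to look an id up by name from one of the tables above. The following is only a sketch: `locality_id()` is a hypothetical helper, not part of the package, and the `cities` data frame is a small mock that copies a few `id` and `name` values from the `get_cities()` output shown above so that the example runs without network access.

```r
# Hypothetical helper (not part of the package): return the id(s) of the
# rows whose name matches exactly.
locality_id <- function(localities, name) {
  localities$id[localities$name %in% name]
}

# Mock of a few rows from get_cities(address_country = 1), using ids and
# names that appear in full in the output above.
cities <- data.frame(
  id = c(35L, 1L, 4L),
  name = c("İzmir", "Adana", "Ağrı"),
  stringsAsFactors = FALSE
)

adana_id <- locality_id(cities, "Adana")

# With the package installed and network access, the lookup could feed a search:
# cities <- get_cities(address_country = 1)
# ex <- search_for_sale(address_city = locality_id(cities, "Adana"))
```

Matching is exact, so names must be spelled as they appear in the `name` column, including Turkish characters.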

Examples

ex <- search_for_sale(address_city = 34)
ex <- search_for_sale(address_city = 34:35)
summary(ex)
#               Length Class        Mode     
# content        2     xml_document list     
# url            1     -none-       character
# hashed_url     1     -none-       character
# next_page_url  1     -none-       character
# prev_page_url  1     -none-       character
# meta          11     -none-       list    
ex <- search_for_sale(address_town = 438)
ex <- search_for_sale(address_town = c(438, 420))
ex <- search_for_sale(address_district = 2096)
ex <- search_for_sale(address_district = c(2096, 2094))
ex <- search_for_sale(address_quarter = 60689)
ex <- search_for_sale(address_quarter = c(60689, 22948, 22945))
ex <- search_for_sale(address_city = 34)
parse_classifieds(ex$content)
#    x     emlak_tipi ilan_basligi    m_brut oda_sayisi fiyat  ilan_tarihi  mahalle x_1   classified_url     store_url  id   
#    <chr> <chr>      <chr>           <chr>  <chr>      <chr>  <chr>        <chr>   <chr> <chr>              <chr>      <chr>
#  1 ""    Daire      ALİBEYKÖY MERK… 160    3+1        500.0… 02 Haziran … Alibey… ""    https://www.sahib… https://y… 6901…
#  2 ""    Daire      YAŞAR İNŞAATTA… 65     1+1        150.0… 02 Haziran … Çırçır… ""    https://www.sahib… https://y… 6124…
#  3 ""    Daire      MURAT EMLAK'ta… 120    3+1        365.0… 02 Haziran … Akşems… ""    https://www.sahib… https://m… 6901…
#  4 ""    Daire      **MERT EMLAK**… 95     2+1        385.0… 02 Haziran … Çırçır… ""    https://www.sahib… https://m… 6589…
#  5 ""    Daire      BEYAZNOKTADAN … 105    2+1        325.0… 02 Haziran … Akşems… ""    https://www.sahib… https://b… 6973…
#  6 ""    Daire      METROYA 250M B… 125    3+1        540.0… 02 Haziran … Alibey… ""    https://www.sahib… https://t… 6797…
#  7 ""    Daire      GÜLER EMLAK 'T… 106    2+1        335.0… 02 Haziran … Çırçır… ""    https://www.sahib… https://g… 5343…
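
All columns returned by parse_classifieds() are character vectors, so some conversion is usually needed before analysis. The sketch below is an assumption about how that could be done, not something the package provides: `count_rooms()` is a hypothetical helper, and `listings` is a mock data frame that copies the `m_brut` and `oda_sayisi` values from the first rows shown above.

```r
# Mock rows copying the m_brut and oda_sayisi values shown above; in
# practice this would be the result of parse_classifieds(ex$content).
listings <- data.frame(
  m_brut = c("160", "65", "120"),
  oda_sayisi = c("3+1", "1+1", "3+1"),
  stringsAsFactors = FALSE
)

# Gross area as a number.
listings$m_brut_num <- as.numeric(listings$m_brut)

# Hypothetical helper: split values such as "3+1" on "+" and sum the parts.
count_rooms <- function(x) {
  parts <- strsplit(x, "+", fixed = TRUE)
  vapply(parts, function(p) sum(as.numeric(p)), numeric(1))
}

listings$total_rooms <- count_rooms(listings$oda_sayisi)
```

Values that do not follow the plain "number+number" pattern would come out as NA with a warning, so real data may need an extra cleaning step first.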

Next Steps



bhakyuz/sahibinden documentation built on June 12, 2019, 2:28 p.m.