This R package collects data from sahibinden.com, a popular classifieds website in Turkey.
This package is intended to provide easy access to classifieds data on sahibinden.com through web scraping, and it is designed for personal use. It is the user's responsibility to check the legal aspects of accessing and using the data. As a reminder, all the services accessed by this package are publicly available on the internet and can be used by anyone.
At the moment, the package gives easy access to housing classifieds for sale on sahibinden.com.
Other types of classifieds (cars, electronic devices, household goods, etc.) as well as rental housing classifieds are not supported.
devtools is your friend:
devtools::install_github(repo = "bhakyuz/sahibinden")
To get the HTML of classifieds listing pages, along with metadata about them:
search_for_sale()
The obtained HTML can then be converted into tabular data using parse_classifieds().
The main filtering options for accessing data are location based. Location information can be retrieved with the following functions, ordered hierarchically from larger geographic areas to smaller ones:
* Get list of countries:
get_countries()
# A tibble: 235 x 14
# id name abbreviation language displayOrder sortOrder phoneCode status detail_location… detail_lat detail_lon detail_zoom
# <int> <chr> <chr> <chr> <int> <int> <chr> <chr> <int> <dbl> <dbl> <int>
# 1 1 Türk… TR tr 1 1 +90 ACTIVE 4 39.0 35.2 1
# 2 257 ABD … "" tr 2 2 +1340 ACTIVE 442 18.3 -64.9 1
# 3 61 Afga… "" tr 2 3 +93 ACTIVE 70 33.9 67.7 1
# 4 4 Alma… DE tr 2 4 +49 ACTIVE 6 51.2 10.5 1
# 5 3 Amer… "" tr 2 5 +1 ACTIVE 2 37.1 -95.7 1
# 6 62 Amer… "" tr 2 6 +1684 ACTIVE 72 -14.3 -170. 1
# 7 63 "And… AN tr 2 7 +376 ACTIVE 74 42.5 1.60 1
# 8 64 "Ang… "" tr 2 8 +244 ACTIVE 76 -11.2 17.9 1
# 9 65 "Ang… "" tr 2 9 +1264 ACTIVE 78 18.2 -63.1 1
# 10 66 Anta… "" tr 2 10 +672 ACTIVE 80 -82.9 -135 0
# … with 225 more rows, and 2 more variables: detail_countryId <int>, active <lgl>
get_cities(address_country = 1) # list of cities in Turkey
# A tibble: 83 x 23
# id name tag country_id country_name country_abbrevi… country_language country_display… country_sortOrd… country_phoneCo…
# <int> <chr> <chr> <int> <chr> <chr> <chr> <int> <int> <chr>
# 1 34 İsta… ista… 1 Türkiye TR tr 1 1 +90
# 2 10001 İsta… ista… 1 Türkiye TR tr 1 1 +90
# 3 10002 İsta… ista… 1 Türkiye TR tr 1 1 +90
# 4 6 Anka… anka… 1 Türkiye TR tr 1 1 +90
# 5 35 İzmir izmir 1 Türkiye TR tr 1 1 +90
# 6 1 Adana adana 1 Türkiye TR tr 1 1 +90
# 7 2 Adıy… adiy… 1 Türkiye TR tr 1 1 +90
# 8 3 Afyo… afyo… 1 Türkiye TR tr 1 1 +90
# 9 4 Ağrı agri 1 Türkiye TR tr 1 1 +90
# 10 68 Aksa… aksa… 1 Türkiye TR tr 1 1 +90
# … with 73 more rows, and 13 more variables: country_status <chr>, country_active <lgl>, displayOrder <int>, sortOrder <int>,
# status <chr>, kmlId <int>, detail_location_id <int>, detail_lat <dbl>, detail_lon <dbl>, detail_zoom <int>,
# detail_cityId <int>, active <lgl>, label <chr>
get_towns(address_city = 34) # list of towns in Istanbul, Turkey
# A tibble: 39 x 27
# sortOrder displayOrder kmlId name id status city_sortOrder city_displayOrd… city_kmlId city_tag city_name city_id
# <int> <int> <int> <chr> <int> <chr> <int> <int> <int> <chr> <chr> <int>
# 1 0 0 1103 Adal… 438 ACTIVE 1 1 34 istanbul İstanbul 34
# 2 0 0 2048 Arna… 420 ACTIVE 1 1 34 istanbul İstanbul 34
# 3 0 0 2049 Ataş… 447 ACTIVE 1 1 34 istanbul İstanbul 34
# 4 0 0 2003 Avcı… 429 ACTIVE 1 1 34 istanbul İstanbul 34
# 5 0 0 2004 Bağc… 432 ACTIVE 1 1 34 istanbul İstanbul 34
# 6 0 0 2005 Bahç… 431 ACTIVE 1 1 34 istanbul İstanbul 34
# 7 0 0 1166 Bakı… 416 ACTIVE 1 1 34 istanbul İstanbul 34
# 8 0 0 2050 Başa… 434 ACTIVE 1 1 34 istanbul İstanbul 34
# 9 0 0 1886 Bayr… 417 ACTIVE 1 1 34 istanbul İstanbul 34
# 10 0 0 1183 Beşi… 418 ACTIVE 1 1 34 istanbul İstanbul 34
# … with 29 more rows, and 15 more variables: city_country_abbreviation <chr>, city_country_sortOrder <int>,
# city_country_displayOrder <int>, city_country_phoneCode <chr>, city_country_pinRequired <lgl>, city_country_name <chr>,
# city_country_language <chr>, city_country_id <int>, city_country_status <chr>, city_status <chr>, detail_townId <int>,
# detail_lat <dbl>, detail_lon <dbl>, detail_location_id <int>, detail_zoom <int>
get_districts(address_town = 438) # list of districts in Adalar, Istanbul, Turkey
# A tibble: 9 x 39
# sortOrder displayOrder status name id town_city_kmlId town_city_sortO… town_city_displ… town_city_tag town_city_status
# <int> <int> <chr> <chr> <int> <int> <int> <int> <chr> <chr>
# 1 1 0 ACTIVE Burg… 2096 34 1 1 istanbul ACTIVE
# 2 1 0 ACTIVE Burg… 2096 34 1 1 istanbul ACTIVE
# 3 2 0 ACTIVE Büyü… 2094 34 1 1 istanbul ACTIVE
# 4 2 0 ACTIVE Büyü… 2094 34 1 1 istanbul ACTIVE
# 5 5 0 ACTIVE Heyb… 2095 34 1 1 istanbul ACTIVE
# 6 6 0 ACTIVE Kına… 2097 34 1 1 istanbul ACTIVE
# 7 7 0 ACTIVE Merk… 5427 34 1 1 istanbul ACTIVE
# 8 7 0 ACTIVE Merk… 5427 34 1 1 istanbul ACTIVE
# 9 7 0 ACTIVE Merk… 5427 34 1 1 istanbul ACTIVE
# … with 29 more variables: town_city_name <chr>, town_city_id <int>, town_city_country_abbreviation <chr>,
# town_city_country_phoneCode <chr>, town_city_country_sortOrder <int>, town_city_country_displayOrder <int>,
# town_city_country_pinRequired <lgl>, town_city_country_status <chr>, town_city_country_name <chr>,
# town_city_country_language <chr>, town_city_country_id <int>, town_kmlId <int>, town_sortOrder <int>, town_displayOrder <int>,
# town_status <chr>, town_name <chr>, town_id <int>, quarter_kmlId <int>, quarter_sortOrder <int>, quarter_displayOrder <int>,
# quarter_displayable <lgl>, quarter_status <chr>, quarter_name <chr>, quarter_id <int>, quarter_detail_quarterId <int>,
# quarter_detail_location_id <int>, quarter_detail_zoom <int>, quarter_detail_lat <dbl>, quarter_detail_lon <dbl>
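Because the search functions filter by numeric IDs, it can help to look an ID up by name first. The following is a minimal sketch, assuming the location tables carry the `id`, `name` and `tag` columns shown in the get_cities() output above. The small data frame is a hand-made stand-in for that output so the example runs offline, and `find_location_id` is a hypothetical helper, not part of the package.

```r
# Stand-in for a few rows of get_cities(address_country = 1); only the
# id, name and tag columns from the printed output are reproduced here.
cities <- data.frame(
  id   = c(35L, 1L, 4L),
  name = c("İzmir", "Adana", "Ağrı"),
  tag  = c("izmir", "adana", "agri"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of the package): return the id(s) of the
# rows whose ASCII tag matches one of the requested tags exactly.
find_location_id <- function(locations, tag) {
  locations$id[locations$tag %in% tag]
}

izmir_id <- find_location_id(cities, "izmir")
izmir_id
```

Matching on `tag` rather than `name` avoids problems with Turkish characters. The resulting ID could then be passed on as a filter, for example as the `address_city` argument of search_for_sale().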
ex <- search_for_sale(address_city = 34)
ex <- search_for_sale(address_city = 34:35)
summary(ex)
# Length Class Mode
# content 2 xml_document list
# url 1 -none- character
# hashed_url 1 -none- character
# next_page_url 1 -none- character
# prev_page_url 1 -none- character
# meta 11 -none- list
ex <- search_for_sale(address_town = 438)
ex <- search_for_sale(address_town = c(438, 420))
ex <- search_for_sale(address_district = 2096)
ex <- search_for_sale(address_district = c(2096, 2094))
ex <- search_for_sale(address_quarter = 60689)
ex <- search_for_sale(address_quarter = c(60689, 22948, 22945))
ex <- search_for_sale(address_city = 34)
parse_classifieds(ex$content)
# x emlak_tipi ilan_basligi m_brut oda_sayisi fiyat ilan_tarihi mahalle x_1 classified_url store_url id
# <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
# 1 "" Daire ALİBEYKÖY MERK… 160 3+1 500.0… 02 Haziran … Alibey… "" https://www.sahib… https://y… 6901…
# 2 "" Daire YAŞAR İNŞAATTA… 65 1+1 150.0… 02 Haziran … Çırçır… "" https://www.sahib… https://y… 6124…
# 3 "" Daire MURAT EMLAK'ta… 120 3+1 365.0… 02 Haziran … Akşems… "" https://www.sahib… https://m… 6901…
# 4 "" Daire **MERT EMLAK**… 95 2+1 385.0… 02 Haziran … Çırçır… "" https://www.sahib… https://m… 6589…
# 5 "" Daire BEYAZNOKTADAN … 105 2+1 325.0… 02 Haziran … Akşems… "" https://www.sahib… https://b… 6973…
# 6 "" Daire METROYA 250M B… 125 3+1 540.0… 02 Haziran … Alibey… "" https://www.sahib… https://t… 6797…
# 7 "" Daire GÜLER EMLAK 'T… 106 2+1 335.0… 02 Haziran … Çırçır… "" https://www.sahib… https://g… 5343…
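All columns in the parsed output are character vectors, so numeric fields need converting before analysis. The following is a minimal sketch, assuming `m_brut` is the gross area in square metres and that `fiyat` holds prices written with "." as a thousands separator followed by a currency label. The printed output above truncates the prices, so the `fiyat` strings below are assumed examples, and `to_number` is a hypothetical helper, not part of the package.

```r
# Stand-in for two rows of parse_classifieds() output. The fiyat strings
# are ASSUMED examples; the real format may differ.
classifieds <- data.frame(
  m_brut = c("160", "65"),
  fiyat  = c("500.000 TL", "150.000 TL"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: drop every non-digit character, then convert.
# This treats "." as a thousands separator, so it would give wrong
# results for values that carry decimals.
to_number <- function(x) as.numeric(gsub("[^0-9]", "", x))

classifieds$m_brut_num <- to_number(classifieds$m_brut)
classifieds$fiyat_num  <- to_number(classifieds$fiyat)

# Price per gross square metre
classifieds$fiyat_per_m2 <- classifieds$fiyat_num / classifieds$m_brut_num
```

Keeping the original character columns next to the converted ones makes it easy to spot rows where the assumed format does not hold.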