I apologize to all my Australian customers: I have abandoned the Australia Car Database to focus on updating the European and American car databases more frequently. I can update those myself, and they have a higher sales volume. The reasons are:
1. The Australia Car Database was by far the most difficult database I have ever made. A few customers asked me to scrape data from Redbook, a website with anti-scraping technology, so I had to pay third-party programmers. Some of them were not trustworthy and disappeared, leaving me without updates. Given Australia's small population, my efforts never paid off in sales. Then, in January 2020, Redbook added a CAPTCHA that made data scraping impossible.
2. In July 2020 I was contacted by a Redbook lawyer asking me to cease and desist from selling the Australia database. Redbook's TOS forbid scraping data:
Personal and Non Commercial Use Only
(a) Use of the RedBook Website is for your personal and non-commercial use only. Except for the Material held in your computers cache or a single permanent copy of the Material for your personal use you must not without our prior written approval:
– use any automated process of any sort to query, access, retrieve, scrape, data-mine or copy any Material on the RedBook Website or generate or compile any document, index or database based on the Material published on the RedBook Website;
If you want to use Redbook data, you must contact Redbook for a quote. It will cost four figures in dollars, depending on your company size and the amount of data required. Many startups that could not afford Redbook's price tried to pay me or other freelancers a few hundred dollars to code a scraper that would get the data in Excel format. I did not want to steal customers from Redbook. I wanted to help startups that could not afford to buy from Redbook anyway, and once they grew, they could become direct Redbook customers and get real-time updates. However, this turned out to be too risky, because Redbook can take legal action against anyone who scrapes its data illegally.
If you want an Australia car database from me, we need to find a website that I can legally scrape data from. This rules out Redbook, Carsales and Carsguide, because they all use the same Redbook API and therefore cannot be scraped, for both technical and legal reasons. Any suggestions?
I wrote this page in 2016 as a possible future project. I invited people interested in a car database for Australia to contact me and to suggest websites other than Redbook to scrape data from, because Redbook's TOS forbid scraping. I told visitors that Redbook sells its data, so it would be ILLEGAL for me to scrape data from a seller and resell it myself, and I asked them to suggest other websites to scrape. However, once a few visitors insisted that I scrape Redbook because it is the BEST, I felt I had no choice but to help them…
I created the Australia car database in June 2017 using Octoparse.com, a FREE scraper that was slow and buggy. I had to run it in small batches of 1000-2000 cars, and the project took 2 weeks to complete. I updated the database every 3-4 months, scraping only the last 2 years of models, and each update took 2-3 days.
In February 2019 I worked with an Australian student who wrote a Python script that took only 2 days to scrape all the cars. He gave me a .py scraper with errors, which I reported and he then fixed. However, instead of giving me the final scraper, he set it up to run on his own server on a monthly schedule and gave me login details to download the CSV. Everything looked perfect, but in April 2019 he disappeared. He never replied to emails, never signed in to Skype, and his server was shut down, leaving me and my customers with no updates.
In July 2019 I hired a Python programmer from India who turned out to be a scam artist. He claimed to have made over 80 databases using Python, but they turned out to be free/open data downloaded from the internet. He spent 4 months "working" and kept asking me for payments, but he never gave me an error-free script. I started scraping with his script in November 2019. It took 20 days to finish, so I was not able to deliver an update to customers until December 2019, and that update still had errors.
In January 2020 Redbook added a CAPTCHA that appears every time the website is accessed from a new browser, which blocked my scraper. Instead of apologizing for his errors and trying to do a better job next time, the Indian programmer refused to help me because I had blamed his inexperience for the errors he made in 2019. I asked 4 other programmers to add a cookie feature to his Python script to get past the CAPTCHA, but none of them succeeded. One sent me this reply:
Sat, May 30, 8:02 PM
We looked at this in detail, and discovered that Redbook are using a sophisticated AI-based protection system called DataDome. It's not just the cookie that it is using to protect the data, it's a load of other metrics, like access frequency, coverage and other technical metrics.
You can read about it here: https://datadome.co
We decided after some experiments that it would not be viable to bypass this system.