Sorry to all Australian customers: I stopped selling the Australia Car Database because it no longer complies with the website title “The most updated automobile database”.
Out of the 100+ databases I made, the Australia Car Database was the MOST DIFFICULT of all my web scraping projects. Keeping the updates going for Australia required me to pay third-party programmers (some of them untrustworthy) and took too much time and effort compared with the other databases, which I made myself. Given the small population of Australia, the effort would never pay off from sales. I would rather spend my time increasing the update frequency of the European and American car databases, where sales volume is much higher.
I wrote this page in 2016 as a possible future project, inviting people to contact me if they were interested in a car database for Australia and to suggest websites to scrape data from, other than Redbook, because its TOS forbids scraping. I informed visitors that Redbook sells data, so it would be ILLEGAL for me to scrape data from a seller and sell it under my own name, and I asked them to suggest other websites instead. But once a few visitors insisted that I scrape from Redbook because it is the BEST, I had no other choice but to help them…
I created the Australia car database in June 2017 using Octoparse.com, a FREE scraper that was slow and buggy. I had to run it in small batches of 1000-2000 cars, and the project took 2 weeks to complete: 96 columns, without colors and features, which were not scrapable with Octoparse. I updated it every 3-4 months, each update taking 2-3 days.
In February 2019 I worked with an Australian student who wrote a Python script that took only 2 days to scrape all cars. He gave me a BETA .py scraper that had errors and was missing Colors and Optional Features. I reported these problems and he fixed them afterwards, but instead of giving me the final scraper, he set it up to run and update monthly on his server, and the agreement was that he would give me login info to download the CSV. Everything looked perfect, and his database also included Colors and Optional Features, but in April 2019 he became uncontactable and the server was shut down, leaving me and my customers with no updates.
In July 2019 I hired another Python programmer, from India, who impressed me by saying that he had created over 80 databases in Python. I gave him the BETA scraper from the Australian student so he could fix its errors. He turned out to be a liar and very inexperienced: the way he tried to fix the errors in the Australian student’s scraper caused more and more errors, and I wasted 4 months testing his scraper and reporting errors. He took advantage of my lack of Python knowledge and my lack of time to check his code carefully, and kept demanding extra money over the price we had initially agreed (NO legitimate programmer would charge extra money for fixing his own errors).
In September he offered to sell me his portfolio of 80+ databases for 100 EURO, which I paid. In October, when I was less busy and checked them, I realized that at least 90% of them were open data available for free download on various sites (he was LYING that he had created them himself). After providing 2 temporary updates with 2019 models in September and November, I started a fresh scrape of all 1960-2019 models in late November and published the final version on 17 December 2019, but due to coding typos 11 columns do not have data, and car names were initially wrong (fixed in January). I fixed all the errors in his code myself and was planning to run the scraper again on all 100,000 cars, but I never got the chance, because in January 2020 the source website added a CAPTCHA. The programmer kept making false promises, so I tried to hire other programmers to add a cookie feature to his Python script to bypass the CAPTCHA, but… got this reply:
Sat, May 30, 8:02 PM
We looked at this in detail, and discovered that Redbook are using a sophisticated AI-based protection system called DataDome. It's not just the cookie that it is using to protect the data, it's a load of other metrics, like access frequency, coverage and other technical metrics.
You can read about it here: https://datadome.co
We decided after some experiments that it would not be viable to bypass this system.
Besides the fact that Redbook is now nearly impossible to scrape data from, I do not want to break Redbook's TOS anymore and risk legal trouble for just a few sales per year.