Of the 100+ databases I have made, the Australia Car Database was the MOST DIFFICULT of all my web scraping projects. Last release: 17 December 2019.
BIG database – download SAMPLES:
Australia Car Database SAMPLE 2019 edition (116 columns)
Australia Car Database SAMPLE original 2017 edition (96 columns)
Buy complete database + FREE updates for 1 year:
Small database – download SAMPLE:
Australia Make-Model-Year SAMPLE.xls
Buy complete database + FREE updates for 1 year:
Contact me for custom packages! For example, you can ask for model naming, engine power and torque, or wheel and tire dimensions, for 1990-2019, 2005-2019 or whatever range you need. Price will be 0.3 eurocents / model.
Description & notes
First made in June 2017 using Octoparse.com, a FREE scraper that is slow and buggy. I had to run it in small batches of 1000-2000 cars, and the project took 2 weeks to complete. The result had 96 columns, without Colors and Features, which were not scrapable with Octoparse.
In mid-2017 the source website removed the META tags I used to obtain Make and Model separately from the full car name, and also removed a column highly appreciated by my customers: VIN. For this reason it was not worth my effort to re-scrape all cars. Instead, I updated the database every 3 months by scraping only the latest year of cars and manually splitting Make and Model, leaving older cars unchanged. I removed the “trade-in prices” columns because those values vary over time.
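The manual Make/Model split described above could in principle be partly automated by matching the start of each car name against a list of known makes. The following is only a minimal sketch under assumptions: it assumes the full car name begins with the make (any leading year already removed), the function name `split_make_model` is hypothetical, and the short `KNOWN_MAKES` list is an illustrative subset of the makes in this database. The longest make is tried first so that "Ford Performance Vehicles" is not mistaken for "Ford".

```python
# Illustrative subset of makes; a real run would use the full list of makes.
KNOWN_MAKES = [
    "Alfa Romeo",
    "Ford",
    "Ford Performance Vehicles",
    "Holden",
    "Holden Special Vehicles",
    "Land Rover",
    "Mercedes-Benz",
]


def split_make_model(full_name, makes=KNOWN_MAKES):
    """Split a car name into (make, rest of name).

    Assumes the name starts with the make. Returns (None, name) when no
    known make matches, so such rows can be reviewed by hand.
    """
    name = full_name.strip()
    lowered = name.lower()
    # Try longer makes first so multi-word makes win over their prefixes.
    for make in sorted(makes, key=len, reverse=True):
        prefix = make.lower()
        if lowered == prefix or lowered.startswith(prefix + " "):
            return make, name[len(make):].strip()
    return None, name


if __name__ == "__main__":
    for example in ["Ford Falcon XR6", "Ford Performance Vehicles GT", "Unknown Car"]:
        print(split_make_model(example))
```

Names that return `None` as the make would still need manual handling, and separating the model from the badge or series in the remainder is not attempted here.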
In February 2019 I worked with an Australian student who wrote a Python script that took only 2 days to scrape all cars. He gave me a BETA .py scraper that had errors and was missing Colors and Optional Features. I reported these and he fixed them afterwards, but instead of giving me the final scraper, he set it up to run and update monthly on his server, and the agreement was that he would give me login info to download the CSV. Everything looked perfect, and his database also included Colors and Optional Features, but in April 2019 he became uncontactable and the server was shut down, leaving me and my customers with no updates.
In July 2019 I hired another Python programmer, from India, who impressed me by saying that he had created over 80 databases in Python. I gave him the BETA scraper from the Australian student so he could fix its errors. He turned out to be a liar and very inexperienced: the way he fixed errors in the Australian student’s scraper caused more and more errors, and I wasted 4 months testing his scraper and reporting errors. He abused my lack of Python knowledge and my lack of time to check his code carefully, and kept demanding extra money on top of the price we had initially agreed (NO legitimate programmer would charge extra for fixing his own errors).
In September he offered to sell me his portfolio of 80+ databases for 100 EURO, which I paid. In October, when I was less busy, I checked them and realized that at least 90% were open data available for free download on various sites (he was LYING when he said he had created them himself). After providing 2 temporary updates with 2019 models in September and November, I started a fresh scrape of all 1960-2019 models in late November and published the final version on 17 December 2019.
By buying the Australia Full Specs database you will get 3 versions:
The December 2019 version, made by the Indian programmer, in which 11 columns have no data because of his unfixed typos (these are columns with rare data, available for less than 5% of cars). For this reason I also offer the March 2019 update made by the Australian student, with all columns filled. I also offer the first version from June 2017, which includes the VIN column.
Shortly after Christmas, when I was less busy, I checked his Python code in detail and spent 5 hours fixing all the errors that he had not fixed in 4 months (even though I had never coded in Python before). The scraper now fills all columns except Colors and Optional Equipment (the Indian programmer wants another $100 for this and promises to do it in 3 days. Are you fucking serious? Any skilled programmer wouldn’t spend more than 1 hour on this! I already waited for 4 months). I was planning to run the scraper again on all 100,000 cars once I found a skilled programmer to help me with the code for Colors and Optional Equipment, BUT…
In January 2020 the source website added a CAPTCHA that appears every time I access the website from a new browser. The Indian programmer, instead of apologizing for his errors and trying to do a better job next time, refuses to help me because I blamed his inexperience for the errors made in 2019. I asked other programmers for help, but nobody succeeded. Anyone who knows Python and wants to give it a try, please contact me!
The source website may add more anti-scraping measures in the future, which would require me to pay a programmer again and again to adapt the scraper so that the database can keep updating automatically; that would be a waste of money. Alternatively, I can solve the CAPTCHA in a normal browser, manually save each car page to my computer, and then scrape the pages locally; that would be a waste of time. In both cases, updating the Australian database requires a HUGE effort that will never pay off, so I might abandon this database. For comparison, the American Year-Make-Model-Trim-Specs database takes 5x less effort per update and its sales are 5x higher than those of the Australian car database.
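For anyone considering the "save pages manually, then scrape locally" route, here is a rough sketch of what the local step might look like using only the Python standard library. Everything specific in it is an assumption: the folder of saved `.html` files, the function names `extract_title` and `scrape_saved_pages`, and the idea that the page `<title>` holds the car name. The real pages would need their own selectors for each spec field.

```python
from html.parser import HTMLParser
from pathlib import Path


class TitleParser(HTMLParser):
    """Collects the text inside the <title> element of a page."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def extract_title(html_text):
    """Return the stripped <title> text, or an empty string if there is none."""
    parser = TitleParser()
    parser.feed(html_text)
    return parser.title.strip()


def scrape_saved_pages(folder):
    """Read every saved .html file in a folder (hypothetical layout) and
    return one dict per page with the file name and its title."""
    rows = []
    for path in sorted(Path(folder).glob("*.html")):
        text = path.read_text(encoding="utf-8", errors="replace")
        rows.append({"file": path.name, "title": extract_title(text)})
    return rows


if __name__ == "__main__":
    sample = "<html><head><title> 2019 Toyota Corolla </title></head><body></body></html>"
    print(extract_title(sample))
```

Parsing local files avoids the CAPTCHA entirely, but the manual saving of each page remains the time-consuming part.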
Full story and changelog: https://www.teoalida.com/cardatabase/australia-car-database-changelog/
List of 125 car makes included
Abarth, Alfa Romeo, Alpina, Alpine, Armstrong Siddeley, Asia Motors, Aston Martin, Audi, Austin, Austin Healey, Australian Classic Car, Bedford, Bentley, Bertone, Blade, BMC, BMW, Bolwell, Bufori, Bugatti, Buick, Cadillac, Caterham, Chery, Chevrolet, Chrysler, Citroen, Commer, CSV, Daewoo, Daihatsu, Daimler, Datsun, De Tomaso, Dodge, Elfin, Eunos, Ferrari, Fiat, Ford, Ford Performance Vehicles, Foton, FSM, Geely, Genesis, Giocattolo, Great Wall, Haval, HDT, Hillman, Hino, Holden, Holden Special Vehicles, Honda, Humber, Hummer, Hyundai, INFINITI, International, ISO, Isuzu, Jaguar, Jeep, Jensen, JMC, Kia, KTM, Lada, Lamborghini, Lancia, Land Rover, LDV, Lexus, Leyland, Lightburn, Lincoln, Lotus, Mahindra, Maserati, Maybach, Mazda, McLaren, Mercedes-Benz, MG, MINI, Mitsubishi, Morgan, Morris, Nissan, Noble, NSU, Opel, Pagani, Peugeot, Pontiac, Porsche, Proton, RAM, Rambler, Renault, Robnell, Rolls-Royce, Rover, Saab, Seat, Simca, SKODA, smart, SsangYong, Studebaker, Subaru, Suzuki, Tata, TD 2000, Tesla, Toyota, TRD, Triumph, TVR, Vanden Plas, Vauxhall, Volkswagen, Volvo, Wolseley, ZX Auto.
Data fields included
Percentages are calculated for the 1960-2017 database. If you buy 1990-2017, 2000-2017, etc., you will get a higher completion ratio. Certain data fields are available for recent cars only; for example, Fuel Consumption is available only from the 2000s onward.
Naming: Full car name 100%, ID 100%, Make 100%, Model 100%, Year 100%, Price 97.16%, Image URL 69.76%.
Description: Body 100%, Engine 99.94%, Transmission and Drivetrain 100%, Fuel Type 100%, Fuel Consumption 52.84%.
Overview: Badge 100%, Series 100%, Body 100%, No. Doors 99.97%, Seat Capacity 98.53%, Transmission 100%, Number of Gears 99.99%, Drive 99.99%, FuelType 100%, Recommended RON Rating 55.10%, Release Year 100%, VIN 77.14%, Country of Origin 99.97%, ANCAP Safety Rating 29.29%, Overall Green Star Rating 32.33%, Text Description 25.81%.
Engine: Engine Type 99.99%, Engine Location 97.77%, Engine Size 99.92%, Induction 99.94%, Engine Configuration 78.95%, Cylinders 99.93%, Camshaft 86.72%, Valves/Ports per Cylinder 86.19%, Compression ratio 74.18%, Engine Code 58.01%, Power 85.10%, Torque 81.67%, Power to Weight Ratio 81.42%, Acceleration 0-100km/h 37.14%, Maximum Speed 0.39%.
Fuel: Fuel Type 100%, Fuel Capacity 78.03%, RON Rating 55.10%, Maximum Ethanol Blend 78.03%, Fuel Delivery 99.92%, Method of Delivery 99.90%, Fuel Consumption Combined 52.84%, Fuel Consumption Extra Urban 56.27%, Fuel Consumption Urban 58.10%, Fuel Average Distance (km) 52.61%, Fuel Maximum Distance 55.69%, Fuel Minimum Distance 57.33%, CO2 Emissions Combined 51.83%, CO2 Extra Urban 30.36%, CO2 Urban 30.40%, Greenhouse Rating 32.05%, Air Pollution Rating 32.07%, Green Star Rating 32.32%, Emission Standard 25.57%.
Dimensions and weight: Length 83.34%, Width 83.31%, Height 83.03%, Wheelbase 83.64%, Track Front 77.29%, Track Rear 77.28%, Tare Mass 61.18%, Kerb Weight 75.25%, Gross Vehicle Mass 58.79%, Gross Combination Mass 31.62%, Payload 46.69%, Boot Load Space Min 9.93%, Boot Load Space Max 10.96%, Towing Capacity braked 62.81%, Towing Capacity Unbraked 59.22%, Load Length 4.22%, Load Width 4.17%, Width Between Wheel Arches 2.28%.
Warranty: Warranty in Years from First Registration 72.28%, Warranty in Km 70.45%, Warranty Customer Assistance 47.05%, Warranty Anti Corrosion from First Registration 13.23%, Free Scheduled Service 0.98%, First Service Due in Km 42.11%, First Service Due in Months 34.40%, Regular Service Interval in Km 43.86%, Regular Service Interval in Months 39.66%.
Steering and Wheels: Steering 65.04%, Rim Material 39.66%, Front Rim Description 79.46%, Rear Rim Description 79.46%, Front Tyre Description 75.54%, Rear Tyre Description 75.55%.
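Completion percentages like the ones listed above can be recomputed for any year range from the CSV. The sketch below is one way to do it and not necessarily how the published figures were produced: the function name `completion_ratios` is hypothetical, and the four sample rows are made-up values used only to show why restricting to newer cars raises the ratio for a field such as Fuel Consumption.

```python
import csv
import io

# Made-up sample rows for illustration only (not real database content).
SAMPLE_CSV = """Year,Make,Fuel Consumption
1985,Holden,
1995,Ford,
2005,Toyota,7.4
2015,Mazda,6.1
"""


def completion_ratios(rows, min_year=None, year_field="Year"):
    """Return {column: percent of rows with a non-empty value}.

    If min_year is given, only rows with Year >= min_year are counted.
    """
    selected = [
        r for r in rows
        if min_year is None or int(r[year_field]) >= min_year
    ]
    if not selected:
        return {}
    total = len(selected)
    return {
        col: round(100 * sum(1 for r in selected if (r[col] or "").strip()) / total, 2)
        for col in selected[0]
    }


if __name__ == "__main__":
    rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
    print(completion_ratios(rows))
    print(completion_ratios(rows, min_year=2000))
```

With the sample rows, Fuel Consumption is filled for half of all cars but for every car from 2000 onward, which mirrors the effect described for the 1990-2017 and 2000-2017 packages.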