For the Indian automobile market I provide databases of vehicle specifications and features for cars (source: Carwale), bikes (source: Bikewale), trucks and buses (source: CarDekho), plus car and bike dealer databases.
I DO NOT provide car owner or registration data; please don’t waste my time and yours asking me for such personal data.
Download FREE sample (one make) old format 2015-2020:
Car make, model, version, no specs (5 columns)
Car make, model, version, basic specs (28 columns)
Car make, model, version, full specs & features (188 columns)
Alternate formats: CSV and SQL (full specs & features)
New FREE samples from the December 2020 redesign (3 models from 3 makes):
Car make, model, version, no specs (5 columns)
Car make, model, version, basic specs (25 columns)
Car make, model, version, full specs & features (211 columns)
Buy FULL database (all makes) + FREE updates every month:
Coverage: the oldest car in the database with a year indicated is a 1994 Daewoo; Daewoo was the FIRST foreign car manufacturer to enter the Indian market, followed by Ford and Opel (1996), Fiat (1997), Honda and Hyundai (1998). The database may include pre-1994 models from Indian domestic brands, but no year is indicated for them.
Makes included: Ashok Leyland, Aston Martin, Audi, Bentley, BMW, Bugatti, Caterham, Chevrolet, Chrysler, Datsun, Daewoo, DC, Eicher Polaris, Ferrari, Fiat, Force Motors, Ford, Honda, Hindustan Motors, Humber, Hyundai, ICML, Isuzu, Jaguar, Jeep, Kia, Lamborghini, Land Rover, Lexus, Mahindra, Mahindra-Renault, Maini, Maruti Suzuki, Maserati, Maybach, Mercedes-Benz, MG, Mini, Mitsubishi, Opel, Nissan, Porsche, Premier, Renault, Rolls-Royce, San, Skoda, Ssangyong, Tata, Toyota, Volkswagen, Volvo, Willys.
Download FREE sample (one make):
Bike make, model, version, full specs (93 columns)
Buy FULL database (all makes) + FREE updates whenever someone requests an update:
Coverage: since NO bike has production years indicated, I cannot tell which is the oldest bike in the database, but I assume it covers at least the 2000s to the present.
Makes included: Aprilia, Ather, Avan Motors, Avanturaa Choppers, Bajaj, Benelli, BMW, Cleveland CycleWerks, Ducati, F.B Mondial, Harley-Davidson, Hero, Hero Electric, Hero Honda, Honda, Hyosung, Indian, Jawa, Kawasaki, KTM, LML, Mahindra, Moto Guzzi, MV Agusta, Norton, Okinawa, Royal Enfield, Suzuki, SWM, Tork, Triumph, TVS, UM, Vespa, Yamaha, Yo.
Description & sources of data
India Car Database is the first project I did using automated software to extract data from websites, unlike the European car databases that I have been making since 2003 by entering data manually from books and magazines.
I made it in August 2015, after I finished a new phase of the European databases. I looked online for web scraping software and, after a week of learning and experimenting, I managed to grab all data, for both new and old cars, from the www.carwale.com website. I was not aware that the Carwale website had just been redesigned between 14 and 18 August 2015, according to www.archive.org.
I made it out of personal interest, because many people from India had contacted me asking whether I had, or could make, a database for their country, and I thought there was big potential. I was WRONG: only a small percentage of Indians are willing to pay for data.
One of the people who contacted me during that week (and purchased the India car database afterwards) wrongly understood that I had created India Car Database “just for him” and asked me to make a 2-wheeler database too. I refused, because my interest was only in cars, but once I had mastered my data scraping skills and reduced the effort to create a database to less than 1 day, I decided to offer web scraping as a service. In January 2016 another customer wanted bike specifications, and that was when I made a scraper for www.bikewale.com too.
After 3 updates in 8 months, the database had been purchased by 7 people, which was enough to justify offering monthly updates on the 1st day of each month, starting in May 2016. See list of updates.
Note about update inconsistencies
Between 2015 and 2017 I ran the scraper on make pages to get model URLs, then on every model URL to get version URLs, then on every version URL to get specifications; I then removed all data from the previous update and replaced it with the new data. All cars were updated (including prices) to the current month.
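The three-level crawl described above (make pages, then model pages, then version pages) can be sketched as a nested loop. This is a minimal illustration, not my actual scraper: the functions get_links and get_specs, the page paths and the prices are hypothetical placeholders standing in for real HTTP fetching and XPath parsing.

```python
from typing import Callable

def crawl(
    make_urls: list[str],
    get_links: Callable[[str], list[str]],
    get_specs: Callable[[str], dict[str, str]],
) -> list[dict[str, str]]:
    """Walk make -> model -> version pages and return one row per version."""
    rows: list[dict[str, str]] = []
    for make_url in make_urls:
        for model_url in get_links(make_url):          # a make page lists its models
            for version_url in get_links(model_url):   # a model page lists its versions
                row = {"url": version_url}
                row.update(get_specs(version_url))     # a version page holds the specs
                rows.append(row)
    return rows


# Hypothetical in-memory "site" standing in for real HTTP fetching and parsing.
LINKS = {
    "/maker-a": ["/maker-a/model-x"],
    "/maker-a/model-x": ["/maker-a/model-x/base", "/maker-a/model-x/top"],
}
SPECS = {
    "/maker-a/model-x/base": {"price": "500000"},
    "/maker-a/model-x/top": {"price": "700000"},
}

# Full refresh: the new table replaces the previous update wholesale.
database = crawl(["/maker-a"], lambda url: LINKS.get(url, []), lambda url: SPECS[url])
print(len(database))  # 2
```

Because every run starts from the make pages, each monthly result is a complete snapshot, which is why all prices could be refreshed at once in that period.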
In February 2017 the Carwale website removed (hid) the URLs leading to discontinued models, so my database contains valuable data that you can no longer get yourself from Carwale. I kept updating the database by getting the version URLs of new cars only, adding those URLs to the existing data, comparing the unique ID number in each URL, deleting duplicates, and then scraping all version URLs (new and discontinued) to get specifications, including the current price, or the last recorded price for discontinued cars.
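The merge-and-deduplicate step can be sketched as follows, assuming (hypothetically) that the unique ID is the trailing number of each version URL. The URL shape and the example paths are illustrative only, not Carwale's documented format.

```python
import re
from typing import Optional

# Hypothetical URL shape: the unique ID is the trailing number, e.g. ".../top-102/".
ID_PATTERN = re.compile(r"(\d+)/?$")

def version_id(url: str) -> Optional[str]:
    """Return the trailing numeric ID of a version URL, or None if there is none."""
    match = ID_PATTERN.search(url)
    return match.group(1) if match else None

def merge_version_urls(existing: list[str], new: list[str]) -> list[str]:
    """Add newly found URLs to the existing list, skipping IDs that are already present."""
    seen: set[str] = set()
    merged: list[str] = []
    for url in existing + new:
        key = version_id(url) or url  # fall back to the whole URL if no ID is found
        if key not in seen:
            seen.add(key)
            merged.append(url)
    return merged


existing = ["/a/model/base-101/", "/a/model/top-102/"]   # includes discontinued cars
new = ["/a/model/top-v2-102/", "/a/model/plus-103/"]     # same ID 102, renamed slug
print(merge_version_urls(existing, new))
# ['/a/model/base-101/', '/a/model/top-102/', '/a/model/plus-103/']
```

Keying on the ID rather than the full URL means a car whose name changed is still recognised as the same car; every URL in the merged list is then scraped again.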
In November 2017 Carwale removed the unique ID from each URL, which was the ONLY way to distinguish multiple cars with exactly the same name. All car URLs were changed and redirected to new URLs without the ID number at the end. In 10 cases the old version URLs redirect to 404 Not Found, and in 197 cases the old version URLs redirect to the wrong car (multiple old URLs redirect to the same new URL because of identical model names). This made it impossible for me to re-scrape old cars for updates without risking the loss of model versions.
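Counts like the ones above can be obtained by grouping old URLs by the URL they now redirect to. A minimal sketch, assuming the old-to-new mapping has already been collected by following each redirect (that step is not shown), with None recorded for a 404; all URLs are hypothetical.

```python
from collections import defaultdict
from typing import Optional

def redirect_problems(
    redirects: dict[str, Optional[str]],
) -> tuple[list[str], dict[str, list[str]]]:
    """Return old URLs that now give 404 (mapped to None) and new URLs reached from several old URLs."""
    not_found = [old for old, new in redirects.items() if new is None]
    by_target: dict[str, list[str]] = defaultdict(list)
    for old, new in redirects.items():
        if new is not None:
            by_target[new].append(old)
    collisions = {new: olds for new, olds in by_target.items() if len(olds) > 1}
    return not_found, collisions


redirects = {
    "/a/model/lxi-201/": "/a/model/lxi/",
    "/a/model/lxi-305/": "/a/model/lxi/",   # an older car with the same name
    "/a/model/vxi-202/": "/a/model/vxi/",
    "/a/model/zxi-150/": None,              # 404 Not Found
}
not_found, collisions = redirect_problems(redirects)
print(not_found)   # ['/a/model/zxi-150/']
print(collisions)  # {'/a/model/lxi/': ['/a/model/lxi-201/', '/a/model/lxi-305/']}
```

Any new URL with more than one old URL behind it can only be scraped once, so at least one of those older versions would be lost or overwritten on re-scraping.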
The only way to update the database is to run the scraper on new cars only, add that data to the New & Old cars data, and use an Excel formula to identify duplicate URLs and delete them; I assume the remaining URLs are cars launched in the last month and add them at the bottom of the database. I add new cars each month, but can no longer update older cars’ data (for example price, which changes often). This is not 100% reliable: if Carwale changes or corrects a model name, it will show up as a different URL and I will add it to the database as a new model, and if a model is discontinued and replaced by a new model with the same name and URL, the new model will not be included.
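The duplicate-URL check is done in Excel (the exact formula is not shown here; a COUNTIF-style lookup is one possibility). A rough Python equivalent of the same logic, with hypothetical URLs and prices, looks like this:

```python
def append_new_cars(
    database: list[dict[str, str]], scraped: list[dict[str, str]]
) -> list[dict[str, str]]:
    """Append scraped rows whose URL is not in the database yet; existing rows stay unchanged."""
    known = {row["url"] for row in database}
    result = list(database)
    for row in scraped:
        if row["url"] not in known:  # treated as a car launched in the last month
            known.add(row["url"])
            result.append(row)
    return result


database = [{"url": "/a/alpha/base/", "price": "500000"}]
scraped = [
    {"url": "/a/alpha/base/", "price": "520000"},  # duplicate URL: dropped, old price kept
    {"url": "/a/beta/base/", "price": "650000"},   # unseen URL: appended at the bottom
]
updated = append_new_cars(database, scraped)
print([row["price"] for row in updated])  # ['500000', '650000']
```

The example also shows the limitations described above: the existing car keeps its old price, and a renamed model would arrive with an unseen URL and be appended as if it were new.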
In 2019 Carwale chose to concatenate multiple specifications into a single field (such as displacement in cubic centimetres, cylinders, valves and camshaft), causing inconsistencies in my database between old and new cars. Since old cars no longer show on the Carwale website, I cannot scrape data again for ALL cars as I did in 2015-2017, so my database’s quality is at risk if Carwale keeps changing its website (if you purchase the “new cars only” database, don’t worry: it is consistent).
In April 2019 I made a new scraper for Bikewale, adding individual versions to the Indian bikes database (in previous editions, if a bike had multiple versions, the database contained only the base version).
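One way to split such a merged field back into separate columns is a set of regular expressions, one per specification. This is a sketch under an assumed wording: the sample string and the column names are hypothetical, and the real text on Carwale may be phrased differently.

```python
import re
from typing import Optional

# Hypothetical wording of a merged engine field; the real text on Carwale may differ.
PATTERNS = {
    "displacement_cc": re.compile(r"(\d+)\s*cc\b", re.IGNORECASE),
    "cylinders": re.compile(r"(\d+)\s*cylinders?\b", re.IGNORECASE),
    "valves_per_cylinder": re.compile(r"(\d+)\s*valves?\s*/\s*cylinder", re.IGNORECASE),
    "camshaft": re.compile(r"\b(DOHC|SOHC|OHV)\b", re.IGNORECASE),
}

def split_engine_field(text: str) -> dict[str, Optional[str]]:
    """Pull separate columns back out of one concatenated engine description."""
    columns: dict[str, Optional[str]] = {}
    for column, pattern in PATTERNS.items():
        match = pattern.search(text)
        columns[column] = match.group(1) if match else None  # None when the part is missing
    return columns


print(split_engine_field("1197 cc, 4 Cylinders Inline, 4 Valves/Cylinder, DOHC"))
# {'displacement_cc': '1197', 'cylinders': '4', 'valves_per_cylinder': '4', 'camshaft': 'DOHC'}
```

Returning None for a missing part keeps the column empty instead of guessing, so old and new rows can be compared column by column.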
In August 2020 the Carwale website was redesigned, and I spent a few hours editing the scraper’s XPath expressions. The Carwale page source no longer includes Make and Model separately from Version, so the only place to get this information was the URL, which does not have correct capitalization.
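Recovering Make and Model from a URL can be sketched as slug parsing plus a table of manual overrides for names whose capitalization or punctuation a slug cannot carry. The path layout /&lt;make&gt;-cars/&lt;model&gt;/&lt;version&gt;/, the example.com URLs and the override entries are all hypothetical assumptions for illustration.

```python
from urllib.parse import urlparse

# Hypothetical spellings that cannot be recovered from a lowercase, hyphenated slug.
MAKE_OVERRIDES = {
    "bmw": "BMW",
    "mg": "MG",
    "mercedes-benz": "Mercedes-Benz",
    "rolls-royce": "Rolls-Royce",
}
MODEL_OVERRIDES = {"hr-v": "HR-V"}

def slug_to_name(slug: str, overrides: dict[str, str]) -> str:
    """Turn a URL slug into a display name, preferring a manual override."""
    if slug in overrides:
        return overrides[slug]
    # Default guess: hyphens become spaces and each word is capitalised.
    return " ".join(word.capitalize() for word in slug.split("-"))

def make_and_model(url: str) -> tuple[str, str]:
    """Assumes a hypothetical path of the form /<make>-cars/<model>/<version>/."""
    parts = [part for part in urlparse(url).path.split("/") if part]
    make_slug = parts[0].removesuffix("-cars")  # str.removesuffix needs Python 3.9+
    return slug_to_name(make_slug, MAKE_OVERRIDES), slug_to_name(parts[1], MODEL_OVERRIDES)


print(make_and_model("https://www.example.com/maruti-suzuki-cars/swift/vxi/"))  # ('Maruti Suzuki', 'Swift')
print(make_and_model("https://www.example.com/bmw-cars/x1/sdrive20d/"))        # ('BMW', 'X1')
```

Without an override, a slug such as "hr-v" becomes "Hr V", which is exactly the capitalization problem described above; the override table has to be maintained by hand.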
In October 2020 Carwale re-added discontinued models, allowing me to re-scrape ALL cars and not just those currently in production, but this resulted in only 3317 model versions. I emailed update notifications to 30+ customers asking whether I should (1) continue adding new cars to the existing database each month, bearing the risk of the inconsistencies and duplicates described above, plus possibly even more inconsistencies in the future if Carwale redesigns its website again, OR (2) start a new database containing new and discontinued cars, ONLY those currently shown on Carwale (about 2000 cars fewer), with consistent data in each column and without duplicates.
Of the 30+ past customers emailed, only 2 replied, both choosing option 2. Another 2 new customers also chose option 2. So in December 2020 I redesigned the database according to customer preference and to the current design of Carwale.
Car data fields included
Naming: ID, Make, Model, Version, Status 100%.
Price: Production cars 30.65%, Discontinued cars (last recorded price) 69.35%.
Body: Length (mm) 99.70%, Width (mm) 99.70%, Height (mm) 99.67%, Wheelbase (mm) 99.54%, Ground clearance (mm) 59.27%, Kerb weight (kg) 61.11%, Bootspace (litres) 48.75%, No of doors 99.54%, Seating capacity 99.57%, No of seating rows 72.36%.
Engine: Displacement (cc) 99.21%, Max power (bhp) 99.43%, Max power (rpm) 99.21%, Max torque (Nm) 99.43%, Max torque (rpm) 99.73%, Transmission type 99.78%, No of gears 97.15%, Drivetrain 86.74%, Engine type 87.39%, Cylinders 72.17%, Bore x Stroke (mm) 13.18%, Compression ratio 9.16%, Valves per cylinder 69.73%, Dual clutch 60.92%, Sport mode 61.74%, Fuel system 31.33%, Turbocharger/supercharger 50.60%, Turbocharge type 50.16%, Driving modes 51.01%, Manual shifting for automatic 50.03%, Engine start-stop 49.78%.
Fuel: Fuel type 99.67%, Alternate fuel type 63.53%, Mileage (kmpl) 87.31%, Fuel tank capacity (litres) 96.30%.
Drivetrain: Suspension front 91.30%, Suspension rear 90.68%, Brake type front 98.48%, Brake type rear 98.23%, Steering type 50.87%, Turning radius (m) 80.14%, Wheels 50.95%, Spare wheel 68.53%, Tyres front 71.28%, Tyres rear 71.22%.
Others: Colour names 93.89%, Colour RGB 93.89%, Image URL 87.85% (you can use the Tab Save extension for Chrome to download the image files).
Features: 131 columns, see the SAMPLE file; I do not list them here to avoid overloading the page with too much text.
Bonus: Car class, Body style 100.00% (these 2 columns are NOT sourced from Carwale but added manually based on my personal experience, and are available ONLY in the new+old cars package).
Percentages as of 1 January 2017 (3680 cars).
Note: Car class and Body style are NOT available in the new cars only package, because it would take a lot of time to re-add them for 1200 cars in every monthly update, so I add them only in the new+old cars package, for the cars added each month (about 20-50 cars per monthly update). Cars launched after June 2019 have some engine columns merged into 1 column due to changes on the Carwale website.
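Each percentage in the field list above is the share of cars that have a value in that column. A minimal sketch of that calculation, assuming blank or whitespace-only cells count as missing; the column names and rows are hypothetical, not taken from the database.

```python
def fill_rates(rows: list[dict[str, str]], columns: list[str]) -> dict[str, float]:
    """Percentage of rows with a non-blank value in each column, rounded to 2 decimals."""
    total = len(rows)
    rates: dict[str, float] = {}
    for column in columns:
        filled = sum(1 for row in rows if row.get(column, "").strip())
        rates[column] = round(100 * filled / total, 2) if total else 0.0
    return rates


rows = [
    {"bore_stroke": "73 x 71.5", "price": "550000"},
    {"bore_stroke": "", "price": "610000"},
    {"price": "720000"},                   # column missing entirely
    {"bore_stroke": "  ", "price": ""},    # whitespace counts as empty
]
print(fill_rates(rows, ["bore_stroke", "price"]))  # {'bore_stroke': 25.0, 'price': 75.0}
```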
Car engine codes database
Database made at the request of someone who wanted production years for every model and engine, plus engine codes, and posted here for other people who may be interested in the same thing.
Download FREE samples:
India Car Models Engines Database SAMPLE.xls
Buy FULL database:
Trucks and buses databases
Source of data: CarDekho.com. I made these 2 databases in September 2016 at the request of a customer who just wasted my time and never purchased them. My first sale was in January 2018, so that is when I updated them for the first time. I updated them once more in April 2018, then in August 2018 I noticed that CarDekho had made each version URL redirect to the main model URL, making it effectively impossible for me to scrape specifications of any version other than the base version. Poor sales made me abandon these databases.
Sometime in late 2019 or early 2020 CarDekho changed its code again, making it feasible to scrape specifications by version once more. I updated the databases again in May 2020. Due to low sales volumes, I will do updates on a request basis rather than monthly, as I do for India Car Database.
Buy FULL database + FREE updates whenever someone requests an update:
Bikes and cars DEALERS database
In 2016 someone asked me to scrape dealer information from Carwale, and in 2018 someone wanted the same from Bikewale. Here are the databases, containing dealer name, street address, email and phone number. Sales have been LOW (in the whole of 2019 not a single person purchased a dealers database), so I won’t update them unless someone else purchases them and asks for an update.
Buy FULL database + FREE updates whenever someone requests an update: