Hong Kong uses a unique system where gross floor area includes a share of common spaces, so two blocks with identical apartment units but a different number of units per floor have different gross floor areas stated for each apartment, because in the block with fewer apartments per floor the common areas are divided among fewer apartments. In other countries, "gross floor area" means roughly what Hong Kong calls "saleable area".
Net floor area is the internal room space measured between the faces of walls.
Saleable floor area includes the full thickness of external walls, half the thickness of walls shared with other units, and the thickness of internal walls, as well as balconies, but excludes bay windows.
Gross floor area also includes a share of common areas (lifts, staircases, corridors), so it should not be taken seriously as a measure of apartment size.
The Housing Authority shows internal floor area for PRH estates, while for TPS, HOS and PSPS estates it shows saleable and gross floor area.
The efficiency (saleable floor area as a percentage of gross floor area) is usually 75-85%, with higher values for blocks with bigger apartments and for low-rise blocks, since the common area of a block with a given number of units per floor is approximately the same regardless of apartment size. More details about gross floor area and efficiency at insitu.com.hk and eaa.org.hk.
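As a quick sanity check on what the efficiency ratio means for buyers, here is a minimal sketch (the 75-85% range comes from the paragraph above; the 700 sq ft flat is a hypothetical example, not a real listing):

```python
def saleable_from_gross(gross_sqft: float, efficiency: float) -> float:
    """Estimate saleable floor area from the quoted gross floor area.

    efficiency: typical Hong Kong range is 0.75-0.85 (higher for
    low-rise blocks and for blocks with bigger apartments).
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be a fraction between 0 and 1")
    return gross_sqft * efficiency

# A flat advertised at 700 sq ft gross, at 80% efficiency,
# has only about 560 sq ft of saleable area.
print(round(saleable_from_gross(700, 0.80)))  # → 560
```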
Home owners should be interested in internal floor area. The percentage of internal floor area out of saleable area varies a lot, as newer and taller buildings require thicker walls; take a look at how thick the walls of The Masterpiece are. Higher floors may have thinner walls and thus larger rooms.
BEWARE! Most, if not all, private developers, as well as all real estate websites, publish gross floor area as the primary measure of apartment size, confusing people. Can't we make a law forcing developers to quote the saleable floor area and put an END to this annoying confusion?
Starting in 2012 or 2013, websites like Centadata.com and GoHome.com.hk added saleable floor area beside gross floor area in every listing.
Here is the list of 70+ database projects I have made: some out of personal interest, distributed freely (small ones) or sold for professional use (big ones), and others made at the request of one or more customers and published on my website so that other customers can purchase them if interested.
Not included in the list below are databases created as one-time projects for single customers outside my fields of interest, a few of them made under non-disclosure agreements.
Since childhood I have loved writing books, doing research, and making databases and statistics. I started using computers in 1997; my dad taught me to use Word, but around 2003 I started using Excel more than Word, analyzing data and making tables and charts about everything I encountered in my life! For example, in racing games I measured the speed of each car, wrote the numbers in an Excel spreadsheet, then made a chart.
Early works were pure hobbies, made from personal interest, with no plans to commercialize them.
The Hong Kong Housing Database was also made out of personal interest in analyzing public housing estates. A few months later, an insurance company asked me if I could expand it to private housing estates. It became my FIRST large project made for a customer.
The Car Database also started as a hobby for my personal use, but took an unexpected turn into business after 2012, when I realized that many companies in the automotive industry pay big $$ for a complete, accurate, and frequently updated database. I went through an extensive transformation to make my hobby databases suitable for professional use.
Seeing the success of the car database, I decided to transform other hobby databases and sell them for professional use too, such as the HDB Database and World Cities Database.
Starting in 2015 I learned web scraping. Scraping usually means running software that visits a list of given pages, extracts specific data, and puts it in a database automatically. This allows me to create very large databases with little effort, spending ~30 minutes writing code and leaving the scraping software to run in the background for a few hours or days.
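The idea can be sketched in a few lines of Python using only the standard library (a minimal sketch: the inline HTML stands in for a downloaded page — in a real run each URL would be fetched, e.g. with urllib.request — and the class name "spec" is a hypothetical example, not any particular website's markup):

```python
import csv
import io
from html.parser import HTMLParser

# Stand-in for one downloaded page; a real scraper would loop over URLs.
SAMPLE_PAGE = """
<table>
  <tr><td class="spec">Toyota</td><td class="spec">Corolla</td></tr>
  <tr><td class="spec">Honda</td><td class="spec">Civic</td></tr>
</table>
"""

class SpecExtractor(HTMLParser):
    """Collects the text of every element whose class is 'spec'."""
    def __init__(self):
        super().__init__()
        self.values = []
        self._capture = False

    def handle_starttag(self, tag, attrs):
        self._capture = ("class", "spec") in attrs

    def handle_data(self, data):
        if self._capture and data.strip():
            self.values.append(data.strip())
            self._capture = False

parser = SpecExtractor()
parser.feed(SAMPLE_PAGE)
# Group the captured values back into rows of two columns each
rows = [parser.values[i:i + 2] for i in range(0, len(parser.values), 2)]
print(rows)  # → [['Toyota', 'Corolla'], ['Honda', 'Civic']]

# Write the rows to an Excel-friendly CSV buffer (a file in a real run)
buf = io.StringIO()
csv.writer(buf).writerows(rows)
```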
Between 2015 and 2017 I created over 50 databases via web scraping, some out of personal interest to sell to multiple people (India, Middle East and Australia car databases, mobile phone databases, etc.) and others for single customers who requested them (web scraping services).
Keeping all databases regularly updated is a huge workload. Scraping more websites, if they take too much time, creates additional workload and delays everyone's updates. As of 2018 I decided to STOP updating databases with fewer than 5 sales per year, so I can focus on the ~20 best-selling databases that produce 80% of my income.
So… unless you come with a GREAT IDEA for a database that can be sold to multiple customers, I reserve the right NOT to do your web scraping project if it takes more than ~2 hours of manual work and more than ~50 hours of running the scraper in the background.
Some projects (for example, the Database of HDB Resale Flat Prices) involve copying data from a source website and pasting it into Excel, then a few visual adjustments to make it beautiful, which takes just a few hours to create a 20 MB Excel database.
Other projects (for example, the Database of HDB Blocks) involve compiling data from various sources and manual data entry in Excel, taking hundreds of hours of work!
All my projects are made primarily for visualization in Excel, but some (especially the Car Database) are often used by professionals who convert the Excel spreadsheet to CSV or MySQL and use it in web design and mobile app development.
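A minimal sketch of that conversion path: in practice customers export the spreadsheet to CSV from Excel and load it into MySQL; here an inline CSV and SQLite (built into Python) stand in for both, and the column names are hypothetical examples, not my actual database schema.

```python
import csv
import io
import sqlite3

# Inline CSV standing in for a spreadsheet exported from Excel
CSV_DATA = """make,model,year
Toyota,Corolla,2017
Honda,Civic,2016
"""

rows = list(csv.DictReader(io.StringIO(CSV_DATA)))

# Load the rows into a SQL table (SQLite here; MySQL works the same way)
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE cars (make TEXT, model TEXT, year INTEGER)")
db.executemany("INSERT INTO cars VALUES (:make, :model, :year)", rows)

# The data is now queryable like any website or mobile app backend table
print(db.execute("SELECT COUNT(*) FROM cars").fetchone()[0])  # → 2
```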
My parents encouraged me to work in Microsoft Word, and told me to finish and print my work because "what is not finished has ZERO value". I never understood why they wanted to print… some works, for example the Car Database, should NOT be printed and cannot be "finished"; they need to be updated constantly with newly launched cars. They promised that they would help me publish a book… but this never happened (there is a possibility that they encouraged me just to give me a solitary occupation at the computer to prevent me from disturbing them, instead of letting me have a social life).
My dad even set rules for how a book should be written: Arial font, 12pt body text, 16-20pt titles, all titles centered, bolded, underlined. However, since my writings were not really a book but a list of… something, the rules imposed by dad created excessive bold and centered text.
Around 2001-2003 I became fascinated by Notepad and its fixed-width font, and I used full-page-width lines of --- and === signs to enhance titles.
In 2003 I broke away from dad's rules and started using Excel more than Word. New works in Word were optimized for on-screen display instead of printing, often using non-standard page sizes to make exactly 1 page per subject. I write with a 10pt font for body text, 20pt and 15pt for titles, and 10pt and 5pt for empty spaces.
Since 2003, for both Word and Excel works, titles have been white text on a full-page-width blue background, which does not look very good on paper. I used the same style when I created my website in 2009.
Since 2010, one of the distinctive features of all my Excel works has been the coloured columns; older works were recoloured to this format too. I combine the Excel databases with my graphic design hobby, making the Excel files artistic as well!
In 2015 I changed the standard of my Word files, removing the full-width coloured backgrounds of titles and putting full-width horizontal lines instead (similar to what I was doing in Notepad in 2001-2003). This is better for printing (even if I assume that nobody will print my works). I also changed my website design to this format.
One of my friends said that my website looks like it was made by an expert in typography rather than by a web designer!
Examples of styling in my books: the 2012, 2013, and 2015 editions of the Car Models Encyclopedia
I also ran a kind of competition between my works, a race to create the biggest database in Excel or the biggest book in Word, in terms of pages and file size, under certain standards (10pt font, no duplicate stuff, no large open spaces, NO bullshit but useful content, etc.). I kept track of file sizes in an Excel table similar to the one above.
If you are looking for a database of mobile phone specifications in Excel format to create a website, use in a GSM shop, or anything similar, I have created an Excel database for you, using scraping software to extract data from www.gsmarena.com.
Looking for digital cameras, laptops, TVs, and other electronics? Suggest websites from which I can extract data and create databases!
Buy FULL database + 1 year of FREE monthly updates:
DO NOT ask for a phone numbers database. I do not support telemarketing / SMS spamming. Please check the sample Excel file above and watch the video below to understand what I sell, before wasting my time asking for things that I don't sell (like this guy).
Mobile phone database coverage
The mobile phone specification database includes classic phones and smartphones, tablets and smartwatches, from the most popular phone brands in the western world. Some brands may be missing, especially from China, which has a lot of brands totally unknown outside their domestic market.
The earliest phones included are Ericsson models launched in 1994, but I guess that some early phone models are missing, as the number of gadgets launched per year becomes significant only after 2003. See the changelog.
Comms: WLAN 99.97%, Bluetooth 99.80%, GPS 99.68%, NFC 10.61%, Radio 98.90%, USB 90.39%.
Features: Sensors 56.99%, Messaging 39.68%, Browser 39.04%, Clock 5.16%, Alarm 5.16%, Games 39.03%, Languages 2.93%, Java 39.11%, Other 60.60%.
Battery: Battery 99.97%, Stand by 72.25%, Talk time 75.38%, Music play 7.97%.
Misc: Colors 93.60%, SAR US 21.29%, SAR EU 25.20%, Price group 59.86%.
Tests: Performance 4.46%, Display 6.25%, Camera 7.72%, Loudspeaker 9.16%, Audio quality 8.38%, Battery life 5.96%.
Indian mobile phones
This page has been getting significant traffic from India, with many people contacting me but refusing to buy the worldwide mobile phone database above, asking instead if I can create another database with only the models available in the Indian market. Job done!
I sourced data from 91mobiles.com. This website has 10 categories of products, such as tablets, cameras, TVs, home theaters, smartwatches, washing machines, air conditioners, refrigerators, and microwave ovens, but I do not intend to scrape all categories and create databases for each, because it would take a lot of time to regularly update each of them. Given their sales volume, they would become products sellable only in India, and we all know how little Indians pay: people from the western world contact me only when they are able to make a purchase, while many Indians contact me regardless of whether they are willing to pay or not, and just a few leads actually convert to sales.
Filter the mobile phone database and get a list of mobile phones with specific features.
Filter the mobile phone table by year and make a list of the features introduced in each year.
Analyze the data and make statistics on the best phones in each price range.
Convert the Excel spreadsheet to CSV or MySQL and create your own phone comparison website or mobile app.
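The filtering use cases above can be sketched with the standard library alone (a minimal sketch: the inline CSV stands in for a few rows of the phone database, and the column names and values here are hypothetical examples, not the actual file layout):

```python
import csv
import io

# Inline CSV standing in for a few rows of the phone database
CSV_DATA = """brand,model,year,nfc
Apple,iPhone 4,2010,no
Samsung,Galaxy S3,2012,yes
Nokia,3310,2000,no
"""

phones = list(csv.DictReader(io.StringIO(CSV_DATA)))

# List of phones having a specific feature (NFC in this example)
nfc_phones = [p["model"] for p in phones if p["nfc"] == "yes"]
print(nfc_phones)  # → ['Galaxy S3']

# Phones launched from 2010 onward (filter by year)
recent = [p["model"] for p in phones if int(p["year"]) >= 2010]
print(recent)  # → ['iPhone 4', 'Galaxy S3']
```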
Actually, I do not recommend using this database to make websites. Feel free to use it for research and analysis purposes. I am NOT responsible for any copyright trouble you may face if you use the data commercially, by making your own website, etc. This is NOT an original database "Made by Teoalida", but rather a database showcasing data scraping abilities, and the data belongs to www.gsmarena.com. Please consider the price a fee for a data scraping service rather than payment to the author of the data.
My hobby for cars started in 1999, but only in 2003 did I decide to start making an Excel database of all cars. The research was done independently of the internet world (I first connected to the internet in 2005), sourcing data from AutoKatalog books (a German publication), making an original compilation that you cannot find anywhere else online (except on websites that purchased the database from me).
I published the car databases on my website only in 2011, intending to share my research with other car buyers, hobbyists, car experts, etc., without expecting that I would be visited by various companies (auto insurance, auto parts shops, car shipping services, etc.), programmers, web designers and mobile app developers, and that I could make a business from this! Most of these visitors have zero experience with cars, and often make mistakes such as buying the wrong database, buying an American car database while doing business in Europe, or buying from other data providers selling bad-quality databases just because they are cheaper or have a higher number of model variations.
The first sale was made in May 2012. I had to make some changes to appeal to this unexpected audience, both in data structure and in website presentation. The rising flow of customers gave me REAL motivation to dedicate time to updating the car database constantly. Since late 2012, NO month has passed without adding or changing something, creating the MOST UPDATED car database ever found on the internet.
Besides the original European car database manually compiled from AutoKatalog books, since 2013 I have created a database for the American market, and seeing its success, I created additional databases for India, the Middle East, and Australia, as well as real estate databases, mobile phone databases, etc., via web scraping.
Sales kept growing, and in 2015 I exceeded 100 databases sold, producing 80% of my income, making me quit my job in AutoCAD and architectural design and dedicate my life to the data providing industry!
I offer data mining and web scraping services. "Scraping" usually means coding a bot that visits a list of given pages, copies specific data from each page, and puts it in an Excel / CSV file automatically, at a rate of a few pages per second. Watch the video!
If you are building a website or a mobile app, or just need specific data but cannot find it in a usable format, just give me a link to a website containing the required data, and I will make a scraper and turn the website into an Excel database, for you and for future customers.
Note: I have a LIMITED amount of time and I love creating useful databases that I can sell via my website to as many people as possible. For this reason I may reject projects that take more than a few hours if the data collected is of no use to anyone other than yourself, or I may pass your project to my partners. See examples of completed projects and their prices.
For many years, manual data entry in Excel (sourcing from books, as seen in this video) or manual copy-pasting from websites was the only way I created databases, a slow process which limited the size of the databases I could make. Even with this slow process I made about 40 databases in my fields of personal interest: automobiles, geography, real estate, computers, gaming, etc., as a pure hobby.
I started web scraping in August 2015 when I found import.io (a free scraper for simple HTML websites), and in November 2015 I teamed up with a programmer to create custom scrapers for more complex websites. This allowed me to create large databases with minimal effort, in a matter of hours.
Some of these online scraping tools are free but slow and limited in functionality: limited to one project at a time and a limited number of pages you can extract, unless you upgrade to a paid subscription. Although you can scrape small numbers of pages yourself for free, it may take a few days to learn to use them efficiently. Most people do not have the time to learn, or cannot pay an expensive monthly subscription. I can help you!
import.io turned into a paid service in April 2016 and suspended my free account. Prices were increased in 2017 to $299/month to scrape up to 5,000 pages. This gave my programmer the idea to develop our own "universal" scraping software in Visual Studio, comparable with the tools available online but with no limit on the number of pages or simultaneous projects. This allows me to scrape any simple website at a lower price than you could do it yourself.
Once I had mastered my scraping skills, in early 2016 I wrote this article to offer freelance scraping services. In 2016 and 2017 I was doing every project that was technically feasible… until I overloaded myself with the responsibility of providing regular updates for about 50 projects. In 2018 I abandoned databases with fewer than 5 sales per year, so I can focus on the ~20 best-selling databases that produce 80% of my income.
Simple data scraping service
This applies to websites with a distinct URL for each page and all data in the HTML code. Data can be extracted with our "universal" scraper; analyzing the website and writing the code that indicates what data to extract usually takes 10 minutes to 1 hour. Indicative prices:
Number of pages to be extracted: 1,000 pages = $50, 10,000 pages = $100, 100,000 pages = $300 (average speed 1 second per page; if the website is slower I may charge more). Since each website has a different number of pages, these prices are only indicative and the final price will be given on request.
Number of columns to be extracted = 50 cents per column.
Multi-level scraping = $10 per level. Many car websites require a scraper for make pages to get model URLs, a second scraper for model pages to get version URLs, and a third scraper for version pages to extract the car details, which is what you need. Infinite scrolling, pagination, and entering data in search boxes also add a few $.
Cleaning data after scraping = extra $ if the raw data from the scraper includes annoying spaces and line breaks, or unwanted characters such as units of measurement after values, which need to be removed with Excel find-replace.
You NEED to provide the website URL and I will quote a price in your preferred currency (USD, EUR, GBP, AUD, SGD, etc.). For example, for scraping Parkers.co.uk as seen in the demonstration video (1-level scraping, 101 pages to be extracted, 4 columns, no cleaning needed), I charged only €23.66, one cent per row (2,366 rows).
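The indicative price list above can be sketched as a small estimator (a rough sketch only: final quotes are always given on request, the tier lookup and the treatment of the first level as free are my own simplifications, and the example job at the end is hypothetical):

```python
# Indicative page tiers from the price list: pages covered -> fee in USD
PAGE_TIERS = [(1_000, 50), (10_000, 100), (100_000, 300)]

def estimate_price(pages: int, columns: int, levels: int = 1) -> float:
    """Rough quote: page tier + 50 cents/column + $10 per extra level."""
    # Charge the smallest indicative tier that covers the page count,
    # falling back to the top tier for very large jobs
    page_fee = next((fee for limit, fee in PAGE_TIERS if pages <= limit),
                    PAGE_TIERS[-1][1])
    column_fee = 0.50 * columns       # 50 cents per column
    level_fee = 10 * (levels - 1)     # $10 per additional scraping level
    return page_fee + column_fee + level_fee

# e.g. a 3-level car site, ~10,000 version pages, 40 columns
print(estimate_price(10_000, 40, levels=3))  # → 140.0
```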
Complex data scraping service
This applies to websites with drop-down lists, search boxes, JSON data, a login required to access the data, or where selecting different items does not produce a different URL, etc. In this case the online scraping tools do not work, and my friend's universal scraping software also does not work, so he needs to make a custom scraper in Visual Studio just for that particular website; this may take a few days depending on his available time.
News: in 2019, 2 more people, from India and Australia, joined to take on web scraping projects that are too complex for me.
Price: usually within the $200 to $500 range, which I share with my partner; the price varies depending on the complexity of the website rather than the number of pages to be extracted.
For fewer than 200 records it may be faster to copy-paste manually than to code a custom scraper.
Complex scraping services sometimes require screenshots (as below) so my programmer can indicate to the bot where to click and what data to extract.
What cannot be scraped
Theoretically I can scrape data from any website, but only websites with the required data in a consistent structure from page to page can produce a good, usable database. An example of an inconsistent website is Wikipedia.
Some websites look simple to scrape, but after starting the job I get my IP blocked, a CAPTCHA page, etc.: anti-scraping features made to prevent data copying or DDoS attacks. If you ask for a price before the job starts, be prepared for price changes if I find anti-scraping features or CAPTCHAs that require a human to sit at the computer the whole time to solve them, change IPs, or do manual data entry, making the project too costly for the value of the data we can get.
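A minimal sketch of how a scraper can back off and retry when a site starts refusing requests (the "IP blocked" situation above). Here `fetch` is whatever function downloads one page; it is passed in as a parameter so the retry logic stays testable, and the fake fetcher at the end is a hypothetical example. A real job may still need proxies or manual CAPTCHA solving, which is exactly what raises the price.

```python
import time

class Blocked(Exception):
    """Raised by `fetch` when the site answers with 403/429 or a CAPTCHA."""

def fetch_with_backoff(fetch, url, max_tries=3, delay=1.0):
    for attempt in range(max_tries):
        try:
            return fetch(url)
        except Blocked:
            # Wait longer after each refusal before trying again
            time.sleep(delay * (attempt + 1))
    return None  # give up; this URL goes in the "re-run later" list

# Example with a fake fetcher that blocks the first request only
calls = []
def flaky_fetch(url):
    calls.append(url)
    if len(calls) == 1:
        raise Blocked()
    return "<html>page data</html>"

print(fetch_with_backoff(flaky_fetch, "http://example.com/page1", delay=0.01))
# → <html>page data</html>
```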
Do not get angry at me if I fail to scrape data from one website; just give me another website and I may succeed with it.
I know how useful a phone or email database is, for example if you are a car insurance company wanting to email car owners who post listings on classifieds websites, but most classifieds websites protect seller phone numbers and contact emails from being scraped and spammed with unsolicited emails, by using a Contact button, requiring a click to reveal the email, or showing the email as an image rather than text. In such cases the job can be done via manual data entry, a job more suitable for a child than for us busy, skilled programmers.
Advantages of my service, discounts and future updates
The main advantage of working with me is that once I create a database, I can post it on my website to be purchased by multiple people, so you pay just a small part of the cost of scraping (if the database is something of personal interest to me: cars worldwide, Singapore real estate, and a few more).
If you want to keep it private, I can sell it just to you at a higher price and not publish it on my website, but the BIG question is what I should do if a second customer asks me to scrape the same website and agrees to publish it on my website to get a cheaper price? I reserve the right to sell to other people if they ask. If you ask me to scrape a website outside my personal interests and unrelated to the fields covered by my website, I will not publish it, because it is unlikely that anyone else would purchase it, and you need to pay the full cost of scraping.
I offer FREE updates for one year for all databases purchased by multiple people: when a new customer pays for a database and requires an update, I run the scraper again and offer the updated database to previous customers too, free of charge. But if you ask me to scrape a website privately, "just for you", you need to pay each time you want an update: 20-50% of the price you paid for the initial database creation.
Legal issues of web scraping
Scraping data from a website is usually LEGAL, but using the scraped data on another website is usually ILLEGAL.
It depends… if the data was added by volunteers, or by sellers on classifieds websites, scraping is most likely legal. But if the website's authors worked hard to compile data from sources like car brochures or manufacturer websites, scraping is most likely illegal, especially if you use their data to make your own website or for other commercial purposes. Although the data is freely available, the compilation can be copyrighted. Many websites contain dummy data (for example, a bunch of cars listed with +/- 1 horsepower from the official value), and if you use data copied from them, they can prove that you copied their compilation and file a lawsuit against you. BEWARE!
For a moment I was concerned that my European Car Models & Engines Database sourced from AutoKatalog books might be a copyright violation, but I came to the conclusion that it is fine, because my database is an original compilation presenting the data in a different structure than the book, and it targets an online audience, while AutoKatalog is a book sold in shops targeting car hobbyists. I make over 100 sales each year without a single person worrying about copyright.
In the case of America, Year-Make-Model is my original compilation sourced from Wikipedia and 3 more websites, while Year-Make-Model-Trim-Specs is scraped from Edmunds.com, which also offers an API and thus allows other websites to use their data, so again it is legal.
But since I created the India car database in 2015, sourcing data from Carwale.com, I have been concerned that what I am doing may be illegal.
Country matters: I have had many customers in India asking me to scrape data from various websites. However, when someone from Europe or America asks me for data that I do not have and I propose scraping it from a website, some of them raise the legal issues of web scraping.
Funny case: someone offered to sell me a car database that he claimed to have created by working for 4 months, 8 hours per day, copy-pasting data from a website, with rights to resell it on my website. From a copyright point of view it does NOT matter whether you extracted the data using automatic software or typed every letter manually; as long as you copied data from a website, your work is not original. He was probably not aware of scraping software. If you waste a few months doing something that could have been done in a few hours using scraping software, you are an IDIOT (I was an idiot too, doing such jobs before 2015 without being aware of scraping software, but only small jobs). I still do manual entry for the European database, because I source data from books (offline sources), making an original product on the web.
Examples of data extraction / scraping projects done and their prices
All scraping software saves data in CSV format, but if I decide to publish on my website, I make XLS files with borders, colors, headers and other visual features to match the style of other products "Made by Teoalida", giving the impression of work done with care.
India Car Database – source: www.carwale.com – Made in August 2015 out of personal interest, because numerous people were asking me for an Indian car database. Being my first scraping project, it initially took 8 days to figure out how to use import.io and do it; once my programmer partner made our own universal scraper, the time required for each update was reduced to 3 hours. Over 3,000 rows and 188 columns. Sold in 3 different packages at 30, 60, and 120 euro, depending on the number of columns. During the first year it was purchased by 8 people. I also made a FREE "make & model only" package, hoping to encourage customers to make a free purchase before paying for the big database, but the contrary happened: once I removed the free package, the number of sales increased.
India Bike Database – source: www.bikewale.com – Made in January 2016 after a 2nd person requested a database of bikes sold in India. One of the easiest projects, having no drop-down boxes but plain links to each bike page. 250 records, price: 25 euro.
Skyscrapers Buildings Database – source: www.emporis.com – Made in November 2015 out of personal interest, put up for sale for $150 (15,000 buildings), and turned into a marketing failure: 1 year passed and nobody purchased it (except a customer who asked me to make a US buildings database, see below). It took about 20 hours to manually compile the list of cities with buildings over 100 meters, then the list of buildings from these cities, then I used import.io to automatically extract each building's details. 15,000+ buildings. Emporis blocks my IP for 2 days if I access more than 3,000 pages in one day, so data extraction with import.io (unable to change IP) was limited to 3,000 buildings per day, which took about 1 hour daily for 6 days.
US Buildings Database – source: www.emporis.com – Made in November 2016 for a customer who, seeing the Skyscrapers database above, told me to make a similar database with all types of buildings from the USA: 160,000+ buildings. I had to run over 100 batches of max 2,000 buildings; now using my partner's universal scraper from my computer, I could change IP after each batch, re-running blocked URLs again and again until I was able to get all the buildings. 60 hours of work. Price: $600.
Singapore Condo Database II – source: www.propertyguru.com.sg – Made for a customer in 2016. Apparently an easy project, having plain links to all condos, it turned out to be impossible to do with import.io because of an annoying CAPTCHA appearing randomly after 10-50 pages extracted. My programmer spent 2 weekends in Visual Studio making a custom scraper that allowed me to input the CAPTCHA when needed, and charged me $300 USD; I sold the database of 3,176 condos for $317.60 SGD (about 240 USD), leaving me at a loss, but because other customers have purchased it since, profit came.
World Countries Database – source: The World Factbook – Made in 2017 out of personal interest, a database with an impressive 362 columns and only 268 rows. It took about 5 hours to write the XPath codes for each column, and only 35 minutes to scrape the data.
Mobile Phones Database – source: GSMarena.com – Made in August 2016 out of personal interest. A simple project made with our universal scraper. During the first year it was purchased by over 10 people, which allowed me to provide FREE monthly updates, each scrape taking about 1 hour.
Australia Car Database – Made in June 2017 after a year of hiatus, because I wasn't sure Australia could provide enough sales volume to cover my effort. Scraping was a headache because the source website uses anti-scraping features that block my partner's universal scraper. I had to use another scraper, which was slow (12 seconds per page) with frequent crashes. It took 14 days to scrape all 90,000+ cars; future updates are done by scraping only the last year of cars. Price: $450, with discounts offered for partial purchases. It had a happy outcome: during the first year over 10 people purchased it.
Sulekha.xls – source: www.sulekha.com – A somewhat unusual data scraping job: a one-time-use database for SMS and email marketing, instead of a saleable product containing all car models, all buildings, all of something.
Postal code scraping – a customer gave me a list of postal codes which I input into www.streetdirectory.com to get the building name and street address (in Singapore every building has a unique postal code).
Flickr scraping – a customer downloaded a large number of car images from Flickr and realized that to use them on his website he needed to specify the author name, a link to the source page, and a link to the Creative Commons license. I scraped this info: 223,000 images for 223 euro, at 0.6 seconds per page.
Used cars images – a customer asked me to scrape a used cars website to get the image URL beside Make, Model, and Year. It took only a FEW HOURS and I got over 100,000 car images, all in the same resolution. He told me to keep it private and not publish or resell it on my website, so I am telling you only the idea. If anyone wants to scrape car images in this way, let me know what website to scrape!
I have done a few more databases, but the customers told me NOT to publish them on my website, or they are in fields unrelated to the topics covered by my website, so even if published, they would not get sales.
Are you looking for a database of planets and satellites in Excel format with their facts and figures, for research or to make a website? I made a database for you, sourcing data from solarsystem.nasa.gov, manually compiled in Excel in 2014, and offer it here for FREE download.
I made it out of personal interest in comparing the facts of all planets and satellites at the same time, while the NASA website allows only 1-vs-1 comparison.
What is included
The current version of the database includes planets and satellites in hydrostatic equilibrium (approximately round shape). I have not included satellites less than 400 km in diameter of planets beyond Jupiter.
While in the 1990s only 77 bodies were known in the solar system, today 300+ bodies have been discovered. Since 1997, each year has brought the discovery of new satellites, most of them under 10 km in diameter. Not much data is available about them on the NASA website, some estimated data is shown on Wikipedia, and as space exploration progresses, the data is likely to change. Should I add more satellites in the next version of the database?
Note: I have a long-standing hobby of geography and astronomy. I first made a solar system database in Word around the year 2000, sourcing data from old 1970s atlases dating back to my parents' school days, and another one in 2004, sourcing data from Encarta Encyclopedia 2002.
See also: solarsystemscope.com, a website showing a 3D model of not just the solar system but the whole galaxy!
Are you looking for a database of countries in Excel format with their facts and figures, for research or to create a website? I made a database for you. It took a few hours to make a scraping script which extracts data from The World Factbook and creates a CSV file. It takes about 35 minutes to extract the 268 entries, and I can update it anytime you want!
Buy country database
The database includes 268 entries: sovereign countries, dependent territories, as well as oceans, the World, and the European Union.
The LITE version includes area and population for all countries, as well as full facts for the United States and United Kingdom.
FULL version include 362 facts, covering everything possible from Geography, People and Society, Government, Economy, Energy, Communications, Transportations, Military and Security, Transnational Issues.
Contact me for custom packages (specific selection of columns)
List of entries in country database
World, Afghanistan, Akrotiri, Albania, Algeria, American Samoa, Andorra, Angola, Anguilla, Antarctica, Antigua and Barbuda, Arctic Ocean, Argentina, Armenia, Aruba, Ashmore and Cartier Islands, Atlantic Ocean, Australia, Austria, Azerbaijan, Bahamas, The, Bahrain, Baker Island, Bangladesh, Barbados, Belarus, Belgium, Belize, Benin, Bermuda, Bhutan, Bolivia, Bosnia and Herzegovina, Botswana, Bouvet Island, Brazil, British Indian Ocean Territory, British Virgin Islands, Brunei, Bulgaria, Burkina Faso, Burma, Burundi, Cabo Verde, Cambodia, Cameroon, Canada, Cayman Islands, Central African Republic, Chad, Chile, China, Christmas Island, Clipperton Island, Cocos (Keeling) Islands, Colombia, Comoros, Congo, Democratic Republic of the, Congo, Republic of the, Cook Islands, Coral Sea Islands, Costa Rica, Cote d’Ivoire, Croatia, Cuba, Curacao, Cyprus, Czechia, Denmark, Dhekelia, Djibouti, Dominica, Dominican Republic, Ecuador, Egypt, El Salvador, Equatorial Guinea, Eritrea, Estonia, Ethiopia, Falkland Islands (Islas Malvinas), Faroe Islands, Fiji, Finland, France, French Polynesia, French Southern and Antarctic Lands, Gabon, Gambia, The, Gaza Strip, Georgia, Germany, Ghana, Gibraltar, Greece, Greenland, Grenada, Guam, Guatemala, Guernsey, Guinea, Guinea-Bissau, Guyana, Haiti, Heard Island and McDonald Islands, Holy See (Vatican City), Honduras, Hong Kong, Howland Island, Hungary, Iceland, India, Indian Ocean, Indonesia, Iran, Iraq, Ireland, Isle of Man, Israel, Italy, Jamaica, Jan Mayen, Japan, Jarvis Island, Jersey, Johnston Atoll, Jordan, Kazakhstan, Kenya, Kingman Reef, Kiribati, Korea, North, Korea, South, Kosovo, Kuwait, Kyrgyzstan, Laos, Latvia, Lebanon, Lesotho, Liberia, Libya, Liechtenstein, Lithuania, Luxembourg, Macau, Macedonia, Madagascar, Malawi, Malaysia, Maldives, Mali, Malta, Marshall Islands, Mauritania, Mauritius, Mexico, Micronesia, Federated States of, Midway Islands, Moldova, Monaco, Mongolia, Montenegro, Montserrat, Morocco, Mozambique, Namibia, Nauru, 
Navassa Island, Nepal, Netherlands, New Caledonia, New Zealand, Nicaragua, Niger, Nigeria, Niue, Norfolk Island, Northern Mariana Islands, Norway, Oman, Pacific Ocean, Pakistan, Palau, Palmyra Atoll, Panama, Papua New Guinea, Paracel Islands, Paraguay, Peru, Philippines, Pitcairn Islands, Poland, Portugal, Puerto Rico, Qatar, Romania, Russia, Rwanda, Saint Barthelemy, Saint Helena, Ascension, and Tristan da Cunha, Saint Kitts and Nevis, Saint Lucia, Saint Martin, Saint Pierre and Miquelon, Saint Vincent and the Grenadines, Samoa, San Marino, Sao Tome and Principe, Saudi Arabia, Senegal, Serbia, Seychelles, Sierra Leone, Singapore, Sint Maarten, Slovakia, Slovenia, Solomon Islands, Somalia, South Africa, Southern Ocean, South Georgia and South Sandwich Islands, South Sudan, Spain, Spratly Islands, Sri Lanka, Sudan, Suriname, Svalbard, Swaziland, Sweden, Switzerland, Syria, Taiwan, Tajikistan, Tanzania, Thailand, Timor-Leste, Togo, Tokelau, Tonga, Trinidad and Tobago, Tunisia, Turkey, Turkmenistan, Turks and Caicos Islands, Tuvalu, Uganda, Ukraine, United Arab Emirates, United Kingdom, United States, United States Pacific Island Wildlife Refuges, Uruguay, Uzbekistan, Vanuatu, Venezuela, Vietnam, Virgin Islands, Wake Island, Wallis and Futuna, West Bank, Western Sahara, Yemen, Zambia, Zimbabwe, European Union.
I compiled this Word file starting from 2003, initially sourcing data from Encarta Encyclopedia 2002, which I got from a friend. It was the ONLY recent geography encyclopedia I had in my hands, all the others being world atlas books from my parents' childhood (1970s). I did not have an internet connection until 2005 to get other sources of data. Over the next years I improved the Word file using Encarta 2006 and 2009, completing all countries. Since 2016 I offer an Excel version beside the Word file, and I have also updated all countries to Encarta 2009.
As an original product “Made by Teoalida”, you will NOT find this database elsewhere on the internet.
This city database classifies cities based on the icon used by Encarta Encyclopedia according to city population, so the number of cities in each country is nearly proportional to the country's urban population. This city database does NOT pay attention to administrative status, which varies from country to country, so making a single worldwide database of official “cities” (excluding towns, communes, villages and other places) poses major troubles, especially since a few countries do not have any “city” at all.
For example: Romania officially has 103 municipalities and 217 towns, but my database includes 116 cities that have the 20,000+ population icon in Encarta Encyclopedia. Germany has 2058 “towns”, Italy has 7954 “communes”, Spain has 8122 “municipalities”; there is no “city” or “town” designation.
As of 2017, the world city database contains:
294 cities over 1,000,000 people from all countries.
305 cities over 500,000 people from all countries.
3085 cities over 100,000 people from all countries.
4000+ cities over 20,000 people from selected countries that do not have too many cities: Eastern Europe, former U.S.S.R. countries, the Middle East, South-East Asia, Africa.
Depending on how many people purchase the database, I may add cities of 20,000-100,000 people from North and South America, Western Europe, South Asia, East Asia, Australia, etc., which may bring the number of cities in the database to about 15,000-20,000.
Administrative divisions (states, regions, provinces, etc.) are included for the 30 most major countries; second-level administrative divisions are included only for Italy at this moment, but I am able to add them for the United Kingdom, France, Spain and the United States too, if you want.
100,000+ cities represented on map
Cyan – 1,000,000+, Green – 500,000+, Yellow – 100,000+
The database itself does NOT contain GPS coordinates; on this map the city locations are generated automatically by Google Fusion Tables, so some cities appear in the wrong location due to having the same name as another city, or due to alternate spellings.
City database history
Geography is my oldest hobby; starting around age 6, I was studying geographic atlases and had already learned all the countries and capitals. Between 1998 and 2005 I did a few works in Word / Excel based on world atlas books or digital encyclopedias; I updated them and published them on my website for the first time in 2015.
2003-2005 – I compiled a list of cities of Europe, former USSR countries, the Middle East and Africa, using Encarta 2002, which shows a limited number of cities on the map. I was planning to include ALL cities shown on the map, meaning cities over 20,000 people, but for the biggest countries of Europe that would have been too many, so I included only cities over 100,000 people.
2005 – my family installed an internet connection. I discovered Wikipedia and no longer saw a purpose in making my own world cities database: the internet already provided up-to-date population numbers, and updating my table constantly was a waste of effort. I left geography and migrated to other hobbies such as architecture, without finishing the list of World Cities as intended (to include all cities over 20,000 people from EVERY country).
2006-2007 – I included a few more countries: China, Japan, Mexico, etc., using Encarta 2006. This one shows a lot more cities on the map, and it would take a lifetime to include them all, so I decided to include only cities over 100,000 people. Then I left the city database abandoned.
2015 – given the success of one of my other hobbies, the Car Database, which turned into a business selling Excel databases to companies and web developers, I started questioning whether the World Cities hobby, if converted to Excel, might be useful to web developers for CSV and MySQL databases. I published the incomplete city database on my website for the first time, as a one-country SAMPLE, inviting people to contact me if they want the Word file for free, or the Excel conversion as a paid service.
Over the next months I completed the remaining countries from Asia, North and South America, including cities over 100,000 people, using Encarta 2009 (the last Encarta). I created 2 Word files: one for all countries, including cities over 100,000 (60 pages), and one for the 20 biggest countries, including cities by province over 20,000 or 100,000 depending on the case (70 pages).
2016 August – the first person contacted me to ask for the city database. An Indian wanted a list of cities in Excel; I offered to convert the Word file to Excel, but he was not patient and asked me to give him what I currently had (the Word file). However, I started converting it to Excel to be ready for sale when the next customer came. I created 2 Word files, both containing all countries: a LITE one containing cities over 100,000 and a BIG one containing cities over 20,000 by province, plus an Excel file similar to the BIG Word one, and put them up for sale so people can buy directly (instead of contacting me to get the complete database). I also expanded the page to get more traffic from Google.
The city database is NOT FINISHED and will never be 100% complete, but it may be sufficient for your needs; if not, I can do additional work if required, such as adding cities between 20,000 and 100,000 people from countries that I have not added yet, or making changes in the data structure to make it more suitable for your project. Contact me and tell me your requirements!
By November 2017, 5 people had purchased it, which is why I decided to offer an update adding more cities from specific countries.
Note: city classification is done based on the icon shown in Encarta Encyclopedia 2009, and it does not always reflect the actual population indicated in the city's article. For example, there are cities with the 1,000,000+ icon but a population of around 900,000, as well as cities with the 500,000+ icon but a population of over 1 million. Most cities under 100,000 do not even have an article to indicate their exact population. This was Encarta Encyclopedia, the only resource of information available to me before connecting to the internet.
Another world city database
In November 2017 a customer gave me his world city database in exchange for a database made by me, with permission to resell it on my website.
This world city database contains 76,799 cities, sorted by 212 countries, 51 US states, 13 Canadian provinces and 4 UK countries. For other countries it does not contain administrative divisions. The database focuses primarily on the United States, as it contains 43,299 cities from the US compared with 33,500 cities from the rest of the world. Price: 1 dollar per 1,000 cities.
Romania city population .XLS – I made this originally in 1999 in Word, sourcing data from books, then remade it in 2006 in Excel format, sourcing data from the Wikipedia pages of each city (hours of work!). It contains all 320 cities of Romania with their population at every census from 1912 to 2011 (next census probably in 2020), except when they did not have city status in the census year. Why Romania? Because I was born here.
United States city population .XLS – including 304 cities with over 100,000 inhabitants. Source of data: Wikipedia, copy-pasted into Excel and enhanced visually. Contains population at the 2010 census and the 2015 estimate, as well as area, density and GIS coordinates (latitude and longitude).
I think that I could also make a worldwide database of cities, sorted by country and administrative divisions, preferably without population, because that requires too much effort and each country has a different census year. But before starting this megalithic work I need to know who would need such a database. Please tell me how you intend to use city data and what info / columns the database should contain!
Other geography stuff
In 1998-2003 I also wrote lists of natural features, seas, rivers, etc., and astronomy-related stuff... all in Word. I started using Excel more than Word in 2003. Today I consider them useless, because such info exists for FREE on Wikipedia and it is not necessary to make my own databases.
News: I made a new database in April 2019, for someone who asked me to scrape data from SkyscraperCenter.com, being primarily interested in the companies, tenders, architects and engineers involved in each building's construction.
After making databases of HDB flats and condos from Singapore, as well as public and private housing estates from Hong Kong, and other databases, in November 2015 I started making a database of skyscrapers. I published it in January 2016, with over 15,000 buildings taller than 100 meters. Under-construction buildings are included, so the database will be valid for the next 3-4 years.
This was done soon after learning about web scraping in August 2015. I made the skyscrapers database to showcase my ability to create databases by scraping data from websites, and for my personal interest in studying the evolution of skyscrapers over history.
Free version, containing:
Tallest buildings today, over 300 meters / 984 feet: 218 buildings, completed and under construction
Tallest buildings in year 2000, 1980, 1960, 1940, 1920.
Timeline of tallest buildings in the world, in United States, and outside North America.
Download FULL database
All buildings over 100 meters / 328 feet: 15,727 buildings – completed, under construction, planned, vision
Sorted by city (~600 cities have buildings over 100 meters); can be re-sorted by height or any other column.
How the Excel database was made
Identify on the map the cities with many buildings, check them on Emporis, make a list of skyscrapers for each city (most cities checked do not have any skyscrapers), then input the list of URLs into scraping software to get details about each building and put them in an Excel table, then enhance the Excel table, remove the day/month from construction dates (leaving only the year for easy historical analysis), etc.... dozens of hours of work!
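The date clean-up step (keeping only the year from construction dates) can be sketched like this; the input formats shown are assumed examples, since the real scraped data may use several date styles:

```python
import re

def year_only(date_str):
    """Extract a 4-digit year from a date string, e.g. '12 Mar 1998' -> '1998'.
    Returns the original string unchanged if no year is found."""
    match = re.search(r"\b(1[89]\d{2}|20\d{2})\b", date_str)
    return match.group(1) if match else date_str

# Assumed example formats; real scraped data may differ.
for raw in ["12 Mar 1998", "1931", "under construction"]:
    print(year_only(raw))
```

Run over the construction-date column, this leaves only the year, which makes the historical filtering described below possible.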
How the Excel sheet can be used
Filter the table and make tops of the tallest buildings by country, state, city, building usage, etc.
Filter the table by year and make a top of the tallest buildings in any given year of history.
Analyze the data and make statistics of which cities have the most buildings over 100, 200, 300 meters.
Convert the Excel file to CSV or MySQL and create your own skyscrapers website or mobile app.
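Once converted to CSV, the same kinds of tops can be produced programmatically. A minimal sketch using only the standard library, with column names and sample rows that are my own illustration rather than the real export:

```python
import csv
import io

# Assumed column names and sample rows; the real export's headers may differ.
DATA = """\
name,city,country,height_m,year
Willis Tower,Chicago,United States,442,1974
Petronas Tower 1,Kuala Lumpur,Malaysia,452,1998
Taipei 101,Taipei,Taiwan,508,2004
"""

rows = list(csv.DictReader(io.StringIO(DATA)))

# Top tallest buildings in a given country
usa = sorted((r for r in rows if r["country"] == "United States"),
             key=lambda r: int(r["height_m"]), reverse=True)

# Tallest building completed by a given year of history
by_1999 = max((r for r in rows if int(r["year"]) <= 1999),
              key=lambda r: int(r["height_m"]))

print(usa[0]["name"], "|", by_1999["name"])
```

The same filters translate directly into Excel AutoFilter or SQL `WHERE` clauses once the file is imported.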
I offer a buildings database as a web scraping service from Emporis.com. Contact me for a price quote based on your requirements (worldwide or a list of specific countries, skyscrapers only, buildings over X meters, etc.). If you want updates in the future, each update costs 50% of the initial price.
New database: 160,000+ buildings from United States
In April 2016 someone who saw my skyscrapers database asked me if I could create a database of buildings in the Ontario province of Canada. DONE!
In November 2016 someone asked me if I could scrape data from Emporis to create a database of not just skyscrapers but also high-rises in the United States! DONE in about 5 days!
Buy FULL database:
He further asked me to include low-rise buildings. So I decided to re-scrape the website with all types of buildings, and also included all 50 data fields, not just the ones of interest to me and him.
DONE in about 2 weeks, including 20 hours of manual work to build the list of URLs for all buildings, then many hours of scraping. Emporis has an anti-scraping feature blocking my IP after accessing 1000-2000 pages, so I had to run in batches of 2000 buildings (taking 15 minutes each), then join the files and re-scrape the blocked URLs. A huge effort that deserves this price, not counting the highly valuable information collected.
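The batch-then-retry workflow can be sketched as follows. `fetch` is a stand-in for the actual scraper, and the batch size comes from the description above; the pause between batches would be tuned in real use:

```python
import time

def scrape_in_batches(urls, fetch, batch_size=2000, pause_seconds=0):
    """Scrape URLs in batches, collecting failures (e.g. blocked requests)
    for a later re-scrape pass, as described above."""
    results, failed = {}, []
    for start in range(0, len(urls), batch_size):
        for url in urls[start:start + batch_size]:
            try:
                results[url] = fetch(url)
            except IOError:
                failed.append(url)  # re-scrape later, e.g. from another IP
        time.sleep(pause_seconds)  # wait between batches in real use
    return results, failed

# Simulated fetch that fails once for one URL, imitating an IP block
calls = {}
def fake_fetch(url):
    calls[url] = calls.get(url, 0) + 1
    if url == "b" and calls[url] == 1:
        raise IOError("blocked")
    return f"data:{url}"

results, failed = scrape_in_batches(["a", "b", "c"], fake_fetch, batch_size=2)
# Second pass over the blocked URLs, then merge with the first pass
retry, still_failed = scrape_in_batches(failed, fake_fetch, batch_size=2)
results.update(retry)
print(sorted(results), still_failed)
```

The "join the files" step corresponds to the final `results.update(retry)` merge of the per-batch outputs.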
These 163,938 buildings are not all buildings ever built, but all buildings included in Emporis as of December 2016 (except New York City; see the note below). Maybe not 100% complete, but still the BEST resource of building information that you can get.
Emporis includes famous buildings, and most (if not all) large public buildings, commercial buildings, churches, etc. But most residential buildings are not included, because they are not famous.
Note for New York City: this is the only city in the world where every single building is included in Emporis, including 390,000+ low-rise buildings and 320,000+ houses (more than double the whole rest of the US). It would have taken an extra 30 days to scrape them all, and the customer said he "needs a few dozen buildings from every US city" and that "in the case of NYC, scraping high-rises and skyscrapers is more than enough".
Meanwhile, the original skyscrapers database did not have any sales... so I changed the page name from "Skyscrapers database" to "Buildings database" and published the US all-buildings database, in case other people are interested too.
History of skyscrapers
The United States is well-known for skyscrapers, but in reality only a small area in each downtown has skyscrapers, and the era when the United States was the only country with skyscrapers ended a long time ago. A lot of high-rise buildings have been built since the late 19th century. What is less known is that early skyscrapers were not so well received by the population, and most US cities set height limits to counter the "skyscraper race"; in New York, however, the law failed to be approved, so taller and taller skyscrapers were built. Early skyscrapers had continuous facades at the street line, ugly side walls and airwells, making streets look like canyons. The 1916 Zoning Resolution prevented buildings from covering the entire plot of land.
The 1930 recession ended skyscraper construction in North America; for about 30 years no more tall buildings were built. Between 1933 and 1953, there were 19 buildings over 600 feet (183 m) in the entire world, all of them located in the United States, 17 of them in New York alone (source).
A lesser-known fact: Russia attempted to take the title of world's tallest building away from the USA with the Palace of the Soviets, started in 1938, but due to World War II the construction was abandoned. After the war, seven smaller but still megalithic buildings were built. Moscow State University, completed in 1953, was the tallest building outside the United States until 1975.
Since the 1990s a new skyscraper race has begun, probably helped by computer technology. New buildings over 300 meters appeared in Malaysia, China, Taiwan, as well as Europe. The world had 26 completed buildings over 300 meters as of 2000, 53 as of 2010, 126 as of 2015, and 218 including under-construction ones, according to my database.
Asia started building skyscrapers over 200 meters slowly during the 1980s, and nowadays it has the most skyscrapers under construction.
Singapore's UOB Plaza was the tallest building outside the United States between 1986 and 1989, and it remains the tallest in Singapore due to height restrictions.
Malaysia surprised the whole world in 1998 when it completed the Petronas Towers at 452 meters, taking the title of world's tallest building from the United States, and it is now constructing KL118 at 635 meters, which in 2020 will become the 3rd tallest in the world.
China had only 4 of the world's 25 buildings over 300 meters in 2000, and now it has 98 of the 218 buildings over 300 meters according to my database (including under construction).
Buildings over 500 meters have also been proposed in India, Indonesia, the Philippines and South Korea. These countries may join the race during the 2020s. How will the Earth look in 2050?
Latin America has been building skyscrapers since the 1930s: the Martinelli Building (built 1934, 130 meters), then the Altino Arantes Building (built 1947, 161 meters), which was also the tallest building outside the United States at the time. But the evolution was slow, and not much taller buildings were built over the years. Presently Brazil has a few hundred buildings over 100 meters, mostly residential, but the tallest, Mirante do Vale (built 1960), is only 170 meters. The tallest skyscraper in South America was the Parque Central Complex, built in 1983 at only 225 meters, until 2014, when Gran Torre Santiago was completed at 300 meters.
Europe is a place where skyscrapers enjoyed little popularity, except in Russia. Most skyscrapers in Europe are located in Moscow. The Eiffel Tower was the tallest structure in the world from 1889 to 1930.
The Middle East has relatively few buildings over 100 meters, but between 2012 and 2015 it had the two tallest buildings in the world, Burj Khalifa and Makkah Royal Clock Tower, and it is now building a new world's tallest, Jeddah Tower, 1 kilometre high.
Total buildings in database: 15,727 (as of January 2016)
10212 existing buildings
1867 under construction buildings
1833 planned buildings
1764 unbuilt buildings (cancelled / vision)
51 demolished buildings
4142 North America (US & Canada)
1031 Latin America
519 Australia & New Zealand
1395 Europe (including Asian countries of the former Soviet Union)
1301 Middle East
412 South Asia
1424 South-East Asia
5377 East Asia
Countries with most buildings:
3983 China (including Hong Kong and Macao)
3383 United States
550 United Arab Emirates
526 South Korea
Cities with most buildings:
1327 Hong Kong (1270 completed)
879 New York (670 completed)
478 Tokyo (412 completed)
470 Dubai (247 completed)
464 Chicago (297 completed)
438 Toronto (222 completed)
Number of buildings over # meters
See how many tall buildings have been built starting from the 1980s and especially during the 2000s, probably helped by computer technology. How will the world look in 2050?
Update October 2018: I made web scraping software to extract data from Apple Music at a speed of ~2 seconds per music album. I can provide Excel / CSV files of your favorite artists / bands in a matter of minutes, up to 10 artists FREE of charge. Watch the video!
If you are a professional looking to pay for a larger database (dozens, hundreds, or thousands of artists), you are invited to discuss project requirements.
Original music database with songs rating
Shortly after connecting to the internet in 2005 (I was 16 years old), I started creating a database in Excel of the MP3 songs I downloaded, to review each song and give it a rating from 0 to 16, and to make tops of the best artists, best albums, best genres, etc., using complex mathematical formulas, to show friends exactly what music I like and how much.
This music database was NOT intended to contain every possible song released, but ONLY my favorite artists / bands, plus a small selection of artists / bands representative of each region of the world and each music genre.
The database is useful for me to organize my own music. How useful it is for other people, I don't know... your feedback is needed to make a better database for YOU and future customers!
Some local friends, the ones who know me in real life or to whom I sent my music via the internet, blamed me for listening to the shittiest music possible. What is the problem if I rarely listen to music from my own country? If I can speak 4 languages and I listen to music in languages that they do not understand, as well as other languages that I do not understand myself, that does not mean the music is a fucking piece of shit. I love collecting music sung in as many languages as possible, and I do not always care about the lyrics.
My friends also said that this Excel Music Database is the craziest thing I have made, or the most useless thing they ever saw, and suggested that I STOP wasting time on such things.
Music database evolution & releases:
Jun 2007 edition – 2000 rated songs, 37 artists in top
Feb 2008 edition – 3000 rated songs, 57 artists in top
Aug 2009 edition – 4000 rated songs, 71 artists in top
Apr 2012 edition – 5000 rated songs, 92 artists in top
Late 2013 edition – 7523 total songs, 5467 rated songs, 97 artists in top
Jun 2016 edition – 8365 total songs, 6043 rated songs, 105 artists in top
Feb 2019 edition – 32078 total songs, 6839 rated songs, 110 artists in top
Note: until the 2016 edition with 6000 rated songs, the database contained only artists whose mp3 files I had downloaded and listened to, with at least a couple of their songs rated. The 2019 release also included the best-selling artists, regardless of whether I had downloaded and listened to their mp3s to add ratings. The top includes only artists with a minimum of 10 songs and at least half of their songs rated.
How my hobby for music started
My hobby for music started in 1999, after we replaced our 486 computer with a 200 MB hard disk with an AMD K6-2 333 MHz with a 4.3 GB hard disk. My dad brought some CDs with mp3 songs from friends, mostly pop and rock songs from the 1960s to the 1990s, and made a selection of songs according to his own preferences, unfortunately deleting many songs that I liked. Then he burned the selected songs onto 2 CDs. I had no right to decide what to listen to; only my parents were putting music in our home, and for many years my dad's song selection was the only music I was listening to. I wonder, if I had not had these restrictive parents, would my music preferences be different today?
Out of my dad's selection, I also made my own selection of a few dozen songs, which I listened to only when I was home alone.
After the 2003 revolution and the removal of the restrictions imposed by my family, I was able to listen to my music anytime I wanted, and soon I got bored with my few dozen songs. I was desperate to get more music, and started recording via TV tuner (ending up with lots of songs in bad quality). I wanted more songs from certain artists, and I went to music stores in the city, but my parents did not agree to pay money for music CDs.
In 2005 we connected to the internet, so I was able to download music freely for the first time, using the DC++ file sharing network (YouTube was not yet launched). In November 2004 I had accidentally turned on the TV during the Junior Eurovision Song Contest, won by Maria Isabel. So the first songs that I downloaded after connecting to the internet were Maria Isabel's 2 albums and 2 more children's artists discovered while looking for Maria Isabel: 3+2 and Danna Paola, plus additional songs from the artists whose songs I already had (Aqua, Shakira, Shakin' Stevens, Thalia), and the originals of about 100 songs recorded from TV in bad quality.
In 2006, while looking on YouTube for Danna Paola, I accidentally found a video featuring Tatiana, so I started looking for Tatiana's music too, on ARES (file sharing software popular in Latin America) and via direct downloads (like MegaUpload), and this way I also downloaded Fandango, Flans, Timbiriche, R.B.D, and got addicted to Mexican 1980s-1990s pop-rock. In 2008 I had 3000+ songs, of which 30% were from Mexico. Tatiana remained my all-time favorite even in 2013. At the same time I started watching Mexican TV shows, and I learned Spanish.
I was looking for more diversity, so since 2009 I have also downloaded music from other Latin American countries, and got addicted to Brazilian country music as well as to 3 big artists hosting children's shows (Angelica, Eliana, Xuxa), which generated bad comments from my overseas friends ("are you retarded? why do you listen to children's music?"); this way, in just one year I learned Portuguese to the level that I am able to understand the lyrics of any song. I also downloaded American country music (Alan Jackson, Garth Brooks, Shania Twain, Taylor Swift, etc.), British, French, German and Italian pop, rock and folk (ABBA, Al Bano & Romina Power, Alizee, Andrea Berg, Ricchi e Poveri, etc.), and Japanese and Chinese pop music (many small artists). I liked them all, but none caused a long-term addiction until the 2013 discovery of Kyary Pamyu Pamyu. There is also music that I can't tolerate: Arabic and Indian music, and most hip-hop music.
The idea of creating an Excel music database
The idea of using Excel to make a table of songs and rate each song dates back to 2002. I added all songs to WinAmp, clicked "generate HTML playlist" and copied it into Excel, then added a numerical rating. This meant no complete discography of any artist, no year of release, etc.
In 2005, thanks to the internet connection and access to internet music stores and file-sharing networks, I could get information about artists and complete discographies, with album names and release dates, so I decided it was time to start making a serious music database.
I did not intend to include ALL songs from my computer, or to reach a certain number of songs in the database within a specified deadline. I just added artist by artist on a random basis, originally adding only my favorite artists (most of them having short music careers), and since 2008 I have paid attention to famous artists, adding to the database a selection of artists representative of every region of the world and every genre of music.
When I started the database in 2005, iTunes was the biggest music store and I could copy-paste a whole album's table of songs with just a few clicks, so the columns in my database matched the columns in iTunes. The iTunes app was redesigned in 2010, so I had to copy song names one by one. The database also contains albums sourced from other websites when they are not available on iTunes, as well as names of mp3 files found on the internet (possibly with incorrect spelling).
In 4 years the music database reached over 4000 rated songs, after which I continued to add new songs at a slower rate.
I published the music database on my website in 2010 with a simple download link. While my other childhood hobbies, such as the car database and the real estate databases, gained interest from professionals paying big $$ for a database and allowed me to make a living, the music database turned out to be one of the most USELESS things that I ever made!
In 2016 I released a new edition with 6000 songs, together with writing this long article. I made a "free purchase" button that requires visitors to enter an email address in order to download the files. Over 200 people downloaded it, and I emailed a couple of them asking whether my database met their needs and how they use it. Only 2-3 replied, saying that they needed an Excel spreadsheet for a school project. They were not even interested in music!
Starting from March 2018 I put it up for sale at $1, inviting people to contact me IF they want to download it for free. By the end of the year, 10 people had paid $1 without any prior communication with me, meaning that people are willing to pay money for a music database. But what should the database contain to be usable for you?
A new era started in 2018: I discovered Spotify, a music streaming service where you can listen to full songs free of charge. I also used web scraping software for the first time, to get data faster from Apple Music, at a rate of 2 seconds per page (album). The Amazon Music Store may be bigger than Apple's, and I found many artists on Amazon that do not exist on Apple, but due to inconsistencies between various albums on Amazon I need to spend extra time cleaning up the scraped data, so I prefer to source data from Apple Music unless it is missing my favorite artists.
I expanded the database with a few extra columns, such as Source of data and Record labels, and using the scraping software I quickly added thousands of new songs to the database (I am also able to create a custom music database of any artist / band of your choice). Then, using Spotify, I listened to full songs and rated them without having to dig for mp3 files on torrents and other pirated music download websites.
Songs rating system
Since the Music Database was started in an era when I was fascinated by the base-16 numbering system, the songs are rated with numbers ranging from 0 to 16. Originally, lower values were better, but in 2016 I inverted the ratings, making higher values better. Total: 17 possible values, which is my birthday and my favorite number.
The rating is composed of 4 categories, each having a value from 0 to 4.
Sound: I love instrumental diversity and guitars. Some rock and country songs can win rating 4 in this category, pop songs are around 1-3, while hip-hop songs have rating 0.
Voice: I love nice voice and lyrics diversity, but I don’t care about the lyrics content. The songs sung in languages unknown by me or artificial languages can win rating 4 too. The rating drops if lyrics contains too many repeating words, or if the song is only instrumental, the rating is 0.
Mix: I love the songs which have a continuous and fast rhythm. Some dance songs can win rating 4 in this category. most rock songs have rating 2-3, most pop songs have rating 1-3, slow songs or bad mixed songs gets rating 0.
Addiction: some songs attract me so much that I listen to them again and again for hours; they win a rating of 4 in this category. They are bubblegum dance, Japanese pop, as well as songs from Latin American children’s shows (this is what attracts negative comments from my friends, that I listen to childish music, music for retarded people, etc). Rock and country, despite winning in other categories, bore me after a few listens, so they get 1-2, while louder songs like hard rock, which hurt my ears so much that I cannot listen to a song until its end, get 0.
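The four categories above can be sketched as a simple sum, assuming the overall 0-16 rating is just the total of the four 0-4 category values (the text says the rating is "composed of" the categories without stating the combination rule, so the plain sum is my reading):

```python
# Sketch of the rating system: four categories, each 0-4,
# summed into the overall 0-16 song rating (the plain sum
# is an assumption; the text only says the rating is
# "composed of" these categories).

def song_rating(sound: int, voice: int, mix: int, addiction: int) -> int:
    for value in (sound, voice, mix, addiction):
        if not 0 <= value <= 4:
            raise ValueError("each category value must be between 0 and 4")
    return sound + voice + mix + addiction

# Example: a catchy bubblegum dance song
print(song_rating(sound=2, voice=3, mix=4, addiction=4))  # 13
```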
To rate each song, it is enough to listen to the 30-second preview on iTunes, but I prefer to rate only when I download full songs. The addiction rating is hard to decide initially, and sometimes I modify it after days or months. The 17 ratings are distributed like a Gaussian curve, but asymmetric: rating 8 holds 10% of the songs, rating 0 has 2% and rating 16 has 0.2%.
My everyday playlist is composed of songs rated from 12 to 16; I include songs rated 8-11 temporarily and keep them if their addiction rating is 3 or higher. This creates a playlist of about 20% of the songs included in the database.
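The playlist rule can be sketched like this (the `Song` record and its field names are hypothetical, not the actual database columns):

```python
from dataclasses import dataclass

@dataclass
class Song:
    title: str
    rating: int      # overall rating, 0-16
    addiction: int   # addiction category, 0-4

def in_playlist(song: Song) -> bool:
    # Songs rated 12-16 always qualify; songs rated 8-11 are
    # kept only if their addiction rating is 3 or higher.
    if song.rating >= 12:
        return True
    if 8 <= song.rating <= 11:
        return song.addiction >= 3
    return False

songs = [Song("A", 14, 2), Song("B", 9, 3), Song("C", 9, 1)]
print([s.title for s in songs if in_playlist(s)])  # ['A', 'B']
```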
Artist ranking system
In 2005 I made a ranking based on the average rating of all songs of each artist. But this turned into a problem: the top places were occupied by small artists that produced just a few but good songs, while the most famous artists occupied the last places. It is natural that artists with long careers cannot make every song as good as a few good songs.
In 2006 I added a SCORE for each artist, calculated by a more complex formula. I added columns for the number of songs and the total value of songs. Song value is calculated like an inverted binary logarithm: 16 divided by the distance from the top rating, so a song rated 0 has value 1, a song rated 8 has value 2, a song rated 12 has value 4 and a song rated 14 has value 8, with exceptions at the top of the scale: rating 15 has value 12 and rating 16 has value 16.
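A sketch of the value table, assuming the formula value = 16 / (16 − rating), which reproduces all the listed examples (rating 0 → 1, 8 → 2, 12 → 4, 14 → 8), with the top two ratings handled as exceptions since the formula would blow up near 16 (my reading of the exceptions in the text):

```python
def song_value(rating: int) -> float:
    # Inverted-binary-logarithm value: 16 / (16 - rating)
    # matches the listed examples (0 -> 1, 8 -> 2, 12 -> 4, 14 -> 8).
    # The top of the scale is capped by hand (assumed reading of
    # the exceptions described in the text).
    if rating == 16:
        return 16.0
    if rating == 15:
        return 12.0
    return 16.0 / (16 - rating)

print(song_value(0), song_value(8), song_value(12), song_value(14))
# 1.0 2.0 4.0 8.0
```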
In 2008 I further improved the ranking by adding a multiply factor for song diversity, calculated like this: the total value of songs divided by the number of songs, then divided by the average song rating; the result is square-rooted and multiplied by 2, giving a multiply factor between 1 and 1.5. Artists with diversity (a few good songs among mostly bad songs) are helped by a higher multiply factor than artists whose songs all sit at the same medium rating.
How the score is calculated: the average song rating (ranging 0 to 16) is multiplied by 4 (I can increase this multiply factor to boost artists with one good album, or reduce it to boost artists with long careers), then the square root of the total value of songs is added (ranging from 4 for one-album rappers to 30 for Tatiana’s 20+ albums); the sum of these two is multiplied by the diversity factor between 1 and 1.5, then multiplied by 128 to get a nice-looking 4-digit score, varying from 2500 to 9000+ across all artists. This numerical value has no meaning other than ranking artists; do not consider that an artist with score 8000 is two times better than an artist with score 4000.
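Putting the pieces together, the score formula might be sketched like this; it is a rough reconstruction (the base multiply factor of 4 and the constant 128 come from the text, while the diversity factor follows my reading of the 2008 description):

```python
import math

def diversity_factor(total_value: float, n_songs: int, avg_rating: float) -> float:
    # Average value per song, relative to the average rating,
    # square-rooted and doubled (per the 2008 description,
    # giving a factor of roughly 1 to 1.5).
    return 2 * math.sqrt(total_value / n_songs / avg_rating)

def artist_score(avg_rating: float, total_value: float, n_songs: int) -> int:
    base = avg_rating * 4 + math.sqrt(total_value)
    return round(base * diversity_factor(total_value, n_songs, avg_rating) * 128)

# Example: 10 songs all rated 8 (value 2 each, total value 20)
print(diversity_factor(20, 10, 8))   # 1.0 (no diversity bonus)
print(artist_score(8, 20, 10))       # a mid-range 4-digit score
```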
Leave comments at the bottom of pages!
Comment on house designs, real estate writings, etc.
Constructive criticism is more appreciated than praise. Feel free to comment on the comments left by other visitors too; don’t leave the communication to be only visitors vs site admin.
When was the website last updated?
Every week I make changes on some pages of the website. A few pages may stay unchanged for years, because I lack ideas of what else to write, or because they already provide sufficient info.
If you think a page needs an update, or you have ideas for new content, please leave a comment!
See the list of updates
What else would you like to see on the site?
Don’t say “it is your website, do what you like; I don’t like what you are doing so I will go elsewhere“. Yes, it is a personal website, but it is made for YOU, not for me! Public opinion matters!
Do you hate the website theme?
Show me another website that you like more… and I will try to replicate its theme. The theme needs to follow a few requirements.