CDDB as the first user-created database
The origin of CDDB is one of my favorite stories about the evolution of technology, and the discovery of value in places where nobody initially expected it to be. This is how I understand it went: sometime in the ’80s a smart guy gets a shiny new Sun workstation with a CD drive. He figures there must be some way to play his audio CDs on it, so he creates Workman. Then he decides he wants to see the track information while the CD is playing, so he types in the info for his CDs. Of course, he doesn’t want to type it in each time he puts the CD in, so he invents a way to use some of the data on the CD as a key, and creates a database to look it up.
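To make the "data on the CD as a key" idea concrete, here is a minimal Python sketch. It is modeled on the checksum-style disc ID commonly documented for CDDB-compatible clients (digit sums of the track start times, the disc length, and the track count); I don't know what the original Workman key looked like, so treat this as an illustration of the idea, not a reconstruction. The names disc_key, remember, and lookup, and the in-memory dictionary standing in for the index file, are all my own assumptions.

```python
from typing import Optional

FRAMES_PER_SECOND = 75  # CD audio positions are counted in 1/75-second frames


def digit_sum(n: int) -> int:
    """Sum the decimal digits of a non-negative integer."""
    total = 0
    while n > 0:
        total += n % 10
        n //= 10
    return total


def disc_key(track_offsets: list[int], leadout_offset: int) -> str:
    """Build an 8-hex-digit key from track start offsets given in frames.

    Assumed layout (CDDB-style): checksum byte, 16-bit length in seconds,
    then the number of tracks in the low byte.
    """
    checksum = sum(digit_sum(off // FRAMES_PER_SECOND) for off in track_offsets)
    length_seconds = (leadout_offset // FRAMES_PER_SECOND
                      - track_offsets[0] // FRAMES_PER_SECOND)
    key = ((checksum % 255) << 24) | (length_seconds << 8) | len(track_offsets)
    return f"{key:08x}"


# Hypothetical stand-in for the index file: key -> (album title, track titles)
database: dict[str, tuple[str, list[str]]] = {}


def remember(track_offsets: list[int], leadout_offset: int,
             title: str, tracks: list[str]) -> str:
    """Store hand-typed track info under the key derived from the disc layout."""
    key = disc_key(track_offsets, leadout_offset)
    database[key] = (title, tracks)
    return key


def lookup(track_offsets: list[int],
           leadout_offset: int) -> Optional[tuple[str, list[str]]]:
    """Return the stored info for a disc, or None if nobody has typed it in."""
    return database.get(disc_key(track_offsets, leadout_offset))


# Made-up disc: three tracks starting at 2 s, 200 s and 400 s, ending at 600 s.
remember([150, 15000, 30000], 45000, "Example Album", ["One", "Two", "Three"])
print(lookup([150, 15000, 30000], 45000))
```

The key depends only on the disc's layout, so anyone who inserts the same pressing gets the same key without typing anything. That is what makes it possible to share one person's database with the next guy. It is also why two different discs can occasionally collide, and why different pressings of the same album can end up under different keys.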
Now, he also gives this program away for free, and as a bonus, includes his own database, just in case there’s any overlap between his CD collection and the next guy’s. Soon other folks are sending their databases back to the original guy to be merged into the distribution, and after a while, it’s pretty big. So, he starts distributing it separately, and other clients start to use it too. Then this internet thing comes along, and another guy has the brilliant idea to make it client-server. A few more clients get written, people start submitting lots of CD information, and the database starts to get pretty huge. Eventually, in a very controversial decision, they decide to make the service a for-profit one, and now according to Wired, Gracenote is “a dominant player in music recognition.”
The story I heard matches reasonably closely with this summary on the MusicBrainz site:
The roots of this project are in a software program called Workman for playing Audio CDs on UNIX systems. Workman had the ability to display the name of the track it was currently playing. An index file was used to store the tracknames for each Audio CD. After a while a large index file with information about thousands of Audio CDs was created by the Internet community.
This was a long time ago. The index file system became more widely used when Windows users started using this index file, but the system was not very mature then. The Windows Audio CD player could use an index file with track information, but the index size was limited to 640KB. This meant that Windows users could not use the large Internet index file without correcting software.
In 1996 things changed when the Internet Compact Disc Database was created. Instead of a flat file with information of thousands of Audio CDs, the client/server model was applied. A single central server called “CDDB.com” could be used to access the information of Audio CDs. This server accepted new submissions of Audio CD information. At that stage the index file was reported to have grown by up to 800 Audio CDs per day. But these numbers say nothing about the quality of the submissions. The number of duplicate Audio CDs that now exist in the database is high — 10 entries of the same Audio CD under a different number is not uncommon. Many entries also contain numerous spelling errors. CDDB.com had no mechanism to correct errors.
Despite this, the system became popular and useful. Things changed dramatically when the open CDDB.com server was bought by a company that wanted to make money from the contributions that users had made. The index file created by the Internet community could no longer be copied. Patents were obtained and granted. A large public outcry resulted, and led to the start of several projects to create an Open Source competitor for the commercial CDDB.com (now Gracenote).
I think that CDDB is probably the first good example of a user-generated database. Now, user-generated content is all the rage, but it is interesting to separate it into things that seem more like databases, like CellarTracker and Wikimapia, versus the more free-form things like blogs. More on this from an investment standpoint in a future post.




