Over four and a half years ago, blogtrack.com went live. It’s hard to believe it has been that long. It originally ran on a Dell P-III PowerEdge server that I had left over from another project I was working on.
At the time it felt like we were on to something. Back then, blogs were starting to ping Dave’s weblogs.com, but you were only allowed to grab the data once an hour. So on top of checking weblogs.com, we would go out on demand and take a digest of whatever site you wanted to scan. The algorithm was simple enough: we would request two copies of the site, one right after the other, and run a diff on the two pages, which effectively removed any random elements (advertising, rotating quotes, etc.). We would then boil the page down to a hash key using some standard Perl libraries. Blogtrack would store that result and compare it to the previously stored one; if they were different, we knew the page had changed. At its height, blogtrack was incredibly popular (in my eyes), but in the end, it was terribly neglected.
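As a rough illustration of the two-fetch idea, here is a minimal Python sketch. It is not the original Perl code. It assumes both fetches have already been downloaded as strings, uses Python's `difflib` to keep only the lines common to both copies, and uses MD5 from `hashlib` as a stand-in for whatever digest the Perl libraries produced. The function names are hypothetical.

```python
import difflib
import hashlib


def stable_lines(copy_a: str, copy_b: str) -> list:
    """Keep only the lines that appear unchanged in both fetches of the page."""
    a = copy_a.splitlines()
    b = copy_b.splitlines()
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    stable = []
    # Each matching block is a run of lines identical in both copies;
    # lines outside these runs (ads, rotating quotes) are discarded.
    for block in matcher.get_matching_blocks():
        stable.extend(a[block.a:block.a + block.size])
    return stable


def page_digest(copy_a: str, copy_b: str) -> str:
    """Hash only the stable content, so random elements don't affect the key."""
    joined = "\n".join(stable_lines(copy_a, copy_b))
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


def has_changed(previous_digest, copy_a: str, copy_b: str):
    """Return (changed?, new_digest); a page with no stored digest counts as changed."""
    current = page_digest(copy_a, copy_b)
    return previous_digest is None or current != previous_digest, current
```

Because the rotating ad line differs between the two fetches, it never reaches the hash. Two later scans of the same post with different ads therefore produce the same digest, while a new post changes it.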
Later on I changed the digest algorithms, and we could do interesting things like identifying what content had changed, or tracking content as it began to show up on multiple sites, similar to what Tailrank does now.
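The post doesn't say how the later digests worked. One common way to get both capabilities is to fingerprint each paragraph separately instead of the whole page. The Python sketch below assumes that approach: blank-line paragraph splitting, whitespace and case normalization, and SHA-1 are all illustrative choices, and `block_fingerprints`, `changed_blocks`, and `CrossSiteIndex` are hypothetical names, not blogtrack's code.

```python
import hashlib
from collections import defaultdict


def block_fingerprints(page_text: str) -> dict:
    """Map a hash of each normalized paragraph to its original text."""
    fingerprints = {}
    for block in page_text.split("\n\n"):
        # Collapse whitespace and case so trivial formatting differences don't matter.
        normalized = " ".join(block.split()).lower()
        if normalized:
            digest = hashlib.sha1(normalized.encode("utf-8")).hexdigest()
            fingerprints[digest] = block.strip()
    return fingerprints


def changed_blocks(old_page: str, new_page: str) -> list:
    """Return paragraphs in the new version whose fingerprints weren't in the old one."""
    old = block_fingerprints(old_page)
    new = block_fingerprints(new_page)
    return [text for fp, text in new.items() if fp not in old]


class CrossSiteIndex:
    """Track which sites each paragraph fingerprint has been seen on."""

    def __init__(self):
        self.sites_by_fp = defaultdict(set)

    def add_page(self, site: str, page_text: str) -> None:
        for fp in block_fingerprints(page_text):
            self.sites_by_fp[fp].add(site)

    def shared_blocks(self, min_sites: int = 2) -> dict:
        """Fingerprints seen on at least min_sites distinct sites."""
        return {
            fp: sorted(sites)
            for fp, sites in self.sites_by_fp.items()
            if len(sites) >= min_sites
        }
```

Per-paragraph hashes say which block changed, not just that something did. Indexing the same hashes across sites surfaces text that is spreading from blog to blog.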
Almost six months before Bloglines.com launched, we had a fantastic working demo of an application that, at the time, could only be described as “just like Bloglines,” but we let it slide as we got excited about other projects, namely starting a blogging platform for universities and enterprises.
That was a lot of fun.