Early thoughts on bo.lt

bo.lt is a service that lets you copy, edit, and share webpages. TechCrunch labels bo.lt as “bit.ly on steroids”.

The core copy-and-edit functionality looks solid and works well. It finds the images, CSS, JavaScript, and other dependencies of the page, copies them, points the page at the copies, and serves it all from their own S3 account. They have done a good job with the design as well.
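To make the dependency-finding step concrete, here is a minimal sketch in Python of how a tool might list the assets a page depends on. This is not bo.lt's actual code, which I have not seen. The names `AssetCollector` and `find_assets` are made up for illustration, and the sketch only looks at `img`, `script`, and stylesheet `link` tags.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AssetCollector(HTMLParser):
    """Collect absolute URLs of images, scripts, and stylesheets on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.assets = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        url = None
        if tag in ("img", "script"):
            url = attrs.get("src")
        elif tag == "link":
            # Only stylesheet links count as page dependencies here.
            rel = (attrs.get("rel") or "").lower().split()
            if "stylesheet" in rel:
                url = attrs.get("href")
        if url:
            # Resolve relative references against the page URL.
            self.assets.append(urljoin(self.base_url, url))


def find_assets(html, base_url):
    """Return the asset URLs referenced by `html`, in document order."""
    collector = AssetCollector(base_url)
    collector.feed(html)
    return collector.assets
```

A copying service would then have to download each of these URLs, store them, and rewrite the references in the page. Assets loaded from inside CSS or by JavaScript at runtime are not covered by this sketch.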

As a user, I can think of times when I would want to use this service, but not as often as I use bit.ly. Most of the time when I share links, I don’t need to edit the pages.

As a content publisher, I am not sure I want users to use this. I understand that anyone can already copy my content, edit it, and host it on their own servers without bo.lt. But I believe bo.lt makes it easier for publishers to lose control over their content. Any later changes to the original content will not be passed on to people viewing the bo.lt copy.

According to TechCrunch: “…is content provider friendly in that Bo.lt still serves up a given page’s ads and analytics systems.” I haven’t tested the ads, but bo.lt does seem to append their own Google Analytics code to the existing GA code on the page. I am not sure how they work with other third-party or home-brewed analytics systems.

Another issue for content providers is that bo.lt now competes with the original page for search engine rankings. For instance, this page competes with the original page without any reference to it. As it stands, this model breaks the web as we know it into multiple versions of almost the same content at different URLs. Depending on how important getting indexed by Google is to their strategy, this can be fixed easily, either by adding ‘noindex’ to bo.lt pages or, even better, by adding a canonical tag that points to the original page. Perhaps this is one more reason for content publishers to put a rel canonical tag on their own pages, so that they get credited for their content.
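For anyone who wants to check a copied page for these two hints, here is a rough sketch in Python. The helper name `indexing_hints` is my own. It only reads tags inside an explicit `<head>` element, on the assumption that search engines generally honor a canonical link only there, and it does not look at HTTP headers.

```python
from html.parser import HTMLParser


class HeadHints(HTMLParser):
    """Record the canonical URL and robots noindex hint found inside <head>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.canonical = None
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
            return
        if not self.in_head:
            # Tags outside <head> are deliberately ignored.
            return
        attrs = dict(attrs)
        if tag == "link":
            rel = (attrs.get("rel") or "").lower().split()
            if "canonical" in rel:
                self.canonical = attrs.get("href")
        elif tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            content = (attrs.get("content") or "").lower()
            directives = [part.strip() for part in content.split(",")]
            self.noindex = self.noindex or "noindex" in directives

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def indexing_hints(html):
    """Return the canonical URL (or None) and whether noindex is set."""
    parser = HeadHints()
    parser.feed(html)
    return {"canonical": parser.canonical, "noindex": parser.noindex}
```

If a copy returns a canonical URL pointing at the original page, the original should get the credit. If it returns neither hint, the copy is free to compete with the original in search results.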

It takes a lot of effort to change elements of a webpage on the fly and still have most of them function as they do in the original. I think bo.lt is impressive technology, but I am not sure it will be well received by publishers under the current model. I hope they evolve it and make publishers comfortable with the product. They have raised $5 million from Benchmark Capital.

Update 1: Brian Rutledge mentions that bo.lt seems to be moving the rel=canonical tag into the body section, which makes it ineffective. However, that was not my experience. It looks like a bug in bo.lt.

Update 2: I wanted to see whether there is a way to block bo.lt from copying a website, the same way crawlers can be excluded using robots.txt. I found that bo.lt does not check robots.txt before grabbing a page: I looked through my Apache logs, and it never requested robots.txt today. Another problem is that they fake the User Agent as Firefox 3.6.4 (see image below). A whois lookup on the IP address confirms it is owned by Boltnet, Inc. For now, the only way to stop bo.lt from copying your pages is to ban the IP address 199.204.84.2, which works until they change it.
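If you want to run the same check against your own logs, something like the following Python sketch should work. It assumes the server writes Apache's "combined" log format, and the helper name `summarize_client` is made up for this example. The IP address is the one I saw, so it may be out of date by the time you read this.

```python
import re

# Apache "combined" log format; an assumption about the server's configuration.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

BOLT_IP = "199.204.84.2"  # the address reported in this post; it may change


def summarize_client(lines, ip=BOLT_IP):
    """Summarize what a single client IP requested, from access-log lines."""
    paths = []
    agents = set()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and match.group("ip") == ip:
            paths.append(match.group("path"))
            agents.add(match.group("agent"))
    return {
        "requests": len(paths),
        "asked_for_robots_txt": "/robots.txt" in paths,
        "user_agents": sorted(agents),
    }
```

To use it, open your access log (its location depends on your setup) and pass the file object to `summarize_client`. If `asked_for_robots_txt` comes back `False` while `requests` is above zero, the client fetched pages without ever asking for robots.txt.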