During the early days of the Internet, search was dumb – pure keyword search, and manually maintained directories were the order of the day. But as the Internet exploded, these techniques started to fail until Google showed up with PageRank and quickly dominated search.
And as the Internet kept expanding exponentially, search engine rankings became more important than ever – so much so that 70% of Google’s revenue comes from selling search results ad placements.
And the same was true for websites that rank high on Google. Being on the first page of a search query directly translated to Google gold.
Since the payout for ranking at the top of Google search results is so high, spamming and manipulation quickly followed the money, spawning an army of “bad actors” (link farms, link-buying, etc.) to game the system.
As a result, Google aggressively iterated on its ranking algorithms to defend against these practices (and, Google argues, for good reason). This in turn spawned an SEO consulting cottage industry that started the SEO arms race.
And I speak from personal experience – I have been running several successful websites since 2001 using intuitive, common-sense techniques that have allowed me to strike out on my own and be financially independent. I can remember when search engine optimization was much simpler. It all boiled down to having good content and becoming an acknowledged authority in your selected domain – “Content is King”.
However, as the SEO wars escalated, there was a lot of collateral damage. As Google continuously tweaked its ranking algorithm to defend against black hat SEO techniques, discoverability started to suffer, which pushed even legitimate websites to start playing the SEO game. On several occasions, Google’s changes even destroyed legitimate web-based business models without warning. “Content was King”.
I’ve survived a lot of these changes over the years, but keeping up has taken an increasingly large chunk of my time, and I feel like a dog chasing its tail. Ironically, it has taken time away from focusing on the content.
Google only indexes the “Public” Web. However, the Public Web is only a small fraction of the data available. The Deep Web (Invisible Web), by some estimates, is 500 times larger than the Public Web that Google indexes.
NYC’s Open Data Initiative is part of this Deep Web. And sad to say, data spelunking even in the relatively small NYC Open Data Catalog is still very cumbersome.
Granted, Socrata has done a great job exposing several search mechanisms – it is head and shoulders above what was available before in the old NYC Datamine. But we want to go a bit further than that and increase discoverability in several ways:
Short-term: (aka the Google-friendly way – since people primarily use search engines, and developers are people too)
This helps in a small, “baby-step” way. It still doesn’t address the “Deep Web” discoverability issues of NYC Open Data. It doesn’t even begin to address the need for federated queries (i.e., data mashups) or the need to integrate external data sources.
Longer term, we want to do something more. We want to apply the general principles of SEO to Big Open Data, but without all the bad stuff that came with the SEO arms race.
In my next blog post, I’ll talk about what those additional steps are…