lysergicjava

oh, technology

Oh, you crazy semantic web (YAGNI part 2)

Perhaps you’ve heard of the “semantic web”, aka Web 3.0, aka the Amazing Everything Machine.  Perhaps you haven’t.  Either way, it doesn’t make much difference: I’m going to boldly predict that it won’t happen.  (Now, I’m well aware that making bold predictions may make me look like a total fool later on.  I’ll take the risk.  In any event, I’m predicting it won’t happen on anything approaching the scale that its boosters envision.)  Briefly, if you’re new to this, the semantic web is a blanket term for a variety of technologies meant to define, in machine-readable ways, the semantics (or meaning) of information on the web.  So for example, rather than just having some arbitrary string of letters saying “Posted by John Haren” (which any literate human can recognize as a byline), you’d have some non-arbitrary, agreed-upon standard string of letters saying “the author of this trash is none other than John Haren”.  Well, why bother with that?  It depends on who you ask.

To the average, sane person, it sounds kinda cool at first.  Having all the web’s semantics defined in a uniform, machine-readable way would, in principle, allow for much more intelligent use of the web’s teeming content.  Just imagine!  Imagine if your computer understood what you meant when you asked it to search for people in your area with tickets to the opera/art gala/footie match and yet no ride to same.  Why, you could hook up in a twinkling!  Or, maybe you’d like to see what correlation, if any, exists between citing Yukio Mishima and Sylvia Plath as major artistic influences while in college and heavy smoking.  The semantic web is at your service, sir!  Oh, and the answer is yes, there is a correlation, and it is 1.0.  Less sarcastically, the semantic web does hold promise as a vast inference engine.  If realized, it could revolutionize the way information is organized, and the way intelligence is gained from that information.  What’s not to like about that?

Glad you asked. In a word: work.  As in, it’s too much work.  Here, all along, you thought the web was a huge, fathomless pile.  Well, that’s nothing compared to the pile of work necessary to implement the grand vision of the semantic web.

Grand, or merely grandiose?  The original vision of the semantic web is often (who am I kidding, often?  Always) attributed to Sir Tim Berners-Lee:

“I have a dream for the Web [in which computers] become capable of analyzing all the data on the Web – the content, links, and transactions between people and computers. A ‘Semantic Web’, which should make this possible, has yet to emerge, but when it does, the day-to-day mechanisms of trade, bureaucracy and our daily lives will be handled by machines talking to machines. The ‘intelligent agents’ people have touted for ages will finally materialize.”

Visionary!  [clap clap clap clap] How does he do it? [clap clap clap clap]

I guess no one’s stopped to ask Mr. Sir Tim what, exactly, he means by “the day-to-day mechanisms of trade, bureaucracy and our daily lives will be handled by machines talking to machines” but given that sentiment alone, one could make a more than reasonable case that reality has exceeded his vision already.  Nor is it clear what he means by [computers] “capable of analyzing all the data on the Web” — specifically, the analyzing part.  Doesn’t Google Zeitgeist do that?  Depends on what you mean by analyze, I suppose.  What Berners-Lee (or anyone) specifically sees happening has… well, not been made specific.  But it’s hard to get more hand-wavy than Berners-Lee when he said this:

“People keep asking what Web 3.0 is. I think maybe when you’ve got an overlay of scalable vector graphics – everything rippling and folding and looking misty – on Web 2.0 and access to a semantic Web integrated across a huge space of data, you’ll have access to an unbelievable data resource.”

Yeah, Web 3.0 is “looking misty” indeed.  But what, you may ask, is the big deal?  Let’s grant, you say, your argument that the vision of what the semantic web will actually do for us is vague at best.  So what?  Why not just throw some tags in a cloud or whatever the hell it is they’re talking about and see what comes out of it?

Glad you asked, again.  Again, in a word: work.  As in, it’s way too much work, on many, many different levels.  Since this is supposed to be a developer’s blog, let’s start from the perspective of Joe Q. Developer as he approaches the wonderful world of RDF, the principal specification for realizing the semantic web.

RDF provides a standardized way to express facts.  That is, an RDF expression is a statement about a resource, in the form of one or more subject-predicate-object expressions.  So the fact “RDF boils the ocean” can be stated (in an N-Triples-style notation; yes, there are multiple, competing formats, RDF/XML among them) like this:

<urn:concepts:metadata-data-models:RDF> <http://yagni/verbs/boil> "Ocean" .

Where “<urn:concepts:metadata-data-models:RDF>” is the subject, “<http://yagni/verbs/boil>” is the predicate, and “Ocean” is the object.  As it turns out, any logically expressible statement of fact can be structured as one or more subject-predicate-object expressions.  Some concepts are very unwieldy to express this way, but that’s not my beef with RDF.  Some concepts are hard to express, period, and it’s all well and good if some cognitive tool can make hard things easier, but it’s not necessary.  I don’t think it’s too much to ask that a new tool not make easy things hard, though, and that’s what RDF does.
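To make the subject-predicate-object shape concrete, here is a minimal sketch in Python (not the Groovy used elsewhere in this post, purely so it runs on its own).  The `Triple` type and the `to_ntriples` helper are made up for illustration and do not come from any RDF library.  The output follows the N-Triples line layout, and the escaping only covers backslashes and double quotes, so treat it as a toy rather than a serializer.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF-style statement. Hypothetical helper type, not a library class."""
    subject: str    # IRI of the thing being described
    predicate: str  # IRI naming the relationship
    obj: str        # plain string literal (IRIs as objects are not handled here)


def to_ntriples(triple: Triple) -> str:
    """Render a triple as a single N-Triples-style line."""
    # Escape backslashes first, then double quotes, so the literal stays well-formed.
    literal = triple.obj.replace("\\", "\\\\").replace('"', '\\"')
    return f'<{triple.subject}> <{triple.predicate}> "{literal}" .'


# The "RDF boils the ocean" fact, using the same identifiers as the statement above.
fact = Triple(
    subject="urn:concepts:metadata-data-models:RDF",
    predicate="http://yagni/verbs/boil",
    obj="Ocean",
)
print(to_ntriples(fact))
```

Three strings and a bit of punctuation: the data model itself is small.  The trouble starts when you have to read the stuff back out of a real document.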

Let’s take a look at some code I had to write recently to read an RDF entry on some book data:

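// Assumed from the setup omitted here: parser is an XmlParser, xml is the RDF/XML input,
// id is the full product URN, and rdf, dc, om and foaf are namespace objects for those prefixes.
def root = parser.parse(xml)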
def product = root[om.Product].find { it.attribute(rdf.about) == "$id" }
def shortID = id - "urn:x-domain:oreilly.com:product:"
def title = product[dc.title].text()
def listPrice = product[om.price][rdf.Description].find { it[dc.spatial].text() == "USA" }[rdf.value].text()
// ha ha, you thought THAT was awesome, wait till you see what we have to do to find the author
// first, find the rdf:resource attribute of the dc:creator node
// next, get the foaf:name of the foaf:Person with the rdf:about attribute = creatorRef
def creatorRef = product[dc.creator][0].attribute(rdf.resource)
def authorName = root[foaf.Person].find { it.attribute(rdf.about) == "$creatorRef" }[foaf.name].text()

And that’s in Groovy, a very expressive and powerful programming language.  All I needed to do was read and parse an XML document (I spared you all that), find a specific book, and pull some simple data about that book, like the author and the list price.  In a half-sane XML document the XPath to get the author node, for instance, would look like “//book[@isbn='9780596123765']/author”.  But instead, I’ve got to futz around with attributes in namespaces that reference other attributes of nodes in other namespaces to get a simple string’s worth of data.  It should have been easy, but it wasn’t, and it won’t be the next time I have to do something similar.  With RDF, it’s harder than it has to be to do simple things.
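The contrast can be sketched side by side.  What follows is a rough illustration in Python's standard `xml.etree.ElementTree`, not the code from my actual project.  Both documents are invented, the names `author_from_plain` and `author_from_rdf` are hypothetical, and `http://example.com/om#` is a placeholder for the `om` namespace, which I'm assuming here because the real URI isn't shown in this post.  The `rdf`, `dc` and `foaf` URIs are the usual published ones.

```python
import xml.etree.ElementTree as ET

# Invented "half-sane" document: the author is a child of the book.
PLAIN_XML = """<catalog>
  <book isbn="9780596123765">
    <title>Example Title</title>
    <author>Jane Example</author>
  </book>
</catalog>"""

# Invented RDF/XML document: the product only points at its creator.
RDF_XML = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:foaf="http://xmlns.com/foaf/0.1/"
    xmlns:om="http://example.com/om#">
  <om:Product rdf:about="urn:x-domain:example.com:product:9780596123765">
    <dc:title>Example Title</dc:title>
    <dc:creator rdf:resource="urn:x-domain:example.com:agent:42"/>
  </om:Product>
  <foaf:Person rdf:about="urn:x-domain:example.com:agent:42">
    <foaf:name>Jane Example</foaf:name>
  </foaf:Person>
</rdf:RDF>"""

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "om": "http://example.com/om#",  # placeholder, see the note above
}
# ElementTree spells namespaced attribute names as "{uri}local".
RDF_ATTR = "{" + NS["rdf"] + "}"


def author_from_plain(xml_text: str, isbn: str) -> str:
    """One path expression: find the book by ISBN, read its author."""
    root = ET.fromstring(xml_text)
    return root.findtext(f".//book[@isbn='{isbn}']/author")


def author_from_rdf(xml_text: str, product_id: str) -> str:
    """Two hops: product -> creator reference -> person -> name."""
    root = ET.fromstring(xml_text)
    # Hop 1: find the product whose rdf:about matches (StopIteration if absent).
    product = next(p for p in root.findall("om:Product", NS)
                   if p.get(RDF_ATTR + "about") == product_id)
    creator_ref = product.find("dc:creator", NS).get(RDF_ATTR + "resource")
    # Hop 2: find the person that the creator reference points at.
    person = next(p for p in root.findall("foaf:Person", NS)
                  if p.get(RDF_ATTR + "about") == creator_ref)
    return person.findtext("foaf:name", namespaces=NS)


print(author_from_plain(PLAIN_XML, "9780596123765"))
print(author_from_rdf(RDF_XML, "urn:x-domain:example.com:product:9780596123765"))
```

Both calls hand back the same name.  The second one just needs four namespaces, a lookup table, and a join by hand to get there.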

And now, at last, we’re getting to the crux of the problem.  In case after case, it’s just too much work to accomplish my goals.  I now officially have more hoops to jump through, and no benefit for having done so.  And that’s after someone burned I don’t know how many months marking up the records in the first place.  They used to be half-sane XML.  Now they’re not.  The RDF initiative has been work whose primary yield has been… more work.  For everyone involved.  From now on, until the end of time, or at least until the company comes to its senses.  Yeah, the end of time sounds about right.

Now multiply my frustration by ten bajillion, because that’s what would be involved in marking up even a significant portion of the web.  Now raise that to the power of its own factorial, because that’s what it would take to maintain that markup and ensure that it’s accurate as time goes on.

Because, ultimately, the proponents of the semantic web are basing their vision on two flawed premises: (1) that vague promises of highly accurate, context-sensitive search make all this work worth it, and (2) that real people in the real world are going to do what is necessary for the whole thing not only to work, but to keep working.

Back in the far-off days of 2001, Cory Doctorow predicted in his essay “Metacrap” that the “meta-utopia”, as he put it then, “would never happen” because it is “a pipe-dream, founded on self-delusion, nerd hubris and hysterically inflated market opportunities”.  Specifically, Doctorow lists seven reasons why the semantic web will never be what Berners-Lee and his acolytes want it to be:

  1. People lie
  2. People are lazy
  3. People are stupid
  4. Mission Impossible: know thyself
  5. Schemas aren’t neutral
  6. Metrics influence results
  7. There’s more than one way to describe something

And while Doctorow’s examples have some specifics that set one to chuckling (Napster?  Oh, yeah, I remember that…) his reasoning is as strong today as it ever was.  Stronger, even.

Some claim that it’s just a matter of time, that RDF and its ilk will triumph, that people only need to see the benefits of semantic markup and they’ll enthusiastically climb aboard.  I think the opposite will happen: as more people get exposed to the complex and finicky world of semantic data modeling, they’ll throw up their hands in frustration and ask for something simple.  Even if it can’t find them a ride to the footie match.