Perfect or sloppy - RDF, Shirky and Wittgenstein
The first thing to say here is that this is not an attack on RDF. I do think RDF is great and very useful.
But when I read various blogs from the semantic web community using a trivial argument to debunk Clay Shirky's essay I have to come to his defence.
Clay and Adam Bosworth are smart people. Don't you think they understand that you can have multiple different RDF descriptions of the same concept? Of course they do. The point is that this STILL creates a single ontology (RDF itself, if you like), because RDF is based on the identity of concepts, not the comparability of concepts.
This point is profound and subtle. The same and similar are worlds apart. No two things in our world are the same.
It essentially hinges on this: do you believe two people have ever, in the history of humanity, shared the same (i.e. identical) concept? Do you believe that concepts exist as perfect entities that we share, or do we in fact say a concept is shared when we see a number of people using words in a similar enough way? That is, is the world fuzzy, sloppy and uncertain, or is it perfect? Are concepts A Priori or derived?
So I do not think the Semantic Web community is hearing what Wittgenstein and Shirky are saying. There is a subtle yet very profound error in the arguments for RDF and the semantic web.
The artificial intelligence community fell into exactly the same hole. Many AI efforts were built on the premise that they just needed to collect enough assertions into one system and they would then be able to use propositional logic to infer answers to questions. The results were poor unless the system was kept trivial.
This seems to be exactly what the semantic web community is trying to recreate: the web contains the assertions in RDF, we pull them together into a central system (exactly as the AI guys did) and Bob's your uncle.
The reason it didn't work is the same reason that RDF doesn't equate to the Wittgenstein view, or to the islands of meaning analogy.
The AI community tried propositional logic and it failed them. They discovered the need to develop means of dealing with uncertainty, incompleteness and fuzziness, because that is how our world is and how we describe it with propositions. Fuzzy logic and neural networks rule modern expert systems, not discrete propositional logic.
Even Wittgenstein himself had to go through a similar trial of propositional logic in his great work Tractatus Logico-Philosophicus. After completing it he realised the limitation of that approach and described it thus: "the propositions of the Tractatus are meaningless, not profound insights, ethical or otherwise". He then went on to develop his famous works on the role of language and meaning.
This is the essential error that Wittgenstein points out in his later work. There is no single shared meaning that we all can describe in our different ways. To believe so is to believe that a meaning exists A Priori and that language is just our means of describing it. Instead Wittgenstein turns it on its head and says that meaning is nothing more than the way a word is actually used by people. No two people, let alone two groups, ever use a word in exactly the same way. The world is continuous, yet we break it up into discrete concepts; however, the exact boundary between these concepts is fuzzy and vague. Each person's concept is slightly wider or narrower than somebody else's. I might say "that is sleet" whereas someone else might say "that is snow". Where is the boundary between sleet and snow, or between chair and stool?
The truth is no two people's concepts of anything are identical... but they are comparable. The fact that concepts are comparable but never identical is why fuzzy logic, uncertainty and incompleteness needed to be the cornerstone of the AI approaches, not propositional logic. This is what Clay is talking about.
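To make "comparable but never identical" concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `wetness` scale, the two people, and their thresholds are invented, and the overlap measure (a fuzzy Jaccard ratio) is just one of many ways you could compare two fuzzy concepts.

```python
def make_snow_concept(lo, hi):
    """Build one person's fuzzy concept of 'snow'.

    wetness is a hypothetical scale from 0.0 (dry flakes) to 1.0 (rain).
    At or below lo this person always says 'snow', at or above hi never,
    and in between their confidence falls off linearly.
    """
    def membership(wetness):
        if wetness <= lo:
            return 1.0
        if wetness >= hi:
            return 0.0
        return (hi - wetness) / (hi - lo)
    return membership


def sample_points(steps=100):
    """Evenly spaced wetness values from 0.0 to 1.0 inclusive."""
    return [i / steps for i in range(steps + 1)]


def identical(f, g):
    """True only if both people agree exactly at every sampled point."""
    return all(f(x) == g(x) for x in sample_points())


def similarity(f, g):
    """Fuzzy Jaccard overlap: sum of minima divided by sum of maxima."""
    xs = sample_points()
    overlap = sum(min(f(x), g(x)) for x in xs)
    union = sum(max(f(x), g(x)) for x in xs)
    return overlap / union


# Two invented people whose idea of 'snow' shades into 'sleet' at different points.
alice_snow = make_snow_concept(0.2, 0.6)
bob_snow = make_snow_concept(0.3, 0.8)

print("identical: ", identical(alice_snow, bob_snow))
print("similarity:", round(similarity(alice_snow, bob_snow), 2))
```

The identity test fails, yet the similarity score is well above zero: the two concepts are not the same, but they are clearly comparable, and a system that can only ask the first question throws away the second answer.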
You say of RDF
"It allows you to describe something and then relate it to another person's description of the same thing that was made using _different terms_"
But this is exactly the error: RDF requires these two descriptions to be about an identical concept if you are to relate the two descriptions.
RDF is fundamentally built upon the premise that two different groups or individuals can describe an identical, not similar or comparable but identical, concept; it doesn’t allow for fuzziness.
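A rough sketch of what I mean, in plain Python rather than any real RDF library. The triples, the URIs and the `merge` helper are all made up for illustration, and the `same_as` argument only stands in for an owl:sameAs-style assertion; this is a simplification of merging by URI, not a description of how any particular triple store works.

```python
from collections import defaultdict

# Triples are (subject, predicate, object). All URIs here are hypothetical.
STOOL_A = "http://a.example/concept/Seat"
CHAIR_B = "http://b.example/concept/Seat"

source_a = [(STOOL_A, "http://a.example/vocab/legs", "3")]
source_b = [(CHAIR_B, "http://b.example/vocab/hasBack", "sometimes")]


def merge(*sources, same_as=()):
    """Merge triples by subject URI.

    Two descriptions land on the same node only if their subject URIs are
    equal, or if a pair in same_as declares them to be one and the same.
    There is deliberately no way to say 'these are 80% alike'.
    """
    alias = {other: canonical for canonical, other in same_as}
    merged = defaultdict(set)
    for triples in sources:
        for subject, predicate, obj in triples:
            merged[alias.get(subject, subject)].add((predicate, obj))
    return dict(merged)


separate = merge(source_a, source_b)
unified = merge(source_a, source_b, same_as=[(STOOL_A, CHAIR_B)])

print(len(separate), "nodes without an identity assertion")
print(len(unified), "node with one")
```

In this sketch the only lever available is binary: either the two descriptions are about the very same thing and everything said about one is said about the other, or they are unrelated.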
Here is an example from a very well defined domain. Two different RDF descriptions of Harry Potter and the prince of darkness exist. Both include many concepts like publication date (is that the date it is first printed, or warehoused, or in the shop, or the date the ISBN is registered?) and they share the same concept called Editions, using the same URI. They have several other differences, but at least they share the concept of Editions. The problem is, when exactly is a book a new edition? There are two different covers for this work, adult and child, but the content is the same. Some librarians call these two different editions; others say it is one edition because the content is identical. So you have a contradiction between these two descriptions, because in reality the concept Edition, like all concepts, is fuzzy.
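The edition problem can be sketched in a few lines of Python. The URIs, the literal values and the two "libraries" are invented for illustration, and plain tuples stand in for RDF triples; the point is only what happens when two sources use the same property URI with slightly different meanings.

```python
# Hypothetical URIs: both libraries agree on these identifiers.
BOOK = "http://example.org/book/harry-potter-6"
EDITION = "http://example.org/vocab/edition"

# Library A treats a different cover as a different edition.
library_a = [
    (BOOK, EDITION, "adult cover"),
    (BOOK, EDITION, "children's cover"),
]

# Library B treats identical content as a single edition.
library_b = [
    (BOOK, EDITION, "first edition text"),
]


def edition_count(triples):
    """Count the distinct edition values asserted for BOOK."""
    return len({obj for subj, pred, obj in triples
                if subj == BOOK and pred == EDITION})


# Same subject URI, same predicate URI, so the statements simply pile up.
merged = library_a + library_b

print("library A says:", edition_count(library_a))
print("library B says:", edition_count(library_b))
print("merged data says:", edition_count(merged))
```

Each source is consistent on its own terms, but the merged data asserts a number of editions that neither librarian believes, and nothing in the shared URI records that the two were drawing the boundary in different places.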
The natural reaction to this fuzziness by the RDF community is to create ever more fine-grained descriptions: separate editions with just cosmetic changes from those with content changes, and so on. But this just makes the problem worse. The more accurate you try to make the description, the more erroneous it becomes.
If you examine the details of exactly what any person means by any concept you will find they are all different. Exactly when does a stool become a chair?
I have seen some truly amazing ontologies with such fine-grained concepts that I certainly couldn't say what the differences were meant to be.
So I say to the semantic web community: "Don't you think the problem is more fundamental than a shared data model? Is it not the fact that our world is fuzzy, the way we transcribe it into computers is fuzzy, and computers are not fuzzy and don't deal well with similarity?"
Does this mean that RDF and OWL are useless? Of course not. They will solve many problems, but as Tim Berners-Lee admits himself (http://www.w3.org/DesignIssues/RDFnot.html search for FOPC), only over trusted consistent data. Which is Clay's point: there is not much of that!
Reader Comments (9)
I think that RDF/OWL are interesting because they try to mimic the way separate humans access reality: identifying things (URIs), i.e. cutting the continuum into discrete things, AND binding things together with typed links.
World representations of different people are indeed never the same: that's why we need a standard to prevent two people from creating identical URIs.
Sharing concepts is quite hard, as the profusion of OWL ontologies and XML Schemas shows: we will need some translation mechanisms, i.e. going from owl:sameAs to classes of properties that express similarities...
MIT's Haystack is such a good example of a 1-user system based on RDF.
in your essay. Belief in the validity of the lower layers does not imply belief in the possibility of the upper.
one further point re Wittgenstein and the RDF community. wittgenstein was a big fan of showing not telling.
in my experience the tagsonomers/loose and sloppy crowd have done a better job of illustrating value to the user. who knows what an RDF browser is or what it does? how come no one ever sends me a link and says check out this awesome RDF demo? more of the philip glass browser style please.
It took 20 years for relational databases to mature, so I wouldn't be so quick to dismiss RDF. RDF is an "assembly language" for modelling, designed to be scalable to the web. It can work like an RDB in a suitable application area. That's a really good start.
As for sloppiness, any web-wide open system, in which anyone can say anything about any resource, is definitely going to be sloppy and have contradictions. Maybe some Google veterans will tackle that problem with science. Meanwhile much will be done manually - when an inference engine hits a snag interested parties will get a notice and fix the problem wiki-style with an edit or a few clarifying triples.
The Semantic Web is just a way to organize what people know - power of a database underneath, flexibility of folksonomy on top, cool algorithms in between. What's not to like?