Having the good fortune to work as CSO for Talis, an innovative UK software company, at one of the most exciting times for software and the internet, I thought I would share some ideas and insights I am finding exciting at the moment.
The million tonne beam
Over the last few years, I have been involved in many arguments about different approaches to software development. One of the recurring themes is "Software engineering should be more like traditional engineering. More repeatable". It is a worthy argument.
Often it is said, "well, engineering as a discipline is much older than software, so of course it has a much stronger analytical approach, i.e. maths". But I wonder whether it is really the age of the discipline that makes the difference. There is one other huge difference between, say, structural engineering and software engineering: Moore's Law!
What would structural engineering methods look like if the price/performance of construction materials doubled every 18 months?
So that between 1970 and 2005 there would be roughly a million-fold increase.
A beam supporting 1000 tonnes would be lighter than carbon fibre and cheaper than paper.
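As a rough back-of-the-envelope check of that figure, here is a small sketch; the only assumption is the doubling period, which is usually quoted as somewhere between 18 and 24 months:

```python
# Rough order-of-magnitude check of the 1970-2005 increase, for two
# commonly assumed doubling periods. Purely illustrative arithmetic.
years = 2005 - 1970

for months_per_doubling in (18, 24):
    doublings = years * 12 / months_per_doubling
    increase = 2 ** doublings
    print(f"{months_per_doubling}-month doubling: ~{increase:,.0f}x")

# 18 months gives roughly ten million-fold and 24 months roughly two
# hundred thousand-fold, i.e. an increase on the order of a million times.
```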
The vast majority of existing types of construction could be put together safely by anyone, i.e. even the worst design would be many times stronger than the loads. What need would there be for the disciplined approach to design that engineers use today?
Also, every 5 years a whole new vista of possible uses for construction would become economically viable. Things you just wouldn't even think of doing at the existing price/performance. There would still be projects that stretched the very limits of what was possible, but the vast majority would be much more about working with the client to understand exactly what is required; because of the cheapness and simplicity, building in modular parts that could be thrown away or altered if not right might be the cheapest way to get to the final RIGHT solution for the customer.
I.e. the limiting factor on satisfying the customer is the customer's ability to describe what they want. So rapid, iterative approaches to showing the customer what a particular solution could mean for them in practice would be very valuable.
That is, the real cost lies in the work needed for the customer to understand what they really need or want, rather than in the costs of construction or materials.
You can see what I am getting at. You might need Agile structural engineering!
Costs in today's engineering lie in a different place from those in most software projects, so the comparison does not hold for the majority of software development projects. Of course, for safety-critical or low-level real-time projects the story is different, but these are a very small subset of development projects.
When solutions are not pushing the envelope, performance can be sacrificed for simplicity; this is perhaps what has allowed the software stack to form and evolve. We have the raw power to hide the complexity, so most developers today do not need rigorous engineering methods to achieve customer satisfaction.
In fact, as we add yet more layers to the software stack (and I am convinced that Web 2.0 and the semantic web need new layers) we lower the skills and economic barriers to software development, enabling a new audience to participate. The concept of the developer itself shifts. For example, is the departmental boss who uses Access to create a simple database application that solves a niche problem in his department a developer?
Maybe we should talk about application authors, application developers and software engineers as very separate disciplines that would typically work at different layers in the software stack?
What happens when the average person can "author" application software just as they can now be global publishers of dynamic content on the web?
So maybe it is more correct to compare software development with the whole construction industry, not just engineering.
If I need an extension built, the local builders can handle that quite easily without a structural engineer because we are not pushing any envelopes.
If I want to build a huge dam, I do need an engineer.
The difference with software is that every 18 months the builders can do ever more impressive works.
On a slightly different point, how many more builders are there than structural engineers?
I expect the number of "Application Authors" will massively outweigh the number of software engineers. By extension, I would also expect far more innovation to come from the low-tech "Application Author" community than from the high-tech software engineer community; see Web Services and the Innovator's Dilemma below.
Perfect or sloppy - RDF, Shirky and Wittgenstein
The first thing to say here is that this is not an attack on RDF. I do think RDF is great and very useful.
But when I read various blogs from the semantic web community using a trivial argument to debunk Clay Shirky's essay, I have to come to his defence.
Clay and Adam Bosworth are smart people; don't you think they understand that you can have multiple different RDF descriptions of the same concept? Of course they do. The point is that this STILL creates a single ontology (RDF itself, if you like) because RDF is based on the identity of concepts, not the comparability of concepts.
This point is profound and subtle. The same and similar are worlds apart. No two things in our world are the same.
It essentially hinges on this: do you believe two people have ever, in the history of humanity, shared the same (i.e. identical) concept? Do you believe that concepts exist as perfect entities that we share, or in fact do we say a concept is shared when we see a number of people using words in a similar enough way? I.e. is the world fuzzy, sloppy and uncertain, or is it perfect? Are concepts a priori or derived?
So I do not think the Semantic Web community is hearing what Wittgenstein and Shirky are saying. There is a subtle yet very profound error in the arguments for RDF and the semantic web.
The artificial intelligence community fell into exactly the same hole: many AI efforts were built on the premise that they just needed to collect enough assertions into one system and they would then be able to use propositional logic to infer answers to questions. The results were poor unless the system was kept trivial.
This seems to be exactly what the semantic web community is trying to recreate: the web contains the assertions in RDF, we pull them together into a central system (exactly as the AI guys did) and Bob's your uncle.
The reason it didn't work is the same reason that RDF doesn't equate to the Wittgenstein view, or to the islands-of-meaning analogy.
The AI community tried propositional logic and it failed them. They discovered the need to develop means of dealing with uncertainty, incompleteness and fuzziness, because that is how our world is and how we describe it with propositions. Fuzzy logic and neural networks rule modern expert systems, not discrete propositional logic.
Even Wittgenstein himself had to go through a similar trial of propositional logic in his great work Tractatus Logico-Philosophicus; after completing it he realised the limitations of that approach and described it thus: "the propositions of the Tractatus are meaningless, not profound insights, ethical or otherwise". He then went on to develop his famous works on the role of language and meaning.
This is the essential error that Wittgenstein points out in his later work. There is no single shared meaning that we all describe in our different ways. To believe so is to believe that a meaning exists a priori and that language is just our means of describing it. Instead Wittgenstein turns it on its head and says that meaning is nothing more than the way a word is actually used by people. No two people, let alone two groups, ever use a word in exactly the same way. The world is continuous, yet we break it up into discrete concepts; however, the exact boundary between these concepts is fuzzy and vague. Each person's concept is slightly wider or narrower than somebody else's. I might say "that is sleet" whereas someone else might say "that is snow"; where is the boundary between sleet and snow, or chair and stool?
The truth is no two people's concepts of anything are identical... but they are comparable. The fact that concepts are comparable but never identical is why fuzzy logic, uncertainty and incompleteness needed to be the cornerstone of the AI approaches, not propositional logic. This is what Clay is talking about.
You say of RDF:
"It allows you to describe something and then relate it to another person's description of the same thing that was made using _different terms_"
But this is exactly the error: RDF requires these two descriptions to be about an identical concept if you are to relate the two descriptions.
RDF is fundamentally built upon the premise that two different groups or individuals can describe an identical, not similar or comparable but identical, concept; it doesn’t allow for fuzziness.
Here is an example from a very well-defined domain. Two different RDF descriptions of Harry Potter and the Prince of Darkness exist. Both include many concepts like publication date (is that the date it is first printed, or warehoused, or in the shop, or when the ISBN is registered?) and they share the same concept, called Edition, via the same URI. They have several other differences, but at least they share the concept of Edition. The problem is, when exactly is a book a new edition? There are two different covers for this work, adult and child, but the content is the same. Some librarians call these two different editions; others say it is one edition because the content is identical. So you have a contradiction between these two descriptions, because in reality the concept Edition, like all concepts, is fuzzy.
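To make that contradiction concrete, here is a minimal sketch using the Python rdflib library; the namespace, the book URI and the numberOfEditions property are all invented for illustration rather than taken from any real vocabulary:

```python
# Two libraries describe the same work using the very same URIs, yet
# disagree because each draws the boundary of "Edition" differently.
from rdflib import Graph, Literal, Namespace, URIRef

EX = Namespace("http://example.org/vocab/")          # hypothetical vocabulary
book = URIRef("http://example.org/works/hp-book-6")  # hypothetical work URI

# Library A counts the adult and child covers as two editions.
library_a = Graph()
library_a.add((book, EX.numberOfEditions, Literal(2)))

# Library B counts one edition, because the content is identical.
library_b = Graph()
library_b.add((book, EX.numberOfEditions, Literal(1)))

# Merging is mechanically trivial because the URIs are identical...
merged = library_a + library_b

# ...but the merged graph now asserts two incompatible values for the
# "same" shared concept.
for _, _, value in merged.triples((book, EX.numberOfEditions, None)):
    print(f"numberOfEditions = {value}")
```

RDF happily merges the two graphs; what it cannot express is that the two libraries' notions of Edition are merely similar rather than identical.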
The natural reaction to this fuzziness by the RDF community is to create ever more fine-grained descriptions, separating editions with just cosmetic changes from those with content changes, and so on. But this just makes the problem worse. The more accurate you try to make the description, the more erroneous it becomes.
If you examine the details of exactly what any person means by a concept you will find they are all different: exactly when does a stool become a chair?
I have seen some truly amazing ontologies with such fine-grained concepts that I certainly couldn't say what the differences were meant to be.
So I say to the semantic web community: "Don't you think the problem is more fundamental than a shared data model? Is it not the fact that our world is fuzzy, the way we transcribe it into computers is fuzzy, and computers are not fuzzy and don't deal well with similarity?"
Does this mean that RDF and OWL are useless? Of course not. They will solve many problems, but as Tim Berners-Lee admits himself (http://www.w3.org/DesignIssues/RDFnot.html, search for FOPC), only over trusted, consistent data. Which is Clay's point: there is not much of that!
Web Services and the Innovator's Dilemma
Web 2.0 is a vision of the web where content and functions can be remixed and reused to create new content or new applications. Web services and the semantic web are two of the key enablers for this vision, but there appear to be two approaches emerging to each of them. Why is that? Which is best?
Web Services
SOAP & WSDL - opens up a new vista of possibilities by solving some of the really hard problems (WS-this, that and the other); requires expertise and new infrastructure, e.g. toolkits and app servers, to manage the complexity. Unsurprisingly, the app server vendors are driving these new standards in enterprise software.
REST - opens up a new vista of possibilities by making it very easy to use web application APIs, so new audiences can get involved, and it doesn't require much in the way of changes to the existing software stack. This is largely being driven by a very different community from the enterprise web services lot.
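To illustrate the difference in barriers, here is a minimal sketch of a REST-style call using nothing but the Python standard library; the endpoint and parameters are invented for illustration, and a comparable SOAP call would typically involve a WSDL, generated stubs and a toolkit:

```python
# A REST-style request needs nothing beyond an HTTP client and a URL;
# the endpoint and parameters below are hypothetical.
from urllib.parse import urlencode
from urllib.request import urlopen

params = urlencode({"operation": "ItemSearch", "keywords": "harry potter"})
url = "http://api.example.com/books?" + params

with urlopen(url) as response:
    payload = response.read()  # plain XML (or JSON) back over plain HTTP

print(payload[:200])
```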
Semantic Web
RDF & OWL - opens up new possibilities by solving some really hard problems; requires expertise and therefore tooling and new infrastructure, like a new query language, data storage, parsers, etc. Driven by standards bodies like the W3C.
XHTML & Microformats - opens up new possibilities by lowering the barrier to participation for producers and consumers; uses existing technology and can be hand-crafted, i.e. it disintermediates the expert.
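And to show what lowering the barrier looks like on the microformats side, here is a minimal sketch; the hCard below is hand-written sample data and the extractor is a deliberately naive one built only on the Python standard library:

```python
# Hand-authored hCard markup: the semantics live in ordinary class attributes.
from html.parser import HTMLParser

hcard = """
<div class="vcard">
  <span class="fn">Jo Bloggs</span>
  <span class="org">Example Publishing Ltd</span>
</div>
"""

class HCardExtractor(HTMLParser):
    """Collects the text of elements whose class names are hCard properties."""
    PROPERTIES = {"fn", "org"}

    def __init__(self):
        super().__init__()
        self.current = None
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        matches = self.PROPERTIES.intersection(classes)
        self.current = matches.pop() if matches else None

    def handle_data(self, data):
        if self.current and data.strip():
            self.fields[self.current] = data.strip()
            self.current = None

parser = HCardExtractor()
parser.feed(hcard)
print(parser.fields)  # {'fn': 'Jo Bloggs', 'org': 'Example Publishing Ltd'}
```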
It seems to me that the difference in complexity and cost between the approaches is actually a symptom of something deeper.
SOAP web services are trying to go beyond what expert developers could already do with RMI, DCOM, etc.
By their nature they must compete with what is already possible: mission-critical software systems that are trusted, secure, reliable and accountable, and that typically have a high cost of failure. Most of these developers could not buy into a new way of working if it meant going backwards in any of those critical areas.
Similarly, RDF & OWL are trying to go beyond what expert developers can do with semantics in XML today.
If you are familiar with the book "The Innovator's Dilemma" by Clayton M. Christensen, you may recognise this as the classic description of sustaining innovation. It must be better than what went before because it competes along the same dimensions with the same audience.
Clayton also describes what he terms "disruptive innovation", of which one type is the low-end disruption. This is where a technically inferior innovation radically reduces the barrier to entry (be that skill, cost or location), thereby allowing an audience that was previously excluded to participate. This competes on new dimensions with a new audience.
This massive new audience is currently excluded from the traditional solution so the disruptive innovation only competes against being better than nothing for this audience.
So disruptive innovation allows a new, less skilled community to participate and do new kinds of things. Almost by definition this community is larger than the community of experts i.e. it is the long tail.
If we consider both REST and Microformats, we see that neither is technically as good as SOAP web services and RDF. But both are significantly easier, with lower skill and cost barriers for both producer and consumer. And sure enough, Amazon are finding that the vast majority of the users of their platform are using the REST APIs.
Software standards have always had a massive network effect. What good is a standard if nobody else uses it? This makes the size of the community around any standard or approach hugely important. The pace of innovation is also deeply linked to the size of the community that can innovate. Consider the number of web authors (including bloggers) who can probably get their heads around REST and Microformats. It is vastly larger than the community of hard-core software developers on the planet.
Clayton describes, with many examples, how low-end disruptions rapidly become better and better until the complex high end solutions are pushed off the map.
It wouldn't be the first time that an innovation became the de facto standard outside the corporate firewall but eventually became good enough to be adopted by the enterprise.
Am I saying that web services and RDF are doomed? The truth is I have no idea, but I doubt it. The reason that experts create these new solutions is that they are needed to solve those difficult problems. But, on the other hand, vastly more innovation is likely when ordinary people gain the ability to do what is a "solved problem" for the expert. I would put money on Web 2.0 emerging first from the ordinary web user rather than from the software experts.
Microformats:
http://www.microformats.com/
http://www.tantek.com/presentations/2004etech/realworldsemanticspres.html