
10 Years of Semantic Web:

does it work in theory?

Keynote at ISWC 2011,

Frank van Harmelen

Original PowerPoint source without the script is here.

Duck & Birdie



Apology for wrong title

  • Conference organisers always ask you for a title when you only have a rough idea of what you want to talk about.
    Then you get esprit de l'escalier and realise what you should have said. A better title would have been "The Semantic Web: does it work in theory?"
  • When looking back at 10 years of Semantic Web, there's no question about the engineering feats we have achieved, and I'll have a bit more to say about that later.
  • But: besides the engineering, did we learn any permanent, generic, scientific knowledge? Can we discover any laws that arise from our decade of work?

Jeff Naughton slide





  • talking about science is always a pretentious thing to do (and I mean talking about science instead of just "talking science").
  • Jeff Naughton (leader in DB field) recently gave a talk at ICDE, which was all about how we have organised the scientific process, and he used the following health warning at the start of his talk. I've freely borrowed his slide to give you the same health warning:

  • But notice the flawed piece of logic in the final bullet:
    if you don't give the keynote, you might well still be a washed-up has-been. In which case I thought I might as well give the keynote anyway.

  • talking about generic laws is certainly a pretentious thing to do (and certainly in computer science), so you have been warned.


Philosophical confession



  • and even worse: before I can speak about our community discovering any scientific laws, I must first explain how I think about scientific laws, and how there could ever be any scientific laws in computer science.

Telescope slide



  • My view on science is that of a "realist".

  • Quote: philosophical realism is the belief that our reality is ontologically independent of our conceptual schemes, linguistic practices, beliefs, etc.

  • believes in a world out there, existing independently from us

  • task of science is to find the laws that govern that independently existing world.
  • so, I'm not a constructivist.

  • Constructivists maintain that scientific knowledge is constructed by scientists rather than discovered from the world, and that the concepts of science are mental constructs.

  • I do not believe that scientific knowledge is just a mental or social construction, or that our scientific laws have only relative and subjective value.

  • Clearly, such a realist view, with laws describing an independently existing world, would apply to physics, but what does it have to do with computer science?

Laws about the Information Universe



  • Well, I believe that data, information, and knowledge have inherent structure & properties, and
  • that there are laws that govern these structures & properties.

  • I believe we can discover these laws (just like we can discover physics laws).

  • thus: just like the physical universe "exists out there" (and is not just a mental or social or cultural construction), so is the information universe "out there" (and is not just a mental or social or cultural construction).
  • Of course, many of the actual objects in the physical universe are our own construction (billiard balls, space ships, nuclear power stations), but the laws that govern these objects are not just mental/social constructs, these laws are "objective", "real", they are "out there to be discovered".
  • In the same way, the actual objects in the informational universe are our own constructs (programs, databases, languages, URI's), but the laws that govern these objects are not just mental/social constructs, these laws are "objective", "real", they are "out there to be discovered".

Distorted Mirror slide



  • of course it is the case that our perception of these laws at any particular time during our scientific progress will be somehow coloured by our perceptions and social and mental constructs.

  • what we perceive to be the universe may well be coloured

    • by the limitations of our cognitive machinery,
    • by culturally shaped expectations and desires
    • by the limitations and distortions of our experimental apparatus
  • and in general it is hard to distinguish the "real" laws about the external universe from cognitive artifacts and observational bias.

  • but that doesn't imply that all laws are only fictions of our culturally biased imaginations.

  • and it is the role of science to continuously chip away at these cognitive, cultural and historical biases to find out what the "real" laws are like.

  • Now, the parallel with physics is of course a bit pretentious.
  • Physics is a very mature science, with a high degree of mathematisation.
  • and it will be a long time before Computer Science will reach the same degree of maturity,

Physics slide



  • and before we can write the beautiful sets of concise equations about the information universe.

  • we cannot yet hope for such beautifully mathematised laws, in such a concise language that fits on a very compact space


  • in fact, Computer Science is a very young field, and I think that instead of comparing ourselves with physics, maybe we are more comparable to something like alchemy,

Alchemy slide



  • historians of science describe alchemy as a "protoscience"
  • it was not just a failure to turn lead into gold,
  • it was a "protoscience",
  • searching for proper goals,
  • proper conceptual framework
  • developing its experimental apparatus

  • and this is now recognised as having led to the more mature sciences of chemistry and physics that we now know.

  • and in fact, one of the originators of modern science, Isaac Newton, was an active alchemist.
  • So there's really no negative connotation to the description of computer science as alchemy; it just reflects the fact that our science is very young, and that perhaps we have not yet discovered many of the laws of the information universe.

So, the central question that I will boldly (and perhaps rather foolishly) tackle in the rest of this talk is this one:

Question slide



Did a decade of Semantic Web work help to discover any Computing Science laws?


What have we built over the past 10 years



So let's first take a look at what we actually built in the past decade.

We can characterise what we have built over the past 10 years in 3 parts:


Babel Towers slide

  1. We built a whole lot of vocabularies (including the languages to represent them, the tools to construct and deploy them, etc)

Naming slide

  2. We built a whole lot of URIs to name lots of things in the world, in fact many billions of URIs

Neural Network slide

  3. We connected all of these in a very large network


Engineer slide



But all of these have been mostly treated as one very large engineering exercise.

And it's obvious that as engineers we have succeeded.

  • Governments (and not just US and UK anymore)
  • BBC (worldcup football web site)
  • Retail (GoodRelations),
  • search engines (schema.org)
  • Oracle (DB product),
  • publishing industry (e.g. New York Times)
  • Électricité de France (personalised energy-saving plans for 350,000 customers a day)
  • etc. Just look at my Good News Quiz slide deck on slideshare for many more examples.

Now, remember the goal of this talk is:

  • Did we learn any science, ideally science that is valid beyond the particular artifacts that we have so successfully built over the past 10 years?

10 years experiment



So what I'm going to do now, is to treat the past 10 years of SemWeb engineering as one giant experiment:



  • designing languages for representing information and knowledge on the web
  • building very many ontologies in all kinds of domains
  • building many ontologies in a single domain (eg medicine)
  • building DBPedia,
  • building, populating and linking the Linked Data cloud
  • the widespread use of RDF, RDFS and OWL across very many domains (these are now the most widely used knowledge representation languages ever, by a very large margin).

So take that as a giant experiment and ask the question:

If we were to build the Semantic Web again, surely some things would end up looking different, but are there things that would end up looking the same, simply because they have to be that way?

for example

  • languages full of angle brackets. If you reran the experiment, surely it would be different, because it's just an accidental choice. That feature isn't governed by any "law in the Information Universe" (or at least not one that I can imagine).

  • but other features of what we've built would turn out in essentially the same way,

  • you would find the same pattern over and over again, every time we ran the experiment.
  • And that is because they are governed by fundamental laws that rule the structure and behaviour of information and knowledge.

So, let's see if we can discover any of such laws, such stable patterns that we would rediscover by necessity every time we ran the experiment.




Now, fortunately, we don't have to start from scratch. Some well known laws of Computer Science already can be seen to apply to our 10 year experiment as well. I'll give you two examples:

Zipf law



  • Zipf's law says that many datasets have long-tail distributions
  • Roughly this means that the vast majority of some phenomenon of interest is caused by a vast minority of items, and that the vast majority of items (the long tail) each barely contribute to the phenomenon

We know from our 10 year long experiment that our datasets also obey Zipf's law, and this has been well documented in a number of empirical studies.

  • this phenomenon is sometimes a blessing, sometimes a curse
    • nice for compression
    • awful for load balancing

It's important to realise that knowing Zipf's law helps us deal with the phenomenon, both in the cases where it's a blessing (so we can exploit it) and in the cases where it's a curse (so that we can try to avoid it).

That's why it is worth trying to discover these laws.
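
To make this concrete, here is a minimal sketch (not from the talk) of how one might check for such a long-tail distribution in one's own data: it counts how often each predicate occurs in a triple dump and reports how much of the data the most frequent predicates account for. The file name and the one-triple-per-line format are assumptions made purely for the example.

    # Minimal sketch (not from the talk): check whether predicate usage in a
    # triple dataset is long-tailed. The file name and the whitespace-separated
    # "subject predicate object" format are assumptions for this example.
    from collections import Counter

    def predicate_frequencies(path):
        counts = Counter()
        with open(path) as f:
            for line in f:
                parts = line.split()
                if len(parts) >= 3:
                    counts[parts[1]] += 1      # count each predicate occurrence
        return counts.most_common()            # sorted from most to least frequent

    freqs = predicate_frequencies("triples.nt")     # hypothetical file
    total = sum(n for _, n in freqs)
    head_size = max(1, len(freqs) // 100)           # the top 1% of predicates
    head = sum(n for _, n in freqs[:head_size])
    print("top 1%% of predicates account for %.1f%% of all triples"
          % (100.0 * head / total))
    # Under a Zipf-like distribution this head share is very large, while the
    # remaining predicates form a long tail that each barely contribute.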


Here's a second well known law from Computer Science:

Use vs Re-use



Another known law also applies:

  • Use vs re-use: use = 1 - re-use
  • (of course, don't take the linear form literally)

  • a lesson learned from building ontologies: the more reusable an ontology is, the less directly usable it is in any specific application (and vice versa)

  • a law of conservation of misery: you can't have it both ways

OK, so now I'll start proposing some "laws" that originate from our own field, and from our own 10 year experiment:

Factual knowledge is a graph





The dominant life-form in our information space is the graph.

  • The vast majority of our factual knowledge consists of simple relationships between things,
  • each represented as a ground instance of a binary predicate.
  • And lots of these relations between things together form a giant graph.

Now this may sound obvious to us in this community, but stating that factual knowledge is a graph is not obvious at all.

For example, if you asked this question of a DB person, they'd say: factual knowledge is a table. And a logician would say: knowledge is a set of sentences.

I know that you can convert one form into the other

  • every table is a (simple) graph, and every graph can be hacked into table format (but not so nicely)
  • every graph is a (simple) set of sentences, but not always the other way round,

but that's a bit beside the point: just because all our programming languages are Turing complete doesn't mean that there aren't very real and important differences between them.

So in the same way, graphs, tables and sets of sentences are all really different representations, even though theoretical transformations exist between them.

  • And the law that I propose says that factual knowledge is a graph
  • and the DB people may think it's a table, but actually, many of their tables with lots of foreign keys are really encoding graphs.
  • and the logicians may think it's a set of sentences, but that representation is wildly overshooting the mark (and typically not even aimed at or used for representing factual knowledge)
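
To make the comparison concrete, here is a minimal sketch (with entirely hypothetical names) of the same ground fact written in the three styles just discussed: as a graph edge, as a table row, and as a logical sentence.

    # Minimal sketch (hypothetical names): one ground fact in the three styles
    # discussed above.

    # 1. As a graph edge: a ground instance of a binary predicate, i.e. a triple.
    fact_as_edge = ("Amsterdam", "capitalOf", "Netherlands")

    # A set of such triples is a labelled graph.
    graph = {
        ("Amsterdam", "capitalOf", "Netherlands"),
        ("Amsterdam", "locatedIn", "Netherlands"),
        ("Netherlands", "memberOf", "EU"),
    }

    # 2. As a table row: one relational table per predicate, with foreign keys
    #    pointing at the things being related (really a graph in disguise).
    capital_of_table = [
        {"city": "Amsterdam", "country": "Netherlands"},
    ]

    # 3. As a logical sentence: a ground atom in first-order syntax.
    fact_as_sentence = "capitalOf(Amsterdam, Netherlands)"

    print(fact_as_edge, capital_of_table[0], fact_as_sentence)

The point of the proposed law is that, of these three forms, the graph is the one that the bulk of our factual knowledge naturally takes.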

So let's switch to a less controversial law:

Terminological knowledge is a hierarchy





  • this law has been rediscovered in knowledge representation and information modelling many times over.
  • the details may differ, but the notion of simple hierarchies with property inheritance is widely recognised as the right way to represent terminological knowledge.

And this repeated, independent reinvention makes this a much stronger law.

That is to say: this experiment has already been rerun many times in the history of computer science, and this has proven to be a stable finding.
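
As an illustration (not from the talk, and with made-up class names), here is a minimal sketch of what such a hierarchy with property inheritance amounts to: a subclass-of relation, plus a lookup that walks up the hierarchy to collect inherited properties.

    # Minimal sketch (made-up class names): a simple terminological hierarchy
    # with property inheritance, as a subclass-of relation plus properties
    # attached to classes.
    subclass_of = {
        "Dog":    "Mammal",
        "Cat":    "Mammal",
        "Mammal": "Animal",
    }

    properties = {
        "Animal": {"alive": True},
        "Mammal": {"has_fur": True},
        "Dog":    {"barks": True},
    }

    def inherited_properties(cls):
        """Collect the properties of a class and of all its ancestors."""
        result = {}
        while cls is not None:
            for key, value in properties.get(cls, {}).items():
                result.setdefault(key, value)   # properties nearer to cls take precedence
            cls = subclass_of.get(cls)
        return result

    print(inherited_properties("Dog"))
    # {'barks': True, 'has_fur': True, 'alive': True}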


So now I've talked about both factual and hierarchical knowledge. But how do these two types of knowledge compare?

Terminological knowledge is much smaller than the factual knowledge



or alternatively, in a picture:

Small hierarchy, big graph



And again, this may sound obvious to all of us in this audience, but really it wasn't all that obvious before we started the 10 year experiment. And in fact, it sharply contrasts with a long history of knowledge representation:

  • traditionally, KR has focussed on small and very intricate sets of axioms: a bunch of universally quantified complex sentences

  • but now it turns out that much of our knowledge comes in the form of very large but shallow sets of axioms.

  • lots of the knowledge is in the ground facts (not in the quantified formulas)

And with this law, we can even venture to go beyond just a qualitative law, and put some quantitative numbers on it.

Jacopo numbers



Here are some numbers obtained by Jacopo Urbani, a PhD student in our lab (some of you will have seen these figures in his presentation yesterday, in the session on reasoners):

  • three of the largest datasets around (two real, one artificial)
  • compute the full deductive closure of the schema hierarchy only: runtime counted in seconds or a small number of minutes
  • then compute the full deductive closure of schema + instances: runtime counted in hours

Notice that this uses an interesting measure of "size": we're not just counting triples, but somehow measuring the complexity of these triples by seeing how expensive it is to do deduction over them.

And we observe that, by this measure, the graph is 1-2 orders of magnitude "larger" than the schema.
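
To give a feel for the shape of this experiment, here is a toy sketch (emphatically not Jacopo Urbani's actual system) that applies two RDFS-style rules by naive fixpoint computation, first over a schema alone and then over the schema plus a large set of instances; the class and instance names are made up.

    # Toy sketch (not Jacopo Urbani's actual system): naive fixpoint computation
    # of two RDFS-style rules, rdfs11 (transitivity of subClassOf) and rdfs9
    # (instances inherit all superclasses). All names are made up.

    def closure(triples):
        triples = set(triples)
        while True:
            supers = {(c, d) for (c, p, d) in triples if p == "subClassOf"}
            new = set()
            for (s, p, o) in triples:
                if p in ("subClassOf", "type"):
                    # rdfs11 when p == "subClassOf", rdfs9 when p == "type"
                    new |= {(s, p, d) for (c, d) in supers if c == o}
            if new <= triples:
                return triples
            triples |= new

    schema = {("Dog", "subClassOf", "Mammal"), ("Mammal", "subClassOf", "Animal")}
    instances = {("rex%d" % i, "type", "Dog") for i in range(10000)}

    print(len(closure(schema)))               # schema-only closure: tiny, essentially free
    print(len(closure(schema | instances)))   # schema + instances: this is where the volume is

Even in this toy setting, the schema-only closure is essentially free, while almost all of the derived triples (and almost all of the work) come from the instances.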


So, if we revisit the diagram I sketched before:

Small hierarchy, big graph



then the size of the hierarchy (although already drawn small) is actually still vastly overstated. If we are to believe the numbers on the previous slide, the real size of the terminological knowledge relative to the size of the factual knowledge is more like this:



Now the black dot representing terminological knowledge is 2 orders of magnitude smaller than the size of the factual graph.

To put this in a slogan:

  • "It's the A-box, stupid"
  • knowledge is much more dominated by specific instances than by general rules

Apparently, the power of represented knowledge comes from combining a very small set of general rules that are true about the world in general with a huge body of rather trivial assertions that describe things as they happen to be in the current world (even though they could easily have been different).

And again, understanding this law helps us to design our distributed reasoners. It is the justification that when building parallel reasoners, many of us just take the small schema and simply replicate it across all the machines: it's small enough that we can afford to do this.
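
Here is a minimal sketch of that design rationale (not any particular system, and with made-up names): the small schema is handed in full to every worker, while the large set of ground facts is partitioned across the workers, each of which derives new facts locally before the results are merged.

    # Minimal sketch of the design rationale (not any particular system): the
    # terminological knowledge is small enough to replicate on every worker,
    # while the large set of ground facts is partitioned across the workers.
    # All class and instance names are made up.

    def partition(facts, n_workers):
        """Deal the ground facts out over the workers, e.g. by hash of subject."""
        parts = [set() for _ in range(n_workers)]
        for (s, p, o) in facts:
            parts[hash(s) % n_workers].add((s, p, o))
        return parts

    def reason_on_worker(schema, facts_part):
        """Each worker sees the full (replicated) schema plus only its own facts.
        As an illustration, only one rule is applied: type inheritance over the
        direct superclasses (a real system would first close the small schema,
        which is cheap)."""
        supers = {(c, d) for (c, p, d) in schema if p == "subClassOf"}
        derived = set()
        for (s, p, o) in facts_part:
            if p == "type":
                derived |= {(s, "type", d) for (c, d) in supers if c == o}
        return facts_part | derived

    schema = {("Dog", "subClassOf", "Mammal")}
    facts = {("rex%d" % i, "type", "Dog") for i in range(1000)}

    results = [reason_on_worker(schema, part) for part in partition(facts, 4)]
    closure = set().union(*results)    # in a real system this union is the merge step
    print(len(closure))                # 2000: each original fact plus one derived triple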


We've already seen that the factual knowledge is very large but very simple. We can ask ourselves how simple or complex the terminological knowledge is.

Terminological knowledge is of low complexity



When we go around with our data telescope, and we try to observe what real ontologies look like when they are out there in the world, what do we see?

Telescope with OWL



We see a very wide spread of expressivity in ontologies, all the way from undecidable OWL Full to very simple RDF hierarchies. But this spread is very uneven: there are very many lightweight ontologies, and very few heavyweight ones.

This is of course well captured by Jim Hendler's timeless phrase:

  • A little semantics goes a long way (JH)

And combining this law with the previous one, we can now see that his "little semantics" means both low expressivity and low volume.

We could also phrase this as "the unreasonable effectiveness of low-expressive KR"

And there is another way in which this law is true:

  • Of course it is nice that we can also express highly expressive ontologies in our languages (like OWL 2).
  • And some of these languages have very scary worst-case complexity bounds.

  • But when writing ontologies in these expressive languages, we often find that the reasoners for them perform quite well in practice.

  • In other words: the information universe is apparently structured in such a way that the double-exponential worst-case complexity bounds don't hit us in practice.

If the world of information were worst case, we wouldn't have been able to deal with it, but apparently the laws of information make the world such that we can deal with the practical cases.

So: for highly expressive KR we could say that it works better in practice than in theory.


The next law has of course been staring us in the face ever since we started this work on the semantic web (and it has been staring database people in the face for quite a bit longer):

Heterogeneity is unavoidable



It's for a good reason, of course, that I chose a Tower of Babel to symbolise our vocabularies:

Tower of Babel slide



A crucial insight that perhaps distinguishes the work in this community from many earlier pieces of work is that instead of fighting heterogeneity, we have seen that it's inevitable anyway, and that we might as well live with it.

And actually, I would claim that the fact that we have embraced this law (instead of fighting it) has enabled the enormous growth of the Web of Data.

Compared to many previous attempts, which tried to impose a single ontology, the "let a thousand ontologies blossom" approach has been a key factor in the growth of our datasets.


But of course, embracing heterogeneity is nice when you are publishing data, but it's not so nice when you are consuming data. So heterogeneity is not only an opportunity, it's also a problem. And the question is: can we solve that problem?

Heterogeneity is solvable



I'll argue that yes, heterogeneity is solvable, but maybe not in the way that our community likes to hear.

We can see what's going on by looking at the Linked Data cloud.

LOD cloud



  • This is the picture we all know so well,
  • it's carefully hand crafted, and kudos to the hard work that went into it,
  • but actually the picture is also somewhat misleading.

  • It (no doubt unintentionally) suggests an evenly spread out cloud of lots of colourful datasets.

  • The very image of "let a thousand ontologies blossom".
  • It suggests lots of connections between lots of datasets

But that's not actually the structure of the Linked Data cloud.

Instead, the Linked Data cloud looks like this:

circular cluster map



  • This is a picture generated from the LOD cloud as it was last week,
  • it shows a heavily clustered structure.

  • And here's the same picture,

  • but now with some more emphasis on displaying the clusters:

linear cluster map



  • so, the LOD cloud is not evenly connected
  • (unlike what the traditional LOD cloud diagram suggests),
  • but highly clustered,
  • with strong links inside the clusters
  • and weak links between the clusters

  • And how did these clusters come about?

  • not by ontology mapping,
  • but mostly by a combination of social, economic and cultural processes:

  • Why is SNOMED so important in the medical domain? Partly because it was the first to be around

  • Why will schema.org be so important? Because it carries the economic weight of 90% of the web-search market

etc.

  • Does that mean that ontology mapping should be abandoned?
  • No, it doesn't.
  • Many of the links inside these clusters are created by algorithmic ontology mapping (a minimal sketch of this kind of matching follows below).
  • But I would claim that this is only possible inside such a cluster, i.e. in the fine-grained structure of the graph,
  • whereas the coarse-grained structure of the graph is determined through social, economic and cultural processes.
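
As an illustration of the kind of algorithmic mapping meant here, the following is a minimal sketch (with hypothetical labels and an arbitrary threshold) of simple label-based matching between two small vocabularies; real matchers combine many more signals, but the flavour is the same.

    # Minimal sketch (hypothetical labels, arbitrary threshold): the kind of
    # simple string-based matching that creates many of the links inside a
    # cluster of related vocabularies.
    from difflib import SequenceMatcher

    ontology_a = {"a:HeartAttack": "heart attack",
                  "a:Hypertension": "hypertension"}
    ontology_b = {"b:MyocardialInfarction": "myocardial infarction (heart attack)",
                  "b:HighBloodPressure": "high blood pressure"}

    def label_similarity(l1, l2):
        return SequenceMatcher(None, l1.lower(), l2.lower()).ratio()

    candidates = []
    for uri_a, label_a in ontology_a.items():
        for uri_b, label_b in ontology_b.items():
            score = label_similarity(label_a, label_b)
            if score >= 0.4:                     # hypothetical threshold
                candidates.append((uri_a, uri_b, round(score, 2)))

    print(candidates)   # candidate equivalences, to be checked against better evidence
    # Real matchers combine many such signals (labels, structure, instances), but
    # the coarse-grained question of which vocabularies to link at all is settled
    # by the social, economic and cultural processes described above.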

For the next law, we must remember that we are not only a semantic web community, but also a semantic web community (this time with the emphasis on "web"). So let's look at distribution:

speed decreases with distribution, centralisation is necessary



The original dream of this community has sometimes been formulated as turning the Web into a database.

earth globe slide



But unfortunately, observations from our 10 year experiment tell us rather the opposite:

  • the Web is a good platform for data publication,
  • but it's a pretty bad platform for data consumption.

Indeed, the distributed model for data-publishing is a key factor that has enabled the growth of the Web and indeed of the Web of Data, but for data-consumption, physical centralisation works surprisingly well.

And this is not just us finding this out.

  • Google is combining our distributed publishing with their centralised processing,
  • Facebook is combining our distributed publishing with their centralised processing,
  • Wikipedia, etc.

  • So, you might think that centralisation would become a bottleneck. Wrong: distribution is the bottleneck.

The Web is not a database, and I don't think it ever will be.


So if all this massive data has to be in one central place to process it, how are we going to cope? Well, the good news from the Information Universe is that

speed increases with parallelisation



at least for our types of data. I'll show you how well this works.

Jacopo graph 1



This was the performance of triple stores on forward inferencing, somewhere in 2009.

Jacopo graph 2



and this is how much parallelisation improved the performance. So apparently, the types of knowledge and data that we deal with are very suitable for parallelisation.

And it's interesting to see that the previous laws actually help us to make this possible: the combination of

  • factual knowledge being a graph
  • terminology being hierarchical
  • terminological knowledge being small
  • and of low complexity

(which were my proposed laws 1-4) make the design of our parallel reasoners possible.


So, that brings me to the final law

knowledge is layered



Contrary to the other laws, this law does not yet come so much from our own observations in this field. But other fields tell us that knowledge is like a set of Russian dolls:

Russian dolls



with one doll nested inside the other.

From fields like

  • Cognitive Science,
  • Logic,
  • Linguistics,
  • Knowledge Representation

we know that statements of knowledge need not only refer to the world, but that they may refer to other bits of knowledge, creating a multi-layered structure.

Examples are plentiful: we may say that a fact in the world is true, and then we can say

  • what the certainty of that statement is,
  • or what the provenance of that statement is,
  • or what our trust in that statement is
  • or at what date that statement was made, etc.

Now, curiously enough, there is lots and lots of demand in our community for this kind of layered representation, but our representation languages serve this need very poorly. Reification can be seen as a failed experiment to obtain such layering, and now people are abusing named graphs because there is nothing better.
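
One way to make this layering concrete is to group base-level facts into named graphs and then make statements about those graphs. The following minimal sketch (with made-up names) represents this with simple quadruples, purely to illustrate the Russian-doll structure; it is not a proposal for how the languages should do it.

    # Minimal sketch (made-up names): layering statements about statements by
    # grouping base facts into named graphs and then describing those graphs.

    # Base-level facts, each carrying the name of the graph it belongs to.
    quads = {
        ("Amsterdam", "capitalOf", "Netherlands", "graph:geo1"),
    }

    # Meta-level facts: statements whose subject is the named graph itself,
    # expressing provenance, certainty and time, as in the list above.
    quads |= {
        ("graph:geo1", "source",     "dbpedia.org", "graph:meta"),
        ("graph:geo1", "certainty",  "0.98",        "graph:meta"),
        ("graph:geo1", "assertedOn", "2011-10-26",  "graph:meta"),
    }

    # Nothing stops a third layer: statements about the meta-graph itself,
    # which is exactly the Russian-doll structure this law describes.
    quads |= {
        ("graph:meta", "maintainedBy", "curator:42", "graph:meta2"),
    }

    def statements_about(name):
        """All statements made about a given (meta)graph."""
        return {(s, p, o) for (s, p, o, g) in quads if s == name}

    print(statements_about("graph:geo1"))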

So, being more aware of this law would have helped us to create better representation languages sooner.


So, we're reaching the end of the talk, final slide in sight:

Final slide



and I'll end with the same slide that I started with:

  • does it work in theory?
  • well, what theory?



My hope for this talk is this: many of you might disagree with some of my proposed "laws", and some of you may even disagree with all of them,

  • but regardless of that,
  • I hope that I will have prompted you to start thinking about the notion of laws in the Information Universe:
    • that such laws may exist
    • and it's our task to discover them

And this has very concrete impact on how we organise our community:

  • it's an invitation to journal editors and conference chairs to also consider papers that have the ridiculously ambitious aim to discuss one of these laws
  • and it's also a challenge to you:

Of course we won't really redo the last 10 years of our experiment, but when you do your research and write your papers, try to think about what the repeatable patterns, these laws, might be, and try to separate the incidental choices you make from the fundamental patterns you are uncovering.


Acknowledgments

