RDF: Ontologies and Metadata

hgzhaolw 2011-10-13


Ontologies and Metadata

A Draft Discussion of issues raised by the Semantic Web Technologies Workshop, 22-23 November 2000.

Author: Libby Miller
Date: 2000-11-30
Latest version: http:///discovery/2000/11/lux/

Abstract

A discussion of what ontologies might mean in the context of the semantic web. This is not a full and complete description of the workshop: a link to the presentations will be added when they are available.

Status of this Document

This is a draft! Comments welcome.

Introduction

On 22-23 November I attended a 'semantic web technologies workshop' [SWTW] in Luxembourg. The workshop was organised under the auspices of the EC's Information Society Technologies program.

The theme of the workshop might be summarised as 'making content machine understandable' on the web. The invited talks included presentations about ontologies, the wireless web, multimedia, agents, and business opportunities on the semantic web. There were also 25 short presentations on a variety of subjects: some very relevant and interesting, some half-baked project proposals.

Although the presentations had many different themes, I am going to look at two basic focal points of the workshop: ontologies and metadata.

Before attending the workshop I had only the vaguest idea of what an ontology was. I knew that ontologies were used in logic programming, and that was about it. The presentations themselves did not provide an easy introduction to ontologies, or really explain why they were so important. This document has therefore turned into a kind of exploration of what ontologies might mean to non-logic programmers: to people who develop subject gateways, or people like me who think of themselves as trying to contribute to the semantic web, but who are not logic programmers.

About Ontologies

As I understand it, ontologies are rather like classification schemes. They are ways of defining the relationships between objects in the world. But ontologies also have more to them than that: a classification scheme is usually a way of organising objects by placing them under subject categories, but an ontology also defines how you are going to divide the objects up. This might not be by subject.

An example: the description of a hierarchy of employees in a business:

 
employee
 subclassOf person

director
 subclassOf employee 

project manager
 subclassOf employee
 worksOn project
 reportsTo director

lackey
 subclassOf employee
 worksOn project
 reportsTo project manager

An ontology will define the way these things in the world interact (can a project manager be a lackey?) and cardinality constraints (can a project manager work on more than one project?), and so on.
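To make the idea concrete, here is a minimal sketch of the employee hierarchy above held as plain Python data. The subclass links come from the example; the decision that project managers and lackeys are disjoint is a hypothetical answer to the question in the text, not something the example states. The sketch shows two things an ontology lets a program answer: what a class is a kind of, and whether two classes may overlap.

```python
# Subclass links from the employee example above.
SUBCLASS_OF = {
    "employee": "person",
    "director": "employee",
    "project manager": "employee",
    "lackey": "employee",
}

# Hypothetical answer to "can a project manager be a lackey?": we declare
# the two classes disjoint, so nothing may belong to both.
DISJOINT = {frozenset({"project manager", "lackey"})}


def superclasses(cls):
    """Return cls followed by every class above it in the hierarchy."""
    chain = [cls]
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        chain.append(cls)
    return chain


def can_be_both(a, b):
    """True unless the ontology declares classes a and b disjoint."""
    return frozenset({a, b}) not in DISJOINT


print(superclasses("lackey"))                    # ['lackey', 'employee', 'person']
print(can_be_both("project manager", "lackey"))  # False
```

A real ontology language such as OIL would express the same facts declaratively; the dictionaries here only stand in for that.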

Ontologies have two main functions in logic programming:

they provide a way of viewing the world, and hence for organising information.
they are required for interoperability, to define a shared vocabulary and meanings for terms with respect to other terms.

For the first case, suppose you are a logic engine that is given the following information:

Libby worksOn IMeshtk
Libby worksOn Harmony

and you want to work out whether Libby can work on two projects. You would need to know whether IMeshtk and Harmony are projects, whether Libby is a person, and whether Libby is a project manager, a lackey or a director. You would also need to know from the definition of a project whether a person can work on two projects, and whether this differs depending on whether the person is a lackey or a project manager. This is the sort of information an ontology would need to contain.
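The reasoning above can be sketched in a few lines of Python. The two worksOn facts are from the text; the resource types and the per-role project limits are hypothetical ontology content invented for illustration. The point of the sketch is that the question has no answer (None) until the ontology supplies both the types and Libby's role.

```python
# Facts from the text, as (subject, predicate, object) triples.
FACTS = {
    ("Libby", "worksOn", "IMeshtk"),
    ("Libby", "worksOn", "Harmony"),
}

# Hypothetical ontology content: what each resource is ...
TYPES = {"IMeshtk": "project", "Harmony": "project"}

# ... and, per role, the most projects one person may work on (None = no limit).
MAX_PROJECTS = {"director": None, "project manager": 1, "lackey": 2}


def can_work_on_all(person, role, facts, types):
    """Return True or False if the ontology settles the question, None if not."""
    targets = [o for (s, p, o) in facts if s == person and p == "worksOn"]
    if any(types.get(o) != "project" for o in targets):
        return None  # we cannot tell whether every target is a project
    if role not in MAX_PROJECTS:
        return None  # we do not know what kind of employee this person is
    limit = MAX_PROJECTS[role]
    return limit is None or len(targets) <= limit


print(can_work_on_all("Libby", None, FACTS, TYPES))               # None
print(can_work_on_all("Libby", "project manager", FACTS, TYPES))  # False
print(can_work_on_all("Libby", "lackey", FACTS, TYPES))           # True
```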

For the second case, imagine that you have an inferencing engine that already has information about project managers and so on in one company, but is then given information about a different corporate hierarchy, which could be subtly or dramatically different in the names of its parts and the way they are defined and related. In this case, reasoning about the new data within the old framework would require a 'cross-walk' mapping items and connections in one ontology to the other. Similarly, if you want to combine ontologies to talk about different aspects of objects, then you need to describe how the ontologies relate to each other. This can be a very difficult and time-consuming problem.
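A cross-walk can be pictured, in its very simplest form, as a table from one vocabulary's terms to another's. The sketch below assumes a hypothetical second company whose terms (teamLead, staffMember, assignedTo, answersTo) are all invented for illustration, and assumes a "type" predicate shared by both. Anything without a mapping is set aside, which is exactly the information the engine cannot use.

```python
# Hypothetical terms used by the second company, mapped onto the engine's
# existing vocabulary. "type" is assumed to be shared by both.
CROSSWALK = {
    "type": "type",
    "teamLead": "project manager",
    "staffMember": "lackey",
    "assignedTo": "worksOn",
    "answersTo": "reportsTo",
}


def translate(triples, crosswalk):
    """Rewrite predicates (and any mapped objects) through the cross-walk.

    Triples whose predicate has no mapping are returned separately, because
    the engine cannot interpret them in its own framework.
    """
    translated, untranslated = [], []
    for s, p, o in triples:
        if p not in crosswalk:
            untranslated.append((s, p, o))
        else:
            translated.append((s, crosswalk[p], crosswalk.get(o, o)))
    return translated, untranslated


incoming = [
    ("Sam", "type", "teamLead"),
    ("Sam", "assignedTo", "Apollo"),
    ("Sam", "mentors", "Kim"),
]
done, left_over = translate(incoming, CROSSWALK)
print(done)       # [('Sam', 'type', 'project manager'), ('Sam', 'worksOn', 'Apollo')]
print(left_over)  # [('Sam', 'mentors', 'Kim')]
```

Real cross-walks are rarely this tidy: a term in one ontology may correspond to a combination of terms in another, which is part of why building them by hand is so costly.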

Ontologies are often very complicated, and are difficult to write, maintain and compare. The problem of building an ontology, say for an organisation, is the same as the problem of building a model of the important elements of that organisation. There will be different ways of looking at the organisation, and there will be different priorities for different people. Then, as you get more information, your view of the organisation may change, or the organisation might be restructured, requiring that you rewrite the ontology. The problem is rather like deciding on the structure of a relational database and then perhaps having to reorganise it after you have added lots of data.

Ontologies and Subject Gateways

What surprised me was how similar many of these problems are to those faced by the library community on the web, in defining and extending systems to organise resources, for example the lists of internet sites in subject gateways. Librarians created DDC (the Dewey Decimal Classification), a huge organising system for subject-based classification of resources. With DDC, librarians classify resources according to their subject; however, subject is only part of the commonly used metadata (data about data) that describes resources. Books, for example, also have a title, an author, a publication date and so on. While a classification scheme lets you relate books by their subject, the relationships between the properties we see as important about books or resources need to be defined in another way.

For example, depending on how we wanted to describe a book we could say

book
 hasPart chapter

chapter
 hasElement page

page
 hasElement paragraph

...

or, we might find the following more useful for the purposes of finding what we need from a book

book
 hasTitle text
 hasDescription text
 hasSubject descriptor
 hasDatePublished date

So (to me at any rate) a schema like this looks like at least the beginnings of an ontology, although we might expect an ontology used in logic programming to be more formally defined, in terms of the sorts of things a title can consist of, how many titles are allowed per book, and so on.
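As a rough illustration of the extra formality mentioned above, the second book schema can be given cardinalities and used to check a record. The property names come from the schema; the (min, max) counts are hypothetical choices, and records are assumed to be plain dictionaries of value lists.

```python
# The second book schema from the text. The (min, max) counts per book are
# hypothetical; None means "no upper limit".
BOOK_SCHEMA = {
    "hasTitle": (1, 1),
    "hasDescription": (0, 1),
    "hasSubject": (0, None),
    "hasDatePublished": (0, 1),
}


def validate(record, schema):
    """Return a list of problems; an empty list means the record conforms."""
    problems = [f"unknown property {prop}" for prop in record if prop not in schema]
    for prop, (low, high) in schema.items():
        count = len(record.get(prop, []))
        if count < low:
            problems.append(f"{prop}: expected at least {low}, found {count}")
        if high is not None and count > high:
            problems.append(f"{prop}: expected at most {high}, found {count}")
    return problems


good = {"hasTitle": ["An Example Title"], "hasSubject": ["RDF", "metadata"]}
bad = {"hasTitle": ["First", "Second"], "hasAuthor": ["Someone"]}
print(validate(good, BOOK_SCHEMA))  # []
print(validate(bad, BOOK_SCHEMA))
# ['unknown property hasAuthor', 'hasTitle: expected at most 1, found 2']
```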

But this sort of thing, perhaps informally defined, is exactly what subject gateways use in the classification of resources. To a lesser extent, any search engine will also use properties of web pages such as title, description, date and URL.

One piece of knowledge that subject gateways and the logic programming experts share is that complex classification systems/ontologies are hard to create and manage, and even harder to share. One approach is to put effort into making the creation, management and sharing of ontologies easier: several of the presentations in Luxembourg were about systems which provided tools for the creation (Ian Horrocks, OIL) and viewing (Mikael Nilsson, Conzilla) of ontologies, and which offered the possibility of sharing them (Dieter Fensel, On-To-Knowledge proposal).

Although this approach may help, it doesn't solve the problem that different organisations or users of ontologies will tend to have different needs, maybe differing only subtly, but still making reuse of ontologies difficult. Consider the Dublin Core, which is a set of metadata elements for describing documents on the web, including their title, description, identifier and so on. The Dublin Core Metadata Element Set Reference Description [DC] is a textual description of how one should use the elements to describe metadata.

Dublin Core has a very flat, very general structure, and its elements look as though they should also be useful for describing many things that are similar to web documents, for example images. RDFPic (Bert Bos, [PIC]) is a very nice Java tool for embedding classification data inside JPEGs. It uses Dublin Core, but you have to interpret several of the elements to fit what they really mean in the context of a picture. For example, dc:creator is interpreted as the photographer and dc:coverage as the place where the photograph was taken. Both of these are compatible with the definitions given in [DC] but are also slightly different from how you would use them to describe a document.

One solution is simply to define a new ontology whenever you need it. However, this can mean that there is a proliferation of ontologies which may not be related to each other.

RDF and metadata

Metadata is just data about data. RDF (Resource Description Framework) can be used to describe metadata. Here is an example of the Dublin Core metadata about this page, generated by DC-Dot [DOT] in RDF:

<?xml version="1.0"?> 

<rdf:RDF
  xmlns:rdf="http://www./1999/02/22-rdf-syntax-ns#" 
  xmlns:dc="http:///dc/elements/1.1/">

  <rdf:Description rdf:about="http:///discovery/2000/11/lux/">

    <dc:title>
      RDF: Ontologies and Metadata
    </dc:title>

    <dc:creator>
      Libby Miller
    </dc:creator>

    <dc:subject>
      Ontologies and Metadata; RDF
    </dc:subject>

    <dc:description>
      A Draft Discussion of issues raised by the Semantic Web
      Technologies Workshop, 22-23 November 2000; 
    </dc:description>

    <dc:date>
      2000-12-05
    </dc:date>

    <dc:type>
      text
    </dc:type>

    <dc:format>
      text/html
    </dc:format>

    <dc:format>
      15112 bytes
    </dc:format>

    <dc:language>
      en
    </dc:language>

  </rdf:Description>
</rdf:RDF>
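One way to see what such a record gives a program is to read it back as (subject, property, value) statements. The sketch below uses Python's standard xml.etree.ElementTree and assumes the standard RDF and Dublin Core 1.1 namespace URIs; the sample is a cut-down record of the same shape with a placeholder subject URL. It is not a full RDF/XML parser, only enough to pull out Dublin Core elements from rdf:Description blocks.

```python
import xml.etree.ElementTree as ET

# Assumed: the standard RDF and Dublin Core 1.1 namespace URIs.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"

# A cut-down record in the same shape as the DC-Dot output; the subject
# URL is a placeholder.
SAMPLE = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:dc="{DC}">
  <rdf:Description rdf:about="http://example.org/lux/">
    <dc:title>
      RDF: Ontologies and Metadata
    </dc:title>
    <dc:creator>Libby Miller</dc:creator>
    <dc:format>text/html</dc:format>
    <dc:format>15112 bytes</dc:format>
  </rdf:Description>
</rdf:RDF>"""


def dc_triples(xml_text):
    """Yield (subject, property, value) for each Dublin Core element."""
    root = ET.fromstring(xml_text)
    for desc in root.iter(f"{{{RDF}}}Description"):
        # Accept rdf:about as well as the older unqualified about attribute.
        subject = desc.get(f"{{{RDF}}}about") or desc.get("about")
        for child in desc:
            if child.tag.startswith(f"{{{DC}}}"):
                name = child.tag[len(DC) + 2:]  # strip "{namespace}"
                yield subject, "dc:" + name, (child.text or "").strip()


for triple in dc_triples(SAMPLE):
    print(triple)
```

Note that dc:format appears twice and both values survive; a flat key-value view of the record would lose one of them.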

Within RDF there is a mechanism for including different ways of classifying things in the same document, using XML namespaces. This means that several different ways of classifying the world can be combined. The easiest example of how this is done is RSS 1.0 [RSS], which is a way of describing web resources very simply (using title, description and link) but which can be extended by adding modules under different namespaces, so that you can describe the same links in different ways, adding further information to the description (Eric van der Vlist, RSS 1.0 [RSS-Eric]). An example (taken from [RSS]) is below. The elements without a prefix make up the plain RSS channel, essentially just a list of links. The elements with the dc: prefix make up the Dublin Core module, which is declared using the namespace
xmlns:dc="http:///dc/elements/1.1/"
and then used to add information about the resource http://c./click/here.pl?r123, such as its description, publisher and subject.

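<rdf:RDF
  xmlns:rdf="http://www./1999/02/22-rdf-syntax-ns#" 
  xmlns:dc="http:///dc/elements/1.1/"
  xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
  xmlns:ti="http://purl.org/rss/1.0/modules/textinput/"
  xmlns="http:///rss/1.0/"  >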

  <channel rdf:about="http://meerkat./?_fl=rss1.0"> 
    <title>Meerkat</title> 
    <link>http://meerkat.</link> 
    <description>Meerkat: An Open Wire Service</description> 
    <dc:publisher>The O'Reilly Network</dc:publisher> 
    <dc:creator>Rael Dornfest (mailto:rael@oreilly.com)</dc:creator> 
    <dc:rights>Copyright &#169; 2000 O'Reilly &amp; Associates, Inc.</dc:rights> 
    <dc:date>2000-01-01T12:00+00:00</dc:date>
    <sy:updateFrequency>2</sy:updateFrequency> 
    <sy:updateBase>2000-01-01T12:00+00:00</sy:updateBase>

    <image rdf:resource="http://meerkat./icons/meerkat-powered.jpg" /> 

    <items> 
      <rdf:Seq> 
        <rdf:li rdf:resource="http://c./click/here.pl?r123" /> 
      </rdf:Seq> 
    </items> 

    <textinput rdf:resource="http://meerkat." /> 

  </channel> 

  <image rdf:about="http://meerkat./icons/meerkat-powered.jpg"> 
    <title>Meerkat Powered!</title> 
    <url>http://meerkat./icons/meerkat-powered.jpg</url> 
    <link>http://meerkat.</link> 
  </image> 

  <item rdf:about="http://c./click/here.pl?r123"> 
    <title>XML: A Disruptive Technology</title> 
    <link>http://c./click/here.pl?r123</link> 
    <dc:description> 
      XML is placing increasingly heavy loads on the existing technical
      infrastructure of the Internet. 
    </dc:description> 
    <dc:publisher>The O'Reilly Network</dc:publisher> 
    <dc:creator>Simon St.Laurent (mailto:simonstl@simonstl.com)</dc:creator> 
    <dc:rights>Copyright &#169; 2000 O'Reilly &amp; Associates, Inc.</dc:rights>
    <dc:subject>XML</dc:subject>
  </item> 

  <textinput rdf:about="http://meerkat."> 
    <title>Search Meerkat</title> 
    <description>Search Meerkat's RSS Database...</description> 
    <name>s</name> 
    <link>http://meerkat./</link> 
    <ti:inputType>regex</ti:inputType>
  </textinput> 

</rdf:RDF>
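The value of the module approach is that a program which only knows the core RSS 1.0 elements can still use a feed full of extensions. A minimal sketch of such a core-only reader, using Python's xml.etree.ElementTree and assuming the standard RDF and RSS 1.0 namespace URIs; the one-item feed, its placeholder URLs and the "x" module are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed: the standard RDF and RSS 1.0 namespace URIs.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RSS = "http://purl.org/rss/1.0/"

# A one-item feed using Dublin Core plus a made-up module ("x") that no
# robot is expected to know; the URLs are placeholders.
FEED = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns="{RSS}"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:x="http://example.org/hypothetical-module/">
  <item rdf:about="http://example.org/a">
    <title>XML: A Disruptive Technology</title>
    <link>http://example.org/a</link>
    <dc:subject>XML</dc:subject>
    <x:rating>5</x:rating>
  </item>
</rdf:RDF>"""


def core_items(xml_text):
    """Return (title, link, skipped) per item, reading only core RSS 1.0 elements."""
    root = ET.fromstring(xml_text)
    core = f"{{{RSS}}}"
    results = []
    for item in root.iter(core + "item"):
        title = item.findtext(core + "title", default="").strip()
        link = item.findtext(core + "link", default="").strip()
        # Children outside the core namespace come from modules; a core-only
        # robot can still count them but does not interpret them.
        skipped = sum(1 for child in item if not child.tag.startswith(core))
        results.append((title, link, skipped))
    return results


print(core_items(FEED))
# [('XML: A Disruptive Technology', 'http://example.org/a', 2)]
```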

Using XML namespaces in RDF we can use several different ways of looking at the world when describing the same resource, for example a webpage. For machine understandable content however, we need a way of defining how these ontological structures described using namespaces relate together.

In the example:

libby foaf:mbox libby.miller@
libby rdf:type wn:person

RDF Schema (RDFS) allows you to create a schema for the namespace http:///foaf/0.1/ (abbreviated to foaf) and to use it to say, for example, that foaf:mbox should only be used to describe something of type wn:person. The ontology creation language OIL extends RDF Schema and allows you to be much more specific about what sort of thing a person is, which properties a thing needs to have to be a wn:person, and so on.
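A schema statement like this lets a program draw conclusions the data does not state directly. The sketch below assumes the foaf:mbox statement is written as an rdfs:domain declaration and applies it the way later RDF Schema semantics do, as an inference rule rather than a validity check. Prefixed names are kept as plain strings, and the mailbox value is a placeholder.

```python
# The example statement and a hypothetical schema statement about foaf:mbox.
# The mailbox value is a placeholder.
TRIPLES = {("libby", "foaf:mbox", "mailto:someone@example.org")}
SCHEMA = {("foaf:mbox", "rdfs:domain", "wn:person")}


def infer_domain_types(triples, schema):
    """If (p rdfs:domain C) and (s p o), conclude (s rdf:type C)."""
    domains = {(p, c) for (p, rel, c) in schema if rel == "rdfs:domain"}
    return {
        (s, "rdf:type", cls)
        for (s, p, _o) in triples
        for (prop, cls) in domains
        if p == prop
    }


print(infer_domain_types(TRIPLES, SCHEMA))
# {('libby', 'rdf:type', 'wn:person')}
```

So the second line of the example (libby rdf:type wn:person) need not be stated at all: it follows from the first line plus the schema.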

Suppose we said (using RDF Schema, OIL or something similar) that foaf:person is a subclass of wn:person. Then we have created a one-to-one cross-walk between a part of these ontologies, which is one way of relating them. Several of the presentations at the Luxembourg workshop talked about methods of creating cross-walks between ontologies (e.g. Jerome Euzenat; Atanas Kiryakov, OntoMap). But even one cross-walk between complex ontologies is extremely time-consuming if done by hand. Another problem with this incremental approach is that it can't help a robot that does not understand either of our schemas, nor can it provide a mapping where one has not been created.
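The effect of such a subclass statement can be sketched as a query that follows subclass links. The instance data below is hypothetical ("bob" is an invented resource), and subclass statements are assumed to be (child, parent) pairs. Without the cross-walk a search for wn:person misses anything typed as foaf:person; with it, both are found.

```python
# Hypothetical instance data typed with two different vocabularies.
DATA = {
    ("libby", "rdf:type", "foaf:person"),
    ("bob", "rdf:type", "wn:person"),
}


def instances_of(cls, data, subclass_of):
    """Resources of type cls, following (child, parent) subclass pairs transitively."""
    classes, frontier = {cls}, [cls]
    while frontier:
        parent = frontier.pop()
        for child, sup in subclass_of:
            if sup == parent and child not in classes:
                classes.add(child)
                frontier.append(child)
    return {s for (s, p, o) in data if p == "rdf:type" and o in classes}


print(instances_of("wn:person", DATA, set()))
# {'bob'}  (libby is invisible without a cross-walk)
print(sorted(instances_of("wn:person", DATA, {("foaf:person", "wn:person")})))
# ['bob', 'libby']
```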

However, if it is an RSS 1.0 robot, it will understand the underlying RSS core framework of titles and URLs, and so it will be able to do something with the data it finds, even if it cannot interpret the information under certain namespaces (modules).

In a similar way, a very simple base classification schema of things into, say, people, documents and organisations could provide a short-circuit to the cross-walk problem. People would have to define their schemas or data in terms of these simple classifications, but this is different from trying to create a complete universal ontology that can be mapped to all other ontologies without loss of information. Instead, this would be a way of simplifying the data found if the robot did not understand the schema the data was actually written in.
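A rough sketch of how a robot might use such a base schema: each schema author declares how their classes sit under one of a few base classes, and the robot walks up those declarations until it reaches something it knows. The three base classes follow the text's suggestion; every other class name and declaration is hypothetical, and the walk gives up (returns None) when the chain stops or loops.

```python
# A hypothetical base schema of very general classes.
BASE_CLASSES = {"Person", "Document", "Organisation"}

# Subclass declarations that schema authors would publish; every name
# other than the base classes is hypothetical.
SUBCLASS_OF = {
    "ex:photographer": "foaf:person",
    "foaf:person": "Person",
    "ex:report": "Document",
    "ex:company": "Organisation",
}


def base_class(cls, subclass_of, base=BASE_CLASSES):
    """Walk up the declared hierarchy until a base class is reached.

    Returns None when the chain stops (or loops) before reaching one, i.e.
    the robot can say nothing useful about this class.
    """
    seen = set()
    while cls not in base:
        if cls in seen or cls not in subclass_of:
            return None
        seen.add(cls)
        cls = subclass_of[cls]
    return cls


for cls in ["ex:photographer", "ex:report", "ex:widget"]:
    print(cls, "->", base_class(cls, SUBCLASS_OF))
# ex:photographer -> Person
# ex:report -> Document
# ex:widget -> None
```

The robot never needs to understand ex:photographer itself; it only needs the published link from that class towards the base schema.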

Closed and Open worlds

Where interoperability is not important, for example within a closed world system such as the learning environments described by Peter Fankhauser and Luca Bottori in Luxembourg, metadata about objects can be used with a single ontology (or several with a known cross-mapping). In a learning environment it can describe the qualities of a student; in other areas it can describe the characteristics of a device (Johan Hjelm) or of a document or multimedia object (Harold Boley and Jos de Roo). It can be used for manual classification or autoclassification of results, and for combining objects into curricula or presentations to create complex multimedia objects (Wolfgang Klas; Lynda Hardman).

But if the management of ontologies is devolved to the people using the web, then mapping between ontologies becomes a very difficult and very labour-intensive process. It might be possible to create ad hoc cross-walks between ontologies as they are needed, and maybe partially automate this process. There is also the simpler idea of 'dumbing down' used by RSS 1.0 and proposed in the ABC document [ABC].

For logical inferencing the problem is even more acute. Again, if inferencing is occurring in a closed world, a single ontology is both necessary and sufficient for it to function. But if it is an open world, there will be large gaps in the information the inferencing engine has, because some ontologies will be unknown, which will render large chunks of information useless.

Essentially, someone, somewhere needs to tell the system that each object in a new ontology maps to something it knows about. One generalized solution for open systems like the web is a very simple mapping between objects in the system and a very simple schema or ontology of things, at the level of documents, people and events. However, it is an open question how one would decide what this schema should consist of.

Conclusions

It seems to me, as someone new to logic programming, that the methodology of creating complex ontologies and cross-walks between them is not appropriate for the web because it isn't a scalable strategy. The logic programming methodologies might be appropriate for closed systems and B2B applications, but they are not necessarily appropriate for a mass-market, open and chaotic system like the web.

I have been working on ways to make RDF usable and accessible, by using RSS 1.0 and a simple query language for RDF (I actually gave a presentation at the workshop [LIB], although I didn't really think it fitted in very well with the general logic programming theme). From this point of view I think that the semantic web need not start as a web of reasoning robots; lots of gains can be made with much simpler systems, which could perhaps also make provision for inferencing systems at a later date.