OntoSim

 yanniyanxin 2013-03-19

OntoSim is a Java API for computing similarities between ontologies. It relies on the Alignment API for ontology loading, so it is largely independent of the ontology API used (Jena or OWL API).

OntoSim provides a framework for designing various kinds of similarities. In particular, it distinguishes similarities in the ontology space [David2008a] from those in the alignment space [Euzenat2009b,David2010b]. The latter make use of the alignments available in a network of ontologies, while the former rely only on ontology data.

OntoSim is based on an ontology interface shared with the Ontowrap package of the Alignment API, which makes it possible to use ontologies parsed with different APIs.

Structure

string/
Measures for comparing strings, including the measures of the SecondString library
entity/
Measures for comparing ontology entities (OLA, triple-based, alignment-based, string-based)
vector/
Measures for comparing objects in a vector space (cosine, Jaccard, Kendall)
set/
Measures for comparing sets of objects (linkages, weighted sums)
aggregation/
Classes for aggregating several measures (Means)
align/
Measures for comparing ontologies in the alignment space (e.g., largest coverage, union path coverage, agreement, disagreement).
extractor/
Extractors that derive a coupling from a similarity matrix (e.g., maximum coupling, greedy algorithm, Hausdorff)
util/
Utility classes for caching measure values on disk and for efficiently storing large sparse matrices of doubles

API

The Measure interface

The OntoSim API is based on a very simple generic Measure interface, which defines three main methods: getMeasureValue, getSim and getDissim:

public interface Measure<O> {
	static enum TYPES {similarity, dissimilarity, distance, other};
	
	public TYPES getMType();
	public double getMeasureValue( O o1, O o2);
	public double getSim( O o1, O o2);
	public double getDissim( O o1, O o2);
}


Each of these three methods takes the two objects to compare as parameters and returns a double value.

This generic interface is instantiated to define the profile of several basic categories of measures:

String measures
O=String
Entity measures
O=Entity<?>
Set measures
O=Set<? extends S>
Vector measures
O=double[]
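
As a rough illustration of the O=String case, the sketch below implements the Measure interface for strings. The interface is copied locally so that the code compiles on its own; CharJaccardSim is a hypothetical class invented for this example, not one of OntoSim's string measures, and defining the dissimilarity as 1 minus the similarity is an assumption of the sketch.

```java
import java.util.HashSet;
import java.util.Set;

// Local copy of the Measure interface shown above, so the sketch is self-contained.
interface Measure<O> {
    enum TYPES { similarity, dissimilarity, distance, other }

    TYPES getMType();
    double getMeasureValue(O o1, O o2);
    double getSim(O o1, O o2);
    double getDissim(O o1, O o2);
}

// Hypothetical string measure (not an OntoSim class): Jaccard similarity
// between the sets of characters occurring in the two strings.
class CharJaccardSim implements Measure<String> {
    public TYPES getMType() { return TYPES.similarity; }

    public double getMeasureValue(String s1, String s2) {
        Set<Character> a = chars(s1);
        Set<Character> b = chars(s2);
        if (a.isEmpty() && b.isEmpty()) return 1.0; // two empty strings are identical
        Set<Character> inter = new HashSet<>(a);
        inter.retainAll(b);
        Set<Character> union = new HashSet<>(a);
        union.addAll(b);
        return (double) inter.size() / union.size();
    }

    // The native value is already a similarity in [0, 1].
    public double getSim(String s1, String s2) { return getMeasureValue(s1, s2); }

    // Assumption of this sketch: dissimilarity is the complement of similarity.
    public double getDissim(String s1, String s2) { return 1.0 - getSim(s1, s2); }

    private static Set<Character> chars(String s) {
        Set<Character> set = new HashSet<>();
        for (char c : s.toCharArray()) set.add(c);
        return set;
    }
}
```

With this sketch, comparing "abc" and "abd" gives a similarity of 0.5, since the two strings share two of the four distinct characters.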

MatrixMeasure

The MatrixMeasure interface adds three further methods to the Measure interface, with the same names (getMeasureValue, getSim, getDissim). They take two sets of objects and return a Matrix object or a MatrixDoubleArray (already defined in OntoSim). AbstractMatrixMeasure implements all of the additional methods, so implementing a matrix measure reduces to fixing the O parameter of the Measure interface (entity, string, etc.).

This makes it possible to define the SetMeasure interface, parametrised by:

  • a local Measure;
  • an Extractor: Thresholding, MWGM, Stable Marriage, max (for FullLinkage), min (for SingleLinkage), max-min (for Hausdorff), etc. Some set measures, e.g. AverageLinkage, do not need an Extractor.
  • an AggregationScheme: an average (arithmetic, geometric, harmonic, etc. mean), a weighted sum, etc. Measures that use only one value, such as FullLinkage, SingleLinkage or Hausdorff, do not need an AggregationScheme.
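
The following sketch shows how the linkage-style set measures listed above can be read off a matrix of local values. It is only an illustration under stated assumptions: LinkageSketch and its method names are hypothetical, the matrix is a plain double[][] rather than OntoSim's Matrix type, the local values are treated as distances, and both collections are assumed to be non-empty.

```java
import java.util.List;
import java.util.function.ToDoubleBiFunction;

// Hypothetical helper class, not part of OntoSim.
class LinkageSketch {
    // Matrix of local distances: cell [i][j] compares a.get(i) with b.get(j).
    static <T> double[][] matrix(List<T> a, List<T> b, ToDoubleBiFunction<T, T> local) {
        double[][] m = new double[a.size()][b.size()];
        for (int i = 0; i < a.size(); i++)
            for (int j = 0; j < b.size(); j++)
                m[i][j] = local.applyAsDouble(a.get(i), b.get(j));
        return m;
    }

    // Full linkage: the "max" extractor keeps the single largest value.
    static double fullLinkage(double[][] m) {
        double best = Double.NEGATIVE_INFINITY;
        for (double[] row : m) for (double v : row) best = Math.max(best, v);
        return best;
    }

    // Single linkage: the "min" extractor keeps the single smallest value.
    static double singleLinkage(double[][] m) {
        double best = Double.POSITIVE_INFINITY;
        for (double[] row : m) for (double v : row) best = Math.min(best, v);
        return best;
    }

    // Average linkage: no extractor, just the arithmetic mean of all cells.
    static double averageLinkage(double[][] m) {
        double sum = 0.0;
        int n = 0;
        for (double[] row : m) for (double v : row) { sum += v; n++; }
        return sum / n;
    }

    // Hausdorff: max-min, taken in both directions (rows to columns and back).
    static double hausdorff(double[][] m) {
        int rows = m.length, cols = m[0].length;
        double result = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < rows; i++) {
            double nearest = Double.POSITIVE_INFINITY;
            for (int j = 0; j < cols; j++) nearest = Math.min(nearest, m[i][j]);
            result = Math.max(result, nearest);
        }
        for (int j = 0; j < cols; j++) {
            double nearest = Double.POSITIVE_INFINITY;
            for (int i = 0; i < rows; i++) nearest = Math.min(nearest, m[i][j]);
            result = Math.max(result, nearest);
        }
        return result;
    }
}
```

For the sets {1, 2} and {2, 5} with the absolute difference as local distance, the matrix is [[1, 4], [0, 3]], so full linkage gives 4, single linkage 0, average linkage 2 and Hausdorff 3. The first, second and fourth each rely on a single extracted value, which is why they need no aggregation step.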

Extractor

The Extractor interface brings together the notions of alignment extraction and filtering used by matching algorithms. It is the basis for implementing MWGM, Stable Marriage, etc. (all the extract methods defined in DistanceAlignment).

The interface is:

public interface Extractor {
   Cardinality getCardinality();
   Object[][] extract(Matrix m);
   //or
   Matching extract(Matrix m);
}

The getCardinality() method returns the cardinality of the produced matching (1-1, n-n, etc.). The Matching object is a lightweight kind of alignment object.
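
To make the idea of extraction concrete, here is a minimal sketch of a greedy 1-1 extractor. It does not use OntoSim's Extractor, Matrix or Matching types: GreedyExtractorSketch is a hypothetical class, the matrix is a rectangular double[][] of similarities, and the matching is returned as a list of {row, column} index pairs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class, not part of OntoSim.
class GreedyExtractorSketch {
    // Repeatedly picks the highest remaining similarity whose row and column
    // are both still free, which yields a 1-1 matching.
    static List<int[]> extract(double[][] sim) {
        int rows = sim.length;
        int cols = rows == 0 ? 0 : sim[0].length;
        boolean[] rowUsed = new boolean[rows];
        boolean[] colUsed = new boolean[cols];
        List<int[]> matching = new ArrayList<>();

        // A 1-1 matching contains at most min(rows, cols) pairs.
        int pairs = Math.min(rows, cols);
        for (int k = 0; k < pairs; k++) {
            int bestR = -1, bestC = -1;
            for (int r = 0; r < rows; r++) {
                if (rowUsed[r]) continue;
                for (int c = 0; c < cols; c++) {
                    if (colUsed[c]) continue;
                    if (bestR < 0 || sim[r][c] > sim[bestR][bestC]) {
                        bestR = r;
                        bestC = c;
                    }
                }
            }
            rowUsed[bestR] = true;
            colUsed[bestC] = true;
            matching.add(new int[] { bestR, bestC });
        }
        return matching;
    }
}
```

On the matrix [[0.9, 0.8], [0.85, 0.1]] this sketch pairs (0, 0) and then (1, 1), for a total of 1.0, whereas the pairing (0, 1), (1, 0) totals 1.65. This is the usual trade-off between a greedy extractor and a maximum weight matching.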

AggregationScheme

The AggregationScheme interface has two implementing classes: one for averages and another one for weighted sums.

public interface AggregationScheme {
    double getValue(double[] vals);
}

With the new version of OntoSim, the DistanceAlignment class should be refactored so that it can easily use OntoSim entity measures.

public abstract class AggregationScheme {
    public abstract double getValue(double[] vals);
    public abstract <O> double getValue(Measure<O> measure,Matching<O> matching);
}
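
The two families of aggregation mentioned above, averages and weighted sums, can be sketched as follows. Only the getValue(double[]) method is covered; the class names are hypothetical stand-ins rather than OntoSim's own classes, and rejecting empty or mismatched input with an exception is a choice made for this sketch.

```java
// Hypothetical stand-in for the abstract class above, limited to the array variant.
abstract class AggregationSketch {
    abstract double getValue(double[] vals);
}

// Average-style aggregation: arithmetic mean of the values.
class ArithmeticMeanSketch extends AggregationSketch {
    double getValue(double[] vals) {
        if (vals.length == 0) throw new IllegalArgumentException("no values to aggregate");
        double sum = 0.0;
        for (double v : vals) sum += v;
        return sum / vals.length;
    }
}

// Weighted sum: each value is multiplied by the weight at the same index.
class WeightedSumSketch extends AggregationSketch {
    private final double[] weights;

    WeightedSumSketch(double[] weights) {
        this.weights = weights.clone(); // defensive copy
    }

    double getValue(double[] vals) {
        if (vals.length != weights.length)
            throw new IllegalArgumentException("expected " + weights.length + " values");
        double sum = 0.0;
        for (int i = 0; i < vals.length; i++) sum += weights[i] * vals[i];
        return sum;
    }
}
```

For instance, the arithmetic mean of {0.5, 1.0} is 0.75, and a weighted sum with weights {0.25, 0.75} applied to {1.0, 0.0} is 0.25.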

Ontology measures

From this it is possible to define general classes of measures such as:

Measure
|- AlignmentSpaceMeasure (abstract: OntologyNetwork)
|- OntologySpaceMeasure (abstract: )
|- VectorSpaceMeasure (abstract: Collection<LoadedOntology>)
These measures are simply defined with respect to their internal variables.

Example

// Load the two ontologies to compare (uri1 and uri2 are their URIs).
OntologyFactory of = OntologyFactory.getFactory();
LoadedOntology o1 = of.loadOntology(uri1);
LoadedOntology o2 = of.loadOntology(uri2);
Vector<LoadedOntology> ontos = new Vector<LoadedOntology>();
ontos.add(o1);
ontos.add(o2);

// Vector space measure: Jaccard over TF-IDF term vectors.
VectorLexicalSim m = new VectorLexicalSim(ontos, new JaccardVM(), DocumentCollection.VECTOR_TYPE.TFIDF);
System.out.println(m.getSim(o1, o2));

// Set measure: maximum coupling over OLA entity similarities.
SetMeasure<Entity> cm = new MaxCoupling(new OLAEntitySim());
GeneralOntologyMeasure m2 = new GeneralOntologyMeasure(cm);
System.out.println(m2.getSim(o1, o2));

// Set measure: maximum coupling over a string-based entity measure.
EntityLexicalMeasure lm = new EntityLexicalMeasure(new StringMeasureSS(new Levenstein()));
SetMeasure<Entity> cm2 = new MaxCoupling(lm);
GeneralOntologyMeasure m3 = new GeneralOntologyMeasure(cm2);
System.out.println(m3.getSim(o1, o2));
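
The first measure in the example compares ontologies as weighted term vectors. As a rough idea of what a vector measure over double[] computes, the sketch below gives the cosine similarity and one common generalisation of Jaccard to non-negative weights (sum of minima over sum of maxima). This is not OntoSim's JaccardVM code, whose exact formula is not given here; VectorSimSketch is a hypothetical class, both vectors are assumed to have the same length, and returning 0 for an all-zero vector is a convention chosen for the sketch.

```java
// Hypothetical class, not part of OntoSim.
class VectorSimSketch {
    // Cosine similarity: dot product divided by the product of the norms.
    static double cosine(double[] a, double[] b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) return 0.0; // convention for zero vectors
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Weighted Jaccard for non-negative vectors: sum of minima over sum of maxima.
    static double weightedJaccard(double[] a, double[] b) {
        double minSum = 0.0, maxSum = 0.0;
        for (int i = 0; i < a.length; i++) {
            minSum += Math.min(a[i], b[i]);
            maxSum += Math.max(a[i], b[i]);
        }
        if (maxSum == 0.0) return 0.0; // convention when both vectors are all zeros
        return minSum / maxSum;
    }
}
```

With binary vectors the weighted Jaccard above reduces to the usual set-based Jaccard index, that is, the size of the intersection divided by the size of the union.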

License

OntoSim is available under the LGPL 2.1 or above.
