Semantic Space: Bus topology

This note outlines how a D-Bus interface can form the basis for an abstract semantic space on GNOME and KDE desktops. Nothing is final.

Comments are appreciated, either here or by email to anders@feder.dk - thanks.

Overview

A dedicated 'semantic bus', separate from the system and session buses, is installed on the system. All semantic applications broadcast queries and semantic data on this bus. Backend services (e.g. Soprano) listen to these broadcasts and serve those they support.
Rationale: To spare the system and session buses, and the non-semantic programs they serve, from unnecessary overhead.

"hp: don't know anything about rdf in particular, but as far as dbus goes, I don't think there's any performance win from having a separate bus - only clients that care about a given message would receive it anyway, and the same cpu is processing messages whichever bus process they are in"
This is probably true. Staying on one of the existing buses rather than creating a separate one would also make the semantic space network-transparent automatically, should the existing buses ever become so.

All messages sent on the semantic bus are RDF data - even queries. A piece of information sent on the semantic bus is said to be 'asserted in the semantic space'.
Rationale: RDF messages are unambiguous and the standard for communication on and with the Semantic Web.

Queries must be transformed into RDF according to a special query ontology (e.g. NRL views) before being inserted into the semantic space; libraries automating this process will be available.
Rationale: RDF is easily transformed and processed in a heterogeneous environment.

Queries may be persistent. Persistent queries are saved by a 'query service', which also monitors the bus for matching semantic data and passes it back to the relevant client(s).
Rationale: Many applications (e.g. mail clients, news readers etc.) need information continuously as it becomes available, rather than a finite temporal (sub)set.
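
A minimal sketch of the persistent-query idea, in Python. It is an illustration under stated assumptions, not a proposed implementation: the `QueryService` class, its `register` and `on_assert` methods, the wildcard-tuple pattern format and the `ex:` names are all hypothetical, and a real service would receive statements from the bus rather than from direct method calls.

```python
from typing import Callable, List, Optional, Tuple

Triple = Tuple[str, str, str]
# Assumption: a persistent query is a triple pattern where None is a wildcard.
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]


class QueryService:
    """Saves persistent queries and forwards matching data to clients."""

    def __init__(self) -> None:
        self._persistent: List[Tuple[Pattern, Callable[[Triple], None]]] = []

    def register(self, pattern: Pattern, deliver: Callable[[Triple], None]) -> None:
        # Save the query together with a way to reach the client.
        self._persistent.append((pattern, deliver))

    def on_assert(self, triple: Triple) -> int:
        # Called for every statement seen on the bus; returns the delivery count.
        delivered = 0
        for pattern, deliver in self._persistent:
            if all(want is None or want == got for want, got in zip(pattern, triple)):
                deliver(triple)
                delivered += 1
        return delivered


# A mail client asks for every subject line, now and in the future.
inbox: List[Triple] = []
service = QueryService()
service.register((None, "ex:hasSubject", None), inbox.append)

service.on_assert(("ex:mail1", "ex:hasSubject", "Hello"))  # matches, delivered
service.on_assert(("ex:mail1", "ex:hasSender", "ex:bob"))  # ignored
```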

Query services may also perform semantic mapping on the data in the semantic space, on behalf of their clients. For instance, if a client has requested NCO data, and the query service sees a piece of FOAF information on the bus, the query service may transform the FOAF data into NCO and pass the latter on to the client.
Rationale: Applications are bound to use different ontologies - with semantic mapping, maximum compatibility is possible anyway.
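
The mapping step could look roughly like the Python sketch below. The mapping table is illustrative only: pairing `foaf:name` with `nco:fullname` is an assumed example rather than a vetted equivalence, and `Statement`, `PREDICATE_MAP` and `map_statement` are hypothetical names. Only predicate-for-predicate rewriting is shown; real mappings may need to restructure several statements at once.

```python
from typing import Dict, NamedTuple, Optional


class Statement(NamedTuple):
    subject: str
    predicate: str
    obj: str
    context: str


# Hypothetical mapping table: source predicate -> target predicate.
PREDICATE_MAP: Dict[str, str] = {
    "foaf:name": "nco:fullname",
}


def map_statement(st: Statement, mapping: Dict[str, str]) -> Optional[Statement]:
    """Rewrite a statement into the target ontology, or return None if unmapped."""
    target = mapping.get(st.predicate)
    if target is None:
        return None
    return st._replace(predicate=target)


# The query service sees FOAF data on the bus and hands NCO data to its client.
seen = Statement("urn:example:anders", "foaf:name", "Anders", "urn:example:ctx1")
mapped = map_statement(seen, PREDICATE_MAP)
```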

Several special 'mapping services' may assist in this process.
Rationale: Any one service may not be able to know all possible mappings between two arbitrary ontologies.

Data storage is done by storage backends that pick out information they support from the bus and store it as desired.
Rationale: Storage backend designers know better themselves what information they support than we do.

The semantic bus natively supports named graphs (contexts).
Rationale: Countless operations are simplified when groups of statements can be treated together as a whole rather than one-by-one.

The semantic space is a 'commons' shared between all agents connected to the bus and all agents are expected to exhibit care for other users of the space, just like with any other system resource.

Ontologies

The following ontologies would be useful to support the semantic space:

Query ontology: Information retrieval and storage.
Agent ontology: A way for each service on the bus to define what it is and what it does.
Mapping ontology: "Information x can be mapped into ontology y" (OWL equivalence is probably sufficient)

Difference from other systems

The key point that sets the semantic space approach apart from other messaging and storage paradigms is one of 'transparency' - that as much of the information flowing through the system as possible should be made available for other processes to complement.

The semantic space is not intended as a replacement for existing IPC mechanisms, but rather as a convenient supplement to them, specifically for semantic applications.

Tracking the semantic space

(This really falls under the open-world assumption I think, but for the sake of explicitness ...)

To save bandwidth on the bus, services that entered the semantic space at the same time can be assumed to have the same graph of information from the space. If a new service enters, and the existing information is required for some operation by that agent, the given information must be resent.

This will usually occur naturally, with the new service issuing a query when asked to operate on unknown data. In that case it's important that the new service does not rely on caches or the like.

Agents should also take care not to create infinite assertion loops in the space, e.g. statement A mapped into statement B mapped into statement A mapped into statement B etc. Simple loops can be avoided by not asserting statements that have already been asserted in the space. More complex loops must be avoided on a more discretionary basis.
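
The simple-loop rule can be sketched in Python as follows. `AssertionGuard` is a hypothetical helper, and the sketch assumes an agent can remember every quadruple it has seen asserted; it does not address the more complex loops mentioned above.

```python
from typing import Dict, List, Set, Tuple

Quad = Tuple[str, str, str, str]  # subject, predicate, object, context


class AssertionGuard:
    """Remembers asserted quads so that each one is asserted at most once."""

    def __init__(self) -> None:
        self._asserted: Set[Quad] = set()

    def should_assert(self, quad: Quad) -> bool:
        if quad in self._asserted:
            return False
        self._asserted.add(quad)
        return True

    def retract(self, quad: Quad) -> None:
        # After a retraction the quad may legitimately be asserted again.
        self._asserted.discard(quad)


# Two mappings that would otherwise bounce A -> B -> A -> B forever.
a: Quad = ("ex:s", "ex:p1", "ex:o", "ex:ctx")
b: Quad = ("ex:s", "ex:p2", "ex:o", "ex:ctx")
flip: Dict[Quad, Quad] = {a: b, b: a}

guard = AssertionGuard()
emitted: List[Quad] = []
current = a
while guard.should_assert(current):
    emitted.append(current)
    current = flip[current]
```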

Possible way to match queries with backends

If the semantic bus is too heavy on backends (i.e. too many messages), backends may be matched with only the exact queries they support through a query service.

Let A be an application, B be a backend, C be the query service.
1. B defines a persistent query upon the semantic space, according to the queries it accepts, and passes it on to C.
2. C saves B's persistent query.
3. A prepares a SPARQL query, requesting RDF from the semantic space.
4. A transforms its query into an RDF construct, according to the special query ontology and inserts the construct into the semantic space.
5. C compares A's query construct with B's saved query and finds a match.
6. C sends A's query construct on to B.
7. B processes A's query and returns the results.

A similar process can be applied by applications that only want to receive a subset of the information flowing through the semantic space.
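
The numbered steps above can be sketched in Python as below. This is a toy model under several assumptions: the query construct is reduced to a plain dict instead of RDF built from the query ontology, B's persistent query is reduced to an `accepts` predicate, and `Backend`, `MatchingQueryService`, `save` and `dispatch` are hypothetical names. The SPARQL-to-RDF transformation of steps 3 and 4 is not modelled.

```python
from typing import Dict, List, Tuple

Query = Dict[str, str]          # stand-in for A's RDF query construct
Triple = Tuple[str, str, str]


class Backend:
    """B: answers only the queries it declares support for."""

    def __init__(self, triples: List[Triple]) -> None:
        self._triples = triples

    def accepts(self, query: Query) -> bool:
        # Stand-in for B's persistent query: this backend only serves NCO data.
        return query.get("predicate", "").startswith("nco:")

    def process(self, query: Query) -> List[Triple]:
        # Step 7: B processes A's query and returns the results.
        return [t for t in self._triples if t[1] == query["predicate"]]


class MatchingQueryService:
    """C: saves backends' persistent queries and forwards matching queries."""

    def __init__(self) -> None:
        self._saved: List[Backend] = []

    def save(self, backend: Backend) -> None:
        # Steps 1-2: B hands over its persistent query and C saves it.
        self._saved.append(backend)

    def dispatch(self, query: Query) -> List[Triple]:
        # Steps 5-6: compare A's construct with saved queries, forward on match.
        results: List[Triple] = []
        for backend in self._saved:
            if backend.accepts(query):
                results.extend(backend.process(query))
        return results


c = MatchingQueryService()
c.save(Backend([("ex:anders", "nco:fullname", "Anders")]))

hit = c.dispatch({"predicate": "nco:fullname"})   # forwarded to B
miss = c.dispatch({"predicate": "foaf:name"})     # B never sees this query
```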

Interface

All programs connected to the semantic bus are called 'agents'. All agents should implement the org.semanticspace.SemanticBusAgent interface defined below.

Assorted comments on the specification:

The semantic space is one big set of resources / graph of its own.
The Assert/Retract signals allow quadruples to be added/removed from that graph.
Context allows blank nodes to be used in the space (assuming blank node identifiers don't conflict within a context).
Retract signals give agents a simple way to express that graph data should be physically deleted.
Resource URIs can refer to larger graphs (i.e. contexts).
Statement messages can be filtered with just D-Bus match rules, but this requires larger graphs to be atomized by the sender and possibly reconstructed by the receiver.
Queries can be made by generating a new context, setting up a match rule for that context, asserting the query in the context and waiting for response.
- Would destroy stored context information. Perhaps a fifth 'view' parameter could be added?
  - Solution: Use reified triples - each statement on the bus must include an identifier.
AssertResource signal can e.g. assert a quadruple reified as a resource.
Dereferencing of signals will always be a matter of interpretation by the agent (i.e. the developer).
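
As a rough Python illustration of the "new context plus match rule" idea from the comments above, the sketch below builds a D-Bus match rule string that filters AssertStatement signals on their Context argument. It relies on the signal layout in the specification that follows, where Context is the fifth argument and therefore `arg4` in match-rule terms. The helper names and the choice of `urn:uuid:` identifiers for contexts are assumptions. The rule string would still need to be registered with the bus through a D-Bus binding, which is not shown.

```python
import uuid

INTERFACE = "org.semanticspace.SemanticBusAgent"


def new_context() -> str:
    """Generate a fresh context identifier (assumption: urn:uuid: URIs)."""
    return "urn:uuid:" + str(uuid.uuid4())


def match_rule_for_context(context: str, member: str = "AssertStatement") -> str:
    """Build a match rule selecting statement signals for one context."""
    if "'" in context:
        # Quoting of apostrophes in match rules is not handled in this sketch.
        raise ValueError("context must not contain an apostrophe")
    # Context is the fifth signal argument, so it is matched as arg4.
    return (
        f"type='signal',interface='{INTERFACE}',"
        f"member='{member}',arg4='{context}'"
    )


query_context = new_context()
rule = match_rule_for_context(query_context)
```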

Specification

<!DOCTYPE node PUBLIC "-//freedesktop//DTD D-BUS Object Introspection 1.0//EN" "http://www.freedesktop.org/standards/dbus/1.0/introspect.dtd">
 <node>
  <interface name="org.semanticspace.SemanticBusAgent">
   <signal name="AssertResource">
    <arg name="URI" type="s"/>
   </signal>
   <signal name="RetractResource">
    <arg name="URI" type="s"/>
   </signal>
   <signal name="AssertStatement">
    <arg name="Identifier" type="s"/>
    <arg name="Subject" type="s"/>
    <arg name="Predicate" type="s"/>
    <arg name="Object" type="s"/>
    <arg name="Context" type="s"/>
   </signal>
   <signal name="RetractStatement">
    <arg name="Identifier" type="s"/>
    <arg name="Subject" type="s"/>
    <arg name="Predicate" type="s"/>
    <arg name="Object" type="s"/>
    <arg name="Context" type="s"/>
   </signal>
  </interface>
 </node>
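
To check that an interface description like the one above is well-formed and to see which arguments each signal carries, something like the following Python sketch could be used. It parses a copy of the interface (with self-closed `arg` elements and without the DOCTYPE line) using only the standard library. `signal_signatures` is a hypothetical helper, not part of any D-Bus binding.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

INTROSPECTION_XML = """
<node>
 <interface name="org.semanticspace.SemanticBusAgent">
  <signal name="AssertResource">
   <arg name="URI" type="s"/>
  </signal>
  <signal name="RetractResource">
   <arg name="URI" type="s"/>
  </signal>
  <signal name="AssertStatement">
   <arg name="Identifier" type="s"/>
   <arg name="Subject" type="s"/>
   <arg name="Predicate" type="s"/>
   <arg name="Object" type="s"/>
   <arg name="Context" type="s"/>
  </signal>
  <signal name="RetractStatement">
   <arg name="Identifier" type="s"/>
   <arg name="Subject" type="s"/>
   <arg name="Predicate" type="s"/>
   <arg name="Object" type="s"/>
   <arg name="Context" type="s"/>
  </signal>
 </interface>
</node>
"""


def signal_signatures(xml_text: str) -> Dict[str, List[Tuple[str, str]]]:
    """Map each signal name to its ordered (argument name, D-Bus type) pairs."""
    root = ET.fromstring(xml_text.strip())  # raises ParseError if malformed
    signatures: Dict[str, List[Tuple[str, str]]] = {}
    for interface in root.findall("interface"):
        for signal in interface.findall("signal"):
            signatures[signal.get("name")] = [
                (arg.get("name"), arg.get("type")) for arg in signal.findall("arg")
            ]
    return signatures


signatures = signal_signatures(INTROSPECTION_XML)
for name, args in signatures.items():
    print(name, [arg_name for arg_name, _ in args])
```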

Ontologies

Scenarios

Query service

An application calls a method on the query service with a SPARQL query string as an argument.
The query service asserts the graph pattern of the query as statements in a new context in the semantic space.
Common semantic services now complement the query context as desired.
Agents transform the augmented query context into a suitable format and query data sources.
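
The second step of this scenario might be sketched in Python as follows. Parsing SPARQL is out of scope here, so the sketch starts from triple patterns that have already been extracted. Writing variables as `?name` strings is an assumption, since the query ontology is not yet defined. `assert_query_pattern` and the `emit` callback are hypothetical stand-ins for sending AssertStatement signals on the bus.

```python
import uuid
from typing import Callable, List, Tuple

TriplePattern = Tuple[str, str, str]
# emit(identifier, subject, predicate, object, context), mirroring AssertStatement.
Emit = Callable[[str, str, str, str, str], None]


def assert_query_pattern(patterns: List[TriplePattern], emit: Emit) -> str:
    """Assert each triple pattern as a statement in a fresh context."""
    context = "urn:uuid:" + str(uuid.uuid4())
    for subject, predicate, obj in patterns:
        # Each statement on the bus carries its own identifier.
        identifier = "urn:uuid:" + str(uuid.uuid4())
        emit(identifier, subject, predicate, obj, context)
    return context


# Record the "signals" in a list instead of sending them on a real bus.
sent: List[Tuple[str, str, str, str, str]] = []
query_context = assert_query_pattern(
    [("?person", "foaf:name", "?name"), ("?person", "foaf:mbox", "?mbox")],
    lambda *args: sent.append(args),
)
```

Other agents can then complement the statements in `query_context` before the data sources are queried.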

Outstanding concerns

Possible race conditions when multiple sources want to respond to the same query.
- This can be good because: first-come, first-served encourages speedy returns on queries and always gives clients the fastest response, and responses should be identical anyway.
- This can be bad because: if multiple, different results are asserted, which one applies? Are they registered as changes?
- Solution for now: let apps sort it out - they have better contextual understanding anyway.
Ensuring clients receive all necessary information at the right time.
- This can be fixed with statement annotations...
Watch out for loops.
The solution to many of the concerns this architecture might raise is to take a deep breath and realize that most of them are no different from those of other system resources.

Other

Security: it would be cool if we could support cryptographic graphs somehow, possibly through services.

Conclusion

The bus topology solution to AndersFeder/SemanticSpace outlined above lets agents assert statements in the semantic space and retract them from it. Annotations can be made with statements that have triples/quads as their subjects.
