How to create your own vocabulary

Let’s build a very simple vocabulary describing obelisks (tall, four-sided, narrow tapering monuments that end in a pyramid-like shape), so that Cleopatra and Caesar can share information about their personal collections.

From plain English to a graphical representation

Let’s start by stating in English what we’d like to put in our vocabulary:

  • An obelisk is owned by a person.
  • An obelisk is built by a sculptor.
  • An obelisk has a height, which is a numerical value.

The key elements of these sentences (obelisk, person, sculptor, owned by, built by, and height) are going to be the ‘terms’ in our vocabulary. We can identify two types of terms: the things that we talk about (e.g. obelisk or sculptor), and their properties (e.g. built by or height). Let’s make a graphical representation of that by putting the things in bubbles, and the properties in squares:

[Figure: The obelisk vocabulary]

From plain English to RDF

Identifying everything with IRIs

RDF is the language used to build vocabularies for use across the Web (aka Linked Data). In RDF, everything is identified by IRIs (Internationalized Resource Identifiers). These are simply standard web URIs (Uniform Resource Identifiers), just a little bit more modern in that they can contain characters from a wider, internationalised set (e.g. ‘α’, ‘δ’, or ‘ό’). For more information, see Wikipedia.

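First, we’ll need an IRI to identify our new vocabulary (as we said, everything in RDF is identified with IRIs!), e.g. http://w3id.org/obelisk/. From there, let’s now update our plain English example a little bit, giving each term an IRI inside that vocabulary:

  • An http://w3id.org/obelisk/Obelisk is http://w3id.org/obelisk/ownedBy a http://w3id.org/obelisk/Person.
  • An http://w3id.org/obelisk/Obelisk is http://w3id.org/obelisk/builtBy a http://w3id.org/obelisk/Sculptor.
  • An http://w3id.org/obelisk/Obelisk has a http://w3id.org/obelisk/height, which is a numerical value.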

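As we can see, identifiers quickly become unpleasant to read when they are IRIs, so RDF introduces the notion of prefixes (a simple concept borrowed from XML namespaces). From now on we’ll use the prefix obelisk: to stand in for our vocabulary identifier http://w3id.org/obelisk/, which means our vocabulary now looks like:

  • An obelisk:Obelisk is obelisk:ownedBy an obelisk:Person.
  • An obelisk:Obelisk is obelisk:builtBy an obelisk:Sculptor.
  • An obelisk:Obelisk has an obelisk:height, which is a numerical value.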

Things, and Properties of Things

From the above we can see that we want to describe both ‘Things’ (e.g. Obelisks and Sculptors), and the ‘Properties’ of those things (e.g. their height, or who owns them). RDF allows us to explicitly distinguish between these by referring to ‘things’ as Classes, and ‘properties’ as, well, Properties!

Defining Classes of Things

In RDF, the general things that we can talk about are called Classes. Therefore everything that went into a bubble in our diagram above is a Class, so we could add the following to our vocabulary:

  • obelisk:Obelisk is a Class.
  • obelisk:Person is a Class.
  • obelisk:Sculptor is a Class.

If we look at these sentences, they are structured exactly like the ones from the rest of our vocabulary: a thing (e.g. obelisk:Obelisk), followed by a property (“is a”), followed by a value (“Class”).

What we need now are IRIs for the “is a” property and the “Class” Class. Fortunately, these are defined in the RDF and RDFS vocabularies: “is a” is defined by rdf:type, and “class” by rdfs:Class. Therefore, we can now write:

@prefix obelisk: <http://w3id.org/obelisk/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

obelisk:Obelisk rdf:type rdfs:Class .
obelisk:Person rdf:type rdfs:Class .
obelisk:Sculptor rdf:type rdfs:Class .

And congratulations, you’ve just created your first snippet of valid RDF! This particular RDF syntax is called Turtle. There are many other standardized syntaxes, but we don’t need to cover them in this tutorial.
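
To see what the prefixes are doing, here is a small sketch of the first statement above with every prefix expanded by hand. It only uses the IRIs already declared in the snippet; in Turtle a full IRI is written between angle brackets, and this form should be read by an RDF parser as the same statement as the prefixed one.

```turtle
# Same statement as: obelisk:Obelisk rdf:type rdfs:Class .
# Each prefixed name is replaced by its prefix IRI followed by the local name.
<http://w3id.org/obelisk/Obelisk>
    <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
    <http://www.w3.org/2000/01/rdf-schema#Class> .
```

The prefixed form is purely a convenience for human readers and writers; the identifiers themselves are always the full IRIs.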

Defining properties of things

The properties of things in RDF are called properties (how convenient). Therefore, as we did for Classes, we might write:

  • obelisk:ownedBy is a Property.
  • obelisk:builtBy is a Property.
  • obelisk:height is a Property.

We already know that “is a” is identified by the IRI rdf:type, and “Property” is identified by rdf:Property, so we can translate these sentences into RDF as well, which leads to our vocabulary looking like this:

@prefix obelisk: <http://w3id.org/obelisk/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

obelisk:Obelisk rdf:type rdfs:Class .
obelisk:Person rdf:type rdfs:Class .
obelisk:Sculptor rdf:type rdfs:Class .

obelisk:ownedBy rdf:type rdf:Property .
obelisk:builtBy rdf:type rdf:Property .
obelisk:height rdf:type rdf:Property .
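
As a rough illustration of what this vocabulary is for, here is a sketch of how Cleopatra might describe one obelisk from her collection using these terms. The cleo: namespace, the names obelisk1, me and sculptor1, and the height value are all made up for this example; they are assumptions, not part of the vocabulary.

```turtle
@prefix obelisk: <http://w3id.org/obelisk/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
# Hypothetical namespace for Cleopatra's own data (not part of the vocabulary).
@prefix cleo: <http://example.org/cleopatra/> .

# One obelisk in the collection, described with the vocabulary's terms.
cleo:obelisk1 rdf:type obelisk:Obelisk .
cleo:obelisk1 obelisk:ownedBy cleo:me .
cleo:obelisk1 obelisk:builtBy cleo:sculptor1 .
# Illustrative value only. A bare number such as 21.0 is an xsd:decimal in Turtle.
cleo:obelisk1 obelisk:height 21.0 .

# The people involved, typed with the vocabulary's Classes.
cleo:me rdf:type obelisk:Person .
cleo:sculptor1 rdf:type obelisk:Sculptor .
```

Because Caesar would use the same obelisk: terms in his own data, the two collections could be combined and compared without any translation step.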

Adding information for humans

Using labels and comments

So far we have created identifiers that are primarily intended for machines. The IRIs themselves do not need to be meaningful to humans at all (although opaque IRIs are certainly not recommended). For example, the following would technically be an equivalent vocabulary:

@prefix o: <http://w3id.org/obelisk/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

o:C001 rdf:type rdfs:Class .
o:C002 rdf:type rdfs:Class .
o:C003 rdf:type rdfs:Class .

o:p001 rdf:type rdf:Property .
o:p002 rdf:type rdf:Property .
o:p003 rdf:type rdf:Property .

Even if we don’t want our vocabulary to look like this, the point is that it’s really useful to also provide human-readable descriptions of the terms in our vocabularies. To do so we’ll use the properties rdfs:label to add a human-readable label for the term identified by the IRI, and rdfs:comment to add a few sentences describing what is meant by the term in the context we use it. This could lead to something like this:

@prefix obelisk: <http://w3id.org/obelisk/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

obelisk:Obelisk rdf:type rdfs:Class ;
    # A label for readability...
    rdfs:label "Obelisk" ;
    # ... and a more descriptive comment for a fuller explanation of this 'thing'.
    rdfs:comment "An obelisk is a four-sided pillar with a pyramid-shaped top." .

obelisk:Sculptor rdf:type rdfs:Class ;
    rdfs:label "Sculptor" ;
    rdfs:comment "An artist who sculpts obelisks." .

obelisk:ownedBy rdf:type rdf:Property ;
    rdfs:label "owned by" ;
    rdfs:comment "Relationship between an obelisk and the person who owns it, which is typically the person who ordered it, or to whom it was offered." .

obelisk:builtBy rdf:type rdf:Property ;
    rdfs:label "built by" ;
    rdfs:comment "Relationship between an obelisk and the person who built it." .

obelisk:height rdf:type rdf:Property ;
    rdfs:label "height" ;
    # Note: so far we didn't specify any units for the height (we'll fix this properly later), but we can however provide a hint in the comment.
    rdfs:comment "The distance from the ground to the highest point of the obelisk, in meters." .

Please note that we are using a shortcut provided by the Turtle syntax to avoid repeating the thing that we talk about when adding multiple properties to it (e.g. obelisk:ownedBy in the next snippet):

  • The long version:
    obelisk:ownedBy rdf:type rdf:Property .
    obelisk:ownedBy rdfs:label "owned by" .
    obelisk:ownedBy rdfs:comment "Relationship between an obelisk and the person who owns it, which is typically the person who ordered it, or to whom it was offered.".
    
  • The shortcut:
    obelisk:ownedBy rdf:type rdf:Property ;
      # We removed the repetitions of obelisk:ownedBy, and replaced the end of
      # line by ; instead of .
      rdfs:label "owned by" ;
      rdfs:comment "Relationship between an obelisk and the person who owns it, which is typically the person who ordered it, or to whom it was offered." .
    

Adding multilingual support

So far, all our labels and comments are written in English, yet there is no explicit indication within the vocabulary itself that the text is actually in English. To make the language of any text explicit, RDF provides the concept of a language tag, which is placed directly after the text string itself. The values of these language tags are defined by the IETF standard BCP 47. For example, we can use @en for English, or @fr for French.

This example shows how easy it is to explicitly provide both English and French labels and comments for terms in our vocabulary:

@prefix obelisk: <http://w3id.org/obelisk/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

obelisk:Obelisk rdf:type rdfs:Class ;
    # A label explicitly in English...
    rdfs:label "Obelisk"@en ;
    # ... as well as a comment in English...
    rdfs:comment "An obelisk is a four-sided pillar with a pyramid-shaped top."@en ;
    # ...and the same comment in French...
    rdfs:comment "Un obélisque est un pilier à quatre côtés dont le sommet est en forme de pyramide."@fr .

obelisk:Sculptor rdf:type rdfs:Class ;
    rdfs:label "Sculptor"@en ;
    rdfs:label "Sculpteur"@fr ;
    rdfs:comment "An artist who sculpts obelisks."@en ;
    rdfs:comment "Un artiste qui taille des obélisques."@fr .

Of course, for many text values the concept of ‘language’ is meaningless. For instance, Social Security Numbers in the United States are often written as strings because they contain hyphens (e.g. ‘123-12-7890’), and a username (or nickname) will most often not have any associated language. For these common use-cases, simply leave out the language tag.
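
As a minimal sketch of such a language-free value, suppose a collector wanted to record their own reference code for each obelisk. The property obelisk:catalogueCode and the example.org IRI below are hypothetical and are not part of the tutorial's vocabulary; they are only here to show a string written without a language tag next to strings that have one.

```turtle
@prefix obelisk: <http://w3id.org/obelisk/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

# Hypothetical property, used only to illustrate a language-free string.
obelisk:catalogueCode rdf:type rdf:Property ;
    # The label and comment are human text, so they carry a language tag.
    rdfs:label "catalogue code"@en ;
    rdfs:comment "A collector's own reference code for an obelisk."@en .

# Hypothetical obelisk: the code itself has no language, so it has no tag.
<http://example.org/collection/obelisk1> obelisk:catalogueCode "OB-0042" .
```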

Adding some metadata

The finishing touch to this vocabulary is to add some metadata about the vocabulary itself, so that people we share this vocabulary with (or who search for it, or who just stumble across it on the web) can know who created it, when, and what its intended purpose is, without having to go through all the details of the individual terms contained within it.

We already decided that the IRI of our vocabulary would be http://w3id.org/obelisk/, so this is the identifier we are going to use in RDF to say stuff about the vocabulary itself. In Linked Data terminology a vocabulary is called an owl:Ontology, so the first thing to say is:

@prefix obelisk: <http://w3id.org/obelisk/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .

# `obelisk:` is equivalent to http://w3id.org/obelisk/
obelisk: rdf:type owl:Ontology .

# The remainder of the vocabulary is unchanged.
# (`a` is Turtle shorthand for rdf:type.)
obelisk:Obelisk a rdfs:Class ;
    rdfs:label "Obelisk" ;
    rdfs:comment "An obelisk is a four-sided pillar with a pyramid-shaped top." .
# ...

Adding a description

Much like we described each term with human-friendly labels and comments, we can now add a title (using the property dcterms:title) and a description (using the property dcterms:description) to our vocabulary. To make it easier to reuse, we can also indicate a suggested or preferred prefix (vann:preferredNamespacePrefix) and a suggested or preferred IRI (vann:preferredNamespaceUri) (since multiple IRIs may point to the same vocabulary).

@prefix obelisk: <http://w3id.org/obelisk/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix vann: <http://purl.org/vocab/vann/> .

obelisk: rdf:type owl:Ontology ;
    dcterms:title "Obelisk ontology" ;
    # The description can be a multi-line text.
    dcterms:description """
    The obelisk ontology aims at describing obelisks.
    """ ;
    vann:preferredNamespacePrefix "obelisk" ;
    vann:preferredNamespaceUri <http://w3id.org/obelisk/> .

# The remainder of the vocabulary is unchanged.
obelisk:Obelisk a rdfs:Class ;
    rdfs:label "Obelisk" ;
    rdfs:comment "An obelisk is a four-sided pillar with a pyramid-shaped top." .
# ...
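
The metadata can also record who created the vocabulary and when. One possible sketch uses dcterms:creator and dcterms:created from the same Dublin Core vocabulary as the title and description. The creator IRI and the date below are placeholders invented for this example, so replace them with real values.

```turtle
@prefix obelisk: <http://w3id.org/obelisk/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

obelisk: rdf:type owl:Ontology ;
    # Placeholder IRI identifying the author (an assumption for this sketch).
    dcterms:creator <http://example.org/people/cleopatra> ;
    # Placeholder creation date, explicitly typed as an xsd:date.
    dcterms:created "2020-01-01"^^xsd:date .
```

These statements can simply be added alongside the title and description, since they all describe the same obelisk: resource.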

Some simple naming conventions

You may have noticed some of the simple naming conventions used in our examples so far. These conventions are extremely common (but not universal!) across RDF vocabularies.

  • The basic convention is to use Camel Case for all your terms, e.g. ‘ownedBy’ or ‘builtBy’.
  • Capitalize the first letter of Class terms, e.g. Obelisk, or Sculptor.
  • Lower-case the first letter of Property terms, e.g. height or ownedBy.
  • Lower-case prefixes, e.g. @prefix obelisk: <...>.
  • Don’t use hyphens; use underscores instead, because this makes the terms easier to use in some programming languages.

Reference

A reference version of this final vocabulary is available here, and you can experiment with the syntax using a live RDF validator.

Next step: publish your vocabulary on your Pod.