How to create your own vocabulary
Let’s build a very simple vocabulary describing obelisks (tall, four-sided, narrow tapering monuments that end in a pyramid-like shape), so that Cleopatra and Caesar can share information about their personal collections.
From plain English to a graphical representation
Let’s start by stating in English what we’d like to put in our vocabulary:
- An ‘obelisk’ is ‘owned by’ a ‘person’.
- An ‘obelisk’ is ‘built by’ a ‘sculptor’.
- An ‘obelisk’ has a ‘height’, which is a numerical value.
The highlighted elements of these sentences are going to be the ‘terms’ in our
vocabulary. We can identify two types of terms: the things that we talk about
(e.g. obelisk or sculptor), and their properties (e.g. built by or height).
Let’s make a graphical representation of that by putting the things in
bubbles, and the properties in squares:
From plain English to RDF
Identifying everything with IRIs
RDF is the language used to build vocabularies for use across the Web (aka Linked Data). In RDF, everything is identified by IRIs (Internationalized Resource Identifiers). IRIs are simply standard web URIs (Uniform Resource Identifiers), but a little more modern in that they can contain characters from a wider, internationalised character set (e.g. ‘α’, ‘δ’, or ‘ό’). For more information, see Wikipedia.
First, we’ll need an IRI to represent (or identify) our new vocabulary (as we said, everything in RDF is identified with IRIs!), e.g. http://w3id.org/obelisk/. From there, let’s now update our plain English example a little bit:
- An http://w3id.org/obelisk/Obelisk is http://w3id.org/obelisk/ownedBy a http://w3id.org/obelisk/Person.
- An http://w3id.org/obelisk/Obelisk is http://w3id.org/obelisk/builtBy a http://w3id.org/obelisk/Sculptor.
- An http://w3id.org/obelisk/Obelisk has a http://w3id.org/obelisk/height, which is a numerical value.
As we can see, identifiers quickly become unpleasant to read when they are IRIs,
so RDF introduces the notion of prefixes (a simple concept borrowed from XML
namespaces). From now on we’ll use the prefix obelisk: to stand in for our
vocabulary identifier http://w3id.org/obelisk/, which means our vocabulary
now looks like:
- Use the prefix ‘obelisk:’ for our vocabulary identifier http://w3id.org/obelisk/.
- An obelisk:Obelisk is obelisk:ownedBy an obelisk:Person.
- An obelisk:Obelisk is obelisk:builtBy an obelisk:Sculptor.
- An obelisk:Obelisk has an obelisk:height, which is a numerical value.
Things, and Properties of Things
From the above we can see that we want to describe both ‘Things’ (e.g. Obelisks and Sculptors), and the ‘Properties’ of those things (e.g. their height, or who owns them). RDF allows us to explicitly distinguish between these by referring to ‘things’ as Classes, and ‘properties’ as, well, Properties!
Defining Classes of Things
In RDF, the general things that we can talk about are called Classes. Therefore everything that went into a bubble in our diagram above is a Class, so we could add the following to our vocabulary:
- obelisk:Obelisk is a Class.
- obelisk:Person is a Class.
- obelisk:Sculptor is a Class.
If we look at these sentences, they are structured exactly like the ones from the rest of our vocabulary: a thing we talk about (e.g. obelisk:Obelisk), followed by a property (‘is a’), followed by a value (‘Class’).
What we need now are IRIs for the “is a” property and the “Class” Class. Fortunately, these are defined in the RDF and RDFS vocabularies: “is a” is defined by rdf:type, and “class” by rdfs:Class. Therefore, we can now write:
@prefix obelisk: <http://w3id.org/obelisk/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
obelisk:Obelisk rdf:type rdfs:Class .
obelisk:Person rdf:type rdfs:Class .
obelisk:Sculptor rdf:type rdfs:Class .
And congratulations, you’ve just created your first snippet of valid RDF! This particular RDF syntax is called Turtle. There are several other standardized syntaxes, but we don’t need to cover them in this tutorial.
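Turtle also provides the keyword `a` as a built-in shorthand for rdf:type, and many published vocabularies use it. As a small sketch, the three statements above could equally be written like this:

```turtle
@prefix obelisk: <http://w3id.org/obelisk/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

# `a` is Turtle shorthand for rdf:type, so the rdf: prefix is not needed here.
obelisk:Obelisk a rdfs:Class .
obelisk:Person a rdfs:Class .
obelisk:Sculptor a rdfs:Class .
```

Both forms describe exactly the same data; which one to use is a matter of taste.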
Defining properties of things
The properties of things in RDF are called properties (how convenient). Therefore, as we did for Classes, we might write:
- obelisk:ownedBy is a Property.
- obelisk:builtBy is a Property.
- obelisk:height is a Property.
We already know that ‘is a’ is identified by the IRI rdf:type, and ‘Property’ is identified by rdf:Property, so we can now go ahead and change that into:
- obelisk:ownedBy rdf:type rdf:Property.
- obelisk:builtBy rdf:type rdf:Property.
- obelisk:height rdf:type rdf:Property.
Which leads to our vocabulary looking like this:
@prefix obelisk: <http://w3id.org/obelisk/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
obelisk:Obelisk rdf:type rdfs:Class .
obelisk:Person rdf:type rdfs:Class .
obelisk:Sculptor rdf:type rdfs:Class .
obelisk:ownedBy rdf:type rdf:Property .
obelisk:builtBy rdf:type rdf:Property .
obelisk:height rdf:type rdf:Property .
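To see what these terms are for, here is a minimal sketch of how Cleopatra might describe one obelisk from her collection using the vocabulary. The ex: prefix, every IRI under http://example.org/, and the height value are made up for illustration; they are not part of the vocabulary itself.

```turtle
@prefix obelisk: <http://w3id.org/obelisk/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
# Hypothetical namespace for Cleopatra's own data.
@prefix ex: <http://example.org/cleopatra/> .

# The things we talk about, each typed with a Class from the vocabulary.
ex:obelisk1 rdf:type obelisk:Obelisk .
ex:cleopatra rdf:type obelisk:Person .
ex:sculptor1 rdf:type obelisk:Sculptor .

# The properties of the obelisk, using Properties from the vocabulary.
ex:obelisk1 obelisk:ownedBy ex:cleopatra .
ex:obelisk1 obelisk:builtBy ex:sculptor1 .
ex:obelisk1 obelisk:height 21 .
```

Because Caesar would use the same obelisk: terms in his own data, the two collections can be combined and compared without any translation step.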
Adding information for humans
Using labels and comments
So far we have created identifiers that are primarily intended for machines (although it is certainly not recommended, the IRIs themselves do not need to be meaningful to humans at all). For example, the following would technically be an equivalent vocabulary:
@prefix o: <http://w3id.org/obelisk/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
o:C001 rdf:type rdfs:Class .
o:C002 rdf:type rdfs:Class .
o:C003 rdf:type rdfs:Class .
o:p001 rdf:type rdf:Property .
o:p002 rdf:type rdf:Property .
o:p003 rdf:type rdf:Property .
Even if we don’t want our vocabulary to look like this, the point is that it’s
really useful to also provide human-readable descriptions of the terms in our
vocabularies. To do so we’ll use the properties rdfs:label to add a
human-readable label for the term identified by the IRI, and rdfs:comment to
add a few sentences describing what is meant by the term in the context we use
it. This could lead to something like this:
@prefix obelisk: <http://w3id.org/obelisk/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
obelisk:Obelisk rdf:type rdfs:Class ;
    # A label for readability...
    rdfs:label "Obelisk" ;
    # ... and a more descriptive comment for a fuller explanation of this 'thing'.
    rdfs:comment "An obelisk is a four-sided pillar with a pyramid-shaped top." .
obelisk:Sculptor rdf:type rdfs:Class ;
    rdfs:label "Sculptor" ;
    rdfs:comment "An artist who sculpts obelisks." .
obelisk:ownedBy rdf:type rdf:Property ;
    rdfs:label "owned by" ;
    rdfs:comment "Relationship between an obelisk and the person who owns it, which is typically the person who ordered it, or to whom it was offered." .
obelisk:builtBy rdf:type rdf:Property ;
    rdfs:label "built by" ;
    rdfs:comment "Relationship between an obelisk and the person who built it." .
obelisk:height rdf:type rdf:Property ;
    rdfs:label "height" ;
    # Note: so far we didn't specify any units for the height (we'll fix this
    # properly later), but we can provide a hint in the comment.
    rdfs:comment "The distance from the ground to the highest point of the obelisk, in meters." .
Please note that we are using a shortcut provided by the Turtle syntax to avoid
repeating the thing that we talk about when adding multiple properties to it
(e.g. obelisk:ownedBy in the next snippet):
- The long version:
obelisk:ownedBy rdf:type rdf:Property .
obelisk:ownedBy rdfs:label "owned by" .
obelisk:ownedBy rdfs:comment "Relationship between an obelisk and the person who owns it, which is typically the person who ordered it, or to whom it was offered." .
- The shortcut:
# We removed the repetitions of obelisk:ownedBy, and ended each line
# with ; instead of . (except the last one).
obelisk:ownedBy rdf:type rdf:Property ;
    rdfs:label "owned by" ;
    rdfs:comment "Relationship between an obelisk and the person who owns it, which is typically the person who ordered it, or to whom it was offered." .
Adding multilingual support
So far, all our labels and comments are written in English, yet there is no
explicit indication that the text is actually in English within the vocabulary
itself. To make the language of any text explicit, RDF provides the concept of a
language tag, which can be placed directly after the text string itself. The
values of these language tags are defined by the international IETF standard
BCP 47. For example, we can use @en for English, or @fr for French.
This example shows how easy it is to explicitly provide both English and French labels and comments for terms in our vocabulary:
@prefix obelisk: <http://w3id.org/obelisk/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
obelisk:Obelisk rdf:type rdfs:Class ;
    # A label explicitly in English...
    rdfs:label "Obelisk"@en ;
    # ... as well as a comment in English...
    rdfs:comment "An obelisk is a four-sided pillar with a pyramid-shaped top."@en ;
    # ...and the same comment in French...
    rdfs:comment "Un obélisque est un pilier à quatre côtés dont le sommet est en forme de pyramide."@fr .
obelisk:Sculptor rdf:type rdfs:Class ;
    rdfs:label "Sculptor"@en ;
    rdfs:label "Sculpteur"@fr ;
    rdfs:comment "An artist who sculpts obelisks."@en ;
    rdfs:comment "Un artiste qui taille des obélisques"@fr .
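When the same property has several values, as with the two labels and two comments on obelisk:Sculptor, Turtle offers a second shortcut: a comma separates multiple values of one property. A sketch of the Sculptor definition rewritten that way (the statements are the same, only the notation is more compact):

```turtle
@prefix obelisk: <http://w3id.org/obelisk/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

obelisk:Sculptor rdf:type rdfs:Class ;
    # A comma repeats both the thing we talk about and the property.
    rdfs:label "Sculptor"@en , "Sculpteur"@fr ;
    rdfs:comment "An artist who sculpts obelisks."@en ,
                 "Un artiste qui taille des obélisques"@fr .
```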
Of course, for many text values the concept of ‘language’ is meaningless. For instance, Social Security Numbers in the United States are often written as strings because they contain hyphens (e.g. ‘123-12-7890’), and a username (or nickname) will most often not have any associated language. For these common use cases, we simply leave out the language tag.
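As a small sketch of the difference, the snippet below puts a language-tagged value next to an untagged one. The ex: prefix, ex:cleopatra, and the ex:username property are hypothetical names invented for this illustration; they are not part of the obelisk vocabulary.

```turtle
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
# Hypothetical namespace, for illustration only.
@prefix ex: <http://example.org/> .

# A name written in English: we add a language tag.
ex:cleopatra rdfs:label "Cleopatra"@en .
# A username has no language: we leave the tag out.
ex:cleopatra ex:username "cleo_vii" .
```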
Adding some metadata
The finishing touch to this vocabulary is to add some metadata about the vocabulary itself, so that people we share this vocabulary with (or who search for it, or who just stumble across it on the web) can know who created it, when, and what its intended purpose is, without having to go through all the details of the individual terms contained within it.
We already decided that the IRI of our vocabulary would be
http://w3id.org/obelisk/, so this is the identifier we are going to use in
RDF to say stuff about the vocabulary itself. In Linked Data terminology a
vocabulary is called an owl:Ontology, so the first thing to say is:
@prefix obelisk: <http://w3id.org/obelisk/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
# `obelisk:` is equivalent to http://w3id.org/obelisk/
obelisk: rdf:type owl:Ontology .
# The remainder of the vocabulary is unchanged.
# (`a` is Turtle's built-in shorthand for rdf:type.)
obelisk:Obelisk a rdfs:Class ;
    rdfs:label "Obelisk" ;
    rdfs:comment "An obelisk is a four-sided pillar with a pyramid-shaped top." .
# ...
Adding a description
Much like we described each term with human-friendly labels and comments, we
can now add a title (using the property dcterms:title) and a description
(using the property dcterms:description) to our vocabulary. To make it easier
to reuse, we can also indicate a suggested or preferred prefix
(vann:preferredNamespacePrefix) and a suggested or preferred IRI
(vann:preferredNamespaceUri), since multiple IRIs may point to the same
vocabulary.
@prefix obelisk: <http://w3id.org/obelisk/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix vann: <http://purl.org/vocab/vann/> .
obelisk: rdf:type owl:Ontology ;
    dcterms:title "Obelisk ontology" ;
    # The description can be a multi-line text.
    dcterms:description """
    The obelisk ontology aims at describing obelisks.
    """ ;
    vann:preferredNamespacePrefix "obelisk" ;
    vann:preferredNamespaceUri <http://w3id.org/obelisk/> .
# The remainder of the vocabulary is unchanged.
obelisk:Obelisk a rdfs:Class ;
    rdfs:label "Obelisk" ;
    rdfs:comment "An obelisk is a four-sided pillar with a pyramid-shaped top." .
# ...
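The title and description cover the purpose of the vocabulary. For the ‘who created it, and when’ part, Dublin Core Terms also defines dcterms:creator and dcterms:created. A sketch with placeholder values (the creator IRI and the date below are invented for illustration and should be replaced with your own):

```turtle
@prefix obelisk: <http://w3id.org/obelisk/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

obelisk: rdf:type owl:Ontology ;
    # Placeholder creator IRI: who created the vocabulary.
    dcterms:creator <http://example.org/people/cleopatra> ;
    # Placeholder date: when it was created, as a typed xsd:date value.
    dcterms:created "2020-01-01"^^xsd:date .
```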
Some simple naming conventions
You may have noticed some of the simple naming conventions used in our examples so far. These conventions are extremely common (but not universal!) across RDF vocabularies.
- The basic convention is to use Camel Case for all your terms, e.g. ‘ownedBy’ or ‘builtBy’.
- Capitalize the first letter of Class terms, e.g. Obelisk, or Sculptor.
- Lower-case the first letter of Property terms, e.g. height or ownedBy.
- Lower-case prefixes, e.g. @prefix obelisk: <...>.
- Don’t use hyphens; use underscores instead, because this makes the terms easier to use in some programming languages.
Reference
A reference version of this final vocabulary is available here, and you can experiment with the syntax using a live RDF validator.
Next step: publish your vocabulary on your Pod.