Going from controlled vocabulary to ontologies ONTOFORCE 2024

Going from controlled vocabulary to ontologies

Ontologies enhance data by adding context, maximizing the value of information. Transforming data into meaningful insights is a step-by-step process that moves through controlled vocabulary, hierarchy, and relationships.

Bérénice Wulbrecht
4 September 2024 3 minutes

Data itself often has little to say; it’s the way we use it that defines its value. Ontologies are key players in structuring data in a meaningful way. Despite their importance, there is often a lack of clear understanding of what ontologies are and how to use them effectively.

Ontologies add context and meaning to data by defining categories and associations. Even if you have an excellent AI algorithm, a backbone for your data is essential to clarify the signals and get the most value out of it. To create an ontology effectively, you need to understand the reasoning behind it and follow some essential steps. In this article, we will walk you through these steps to guide you out of your data maze.

The pathway through the maze: a step-by-step approach 

1. Controlled vocabulary

The first step is about reducing variability and consolidating related information. By creating generalized categories such as ‘drug’ and ‘disease’ and listing the authorized values, your data already becomes more meaningful and standardized. Synonyms can be grouped together, and typos can be identified, leading to an unambiguous dataset. 
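
As a rough illustration of this first step, here is a minimal Python sketch. The vocabulary entries, the function names, and the 0.8 similarity cutoff are all assumptions made up for this example; they are not taken from any particular tool. Each raw value is mapped to its authorized value, and unknown values are compared against known labels so that likely typos can be reviewed.

```python
import difflib

# Hypothetical controlled vocabulary: each category lists its authorized
# values, and each authorized value lists the synonyms that map to it.
CONTROLLED_VOCABULARY = {
    "drug": {
        "acetylsalicylic acid": ["aspirin", "asa"],
        "paracetamol": ["acetaminophen"],
    },
    "disease": {
        "alzheimer's disease": ["alzheimer disease"],
    },
}


def build_lookup(vocabulary):
    """Map every authorized value and synonym to (category, authorized value)."""
    lookup = {}
    for category, terms in vocabulary.items():
        for preferred, synonyms in terms.items():
            for label in [preferred, *synonyms]:
                lookup[label.lower()] = (category, preferred)
    return lookup


def normalize(raw_value, lookup):
    """Return (category, authorized value), or None for an unknown value."""
    return lookup.get(raw_value.strip().lower())


def suggest(raw_value, lookup, cutoff=0.8):
    """Return known labels that closely resemble an unknown value (likely typos)."""
    return difflib.get_close_matches(
        raw_value.strip().lower(), list(lookup), n=3, cutoff=cutoff
    )


lookup = build_lookup(CONTROLLED_VOCABULARY)
print(normalize("Aspirin ", lookup))  # ('drug', 'acetylsalicylic acid')
print(normalize("asprin", lookup))    # None, so the value needs review
print(suggest("asprin", lookup))      # ['aspirin']
```

Suggestions are only returned for review, not applied automatically, because a close spelling is not always the same concept.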

2. Hierarchy

After creating categories, you can refine your data with subbranches. This allows you to adopt different perspectives on the information. For example, a neuroscientist is probably more interested in data on neurodegenerative diseases alone than in data on all diseases together.
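
One way to picture the hierarchy step is the sketch below. It assumes each term stores a single parent; the terms, the records, and the is_a helper are hypothetical examples rather than an established API. Filtering on a subbranch then means keeping only the records whose term sits under it.

```python
# Hypothetical hierarchy: each term points to its parent (broader) term.
PARENT = {
    "neurodegenerative disease": "disease",
    "cardiovascular disease": "disease",
    "alzheimer's disease": "neurodegenerative disease",
    "parkinson's disease": "neurodegenerative disease",
    "hypertension": "cardiovascular disease",
}


def is_a(term, ancestor):
    """Return True if `term` equals `ancestor` or sits anywhere below it."""
    while term is not None:
        if term == ancestor:
            return True
        term = PARENT.get(term)  # move one level up; None at the root
    return False


# Made-up records annotated with terms from the controlled vocabulary.
records = [
    {"id": 1, "disease": "alzheimer's disease"},
    {"id": 2, "disease": "hypertension"},
    {"id": 3, "disease": "parkinson's disease"},
]

# The neuroscientist's view: only records under one subbranch.
neuro_records = [
    r for r in records if is_a(r["disease"], "neurodegenerative disease")
]
print([r["id"] for r in neuro_records])  # [1, 3]
```

Real hierarchies often allow a term to have more than one parent; the single-parent dictionary is a simplification to keep the example short.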

3. Thesaurus

While the previous steps establish definitions and descriptions of the datapoints, the third step identifies associative relationships between them. An antibody can have a link with another drug modality, such as a small molecule, or it can be related to a certain disease. The nature of these relationships is, however, not yet identified. 
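
A thesaurus-level link can be sketched as an untyped, symmetric "related to" association. In the Python example below, the pairs and the build_related_index helper are illustrative assumptions; the point is that the data records that two terms are connected, but not how.

```python
from collections import defaultdict

# Hypothetical associative links: pairs of terms that are related,
# with no statement yet about the nature of the relationship.
RELATED_PAIRS = [
    ("antibody", "small molecule"),
    ("antibody", "rheumatoid arthritis"),
]


def build_related_index(pairs):
    """Store each link in both directions, since 'related to' is symmetric."""
    related = defaultdict(set)
    for first, second in pairs:
        related[first].add(second)
        related[second].add(first)
    return related


related = build_related_index(RELATED_PAIRS)
print(sorted(related["antibody"]))        # ['rheumatoid arthritis', 'small molecule']
print(sorted(related["small molecule"]))  # ['antibody']
```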

4. Ontology

Lastly, giving each established relationship a consensus definition structures your data as a knowledge graph. A drug can, for example, be either indicated or contraindicated for a certain disease. These typed associations support machine learning applications, further increasing the value of your data.
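
To make the difference with the thesaurus step concrete, the sketch below stores each statement as a subject, a defined relationship, and an object. Everything in it is an assumption for illustration: the relationship names, their definitions, the helper functions, and the placeholder entities such as "drug A" and "disease X". Dedicated ontology tools and standards offer far richer modelling than this.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One statement in the knowledge graph: subject, relationship, object."""
    subject: str
    predicate: str
    obj: str


# Hypothetical relationship types, each with an agreed definition.
RELATIONSHIP_DEFINITIONS = {
    "indicated_for": "The drug is an accepted treatment for the disease.",
    "contraindicated_for": "The drug should not be used in patients with the disease.",
}

# Placeholder statements; "drug A" and "disease X" are not real entities.
TRIPLES = [
    Triple("drug A", "indicated_for", "disease X"),
    Triple("drug A", "contraindicated_for", "disease Y"),
    Triple("drug B", "indicated_for", "disease Y"),
]


def validate(triples):
    """Raise ValueError if a triple uses a relationship without a definition."""
    for triple in triples:
        if triple.predicate not in RELATIONSHIP_DEFINITIONS:
            raise ValueError(f"Undefined relationship: {triple.predicate}")


def subjects_with(predicate, obj, triples):
    """Return the sorted subjects linked to `obj` by `predicate`."""
    return sorted(
        t.subject for t in triples if t.predicate == predicate and t.obj == obj
    )


validate(TRIPLES)
print(subjects_with("indicated_for", "disease Y", TRIPLES))        # ['drug B']
print(subjects_with("contraindicated_for", "disease Y", TRIPLES))  # ['drug A']
```

Because every relationship has a type, the same pair of terms can now answer two different questions, which an untyped "related to" link cannot do.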

It’s about the journey, not the destination 

Although it’s tempting to jump in headfirst, building an ontology is a journey. Each step builds the foundation for the next one. First, you need to ‘get to know’ your data and identify key concepts, hierarchy and relationships, and acquire domain knowledge through conversations with experts. Since this whole process is time-consuming, it’s important to emphasize that the final goal is not building an ontology per se. The priority lies in increasing the value of your data with each step along the way and focusing on what works best for you at this moment. Not every application needs an ontology, and a controlled vocabulary is already a huge leap. It is better to view it as a continuous process of updating your backbone as new data comes in and re-evaluating your needs. 

Taking the first step 

Different tools exist to get you started on your journey. For a controlled vocabulary, an Excel file or simple database will suffice. For more advanced applications, dedicated platforms to build and visualize your ontology are recommended. Examples include Protégé, which is free and open source, as well as commercial platforms such as CENtree (Elsevier), TopBraid (TopQuadrant), and Semaphore (Smartlogic).

A good start is half the work: taking the first step and adapting your course along the way will help you obtain as much value as possible from the data you worked so hard for.