Escher, the complexity of coffee, knowledge engineering with ontologies – all these are nodes of the semantic networks associate professor Maria Keet keeps and grows in her mind, openly sharing them on the Web. With plenty of curious intersections they live online, openly available, just like her textbook, An Introduction to Ontology Engineering. Having emerged from Maria’s research interests in knowledge engineering with ontologies, conceptual modelling and related natural language generation, this textbook is the first of its kind globally in this subfield, recently recognized as outstanding with the UCT Open Textbook Award.
With her textbook “An Introduction to Ontology Engineering”, Professor Keet, part of the Department of Computer Science at the University of Cape Town (UCT), introduces students – and, thankfully, us, the broader audience interested in this kind of research – to the essential components of ontology engineering, providing a plethora of examples, exercises, slides, FAQ sections and interactive ways to engage with the content.
I am more than happy to introduce you to Maria’s answers to the questions that I sent her back in 2019, when she was in the midst of several ongoing projects. Despite her full schedule, she managed to send me answers. Yet, I wanted more :) And asked her to please elaborate on some of my questions. And … Patience pays. This January, on her sabbatical, Maria chose to share with us the threads of the semantic networks I told you about, diligently and patiently answering in a longer manner (yay!) all my broad enquiries about ontologies, web life and everything :).
Enjoy Maria’s semantic networks and her brave researcher’s heart!
Teodora: Back in 1997 you worked on a thesis carried out at the Department of Microbiology, section Molecular Ecology at the Wageningen Agricultural University, the Netherlands, titled “Effect of maize rhizosphere on degradation of 3-chlorobenzoate by Pseudomonas B13 or Alcaligenes L6.”. What can Microbiology tell us about our Macro world? It feels like there is a connection between the micro and the macro world, worth exploring. In other words:
Where do molecular and Web ecology intersect?
Maria Keet: To me, they don’t really; or at best, perhaps, on two aspects, after some stretching. I did read an interesting book recently mentioning the Wood Wide Web, which refers to the connections between fungi and plants, drawing a weak parallel with plants as the routers and fungi as the cables, or the plants as webpages and the fungi as the clickable links. That MSc thesis topic in molecular ecology of microorganisms that you refer to also played out in the rhizosphere, but then with bacteria that are also part of that ecosystem. And there’s the old joke from my introductory microbiology class in my first year at Wageningen University, which nowadays does seem to hold in the ‘Web ecology’ as well: “you’re never alone!”, since a large number of people are increasingly ‘always-on’ connected to the Internet, and each human has about as many bacteria as they have human cells, so you always have the company of non-human organisms as well.
One could try to find parallels or go for biologically inspired computation, in a similar way as has been done with swarm intelligence and evolutionary computation, but I did not do so eventually – ontologies had more allure than those sub-fields in computing. If one were to do so for bacteria rather than insect behaviour or genetics-only, I can imagine it might be something along the line of looking for similarities in models of bacterial colony dynamics and online social networks or app usage. For instance, colonies can continue to grow as long as there’s food, but at some point the bacteria start to die at the centre due to lack of access to nutrition and perhaps there are analogues to find online on the Web, like how people drift to certain apps but where early adopters, who are initially at the centre, leave first because of having gotten bored with it and lacking stimulation (as the ‘nutrition’ they seek). And there’s chemotaxis (bacterial movement induced by chemicals), which could be inspiring for algorithm development to solve certain tasks, but I’m really guessing here because you asked. As to the ‘ecology’ and the evolutionary aspect of it (the research group that I was part of at the time), that would align with aforementioned evolutionary computation more so than the Web Ecology.
Mostly, though, I see it as two separate specialisations—the microbiology I did then and the computing I’m focussing on these days. Microbiology is probably what I miss the most in having had to choose one specialisation eventually. But not the labwork though! It was the analysis of data obtained in the lab that made me try to switch specialisation.
Teodora: Among many other roles, you are also a Principal Investigator of the project MoRe NL: foundations of a Modular Realisation Engine for Nguni Languages.
What common ground do you find between Biology and language?
Maria Keet: None so far, either—at least not in the way the question is formulated. As disciplines, I consider them quite different, for with biology one can do science and discover how the world works and that’s irrespective of the humans living in it, whereas whatever we learn about human natural language is inherently dependent on humans. For knowledge in both fields, there are ‘general rules but still some messiness’ – as compared to computing where there are more rules and less messiness – but that could be a similarity of a characteristic, not a common ground. If we take language more broadly, also beyond humans, then one could consider how language manifests in nature, but also that is not quite a common ground related to the way I dabble in both.
This question, taken together with the previous one, gives me the impression that what you actually want to ask is how the seemingly different topics I’ve been working on all fit together. The short answer to that is: analysis and modelling. There’s a lot of analysis and modelling going on in computing, not only in ontologies, but also conceptual data modelling for database and software applications, modelling with domain-specific languages, model-driven engineering, and so on. There’s some in biology as well, where they aim to capture the theory of what happens and how things work, for which very many domain-specific languages exist; e.g., metabolic pathway diagrams in a cell or organism or the carbon flow in an ecosystem (explore, e.g., here), which can be linked to models in computing or used for bottom-up development of domain ontologies. There’s also some modelling in language, and linguistics in particular: how to formalise the grammar of a language, say, and one could even create and draw concept maps to structure the content of some text for reading comprehension, which is then also a model, albeit rudimentary. So, the biology and language aspects come into play insofar as I can play around analysing and modelling, or invent or improve sub-tasks that will facilitate analysing and modelling in order to improve the quality of their outcomes.
How did you get involved in Ontology Engineering in the first place?
Maria Keet: During my final MSc thesis on microbiology (in 1997), there were exciting novel bioinformatics tools like the RNAbase that could do things humans could not, such as aligning the digital representations of some bacterial RNA, and some tools that would have been cool to have did not exist yet. This got me really interested in IT: what else could be automated to make the research easier, or even doable in the first place? I wanted to combine the two, and set out to do so after graduation, via a little detour to get the funding.
In the 1990s there was no bioinformatics degree in the country I grew up in, let alone anything on bio-ontologies, so I went on with IT in industry and studied for a degree in computer science and IT in the evening hours. For the so-called ‘honours project’ in that degree, I managed to combine the two: to build a database about bacteriocins, which are small molecules produced by some bacteria to repel, or even kill, other bacteria, and which are used for food safety and food preservation. During the literature research to try to figure out how to represent the bacteriocin-encoding genes in my conceptual data model, I stumbled upon the Gene Ontology, and got interested in ontologies. By then it was 2003. Looking up more information about ontologies, I walked from one thing into another—ontologies for data integration, for improving the quality of conceptual models, with Ontology and even more modelling. Any job to do with ‘bio-ontologies’ had a PhD as prerequisite, however, which I did not have. So, onward to trying to get into a PhD programme, which I did. While at the start of my PhD research I still wanted to combine the two and focus on bio-ontologies, that gradually grew, more broadly and generally, into ontology engineering, to try to figure out new and better ways to develop good ontologies.
What do you find the most challenging part in teaching ontology engineering?
Maria Keet: There is no neat, satisfactory, easy definition of an ontology, so it takes a bit of time to describe what they are, and then I hope it hasn’t confused the students or made them doubt the field. And then I veer from the definitions debate into the other extreme, trying to make it look harmless and familiar, in that, in the end, for computer scientists and software engineers who build ontology-driven apps, there’s that flat text file to process. That doesn’t do justice to the notion of ontologies either, but the same approach works with teaching databases, where there are also plaintext files with structured data and there’s also a whole lot of theory and engineering around it.
It also doesn’t help that there are so many .owl files that sound like they would be ontologies, because they are in the Web Ontology Language OWL, but aren’t. Mistakes and modelling styles in those files are typically propagated into the ontologies the students have to design as part of an assignment, as if any artefact that’s on the Web is a good thing. I’ve tried to make tutorial ontologies to address this at least in part, which has helped. Some sort of list of exemplar ontologies would help as well, just like students look at sample code for how to program, but that has practical hurdles on who will be the gatekeepers of that list and what the quality criteria should be. Ontology quality is one of the areas of research within ontology engineering, and there’s no easy answer with a simple checklist – well, not a complete one and not one that a community will agree upon. I have some other sketchy ideas for how to address the issue besides tutorial and exemplar ontologies, but first plan to experiment with those to see if they have the desired effect or not.
Why do you think words and classifications of particular domains are so difficult to agree upon?
Maria Keet: The question is very broadly construed such that it can refer to many things, so I’m not sure what exactly you refer to. As a first, broad answer, the following. People can disagree for many reasons, including, notably: misunderstanding, homonymy and synonymy, underspecification, ignorance and intellectual laziness, too little time to analyse something in depth, mixing up desiderata in constructing the branches in a hierarchy, agreeing on the semantics but representing it differently formally, or letting politics and/or ideologies trump scientific evidence. Some of these can be resolved with the good will of the people involved in the development of the ontology (the first five reasons), others probably cannot (like the last one).
On top of these ones, it can get more challenging in a multilingual or international setting, as then translations and cultural differences come into play. The reality is the same – well, anyway, that’s my philosophical stance – but one language may not have words for some things because it does not make such fine-grained distinctions, or they divided up the ‘space’ differently based on their salience in their society.
Last, but not least, there are fundamentally different viewpoints that can play out in devising words and constructing classifications. Like four-dimensionalism or not, where one may see ‘objects’, such as you, me, the table, a computer, etc., as ever-changing space-time worms unfolding in time and space, and another is convinced such objects are wholly present at any time when they exist. Or what the nature of a relation is: is there an ordering of the participants reflecting the ordering in the way we use them in speech, do the participants in the relation each play a role, or is there some cloudy complex they enter into? And do you grant ‘stuff’ a special status or not, as something categorically different from ‘objects’? The latter does happen in language, as can be seen in the distinction between mass nouns and count nouns, but just because we do so in language does not mean there must be an ontological distinction and, in the other direction, perhaps we do so in language because philosophically we tend to agree they’re fundamentally different things. Also that debate hasn’t been settled, and there are others beyond these three that influence what your classification will look like and how your language – the natural language and the logic for representing it – deals with all that.
Teodora: In your book (freely available for download – big thank you for this) https://people.cs.uct.ac.za/~mkeet/OEbook/ you explain ontology engineering in a very down-to-earth manner and I am tempted to ask you these questions:
What is essential for a good ontology?
Maria Keet: Upfront before answering this question: the notion of ‘good’. One may argue for an absolute notion of what is ‘good’ and a relative one in the sense of good for whom or what. Regarding the latter, one might take an applied stance in IT for some bio-ontology or enterprise ontology and then an ontology would be deemed good if it does its intended task well or if it can answer the competency questions that were formulated beforehand. In that case, one could go as far as to claim that anything goes, and some do so. I do not subscribe to that viewpoint when it is assumed to entail skipping the principled approach, as I do think there’s at least one aspect ‘essential’ for a good ontology, which induces more factors.
That one baseline essential aspect is application-independence. Ontologies were proposed as a way to solve the data integration problem, where it would have the generic knowledge represented that holds across a number of applications so as to provide a means to represent that common ground. A consequence is that ontologies are not supposed to contain knowledge about what are really application decisions, such as storing the ‘height’ of a plant as an integer rather than as a float or a string. If you want to represent such practicalities in your model, then create a conceptual data model for your application instead, be it formalised in OWL or not.
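As a tiny illustration of that separation – with all names here hypothetical, invented only for this sketch – one could contrast the ontology-level statement, which stays silent on datatypes, with application-level schemas that each commit to their own storage choice:

```python
# Hypothetical sketch of application-independence.
# Ontology level: plants bear a height quality; no datatype is committed to.
ontology = {
    "classes": {"Plant", "Height"},
    "relations": {("Plant", "has-quality", "Height")},
}

# Application level: each conceptual data model fixes its own representation
# of height -- a float in centimetres here, an integer in millimetres there.
# These decisions belong in the applications, not in the shared ontology.
app_schema_a = {"Plant": {"height_cm": float}}
app_schema_b = {"Plant": {"height_mm": int}}

# The shared ontology mentions no datatypes at all:
assert not any(isinstance(x, type)
               for rel in ontology["relations"] for x in rel)
```

Both applications can then map their own schema onto the one shared ontology for integration, which is exactly the common ground the ontology is meant to provide.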
Once that is settled, then one could go several steps further on the principled road, as far as one is willing to go. An expressive enough logic to represent what you need to represent is essential, although a logic-based representation of a subject domain on its own is not enough. Here is where the ‘good, bad, and ugly’ come in regarding what is included in the ontology: something may be logically consistent, but that does not imply it is ontologically correct. To get things right ontologically, one can avail of the scientific principles that were uncovered and good practices learnt over the years, on the one hand, and bad modelling decisions to avoid, on the other. A good ontology will try as much as possible to avail of, or incorporate, the former and avoid the latter.
Finally, the ontology developed is still a human-created artefact, like software, a bridge, etc., so it also should be subjected to solid engineering, not be some afterthought of putting things in Protégé on a rainy Sunday afternoon. The ontology has to be designed in a systematic way, with justifications for the engineering decisions taken, and with an evaluation of its quality.
What do you think is the thing over which nets of conceptual and logical layers cannot be cast?
Maria Keet: I’m an optimist. A lot more can be represented in a logic than one may expect—there are many logics with advanced modelling features—although sometimes it will be hard and complicated to do. If you refer also to computation over such a representation, there are obviously the well-known limitations on computation and therefore, practically, there are many things that will not be formalised because there’s not much to gain from it, or at least not from the viewpoint of computing and IT. Note that ‘will not’ and ‘cannot’ be modelled are two different things.
As to the conceptual layer, that is limited by the human imagination and understanding (using the philosophical sense of concept, being a mind-dependent entity), which means that inherent in the very notion of ‘conceptual layer’ is embedded that we cannot specify now what cannot be cast in it, for as soon as humans conceptualise it, it shows we can. And we cannot know what we don’t know in order to judge that it cannot be cast at the conceptual layer.
Why is it so hard to define what an ontology is?
Maria Keet: I think it is due to various reasons. Detailed descriptions are more precise but won’t stick because they’re too long and need too much explanation of prerequisite concepts in order to grasp them, like the definition in the seminal FOIS’98 paper by Nicola Guarino that spans a whole paragraph, or the updated version that spends 8 pages defining the terms used in its 5-line definition.
Simplified definitions don’t cover it properly because they’re either too inclusive, too exclusive, or too vague, like Ian Horrocks et al.’s statement that an ontology is equivalent to a Description Logics knowledge base or Gruber’s oft quoted ‘specification of a conceptualization’. Then there’s terminological and ontological disagreement from the philosophers to take into account when trying to define it; e.g., concept, class, entity type, universal, and category are different beasts when you dig deeper into their meaning, and then millennia-old unresolved debates also enter the arena, like whether it’s the reality that’s being represented in the ontology, the human understanding thereof, just humans’ conceptualisations that may not have to correspond with reality, or that there’s no reality anyway.
We really ought to have a broadly accepted short definition for the ‘ontology as artifact’ by now. I do describe it informally in the lecture with the aim of it being operationally useful for students, but I wouldn’t put it on paper as the definition, because then it has to be watertight against attacks from ontologists, which it may not be.
What is logic-based knowledge representation in the most simple terms? And why do we need it?
Maria Keet: Logic-based knowledge representation is a way to be mathematically precise about the knowledge of some subject domain. For humans, it helps communication and mutual understanding; e.g., is your definition of ‘student’ the same as mine? For computers, the precision is essential: the machine needs to be instructed in a way it can unambiguously and deterministically process what some thing is, so it can do various tasks, like comparing it with other descriptions from another application and, if they match, do some matchmaking or data integration, find the elements that are instances of the class, assist users to navigate across texts, and so on.
If we were to have only pretty pictures with boxes and lines, differences could be shoved under the carpet more easily and then only be discovered later, when errors pop up in the application. For instance, ‘student enrolled-at university’ in some concept map glosses over pertinent details such as whether the students must be enrolled-at at least one university or not, whether they may be enrolled-at more than one university at the same time or only at most one, and whether they can be enrolled-at some other organisation that is not a university. Any logic-based representation necessarily forces one to clarify at least some of these questions and record them precisely. The more expressive the logic, the more can be cleared up; e.g., a temporal logic lets one also specify whether enrollment at multiple universities may be at the same time or only sequentially, where the time intervals of registration do not overlap.
After this, a second benefit may be obtained: automated reasoning over the logical theory. That is, using the rules of inference, it will detect implicit knowledge that may or may not be desirable to have in the ontology and it checks for any contradictions. Those deductions obtained, in turn, may help improve the quality of the ontology, among other things.
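The ‘student enrolled-at university’ example above can be sketched in a few lines of plain Python – a toy constraint check, not a real ontology language or description logic reasoner, and all names (`RelationConstraint`, `check`, the sample facts) are invented purely for illustration:

```python
from dataclasses import dataclass

@dataclass
class RelationConstraint:
    """A relation with the cardinality questions made explicit
    instead of glossed over, as a box-and-line diagram would."""
    name: str
    min_card: int        # e.g. 1 = every student must be enrolled somewhere
    max_card: int        # e.g. 1 = at most one university at a time
    allowed_range: set   # e.g. {"University"} = only universities qualify

def check(constraint, facts, types):
    """Return the violations of the constraint over the facts.

    facts: student -> set of organisations they are enrolled at;
    types: organisation -> kind of organisation.
    """
    violations = []
    for student, orgs in facts.items():
        if not (constraint.min_card <= len(orgs) <= constraint.max_card):
            violations.append(f"{student}: enrolled at {len(orgs)} organisations")
        for org in orgs:
            if types.get(org) not in constraint.allowed_range:
                violations.append(f"{student}: {org} is not in the allowed range")
    return violations

# Exactly one university, and nothing but universities:
enrolled_at = RelationConstraint("enrolled-at", 1, 1, {"University"})
types = {"UCT": "University", "BigCo": "Company"}
facts = {"Alice": {"UCT"}, "Bob": {"UCT", "BigCo"}}

for v in check(enrolled_at, facts, types):
    print(v)
# Alice passes; Bob violates the constraint twice: two enrolments,
# one of which is not a university.
```

In an actual ontology language such as OWL, one would state these as cardinality and range restrictions and let an automated reasoner, rather than hand-written checks, detect the inconsistencies and the implicit knowledge that follows.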
Who’s gonna interview the interviewer?
Maria asked me several questions among which I had to pick one.
Meanwhile, I definitely can think of a few questions for you :) Here are some of them, and just pick the one you prefer answering:
– How did you, as a philologist, end up looking into the Semantic Web and ontologies?
– To what extent, if at all, can ontologies be of any use to philology or the activities philologists do? Or if not ontologies, then how can Ontology (analytic philosophy) be of use to philology or the activities philologists do?
– I have not come across much academic literature in ontology engineering where philology is used to improve some method. In what way(s) could the Semantic Web, and ontologies in particular, make use of philology? Or, reworded/alternatively: What does philology have to offer to the Semantic Web, and ontologies in particular, that we could use to improve ontologies or ontology engineering?
– What is your view on the interaction between language and Ontology, or language and ontologies?
Letting these sink in, and having no practical experience of working as a philologist on ontology projects, I tried to figure these out conceptually. And I am afraid I have more questions than answers to these.
In the first place, what led me to the Semantic Web was my life-long interest in what things mean, how meaning emerges, where it lives and what the elements are that allow it to flow and fluctuate.
Being interested in meaning, its construction, deconstruction and paths of forming, the way programs “understand” text was a curious direction to start walking along. Having studied (and taught) Latin, I was used to digging into the mountains of cultural, personal and professional layers a word carries and transfers. And here I am now, asking myself how we help machines help us with our mountains and the mountains we still haven’t found.
:) And trying to be more practical and down-to-earth, as you were in your answers, I would say that the theories of text are what can bring to knowledge engineering the methodology (with all its labyrinths) of hermeneutics, if I can say that hermeneutics has a method… Actually, I think I can, given, for example, Marcello Vitali-Rosati’s work, e.g. Examining Paratextual Theory and its Applications in Digital Culture.
And speaking about hermeneutics, I see a beautiful intersection between the semiotic triangle in Guarino’s paper you cited and the hermeneutic circle.
And that intersection is full of questions and things we need to do, only if we can somehow collide two communities. Which is super hard. It is hard because, on one hand, we need, as you said, a deterministic, unambiguous representation and, on the other, we need to keep language, and one of its engines – poiesis, away from reductionism. And this collision, I know, will give us answers (and more questions) – the collision where we decide what degrees of logic our illogical part can tolerate in a universe where we don’t know, yet have discovered so much.
Instead of an epilogue: Maria’s web writing
What drew you to categorizing Escher?
(See Maria’s The works of M.C. Escher )
I always enjoyed his works, and I made several paper sculptures of his works when I was young, like these ones and kaleidocycles (movable ones you could turn around) that you could cut out from the book and glue together. I didn’t really categorise his works, though; the authors of the cited book largely did. That book seemed out of print at the time and hard for other people to access, so I wrote my own summary of it, out of curiosity, to remember better what I had read, and thinking that other people might find it of interest as well.
I wrote several web pages in the first few years after I started my website in 1999, as a way to keep myself occupied and to find things to write about that interested me, so as to fill my website with some content in addition to the MSc thesis material that was initially all I had, and this page was one of them. Others were on, among other things, the benefits of red wine and the psychological effects of colours, which relied in part on information from my earlier studies, and some pages about IT. In 2006, I moved such writings to my blog, which is still active to this day, and an overview of the more interesting ones can be found here.
You can also dive in more conversations in the Dialogues section where I explore meaning through intense exchange and collisions of domains :)