Text and the Semantic Web have been part of enquiry for understanding human communication for quite a long time now. Questions about content, graphs, organisational change and the Yay! :) of the URI are always at the back of my mind when I explore how content on the Web is ideated, created and managed. And all of these questions I was able to ask Ian Piper of Tellura Semantics, inspired by his World IA Day (what a day! ❤️) talk From Chaotic Content to the Well-tempered Content Graph.
Ian’s talk caught my imagination and grounded it in a potential practical solution where text and graph live peacefully together. I have heard about “content graph” from Aaron Bradley before, yet I haven’t seen much around the topic. Until I met Ian.
Following careers in industry as a research chemist, information architect and usability specialist, Ian Piper is now working on improving information and knowledge management through applying semantic technologies. Ian and his team at Tellura Information Services are helping organisations in industry, academia and the public sector to use graph technology for solving organisational information management problems.
In this Dialogue I am happy and excited to share with you what Ian Piper has to tell us about the Semantic Web, graphs, content, ontologies and taxonomies.
How did you switch codes from atoms, molecules and ions to nodes, edges and URIs?
I started my career as a research chemist in academic institutions, and then moved to commercial businesses. I soon discovered that fast-moving chemistry research is a young person’s game; in effect my learning was frozen at the point when I left university research, and good commercial research requires a steady stream of young knowledgeable people. In any case I was becoming increasingly interested in the opportunities for application of computing in science. So a progressive move into IT was more or less inevitable. Luckily, I had a taste and a bit of a flair for it.
When did you first hear about the Semantic Web?
I was working for Glaxo in emerging computer technologies, and we set up the first dynamic web application architecture that the company had seen. We built this architecture, and the distributed applications that ran on it, using Tcl (a UNIX scripting language). This was in 1994, and the plain basic web was as far as most organisations had gone, so this was ground-breaking work. While this was an excellent tool for creating webs of connected information, we could see that there were a couple of things missing; in particular two-way linking between the things we called “pages” and any kind of meaning or types attached to those things and those links. Then I picked up on the First International Conference on the World-Wide Web, and Tim Berners-Lee’s talk on the future of the web. You can still see this talk, and I would urge anyone interested in the Semantic Web to look at it.
At a time when most people were struggling with putting up a basic web page, Berners-Lee set out a vision of a semantically linked web of knowledge, and it is as fresh and relevant now as it was then. I was hooked on it from that point, and one way or another it has informed all of my subsequent work.
When was the first time you saw the need for graph in education and business? What did it look like in practice? How was that need voiced?
[I am referring to Tellura’s founding and also to the idea that sometimes, actually many times, companies are not aware of semantic web technology solutions and don’t talk about graphs, but rather have their own vocabulary of problems]
The idea grew out of the observation that information management in general business was not actually solving real problems. Content is chaotically spread around organisational silos and there is little scope for interoperability because there are no common standards for storing, characterising, finding or re-using stuff. In my experience the normal response to this, from the people given the job of sorting things out, is to propose that they put all of the content in one place. If all of our stuff is in one place, the argument goes, we will finally have a handle on everything and will be able to find everything and re-use everything. One System To Rule Them All.
Apart from the fact that this is just creating another silo, this argument fails to recognise the realities of human nature; people want to manage their own information and are probably the best qualified to manage it. Time and again I saw centralisation initiatives fail for this reason. By the way, 20 years on this is still the default mode of behaviour in most businesses that I work with. And it’s still failing.
I realised that a better way to address this would be to just leave content where it is, managed by the people who know it best, but to ensure that it was opened up to anyone else who needed to use it. This required well-defined content models and good communications protocols between information in one place and information in another.
I realised that this was really another way of describing the Semantic Web; disparate objects that can be characterised unambiguously and linked in meaningful ways.
With the emergence of the Resource Description Framework (RDF) and the Five-Star model for open data it seemed clear to me that the graph approach was a pragmatic way to address some of the problems of organisational information management.
In terms of practical application, a major part of my engagement with my clients is knowledge transfer; I try to infect my clients with knowledge and curiosity about the world of graph technology, while still anchoring this in realistic and achievable goals.
Where did the concept of a content graph emerge from?
It is derived from the general model of knowledge graphs. But I use the term content graph because people tend to equate knowledge graphs with knowledge networks and knowledge management, ideas that have had a patchy uptake amongst business users. By contrast, the key idea behind a content graph is to describe your content in terms of its structure and use, to break content down into the most granular chunks that are practically usable and then to link those chunks together using meaningful relationships. Rather than monolithic blocks of content described only by structural metadata (title, author, date, etc), such granular content objects can be stored in repositories managed by their type, their relations with other content objects and data properties.
Using a content graph in this way helps content owners, designers, writers and editors to make the most of their product. People know their content, and tend to be more receptive to a model described in this way.
How would you explain a content graph to the guy next door?
Graph ideas are not very intuitive, and so it’s difficult to do this. But here is an analogy that might help.
Imagine a visit to a book shop. The books in the shop are arranged in sections based on genres such as fiction, travel, photography, poetry and so on. They will usually only be found in one section, so if you are looking for a book that crosses genres (say, travel photography) it can be hit and miss finding your book. And what if you wanted something a little more personal or customised? That is just not possible in your book shop today.
But let’s look at a Semantic Book Shop built like a graph. Instead of rigid sections, there are virtual sections. You can explore these sections just like in a real book shop to find interesting books. But you know exactly what you want, and ask for books that are about (I’d say tagged with) travel and photography. And actually, you are really just interested in the work of National Geographic photographers. Your Semantic Book Shop goes off and finds content that has been classified according to these different parameters. Noting that these are all content of a similar type (they are all chapters), the shop assembles these on the fly into a larger object that is called a book, and presents this back to you.
In graph terms, what just happened was that you asked a massive content graph to find tailored content for you that was about travel, and about photography, and featured people who are National Geographic photographers. For convenience, the answers to your query were assembled as chapters into a custom delivery channel called a book.
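As a rough illustration of the Semantic Book Shop query (a sketch, not something from Ian's talk), the request can be read as a filter over tagged chapters followed by assembly into a book. The `Chapter` class, the `assemble_book` function, the chapter titles and the tags are all hypothetical, and a real content graph would sit in a graph store rather than a Python list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chapter:
    """A granular content object. All titles and tags here are made up."""
    title: str
    about: frozenset                    # subjects the chapter is tagged with
    features: frozenset = frozenset()   # groups of people the chapter features


CHAPTERS = [
    Chapter("Shooting the Sahara", frozenset({"travel", "photography"}),
            frozenset({"National Geographic photographers"})),
    Chapter("A Week in Lisbon", frozenset({"travel"})),
    Chapter("Studio Lighting Basics", frozenset({"photography"})),
    Chapter("Arctic Light", frozenset({"travel", "photography"}),
            frozenset({"National Geographic photographers"})),
    Chapter("Street Portraits in Hanoi", frozenset({"travel", "photography"})),
]


def assemble_book(chapters, about, features):
    """Return titles of chapters that match every requested subject and feature."""
    wanted_about, wanted_features = set(about), set(features)
    return [
        c.title for c in chapters
        if wanted_about <= c.about and wanted_features <= c.features
    ]


book = assemble_book(
    CHAPTERS,
    about={"travel", "photography"},
    features={"National Geographic photographers"},
)
print(book)
```

The "book" is only the delivery format: the same chapters could just as easily be assembled into a web page or a newsletter from the same query.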
Sounds like a great Semantic Web book shop. And where does URI come into play? How is it helping us make the information spaces we live and work in better knowledge repositories?
URIs are absolutely crucial components of graphs. A Uniform Resource Identifier is a globally unique, immutable identifier for an object (or a thing). The structure of a URI will usually also let you know where to find the object, in a similar way to a URL. Having a predictable structure, a URI is also more practically useful than alternative methods of “unique” identification. Once you have assigned a URI to an object then you can unambiguously identify it as distinct from all other objects. The URI helps to establish an object as a single source of truth in an information space.
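To make the idea of a predictable structure a little more concrete, here is a minimal sketch of minting URIs and using them to recognise the same thing across two systems. The `https://example.org/id` namespace, the `mint_uri` helper and the two records are assumptions made up for the illustration; they do not describe any particular tool.

```python
from urllib.parse import quote, urlparse

BASE = "https://example.org/id"  # hypothetical namespace for this sketch


def mint_uri(kind: str, local_id: str) -> str:
    """Build a URI with a predictable structure: base / kind / identifier."""
    return f"{BASE}/{quote(kind, safe='')}/{quote(local_id, safe='')}"


# Two systems label the same thing differently but share one URI.
cms_record = {"uri": mint_uri("concept", "savings-account"), "label": "Savings Account"}
crm_record = {"uri": mint_uri("concept", "savings-account"), "label": "Savings a/c"}

# Identity is decided by the URI, never by the label.
same_thing = cms_record["uri"] == crm_record["uri"]

# The predictable structure can be read back out of the URI.
path_parts = urlparse(cms_record["uri"]).path.strip("/").split("/")
print(same_thing, path_parts)
```

The labels differ, yet both records resolve to one identifier, which is what lets the URI act as the single source of truth.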
It looks like the content graph is the step after the taxonomy and before the ontology, am I on the right track?
I don’t think that is the complete picture. There are three things here (and a fourth that needs to be here), and I’ll talk about each separately.
First, an ontology. This is a model for a type of knowledge. Imagine that you want to describe a person in terms of an ontology. A person will have a variety of data properties – that is, links to a specific piece of data – like a first name, a last name, a date of birth. The ontology defines those data properties using semantic links like
Person -> hasFirstName -> [the actual name]
They will also have links to other objects like other people, again using meaningful verbs.
Person <-> knows <-> Person
Person -> hasParent -> Person
These relations are also defined in the ontology. So in summary an ontology is a model or a design or a template for an area of knowledge.
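The Person example can be sketched in a few lines of plain Python, assuming a deliberately simplified ontology: each property names the class it applies to and either the class it points at or the word "literal" for a data property. The `ex:` identifiers and the `conforms` function are hypothetical, and real ontologies are normally written in RDFS or OWL rather than as a dictionary.

```python
# Ontology: the design. Each property maps to (domain, range).
# "literal" marks a data property; a class name marks a link to another thing.
ONTOLOGY = {
    "hasFirstName": ("Person", "literal"),
    "hasLastName": ("Person", "literal"),
    "knows": ("Person", "Person"),
    "hasParent": ("Person", "Person"),
}

# Instance data: the class of each known thing (identifiers are made up).
TYPES = {"ex:alice": "Person", "ex:bob": "Person"}


def conforms(triple):
    """Check one (subject, predicate, object) triple against the ontology."""
    subject, predicate, obj = triple
    if predicate not in ONTOLOGY:
        return False  # the ontology does not define this relation
    domain, range_ = ONTOLOGY[predicate]
    if TYPES.get(subject) != domain:
        return False
    if range_ == "literal":
        return obj not in TYPES  # a plain value, not another thing
    return TYPES.get(obj) == range_


triples = [
    ("ex:alice", "hasFirstName", "Alice"),
    ("ex:alice", "knows", "ex:bob"),
    ("ex:bob", "hasParent", "ex:alice"),
    ("ex:alice", "hasEmployer", "ex:acme"),  # not defined in the ontology
]
results = [conforms(t) for t in triples]
print(results)
```

The last triple is rejected because `hasEmployer` is not part of the design, which is the sense in which an ontology acts as a template.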
Coming closer to home, a taxonomy is a collection of real things that conform to an ontology. I usually create taxonomies based on an ontology called Simple Knowledge Organisation System or SKOS. This ontology defines what things will go into a taxonomy; it will have concepts and concept schemes; a concept will have a URI, at least one preferred label, broader and narrower links to indicate a hierarchy, and so on. A very attractive aspect of SKOS is that it can also bring in other ontologies to expand its capabilities. So if you have an information model in your enterprise you can apply it to your taxonomy too.
It’s important to be clear that while an ontology (such as SKOS) is a design for knowledge, a SKOS taxonomy contains the real examples (sometimes called instance data) that conform to that design.
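A small sketch of what SKOS instance data might look like, using the book shop genres as the taxonomy. The SKOS namespace is the published one; the `https://example.org/genres/` namespace, the concepts and the two helper functions are assumptions for illustration only.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"  # the published SKOS namespace
EX = "https://example.org/genres/"             # hypothetical namespace

# Instance data: every concept has a URI, a preferred label and,
# below the top level, a broader link that places it in the hierarchy.
TRIPLES = [
    (EX + "photography", SKOS + "prefLabel", "Photography"),
    (EX + "travel", SKOS + "prefLabel", "Travel"),
    (EX + "travel-photography", SKOS + "prefLabel", "Travel photography"),
    (EX + "travel-photography", SKOS + "broader", EX + "photography"),
    (EX + "wildlife-photography", SKOS + "prefLabel", "Wildlife photography"),
    (EX + "wildlife-photography", SKOS + "broader", EX + "photography"),
]


def pref_label(concept):
    """Return the preferred label recorded for a concept URI."""
    return next(o for s, p, o in TRIPLES if s == concept and p == SKOS + "prefLabel")


def narrower(concept):
    """Concepts one level below, read off the broader links in reverse."""
    return sorted(s for s, p, o in TRIPLES if p == SKOS + "broader" and o == concept)


labels = [pref_label(c) for c in narrower(EX + "photography")]
print(labels)
```

SKOS supplies the vocabulary (`prefLabel`, `broader`); the genre concepts are the real examples that conform to it.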
At this point, I need to insert the fourth thing: the content object. Just as a taxonomy concept is a piece of information that conforms to the SKOS ontology, so a content object is a piece of information that conforms to your information model (which is an ontology) and is capable of linking to other content objects and taxonomy concepts using the model.
Which brings us finally to the content graph. A content graph consists of content objects conforming to your information model, and taxonomy concepts that conform to SKOS (at the minimum) and probably also conform to aspects of your information model.
Once we have a content graph defined in this way, we can link content objects to taxonomy concepts and also to other content objects, which may in turn be linked to further taxonomy concepts and content objects. This leads to a network of transitively related things.
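One way to picture the network of transitively related things is a breadth-first walk over the links. This is a minimal sketch with made-up identifiers and relation names (`about`, `relatedTo`); it follows links only in their stated direction and ignores what each relation means, which a real graph query would not.

```python
from collections import deque

# (subject, relation, object) links between content objects and concepts.
EDGES = [
    ("content:article-1", "about", "concept:savings-account"),
    ("content:article-1", "relatedTo", "content:faq-7"),
    ("content:faq-7", "about", "concept:interest-rates"),
    ("content:video-3", "about", "concept:mortgages"),
]


def reachable(start, edges):
    """Return everything reachable from start by following links outward."""
    neighbours = {}
    for subject, _relation, obj in edges:
        neighbours.setdefault(subject, []).append(obj)

    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbours.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    seen.discard(start)  # report only the related things, not the start itself
    return seen


related = reachable("content:article-1", EDGES)
print(sorted(related))
```

The article was never tagged with interest rates, yet it reaches that concept through the FAQ it links to.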
What can you do with an ontology that you can’t with a taxonomy?
You can design any kind of information using an ontology. You can then create instance data for that ontology. For example, if you had an ontology of banking information, that might include classes or types of information like Product, Service, Customer and Colleague. Each of these represents a facet of a bank’s knowledge domain. A Bank Account might be a sub-class of a Product, and you could create relations (such as hasAccount) between a Customer and a Bank Account. A Bank Account might also have data properties, such as an account number, a balance and so on.
You might be thinking, well, why can’t we create a banking taxonomy? Of course, you can do that, and have hierarchies representing products, services, customers and so on. And you could link content from your website to that taxonomy so that you know an article is about bank accounts. But because SKOS taxonomies store mainly concepts, there is no semantic information in the taxonomy – unless you import the ontology information and assign it to your taxonomy. Once you’ve done that, you have a hierarchy of things with names like Current Account, Savings Account, Investment Account and so on, but in addition to being taxonomy concepts they are also instances of the Bank Account class from your ontology. That means that you can remove any ambiguity about the link between the content and the taxonomy.
To cut a long story short, taxonomies contain information that conforms to the SKOS ontology, but can also hold information conforming to other ontologies. Together they provide a much richer collection of structured, specific and unambiguous information about the things that are important in the knowledge domain.
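The banking example can be sketched as concepts that carry two kinds of type at once: they are SKOS concepts and also instances of a class from the ontology. Everything below (the `ex:` identifiers, the article names, the `is_a` and `articles_about` helpers) is hypothetical and only meant to show why the extra typing helps.

```python
# Class hierarchy taken from the (hypothetical) banking ontology.
SUBCLASS_OF = {"BankAccount": "Product"}

# Taxonomy concepts that are also instances of ontology classes.
CONCEPTS = {
    "ex:current-account": {"label": "Current Account",
                           "types": {"skos:Concept", "BankAccount"}},
    "ex:savings-account": {"label": "Savings Account",
                           "types": {"skos:Concept", "BankAccount"}},
    "ex:financial-advice": {"label": "Financial Advice",
                            "types": {"skos:Concept", "Service"}},
}

# Website content tagged with taxonomy concepts.
TAGS = {
    "article-12": {"ex:savings-account"},
    "article-15": {"ex:financial-advice"},
    "article-20": {"ex:current-account", "ex:financial-advice"},
}


def is_a(concept_id, cls):
    """True if the concept is an instance of cls or of one of its sub-classes."""
    for t in CONCEPTS[concept_id]["types"]:
        while t is not None:
            if t == cls:
                return True
            t = SUBCLASS_OF.get(t)  # climb the class hierarchy
    return False


def articles_about(cls):
    """Articles tagged with at least one concept belonging to the class."""
    return sorted(a for a, tags in TAGS.items() if any(is_a(c, cls) for c in tags))


print(articles_about("Product"))
print(articles_about("Service"))
```

With labels alone, a question such as "which articles are about any product?" cannot be answered; once the concepts are typed, it becomes a simple lookup.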
And where do ontologies fit in your model?
Many organisations start out intending just to use a taxonomy to tag content. Most of these organisations quickly reach a point where they want to store and explore richer networks of information than is possible with just the content tagged to the concept. This is the point at which they would typically start to design an information model (ontology) to represent the real things they want to keep track of. That’s the first step towards building a content graph.
What are the organizational change challenges for building a content graph?
The biggest single challenge is the fact that the Semantic Web is a disruptive technology. Taking it on wholesale can be seen as tantamount to agreeing that search doesn’t work, super-silos don’t work, monolithic content architecture doesn’t work and relational databases don’t work. Of course I don’t take all of those views, and I don’t think a move to semantic technologies needs to involve throwing out all of those current approaches. Nor does any of this need to be done all in one go. In my recent talk at World IA Day 2021, I set out a stepwise, incremental approach to building content graphs that can work alongside existing processes.
Another common challenge is the prevalence of the “we already have a better way to do that” attitude. The problem with many people who are invested in the relational data world is that, like anyone armed with a hammer, they see everything as a nail. They see an ontology, or a taxonomy, and it looks just like another data environment to be solved with Oracle or SQL Server. I often have to deal with this situation with new clients, so I’ve learned to have a pocket full of arguments as to why, and in which situations, graph technologies deliver better results than traditional relational systems.
What do you think are the practical steps of linking a data model to a content model – and isn’t that a false dichotomy at the end of the day?
[Since everything is really deeply intertwingled and, as the Web and our knowledge and culture practices evolve, we are starting to create content with data in mind.]
I tend to see data and content (and, for that matter, knowledge and information) as a continuum, not as distinct separate things. It may seem simplistic, but I prefer to think of everything as “stuff”. Whatever stuff is, and no matter where it is stored, the nature of developing a graph, to me, is about providing that stuff with as much valuable and semantic metadata as possible, and looking for opportunities to link stuff with other stuff using meaningful relationships. Not only do we get intertwingled stuff as a result, it’s also meaningfully intertwingled.
The main practical steps that I normally advise are:
- Understand your content better by building an information model.
- Implement a taxonomy management system, and use it to classify your content.
- Enforce separation of concerns in your information storage; narrative content in your CMS, taxonomy concepts in your taxonomy management system, project information and people information in their respective systems.
- Introduce semantic middleware components to mediate between your different content and information systems and your taxonomies.
- Use that mediation to build your graph.
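The steps above can be sketched very roughly as follows. Each dictionary stands in for a separate system (CMS, taxonomy management system, people directory), and `build_graph` plays the part of the middleware. All of the names, prefixes and records are assumptions made up for the illustration; a real implementation would call each system's API rather than read in-memory dictionaries.

```python
# Each store stays where it is, managed by the people who know it best.
CMS = {"article-12": {"title": "Choosing a savings account", "author": "person-3"}}
TAXONOMY = {"concept-savings": {"prefLabel": "Savings Account"}}
PEOPLE = {"person-3": {"name": "A. Writer"}}

# Classification produced by tagging content against the taxonomy.
CLASSIFICATION = {"article-12": ["concept-savings"]}


def build_graph():
    """Mediate between the separate stores and emit graph triples."""
    triples = []
    for content_id, item in CMS.items():
        subject = f"cms:{content_id}"
        triples.append((subject, "hasTitle", item["title"]))
        # Link only to things that really exist in the other systems.
        if item["author"] in PEOPLE:
            triples.append((subject, "hasAuthor", f"people:{item['author']}"))
        for concept in CLASSIFICATION.get(content_id, []):
            if concept in TAXONOMY:
                triples.append((subject, "about", f"tax:{concept}"))
    return triples


graph = build_graph()
for triple in graph:
    print(triple)
```

Nothing is copied into a central repository here: the graph holds only identifiers and relationships, while the content itself stays in its source system.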
Ian Piper’s Quick Favourites
Favourite thing you dug into during your PhD research
Doing a PhD is a chance, for a brief spell, to become the world expert in just one thing. In my case, a rather obscure metabolic pathway that converted one type of non-protein amino acid into another type of non-protein amino acid. This was not “Structure Of DNA” territory, for sure, but I enjoyed investing my time in discovering that single thing; scientific progress is made up of thousands of similar small incremental discoveries.
Favourite paradox in knowledge management
Everyone talks about it, everyone’s an expert in it, yet almost no-one practices it.
Favourite graph element
Sorry, that’s an impossible one to answer. A graph has subjects, objects, predicates, object and data properties, and all are irreplaceable parts in making it all work. I can’t play favourites with that!
Who’s gonna interview the interviewer?
Teodora: And now, Ian, I would be grateful if you could spare several minutes to ask me a question or two. :)
Ian: Why these dialogues, and what made you get started?
Teodora: Thanks for this question! It prompts me to go back six years. The Dialogues series is what I felt was the way forward when it comes to sensing meaning. That meaning that cannot be formalized, the one that we run after and only in running after internalize. Little did I know back then that dialogues would become such an important topic for all my enquiries, or that dialogue had already been a path for researchers in the field of organisational, personal and public communication. I am referring to the wonderful book Dialogue: Theorizing Difference in Communication Studies.
Here’s a beautiful quote from it that very much summarizes what I am trying to do with these Dialogues:
“The single adequate form for verbally expressing authentic human life is the open-ended dialogue. Life by its very nature is dialogic. To live means to participate in dialogue.” cit. Mikhail Bakhtin (1984)
Instead of an Epilogue: Advice on how to navigate our networked, unabridged world
One last question, Ian. As the Web threatened the orderly hierarchical world (ref. TBL’s Hypertext and our collective destiny), how do you think we as individuals and teams can best learn to navigate this networked, unabridged world?
- Keep an open mind.
- Look for the connections between this and that, and capture the meanings in those connections.
- Don’t hesitate to follow the trail across your knowledge spaces.
- Move beyond the search box; embrace new ideas on content discovery.
- Celebrate serendipitous discovery; it’s human nature to look past the book you’ve found on the shelf to other, nearby books, and you can do exactly the same thing using graphs.
- And finally, remember that knowledge networks are never finished; they are organic, developing things that need to change to reflect the changing nature of your business.
Thanks for reading!