Think cheeseburger. Now think rhythm. Now listen: “Cheeseburger, hot-dog, cheeseburger, hot-dog.” This is how I once heard Prof. Milcho Leviev explain how he introduces people to the asymmetrical rhythms of Bulgarian folk music. [Check Maestro Leviev’s Bulgarian Boogie]
Fast forward to these days, when reading and thinking about knowledge, meaning and life brings me as much pleasure as a jazz festival would (I am serious!). I was privileged to meet Bob Kasenchak, who also straddles the domains of food and music in a different, yet equally inspiring way.
Bob Kasenchak is the Senior Manager for Client Solutions at Synaptica. He has been studying and teaching music for a decade and also designing and developing information projects at a leading taxonomy for 8 years after giving up on academia. Bob’s research interests include linked data and graph databases and his seemingly non-work related engagements with forms and structures include wine, tea and Javanese music.
Bob is also the guy with the cool Twitter avatar who thinks I am on an exciting way of seeing the Semantic Web through an intertextual eye :)
So, meet Bob. And enjoy the semiotic, I would call it, ride!
Bob, with all the questions about music, taxonomies, ontologies and the meaning of life (and sandwiches), I can’t help but set the stage of this conversation in the following way:
Tell us about your Twitter avatar!
Thanks Teodora! And thanks for having me.
The Twitter avatar…ah, well. So a couple of things came together there: one of my hobbies is pipes, and I have sort of a thing about rabbits. My mother-in-law found this delightful ceramic statue of the pipe-smoking rabbit at a yard sale, and I thought it made a great picture. I have considered making him a tiny mask but haven’t done it yet.
And now, can you also tell me what the common thread is that runs through the life of a philosopher, a music theorist, a wine geek and a taxonomist?
I suppose it’s information towards what Plato calls “greater understanding”. One of the bits about wine geekery is that it’s not just about the sensation of wine; there’s an entire world of vocabularies of places, regions, grapes, climates, production methods, and producers that you engage with during the process. Like in music school, there’s a component of blind tasting (analogous to music identification) that requires you to engage with and process sensations into information. And engaging with that world of information broadens the pleasure of the wine. I’m also working on being a tea geek.
Music, also, is information. As with wine (I argue), understanding more about music deepens the appreciation and understanding of it. Wine is tasty and music is pretty, sure. But engaging with the structures and implications is an even greater pleasure. For me, at least; I recognize that analytical approaches are not the only way to enjoy things.
So engaging with forms and structures as information encoded in music (and how they affect our perception of the music) is analytical and can even be data-driven. My main music theory mentor was into the intersection of mathematics and music; I used to make a lot of graphs, which other music students did not really relate to! Other information encoded in music is pretty subjective: it’s difficult to engage with music outside of your cultural understanding because the tropes and topics and symbols are subjective, broadly speaking. Music that evokes, say, marching or a martial atmosphere is predicated on understanding (perhaps unconsciously) how Western music has represented these things, so we know it when we hear it.
Engaging with music of other cultures (I’ve been playing Javanese gamelan music for a couple of years now) is difficult; we don’t even have common scales or modes or pitches, so the evocation of mood and feeling we’re used to apprehending (think major and minor, for example) is completely different and inaccessible, at least without a lot of listening and training and reading. Javanese music has its own set of moods (and even times of day!) associated with certain scales and modes, for example, that I struggle to internalize. Of course, certain aspects of the structure of Javanese music parallel some Western ideas; others are completely foreign.
So I guess when I fell into the information industry about a decade ago (after giving up on academia) it was sort of a natural fit. Humans are category-seeking creatures, so the opportunity to think deeply about what it means to create and use (not to mention agree on) categories is extremely interesting to me. How far can information theory and metadata and representation in triples take us towards knowledge? Or even greater understanding?
And also, as a taxonomist, what do you do? Is it hard, at a party, to answer “What do you do?” with “I am a taxonomist”?
Yes, well. I have my elevator or party pitch about what I do prepared by this point; it mostly has to do with explaining taxonomy via the ways people interact with metadata and categories electronically. Everyone understands hashtags on social media, so the idea of controlling tags for better retrieval is pretty easy to explain. And everyone who’s ever browsed a shopping site (or a content site, for that matter) has engaged with taxonomy without realizing it.
So I guess I’d say I build information structures to organize content, products, or whatever other electronic information needs organization. Taxonomies are everywhere online, and many of them are poorly conceived or deployed. As the amount of information available continues to balloon we need better ways to organize it so people can find things.
Many of the taxonomies I’ve built were for scholarly publishing clients with a lot of, say, journal articles and other content. These collections range from the tens of thousands to millions of objects, and free text search just doesn’t cut it when your entire business model is predicated on charging people for access to your content. If you have 900,000 articles on physics and I do a free-text search for “mercury” what am I likely to come up with? Articles about silvery metallic elements, articles about a planet, articles perhaps about cars or Roman gods, and you have perhaps tens of thousands of results for me to look through. But I’m an astronomer (or whatever) and I’m doing research and I just want articles about the planet Mercury; I’m certainly not going to browse thousands of results. Using taxonomy to enforce controlled subject metadata is the first step in solving that problem.
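To make the “mercury” example concrete, here is a minimal sketch in Python. The three article titles and the subject labels are invented for illustration; a real collection would be far larger and the tagging would come from a maintained taxonomy.

```python
# Hypothetical mini-collection: titles and subject tags are invented for illustration.
ARTICLES = [
    {"title": "Mercury's orbit and perihelion precession", "subjects": {"Mercury (planet)"}},
    {"title": "Vapor pressure of liquid mercury", "subjects": {"Mercury (element)"}},
    {"title": "Transit timing of the innermost planet", "subjects": {"Mercury (planet)"}},
]


def free_text_search(articles, query):
    """Return titles that contain the query string, ignoring case."""
    q = query.lower()
    return [a["title"] for a in articles if q in a["title"].lower()]


def subject_search(articles, subject):
    """Return titles tagged with exactly this controlled subject term."""
    return [a["title"] for a in articles if subject in a["subjects"]]


free_hits = free_text_search(ARTICLES, "mercury")
planet_hits = subject_search(ARTICLES, "Mercury (planet)")
```

In this toy data the free-text search returns the article about the metal and misses the third article, which never uses the word at all, while the controlled subject search returns only the two articles about the planet.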
Please, connect the dots for me: taxonomy, content strategy, data model -> Semantic Web technologies. (Yes, I know, it is the million-dollar question.)
I think that, when talking about content, the data model is the empty structure into which the content — and metadata about the content — is put and stored and shared. Some elements of the model are to structure the content: title, abstract, paragraphs, references, or whatever. The metadata elements hold information about the content: author, date, topics, publication title (or whatever is applicable). Some metadata elements need to be controlled: this can be as simple as an authority file of country names (so everyone is using the same label for USA or U.S.A or United States or whatever) and, if done properly, controlled lists of subjects or topics. Any metadata element that requires control can be informed by some kind of controlled vocabulary, like a taxonomy. Content strategy, I think, encompasses or at least overlaps all of these things, but with a broader view that includes creation, distribution, and re-use.
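As a rough illustration of that split between content elements, metadata elements and controlled values, here is a small Python sketch. The `Article` class, the `control_country` helper and the authority file entries are all hypothetical names chosen for this example, not part of any particular system.

```python
from dataclasses import dataclass, field

# Hypothetical authority file: every known variant maps to one preferred label.
COUNTRY_AUTHORITY = {
    "usa": "United States",
    "u.s.a": "United States",
    "u.s.a.": "United States",
    "united states": "United States",
}


def control_country(raw_label):
    """Return the preferred label for a raw value, or None if it is not in the authority file."""
    return COUNTRY_AUTHORITY.get(raw_label.strip().lower())


@dataclass
class Article:
    # Content elements: they structure the content itself.
    title: str
    abstract: str
    # Metadata elements: they hold information about the content.
    author: str
    country: str  # controlled by the authority file above
    topics: list = field(default_factory=list)  # controlled by a taxonomy


article = Article(
    title="An example title",
    abstract="An example abstract.",
    author="A. Author",
    country=control_country(" U.S.A "),
    topics=["stomach cancer"],
)
```

Whatever variant an editor types, the stored value is the single preferred label, and anything not in the authority file comes back as `None` so it can be reviewed rather than silently accepted.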
Bringing in semantic web technologies: let’s say I have a document about a topic, perhaps an article on stomach cancer. I can tag it with the term “stomach cancer” from my taxonomy and deploy it for search on my website; that works fine. However, the next step towards the semantic web is giving that concept a link to a Linked Data resource about the same topic: a DBpedia or Wikidata page, for example, and equating my term to this using a URI. Now I’m asserting that the concept in my local taxonomy “stomach cancer” is the same thing as the concept represented by the URI to the Linked Data source, which in turn has other information (like a definition, pictures, references to Wiki pages, other places that publish on this topic, and other vocabularies that assert the same connection) which can then be extracted and included to enrich my content. It’s the same basic concept as including hyperlinks in text; it makes explicit the web of semantic connection between far-flung content: words.
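A minimal sketch of that equating step, in plain Python with no network access. The `skos:exactMatch` property URI is the real one from the SKOS vocabulary; both `example.org` URIs are placeholders (the remote one stands in for a DBpedia or Wikidata URI), and `REMOTE_SOURCE` is a stand-in dictionary for whatever the Linked Data source would actually return.

```python
# The SKOS mapping property used to assert "these two concepts are the same".
SKOS_EXACT_MATCH = "http://www.w3.org/2004/02/skos/core#exactMatch"

# Placeholder URIs: the remote one stands in for a DBpedia or Wikidata URI.
LOCAL_TERM = "http://example.org/taxonomy/stomach-cancer"
REMOTE_CONCEPT = "http://example.org/linked-data/stomach-cancer"

# One assertion, written as a (subject, predicate, object) triple.
TRIPLES = [(LOCAL_TERM, SKOS_EXACT_MATCH, REMOTE_CONCEPT)]

# Stand-in for the data the remote source publishes about its concept.
REMOTE_SOURCE = {
    REMOTE_CONCEPT: {
        "definition": "(definition published by the remote source)",
        "see_also": ["(related page published by the remote source)"],
    },
}


def enrich(term_uri, triples, remote_source):
    """Collect remote information for every concept the local term is equated to."""
    enrichment = {}
    for subject, predicate, obj in triples:
        if subject == term_uri and predicate == SKOS_EXACT_MATCH:
            enrichment.update(remote_source.get(obj, {}))
    return enrichment


extra = enrich(LOCAL_TERM, TRIPLES, REMOTE_SOURCE)
```

The local term itself stays untouched; the single triple is what lets a program follow the link and pull the definition and related pages alongside the content.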
And why is there a taxonomy vs. ontology debate? What is this about?
Well, there is a lot of confusion in this area. The sort of argumentative view is that ontologies will replace taxonomies, or are the next step, or are taxonomies with benefits, or something like that which really misses the point of ontology. And (ironically, in a discipline all about controlling vocabulary) “ontology” is used to mean quite a number of things. I think that taxonomies are good for some information organization tasks and ontologies for others. In some cases, an ontology would be overkill, and in others a taxonomy is insufficient.
But there’s a lot of overlap.
If you have a bunch of content you want to tag for retrieval about, say, medicine, maybe you have a handful of taxonomies about drugs, diseases, treatments, demographic information, and so on, and you tag content from each of these. But if you want to build a structure that describes which drugs are used to treat which diseases and what treatments are used and so on, we’re talking about an ontological structure.
Does this have to do with your question “Is a hot dog actually a sandwich?” Please answer that, keeping in mind that our readers have never dipped their toe into information architecture, the Semantic Web and the science and art of categorizing things.
[Dear reader, some time ago Bob sent me a wonderful deck of slides, which today he made public for us to enjoy.]
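One way to picture the difference, sketched in Python with placeholder names (“Drug A”, “Disease X” and the `treats` relation are invented and carry no medical meaning): the taxonomies only say what kind of thing each term is, while the ontological layer adds typed relations between terms from different taxonomies.

```python
# Two separate taxonomies: each term simply points to its broader term.
DRUG_TAXONOMY = {"Drug A": "Drugs", "Drug B": "Drugs"}
DISEASE_TAXONOMY = {"Disease X": "Diseases", "Disease Y": "Diseases"}

# Ontological layer: typed relations that connect terms across the taxonomies.
RELATIONS = [
    ("Drug A", "treats", "Disease X"),
    ("Drug B", "treats", "Disease X"),
    ("Drug B", "treats", "Disease Y"),
]


def drugs_for(disease, relations):
    """Answer a question the taxonomies alone cannot: which drugs treat this disease?"""
    return sorted(s for s, p, o in relations if p == "treats" and o == disease)


treats_x = drugs_for("Disease X", RELATIONS)
treats_y = drugs_for("Disease Y", RELATIONS)
```

Tagging content only needs the two dictionaries; the question “what treats Disease X?” needs the relations.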
Taxonomies and categories and questions about how humans arrange categories are everywhere, and I thought that when this meme was going around a couple of years ago that it was a great way to engage people with questions that taxonomists wrestle with. How can we decide this? What defines a sandwich?
And, very crucially, who gets to decide? While this may seem trivial, as hot dogs are not very important, really, it becomes important when you have to make other, essentially ethical, taxonomy decisions. Who gets to say how many genders are listed on your web form that I have to fill out? Can I register for your service if I’m from a country called “Palestine” or do you not think that’s a country? So I used the hot dog-sandwich debate to frame a discussion about the ethics of naming things.
But things can get concrete pretty quickly: it’s one thing to decide where to put a hot dog on your menu (under sandwiches? Or not?) but if, let’s say, I’m collecting data about consumer behavior and I want to know about the popularity of sandwiches, it’s pretty important to know whether, in the data (ontologically speaking), a hot dog is considered to be in the class of objects we understand as sandwiches.
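Here is a small Python sketch of why that classification decision matters for the numbers. The order list and both class hierarchies are made up for the example; the only difference between the two hierarchies is where “hot dog” is placed.

```python
def is_a(item, target, parents):
    """Walk up the parent links and report whether item falls under target."""
    seen = set()
    while item is not None and item not in seen:
        if item == target:
            return True
        seen.add(item)
        item = parents.get(item)
    return False


def count_sandwiches(orders, parents):
    """Count the orders that belong to the class 'sandwich' under a given hierarchy."""
    return sum(is_a(order, "sandwich", parents) for order in orders)


# Invented consumer data.
ORDERS = ["hot dog", "club sandwich", "hot dog", "soup"]

# Two competing modeling decisions.
HOT_DOG_IS_A_SANDWICH = {
    "club sandwich": "sandwich", "hot dog": "sandwich", "sandwich": "food", "soup": "food",
}
HOT_DOG_IS_NOT_A_SANDWICH = {
    "club sandwich": "sandwich", "hot dog": "food", "sandwich": "food", "soup": "food",
}

with_hot_dogs = count_sandwiches(ORDERS, HOT_DOG_IS_A_SANDWICH)
without_hot_dogs = count_sandwiches(ORDERS, HOT_DOG_IS_NOT_A_SANDWICH)
```

Same orders, same question, and the popularity of sandwiches comes out as three in one model and one in the other.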
When did you first hear about the Semantic Web?
I was vaguely aware of this as “something Tim Berners-Lee talks about” until I got into the information industry full-time about 10 years ago. Even then, Semantic Web concepts were sort of a nice-to-have in taxonomies, but more recently I think it’s become more critical.
And next, a follow-up question: why don’t content strategists use Semantic Web technology, or am I maybe just not aware of such use? I only know about the brilliant and super friendly Mike Atherton and the case of the BBC. But what else?
Atherton and Carrie Hane and some other people in the space advocate for this, and we’re seeing it more in connected datasets in the research community. I think that, with everything else content strategists are juggling, the Semantic Web seems like an add-on with limited practical use from a business perspective. I do think it’s gaining traction from what I can observe, but it also may be that the types of information architects I associate with are hip to it.
Which leads me to Graphite. You are part of Synaptica, the team that built that product. How exactly will this help people build ontology-based knowledge organization systems and knowledge graphs? And why would anyone go to the trouble of building an ontology?
So this is going to get technical pretty fast, for which I apologize. And I also don’t mean this as a product pitch, although of course I’m happy to talk about our software.
Anyway, Graphite is designed as a user-friendly layer for building ontologies that doesn’t require writing SPARQL or SKOS or OWL and lets a user easily define and relate concepts. It also allows the construction and enforcement of constraints. A knowledge graph, I think, is an ontology that’s also connected to some data (could be content or other sources of information) using Semantic Web principles. Ontologies are complicated to construct and maintain, but they allow you to connect data from disparate sources for rich information environments.
So let’s say, again, that you’re a publisher and you have 900,000 articles on Physics across 100 years and 26 journals. We can build a taxonomy to tag the content with topics so it’s easier to retrieve in search, as I described earlier. Next, if we extract the structured metadata from the data model — authors and the institutions they’re affiliated with, journal and issue information, publication date, topics from the taxonomy, references — and express these relationships in an ontology, now I have a connected network of information about the content that I can use. For example, maybe I want to construct a network of papers that cite other papers, or authors that cite other authors. Or maybe I want to see which journals publish on which topics, so that users reading some journal content might get recommendations for other articles they’d like. You can do analytics across your content (well, the data about your content), which is extremely useful. Now, let’s say I connect each topic in my taxonomy to some linked data source: each of my perhaps 10,000 topics is connected to the corresponding Wikipedia page, or something like that. Now I can offer information about the topic alongside my content, making the experience richer and more useful: that’s an example of a knowledge graph. This is essentially the model the BBC used that you refer to in your question above.
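The “authors that cite other authors” idea can be sketched in a few lines of Python once the metadata is expressed as relationships. The paper and author identifiers and the predicate names `hasAuthor` and `cites` are invented for this example; a real system would use URIs and a proper graph store.

```python
# Extracted metadata expressed as (subject, predicate, object) triples.
TRIPLES = [
    ("paper:1", "hasAuthor", "author:ada"),
    ("paper:2", "hasAuthor", "author:ben"),
    ("paper:3", "hasAuthor", "author:ada"),
    ("paper:2", "cites", "paper:1"),
    ("paper:3", "cites", "paper:2"),
]


def objects(triples, subject, predicate):
    """Return every object linked to the subject by the given predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def author_citations(triples):
    """Derive (citing author, cited author) pairs from paper-level citations."""
    pairs = set()
    for citing_paper, predicate, cited_paper in triples:
        if predicate != "cites":
            continue
        for citing_author in objects(triples, citing_paper, "hasAuthor"):
            for cited_author in objects(triples, cited_paper, "hasAuthor"):
                pairs.add((citing_author, cited_author))
    return pairs


network = author_citations(TRIPLES)
```

Nobody recorded “Ben cites Ada” directly; it falls out of connecting two kinds of metadata that were already in the data model.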
Other places that companies find it’s worth the effort to build ontologies are in medicine, finance (for things like fraud detection), and enterprise data.
To loop back, how does music theory relate to the practice of building knowledge graphs?
I think that if we’re not careful about how we structure things (and how we analyze those structures), be they graphs or gavottes, we’re still going to end up producing meaning, but we won’t have any control over it.
How is that? It feels like a bit of a stretch.
Favorite thing you dug up during your PhD research
My first graduate seminar at the University of Texas was on Critical Theory and Music. At the conservatory where I did my Master’s degree we mostly studied, well, music. And so I had to read and engage with a bunch of authors and ideas of which I had only limited experience: Foucault, Barthes, Derrida, Said, Žižek, Adorno, and others. I loved it.
Favorite paradox in information architecture
The paradox of The User. It’s so hard to design a system (of any kind) and anticipate how people will use it. User research and testing are a huge part of information architecture, and still time and time again we see systems in the wild (I’m thinking of credit card payment systems at checkout, for example) with such disregard for the user experience that it’s baffling. Trying to understand user needs, behavior, and intent and resolve those with design and information infrastructures is incredibly difficult.
Perhaps more broadly: since meaning is relative (or at least extremely subjective) and intent is at least as problematic, having to codify and fix meaning and intent is massively paradoxical.
Music is certainly my favorite ambiguity; language is a close second.
Who’s gonna interview the interviewer?
Bob, now it’s your turn to ask me questions :)
[Bob] I promise not to ask about your dissertation; fortunately, you’ve already published a book.
[Teodora] :) Thank you for having mercy! By the way the book doesn’t help :) The good news is while I struggle with academic writing I see why people struggle with web writing, but that is another topic, sorry book :P
[Bob] I’m wondering what Semantic Web tools you see emerging for authoring content? How can we bring this world that you so richly describe and endorse to a larger audience? And, if more people start to engage with it: how can we keep the whole thing from exploding into an information mess?
[Teodora] Let me start with the non-trivial thesis: our job is not to keep information from exploding. Our job is to see the patterns in the bursts.
As for the tools and their use, you put it so well: there should be a way for people to engage with the Semantic Web without trying :) Honestly, I don’t see Semantic Web tools as helpers in authoring (an improvisational, wildly unconstrained process). I rather see semantic technologies as the means to add a further layer of machine-readable meaning. That is, we do the heavy lifting of synthesis; Semantic Web tools do the heavy lifting of analysis. I still haven’t quantum-leaped from understanding the Semantic Web as “small pieces loosely joined” to a system which joins the pieces for you beforehand. :) But maybe I am, paradoxically, too rigid in my understanding of authoring and creation. Maybe I should be more open and say that a tool like Roam (h/t to Ivo Velichkov who told me about it) can help the process of conceiving, curating and creating tapestries of webby pieces and truly linking words and concepts.
Thanks for the questions, Bob! And now, last, instead of an Epilogue, two things:
As a word nerd to another word nerd: what do you think is the most intricate metamorphosis the Web brought to the written word?
I’m thinking here about Barthes and his notion of reading for pleasure as “skimming” over the surface of the text. It seems to me that hyper-rich text — that is, text with robust hyperlinking — allows a new kind of skimming from text to text and source to source. It’s easy and pleasurable to begin reading something, follow a link, browse around, and wind your way through linked texts and resources. This experience is extraordinarily different from getting up to find another book you were reminded of while reading, or going to the bookstore (which I still love) to find another book on the topic, or looking something up in a dictionary. We are now used to an intertextual world, as I think you might say, that provides endless and endlessly different ways to engage with text.
I do a lot of reading on my little tablet and a few weeks ago I was reading an actual, physical book in bed, and I found that I was frustrated that when I encountered a new word I couldn’t just hit a button to Google it. This of course struck me as very funny as well.
Having built so many interdisciplinary bridges, how do we build bridges across disciplines and work practices in the Web content world?
I think that the Semantic Web and related technologies are still seen as the domain of a very small subset of experts and expertise, and that even when engaging with it people are not aware of it; it’s very much under the surface doing the “web” thing. Some other technologies work this way: most people have little idea how Search works, but they’re comfortable using it when presented in a simple way, and it works. When it doesn’t work is the only time they can really feel the friction of engaging with it; otherwise it’s fairly frictionless. Think about the Google Knowledge Graph: this is genius, you don’t have to do anything to access it besides just use Google.
So perhaps the answer is to create low-friction environments in which people can engage with the Semantic Web without trying. What if we had authoring tools that would provide curate-able hyperlinks for topics and add metadata automatically?
Edit [June 8, 2020]: The Weave Goes On :) Bob and I got a comment from Alan Morrison that is worth inserting here, as part of the topic of Semantic Web (knowledge graph) tooling for writing. So here:
Bob Kasenchak and Teodora Petkova—George Anadiotis and I have started a working group already to explore this kind of question. Doing due diligence currently to see how far tools such as http://www.semanlink.net/sl/home get us.
With that our Dialogue is over. But the search for meaningful, one day maybe automagical, connection on the Web isn’t! Find Bob around Twitter and let’s weave on together.