What is it that bridges content, data and knowledge? Could it be that taking a data-centric approach to managing a company’s data has little to do with tools and technologies and a lot more to do with mindset? What stops enterprises from walking the Semantic Web talk? And last but not least, isn’t machine language (broadly speaking) just another language one can learn in order to communicate with the systems one wants to exchange information with?
All of these seemingly loosely related questions I was able to ask Alan Morrison, a Senior Research Fellow for Emerging Tech at PricewaterhouseCoopers’ Advisory Services. Alan threaded them together with fine-grained answers drawn from more than 20 years of expertise in emerging tech research and analysis at PwC.
Alan has consulted on a variety of IT-related client engagements and has been a featured speaker at web and data conferences such as CDOvision, SemTechBiz and Enterprise Data World. Before coming to PwC, Alan covered the RF semiconductor industry as an analyst for Strategies Unlimited, now a unit of PennWell. His military experience included five years as a Navy intelligence analyst, aircrewman, and Russian linguist. [ref. https://www.pwc.com/us/en/contacts/a/alan-morrison.html]
From our Dialogue, I found out that Alan will be starting a solo knowledge graph consultancy in December and is currently working on a book on removing the obstacles to knowledge graph adoption with his friends and colleagues Mark Ouska and Brian Stein.
Meet Alan!
Alan, when did you first encounter the concept of the Semantic Web?
Teodora, I was part of a research team that did a lot of divergent + convergent research on various emerging information technology topics at PwC in the 2000s. At that point, we were publishing a quarterly called the Technology Forecast. In late 2008, for a Spring 2009 issue of the Forecast, we decided to scan for a novel solution to the nagging problems surrounding business intelligence.
The more we read about the semantic web in the divergent phase of our research, the more it seemed to be an answer to the main BI problem enterprises had–namely, how to make large scale integration of heterogeneous data possible to address the observational bias problem.
That problem, in other words, is the “drunk looking for his keys under the lamppost” problem. “Why are you looking under the lamppost, if you think your keys are over there in the dark?” he’s asked. “Because under the lamppost is where the light is,” the drunk answers.
We quoted Doug Lenat of Cycorp, who mentioned the observational bias/”looking for the keys under the lamppost” problem in our interview with him. The analogy isn’t original to Doug. Semantic graph integration makes it easier to bring more relevant data + description logic to bear on the analysis problem–the light can be in more places.
What this means in semantic web data terms is that you can join heterogeneous datasets easily, and bring different kinds of data together in ways that were difficult or impossible before. That’s powerful. Lexis-Nexis, a few years ago at an IEEE conference, talked about joining internal company data (in tables) with public property records (less structured text).
The client was a health insurance provider that had evidence of claims fraud around a hospital. People who weren’t obviously related to one another were causing accidents near the hospital and claiming injuries. Once Lexis-Nexis brought the property records into the integration, they discovered that the people causing the accidents and submitting the false claims lived quite near to each other, but had different family names. The property records, in other words, shed more light on who was causing the fraud and how they knew each other. This information wasn’t discoverable without heterogeneous integration.
For this integration effort, Lexis-Nexis used their own proprietary platform that I don’t know much about. But we do know that joins are much easier with the W3C semantic methods, even with disparate kinds of data. You’ve got a graph, which allows Tinker Toy-style integration, and you’ve got the ability to add context, and thus a bit of standardization and connectability, to each subgraph. That makes it easy to snap new pieces of the graph together with the core graph and keep it consistent.
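To make the snap-together idea concrete, here is a minimal sketch in Python using rdflib. The library choice, URIs, and data are all my own invention for illustration, not anything Lexis-Nexis used: two graphs from different sources merge by simple set union of triples, and a single SPARQL query then joins across both.

```python
# pip install rdflib
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, FOAF

CLAIMS = Namespace("http://example.org/claims/")
PROP = Namespace("http://example.org/property/")

# Internal claims data, expressed as triples.
internal = Graph()
claimant = CLAIMS["claimant/42"]
internal.add((claimant, RDF.type, CLAIMS.Claimant))
internal.add((claimant, FOAF.name, Literal("J. Doe")))

# Public property records: a separate dataset that reuses the same identifier.
external = Graph()
external.add((claimant, PROP.residesAt, Literal("12 Elm St")))

# Merging is just set union of triples: the snap-together step.
merged = internal + external

# One SPARQL query now joins across both sources.
query = """
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX prop: <http://example.org/property/>
SELECT ?name ?address WHERE {
    ?c foaf:name ?name ;
       prop:residesAt ?address .
}
"""
for row in merged.query(query):
    print(row.name, "lives at", row.address)
```

The point is that neither dataset needed to know about the other’s schema in advance; sharing an identifier (or linking two identifiers) is enough to make the join possible.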
For the convergent phase of our research, we interviewed quite a few people who were piloting semantic web techniques back then. I was the lead editor for that issue: I did the research, wrote all the articles, and excerpted the interviews for the Spring 2009 Forecast. I was hooked. We put a photo of a lighthouse on the cover.

PwC’s Technology Forecast, Spring 2009
We were quite bullish on the semantic web back then. In retrospect, there were four things we didn’t account for enough in our forecast:
- How alien the semantic web methods would be to enterprise IT and data management shops;
- How often enterprises couldn’t see the forest for the trees because of their preoccupation with applications, rather than interacting with data/information/knowledge more directly;
- How much tribalism and the plain ignorance or unwillingness of one tribe to learn from other tribes inhibit how technology evolves; and
- How much compute, networking and storage would have to improve to operationalize compute-intensive semantic graphs at scale.
Eleven years later, enterprises are still struggling with these problems.
Teodora: Fast forward to 2020. Recently, the EU’s Horizon 2020 Knowledge Graphs at Scale project launched. It somewhat confirms the wider adoption of knowledge graphs as a way for organizations to model knowledge and interlink data. What I find missing here is the concept of content (which interests me very much).
Do you think one day knowledge graphs will be built with a view to using them for managing and publishing web content and marketing communications?
[I am following in the footsteps of Slide 5 from your presentation here: Data centric business and knowledge graph trends]
Organizations reinforce old mentalities with how they hire and structure teams and departments. The KM folks manage “knowledge”, while the content management folks manage “content”, and the data team manages “data”.
In reality, “knowledge”, “content” and “data” are all the same thing to a machine–bits and bytes in buckets that represent people, places, things and ideas. These representations are often poorly described.
A commitment to knowledge graphs gives these three groups the opportunity to share one method and one toolchain to contextualize and better describe data, content and knowledge as commonly modeled representations. The right leader can understand this bigger picture and break down the barriers between the teams and departments.
Content management historically has dwelled on search engine optimization, and that has meant a controlled vocabulary (schema.org) for Google and Bing search.
Knowledge management has historically used taxonomies and business-specific controlled vocabularies to enable internal search.
Data management has historically used entity-relationship diagrams and modeled data at three different levels: conceptual, logical and physical. The resulting schema needs to be built first, and changes at scale (to add a column, for example) can be painful and costly.
With knowledge graphs, you’ve got one method that acts as an umbrella for vocabularies, taxonomies and ontologies. You can start simply and model as you go, change what you need to when you need to, add better logic (whether standard description logic, rules, or relationship logic) when appropriate, and encourage machine reasoning and the organic growth of the graph.
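As a hedged illustration of that one-method, one-toolchain idea, here is a small Python/rdflib sketch (all identifiers invented) in which the content, knowledge management, and data teams each add statements to the same graph, using schema.org and SKOS as shared vocabularies, with no schema migration between steps.

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, SKOS

SCHEMA = Namespace("https://schema.org/")
EX = Namespace("http://example.org/kg/")

kg = Graph()

# Day one: the content team describes an article with schema.org terms.
article = EX["article/101"]
kg.add((article, RDF.type, SCHEMA.Article))
kg.add((article, SCHEMA.headline, Literal("Q3 Product Update")))

# Later: the KM team layers its taxonomy on top, in SKOS, with no migration.
topic = EX["topic/product-updates"]
kg.add((topic, RDF.type, SKOS.Concept))
kg.add((topic, SKOS.prefLabel, Literal("Product updates")))
kg.add((article, SCHEMA.about, topic))

# Later still: the data team links the article to a customer record it manages.
customer = EX["customer/77"]
kg.add((customer, RDF.type, SCHEMA.Organization))
kg.add((article, SCHEMA.mentions, customer))

# The graph grew organically; nothing built earlier had to change.
print(kg.serialize(format="turtle"))
```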
Departments should work together via the same graph, and developers should harness the model in the graph to drive their application development. Implied is a radically different organization, one focused on sharing and collaborating on a unified, though distributed and sometimes federated, resource.
In an interview you mentioned the near-term value knowledge graphs can deliver as a means of graph-based data management to facilitate GDPR and CCPA compliance. Can you elaborate on this?
Management of sensitive information has become a nightmare because of the pervasiveness of data siloing and code sprawl. In a traditionally laid out, silo-creating system, you’ve got to go silo by silo and figure out what’s in each silo to find the personally identifiable information (PII), not to mention unpack the application code and ponder the row and column headers for what the context of each silo might be.
Semantic graphs are a primary way to encourage desiloing, place the focus on the data and connecting logic, and disambiguate at scale so you can reliably discover and isolate PII, in the process harnessing automation more effectively. The best way to inventory your information assets and manage them is with the help of semantic web methods. Companies like data.world, Eccenca and Flur.ee are some of the innovators in this area.
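Here is a minimal sketch of what that can look like in practice, with invented namespaces (none of the vendors above necessarily work this way): annotate a property as PII once, and a single query then surfaces every PII value in the graph, whichever silo the triples originally came from.

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF

GOV = Namespace("http://example.org/governance/")
HR = Namespace("http://example.org/hr/")

g = Graph()

# Governance: mark properties (from any silo) as carrying PII, once.
g.add((HR.ssn, RDF.type, GOV.PIIProperty))
g.add((HR.homeAddress, RDF.type, GOV.PIIProperty))

# Instance data from two formerly siloed sources, now in one graph.
emp = GOV["employee/9"]
g.add((emp, HR.ssn, Literal("000-00-0000")))
g.add((emp, HR.homeAddress, Literal("12 Elm St")))
g.add((emp, HR.jobTitle, Literal("Analyst")))  # not PII-tagged

# One query surfaces every PII value, wherever it came from.
q = """
PREFIX gov: <http://example.org/governance/>
SELECT ?subject ?property ?value WHERE {
    ?property a gov:PIIProperty .
    ?subject ?property ?value .
}
"""
for row in g.query(q):
    print(row.subject, row.property, row.value)
```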
The reason we see all these data breaches is because centralized data repositories with millions of correlatable identifiers and associated sensitive info are targets for thieves. How do you decentralize the storage? You use a combination of semantic graphs (to scale detection and management of PII) and the immutability, version control and cryptography of blockchains (to eliminate the need to transmit correlatable identifiers and allow individuals to keep their own personal data safe). An interview we did with Phil Windley of the Sovrin Foundation underscores how decentralized personal data protection works.
In a decentralized, personal data protection-enabled world, your personal info (such as a correlatable identifier, e.g., a passport number) stays encrypted and at rest on your phone (for example). It stays on the phone unshared. Verifying credentials does not require the exchange or duplicated storage of these identifiers, but instead just uses one-time, non-correlatable messaging, along with on-device matching (such as on an iPhone). Only the non-correlatable transactional message record is stored on-chain.
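Here is a deliberately simplified Python sketch of the one-time, non-correlatable messaging idea, using the cryptography package. This is my own illustration, not the Sovrin protocol or any DID specification: the verifier checks a signature over a fresh random challenge, so no reusable identifier ever crosses the wire.

```python
# pip install cryptography
import os
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# Holder's device: a keypair whose public half is published out of band
# (in a real system, via a DID document; here we just hand it over).
device_key = Ed25519PrivateKey.generate()
published_public_key = device_key.public_key()

# Verifier: issue a fresh random challenge, used for this interaction only.
challenge = os.urandom(32)

# Holder's device: sign the challenge locally. The sensitive identifier
# (say, a passport number) never leaves the device at any point.
signature = device_key.sign(challenge)

# Verifier: check the signature against the published key. Because the
# challenge is single-use, the message cannot be correlated across verifiers.
published_public_key.verify(signature, challenge)  # raises InvalidSignature on failure
print("holder verified; no reusable identifier was ever transmitted")
```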
Decentralized identifiers and decentralized PII storage actually offer enormous risk reduction for enterprises, but most information risk professionals aren’t aware of the advantages yet.
The W3C-aligned folks behind Solid offer a comparable approach when it comes to Solid pods, and Microsoft and others are also aligned with the W3C decentralized identifier (DID) principles.
A while ago, you shared that you were a US Navy linguist in Europe in the 1980s. Can you tell us a bit more about that experience in relation to translating concepts and language? I have always wondered: isn’t machine language (broadly speaking) just another language one can learn in order to communicate with the systems one wants to exchange information with?
Linguist with a small ‘l’. Interpreter describes the role better, a kind of technical, military voice radio traffic interpreter. I got more linguistics training in college than I did in the Navy.
Your question reminds me of the machine translation systems the US Army tried to use in the 1980s. The Army published and distributed the output of those systems, but those translations weren’t worth the paper they were printed on.
It’s not like those problems were specific to the Army. Simple recognition was hard in those days. In the Navy, we had OCR (Optical Character Recognition) back then, but it was so fussy that you had to use a special, ultra-readable typeface and special paper with memo borders printed on it, so you’d be sure to stay inside the margins of the memos you were typing and scanning, and machines could reliably spit out consumable text on the other end. In those days, we were still using IBM Selectrics to generate the text, so a special OCR typeface meant a special OCR ball you used in your typewriter.
In each communications center, we had a teletype. That was our connection to a text-only, private internet of sorts. On the midwatches (the night watch or graveyard shift), I used to chat with other operators by typing and sending and receiving messages on the teletype. There wasn’t any magnetic storage, so the conversation all ended up printed mechanically on rolls of paper. Chatting this way was fun because it was so tactile and auditory at the same time. You could feel and hear the machine responding as you typed, and you could hear when the other party was responding.
Recognition still has its challenges, decades later. In 2020, I can’t get Google Voice Typing to recognize my spoken words accurately enough to be able to dictate into a Google Doc as fast as I can type. And I’m a slow typist.
What I learned in the Navy at that time was a form of knowledge management. It was the painstaking process of collection, filtering, analysis and as close to real-time reporting as we could manage. We did the front end of the knowledge/data lifecycle–collecting and reporting on the most perishable, relevant observations we made–on board the planes. Like reporters, we took notes on the voice transmissions we heard and typed flash reports. That information was correlated with other kinds of information from other operators on the plane, like radar signatures, before it was brought together into intelligible, more holistic form.
On the ground, we transcribed relevant parts of the recordings we made and contributed, when necessary, to more detailed, long-form reporting that also harnessed other means of signals collection.
The 2010 business equivalent to what we were doing was the BBC’s Dynamic Semantic Publishing. DSP was a lot more machine readable, dynamic, semantic, inclusive and scalable than what we managed in the 1980s in our little analog network.
To your question about machine language, back in 2017 in a Quora answer, I said, “There’s no machine understanding without shared semantics, and no shared semantics without standards.”
RDF is a kind of a lingua franca, one that resonated with me. But it doesn’t go far enough to cross the human-to-human language barrier by identifying what’s common across languages. Role and Reference Grammar, in use at PAT Inc., may do better, but trying to get humans to understand why and how it could be better has been a challenge. Somehow, getting RDF and RRG to work together might be very helpful, but few seem to be aware that the two could be complementary and more powerful together. It’s hard enough to get modelers to standardize on something like RDF/OWL.
Another thing I’ve learned from studying emerging tech: just because a tech might make things easier doesn’t mean it will be broadly adopted.
You are dedicated to writing Quora answers. It would be interesting to know: how did you get started, and what motivates you to share your knowledge in such a concise and easy-to-understand way?
Back in the days of typewriters or pen and paper, I used to write letters. Letter writing was addictive. I’d share thoughts with family and friends.
When I was overseas in Europe during my Navy days, I’d do a form of journaling on cassette tapes and send those to my mom and dad back home in Oklahoma City. My dad asked his secretary to transcribe the tapes. She actually had fun doing it, my Dad told me later. Secretaries in law offices back then were used to transcribing audio. My dad was an oil and gas lawyer, primarily, someone who wrote very good letters himself.
Later on I’d write emails. It occurred to me last year that I’d been using email for over 30 years. It’s like you speak directly to a person or a small group in email, just like operator to operator on the teletype.
When I answer a question on Quora, it’s like I’m writing directly to the person who asked the question. I know from the question that they’re interested in the answer, and I’m a helpful Henry kind of person, so I try to help if I can. Answering helps me distill what I’ve tried to convey previously in long form, or in a slide deck, or just have in my head. And it helps me refine and articulate what I think I know.
An ex-boss of mine back when I researched and wrote market reports on compound semiconductor demand in the 1990s told me, “Because you’re focused on a market niche and study it all the time, you will become the expert on that niche. You’ll know more than anyone else knows about that niche.” He was right.
Because I’ve been preoccupied with semantics, graph databases and such and have studied them for years, I feel like I do know a lot that I can share, at least about the bigger picture. But I hardly consider myself an expert. I get a lot of requests to answer questions about the details of databases that I’m not able to answer, for example.
Sometimes I’m triggered by a question and feel I’m obligated to answer it.
I’ve written answers to over 1,000 questions on Quora. I’m surprised when others answer in a way that shows how narrowly focused the knowledge they’re sharing is. I’ve had the advantage and the blessing of a big-picture perspective that comes with being a generalist for 20 years as well as someone who’s been preoccupied with a niche as a hobby for a while.
Now I’m trying to write a book on removing the obstacles to knowledge graph adoption with Mark Ouska and Brian Stein (friends and colleagues who have deep domain KG experience and more technical depth). The challenge of book writing, though, is that I don’t feel like I’m writing directly to the one person who really needs to know the answer.
Somewhat related to that is the question of how to translate Semantic Web technologies into benefits for the enterprise. What do you think could be the tipping point at which data users and developers start thinking: 1. beyond tables, 2. low-code, 3. less data waste?
One analogy that seems to work involves the history of the automotive industry. During the early days, 100+ years ago, owners had to learn more about how cars worked. They had to hand crank the engine of their car to start it, and often try to fix it themselves when it broke down. Most people couldn’t afford to own cars.
A hundred years later, most people don’t need to know much at all about cars, even though cars have gotten a lot more complicated. So much support infrastructure exists now that owners don’t need to know about their own cars. Others can tend to their cars for them.
In 20 years, car sharing will make even more sense than it does today. Just summon the car for when you need it, and pay by the use. You likely won’t own your own car; owners own fleets, so the fleet owner does the maintenance for the user. With electric cars, you could get away with one full charge a day and recharging the fleet at night. From a utilization perspective, it doesn’t make sense to leave cars idle 85+ percent of the time and use up a lot of parking lot space.
Semantic technologies in terms of auto industry history are in their equivalent of the era of the 1920s. Companies are just now putting the complexity under the covers and starting to paint the cars different colors than just black.
Humans will still need tables, but those who grok graphs, graph models, and the importance of organic data/information/knowledge management can use those too. It’s good to be able to shift back and forth.
Are relational databases what keeps enterprises from the data-centric future?
I used to think so, but others have suggested a middle ground. Rather than pushing a mass migration to RDF that may never happen, you could have the semantic graph team in an enterprise manage a virtual RDF/OWL layer, and leave the rest of the mess in place, since it’s not going to change anytime soon.
Companies like Timbr.ai and RelationalAI are working on such an alternative. You’re not addressing the root problems of code sprawl and data silo perpetuation this way, but you are empowering a large scale semantic integration effort, at least.
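The flavor of such a virtual layer can be sketched in a few lines of Python, with sqlite3 standing in for the legacy database and rdflib for the semantic side. Note that Timbr.ai and RelationalAI use their own engines; this is only an analogy, with invented table and property names.

```python
# pip install rdflib
import sqlite3

from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF

EX = Namespace("http://example.org/erp/")

# The legacy relational "mess", left exactly where it is.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
db.execute("INSERT INTO customers VALUES (1, 'Acme Corp')")

def virtual_layer(conn: sqlite3.Connection) -> Graph:
    """Map relational rows to RDF triples, R2RML-style, at query time."""
    g = Graph()
    for cid, name in conn.execute("SELECT id, name FROM customers"):
        subject = EX[f"customer/{cid}"]
        g.add((subject, RDF.type, EX.Customer))
        g.add((subject, EX.name, Literal(name)))
    return g

# The semantic graph team queries the mapped view with SPARQL, while the
# source tables stay untouched in the relational database.
query = """
SELECT ?n WHERE {
    ?c a <http://example.org/erp/Customer> ;
       <http://example.org/erp/name> ?n .
}
"""
for row in virtual_layer(db).query(query):
    print(row.n)
```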
The key challenge is how to update and broaden the mentality of the organization, whatever methods you’re using. The compulsion to see every problem as a nail and use RDBMSes as a hammer, along with the associated one-database-per-application development habit, is just plain wasteful. But the impulse for most is to look first to an RDBMS, because that’s what’s been comfortable.
Dave McComb of Semantic Arts wrote a must-read book about the power of semantic interoperation and reuse, Software Wasteland, which spells out the staggering magnitude of the waste incurred without them. Companies are spending 10 to 100 times more on development than they need to, he says.
It’s a Sisyphean task to try to change such long-established compulsions and habits. Most developers don’t learn more than they absolutely have to when it comes to databases or data modeling. How do you get them to double or triple the amount they know? The enthusiasm just isn’t there. Thus the need for a guerilla team inside each organization, people who do have the passion and knowledge, a team that has leadership backing. That passionate core needs to exist in every company serious about data/information/knowledge-centric transformation.
Businesses will have to clear space and create roles for abstract, non-linear thinkers from the humanities, philosophy and the sciences, people for whom visualizing in graphs may be more natural. We interviewed Ben Gardner of AstraZeneca earlier this week, and he has a background in the life sciences. Graphs, he said, were always more intuitive for him.
What do you think is the biggest challenge enterprises face when considering using semantic technologies for managing their data?
Opening up minds, organizations, and ways of working. You’ve got entrenched packaged suites, and the impulse is to license those, wave a magic wand, and say you’ve fixed the problem when you clearly haven’t.
Companies will have to get hands on with their data and make entities and contexts discoverable. Communicating an obscure, nuanced set of value propositions and encouraging folks to try something quite a bit different is an underrated skill.
What is the first tiny step towards building a data-centric business?
Persuading one leader to own and understand the problem, then helping them gain the courage to try the solution. You’ve got to have long-term commitment, and leadership can’t consider their company’s data someone else’s problem. It’s leadership’s problem. You’ll have to pick the right leader, the right time and the right place to make this step successfully.
Data has become a huge asset that needs to be managed wisely. If you look at market capitalization rankings, most of the top-ranked companies have generated their wealth by monetizing data. How do other companies respond?
People in the semantics community we know have been giving copies of Software Wasteland to their bosses. That’s a small step we haven’t seen before.
Who’s gonna interview the interviewer?
Teodora: And now, I would be grateful if you could spare several minutes to ask me a question or two. :)
Alan: What should everyone know that only a PhD in Semiotics tends to know?
Teodora: :) That meaning is a bird that comes to the hand that does not grasp.
Alan: What’s the oldest building you’ve been in, and what did it feel like to be there?
Teodora: A Roman tomb, just a week ago, near the town of Hissarya. It felt really strange. To be honest, I had the feeling that I was not supposed to be entering this sacred place.
On a happier note, I have also recently visited Ruse’s library and its watch tower. It felt exhilarating, and it was also the first time I realized my past is now a very “old” past. The ceiling of the library hosted a collection of typewriters, and here I was, looking at something on which I had learned to type, now an artefact :)

Alan: Which novel gave you the most immersive, enveloping experience and was the hardest to put down?
Teodora: Flowers for Algernon. [Recommended to me by the one and only David Amerland]
Alan: Jim Carrey says we should live in the moment. As a web-immersed person, do you live in the moment enough?
Teodora: I feel I do. Though not enough. Sometimes I get caught in the treadmill of just doing… and forget about the experiential side of life.
Alan: If you could buy an original painting today, it was available, and you had the money to buy it, which painting would you buy? {My own favorites are George Catlin’s early paintings of the Mandan and other tribes. Example: https://www.paintingstar.com/item-mandan-dance-s113713.html Other examples at https://www.paintingstar.com/artist-george-catlin-1.html. I’m almost just as happy with good prints.}
Teodora: I would get a Georgi Bozhilov. Probably this one:

And with that, our Dialogue is over. But the discourse continues. You can find Alan Morrison on Twitter and also enjoy his answers (letters :)) on Quora. Also, don’t forget to check his Slideshare decks, my favourite one being Data centric business and knowledge graph trends.
Thanks for reading!