The following is a revised edition.
To understand and empathise with fundamental knowledge management, the first thing we need more clarity on is ‘Knowledge’ itself.
A unit of knowledge can be seen as a piece of information that allows users to reach an outcome when confronted with specific questions. Knowledge in the real world can be classified into three high-level categories:
Different types of knowledge are interchanged across people, processes, and tools. The job of a knowledge manager, like the knowledge graph, is to ensure the interchange is manageable at scale, is uncorrupted, and is easily discoverable.
📝 Editor’s Note
If we look at Knowledge as a Product, the first user requirement is Discovery: making knowledge accessible to users (in other words, product adoption). Before diving into the fundamentals of Knowledge Graphs, feel free to refer to more context around Discoverability-as-a-feature in one of our previous editions, where Animesh Kumar and Travis Thompson pivot the piece on metadata models (aka knowledge graphs).
The Art of Discoverability and Reverse Engineering User Happiness
Now that we have a specific understanding of Knowledge, let’s dive into the nuances of a Knowledge Graph.
A knowledge graph is a semantic web of entities, relationships, and events. More fundamentally, it is a directed graph where every element is populated with rich information regarding itself and its relationships with other elements.
The Knowledge Graph is tasked with surfacing up-to-date, related information to users based on their specific requirements, drawing on data from multiple sources.
🔑 Every data problem is a knowledge transfer problem, and every knowledge transfer problem can be formalized as a graph. Therefore, every data problem can be formalized as a graph. ~ Stephen Bailey
A Knowledge Layer contains information across the three primary layers: Data, Process, and People. This includes information about data lineage, provenance, and governance. But these are not enough. The Knowledge Graph is also expected to capture how every data asset interacts with the data ecosystem as a whole.
Glossary
The end objective of a Knowledge Graph is to operationalize knowledge and make it available to users when they feed specific questions to the graph. The answers to these questions power integration, data recycling, and analytics.
The Knowledge Graph ideology was popularized by Google in 2012 when they publicly attributed their search solution to Knowledge Graphs. Google defined its Knowledge Graph to serve the following objectives:
📈 88% of CXOs believe knowledge graphs will significantly improve the bottom line ~ Pulse Survey, 2020
Google built applications on top of their Knowledge Graph to add a layer for Insights. For example, when a user searches “restaurants near me,” it doesn’t just surface the specific detail (restaurant names) the user searched for. It also brings up review data, ratings, directions, and a plethora of well-curated insights that the user can instantly process to choose one data point (the restaurant) within seconds.
Tangent: Note that the human mind is built to connect dots. When knowledge is presented as a product with associated dots in the vicinity of the user, it’s easier for users to make decisions and, in fact, make them within seconds.
If billions of data points are produced for a specific object, and a user searches that object, which data point should surface? This is solved through the Knowledge Graph’s capability to include peer insights (peer here means peer data assets). SEO or Search Engine Optimization is the method to curate and surface information that, on a high level, has:
A knowledge graph is able to pool data from across siloed data sources. In Google’s case, it would mean data stored across various websites, clouds, servers, and geographies.
The semantic meaning added by Knowledge Graphs is written formally to eliminate ambiguity, make it digestible for both users and machines, and enable automated reasoning that derives inferred knowledge from what is explicitly stated.
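To make the idea of automated reasoning concrete, here is a minimal sketch of one inference rule applied to formally written statements. The triple layout, the `is_a` predicate, and the `infer_is_a` helper are assumptions for illustration, not part of any specific knowledge graph product.

```python
def infer_is_a(triples):
    """Apply the rule (A is_a B) and (B is_a C) => (A is_a C) until nothing new appears."""
    facts = set(triples)
    changed = True
    while changed:
        changed = False
        # Snapshot the current is_a pairs so we can add to `facts` while looping.
        is_a = [(s, o) for s, p, o in facts if p == "is_a"]
        for a, b in is_a:
            for b2, c in is_a:
                if b == b2 and (a, "is_a", c) not in facts:
                    facts.add((a, "is_a", c))
                    changed = True
    return facts


# Hypothetical stated facts.
STATED = {("Cat", "is_a", "Mammal"), ("Mammal", "is_a", "Animal")}

# Facts that were never written down but follow from the rule.
INFERRED = infer_is_a(STATED) - STATED
```

Because the statements are unambiguous, a machine can derive that a Cat is an Animal even though nobody recorded that fact directly.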
In a knowledge graph, the description attributed to any entity or relationship also partially encompasses descriptions for related entities, which is how the big picture of a web-like structure develops.
This is, in fact, a key attribute of Knowledge Graphs: descriptions for each component partially describe other components. For example, when the entity ‘Cat’ is described as a mammal that hunts rats, the entities ‘Rat’ and ‘Mammal’ also get partially defined: ‘Rat’ is hunted by ‘Cat’, and ‘Mammal’ contains ‘Cat’.
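The Cat example can be sketched in a few lines. The snippet below is an illustration under assumed names (`TRIPLES`, `describe`, and the predicate labels are hypothetical): each statement is stored once, yet it contributes to the description of both entities it connects.

```python
from collections import defaultdict

# Hypothetical triples (subject, predicate, object) for the Cat example.
TRIPLES = [
    ("Cat", "is_a", "Mammal"),
    ("Cat", "hunts", "Rat"),
]


def describe(triples):
    """Collect, for every entity, the statements it appears in on either side."""
    descriptions = defaultdict(list)
    for subject, predicate, obj in triples:
        statement = f"{subject} {predicate} {obj}"
        descriptions[subject].append(statement)
        # The same statement also partially describes the object.
        descriptions[obj].append(statement)
    return dict(descriptions)


descriptions = describe(TRIPLES)
```

Only ‘Cat’ was described explicitly, but `descriptions` now holds an entry for ‘Rat’ and ‘Mammal’ as well.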
Formal semantics is the process of defining meaning and context for objects through formal computational and logical tools. A Knowledge Graph can be achieved through formal semantics, and ontology is the foundation of formal semantics.
Ontology is the classification and explanation of entities and their structure. It ensures both developers and users of the knowledge graph have a shared understanding of data. In other words, ontology serves as the contract that brings a consensus around the meaning of the data between users and creators of the knowledge graph. This objective is achieved through tools such as classes, categories, relationships, or even human-friendly textual descriptions.
While taxonomies are a way to define hierarchical structures or relationships, Ontology goes a step further to add richer information to the data. Ontology is a superset of Taxonomy and can define interrelationships between the entities in the taxonomy. Therefore, an ontology can contain multiple taxonomies.
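A rough way to picture the difference: a taxonomy only answers “what is this a kind of?”, while an ontology can also hold relationships that cut across the hierarchy. The data and helper names below are assumptions for illustration.

```python
# Taxonomy: child -> parent, hierarchical only (hypothetical example data).
TAXONOMY = {"Cat": "Mammal", "Rat": "Mammal", "Mammal": "Animal"}

# Ontology: one or more taxonomies plus non-hierarchical relationships.
ONTOLOGY = {
    "taxonomies": [TAXONOMY],
    "relationships": [("Cat", "hunts", "Rat")],
}


def ancestors(entity, taxonomy):
    """Walk up the hierarchy from an entity towards its root."""
    chain = []
    while entity in taxonomy:
        entity = taxonomy[entity]
        chain.append(entity)
    return chain


def related(entity, ontology):
    """Return the non-hierarchical relationships that mention the entity."""
    return [r for r in ontology["relationships"] if entity in (r[0], r[2])]
```

`ancestors` needs nothing but the taxonomy, whereas `related` only has an answer because the ontology carries the extra interrelationships.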
RDF (Resource Description Framework) is a data model that enables users to run CRUD operations on the data without affecting the physical data. It is a standard framework for interchanging highly interconnected data. Through RDF, users can unify or integrate data from various sources, detached from the original data, and run queries across the combined data instead of querying scattered data instances.
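RDF expresses statements as subject-predicate-object triples. The sketch below mimics that shape with plain Python tuples rather than a real RDF library, and all identifiers (`CRM_TRIPLES`, `BILLING_TRIPLES`, `match`, the `customer:42` style names) are hypothetical. It shows two sources being combined into one logical graph and queried as a whole while the sources themselves stay untouched.

```python
# Two hypothetical sources that describe overlapping entities as triples.
CRM_TRIPLES = {("customer:42", "name", "Ada"), ("customer:42", "placed", "order:7")}
BILLING_TRIPLES = {("order:7", "amount", "120"), ("order:7", "status", "paid")}

# Union the triples into one logical graph; the source sets are not modified.
GRAPH = CRM_TRIPLES | BILLING_TRIPLES


def match(graph, subject=None, predicate=None, obj=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t for t in graph
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    )


def order_amounts_for(graph, customer):
    """Follow customer -> order -> amount, a path that spans both sources."""
    orders = [o for _, _, o in match(graph, customer, "placed")]
    return [amount for order in orders for _, _, amount in match(graph, order, "amount")]
```

Neither source alone can answer “how much did this customer spend?”; the combined graph can, because `order:7` appears in both.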
RDF enables Knowledge Graphs to entail the attributes of multiple data management models:
On a fundamental level, a knowledge graph has three structural elements: Nodes, Edges, and Properties. Nodes are logical representations of real-world entities, Edges are directed logical representations of the connections between these entities, and Properties are logical descriptions or features of the Nodes and Edges.
The real-world entities could be a data asset, a concept, a service, or a user, while relationships could define hierarchical associations (‘subset of’), locations (‘contains’), definitions (‘is a’), etc.
Classes and categories are represented through Nodes, Relationships are represented through Edges, and all of them, including textual descriptions, can be represented through Properties.
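The three structural elements map naturally onto a small data structure. This is a minimal sketch rather than a production design; the class names, the `kind` property, and the sample assets are assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    id: str
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    source: str
    target: str
    relation: str
    properties: dict = field(default_factory=dict)


class KnowledgeGraph:
    def __init__(self):
        self.nodes = {}
        self.edges = []

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = Node(node_id, properties)

    def add_edge(self, source, target, relation, **properties):
        # Edges are directed: source -> target.
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must exist before adding an edge")
        self.edges.append(Edge(source, target, relation, properties))

    def neighbors(self, node_id, relation=None):
        """Targets reachable from node_id, optionally filtered by relation."""
        return [
            e.target for e in self.edges
            if e.source == node_id and (relation is None or e.relation == relation)
        ]


kg = KnowledgeGraph()
kg.add_node("sales_table", kind="data asset", description="Daily sales records")
kg.add_node("finance_domain", kind="concept")
kg.add_node("analyst", kind="user")
kg.add_edge("sales_table", "finance_domain", "subset of")
kg.add_edge("analyst", "sales_table", "queries", frequency="daily")
```

Note that both nodes and edges carry properties, and that direction matters: `sales_table` points at `finance_domain`, not the other way round.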
Natural Language Processing, or NLP, is often used to augment Knowledge Graphs for semantic enrichment where tags, descriptions, and context are improved through AI.
Imagine the knowledge graph like a digital brain. Every time a human brain learns something new, a neuron connection is developed to retain that information’s pattern. This neuron connection is triggered when a similar situation arises where that knowledge could be applied.
With AI-Augmentation, it is possible to develop such connections at scale: every time the AI detects new patterns or new relationships between entities, the discovery gets wired into the knowledge graph and starts contributing to subsequent queries and knowledge formation.
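The loop of “detect a relationship, wire it into the graph” can be sketched as follows. A simple tag-overlap (Jaccard) heuristic stands in here for the AI model, and the asset names, tags, threshold, and `related_to` label are all hypothetical.

```python
from itertools import combinations

# Hypothetical assets with tags, as might come from NLP-based enrichment.
ASSET_TAGS = {
    "orders": {"sales", "customer", "revenue"},
    "invoices": {"billing", "customer", "revenue"},
    "web_logs": {"traffic", "sessions"},
}


def suggest_edges(asset_tags, threshold=0.5):
    """Suggest 'related_to' edges for pairs whose Jaccard tag overlap meets the threshold."""
    suggestions = []
    for a, b in combinations(sorted(asset_tags), 2):
        shared = asset_tags[a] & asset_tags[b]
        combined = asset_tags[a] | asset_tags[b]
        overlap = len(shared) / len(combined)
        if overlap >= threshold:
            suggestions.append((a, "related_to", b, round(overlap, 2)))
    return suggestions


# Wire the discovered relationships into the graph's edge set.
edges = set()
for a, relation, b, score in suggest_edges(ASSET_TAGS):
    edges.add((a, relation, b))
```

In a real setup the heuristic would be replaced by whatever model detects the patterns; the point is that each accepted suggestion becomes an ordinary edge that later queries can traverse.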
AI-identified relationships enable:
*Originally published on Medium