LLM-Powered Knowledge Graphs for Enterprise Intelligence and Analytics

§ 01

Summary (what this paper is saying)

This paper presents a practical framework for building enterprise knowledge graphs (KGs) from heterogeneous internal data sources using large language models (LLMs). It targets a specific enterprise problem: disconnected data silos (emails, calendars, chats, documents, logs) that obstruct the extraction of actionable insight. The framework automates entity extraction, relationship inference, and semantic enrichment across these data types to build a unified, activity-centric knowledge graph. Demonstrated applications include contextual search, task prioritization, expertise discovery, personalized recommendations, and analytics over the unified graph. The authors report positive experimental results for expertise discovery, task management, and data-driven decision making, arguably three of the highest-value enterprise knowledge management use cases.

§ 02

Core Argument

Why do enterprises need both retrieval-augmented generation (RAG) and knowledge graphs?

Enterprise data is inherently multi-modal and relational. A meeting involves people (who attended), topics (what was discussed), decisions (what was agreed), and follow-ups (what needs to happen) — these are entities and relationships, not chunks. Vector RAG over meeting transcripts finds semantically similar text; a KG over meeting data finds who made which decision, with whom, on what project.
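To make the contrast concrete, here is a minimal sketch that stores one meeting as subject-predicate-object triples and answers "who made which decision, with whom, on which project" by joining edges instead of matching text. The entity ids, the predicate names (`attended`, `decided`, `made_in`, `concerns`), and the in-memory list of tuples are illustrative assumptions, not the paper's schema.

```python
# Hypothetical toy data: one meeting expressed as (subject, predicate, object) triples.
TRIPLES = [
    ("alice", "attended", "mtg-42"),
    ("bob", "attended", "mtg-42"),
    ("carol", "attended", "mtg-42"),
    ("alice", "decided", "dec-7"),
    ("dec-7", "made_in", "mtg-42"),
    ("dec-7", "concerns", "project-atlas"),
]


def objects(subject, predicate):
    """Return every object linked from `subject` by `predicate`."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def subjects(predicate, obj):
    """Return every subject linked to `obj` by `predicate`."""
    return [s for s, p, o in TRIPLES if p == predicate and o == obj]


def who_decided_what():
    """Join decided -> made_in -> attended, and decided -> concerns."""
    rows = []
    for person, pred, decision in TRIPLES:
        if pred != "decided":
            continue
        for meeting in objects(decision, "made_in"):
            colleagues = sorted(a for a in subjects("attended", meeting) if a != person)
            for project in objects(decision, "concerns"):
                rows.append((person, decision, colleagues, project))
    return rows


print(who_decided_what())
# [('alice', 'dec-7', ['bob', 'carol'], 'project-atlas')]
```

A similarity search over the transcript could surface the passage where the decision was discussed, but the (person, decision, colleagues, project) row only falls out of following typed edges.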

The semantic enrichment step is the key contribution: LLMs don't just extract entities, they infer implicit relationships (this email is about the same project as that calendar event) that aren't explicit in the raw data. This is what transforms siloed data into connected knowledge.
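The shape of the enrichment step can be sketched as follows. In the paper this inference is done by an LLM; here a deliberately simple keyword heuristic (`mentions_project`, a hypothetical stand-in) replaces the model so the control flow runs offline. Every email is compared with every calendar event, and an inferred `same_project_as` edge is added, carrying the shared project as evidence, when both refer to the same project. The alias table and records are invented.

```python
from itertools import product

# Hypothetical project aliases; in the paper's setting an LLM would resolve these.
PROJECT_ALIASES = {
    "project-atlas": {"atlas", "project atlas"},
    "project-borealis": {"borealis"},
}

emails = [
    {"id": "email-1", "text": "Draft budget for Atlas attached."},
    {"id": "email-2", "text": "Lunch on Friday?"},
]
events = [
    {"id": "event-9", "text": "Project Atlas steering committee"},
    {"id": "event-10", "text": "Borealis kickoff"},
]


def mentions_project(text):
    """Stand-in for an LLM call: return the set of projects a text refers to."""
    lowered = text.lower()
    return {pid for pid, aliases in PROJECT_ALIASES.items()
            if any(alias in lowered for alias in aliases)}


def infer_cross_source_edges(emails, events):
    """Add an inferred edge whenever an email and an event share a project."""
    edges = []
    for email, event in product(emails, events):
        shared = mentions_project(email["text"]) & mentions_project(event["text"])
        for project in sorted(shared):
            edges.append((email["id"], "same_project_as", event["id"], project))
    return edges


print(infer_cross_source_edges(emails, events))
# [('email-1', 'same_project_as', 'event-9', 'project-atlas')]
```

Keeping the evidence (the shared project) on each inferred edge makes it possible to audit or roll back edges later, which matters given how error-prone this step is.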

RAG is the query interface over the resulting KG — natural language queries are answered by retrieving relevant graph substructures and associated text. The KG is the structure; RAG is the access layer.

§ 03

RAG Side (Strengths & Limits)

§ 04

Strengths:

Natural language query interface accessible to all enterprise users

Semantic search over enriched entity descriptions finds relevant nodes even without exact match

Works across heterogeneous source types without schema alignment
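The second strength (finding nodes without an exact match) comes from ranking node descriptions by embedding similarity. A minimal sketch, assuming each node already has an embedding vector; the 3-dimensional vectors below are invented for illustration, whereas a real system would obtain them from an embedding model.

```python
import math

# Toy 3-dimensional "embeddings" for node descriptions. A real deployment would get
# these from an embedding model; the vectors below are invented for illustration.
NODES = {
    "project-atlas": ("Move services onto the container platform", [0.9, 0.1, 0.0]),
    "team-offsite": ("Plan the spring social event", [0.0, 0.2, 0.9]),
    "doc-17": ("Cloud cost review for Q3", [0.5, 0.7, 0.1]),
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_nodes(query_vec, k=2):
    """Return the ids of the k nodes whose description embeddings best match the query."""
    ranked = sorted(NODES, key=lambda nid: cosine(query_vec, NODES[nid][1]), reverse=True)
    return ranked[:k]


# Pretend embedding of the query "Kubernetes migration": it shares no keywords with
# the Atlas description but sits close to it in vector space.
print(top_k_nodes([0.85, 0.2, 0.05]))  # ['project-atlas', 'doc-17']
```

The matched node ids are then the entry points for graph traversal, which is where the KG side takes over.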

§ 05

Weaknesses:

Without the KG, RAG over enterprise data returns documents, not answers — "who is the expert on Project X?" returns documents mentioning Project X, not a ranked list of experts

RAG alone can't answer relational queries — "find everyone who attended a meeting with Person A in Q3 that also touched Project B" requires graph traversal, not semantic search
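The relational query in the last weakness is a two-hop pattern (person attended meeting, meeting attended by others), filtered by date and project. A minimal sketch over hypothetical meeting records; the ids, dates, and calendar-quarter convention are assumptions for illustration.

```python
from datetime import date

# Hypothetical meeting nodes with their attendee edges, date, and project edges.
MEETINGS = {
    "mtg-1": {"date": date(2024, 7, 15), "attendees": {"person-a", "dan", "erin"}, "projects": {"project-b"}},
    "mtg-2": {"date": date(2024, 8, 2), "attendees": {"person-a", "frank"}, "projects": {"project-c"}},
    "mtg-3": {"date": date(2024, 11, 20), "attendees": {"person-a", "gina"}, "projects": {"project-b"}},
    "mtg-4": {"date": date(2024, 9, 9), "attendees": {"dan", "hank"}, "projects": {"project-b"}},
}


def in_quarter(d, year, quarter):
    """True if date `d` falls in calendar quarter `quarter` (1-4) of `year`."""
    return d.year == year and (d.month - 1) // 3 + 1 == quarter


def co_attendees(person, project, year, quarter):
    """person -attended-> meeting (in quarter, touching project) <-attended- others."""
    found = set()
    for meeting in MEETINGS.values():
        if (person in meeting["attendees"]
                and project in meeting["projects"]
                and in_quarter(meeting["date"], year, quarter)):
            found |= meeting["attendees"] - {person}
    return sorted(found)


print(co_attendees("person-a", "project-b", 2024, 3))  # ['dan', 'erin']
```

No amount of semantic similarity reproduces this result reliably: the answer depends on exact set membership across three constraints, not on how similar any text is to the question.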

§ 06

Knowledge Graph Side (Strengths & Limits)

§ 07

Strengths:

Activity-centric model captures how enterprise knowledge actually flows — through meetings, decisions, and collaborations

Expertise discovery becomes a graph query: who has the most edges to nodes tagged with Topic X?

Relationship inference surfaces implicit connections between siloed data sources
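"Who has the most edges to nodes tagged with Topic X?" translates almost directly into code. A minimal sketch with hypothetical edges and tags; the optional `weights` parameter (for example, counting authorship more than attendance) is an assumed refinement, not something the paper specifies.

```python
from collections import Counter

# Hypothetical activity edges: (person, relation, artifact).
EDGES = [
    ("alice", "authored", "doc-1"),
    ("alice", "attended", "mtg-1"),
    ("bob", "attended", "mtg-1"),
    ("bob", "authored", "doc-2"),
    ("carol", "emailed_about", "doc-1"),
    ("alice", "emailed_about", "doc-3"),
]
# Hypothetical topic tags on artifact nodes.
TAGS = {
    "doc-1": {"permitting"},
    "mtg-1": {"permitting", "budget"},
    "doc-2": {"budget"},
    "doc-3": {"permitting"},
}


def rank_experts(topic, weights=None):
    """Sum (optionally weighted) edges from each person to artifacts tagged `topic`."""
    weights = weights or {}
    scores = Counter()
    for person, relation, artifact in EDGES:
        if topic in TAGS.get(artifact, set()):
            scores[person] += weights.get(relation, 1)
    return scores.most_common()


print(rank_experts("permitting"))  # [('alice', 3), ('bob', 1), ('carol', 1)]
```

Because every point of the score traces back to a concrete edge, the ranking can be explained ("authored doc-1, attended mtg-1, ..."), which is what makes it more credible than a self-reported profile.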

§ 08

Weaknesses:

LLM-based entity extraction and relationship inference introduce errors: wrong entities, hallucinated relationships

Privacy and access control: the unified KG contains sensitive information across all silos; access must be carefully governed

Continuous maintenance: as new emails, meetings, and documents arrive, the graph must be updated incrementally
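One way to keep incremental maintenance tractable is to record which source documents support each edge. The sketch below is an assumed design, not the paper's: the class name `IncrementalGraph` and its methods are hypothetical. Re-ingesting the same email is idempotent, and deleting a source removes only the edges no other source still supports.

```python
from collections import defaultdict


class IncrementalGraph:
    """Edge store keyed by (subject, predicate, object) that records provenance."""

    def __init__(self):
        self.support = defaultdict(set)  # edge -> ids of the sources asserting it

    def upsert(self, source_id, edges):
        """Add edges extracted from one source; repeated delivery adds nothing new."""
        for edge in edges:
            self.support[edge].add(source_id)

    def retract(self, source_id):
        """Forget a source, dropping edges that lose their last supporting source."""
        for edge in list(self.support):
            self.support[edge].discard(source_id)
            if not self.support[edge]:
                del self.support[edge]

    def edges(self):
        return sorted(self.support)


g = IncrementalGraph()
g.upsert("email-1", [("alice", "works_on", "atlas")])
g.upsert("email-1", [("alice", "works_on", "atlas")])  # duplicate delivery: no new edge
g.upsert("meeting-3", [("alice", "works_on", "atlas"), ("bob", "works_on", "atlas")])
g.retract("meeting-3")  # e.g. the meeting notes were deleted
print(g.edges())  # [('alice', 'works_on', 'atlas')]
```

The same provenance sets also help with the other two weaknesses: an edge backed by a single low-quality source can be flagged, and an edge can inherit the access restrictions of the sources behind it.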

§ 09

Key Insight (the "why both" claim)

Enterprise knowledge is relational by nature. The activities that generate enterprise knowledge — meetings, decisions, collaborations, projects — are inherently graph-structured. Building a KG from enterprise data doesn't impose structure; it reveals structure that was already there but hidden in silos. RAG is then used to make that structure queryable in natural language. The combination answers questions that neither can answer alone: "Who in the Singapore office has experience with wildlife resort permitting and has worked with HeiHomes contacts?" requires graph traversal (find person nodes with relevant attributes and relationship edges) plus semantic retrieval (surface the relevant context from their documents and communications).
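The Singapore example splits into a structured graph step followed by a retrieval step. A minimal sketch with hypothetical people, edges, and documents; word overlap stands in for vector similarity in the retrieval step so the example runs without an embedding model.

```python
# Hypothetical person nodes, relationship edges, and per-person documents.
PEOPLE = {
    "mei": {"office": "Singapore"},
    "raj": {"office": "Singapore"},
    "tom": {"office": "London"},
}
EDGES = {
    ("mei", "has_experience", "wildlife resort permitting"),
    ("mei", "worked_with", "contact-heihomes-1"),
    ("raj", "has_experience", "wildlife resort permitting"),
    ("tom", "has_experience", "wildlife resort permitting"),
    ("tom", "worked_with", "contact-heihomes-2"),
}
ORG_OF = {"contact-heihomes-1": "HeiHomes", "contact-heihomes-2": "HeiHomes"}
DOCS = {"mei": ["permit application for the wildlife resort", "holiday rota"]}


def graph_step(office, topic, org):
    """Structured filter: an attribute match plus two relationship constraints."""
    return sorted(
        p for p, attrs in PEOPLE.items()
        if attrs["office"] == office
        and (p, "has_experience", topic) in EDGES
        and any(s == p and r == "worked_with" and ORG_OF.get(o) == org for s, r, o in EDGES)
    )


def retrieval_step(person, query):
    """Stand-in for vector retrieval: rank the person's documents by word overlap."""
    q = set(query.lower().split())
    return sorted(DOCS.get(person, []), key=lambda d: len(q & set(d.split())), reverse=True)


for person in graph_step("Singapore", "wildlife resort permitting", "HeiHomes"):
    print(person, retrieval_step(person, "wildlife resort permit")[0])
# mei permit application for the wildlife resort
```

Raj has the experience but no HeiHomes edge, and Tom has both but the wrong office; only the traversal can exclude them, while only the retrieval step surfaces the supporting context for Mei.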

§ 10

Mental Model (how to think about it)

Think of the framework as building an org brain — a unified memory of everything the organization knows, connected the way the organization actually works. Instead of searching through email archives and calendar events separately, you query the brain: "Show me everything connected to the Q3 product launch." The brain returns a subgraph: the people involved, the decisions made, the documents produced, the follow-ups completed or outstanding. The KG is the brain's structure; the source documents are its detailed memories; RAG is the language interface.
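"Show me everything connected to the Q3 product launch" is, mechanically, a bounded neighbourhood query. A minimal sketch using breadth-first search over a hypothetical adjacency list; the hop limit is an assumed parameter that keeps the returned subgraph from spreading across the whole organisation.

```python
from collections import deque

# Hypothetical undirected adjacency list around a launch project node.
ADJ = {
    "q3-launch": ["alice", "dec-pricing", "doc-launch-plan"],
    "alice": ["q3-launch", "mtg-kickoff"],
    "dec-pricing": ["q3-launch", "task-update-site"],
    "doc-launch-plan": ["q3-launch"],
    "mtg-kickoff": ["alice", "bob"],
    "task-update-site": ["dec-pricing"],
    "bob": ["mtg-kickoff"],
}


def ego_subgraph(seed, max_hops):
    """Breadth-first search returning {node: hop distance} within `max_hops` of `seed`."""
    seen = {seed: 0}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # do not expand past the hop limit
        for nbr in ADJ.get(node, []):
            if nbr not in seen:
                seen[nbr] = seen[node] + 1
                queue.append(nbr)
    return seen


print(sorted(ego_subgraph("q3-launch", 2)))
# ['alice', 'dec-pricing', 'doc-launch-plan', 'mtg-kickoff', 'q3-launch', 'task-update-site']
```

The result contains the people, decisions, documents, and follow-up tasks around the launch; Bob, three hops away via the kickoff meeting, only appears if the hop limit is raised.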

§ 11

Enterprise Implications

Expertise discovery at scale: instead of asking around or searching directories, query the KG for who has experience with a topic, validated by their actual activity (meeting participation, document authorship, email discussion).

Task prioritization with context: the KG connects tasks to the decisions that generated them, the people responsible, and the deadlines agreed — giving task management systems full context, not just to-do items.

Cross-silo analytics: "which clients are discussed in the most meetings but have the fewest completed deliverables?" is a graph query that reveals at-risk relationships before they become problems.

The activity-centric model is directly applicable to any enterprise with complex internal communications — consulting firms, law firms, product companies, agencies.
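The cross-silo analytics question combines two aggregations over different edge types. A minimal sketch with hypothetical `DISCUSSED_IN` and `DELIVERABLES` edges; ranking by meeting count first and completed deliverables second is an assumed way to read "most meetings but fewest deliverables" (a ratio would be another reasonable choice).

```python
from collections import Counter

# Hypothetical edges: meetings that discussed a client, and deliverables with status.
DISCUSSED_IN = [
    ("client-acme", "mtg-1"), ("client-acme", "mtg-2"), ("client-acme", "mtg-3"),
    ("client-globex", "mtg-2"), ("client-globex", "mtg-4"),
    ("client-initech", "mtg-5"),
]
DELIVERABLES = [
    ("client-acme", "d-1", "open"),
    ("client-globex", "d-2", "done"),
    ("client-globex", "d-3", "done"),
    ("client-initech", "d-4", "done"),
]


def at_risk_clients():
    """Rank clients by meeting count (descending), then completed deliverables (ascending)."""
    meetings = Counter(client for client, _ in DISCUSSED_IN)
    completed = Counter(c for c, _, status in DELIVERABLES if status == "done")
    ranked = sorted(meetings, key=lambda c: (-meetings[c], completed[c]))
    return [(c, meetings[c], completed[c]) for c in ranked]


print(at_risk_clients())
# [('client-acme', 3, 0), ('client-globex', 2, 2), ('client-initech', 1, 1)]
```

The meeting counts would come from calendar and transcript data and the deliverable status from a project tracker; the query is only possible once both silos share client nodes.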

§ 12

Technical Mapping

RAG → semantic query interface over the KG: natural language → entity/relationship search → subgraph retrieval → LLM generates answer from subgraph context

Graph → activity-centric KG: people, meetings, decisions, projects, documents as nodes; attended, decided, produced, followed-up as edges
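Writing this schema down explicitly gives a cheap guard against some extraction errors: an LLM-proposed edge whose endpoint types do not fit can be rejected before it reaches the graph. A minimal sketch; the type names and the allowed endpoints for each edge type (for example, that a Meeting `produced` a Document) are assumptions, not the paper's definitions.

```python
# Hypothetical schema: edge type -> (allowed source node type, allowed target node type).
EDGE_SCHEMA = {
    "attended": ("Person", "Meeting"),
    "decided": ("Person", "Decision"),
    "produced": ("Meeting", "Document"),
    "followed_up": ("Person", "Decision"),
}
# Hypothetical typed nodes already in the graph.
NODE_TYPES = {
    "alice": "Person",
    "mtg-1": "Meeting",
    "dec-1": "Decision",
    "doc-1": "Document",
}


def validate(edge):
    """Return None if (source, relation, target) fits the schema, else the reason it does not."""
    source, relation, target = edge
    if relation not in EDGE_SCHEMA:
        return f"unknown relation {relation!r}"
    expected = EDGE_SCHEMA[relation]
    actual = (NODE_TYPES.get(source), NODE_TYPES.get(target))
    if actual != expected:
        return f"{relation} expects {expected}, got {actual}"
    return None


extracted = [("alice", "attended", "mtg-1"), ("doc-1", "attended", "mtg-1"), ("alice", "owns", "dec-1")]
for edge in extracted:
    print(edge, validate(edge) or "ok")
```

Schema checks catch type-level mistakes (a document "attending" a meeting) but not plausible-looking wrong edges between correctly typed nodes, so they complement rather than replace provenance and review.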

How they connect:

Ingestion: LLM extracts entities and relationships from each source type → writes to KG

Enrichment: LLM infers cross-source relationships (this email references this meeting decision) → adds edges

Indexing: vector embeddings are stored alongside graph nodes for semantic similarity search

Query: natural language → entity extraction → graph traversal from matched entities → retrieve associated text → generate answer
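The query path above can be sketched end to end. Entity extraction is an LLM step in the described framework; here a hypothetical alias lookup (`extract_entities`) stands in for it, traversal is limited to one hop, and the final LLM call is omitted, so the sketch stops at assembling the grounded prompt. The graph, texts, and aliases are invented.

```python
# Hypothetical miniature graph: node -> neighbours, plus source text attached to nodes.
GRAPH = {
    "project-atlas": ["alice", "dec-7"],
    "alice": ["project-atlas"],
    "dec-7": ["project-atlas"],
}
NODE_TEXT = {
    "alice": "Alice leads the Atlas migration.",
    "dec-7": "Decision 7: Atlas goes live on the new cluster in September.",
}
ALIASES = {"atlas": "project-atlas"}


def extract_entities(question):
    """Stand-in for LLM entity extraction: match known aliases in the question."""
    words = question.lower().replace("?", "").split()
    return [ALIASES[w] for w in words if w in ALIASES]


def retrieve_context(entities):
    """Traverse one hop from each matched entity and collect the attached text."""
    nodes = []
    for entity in entities:
        for node in [entity] + GRAPH.get(entity, []):
            if node not in nodes:
                nodes.append(node)
    return [NODE_TEXT[n] for n in nodes if n in NODE_TEXT]


def build_prompt(question):
    """Assemble the grounded prompt that would be sent to an LLM (call not shown)."""
    context = "\n".join(f"- {line}" for line in retrieve_context(extract_entities(question)))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


print(build_prompt("When does Atlas go live?"))
```

The decision text reaches the prompt because it is one edge away from the Atlas node, not because it is lexically similar to the question; that is the practical difference from retrieving chunks directly.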

§ 13

My Critique

LLM-based relationship inference is the weakest link. The paper demonstrates that it works on clean enterprise data; real enterprise data (poorly formatted emails, ambiguous meeting notes, inconsistent naming conventions) is likely to produce significantly noisier graphs.

Privacy and access control is barely addressed. A unified KG of all enterprise communications is an extremely sensitive artifact. The paper doesn't specify how to enforce per-user access controls on graph queries.
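Since the paper leaves this open, here is one possible approach, sketched under assumptions that are not from the paper: every node carries a set of groups allowed to read it, unknown nodes are denied by default, and the traversal itself refuses to enter unreadable nodes. Filtering during traversal, rather than after it, avoids leaking the existence of paths that run through hidden nodes.

```python
# Hypothetical ACLs: each node lists the groups allowed to read it.
NODE_ACL = {
    "project-atlas": {"all-staff"},
    "mtg-board": {"executives"},
    "dec-layoffs": {"executives"},
    "doc-roadmap": {"all-staff"},
}
NEIGHBOURS = {
    "project-atlas": ["mtg-board", "doc-roadmap"],
    "mtg-board": ["dec-layoffs"],
    "doc-roadmap": [],
    "dec-layoffs": [],
}


def can_read(user_groups, node):
    """Default-deny: a node without an ACL entry is unreadable."""
    return bool(NODE_ACL.get(node, set()) & user_groups)


def visible_reachable(start, user_groups):
    """Depth-first traversal that never enters or passes through unreadable nodes."""
    if not can_read(user_groups, start):
        return []
    stack, seen = [start], {start}
    while stack:
        node = stack.pop()
        for nbr in NEIGHBOURS.get(node, []):
            if nbr not in seen and can_read(user_groups, nbr):
                seen.add(nbr)
                stack.append(nbr)
    return sorted(seen)


print(visible_reachable("project-atlas", {"all-staff"}))
# ['doc-roadmap', 'project-atlas']
print(visible_reachable("project-atlas", {"all-staff", "executives"}))
# ['dec-layoffs', 'doc-roadmap', 'mtg-board', 'project-atlas']
```

In practice node ACLs would have to be derived from the permissions of the underlying emails, calendars, and documents, which is exactly the part that needs careful design.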

The framework is described but not open-sourced or made reproducible in detail — it's closer to a system description than a deployable solution.

Scalability: for a large enterprise (10,000+ employees, years of communications), the KG will be enormous. Graph construction, maintenance, and query performance at that scale require infrastructure the paper doesn't address.

§ 14

When this fails

High-volume communication environments where the graph grows faster than it can be maintained

Domains with strong naming inconsistencies (same person referred to by different names/handles across systems)

When LLM relationship inference errors compound — a wrong relationship edge can make correct graph traversals return wrong answers

Cross-jurisdictional enterprises where data residency requirements prevent centralizing communications into a unified KG
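The naming-inconsistency failure mode is usually handled with an entity-resolution pass before edges are written. A minimal sketch, assuming explicit "same person" evidence (for example, rows from an HR directory) and a union-find structure to merge identifiers transitively; the normalisation rules and the `SAME_PERSON` pairs are hypothetical.

```python
def normalize(identifier):
    """Canonicalise surface forms: lowercase, strip a chat '@' prefix and any email domain.

    Stripping the domain assumes a single corporate email domain.
    """
    ident = identifier.strip().lower().lstrip("@")
    return ident.split("@")[0] if "@" in ident else ident


class UnionFind:
    """Disjoint sets of identifiers that refer to the same person."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)


# Hypothetical evidence that two identifiers denote the same person.
SAME_PERSON = [
    ("John Smith", "jsmith@corp.example"),
    ("@JSmith", "jsmith@corp.example"),
    ("J. Smith (Sales)", "John Smith"),
]

uf = UnionFind()
for a, b in SAME_PERSON:
    uf.union(normalize(a), normalize(b))

print(uf.find(normalize("@jsmith")) == uf.find(normalize("J. Smith (Sales)")))  # True
```

Merges are transitive, so one wrong evidence pair fuses two people into one node and every traversal through that node inherits the error; this is the compounding failure described above, and a reason to merge only on explicit evidence rather than on name similarity alone.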

§ 15

Key Takeaways for the CIO

Watchlist priority: High — closest paper in this collection to a deployable enterprise semantic layer. Act within 12 months.

This paper describes what a well-executed enterprise AI knowledge initiative looks like: a unified, queryable knowledge layer built from existing internal communications and documents, without requiring a data warehouse migration. The use cases demonstrated (expertise discovery, task prioritisation, decision intelligence) are plausibly among the highest-ROI AI applications available to enterprises today, regardless of industry.

The practical case for CIOs:

Your most valuable institutional knowledge is currently unsearchable. Decisions made in meetings, expertise demonstrated in email threads, commitments made in chat — this is where knowledge actually lives in most organisations. It is invisible to AI systems, non-accumulating, and lost when people leave. This framework makes it queryable and persistent.

Expertise discovery is the highest-ROI immediate application. "Who in this organisation has worked on this type of problem before?" is a question every enterprise wastes significant time answering manually. A knowledge graph built from internal communications answers it in seconds, validated by actual activity rather than self-reported directory profiles.

This is the foundation layer for AI-native operations. Before deploying intelligent agents across an enterprise, a structured representation of what the organisation knows and how it is connected is required. This framework builds that foundation from data the organisation already has.

Privacy governance must be designed before deployment, not after. A unified knowledge graph of all internal communications is among the most sensitive data artefacts an organisation can create. Who can query what, under what conditions, with what audit logging — these are not implementation details. They are prerequisites.

Data quality in source communications will determine graph quality. Inconsistent naming, poor meeting note hygiene, and email threads that do not capture decisions will produce a noisy graph. A communications quality audit should precede any deployment.

Recommended action: Commission a data audit of your highest-value communication sources. Assess data quality and access control requirements in parallel. If both are manageable, this is a strong 12-month pilot candidate.

Reference

LLM-Powered Knowledge Graphs for Enterprise Intelligence and Analytics. https://arxiv.org/abs/2503.07993 (linked May 19, 2026)
