DiscoReady
Guide May 13, 2026 · 12 min read

Knowledge Graph and entities: why Google must know your site before your readers do

Google doesn't see your site. It sees a graph of entities — people, organizations, places, concepts — interconnected by verifiable relationships. Until your publication is a node in that graph, you're publishing into the void. Here's the exact mechanism, and the protocol to become a recognized entity.

[Image: conceptual visualization of Google's Knowledge Graph, with article cards and a search interface]

Most publishers still reason like it's 2015: you write an article, you wait for clicks. But Google moved past that model years ago. Today, your URL isn't the unit of reasoning anymore — you are, as a publisher, as a node in a knowledge graph. If that node doesn't exist, your best article is indistinguishable from noise, and Discover never pushes it to a user.

This article describes the exact mechanics of the Knowledge Graph on Google's side, why missing entity status shuts off every Discover faucet, and the 8-step protocol to graduate your publication from "anonymous URL" to recognized editorial entity. Everything is grounded in public Google papers, the NER / Pygmalion patents, and the patterns observed across 500,000 articles in our Discover corpus.

The Knowledge Graph, in practice: what Google actually "sees"

When Googlebot consumes your page, it doesn't store the raw HTML. It extracts entities through an internal NER (Named Entity Recognition) pipeline, then tries to disambiguate each entity against a knowledge base of over 5 billion nodes (the evolution of Freebase, which Google acquired with Metaweb in 2010, enriched by Wikidata, Wikipedia, Common Crawl and licensed sources).

Every recognized entity gets a stable machine identifier:

  • kg:/m/0k8z → Apple Inc. (the company)
  • kg:/m/014j1m → apple (the fruit)
  • kg:/m/0gqz → The New York Times
  • kg:/m/07ssc → United Kingdom (the country)

These IDs are the reference currency across every Google system: Search, Discover, News, YouTube, Maps. When your article mentions "The New York Times," Google doesn't store the string — it stores kg:/m/0gqz. Which means any mention without entity attachment flies under the radar of semantic reasoning.

Direct consequence: two sentences that read as equivalent to a human can be radically different to Google. "The famous American newspaper announced…" is a bag of keywords. "The New York Times announced…" is an entity-action relation the graph records, scores and propagates.
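To see which identifier Google attaches to a name, you can query the public Knowledge Graph Search API. The sketch below only builds the request URL and parses a response; the API key is a placeholder, and the sample response is hand-written in the API's documented shape rather than captured from a live call.

```python
from urllib.parse import urlencode

KG_SEARCH_ENDPOINT = "https://kgsearch.googleapis.com/v1/entities:search"


def build_kg_search_url(query: str, api_key: str, limit: int = 5) -> str:
    """Build a request URL for the Knowledge Graph Search API."""
    params = {"query": query, "key": api_key, "limit": limit}
    return f"{KG_SEARCH_ENDPOINT}?{urlencode(params)}"


def extract_entities(response: dict) -> list[tuple[str, str, str]]:
    """Return (kg_id, name, description) for each result in a response."""
    entities = []
    for element in response.get("itemListElement", []):
        result = element.get("result", {})
        entities.append(
            (result.get("@id", ""), result.get("name", ""), result.get("description", ""))
        )
    return entities


# Hand-written sample in the API's response shape (not a live response).
sample_response = {
    "itemListElement": [
        {"result": {"@id": "kg:/m/0k8z", "name": "Apple", "description": "Technology company"}},
        {"result": {"@id": "kg:/m/014j1m", "name": "Apple", "description": "Fruit"}},
    ]
}

url = build_kg_search_url("The New York Times", api_key="YOUR_API_KEY")  # placeholder key
entities = extract_entities(sample_response)
```

Two results with the same name but different `@id` values are exactly the disambiguation described above: the string is shared, the entity is not.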

Why your site, too, must be an entity

Many publishers understand that the Knowledge Graph stores the topics of their articles. What they miss: their own publication must be a node in the graph as well. Without it, none of the following signals can be computed:

  • Aggregated E-E-A-T score — Google needs a stable identifier to stack the reputation of all your articles into one bucket. No entity = no aggregation = score reset on every article.
  • Topic classification — The pipeline that decides whether you're a "tech," "lifestyle" or "sports" source relies on the mention history of your kgmid. Without a node, you're tagged generic publisher and compete with everyone.
  • Author-publication trust — The link between authors (Person schema) and publication (Organization schema) only holds if the organization resolves in the graph.
  • Cross-surface signaling — When Discover hesitates to push one of your articles, it looks at your behavior on News, Search, YouTube. Without a unique entity, those signals are unreachable.

In the Discover corpus we index daily, sites without an identifiable Knowledge Graph node represent ~62% of active domains but only 9% of Discover impression volume. That roughly 7× under-representation (62% of domains for 9% of impressions) is mechanically widened by the quality-demotion pipeline, which drops these articles before final scoring (see Discover's 20+ pipelines).
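The arithmetic behind that gap can be made explicit. The sketch below uses only the two figures quoted above and assumes that entity and non-entity domains together account for 100% of domains and 100% of impressions, which the article does not state outright.

```python
def under_representation(domain_share: float, impression_share: float) -> float:
    """How many times smaller a group's impression share is than its domain share."""
    return domain_share / impression_share


def per_domain_gap(domain_share: float, impression_share: float) -> float:
    """Average impressions per entity domain divided by the same for non-entity domains.

    Assumes the two groups partition all domains and all impressions.
    """
    non_entity_rate = impression_share / domain_share
    entity_rate = (1 - impression_share) / (1 - domain_share)
    return entity_rate / non_entity_rate


share_gap = under_representation(0.62, 0.09)   # about 6.9, the "7x" figure
domain_gap = per_domain_gap(0.62, 0.09)        # about 16.5
```

Under that assumption, the average entity domain earns roughly 16 times the impressions of the average non-entity domain, so the "7×" headline figure is the conservative reading.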

How does Google decide you're an entity?

Node creation in the Knowledge Graph follows a 3-step process (extract, link, enrich) in line with the approach described in the "A Web of Concepts" paper (Dalvi et al., 2009) and "Knowledge Graph Embeddings" (Sun et al., 2019):

1. Extraction — Recognition

The NER pipeline scans your content, schema, links, and tries to extract a candidate entity. For a domain to become a candidate:

  • Recurring mention of the publication name across content (acronym + full name)
  • Valid Organization or NewsMediaOrganization schema
  • Identifiable logo, address, founders
  • Root domain consistent with the brand name (no random-keyword-2026.com)

2. Disambiguation — Linking

Once the candidate is extracted, Google tries to tie it to an existing entity in its base. If The Daily Post is a candidate, Google compares against every Daily Post it already knows (UK paper, US weeklies, etc.) to resolve ambiguity. This is where sameAs becomes decisive: if your Organization schema points to your Wikidata entry, your LinkedIn company page, your Crunchbase profile and your official Twitter, Google has 4 anchors to say "OK, this is this entity."
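As a toy illustration of why sameAs anchors help, the sketch below scores each known "Daily Post" by how many profile URLs it shares with the candidate. This is not Google's algorithm; the function names, entity labels and URLs are all hypothetical.

```python
def normalize(url: str) -> str:
    """Lowercase and strip scheme, 'www.' and trailing slash so URLs compare cleanly."""
    url = url.strip().lower()
    for prefix in ("https://", "http://"):
        if url.startswith(prefix):
            url = url[len(prefix):]
    if url.startswith("www."):
        url = url[4:]
    return url.rstrip("/")


def best_match(candidate_same_as, known_entities):
    """Pick the known entity sharing the most profile URLs with the candidate."""
    anchors = {normalize(u) for u in candidate_same_as}
    scores = {
        name: len(anchors & {normalize(u) for u in urls})
        for name, urls in known_entities.items()
    }
    name = max(scores, key=scores.get)
    return (name, scores[name]) if scores[name] > 0 else (None, 0)


# Hypothetical entities already in the knowledge base.
known = {
    "The Daily Post (UK)": [
        "https://www.linkedin.com/company/daily-post-uk",
        "https://x.com/dailypostuk",
    ],
    "The Daily Post (US weekly)": ["https://x.com/dailypostus"],
}

# sameAs values taken from the candidate's Organization schema (hypothetical).
candidate = [
    "https://linkedin.com/company/daily-post-uk/",
    "https://X.com/dailypostuk",
]

match = best_match(candidate, known)
```

With no sameAs at all, every candidate scores zero and the name alone cannot separate the two papers.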

3. Enrichment — Knowledge propagation

Once attached (or created as a new node), your site receives edges (relations):

  • topicalAuthority on the themes you cover most
  • memberOf if you're affiliated with a media group
  • founder / author toward named people
  • knowsAbout on recurring concepts

These edges are the capillaries through which Discover evaluates your legitimacy on every article you publish.

The 8-step protocol to become a recognized entity

Here's the exact sequence that, applied end-to-end, moves a domain from "unknown" to "Knowledge Graph entity" in 3-9 months depending on the sector. No hack — just the signals Google expects.

Step 1 — Flawless Organization (or NewsMediaOrganization) schema

The usable minimum includes: @type, name, url, logo (1200×400 minimum, transparent background), foundingDate, founder, and a structured address. Validate it with the Schema Auditor before publishing, because a broken schema wipes out the effort.
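Here is a minimal sketch of that schema, generated in Python so it can be templated into every page. The property names come from schema.org; the publication name, URLs, founder and address are placeholders to replace with your own.

```python
import json


def build_organization_jsonld(name, url, logo_url, founding_date, founder_name, address):
    """Assemble the minimum Organization properties listed in Step 1."""
    return {
        "@context": "https://schema.org",
        "@type": "NewsMediaOrganization",
        "name": name,
        "url": url,
        "logo": {"@type": "ImageObject", "url": logo_url},
        "foundingDate": founding_date,
        "founder": {"@type": "Person", "name": founder_name},
        "address": {"@type": "PostalAddress", **address},
    }


def to_script_tag(data: dict) -> str:
    """Serialize JSON-LD into the <script> element that goes in the page <head>."""
    body = json.dumps(data, indent=2, ensure_ascii=False)
    return '<script type="application/ld+json">\n' + body + "\n</script>"


# Placeholder values for a hypothetical publication.
organization = build_organization_jsonld(
    name="Example Daily",
    url="https://example.com/",
    logo_url="https://example.com/static/logo-1200x400.png",
    founding_date="2019-03-01",
    founder_name="Jane Placeholder",
    address={
        "streetAddress": "1 Example Street",
        "addressLocality": "Paris",
        "postalCode": "75001",
        "addressCountry": "FR",
    },
)

print(to_script_tag(organization))
```

Generating the block from one function keeps name, logo and founding date identical on every page, which matters as much as the properties themselves.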

Step 2 — sameAs pointing to ≥5 verifiable profiles

This is the main weapon. Link your Organization to:

  • Wikidata (create the entry if it doesn't exist — see step 4)
  • LinkedIn company page (official URL, not a personal profile)
  • Crunchbase (free for profile creation)
  • X / Twitter official handle
  • YouTube channel (even lightly active)
  • Facebook page (if geographically relevant)

Each sameAs is an anchor point Google can crawl and cross-check. Consistency (same name, same logo, same founding date everywhere) matters more than quantity.

Step 3 — Person schema for every recurring author

Every recurring author needs a Person schema with jobTitle, worksFor (pointing to your Organization), and sameAs toward their LinkedIn and X profiles. Without that, editorial authority doesn't "stick" to the publication — it scatters.
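A matching Person block might look like the sketch below. It assumes your Organization markup exposes an `@id` (here a placeholder `https://example.com/#organization`) so that worksFor can point at it; the author name and profile URLs are hypothetical.

```python
import json

# Hypothetical @id of the publisher's Organization node.
ORGANIZATION_ID = "https://example.com/#organization"


def build_person_jsonld(name, job_title, profile_urls, organization_id=ORGANIZATION_ID):
    """Build Person markup whose worksFor references the publisher by @id."""
    return {
        "@context": "https://schema.org",
        "@type": "Person",
        "name": name,
        "jobTitle": job_title,
        "worksFor": {"@type": "Organization", "@id": organization_id},
        "sameAs": list(profile_urls),
    }


author = build_person_jsonld(
    name="Alex Placeholder",
    job_title="Senior Reporter",
    profile_urls=[
        "https://www.linkedin.com/in/alex-placeholder",
        "https://x.com/alexplaceholder",
    ],
)

print(json.dumps(author, indent=2))
```

Referencing the organization by `@id` instead of repeating its name means every author page points at the same node, whatever spelling appears in the byline.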

Step 4 — Clean Wikidata entry

Create a Wikidata page for your publication with at minimum:

  • P31 (instance of) → online newspaper or website
  • P856 (official website) → your canonical domain
  • P571 (inception) → founding date
  • P112 (founded by) → person entity if applicable
  • P17 (country) → main country

Wikidata is the tertiary source of truth for the Knowledge Graph. A clean entry often triggers the appearance of a KG node within 30-60 days.
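Once the entry exists, you can check it programmatically. The sketch below builds a `wbgetentities` request for the Wikidata API and lists which of the properties above are missing; the item ID `Q00000000` and the sample response are placeholders, and P112 is treated as optional because it only applies when there is a founder to name.

```python
from urllib.parse import urlencode

WIKIDATA_API = "https://www.wikidata.org/w/api.php"

# Properties from the list above; P112 (founded by) is treated as optional.
REQUIRED = {"P31": "instance of", "P856": "official website", "P571": "inception", "P17": "country"}


def build_claims_url(qid: str) -> str:
    """Build a wbgetentities request that returns only the item's claims."""
    params = {"action": "wbgetentities", "ids": qid, "props": "claims", "format": "json"}
    return f"{WIKIDATA_API}?{urlencode(params)}"


def missing_properties(api_response: dict, qid: str) -> list[str]:
    """Return the required property IDs that have no statement on the item."""
    claims = api_response.get("entities", {}).get(qid, {}).get("claims", {})
    return [pid for pid in REQUIRED if not claims.get(pid)]


# Hand-written sample in the API's response shape, trimmed to what the check reads.
sample = {
    "entities": {
        "Q00000000": {
            "claims": {
                "P31": [{"mainsnak": {"snaktype": "value"}}],
                "P856": [{"mainsnak": {"snaktype": "value"}}],
            }
        }
    }
}

claims_url = build_claims_url("Q00000000")
gaps = missing_properties(sample, "Q00000000")
```

An empty result means the minimum statements are in place; anything else tells you which line of the entry to complete first.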

Step 5 — Mentions from already-recognized entities

For Google to "validate" your entity status, other already-named entities need to mention you. Practical moves: get cited in a Wikipedia article (bibliographic references), earn 2-3 backlinks from established media (interview, study reused), get your publication listed in databases like Owler, Similarweb or recognized sector directories. Quantity doesn't matter — a single backlink from The Guardian or Forbes beats 500 mentions from anonymous blogs.

Step 6 — Topic consistency

Google builds your topical authority by looking at how concentrated your publications are. A site that covers 3 well-defined topics is more easily ranked than one that touches 30. If you're a Discover publisher, pick 2-4 main categories and stick with them for at least 6 months before diversifying.
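One rough way to track this yourself is the share of your recent articles that fall into your top few categories. The metric below is our own illustrative proxy, not a Google signal, and the category labels are made up.

```python
from collections import Counter


def top_k_share(categories: list[str], k: int = 3) -> float:
    """Fraction of articles that belong to the k most frequent categories."""
    if not categories:
        return 0.0
    counts = Counter(categories)
    top = sum(count for _, count in counts.most_common(k))
    return top / len(categories)


# One label per published article over the period you are reviewing (made-up data).
recent_articles = ["tech"] * 6 + ["finance"] * 3 + ["lifestyle"] * 1

concentration = top_k_share(recent_articles, k=2)
```

Here two categories cover 90% of output. A value that drifts down quarter after quarter is an early sign that coverage is spreading faster than authority.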

Step 7 — Active Web Profile

This is the final validation signal. Once Google has gathered enough signals, it assigns you a canonical URL profile.google.com/cp/<ID> that consolidates all your articles, your logo, your description. Check if this profile exists with our Profiler — if the URL returns your identity, you're officially a graph entity.

Step 8 — Maintenance and monitoring

Entity status isn't a once-and-done achievement. If you change authors, domain, or legal structure, the graph must be resynced via sameAs updates and Wikidata edits. A quarterly schema audit + a sameAs link check (404s, redirects) protects your node from silent demotion.
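The sameAs link check is easy to script. The sketch below sends a HEAD request to each URL and flags errors and redirects; the function names are ours, and the fetcher is injectable so the classification logic can be exercised without network access.

```python
from urllib import error, request


def fetch(url: str, timeout: float = 10.0) -> tuple[int, str]:
    """Return (status_code, final_url) for a HEAD request; urlopen follows redirects."""
    req = request.Request(url, method="HEAD")
    try:
        with request.urlopen(req, timeout=timeout) as resp:
            return resp.status, resp.geturl()
    except error.HTTPError as exc:   # 4xx / 5xx responses
        return exc.code, url
    except error.URLError:           # DNS failure, refused connection, timeout
        return 0, url


def audit_links(urls, fetcher=fetch) -> dict[str, str]:
    """Classify each sameAs URL as ok, redirected, broken or unreachable."""
    report = {}
    for url in urls:
        status, final_url = fetcher(url)
        if status == 0:
            report[url] = "unreachable"
        elif status >= 400:
            report[url] = "broken"
        elif final_url.rstrip("/") != url.rstrip("/"):
            report[url] = "redirected"
        else:
            report[url] = "ok"
    return report


# Offline example with a fake fetcher and hypothetical URLs.
fake_results = {
    "https://example.com/profile": (200, "https://example.com/profile"),
    "https://example.com/old": (200, "https://example.com/new"),
    "https://example.com/gone": (404, "https://example.com/gone"),
}
report = audit_links(fake_results, fetcher=lambda u: fake_results[u])
```

Some platforms reject automated HEAD requests, so treat a "broken" result as a prompt to check the profile by hand before editing your schema.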

The 4 traps that keep your site invisible to Google

Across 500 audits we've run, these 4 anti-patterns account for ~80% of sites stuck at the entity stage:

  • Missing or broken Organization schema — a single name field isn't enough. Without logo, url, sameAs and foundingDate, Google doesn't even attempt resolution.
  • Anonymous authors — empty bylines or generic pseudonyms ("Editorial Team," "Admin," "John Doe"). The author-publication pipeline never kicks in. Result: no E-E-A-T signal propagation.
  • Non-vector or highly variable logo — Google expects a stable logo over time, in high resolution, served at the same URL. A logo that changes every 3 months or only comes in low-res PNG breaks visual resolution in the graph.
  • Too much topical noise — a site simultaneously covering "tech," "health," "finance" and "lifestyle" forces Google to spread topical authority thin, diluting the score on every sub-topic. Concentration beats coverage.

How to check your entity status today — in 3 minutes

  1. Run the DiscoReady Profiler on your domain. A non-empty profile.google.com/cp/… URL = you're an entity.
  2. Google your publication name in private browsing. A Knowledge Panel on the right = confirmed.
  3. Audit your Organization schema with our Schema Auditor — at least 5 required properties + non-empty sameAs.

If all three pass, you're ready for Discover. If one fails, you know exactly where to focus your next 90 days of effort.
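If you want a quick local version of the third check before running a full audit, the sketch below tests a parsed Organization block against the Step 1 properties and the Step 2 sameAs threshold. The required list and the threshold of 5 mirror this article's protocol, not an official Google specification.

```python
# Properties from Step 1 and the sameAs threshold from Step 2 of this article.
REQUIRED_PROPERTIES = ("name", "url", "logo", "foundingDate", "founder", "address")
MIN_SAME_AS = 5


def audit_organization(schema: dict) -> list[str]:
    """Return a list of problems; an empty list means the block passes this check."""
    problems = []
    if schema.get("@type") not in ("Organization", "NewsMediaOrganization"):
        problems.append("@type is not Organization or NewsMediaOrganization")
    for prop in REQUIRED_PROPERTIES:
        if not schema.get(prop):
            problems.append(f"missing or empty: {prop}")
    same_as = schema.get("sameAs") or []
    if isinstance(same_as, str):  # sameAs may be a single URL rather than a list
        same_as = [same_as]
    if len(set(same_as)) < MIN_SAME_AS:
        problems.append(f"sameAs has {len(set(same_as))} distinct URLs, expected at least {MIN_SAME_AS}")
    return problems


# A deliberately incomplete block (placeholder values).
incomplete = {
    "@type": "Organization",
    "name": "Example Daily",
    "url": "https://example.com/",
    "sameAs": ["https://x.com/exampledaily"],
}

issues = audit_organization(incomplete)
```

Each returned message maps to one concrete fix, which keeps the 90-day plan short and specific.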

Conclusion

The Knowledge Graph is the invisible layer that decides who deserves to appear on Discover. Until your site is a recognized node, you stay stuck in the quality-demotion pipeline regardless of content quality. Conversely, a well-established entity compounds on every new publication, because reputation aggregates instead of resetting to zero each article.

The entity investment is structural, not a tactical optimization. It's what separates publishers capped at a few thousand Discover impressions from those harvesting millions. Start with the Profiler; the rest follows.

Frequently asked questions

What is an entity, concretely, in Google's terms?

An entity is an identifiable, unique concept — a person, organization, place, event, product — to which Google assigns a stable internal identifier (the famous kgmid or Machine-ID). Unlike a keyword whose interpretation shifts with context, an entity disambiguates: "Apple" the company (kgmid=/m/0k8z) is no longer conflated with "apple" the fruit (kgmid=/m/014j1m). Without an identifier, you don't exist inside Google's reasoning.

How do I check whether my site is already an entity?

Three quick tests: 1) Google your publication's exact name — if a Knowledge Panel shows on the right, you're an entity; 2) run your domain through the DiscoReady Profiler — an active Web Profile (URL profile.google.com/cp/…) proves Google has indexed you as a publisher; 3) inspect your Organization schema in Search Console (Structured data) — without a valid Organization tied to sameAs, the graph can't resolve you.

Why does Discover punish non-entity sites so harshly?

Because Discover has to decide within 80 ms whether to push your article to a user. Without entity status, no authority signal can be computed: no aggregated E-E-A-T score, no reliable topic classification, no cumulative reputation. The quality-demotion pipeline (see Discover's 20+ pipelines) drops these articles before final scoring — an "unknown publisher" loses ~85% of its exposure potential, based on patterns observed across 500,000 indexed articles.

How long does it take to become a recognized entity?

The typical timeline is 3-9 months for a new domain that applies the full protocol (Organization schema with sameAs pointing to ≥5 verified profiles, Wikidata presence, mentions on entities Google already knows, editorial consistency). Sites with a named team (authors with their own Person schema) and active social presence move faster — typically 4-5 months. AI content farms without identifiable authors never cross the threshold.

Is Wikipedia really required for the Knowledge Graph?

No, but it's the most powerful accelerator. The Knowledge Graph is fed by Wikidata, Wikipedia, Crunchbase, Common Crawl, IMDb, MusicBrainz and the structured data you supply yourself. A clean Wikidata entry (with P856 official website and P31 instance of = publisher / website) often suffices to trigger node creation. Wikipedia is the ideal endgame for critical-mass publishers but is neither required nor sufficient: flawless schema + Wikidata + 3-4 verified sameAs links cover 80% of the journey.

Step 0 — Verification

Does your site have an active Google Web Profile?

No Discover tactic works if Google doesn't recognize you as an entity. 1 second to check, free.

✨ Written by
The DiscoReady team

The French experts on Google Discover. Our Profiler tool helps publishers detect and master their Google Web Profile — the mandatory first step to appear in Discover.