Knowledge Graph and entities: why Google must know your site before your readers do
Google doesn't see your site. It sees a graph of entities — people, organizations, places, concepts — interconnected by verifiable relationships. Until your publication is a node in that graph, you're publishing into the void. Here's the exact mechanism, and the protocol to become a recognized entity.
Most publishers still reason like it's 2015: you write an article, you wait for clicks. But Google moved past that model years ago. Today, your URL isn't the unit of reasoning anymore — you are, as a publisher, as a node in a knowledge graph. If that node doesn't exist, your best article is indistinguishable from noise, and Discover never pushes it to a user.
This article describes the exact mechanics of the Knowledge Graph on Google's side, why missing entity status shuts off every Discover faucet, and the 8-step protocol to graduate your publication from "anonymous URL" to recognized editorial entity. Everything is grounded in public Google papers, the NER / Pygmalion patents, and the patterns observed across 500,000 articles in our Discover corpus.
The Knowledge Graph, in practice: what Google actually "sees"
When Googlebot consumes your page, it doesn't store the raw HTML. It extracts entities through an internal NER (Named Entity Recognition) pipeline, then tries to disambiguate each entity against a knowledge base of over 5 billion nodes (the evolution of Freebase, acquired in 2010, enriched by Wikidata, Wikipedia, Common Crawl and licensed sources).
Every recognized entity gets a stable machine identifier:
- kg:/m/0k8z → Apple Inc. (the company)
- kg:/m/014j1m → apple (the fruit)
- kg:/m/0gqz → The New York Times
- kg:/m/07ssc → The Guardian
These IDs are the reference currency across every Google system: Search, Discover, News, YouTube, Maps. When your article mentions "The New York Times," Google doesn't store the string — it stores kg:/m/0gqz. Which means any mention without entity attachment flies under the radar of semantic reasoning.
Direct consequence: two sentences that read as equivalent to a human can be radically different to Google. "The famous American newspaper announced…" is a bag of keywords. "The New York Times announced…" is an entity-action relation the graph records, scores and propagates.
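To make the difference concrete, here is a minimal sketch of alias-based entity linking. The alias table and the `link_entities` function are illustrative assumptions, not Google's pipeline; the identifiers are simply the ones quoted in this article.

```python
import re

# Toy alias table (an assumption for illustration). A real linker would use
# surrounding context to choose among many candidates per surface form.
ALIASES = {
    "the new york times": "kg:/m/0gqz",
    "the guardian": "kg:/m/07ssc",
    "apple inc.": "kg:/m/0k8z",
}


def link_entities(sentence: str) -> list[str]:
    """Return the identifier of every known alias found in the sentence."""
    text = sentence.lower()
    found = []
    for alias, kgmid in ALIASES.items():
        # Match the alias only when it is not glued to other word characters.
        if re.search(r"(?<!\w)" + re.escape(alias) + r"(?!\w)", text):
            found.append(kgmid)
    return found


vague = link_entities("The famous American newspaper announced a new app.")
named = link_entities("The New York Times announced a new app.")
```

Under these assumptions, `vague` comes back empty while `named` carries one identifier: only the second sentence gives the graph something to attach a relation to.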
Why your site, too, must be an entity
Many publishers understand that the Knowledge Graph stores the topics of their articles. What they miss: their own publication must be a node in the graph as well. Without it, none of the following signals can be computed:
- Aggregated E-E-A-T score: Google needs a stable identifier to stack the reputation of all your articles into one bucket. No entity = no aggregation = score reset on every article.
- Topic classification: The pipeline that decides whether you're a "tech," "lifestyle" or "sports" source relies on the mention history of your kgmid. Without a node, you're tagged as a generic publisher and compete with everyone.
- Author-publication trust: The link between authors (Person schema) and publication (Organization schema) only holds if the organization resolves in the graph.
- Cross-surface signaling: When Discover hesitates to push one of your articles, it looks at your behavior on News, Search, YouTube. Without a unique entity, those signals are unreachable.
In the Discover corpus we index daily, sites without an identifiable Knowledge Graph node represent ~62% of active domains but only 9% of Discover impression volume. That gap, roughly 7× between domain share and impression share, is mechanically widened by the quality-demotion pipeline, which drops these articles before final scoring (see Discover's 20+ pipelines).
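The arithmetic behind that gap can be checked in a few lines. The two input figures are the ones quoted above; the per-domain comparison rests on the simplifying assumption that every active domain falls into exactly one of the two groups.

```python
# Shares held by sites without an identifiable Knowledge Graph node
# (figures quoted in the paragraph above).
domain_share_no_node = 0.62
impression_share_no_node = 0.09

# Ratio of domain share to impression share: the "7x" reading of the gap.
share_ratio = domain_share_no_node / impression_share_no_node

# Impressions per domain for each group, then the ratio between groups.
# Assumption: the remaining domains and impressions all belong to entity sites.
per_domain_entity = (1 - impression_share_no_node) / (1 - domain_share_no_node)
per_domain_no_node = impression_share_no_node / domain_share_no_node
per_domain_ratio = per_domain_entity / per_domain_no_node

print(f"share ratio: {share_ratio:.1f}x")
print(f"per-domain ratio: {per_domain_ratio:.1f}x")
```

Read per domain rather than per share, the disadvantage is larger than the headline figure suggests.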
How does Google decide you're an entity?
Node creation in the Knowledge Graph follows a 3-step process documented in the "A Web of Concepts" paper (Dalvi et al., 2009) and "Knowledge Graph Embeddings" (Sun et al., 2019):
1. Extraction — Recognition
The NER pipeline scans your content, schema, links, and tries to extract a candidate entity. For a domain to become a candidate:
- Recurring mention of the publication name across content (acronym + full name)
- Valid Organization or NewsMediaOrganization schema
- Identifiable logo, address, founders
- Root domain consistent with the brand name (no random-keyword-2026.com)
2. Disambiguation — Linking
Once the candidate is extracted, Google tries to tie it to an existing entity in its base. If The Daily Post is a candidate, Google compares against every Daily Post it already knows (UK paper, US weeklies, etc.) to resolve ambiguity. This is where sameAs becomes decisive: if your Organization schema points to your Wikidata entry, your LinkedIn company page, your Crunchbase profile and your official Twitter, Google has 4 anchors to say "OK, this is this entity."
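One way to picture why sameAs anchors help is to score each existing candidate by how many profile URLs it shares with your schema. This is a toy sketch: the candidate records, the example.org URLs and the `resolve` function are all hypothetical, and the real linking logic is not public.

```python
from typing import Optional

# Hypothetical records a knowledge base might hold for several publications
# named "The Daily Post". Every URL here is a placeholder.
CANDIDATES = {
    "daily-post-uk": {
        "https://example.org/wikidata/daily-post-uk",
        "https://example.org/x/dailypost_uk",
    },
    "daily-post-us": {
        "https://example.org/wikidata/daily-post-us",
        "https://example.org/linkedin/daily-post-us",
    },
}


def resolve(same_as: set[str]) -> Optional[str]:
    """Return the candidate sharing the most sameAs URLs, or None if no overlap."""
    best_id, best_overlap = None, 0
    for candidate_id, known_urls in CANDIDATES.items():
        overlap = len(same_as & known_urls)
        if overlap > best_overlap:
            best_id, best_overlap = candidate_id, overlap
    return best_id


match = resolve({
    "https://example.org/wikidata/daily-post-us",
    "https://example.org/linkedin/daily-post-us",
})
no_match = resolve({"https://example.org/unknown-profile"})
```

With no overlapping anchor the function returns `None`, which mirrors the situation of a publisher whose schema gives the graph nothing to cross-check.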
3. Enrichment — Knowledge propagation
Once attached (or created as a new node), your site receives edges (relations):
- topicalAuthority on the themes you cover most
- memberOf if you're affiliated with a media group
- founder / author toward named people
- knowsAbout on recurring concepts
These edges are the capillaries through which Discover evaluates your legitimacy on every article you publish.
The 8-step protocol to become a recognized entity
Here's the exact sequence that, applied end-to-end, moves a domain from "unknown" to "Knowledge Graph entity" in 3-9 months depending on the sector. No hack — just the signals Google expects.
Step 1 — Flawless Organization (or NewsMediaOrganization) schema
The exploitable minimum includes: @type, name, url, logo (1200×400 minimum, transparent background), foundingDate, founder, and a structured address. Validate it with the Schema Auditor before publishing — a broken schema zeros out the effort.
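As a starting point, the fields listed above can be assembled and serialized as JSON-LD like this. It is a minimal sketch: the publication name, domain, founder and address are placeholders, and the `@id` convention is an assumption rather than a requirement.

```python
import json

# Placeholder values throughout: replace with your publication's real data.
organization = {
    "@context": "https://schema.org",
    "@type": "NewsMediaOrganization",
    "@id": "https://www.example-news.com/#organization",
    "name": "Example News",
    "url": "https://www.example-news.com/",
    "logo": {
        "@type": "ImageObject",
        "url": "https://www.example-news.com/static/logo.png",
        "width": 1200,
        "height": 400,
    },
    "foundingDate": "2019-03-01",
    "founder": {"@type": "Person", "name": "Jane Example"},
    "address": {
        "@type": "PostalAddress",
        "streetAddress": "1 Example Street",
        "addressLocality": "Paris",
        "postalCode": "75001",
        "addressCountry": "FR",
    },
}

# Serialize and wrap in the script tag that goes in the page <head>.
json_ld = json.dumps(organization, indent=2, ensure_ascii=False)
script_tag = f'<script type="application/ld+json">\n{json_ld}\n</script>'
```

Generating the block from one dictionary keeps every template on the site emitting the same values, which matters for the consistency discussed in step 2.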
Step 2 — sameAs pointing to ≥5 verifiable profiles
This is the main weapon. Link your Organization to:
- Wikidata (create the entry if it doesn't exist — see step 4)
- LinkedIn company page (official URL, not a personal profile)
- Crunchbase (free for profile creation)
- X / Twitter official handle
- YouTube channel (even lightly active)
- Facebook page (if geographically relevant)
Each sameAs is an anchor point Google can crawl and cross-check. Consistency (same name, same logo, same founding date everywhere) matters more than quantity.
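A consistency check of this kind is easy to script once the values shown on each profile have been collected by hand. The sketch below assumes that manual collection; the field names, profile labels and the `inconsistencies` helper are illustrative.

```python
def inconsistencies(reference: dict, profiles: dict[str, dict]) -> list[str]:
    """List every profile field that disagrees with the reference values."""
    problems = []
    for source, record in profiles.items():
        for field, expected in reference.items():
            actual = record.get(field)
            # A missing field is not flagged here, only a conflicting one.
            if actual is not None and actual != expected:
                problems.append(
                    f"{source}: {field} is {actual!r}, expected {expected!r}"
                )
    return problems


# Values from your Organization schema (placeholders).
reference = {"name": "Example News", "foundingDate": "2019-03-01"}

# Values copied manually from each sameAs profile (placeholders).
profiles = {
    "wikidata": {"name": "Example News", "foundingDate": "2019-03-01"},
    "linkedin": {"name": "Example News Ltd", "foundingDate": "2019-03-01"},
    "crunchbase": {"name": "Example News"},
}

problems = inconsistencies(reference, profiles)
```

Here the LinkedIn name differs from the schema, so it is the one entry reported.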
Step 3 — Person schema for every recurring author
Every recurring author needs a Person schema with jobTitle, worksFor (pointing to your Organization), and sameAs toward their LinkedIn and X profiles. Without that, editorial authority doesn't "stick" to the publication — it scatters.
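A Person block along these lines could be generated per author. This is a sketch with placeholder names and URLs; pointing `worksFor` at the organization's `@id` is one common pattern and is assumed here, not prescribed by the source.

```python
import json

# Assumed identifier of the Organization block published on the same site.
ORG_ID = "https://www.example-news.com/#organization"


def person_schema(name: str, job_title: str, profiles: list[str]) -> dict:
    """Build a schema.org Person tied to the publication through worksFor."""
    return {
        "@context": "https://schema.org",
        "@type": "Person",
        "name": name,
        "jobTitle": job_title,
        "worksFor": {"@id": ORG_ID},
        "sameAs": profiles,
    }


author = person_schema(
    "Jane Example",
    "Senior Reporter",
    [
        "https://www.linkedin.com/in/jane-example",
        "https://x.com/jane_example",
    ],
)
author_json_ld = json.dumps(author, indent=2, ensure_ascii=False)
```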
Step 4 — Clean Wikidata entry
Create a Wikidata page for your publication with at minimum:
- P31 (instance of) → online newspaper or website
- P856 (official website) → your canonical domain
- P571 (inception) → founding date
- P112 (founded by) → person entity if applicable
- P17 (country) → main country
Wikidata is the tertiary source of truth for the Knowledge Graph. A clean entry often triggers the appearance of a KG node within 30-60 days.
Step 5 — Mentions from already-recognized entities
For Google to "validate" your entity status, other already-named entities need to mention you. Practical moves: get cited in a Wikipedia article (bibliographic references), earn 2-3 backlinks from established media (interview, study reused), get your publication listed in databases like Owler, Similarweb or recognized sector directories. Quantity doesn't matter — a single backlink from The Guardian or Forbes beats 500 mentions from anonymous blogs.
Step 6 — Topic consistency
Google builds your topical authority by looking at how concentrated your publications are. A site that covers 3 well-defined topics is more easily ranked than one that touches 30. If you're a Discover publisher, pick 2-4 main categories and stick with them for at least 6 months before diversifying.
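To track your own concentration over time, one simple proxy is the share of articles that fall in your most frequent categories. This metric is an illustrative assumption, not the measure Google uses; the category labels are placeholders.

```python
from collections import Counter


def top_k_share(categories: list[str], k: int = 4) -> float:
    """Share of articles that belong to the k most frequent categories."""
    if not categories:
        return 0.0
    counts = Counter(categories)
    in_top = sum(count for _, count in counts.most_common(k))
    return in_top / len(categories)


# One category label per published article (placeholder data).
focused = ["tech"] * 60 + ["ai"] * 30 + ["gadgets"] * 10
scattered = [f"topic-{i}" for i in range(30)] * 2  # 30 topics, 2 articles each

focused_share = top_k_share(focused)
scattered_share = top_k_share(scattered)
```

A site sticking to 2-4 categories scores close to 1.0 on this proxy, while one spread over 30 topics scores far lower.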
Step 7 — Active Web Profile
This is the final validation signal. Once Google has gathered enough signals, it assigns you a canonical URL profile.google.com/cp/<ID> that consolidates all your articles, your logo, your description. Check if this profile exists with our Profiler — if the URL returns your identity, you're officially a graph entity.
Step 8 — Maintenance and monitoring
Entity status isn't a once-and-done achievement. If you change authors, domain, or legal structure, the graph must be resynced via sameAs updates and Wikidata edits. A quarterly schema audit + a sameAs link check (404s, redirects) protects your node from silent demotion.
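The sameAs link check can be automated with the standard library. The sketch below is one possible approach: `http_status` and `audit_same_as` are hypothetical helper names, and some platforms reject HEAD requests or automated clients, so a "broken" result should be confirmed by hand before you act on it.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Stop urllib from following redirects so 3xx codes stay visible."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # urllib then raises HTTPError carrying the 3xx code


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code of a HEAD request, without redirects."""
    opener = urllib.request.build_opener(_NoRedirect)
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener.open(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code


def audit_same_as(
    urls: Iterable[str],
    fetch_status: Callable[[str], int] = http_status,
) -> dict[str, str]:
    """Classify each sameAs URL as ok, redirect, broken or unreachable."""
    report = {}
    for url in urls:
        try:
            status = fetch_status(url)
        except OSError:  # DNS failure, timeout, refused connection
            report[url] = "unreachable"
            continue
        if 200 <= status < 300:
            report[url] = "ok"
        elif 300 <= status < 400:
            report[url] = "redirect"
        else:
            report[url] = f"broken ({status})"
    return report
```

Calling `audit_same_as(organization_same_as_urls)` once a quarter, where the list comes from your own schema, gives a quick view of which anchors need updating. The `fetch_status` parameter exists so the function can be tested without network access.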
The 4 traps that keep your site invisible to Google
Across 500 audits we've run, these 4 anti-patterns account for ~80% of sites stuck before reaching entity status:
- Missing or broken Organization schema: a single name field isn't enough. Without logo, url, sameAs and foundingDate, Google doesn't even attempt resolution.
- Anonymous authors: empty bylines or generic pseudonyms ("Editorial Team," "Admin," "John Doe"). The author-publication pipeline never primes. Result: no E-E-A-T signal propagation.
- Non-vector or highly variable logo: Google expects a stable logo over time, in high resolution, served at the same URL. A logo that changes every 3 months or only comes in low-res PNG breaks visual resolution in the graph.
- Too much topical noise: a site simultaneously covering "tech," "health," "finance" and "lifestyle" forces Google to spread topical authority thin, diluting the score on every sub-domain. Concentration beats coverage.
How to check your entity status today — in 3 minutes
- Run the DiscoReady Profiler on your domain. A non-empty profile.google.com/cp/… URL = you're an entity.
- Google your publication name in private browsing. A Knowledge Panel on the right = confirmed.
- Audit your Organization schema with our Schema Auditor: at least 5 required properties + non-empty sameAs.
If all three pass, you're ready for Discover. If one fails, you know exactly where to focus your next 90 days of effort.
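For a quick local version of the schema check, the script below extracts JSON-LD from a page's HTML and reports which properties are missing. It is a rough sketch: the `REQUIRED` list is an assumption drawn from the properties named in this article, and JSON-LD nested under `@graph` is not handled.

```python
import json
from html.parser import HTMLParser
from typing import Optional

# Assumed property list, taken from the fields this article calls out.
REQUIRED = ("name", "url", "logo", "foundingDate", "sameAs")
ORG_TYPES = {"Organization", "NewsMediaOrganization"}


class JsonLdExtractor(HTMLParser):
    """Collect the parsed content of every application/ld+json script tag."""

    def __init__(self):
        super().__init__()
        self._in_json_ld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_json_ld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_json_ld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self._in_json_ld = False
            try:
                self.blocks.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # invalid JSON-LD is skipped, so it counts as absent


def missing_org_properties(html: str) -> Optional[list[str]]:
    """Return missing or empty properties, or None if no Organization is found."""
    parser = JsonLdExtractor()
    parser.feed(html)
    for block in parser.blocks:
        items = block if isinstance(block, list) else [block]
        for item in items:
            if not isinstance(item, dict):
                continue
            types = item.get("@type", [])
            types = [types] if isinstance(types, str) else types
            if ORG_TYPES.intersection(types):
                return [prop for prop in REQUIRED if not item.get(prop)]
    return None
```

An empty list means every assumed property is present and non-empty; `None` means the page exposes no Organization block at all, which corresponds to the first trap listed above.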
Conclusion
The Knowledge Graph is the invisible layer that decides who deserves to appear on Discover. Until your site is a recognized node, you stay stuck in the quality-demotion pipeline regardless of content quality. Conversely, a well-established entity compounds on every new publication, because reputation aggregates instead of resetting to zero each article.
The entity investment is structural, not a tactical optimization. It's what separates publishers capped at a few thousand Discover impressions from those harvesting millions. Start with the Profiler; the rest follows.
Frequently asked questions
What is an entity, concretely, in Google's terms?
An entity is an identifiable, unique concept — a person, organization, place, event, product — to which Google assigns a stable internal identifier (the famous kgmid or Machine-ID). Unlike a keyword whose interpretation shifts with context, an entity disambiguates: "Apple" the company (kgmid=/m/0k8z) is no longer conflated with "apple" the fruit (kgmid=/m/014j1m). Without an identifier, you don't exist inside Google's reasoning.
How do I check whether my site is already an entity?
Three quick tests: 1) Google your publication's exact name — if a Knowledge Panel shows on the right, you're an entity; 2) run your domain through the DiscoReady Profiler — an active Web Profile (URL profile.google.com/cp/…) proves Google has indexed you as a publisher; 3) inspect your Organization schema in Search Console (Structured data) — without a valid Organization tied to sameAs, the graph can't resolve you.
Why does Discover punish non-entity sites so harshly?
Because Discover has to decide within 80 ms whether to push your article to a user. Without entity status, no authority signal can be computed: no aggregated E-E-A-T score, no reliable topic classification, no cumulative reputation. The quality-demotion pipeline (see Discover's 20+ pipelines) drops these articles before final scoring — an "unknown publisher" loses ~85% of its exposure potential, based on patterns observed across 500,000 indexed articles.
How long does it take to become a recognized entity?
The typical timeline is 3-9 months for a new domain that applies the full protocol (Organization schema with sameAs pointing to ≥5 verified profiles, Wikidata presence, mentions on entities Google already knows, editorial consistency). Sites with a named team (authors with their own Person schema) and active social presence move faster — typically 4-5 months. AI content farms without identifiable authors never cross the threshold.
Is Wikipedia really required for the Knowledge Graph?
No, but it's the most powerful accelerator. The Knowledge Graph is fed by Wikidata, Wikipedia, Crunchbase, Common Crawl, IMDb, MusicBrainz and the structured data you supply yourself. A clean Wikidata entry (with P856 official website and P31 instance of = publisher / website) often suffices to trigger node creation. Wikipedia is the ideal endgame for critical-mass publishers but is neither required nor sufficient: flawless schema + Wikidata + 3-4 verified sameAs links cover 80% of the journey.


