Google Discover's 20+ pipelines: the real feed architecture
When people talk about Discover's "algorithm," they picture one monolithic black box. The reality is harsher: Discover chains together 20+ distinct pipelines, each capable of dropping or demoting your article. Here's the complete map.
When a publisher loses 60% of their Discover traffic overnight, the same answer comes back every time: "the algorithm changed." That phrasing is convenient for everyone except you, the person left with no idea where to look. The truth is both harsher and far more actionable: Discover isn't one algorithm. It's a chain of 20+ distinct pipelines running in cascade, and your article can be killed at any one of those links. Understanding that architecture is what stops you from shooting in the dark.
Discover stacks 20+ pipelines grouped into 5 phases: ingestion, classification, quality, personalization, re-ranking. Your article can fail at 20 different points, and each pipeline has its own failure signature. Knowing which one is throttling you is the difference between optimizing where it matters and "rewriting the headline" hoping it sticks.
Why pipelines, not "the algorithm"
The word "algorithm" suggests a single function: signals in one side, score out the other. That picture has been wrong for at least a decade. Google's IR research papers published between 2018 and 2024 keep describing the same architecture: a sequence of specialized classifiers passing content from one to the next, each annotating, filtering, or demoting.
The May 2024 Content Warehouse leak confirmed that structure with surgical precision. The 2,596 documented modules expose internal names like topic_embedding, helpful_content_score, nsr_data (Normalized Site Rank), chard_score (Content Effort), and imageQualityClickSignals. None of those is "the Discover algorithm." They're the separate parts the Discover chain assembles at runtime.
Officially, since 2023 Google has been talking about "many small models" — dozens of small classifiers trained independently and aggregated at inference. That's the same architecture Search has been running since BERT/MUM, and Discover inherited it. Practical consequence: a winning article has to survive 20+ successive filters, not charm a single judge.
The complete map: 5 phases, 20+ pipelines
You can group every Discover pipeline into 5 time-ordered phases. An article published at 9:00 AM walks the chain in this order, and every phase is blocking: if you fail in phase 2, you never reach phase 4. Here's the map.
- Phase 1 — Ingestion & indexation: Google discovers the URL, crawls it, renders it, extracts structural signals.
- Phase 2 — Classification & entity binding: the article is matched to topics, your site to entities, your authority on that topic is scored.
- Phase 3 — Quality scoring: editorial quality, hero image, dwell-time predictor, helpful-content classifier.
- Phase 4 — Personalization & ranking: matching against user interests, freshness boost, topic dedup, geo relevance.
- Phase 5 — Re-ranking & demotion: clickbait demotion, pogo-stick demotion, saturation cap, AI Overviews co-presence.
20+ pipelines, 5 phases, and only ~5% of published articles survive the entire chain to land in a user feed. Let's walk every phase in detail.
Phase 1 — Ingestion & indexation
This is the most mechanical, most testable phase, and the one where most small publishers lose the game without even knowing it. If Google doesn't ingest your article correctly here, nothing downstream happens.
1. Crawler reach
The discovery_url pipeline receives your URL via sitemap, internal links, hreflang, or an IndexNow ping, and decides how soon Googlebot visits. On a media site with weak crawl budget (DA < 40, few internal links pointing to fresh news), the gap between publication and first crawl can hit 6 to 24 hours. The Discover window is 48-72h. Do the math: you burn half your window before you're even seen.
Failure signature: zero impressions, Search Console > Coverage > Discovered — currently not indexed.
2. Render budget (JavaScript rendering)
Once crawled, the article goes through web_rendering_service (WRS), which runs JavaScript in a headless Chromium. If your main content only appears after client hydration (SPA, aggressive lazy-load), Google may not see your title, hero image, or article:published_time. Rendering happens in waves — first pass raw HTML, second pass rendered — and Discover mostly reads the second.
Failure signature: article indexed, but with a truncated title or generic thumbnail in Discover.
3. Canonical resolution
The canonical_url pipeline deduplicates. If your article exists at multiple URLs (UTM parameters, AMP, separate mobile version), Google picks one canonical URL — and not necessarily yours. The rel=canonical tag is a hint, not a directive. Bad canonicalization can route every engagement signal to a phantom URL.
Failure signature: Discover impressions on a URL you never wanted to push (often an AMP or a version missing the right JSON-LD).
4. Structured data extraction
The structured_data_parser pipeline extracts your NewsArticle or Article JSON-LD: author, publish date, image, publisher organization. A schema error — missing date, image < 1200px wide, missing author — and the article is downgraded to second-tier candidate. Discover heavily favors articles with clean schema. To check yours, the Schema Auditor tool lists the 12 critical properties Google expects (and flags classic traps like datePublished in MM/DD/YYYY format instead of ISO 8601).
Failure signature: article indexed in Search but never in Discover, hero image replaced by your logo in SERPs.
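To make those schema traps concrete, here is a minimal audit sketch. The required properties, the ISO 8601 date trap, and the 1200px image floor come from the description above; the function itself and the exact rule set are illustrative, not Google's parser:

```python
import re

# Illustrative subset of required NewsArticle properties (the text mentions
# 12 critical ones; these five are the most commonly missing).
REQUIRED = {"headline", "datePublished", "author", "image", "publisher"}
ISO_8601 = re.compile(r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}")

def audit_jsonld(article: dict, image_width_px: int) -> list:
    """Return a list of schema problems; an empty list means the article passes."""
    problems = ["missing: " + p for p in sorted(REQUIRED - article.keys())]
    date = article.get("datePublished", "")
    if date and not ISO_8601.match(date):
        problems.append("datePublished not ISO 8601 (MM/DD/YYYY trap)")
    if image_width_px < 1200:
        problems.append("image %dpx < 1200px minimum" % image_width_px)
    return problems

ok = {
    "headline": "Transfer window: the deal is done",
    "datePublished": "2024-05-12T09:00:00+02:00",
    "author": {"@type": "Person", "name": "Jane Doe"},
    "image": "https://example.com/hero.jpg",
    "publisher": {"@type": "Organization", "name": "Example Media"},
}
assert audit_jsonld(ok, 1600) == []
assert audit_jsonld({"datePublished": "05/12/2024"}, 800) != []
```

Running a check like this in CI, on every publish, is the cheapest insurance in the whole chain: phase 1 failures are binary and fully testable.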
Phase 2 — Classification & entity binding
With ingestion clean, the article enters a phase of semantic understanding. Google doesn't ship your article to the entire world: it first decides who might be interested, and that calculation starts by classifying the content itself.
5. Topic classifier
The topic_classifier module (BERT/MUM-style embeddings) projects your article into the space of ~5,000 topics in Google's taxonomy. The output is a probability vector: 0.82 "soccer," 0.11 "transfers," 0.05 "premier league," etc. If your article is too vague ("sports news"), no topic crosses 0.5 and it's classified "low-confidence topic" — making it ineligible for most personalized feeds.
Failure signature: negligible impressions despite decent Search traffic.
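The 0.5 confidence cut-off described above is easy to picture as a simple threshold over the probability vector. A toy sketch — the threshold value comes from the text, everything else is illustrative:

```python
def eligible_topics(probs: dict, threshold: float = 0.5) -> list:
    """Keep only the topics whose probability crosses the confidence threshold."""
    return [topic for topic, p in probs.items() if p >= threshold]

# A sharply focused article: one topic clearly dominates.
sharp = {"soccer": 0.82, "transfers": 0.11, "premier league": 0.05}
# A vague "sports news" piece: probability mass is smeared, nothing crosses 0.5.
vague = {"sports news": 0.31, "soccer": 0.24, "lifestyle": 0.18}

assert eligible_topics(sharp) == ["soccer"]
assert eligible_topics(vague) == []  # "low-confidence topic": feed-ineligible
```

The practical takeaway: an article can be well written and still be invisible to personalization simply because no single topic dominates its embedding.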
6. Entity resolver (Knowledge Graph)
The entity_resolution pipeline links the names cited in your article to Knowledge Graph entities: "Mbappé" → MID /m/0gn30, "Real Madrid" → MID /m/0g5lhl7. If Google can't disambiguate (homonyms, lack of context, local entities without a KG card), the article stays "non-entity" and loses its ticket to the users who follow those entities. Your site itself has to be a recognized entity — that's exactly what the Profiler tool verifies by retrieving your Google Web Profile.
Failure signature: 0 impressions on hot topics where your peers are dominating the feed.
7. Site authority on topic (NSR)
The nsr_data score (Normalized Site Rank) is not a global site score — it's a site × topic score. You can have an NSR of 0.85 on "tech" and 0.12 on "cooking." Discover only pushes your article if your site's NSR on that specific topic crosses a threshold. Generalist sites that "do everything" get crushed here.
Failure signature: your "pillar" articles (topics where you have authority) perform strongly, while your "experimental" articles on new themes never take off.
8. Language & locale detection
The locale_classifier pipeline detects the main language and target country (via ccTLD, hreflang tags, geo of backlinks). A poorly hreflanged English article can get pushed to French speakers — where it doesn't engage — and the behavioral feedback chain penalizes it instantly.
Failure signature: abnormally low Discover CTR (< 1.5%), with unexpected source countries in Search Console.
Phase 3 — Quality scoring
Once classified, the article enters the most discriminating phase. Per our internal audits, roughly 40% of technically eligible articles are eliminated at this stage. This is where "quality" becomes a numerical score, not a slogan.
9. Editorial quality model
The chard_score module (Content Effort, exposed in the 2024 leak) quantifies perceived editorial effort: useful length, text-to-HTML ratio, paragraph depth, presence of citations, originality versus the 50 competing articles on the same topic. A duplicated/spun/AI-generated article without added value falls below threshold and gets a permanent mark on the URL.
Failure signature: decent impressions for 2-4 hours then a brutal collapse — Google recomputed the score after the first batch of engagement.
10. Hero image scoring
The imageQualityClickSignals pipeline grades your hero image: sharpness, ratio (16:9 preferred), presence of human faces (≈ 12% boost), text overload (heavy penalty), effective resolution > 1200px. A blurry image or one stored at 800px kills CTR before pipeline 17 can even measure engagement. DiscoReady's Image Validator reproduces the 8 main checks.
Failure signature: your article shows up with a blurry or weirdly cropped thumbnail in Discover.
11. Dwell-time predictor
Before even shipping the article to users, Google predicts the average time spent. The dwell_predictor module uses features like length, Flesch readability, embedded images, H2/H3 structure, and the absence of an aggressive paywall. An article with a predicted dwell < 25 seconds simply never leaves the initial sandbox.
Failure signature: article published, indexed, but 0 Discover impressions in the first hours.
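As a toy illustration of how a predictor could combine those features before any user sees the article, here is a linear stand-in. The 25-second floor comes from the text; every weight is hypothetical, chosen only so the example behaves like the description:

```python
# Hypothetical weights — illustrative only, not Google's model.
WEIGHTS = {"per_word": 0.02, "per_image": 3.0, "structure": 8.0, "paywall": -20.0}

def predict_dwell_seconds(word_count: int, image_count: int,
                          has_h2_structure: bool, aggressive_paywall: bool) -> float:
    """Linear stand-in for the dwell predictor described above."""
    return (WEIGHTS["per_word"] * word_count
            + WEIGHTS["per_image"] * image_count
            + WEIGHTS["structure"] * has_h2_structure
            + WEIGHTS["paywall"] * aggressive_paywall)

DWELL_FLOOR = 25.0  # below this, the article never leaves the sandbox

# A structured 1,200-word piece with images clears the floor;
# a thin 400-word paywalled stub does not.
assert predict_dwell_seconds(1200, 3, True, False) >= DWELL_FLOOR
assert predict_dwell_seconds(400, 0, False, True) < DWELL_FLOOR
```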
12. Helpful-content classifier
The helpful_content_score is the most publicized of Google's classifiers. It globally penalizes sites that pile up thin content, AI-spam, or content lacking demonstrated expertise — and the score applies to the whole site, not the individual article. A single article published on an "unhelpful" site carries the mark.
Failure signature: coordinated collapse of Discover AND Search on the same day, across all your articles, after a Helpful Content Update.
Phase 4 — Personalization & ranking
Welcome to the phase where the article leaves pure intrinsic scoring and enters open competition. Here, your article is no longer judged alone: it's compared in real time to the ~10,000 candidates Google shortlisted for that user slot.
13. User-interest scoring
For each user, Google maintains an interest vector (built from web history, Search activity, YouTube subscriptions, past Discover swipes). The interest_match pipeline computes a dot product between your article vector (output of pipeline 5) and that user vector. Below a threshold, you don't appear in their feed — even if everything else is perfect.
Failure signature: decent global impressions but very narrow reach (few unique users).
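The dot-product matching described above is simple enough to sketch directly. The vectors reuse the topic dimensions from pipeline 5; the 0.35 threshold is hypothetical (the real cut-off isn't public):

```python
def interest_score(article_vec: list, user_vec: list) -> float:
    """Dot product between the article's topic vector (pipeline 5 output)
    and the per-user interest vector."""
    return sum(a * u for a, u in zip(article_vec, user_vec))

THRESHOLD = 0.35  # hypothetical cut-off for illustration

# Dimensions: [soccer, transfers, premier league]
soccer_article = [0.82, 0.11, 0.05]
soccer_fan     = [0.90, 0.30, 0.40]   # strong soccer interest
cooking_fan    = [0.01, 0.00, 0.02]   # essentially no overlap

assert interest_score(soccer_article, soccer_fan) >= THRESHOLD   # in the feed
assert interest_score(soccer_article, cooking_fan) < THRESHOLD   # filtered out
```

This is why the same article can saturate one audience segment and be invisible to another: the gate is per-user, not per-article.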
14. Freshness boost (the 2-96h decay curve)
The freshness_boost pipeline applies a multiplier that starts at 100% at H+2 and decays to 0% by H+96. The curve is not linear — it's category-specific.
- Live sports: peak at H+2, effective death at H+6 to H+12. A Champions League final generates nothing the morning after.
- Politics & breaking news: peak at H+4, half-life at H+12, death at H+24 to H+36.
- Tech & business: gentler curve, half-life at H+24, death at H+48-72.
- Lifestyle, food, travel: later peak (H+8), half-life at H+36, death at H+72 to H+96.
- Indexed evergreen: no freshness boost, but eligible for the "evergreen recommendation" pipeline (rare, ~3% of the feed).
Concretely: if your CMS publishes at 11:47 PM while your target readers consume Discover at 7:30 AM, you burn 8 hours of your curve with no readers. It's the costliest — and most invisible — editorial scheduling mistake on Discover.
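The category curves above can be sketched as a decay multiplier. The (peak, death) hours are rounded from the bullet list; the linear ramp is a stand-in for the real (non-linear, non-public) curve:

```python
# (peak hour, hour of effective death), rounded from the curves above.
CURVES = {
    "live_sports": (2, 12),
    "politics":    (4, 36),
    "tech":        (4, 72),
    "lifestyle":   (8, 96),
}

def freshness_boost(category: str, hours_since_publish: float) -> float:
    """Linear stand-in for the decay multiplier: full boost through the
    category peak, then a straight decline to zero at effective death."""
    peak, death = CURVES[category]
    if hours_since_publish <= peak:
        return 1.0
    if hours_since_publish >= death:
        return 0.0
    return (death - hours_since_publish) / (death - peak)

assert freshness_boost("live_sports", 2) == 1.0   # Champions League, full boost
assert freshness_boost("live_sports", 12) == 0.0  # dead by the next morning
# Lifestyle decays far more slowly than politics at the same hour.
assert freshness_boost("lifestyle", 12) > freshness_boost("politics", 12)
```

Plug in your own publish time: an article pushed at 11:47 PM has already spent most of a live-sports curve before the 7:30 AM feed refresh.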
15. Topic deduplication per session
Discover doesn't show 5 articles about the same final in a single feed. The session_dedup pipeline clusters candidates by topic and keeps only 1 or 2 per session. In a cluster where the NYT, the WSJ, and CNN publish simultaneously, two of them disappear — and the selection is made on NSR (pipeline 7) and freshness (pipeline 14).
Failure signature: your article was good, but a more authoritative competitor crushed it in the cluster.
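A minimal sketch of that clustering step — grouping candidates by topic and keeping the top one or two on NSR then freshness, exactly the tie-break the text describes (the data and keep-count are illustrative):

```python
from collections import defaultdict

def dedup_session(candidates: list, keep: int = 2) -> list:
    """Keep at most `keep` articles per topic cluster, ranked by NSR,
    with freshness as the tie-breaker."""
    clusters = defaultdict(list)
    for c in candidates:
        clusters[c["topic"]].append(c)
    survivors = []
    for group in clusters.values():
        group.sort(key=lambda c: (c["nsr"], c["freshness"]), reverse=True)
        survivors.extend(group[:keep])
    return survivors

cluster = [
    {"site": "nyt.com",      "topic": "final", "nsr": 0.91, "freshness": 0.8},
    {"site": "wsj.com",      "topic": "final", "nsr": 0.88, "freshness": 0.9},
    {"site": "cnn.com",      "topic": "final", "nsr": 0.86, "freshness": 0.7},
    {"site": "blog.example", "topic": "final", "nsr": 0.40, "freshness": 1.0},
]
survivors = {c["site"] for c in dedup_session(cluster)}
assert survivors == {"nyt.com", "wsj.com"}  # the cluster losers disappear
```

Note that blog.example has the freshest article in the cluster and still loses: within a cluster, authority outranks recency.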
16. Location-relevance
The geo_match pipeline weights by distance between the article's target geography and the user's. A piece on a local incident in Austin gets a massive boost for users in Texas and a near-zero score in Boston. For non-geolocalized sites, this pipeline is neutral — but for regional outlets, it's the most underused lever in the chain.
Phase 5 — Re-ranking & demotion
Last filter before delivery. This phase is reactive: it keeps readjusting throughout the article's first 48 hours of life, based on behavioral signals coming back from users. This is where many "fast starters" get crushed.
17. Clickbait demotion
The clickbait_score pipeline compares your headline to the article body (semantic embeddings) and to observed CTR. A headline that overpromises triggers immediate demotion the moment CTR exceeds expectations by more than 30%. Google publicly announced in March 2024 that this demoter was strengthened by a factor of ~3 on Discover.
Failure signature: impression spike at H+2, collapse at H+4 while competitors keep climbing.
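The over-promise pattern above combines two signals: CTR overshooting the model's expectation by more than 30% (the figure from the text) while headline-body semantic similarity is low. A sketch — the similarity floor and the AND-logic are assumptions:

```python
def clickbait_demoted(observed_ctr: float, expected_ctr: float,
                      headline_body_similarity: float,
                      similarity_floor: float = 0.5) -> bool:
    """Demote when CTR overshoots expectation by >30% (per the text) AND the
    headline barely matches the body. Floor and combination are hypothetical."""
    overshoot = observed_ctr > expected_ctr * 1.30
    overpromise = headline_body_similarity < similarity_floor
    return overshoot and overpromise

# Bait: huge CTR, headline embedding far from the body.
assert clickbait_demoted(0.065, 0.040, 0.31) is True
# A genuinely strong, honest headline can overperform without being demoted.
assert clickbait_demoted(0.060, 0.050, 0.84) is False
```

That second case is the important one: a high CTR alone is not the trigger; it's high CTR on a headline the body can't back up.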
18. Pogo-stick demotion
The navigationalcounts module measures fast returns to the feed (< 10 seconds after the click). Above a certain threshold, the article gets demoted. The threshold is dynamic per category — more tolerant in news (where users skim), stricter in lifestyle (where engagement is expected).
Failure signature: decent impressions and CTR, but the article dies at H+6 while the news is still hot.
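The fast-return logic can be sketched as a share-based check. The 10-second cut-off comes from the text; the demotion threshold is hypothetical (and, per the text, dynamic per category in reality):

```python
def pogo_demoted(return_times_s: list, fast_cutoff_s: float = 10.0,
                 max_fast_share: float = 0.4) -> bool:
    """Demote when too many clicks bounce back to the feed in under 10s.
    The 0.4 share is a placeholder for the real, category-dynamic threshold."""
    fast_returns = sum(1 for t in return_times_s if t < fast_cutoff_s)
    return fast_returns / len(return_times_s) > max_fast_share

assert pogo_demoted([3, 5, 7, 42, 6]) is True       # 4 of 5 readers bounced
assert pogo_demoted([45, 80, 12, 9, 120]) is False  # 1 of 5 — normal skimming
```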
19. Topic saturation cap
No user feed has more than ~30% of cards on the same topic. The topic_saturation pipeline enforces that ceiling. Consequence: during a major event (election, final, Apple launch), even an excellent article can be benched because the "politics" or "tech" quota is full for that user's session.
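One way to picture that quota: a greedy fill of the feed that skips any candidate whose topic has already used its ~30% allowance. Candidate ordering and feed size are illustrative:

```python
import math

def fill_feed(candidates: list, size: int = 10, cap: float = 0.30) -> list:
    """Fill a feed of `size` cards from (article_id, topic) pairs, best-scored
    first, skipping candidates whose topic quota (~30% of the feed) is full."""
    quota = max(1, math.floor(size * cap))
    counts, feed = {}, []
    for article, topic in candidates:
        if counts.get(topic, 0) < quota:
            feed.append(article)
            counts[topic] = counts.get(topic, 0) + 1
        if len(feed) == size:
            break
    return feed

# Election day: 8 excellent politics candidates, but only 3 slots exist.
candidates = ([("pol%d" % i, "politics") for i in range(8)]
              + [("tech%d" % i, "tech") for i in range(4)]
              + [("life%d" % i, "lifestyle") for i in range(4)])
feed = fill_feed(candidates, size=10)
assert sum(a.startswith("pol") for a in feed) == 3  # quota full, 5 benched
```

Which is the point of pipeline 19: on a big news day, the 4th-best politics article in the world still gets zero impressions in that user's session.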
20. AI Overviews co-presence (the 2025-2026 layer)
Since 2025, Discover has had to coexist with AI Overviews and AI Mode. The aim_coexist pipeline computes whether your article is cited in an AIO on an adjacent topic. Double-edged effect: being cited boosts your global credibility (authority signal), but can cannibalize the Discover click if the AIO directly answers the question. It's the single biggest disruption publishers are feeling in 2026.
There is no single Discover algorithm to charm. There are 20+ successive filters to survive — and each one kills you in a different way. Editorial competence in 2026 is knowing which one killed you.
How to diagnose where your article dropped
Now that the chain is clear, diagnosis becomes methodical. Each observable Search Console symptom maps to a probable pipeline (or small group).
- 0 impressions, ever → Phase 1. Likely: crawler reach (pipeline 1) or structured data (pipeline 4). Run an audit with Eligibility Check.
- Impressions in Search, 0 in Discover → Phase 2 (pipelines 5-7). Topic too vague or insufficient site/topic authority.
- Impression spike at H+2 then collapse at H+4-6 → Phase 5 (pipelines 17-18). Clickbait or pogo-stick.
- Decent impressions but CTR < 1.5% → Phase 3 (pipeline 10) or Phase 4 (pipeline 16). Hero image or geo/language misalignment.
- Article dies at H+24 regardless of topic → Phase 4 (pipeline 14). Wrong publication timing relative to the category-specific freshness curve.
- Coordinated collapse of Search AND Discover across the whole site → Phase 3 (pipeline 12). A Helpful Content Update flagged you.
- Good article, crushed by a competitor → Phase 4 (pipeline 15). Topic dedup, you lost the cluster.
This grid is exactly the one we'd apply after a traffic collapse. For a wider view of the strategic mistakes underneath these symptoms, the 5 myths to bury and why 95% of publishers fail at Discover cover the biases that often keep publishers from even reaching the right diagnosis. For the French-language counterpart of this analysis, see Les 20+ pipelines Discover : l'architecture cachée du feed.
Conclusion: optimize for the chain, not for "Discover"
The founding mistake most publishers make is treating Discover as a black box you charm with "a good headline and a nice image." That strategy fails on average 4 times out of 5 because it only addresses 2 or 3 pipelines out of 20. Your other 17 links stay broken — and one broken link is enough to kill the article.
The approach that works is mechanical: audit the 5 phases in order, find the 2-3 pipelines where your site is weak, and fix those first. The vast majority of traffic gain comes from 3 or 4 critical pipelines, not from cosmetic optimization everywhere. That's also the gap between general guides on Discover and a pipeline-by-pipeline audit: the first hands you the theory, the second hands you the precise link costing you 60% of your traffic.
Start with phase 1, because it's binary (crawled or not, schema valid or not) and there's no point optimizing phase 5 if phase 1 is broken. Then climb phase by phase. Every fixed pipeline unlocks the ones downstream. It's less glamorous than a "Discover strategy," but it's the only one that produces reproducible results.
Take action in 1 minute
Three free tools the editorial team uses daily — tested across French and international publishers.
📘 Want to go deeper? Grab the free Discover Essentials ebook (33 pages, 25-min read).
Frequently asked questions
Why frame Discover as 20+ pipelines rather than one algorithm?
Because that's how Google itself describes it in published papers and in the 2024 Content Warehouse leak. An "algorithm" is actually a chain of at least 20 independent sub-systems, each with its own input metric and output logic. This granularity changes everything: optimizing "for Discover" without knowing which pipeline is throttling you is shooting in the dark.
Which pipeline demotes the most articles?
The quality-demotion model tops the list: of articles that pass ingestion, roughly 40% are dropped from the feed before they're ever shown, mostly due to weak prior engagement signals on the domain. The second most filtering pipeline is topic deduplication, which prevents pushing two near-duplicate articles to the same user in one session.
Does the freshness pipeline really expire after 48h?
Not exactly — the freshness boost decays gradually from 100% to 0% between hour 2 and hour 96, depending on topic category (political news: 24-36h, lifestyle: 72-96h, live sports: 6-12h). Past that, the article stays eligible but loses the boost — only very high-authority evergreens still compete.
Can I diagnose which pipeline dropped my article?
Indirectly, yes. Search Console doesn't expose the pipelines, but the patterns are recognizable: an article with zero impressions was dropped at ingestion or quality; an article with impressions but 0 clicks got eliminated at the image/thumbnail pipeline; an article with impressions + clicks + bad dwell time is punished at the engagement pipeline and loses future exposure.
Could this architecture change overnight?
Individual pipelines evolve continuously, but the overall architecture (ingestion → classification → personalization → ranking → re-ranking) has been stable since 2022 per Google papers. Recent 2025-2026 additions are AI Overview models that plug in after ranking, not before — they extend the chain, they don't replace it.
Does your site have an active Google Web Profile?
No Discover tactic works if Google doesn't recognize you as an entity. 1 second to check, free.
Launch the Profiler →


