MustntMiss & NewsStoriesHeadlines: how Google identifies the news that matters
In May 2024, the Google Content Warehouse API leak exposed two internal signals nobody was supposed to see: MustntMiss and NewsStoriesHeadlines. Behind the scenes, they decide which news stories Google considers "must-not-miss" and pushes to Discover and Top Stories.
In May 2024, more than 14,000 internal files leaked from the Google Content Warehouse API. Among them, two names keep showing up in the analyses by SEOs who dissected the leak: MustntMiss and NewsStoriesHeadlines. Neither has ever been documented publicly by Google. Together they answer a simple question — which news stories does Google consider important enough that you shouldn't miss them? — and how they work explains why some articles land at the top of Discover or Top Stories while others, perfectly written, stay invisible.
This article unpacks the mechanic: what these two signals actually are, how they connect, what criteria Google uses to flip MustntMiss on, and most importantly what a publisher can do concretely to maximize their odds of clearing the filter.
Context: what exactly leaked?
On May 5, 2024, a private GitHub repository containing the internal documentation of Google's Content Warehouse API was briefly made public, then archived by several researchers before it closed back up. Inside: definitions for 2,596 modules and 14,014 attributes that make up the ranking pipeline behind Google Search, Discover and News.
Google never publicly acknowledged the leak: official communication was limited to a spokesperson saying not to "make assumptions out of context." But the repo was not pulled from public archives, nor were the files DMCA'd, which the community reads as tacit acknowledgement. Our ultimate guide to what Search won't tell you walks through the major signals exposed by the leak.
NewsStoriesHeadlines: the list comes before the priority
To understand MustntMiss, you first need to understand where it applies. NewsStoriesHeadlines is a data structure bundling, for a given query or topic, every candidate headline that could be displayed in the "news" blocks Google can generate (Top Stories, news carousel in the SERP, Discover's "front page" card, etc.).
What this structure holds for each candidate:
- The editorial title (title)
- The publication timestamp (publication_time)
- The publisher source recognized as an entity (source.entity_id)
- The topical-authority score on the subject (topical_authority)
- A dynamic freshness metric (recency_score)
- And the flag we care about: mustnt_miss (true/false)
Concretely: if your article doesn't even appear in NewsStoriesHeadlines, it's not a candidate. No candidate, no MustntMiss to flip. This is the pre-selection stage, gated by publication freshness and your domain being recognized as a legitimate source on the topic.
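To make the structure easier to picture, here is a minimal sketch of one candidate entry. The field names follow the list above; the Python types, the 0 to 1 score ranges, the class name HeadlineCandidate, the helper flagged and the sample data are all assumptions made for illustration, not Google's actual schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class HeadlineCandidate:
    """Hypothetical shape of one entry; not Google's real schema."""
    title: str
    publication_time: datetime
    source_entity_id: str      # stands in for source.entity_id
    topical_authority: float   # assumed to be normalized to 0..1
    recency_score: float       # assumed to be normalized to 0..1
    mustnt_miss: bool = False


def flagged(candidates: list[HeadlineCandidate]) -> list[HeadlineCandidate]:
    """Return only the candidates whose mustnt_miss flag is set."""
    return [c for c in candidates if c.mustnt_miss]


# Made-up sample inventory for a single topic.
candidates = [
    HeadlineCandidate(
        "ECB cuts rates",
        datetime(2024, 6, 6, 12, 15, tzinfo=timezone.utc),
        "entity/a", 0.9, 0.95, True,
    ),
    HeadlineCandidate(
        "Rate cut explained",
        datetime(2024, 6, 6, 20, 0, tzinfo=timezone.utc),
        "entity/b", 0.4, 0.6,
    ),
]
print([c.title for c in flagged(candidates)])
```

The point of the sketch is the ordering: an article has to exist as an entry in the list before the flag can be set on it.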
MustntMiss: the flag that changes everything
MustntMiss is a binary flag: an article either has it or doesn't. When it's true, Google considers the article "important enough for this user, on this topic, at this moment, that we have to show it." That unlocks three behaviors observable in production:
- Promotion at the top of Discover, sometimes for several days on long-running stories (elections, disasters, major product launches)
- Inclusion in the Top Stories carousel on the SERP even for secondary queries related to the topic
- Push notification in the Google app for users who have enabled topic alerts
The flag is recomputed continuously: an article can earn it 30 minutes after publication, lose it 6 hours later when another publisher posts a better take, and reclaim it if you update with exclusive information. It's not a frozen decision — it's a living signal.
The 3 conditions to unlock MustntMiss
The leak doesn't reveal the exact formula (the coefficients stay private), but the attributes read by the MustntMissScorer module are named. They cluster around three signal families, and all three need to be green at the same time.
1. Strict freshness
Topics eligible for MustntMiss are almost always new events or reactions to one. The module reads recency_score, which decays fast after publication: very high for the first 2 hours, still elevated up to 6 hours, then drops sharply. Beyond 24 hours, MustntMiss eligibility falls to zero on most "hot" topics.
Practical takeaway: if you cover a breaking topic, publish fast. The newsroom posting a clean reaction 90 minutes after the event beats the newsroom dropping an ultra-thorough deep dive 8 hours later, even if the second piece is better on substance.
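The decay described above can be sketched as a piecewise function. The leak does not expose the real curve, so every constant below (the 1.0 plateau, the 0.6 level at 6 hours, the exponential drop) is an assumption chosen only to reproduce the shape described in the text.

```python
import math


def recency_score(hours_since_publication: float) -> float:
    """Illustrative decay curve; every constant here is an assumption."""
    h = hours_since_publication
    if h < 0:
        raise ValueError("hours_since_publication must be non-negative")
    if h <= 2:
        return 1.0                           # very high for the first 2 hours
    if h <= 6:
        return 1.0 - 0.1 * (h - 2)           # still elevated: 1.0 down to 0.6
    if h < 24:
        return 0.6 * math.exp(-(h - 6) / 3)  # then drops sharply
    return 0.0                               # no eligibility beyond 24 hours


fast_reaction = recency_score(1.5)  # published 90 minutes after the event
deep_dive = recency_score(8.0)      # published 8 hours after the event
print(fast_reaction > deep_dive)
```

Under these assumed numbers the 90-minute reaction keeps the full score while the 8-hour deep dive has already lost most of it, which is the trade-off the takeaway describes.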
2. Topical authority (recognized entity)
Google doesn't distribute a news story by first asking "is this a good article?" It asks "is this source recognized as legitimate on this specific topic?" The signal read, topical_authority, depends on your publisher entity's Web Profile and your historical publishing record on the topic cluster.
Without a validated entity, MustntMiss stays false no matter how good the reporting is. That's why well-written articles from small sites get skipped while short wires from established outlets earn the flag. The Profiler tests this entity eligibility in one second.
3. Intense early engagement
The first 30 minutes after publication are a test window. The module watches early_engagement_burst — an aggregate of CTR, dwell time, shares and reading-completion on a sample of users. If the metric clears a threshold (varies by topic, but around the 75th percentile of your cluster), the MustntMiss flag flips to true.
This signal is self-reinforcing: an article that earns the flag gets promoted, which drives more traffic, which sustains strong engagement, which holds the flag longer. Conversely, an article that misses the 30-minute window has little chance of recovering it.
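A rough way to reason about the threshold test is sketched below. The equal weighting of the four inputs, the 120-second dwell cap and the function names are assumptions; only the idea of comparing an aggregate against roughly the 75th percentile of the topic cluster comes from the text.

```python
from statistics import quantiles


def engagement_burst(ctr: float, dwell_seconds: float, share_rate: float,
                     completion_rate: float, dwell_cap: float = 120.0) -> float:
    """Average four 0..1 signals; equal weights are an assumption."""
    dwell = min(dwell_seconds, dwell_cap) / dwell_cap  # normalize dwell to 0..1
    return (ctr + dwell + share_rate + completion_rate) / 4


def clears_threshold(score: float, cluster_scores: list[float]) -> bool:
    """True when the score is above the cluster's 75th percentile."""
    p75 = quantiles(cluster_scores, n=4)[2]  # third quartile cut point
    return score > p75


# Made-up scores for other articles in the same topic cluster.
cluster = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
print(clears_threshold(0.7, cluster))
print(clears_threshold(0.5, cluster))
```

Because the bar is relative to the cluster, the same absolute engagement can pass on a quiet topic and fail on a crowded one.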
What publishers can actually do
You don't "force" MustntMiss. But you can maximize the odds it flips on, by acting on the three levers that drive it:
- Ultra-fast publishing cadence on hot topics: aim for live within 60-120 minutes after the event, even if you keep enriching the article through successive updates (Google revalues the article:modified_time).
- Entity solidified before the scoop: recognition is built over weeks (regular publishing, inbound citations, clean Knowledge Panel). The moment of the event is too late to construct that authority; it must already be there. The guide on how Discover works details entity construction.
- Editorial hero instead of stock: initial CTR depends massively on the image. Generic stock image = low CTR = no MustntMiss flip. Original photo or unique editorial illustration = 2-3× higher CTR.
- No friction above the fold: zero newsletter pop-up, zero disguised paywall, zero cookie overlay covering content. First-30-seconds dwell time decides.
- Emotional and precise H1: a title like "What the ECB announcement actually changes for your savings" beats "ECB: new monetary policy announcement." The first triggers a click; the second informs without inviting one.
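For the successive-updates lever, the page has to expose its timestamps. The sketch below renders the Open Graph article:published_time and article:modified_time tags; the helper name article_time_meta and the sample dates are hypothetical, and how Google weighs the modified time is this article's reading of the leak, not something the snippet can confirm.

```python
from datetime import datetime, timezone
from html import escape


def article_time_meta(published: datetime, modified: datetime) -> str:
    """Render Open Graph article timestamps as ISO 8601 meta tags."""
    if modified < published:
        raise ValueError("modified time cannot precede published time")
    tags = {
        "article:published_time": published.isoformat(),
        "article:modified_time": modified.isoformat(),
    }
    return "\n".join(
        f'<meta property="{escape(name)}" content="{escape(value)}">'
        for name, value in tags.items()
    )


# Made-up timestamps: first version at 12:15 UTC, enriched at 14:00 UTC.
published = datetime(2024, 6, 6, 12, 15, tzinfo=timezone.utc)
updated = datetime(2024, 6, 6, 14, 0, tzinfo=timezone.utc)
print(article_time_meta(published, updated))
```

Regenerating the tags on each substantive update keeps the declared modified time in step with the content.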
Does the leak still hold in 2026?
Legitimate question, two years on. Short answer: yes. Variable names (MustntMiss, NewsStoriesHeadlines, relatedSourcesGoodWithType, etc.) have stayed stable in the indirect references Google does publish (Schema.org documentation, their own research papers, etc.). The mechanic observable in production — established sites with solid entities sustainably winning Discover, young sites struggling despite quality content — fits exactly the picture the leak painted.
Until Google rebuilds its pipeline from scratch (which would take years), the signals named in 2024 remain the right levers to pull in 2026.
The right reflex before the next news story
Next time you publish on a breaking topic, ask yourself three questions, in this order:
- Am I recognized by Google as an entity on this topic? If not, the article won't even be listed in NewsStoriesHeadlines.
- Will I publish within the 2-hour window after the event? If not, recency_score will be too low to flip MustntMiss.
- Will the hero, H1, and first paragraph generate strong initial engagement? If not, the 30-minute window will be missed.
Three yes = real shot at unlocking MustntMiss. Two yes = trying is still justified. One yes = the news won't break through, treat it as evergreen and aim for classic SEO instead. The first question — entity eligibility — is the one that has to be settled in advance, not on the day of the news.
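The three-question check can be written down as a small helper for an editorial workflow. The function name and the returned labels are hypothetical, and treating zero yes the same as one yes is an assumption, since the rule above only covers one to three.

```python
def breaking_news_plan(entity_recognized: bool,
                       within_two_hours: bool,
                       strong_first_impression: bool) -> str:
    """Map the three yes/no answers to the article's suggested strategy."""
    yes_count = sum([entity_recognized, within_two_hours, strong_first_impression])
    if yes_count == 3:
        return "real shot at MustntMiss"
    if yes_count == 2:
        return "worth trying"
    # One yes, or none (assumption): the news will not break through.
    return "treat as evergreen, aim for classic SEO"


print(breaking_news_plan(True, True, False))
```

The helper only counts answers; it does not encode that entity eligibility has to be settled before the day of the news.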
Frequently asked questions
What exactly is MustntMiss?
It's an internal binary flag in Google's Content Warehouse system, exposed by the May 2024 leak. It marks an article as "must-not-miss" — meaning a news story Google considers important enough that the user shouldn't miss it. When the flag is positive, the article becomes a priority candidate for Discover, Top Stories, and the mobile homepage news card.
And what is NewsStoriesHeadlines, and how is it different?
NewsStoriesHeadlines is the data structure that bundles candidate headlines for the "news" blocks across Google surfaces. It's the inventory; MustntMiss is the priority tag attached to it. Without a populated NewsStoriesHeadlines, no selection happens; without MustntMiss set, no priority promotion.
How does Google decide an article deserves MustntMiss?
Three signal families emerge from the leak: freshness (published within the last 6 hours for hot topics), editorial authority on the topic (recognized entity, regular publishing cadence), and early engagement intensity (CTR, dwell time, shares in the first 30 minutes). None is sufficient alone; all three must line up.
Can we "force" MustntMiss on an article?
No. It's an internal computation, not a meta tag. But you can maximize the probability: publish within 1-2 hours of the event, have a validated Web Profile on the topic, polish the hero/H1 for fast CTR, and remove friction (pop-ups, paywall) that craters dwell time. The Profiler checks the first prerequisite in one second.
Is the leak still relevant in 2026?
Variable names (MustntMiss, NewsStoriesHeadlines, relatedSourcesGoodWithType, etc.) have been stable for 2 years: Google never publicly acknowledged the leak but never unpublished the API either. The mechanics documented in 2024 are still observable in production — that's what makes these signals actionable today for Discover optimization.


