A news article looks like a stable object to the reader, since it has a URL and a headline and a timestamp sitting at the top of the page, but the version you are looking at right now is usually only one of several that the publisher has served from that same address in the course of the day. The headline gets reworked once or twice in the first hour as the desk settles on a line, the lede shifts as the story develops or as a competing outlet reframes it, and a few months later the same URL may resolve to a paywall, a 404, or a quietly edited version with a paragraph removed and no public record that anything changed. The web is set up to treat news as a stream that flows past the reader, and treating that same stream as a durable record is a separate engineering problem which, to a surprising degree, nobody has fully solved.
The Internet Archive's Wayback Machine is the closest thing the open web has to a default answer, and the work it does is extraordinary, but it was not built around the specific shape of a news story. It captures pages on a crawl schedule rather than on edits, which means the version of a headline you are reading at 14:03 may never have been captured at all if the crawler happened to visit at 09:00 and then again the following morning. It also does not structure its captures around the entity "this article" in the way a news-specific archive would, so reconstructing the edit history of a single piece across snapshots is a manual job: you diff arbitrary HTML and then infer which changes were editorial and which were boilerplate.
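To make the gap between scheduled captures and actual edits concrete, here is a minimal sketch in Python. It is not how the Wayback Machine or any particular archive works internally; the `Snapshot` type and the `edit_history` and `word_diff` helpers are hypothetical names invented for illustration, and the sketch assumes the headline text has already been extracted from the surrounding HTML.

```python
from dataclasses import dataclass
from datetime import datetime
import difflib


@dataclass(frozen=True)
class Snapshot:
    """One capture of a headline (hypothetical type, for illustration only)."""
    captured_at: datetime
    headline: str


def edit_history(snapshots):
    """Keep only the captures where the headline differs from the previous one.

    This can only recover edits that happen to fall between two captures;
    a version that appeared and disappeared between crawls is invisible.
    """
    changes = []
    previous = None
    for snap in sorted(snapshots, key=lambda s: s.captured_at):
        if previous is None or snap.headline != previous.headline:
            changes.append(snap)
        previous = snap
    return changes


def word_diff(old, new):
    """Return the words removed ("- word") and added ("+ word") between two headlines."""
    tokens = difflib.ndiff(old.split(), new.split())
    return [t for t in tokens if t.startswith(("+ ", "- "))]


# Three scheduled captures of the same (made-up) story.
captures = [
    Snapshot(datetime(2024, 1, 1, 9, 0), "Council delays vote on budget"),
    Snapshot(datetime(2024, 1, 1, 21, 0), "Council delays vote on budget"),
    Snapshot(datetime(2024, 1, 2, 9, 0), "Council postpones vote on budget"),
]
history = edit_history(captures)
for before, after in zip(history, history[1:]):
    print(after.captured_at, word_diff(before.headline, after.headline))
```

The sketch only sees the change at the third capture. Anything the desk tried and abandoned between 09:00 and 21:00 on the first day leaves no trace, which is the limitation a crawl schedule imposes.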
A smaller and more specialised set of projects has worked at the news layer specifically, and they are interesting to read about even when, or perhaps especially when, they have stopped running. The GDELT Project ingests news across languages and regions and exposes the result as an enormous queryable dataset, which has been valuable to researchers studying media patterns but was never meant to function as a reading interface for a curious human. NewsDiffs, which grew out of a hackathon at MIT in 2012 and ran for several years afterwards, tracked headline and body-text changes on the New York Times, Washington Post, BBC and a handful of others and made the diffs browsable in a way that exposed, sometimes uncomfortably, the editorial choices a newsroom would rather have kept invisible. The project went dormant, but the instinct behind it was the right one, and you can still find writers and researchers referencing the captures it produced.
What feels more interesting in the present generation is a group of projects that treat the aggregation surface itself as the primary artifact rather than as scaffolding for something else. The Hear is a working example of this approach: a multi-language headline aggregator that pulls timestamped updates from a configurable set of regional and international outlets, retains what it captures so the reader can step backwards in time, and lets you change country or language without losing the temporal axis you were reading along. The architectural choice underneath is to store the stream as a first-class object and then expose the archive as a navigable surface on top of it, and that is the part that makes this generation of tools different from the simple RSS readers of fifteen years ago. It sits in the same intellectual neighbourhood as the Common Crawl news subset and the older Memex-style web-clipping tools, applied specifically to the moment-by-moment headline layer that legacy archives tend to flatten.
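The idea of storing the stream as a first-class object can be sketched in a few lines of Python. This is an illustration of the general pattern only, not a description of how The Hear or any other project is built; the `HeadlineStream` class, its method names, and the outlet identifiers are all hypothetical. The sketch assumes captures arrive in time order for each outlet and that only changes are worth storing.

```python
from bisect import bisect_right
from collections import defaultdict
from datetime import datetime, timedelta


class HeadlineStream:
    """Append-only log of headline changes per outlet (hypothetical sketch)."""

    def __init__(self):
        self._times = defaultdict(list)  # outlet -> capture times, ascending
        self._texts = defaultdict(list)  # outlet -> headlines, same order

    def record(self, outlet, captured_at, headline):
        """Store a capture; return False if the headline has not changed."""
        times, texts = self._times[outlet], self._texts[outlet]
        if times and captured_at < times[-1]:
            raise ValueError("captures must arrive in time order")
        if texts and texts[-1] == headline:
            return False  # nothing new to keep
        times.append(captured_at)
        texts.append(headline)
        return True

    def as_of(self, outlet, moment):
        """Return the headline an outlet was showing at `moment`, or None."""
        times = self._times.get(outlet, [])
        idx = bisect_right(times, moment)
        return self._texts[outlet][idx - 1] if idx else None

    def front_page(self, outlets, moment):
        """Same moment, different outlets: the time axis stays fixed."""
        return {outlet: self.as_of(outlet, moment) for outlet in outlets}


# Usage with made-up outlets and headlines.
stream = HeadlineStream()
start = datetime(2024, 1, 1, 9, 0)
stream.record("outlet-a", start, "Talks begin")
stream.record("outlet-a", start + timedelta(hours=2), "Talks stall")
stream.record("outlet-b", start + timedelta(hours=1), "Negotiators meet")
print(stream.front_page(["outlet-a", "outlet-b"], start + timedelta(minutes=90)))
```

Because every read goes through `as_of`, switching the set of outlets (a different country or language, say) does not disturb the moment being viewed, which is the property the paragraph above describes. Storing only changes rather than every capture is one possible answer to the storage question, at the cost of losing the record of when each unchanged capture was made.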
None of this is anywhere near a solved problem. The legal questions around archiving paywalled content, the storage questions around whether to keep every edit or only periodic snapshots, and the interface questions around how to present a story with fourteen versions to a reader who only has time for one are all difficult and largely unresolved. What seems clear is that the projects taking the headline layer seriously as something worth preserving on its own terms, rather than as a side effect of general-purpose web crawling, are quietly doing the work that the open web has needed someone to do for a long time.