There are countless pages on the internet, from forums and social networks to shopping sites and online libraries. However, these pages may not last forever, as the familiar "Error 404" message illustrates.
A recent survey suggests that almost 40% of the content once hosted on the internet no longer exists, at least as of the end of 2023. That is, this material is no longer available through its original, official channels.
According to a Pew Research Center survey, approximately 38% of all content that existed on the internet between 2013 and 2023 can no longer be accessed. If you try to open one of these links, you will receive the classic 404 error message. This code indicates that the server could not find a page at the requested address, either because the page was removed, because it was moved without a redirect, or because the link is outdated.
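As a rough illustration of what the 404 code means to software that checks links, the sketch below maps an HTTP status code to a simple label. The function name `describe_link` and the labels are hypothetical, chosen only for this example; they do not come from the Pew study.

```python
from http import HTTPStatus


def describe_link(status_code: int) -> str:
    """Return a rough, illustrative label for an HTTP status code."""
    if status_code == HTTPStatus.NOT_FOUND:  # 404: no page at this address
        return "missing"
    if 300 <= status_code < 400:  # 3xx: the page has moved and a redirect exists
        return "redirected"
    if 200 <= status_code < 300:  # 2xx: the page was served normally
        return "ok"
    return "other"  # any other code (for example, a server error)


print(describe_link(404))
print(describe_link(200))
```

In practice the status code would come from an HTTP client. With Python's standard `urllib.request`, for instance, a 404 response surfaces as a `urllib.error.HTTPError` whose `code` attribute holds the number.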
Because the internet is always accessible, many people believe that information will remain available forever. However, the research shows that even recent pages are disappearing: about 8% of the pages that were online in 2023 no longer exist.
Error 404 on the rise
The Pew Research Center analyzed a sample of nearly one million pages recorded by the nonprofit organization Common Crawl. The researchers found that the disappearances were not limited to obscure pages: government sites, major news portals, Wikipedia and other prominent domains also contain links that return error 404.
“If a library burns, it is a tragedy, but most books survive elsewhere. But the digital world is inherently fragile and potentially ephemeral,” Mark Graham, director of the Wayback Machine, the project that archives websites, told Business Insider.
Although they are no longer accessible on their original sites, about two thirds of the 38% of web pages that have disappeared over the last decade can still be found in the Wayback Machine. Graham explains that the project archives over one billion URLs every day, including even some YouTube videos.
Even so, the Wayback Machine and similar projects cannot catalog every page, as some sites impose obstacles such as paywalls and blockers that keep web crawlers out. A paywall is a monetization model that restricts access to content, allowing only site subscribers to view it.
Internet pages disappearing
Pew Research Center data indicate that between 2013 and 2023, approximately 23% of news sites had at least one link returning error 404. On Wikipedia, 54% of pages contained at least one reference link that is no longer available.
Some experts believe there is an even worse problem: most data stored on the internet is under the control of large companies, such as Google. According to Marlene Manoff, senior collections strategist at MIT Libraries, this makes data harder to preserve, as these corporations may not be concerned with conserving the history of the web.
“In the long run, it is not possible to preserve a digital object in its original form. But in the case of corporate ownership, the likelihood of responsible and lasting management of digital content becomes smaller and smaller,” Manoff told Business Insider.
In addition to the Wayback Machine, which belongs to the Internet Archive, initiatives like Common Crawl are also cataloging billions of web pages. It is important to note that Common Crawl only collects data for research and analysis, while the Internet Archive actually preserves content for future access.
Thus, even if these initiatives cannot record all the history of the internet, a significant part of the links will remain accessible for consultation.
This content was originally published as "Error 404: almost 40% of internet pages disappeared in 10 years" on the CNN Brasil website.
Source: CNN Brasil

Charles Grill is a tech-savvy writer with over 3 years of experience in the field. He writes on a variety of technology-related topics and has a strong focus on the latest advancements in the industry. He is connected with several online news websites and is currently contributing to a technology-focused platform.