Yandex has launched a new service, “Search in archives”, which lets users search more than 2.5 million pages of historical documents using text transcripts produced by neural networks. The underlying technology builds on optical character recognition: it accounts for the peculiarities of handwriting, recognizes letters that are no longer in use, and understands the particular structure of archival documents.
The neural networks were trained on hundreds of thousands of real handwritten documents from the 18th and 19th centuries, as well as on tens of millions of generated samples, with experts supervising the process.
“It can take a professional up to half an hour to decipher one page of archival handwritten text, and our service can do it in a few seconds,” says Elena Bubnova, head of Yandex Search. “In the future, the technology can also be used to solve other tasks in Yandex products.”
The service was created not as a technology demonstration but to be of practical use: it will help historians, sociologists, demographers, genealogists, and ordinary people looking for information about their families. Users can quickly find documents by keyword, whether it is a personal name, a city name, or anything else.
At the moment, the site catalog is based on the Main Archive of Moscow, as well as on the archives of the Orenburg and Novgorod regions. The database will expand in the future.
Source: Trash Box
Charles Grill is a tech-savvy writer with over 3 years of experience in the field. He writes on a variety of technology-related topics and has a strong focus on the latest advancements in the industry. He is connected with several online news websites and is currently contributing to a technology-focused platform.