
AI today is trained on data from the Internet. If an increasing number of old journalism websites cease to exist or are no longer accessible, how will that impact AI’s capacity to understand Vietnam?

AI learns from what it can see. If a massive portion of nearly three decades of Vietnamese journalistic data vanishes from the open web, from archives, or from legally accessible datasets, AI will understand Vietnam through a fragmented memory full of holes.

An AI that is not fed Vietnamese memory will struggle to understand Vietnam deeply. It might speak Vietnamese fluently, but it will not necessarily remember Vietnamese history accurately, grasp Vietnamese context, or recognize the unique layers of meaning embedded in Vietnamese life.

People worry that AI will replace journalists. But could journalism itself be one of the industries providing the original data that powers AI systems?

Yes, journalism provides source data for AI. But I would add one thing to that observation: AI needs not only journalistic data but also journalistic standards.

An AI model can learn from billions of documents, but it does not inherently know what serves the public interest, what constitutes privacy, which sources require verification, or which information could cause harm if presented without proper context. Those elements are not contained solely within data. They are embedded in professional ethics, editorial processes, and human responsibility.

In the AI era, journalism should not merely defend itself against the threat of being replaced. Journalism should step into a higher role: becoming the creator of trusted data, the guardian of information provenance, and the institution that establishes standards for how AI uses society’s knowledge.

In your opinion, is Vietnam at risk of "digital memory loss", given that part of the journalistic data generated in the nearly 30 years since the internet appeared may no longer be fully preserved?

The risk is real, but it needs to be understood correctly: the data will not all vanish in a single day. The greater risk is the gradual loss of completeness, continuity, authenticity, and retrievability.

Over the last three decades, newsrooms have repeatedly changed their content management systems (CMS), domain names, interfaces, and governing bodies; they have merged, changed operating models, or ceased publication. Each time, data is at risk of being lost, left incomplete, stripped of correct metadata, or rendered inaccessible.

Digital memory does not perish in a single fire. It dies gradually through rounds of system migrations, domain changes, and organizational restructuring executed without accompanying archiving policies.

From a technological perspective, is preserving all Vietnamese digital journalism from 1997 to the present difficult or expensive?

From a storage technology standpoint, I do not believe it is particularly difficult.

The greater challenges lie in governance, legal frameworks, copyright management, standardization, data quality, access control, and long-term operational responsibility.

The most expensive component is not the hard drive. The most expensive mistake is getting it wrong from the beginning: storing data without standards, metadata, persistent identifiers, checksums, access-control mechanisms, clear usage rights, or the ability to support research and AI development.
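As a purely illustrative sketch of what storing an article with a checksum, a persistent identifier, and usage-rights metadata might look like, consider the Python below. Everything in it is an assumption made for illustration, not something described in the interview: the `ArchivedArticle` fields, the `vnpress:` identifier prefix, the access-level and rights labels, and the example URL are all hypothetical. A real archive would more likely adopt an established identifier scheme and metadata standard.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ArchivedArticle:
    """Hypothetical archive record; field names are illustrative only."""
    persistent_id: str   # made-up scheme, stands in for a real persistent ID
    source_url: str      # where the article was originally published
    published: str       # ISO 8601 date string
    usage_rights: str    # e.g. "research-only" (hypothetical label)
    access_level: str    # e.g. "public", "researchers" (hypothetical labels)
    sha256: str          # checksum of the article body at deposit time


def make_record(body: str, source_url: str, published: str,
                usage_rights: str, access_level: str) -> ArchivedArticle:
    """Build a record whose checksum is computed from the article body."""
    digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
    # Derive the ID from the content hash so it survives domain or CMS changes.
    persistent_id = f"vnpress:{digest[:16]}"
    return ArchivedArticle(persistent_id, source_url, published,
                           usage_rights, access_level, digest)


def verify(record: ArchivedArticle, body: str) -> bool:
    """Return True if the body still matches the checksum stored at deposit."""
    return hashlib.sha256(body.encode("utf-8")).hexdigest() == record.sha256


record = make_record(
    body="Xin chào",
    source_url="https://example.org/article",  # placeholder URL
    published="2001-01-01",                    # placeholder date
    usage_rights="research-only",
    access_level="researchers",
)
print(verify(record, "Xin chào"))   # body unchanged
print(verify(record, "Xin chao"))   # body altered (diacritic lost)
```

The point of the sketch is that the checksum and identifier are recorded at deposit time, so a later migration that silently alters or truncates the text can be detected instead of going unnoticed.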

What would you recommend to ensure that even when news organizations merge, transform, or cease operations, the data created by society over many decades remains preserved and continues to support research, policymaking, and future AI development?

I would recommend establishing a national program for the legal deposit and preservation of Vietnam’s digital journalism.

Recently released decisions illustrate the ongoing trend toward consolidation and organizational restructuring.

Therefore, alongside institutional restructuring, there must be policies dedicated to data.

I would propose five measures.

First, establish mandatory digital legal deposit requirements for electronic journalism.

Second, whenever a news organization merges, restructures, or ceases operation, a formal data-transfer process must be required.

Third, create a National Digital Journalism Archive or a National Digital Memory Library. This repository should provide multiple levels of access for the public, researchers, government agencies, educational institutions, and AI applications operating under licensing frameworks.

Fourth, issue national standards for digital journalism data.

Fifth, recognize journalism data as part of the infrastructure supporting Vietnamese-language AI.

Restructuring the press is the work of organizations. Preserving journalistic data is the work of national memory. Yet, deeper still, rebuilding journalism in the AI era is the work of social trust.

If journalism merely chases speed, it will lose to AI. If journalism merely holds onto the past, it will turn into a dead archive. But if journalism learns to use AI to strengthen verification, expand social memory, serve the public better, and uphold professional accountability, then journalism will not be replaced by AI.

Tu Giang - Lan Anh