
Amid the global AI boom, Vietnam has officially launched the first beta version of the ViGen platform, a joint effort between Meta, the National Innovation Center (NIC), and the AI for Vietnam organization.
The project kicked off in March 2025 and quickly attracted the participation of major partners such as NVIDIA, Viettel, the Institute of Information Technology at the Vietnam Academy of Science and Technology, Hanoi University of Science and Technology, and the Post and Telecommunications Institute of Technology.
ViGen is a step toward implementing the national strategy on AI research, development, and application through 2030. The project's goal is to create a high-quality, open-source Vietnamese dataset for large language models (LLMs), helping AI better understand the Vietnamese language, culture, and society.
In its initial phase, ViGen has achieved three key milestones. The first is Primer 1.0, the largest open Vietnamese pre-training dataset to date, with 50 billion tokens carefully selected from more than 150 billion raw tokens.
The dataset spans knowledge from preschool to university level, enabling AI models to reach a level of understanding equivalent to “a top-performing university graduate, with both knowledge and critical thinking ability.”
ViGen also introduced five benchmarking frameworks. With more than 10,000 test samples, these benchmarks evaluate AI capabilities in various areas, such as knowledge, logical reasoning, programming, and understanding Vietnamese language and culture.
Finally, the ViGen beta platform is an open space where citizens can log in with VNeID to contribute data (text, voice, video, and more). It includes a “competition-reward” mechanism to encourage community participation.
Tran Viet Hung, founder of AI for Vietnam, said: “If we were to build these datasets from scratch, we would fall far behind countries that have invested heavily and moved ahead.
The project, therefore, adopts an entirely new approach: collective data building.”
“We have 100 million Vietnamese speakers. If we all contribute together, the speed will be incredibly fast, and this will be the first initiative of its kind in the world,” he added.
Philip Chua of Meta noted that the launch of the ViGen platform is a significant milestone, reflecting the belief that open-source AI can help Vietnamese researchers and businesses build solutions that truly understand Vietnam’s culture and values.
He added that open data not only serves domestic research but also gives Vietnam a voice in the global AI landscape.
Vo Xuan Hoai, Deputy Director of NIC, said: “The ViGen platform clearly shows the role of public–private cooperation in achieving national goals in science, technology, and innovation. We are building not just technology, but also a foundation for sustainable AI-driven growth.”
A notable feature of ViGen is its openness and community orientation. Citizens can directly participate by uploading data to the system. The data is then processed and filtered for inclusion in the training set.
Contributors are credited for their efforts and may even receive rewards. This new approach turns the often dry task of “building data” into an interactive and enjoyable activity.
Under its three-year roadmap, ViGen will continue expanding: in 2026, the project will add fine-tuning datasets and developer tools and organize national AI competitions; by 2027, it will update its datasets and develop advanced tools to support widespread AI adoption in businesses.
With the participation of companies, research institutes, universities, and the public, ViGen is expected to make AI a practical tool for every Vietnamese person.
As Philip Chua noted, “We hope this will be a key platform for Vietnam’s AI ecosystem. The ViGen project will foster collaboration and support the development of Vietnam-led solutions to drive regional economic growth.”
To build a comprehensive dataset that supports overall economic development, both the public and private sectors must participate.
Nguyen Thu Thao of Meta said that at the end of last year, Professor Yann LeCun, Vice President and Chief AI Scientist at Meta, visited Vietnam and met with Minister of Science and Technology Nguyen Manh Hung.
During the meeting, the minister asked for Meta's support in technology, funding, and human resources to help develop the ecosystem, enabling Vietnamese enterprises, especially tech companies, to access a Vietnamese-language data platform.
The ViGen project was born from this idea and was officially announced in March.
“Vietnamese is a difficult language and is considered low-resource, which is why it has not been prioritized by major tech corporations,” Thao said. “No existing platform has a complete, comprehensive, high-quality Vietnamese dataset that reflects the country’s history, culture, linguistic beauty, social characteristics, and moral values.”
Du Lam