Following the recent release of Voice Search in Vietnamese on Android devices, Google has announced the arrival of Vietnamese Voice Search on desktop computers via the Chrome browser.
When visiting google.com.vn on Chrome, users can click on the microphone icon to ask their question.
Amy Kunrojpanya, Head of Public Affairs & Communications for the Greater Mekong Sub-Region, Google Asia Pacific, said her company hopes this will prove useful to Vietnamese consumers and help them unlock more of the web more easily and intuitively.
According to Kunrojpanya, each time Google brings Voice Search to a new language, it teaches computers to understand the sounds and words that make up spoken language.
Google accomplished this by working with native speakers to collect speech samples to model the language, she said.
The Vietnamese-specific language model was built from the ground up.
“For Vietnamese, we worked with about 700 volunteers from universities in Hanoi and HCM City to collect about 480 hours of speech samples. Once we collected the samples, we were able to build acoustical and language models that taught computers how to ‘recognise’ Vietnamese,” she said.
Google spent two years working with the local volunteers.
“We were able to gather huge amounts of data from Google fans in Vietnam, who were eager to help. Many people opened their doors to us to help the cause of making Vietnamese awesome,” said Kunrojpanya.
She said the Vietnamese language had presented unique challenges. The major challenge was recognising tones and transcribing the diacritics correctly (for example, ca means to chant; cà, tomato; cá, fish).
The diversity of accents across Vietnam also required Google to widen its sampling and collect twice as many acoustic samples as it normally does for other languages.
Google tried to capture both northern and southern accents, spending months “on lexicon development on a complicated language”, she said.
“Voice Search can recognise regional accents in Vietnamese, but it isn’t 100 percent perfect. The good thing is that the language model improves as more people use it,” she said.
Also, because written Vietnamese places a space after every syllable, it is harder to tell where a word begins and ends.
In contrast, in a language like English, whole words are separated by spaces, she said.
So Google introduced special handling of Vietnamese syllables so that they could be properly interpreted in the context of other syllables around them.
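Google has not described how this syllable handling works. As a rough illustration of the underlying problem only, the sketch below groups space-separated syllables into words using greedy longest-match lookup against a small lexicon. The lexicon entries and the names `LEXICON`, `nfc` and `segment` are hypothetical, chosen for this example, and the technique is a common textbook baseline rather than Google's method.

```python
import unicodedata


def nfc(text: str) -> str:
    """Normalise to composed Unicode form so equal-looking syllables compare equal."""
    return unicodedata.normalize("NFC", text)


# Hypothetical toy lexicon of multi-syllable words (assumption, for illustration).
LEXICON = {
    tuple(nfc(word).split())
    for word in ["học sinh", "sinh viên", "đại học", "hà nội"]
}
MAX_SYLLABLES = max(len(entry) for entry in LEXICON)


def segment(text: str) -> list[str]:
    """Group syllables into words, preferring the longest lexicon match."""
    syllables = nfc(text).lower().split()
    words = []
    i = 0
    while i < len(syllables):
        longest = min(MAX_SYLLABLES, len(syllables) - i)
        # Try the longest candidate first; a single syllable always succeeds.
        for n in range(longest, 0, -1):
            chunk = tuple(syllables[i:i + n])
            if n == 1 or chunk in LEXICON:
                words.append(" ".join(chunk))
                i += n
                break
    return words


print(segment("sinh viên đại học Hà Nội"))
```

With this toy lexicon, the six syllables in the example are grouped into three two-syllable words. A production system would presumably weigh competing groupings statistically rather than always taking the first longest match.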
There were other challenges as well. For example, many Vietnamese Google users frequently leave out accents and tone markers when they search (for example, pho instead of phở).
“So we had to create a special algorithm to ensure accents and tones were restored in the search results provided, and then our Vietnamese users would see properly formatted text in the majority of cases,” said Kunrojpanya.
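The article does not say how that algorithm works. One simple way to picture the idea, offered here only as a sketch, is to index known words by their unmarked spelling and pick the most frequent marked form for each unmarked token. The word counts and the names `TOY_COUNTS`, `strip_marks`, `build_index` and `restore` are invented for this example and do not reflect Google's data or implementation.

```python
import unicodedata
from collections import defaultdict


def strip_marks(text: str) -> str:
    """Remove Vietnamese tone and vowel marks, e.g. mapping 'phở' to 'pho'."""
    decomposed = unicodedata.normalize("NFD", text)
    unmarked = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # 'đ' has no canonical decomposition, so it is mapped by hand.
    return unmarked.replace("đ", "d").replace("Đ", "D")


# Hypothetical word frequencies (assumption, for illustration only).
TOY_COUNTS = {"phở": 90, "phố": 60, "cá": 50, "cà": 30, "ca": 20}


def build_index(counts: dict[str, int]) -> dict[str, list[tuple[int, str]]]:
    """Map each unmarked spelling to its marked candidates and their counts."""
    index = defaultdict(list)
    for word, count in counts.items():
        index[strip_marks(word)].append((count, word))
    return dict(index)


def restore(query: str, index: dict[str, list[tuple[int, str]]]) -> str:
    """Replace each known token with its most frequent marked candidate."""
    restored = []
    for token in query.split():
        candidates = index.get(token)
        # Unknown tokens pass through unchanged.
        restored.append(max(candidates)[1] if candidates else token)
    return " ".join(restored)


index = build_index(TOY_COUNTS)
print(restore("pho ca", index))
```

Choosing by word frequency alone ignores context, so it would always turn "ca" into the same word. That limitation is consistent with Kunrojpanya's remark that users see properly formatted text in the majority of cases rather than all of them.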