New AI voice tool trained to copy British regional accents


Zoe Kleinman

Technology editor•@zsk

A new AI voice-cloning tool from a British firm claims to be able to reproduce a range of UK accents more accurately than some of its US and Chinese rivals.

Because much of the data traditionally used to train AI voice products comes from North American or southern English speakers, many artificial voices tend to sound similar.

To combat this, the company Synthesia spent a year compiling its own database of UK voices with regional accents, by recording people in studios and gathering online material.

It used those to train a product called Express-Voice, which can clone a real person's voice or generate a synthetic voice.

These can be used in content such as training videos, sales support and presentations.

The company said its customers wanted more accurate regional representations.

"If you're the CEO of a company, or if you're just a regular person, when you have your likeness, you want your accent to be preserved," said Synthesia Head of Research Youssef Alami Mejjati.

He added that French-speaking customers had also commented that synthetic French voices tended to sound French-Canadian rather than like speakers from France.

"This is just because the companies building these models tend to be North American companies, and they tend to have datasets that are biased towards the demographics that they're in," he said.

The hardest accents to mimic are the least common, Mr Mejjati said, because there is less recorded material available to train an AI model.

There are also reports that voice-prompted AI products, such as smart speakers, are more likely to struggle to understand a range of accents.

Last year, internal documents from West Midlands Police revealed worries about whether voice recognition systems would understand Brummie accents.

Meanwhile, the US-based start-up Sanas is taking the opposite approach, developing tools for call centres that "neutralise" the accents of Indian and Filipino staff, as reported by Bloomberg in March.

The firm says it aims to reduce "accent discrimination" experienced by workers when callers fail to understand them.

There is concern that languages and dialects are being lost in the digital era.

"Among the over seven thousand languages that still exist today, almost half are endangered according to UNESCO; about a third have some online presence; less than 2 percent are supported by Google Translate; and according to OpenAI's own testing, only fifteen, or 0.2 percent are supported by GPT-4 [an OpenAI model] above an 80 percent accuracy," writes Karen Hao in the book Empire of AI.

"Language models are homogenising speech," agrees AI expert Henry Ajder, who advises governments and tech firms, including Synthesia.

However, the better these products become, the more effective they will also be in the hands of scammers.

Synthesia's product will not be free when it is released in the coming weeks, and will have guardrails around hate speech and explicit material.

But there are already many free, open-source voice-cloning tools which are easily accessible and less protected.

At the beginning of July, messages generated by an AI-cloned voice impersonating US Secretary of State Marco Rubio were reported to have been sent to ministers.

"The open source landscape for voice has evolved so rapidly over the last nine to 12 months," Mr Ajder adds.

"And that, from a safety perspective, is a real concern."
