Amazon creates the largest text-to-speech AI

By: Dale Arasa - @inquirerdotnet

INQUIRER.net / 08:40 AM February 19, 2024

Modern artificial intelligence can perform amazing feats like creating human-like texts, photorealistic images, and believable videos.

However, many don’t realize that staggering amounts of training data are behind these astounding features. The more information you feed to an AI tool, the more likely it is to develop into an advanced program.

Amazon understands this general rule of artificial intelligence development, so it created a model trained on 100,000 hours of speech data. Consequently, its BASE TTS became the largest text-to-speech model to date. The abundance of training data gives it “state-of-the-art naturalness” and “emergent” properties.

What are the features of the largest text-to-speech AI?

The official BASE TTS paper defines the acronym as the “Big Adaptive Streamable TTS with Emergent Abilities.” It trained on 100,000 hours of public domain speech data, making it the largest text-to-speech (TTS) model at the time of writing.

It uses a 1-billion-parameter autoregressive Transformer that turns raw text into discrete codes called “speechcodes.” Then, a convolution-based decoder turns those speechcodes into waveforms in an incremental, streamable manner.
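For readers who think in code, the two-stage flow described above can be pictured with a toy Python sketch. This is not Amazon's implementation: the function names, the codebook size, and the arithmetic that stands in for the Transformer and the decoder are all hypothetical, chosen only to show codes being produced one at a time and turned into audio chunks as they arrive.

```python
from typing import Iterator, List

CODEBOOK_SIZE = 256    # hypothetical number of distinct speechcodes
SAMPLES_PER_CODE = 4   # hypothetical waveform samples produced per code


def generate_speechcodes(text: str) -> Iterator[int]:
    """Toy stand-in for the autoregressive model.

    Emits one discrete code per character. Each code depends on the
    previously emitted code, which is what "autoregressive" means.
    """
    previous = 0
    for char in text:
        previous = (previous * 31 + ord(char)) % CODEBOOK_SIZE
        yield previous


def decode_streaming(codes: Iterator[int]) -> Iterator[List[float]]:
    """Toy stand-in for the decoder: one waveform chunk per incoming code."""
    for code in codes:
        # Map the integer code onto an amplitude in the range [-1, 1].
        amplitude = code / (CODEBOOK_SIZE - 1) * 2.0 - 1.0
        yield [amplitude] * SAMPLES_PER_CODE


def synthesize(text: str) -> List[float]:
    """Collect the chunks; a real player could play each chunk immediately."""
    waveform: List[float] = []
    for chunk in decode_streaming(generate_speechcodes(text)):
        waveform.extend(chunk)
    return waveform


if __name__ == "__main__":
    audio = synthesize("Hello")
    print(len(audio))  # 5 characters x 4 samples per code = 20
```

Because both stages are generators, the first audio chunk is available as soon as the first code exists, which is the practical meaning of "streamable" here.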

The company also trained smaller BASE TTS variants, with 500 million parameters and 10,000 hours of speech. By comparing them, it showed that its largest text-to-speech AI developed “emergent abilities.”

A previous article listed the most essential AI terms. It says emergent behavior occurs when an AI program exhibits unintended behaviors.

This writer cited Google’s previous Bard chatbot, which learned Bengali despite not receiving relevant training. More importantly, Amazon says BASE TTS is a “high-fidelity model capable of mimicking speaker characteristics with just a few seconds of reference audio.”

The official BASE TTS webpage features several voice samples created by the text-to-speech program. Its US English samples show how it can convey subtle human inflections like sarcasm.

The page also provides British English and US Spanish clips that show the text-to-speech AI’s versatility. However, the company admits the project needs more research and development before it can be released to market.

What are Amazon’s other AI projects?

#AWS introduces the #GenerativeAI Innovation Center. 📣

This program offers workshops, engagements & trainings aimed to help organizations build & deploy generative AI solutions with the support of AWS AI & ML experts.

Learn more. 👉 https://t.co/Ja4716QQEZ pic.twitter.com/kAOC6meqFZ
— Amazon Web Services (@awscloud) June 22, 2023

Amazon is a global name for online shopping. No matter where you are, you’ve probably heard of or used its services. However, it recognizes that it must transform according to the latest technologies.

That is why Amazon Web Services spent $100 million on its Generative AI Innovation Center. Despite its name, it is a “program,” not a physical facility.

A previous Tech post shared program head Sri Elaprolu’s explanation of this project. “First, we work with the customers to identify the business opportunities and the potential generative AI use cases,” he said.

“Then, our team helps them plan and develop proofs-of-concept, and lastly, we help them prepare for production launch at scale,” Elaprolu added. Also, the Amazon press release said it would offer free courses:

“Through no-cost workshops, engagements, and training, AWS will help customers imagine and scope the use cases that will create the greatest value for their businesses, based on best practices and industry expertise.”

The program will enable customers to use AWS generative AI services CodeWhisperer and Bedrock. The former is an AI-powered programming tool.

A previous Inquirer Tech article explained how the Bedrock AI cloud service works. First, it will be a simple way to find and access high-performance foundation models (FMs).

FMs are trained on broad data, so they can perform more tasks than other AI models. Amazon’s new platform will enable clients to incorporate FMs into their systems.

Second, it will reduce the time and money needed to maintain AI infrastructure. Specifically, it will help customers find the FM suitable for their needs and deploy it quickly. Third, clients can adjust FMs with their own data and integrate them with other tools.

