Amazon creates the largest text-to-speech AI

08:40 AM February 19, 2024

Modern artificial intelligence can perform amazing feats like creating human-like texts, photorealistic images, and believable videos.

However, many don’t realize that staggering amounts of training data are behind these astounding features. The more information you feed to an AI tool, the more likely it is to develop into an advanced program.

Amazon understands this general rule of artificial intelligence development, so it trained a model on 100,000 hours of speech data! Consequently, its BASE TTS became the largest text-to-speech model to date. The abundance of training data gives it “state-of-the-art naturalness” and “emergent” properties.

What are the features of the largest text-to-speech AI?

The official BASE TTS paper defines the acronym as “Big Adaptive Streamable TTS with Emergent Abilities.” The model was trained on 100,000 hours of public domain speech data, making it the largest text-to-speech (TTS) model at the time of writing.


It uses a 1-billion-parameter autoregressive Transformer that turns raw text into discrete codes called speechcodes. Then, a convolution-based decoder turns the speechcodes into waveforms in an incremental, streamable manner.
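The two-stage flow described above can be illustrated with a toy sketch. This is not Amazon's code or the BASE TTS API: the function names, `CODEBOOK_SIZE`, and `SAMPLES_PER_CODE` are hypothetical, and simple arithmetic stands in for the Transformer and the convolution-based decoder. It only shows the shape of the pipeline, where discrete codes are produced one at a time and each is turned into audio samples as soon as it arrives.

```python
from typing import Iterator, List

CODEBOOK_SIZE = 256    # hypothetical number of distinct speechcodes
SAMPLES_PER_CODE = 4   # hypothetical waveform samples produced per code


def text_to_speechcodes(text: str) -> Iterator[int]:
    """Toy stand-in for the autoregressive stage.

    Emits one discrete code per character. Each code depends on the
    input and on the previously emitted code, which is the defining
    property of autoregressive generation.
    """
    previous = 0
    for ch in text:
        code = (ord(ch) + previous) % CODEBOOK_SIZE
        yield code
        previous = code


def decode_streaming(codes: Iterator[int]) -> Iterator[List[float]]:
    """Toy stand-in for the decoder.

    Converts each code into a chunk of samples as soon as it arrives,
    without waiting for the full code sequence.
    """
    for code in codes:
        level = code / (CODEBOOK_SIZE - 1) * 2.0 - 1.0  # map code to [-1, 1]
        yield [level] * SAMPLES_PER_CODE


def synthesize(text: str) -> List[float]:
    """Chain both stages and collect the streamed chunks into one waveform."""
    waveform: List[float] = []
    for chunk in decode_streaming(text_to_speechcodes(text)):
        waveform.extend(chunk)  # a real system could play each chunk here
    return waveform
```

Because both stages are generators, the first chunk of audio is available after the first code is produced, which is the practical meaning of "streamable" in this sketch.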

The company also trained smaller BASE TTS variants with 500 million parameters on 10,000 hours of speech. Comparing them with the full model illustrated that its largest text-to-speech AI developed “emergent abilities.”

A previous article listed the most essential AI terms. It says emergent behavior occurs when an AI program exhibits unintended behaviors.

As an example, that article cited Google’s Bard chatbot, which learned the Bengali language despite not receiving relevant training. More importantly, Amazon says BASE TTS is a “high-fidelity model capable of mimicking speaker characteristics with just a few seconds of reference audio.”

The official BASE TTS webpage features several voice samples created by the text-to-speech program. Its US English samples show how it can convey subtle human inflections like sarcasm.


The page also provides British English and US Spanish clips that show the text-to-speech AI’s versatility. However, the company admits the project needs more research and development before it can be released to market.

What are Amazon’s other AI projects?

Amazon is a global name for online shopping. No matter where you are, you’ve probably heard of or used its services. However, it recognizes that it must transform according to the latest technologies. 

That is why Amazon Web Services spent $100 million on its Generative AI Innovation Center. Despite its name, it is a “program,” not a physical facility. 

A previous Inquirer Tech post shared program head Sri Elaprolu’s explanation of this project. “First, we work with the customers to identify the business opportunities and the potential generative AI use cases,” he said.

“Then, our team helps them plan and develop proofs-of-concept, and lastly, we help them prepare for production launch at scale,” Elaprolu added. Also, the Amazon press release said it would offer free courses:

“Through no-cost workshops, engagements, and training, AWS will help customers imagine and scope the use cases that will create the greatest value for their businesses, based on best practices and industry expertise.”

The program will enable customers to use AWS generative AI services CodeWhisperer and Bedrock. The former is an AI-powered programming tool.

A previous Inquirer Tech article explained how the Bedrock AI cloud service works. First, it will be a simple way to find and access high-performance foundation models (FMs).

FMs are trained on large, broad datasets, so they can perform more tasks than other AI models. Amazon’s new platform will enable clients to incorporate FMs into their systems.

Second, it will reduce the time and money needed to maintain AI infrastructure. Specifically, it will help customers find the FM suitable for their needs and deploy it quickly. Third, clients can customize FMs with their own data and integrate them with other tools.
