Can boosted AI save Alexa?

As Alexa flounders, Amazon hopes homegrown generative AI can find it revenue

With voice assistants on the brink of death, Amazon targets Alexa large language model.

An Amazon Echo Dot on a nightstand.

While voice assistants initially seemed to be a convenient, futuristic way to get information and perform basic tasks, they have barely graduated from that role. And the lack of evolution has left voice assistants surrounded by uncertainty. Google, for example, has shut down third-party Google Assistant smart displays and reportedly shifted Assistant manpower to Bard. But while Google Assistant and Google's experimental Bard chatbot currently feel like different products with different uses, Amazon has dreams of uniting its generative AI efforts with its struggling Alexa business.

It's no secret that belts are tightening at Amazon, compounding interest in making Alexa a strong revenue source. Alexa was reportedly set to lose $10 billion in 2022, per an Insider report, and had failed to sufficiently engage users in ways that make Amazon money. Amazon is also enduring its largest round of layoffs and last week announced it is discontinuing Halo fitness and sleep trackers.

Can generative AI generate Alexa revenue?

Amazon reportedly tried incorporating more AI into Halo before killing it—like having trackers leverage a smartphone camera and computer vision to analyze and share user workout data with Amazon. We weren't eager to trust Amazon with such AI usage; however, Amazon is reportedly shifting some of that invasive AI energy to Alexa.

A report from Insider on Tuesday cited a "leaked document" titled "Alexa LLM [large language model] Entertainment Use Cases." It reportedly details plans to make Alexa more capable of "thinking vs. fetching from a database."

The AI, an Amazon spokesperson told Insider, isn't based on an open source model like versions being developed by other Big Tech companies but rather on a proprietary LLM called Alexa Teacher Model. Alexa has already been using it for years, but Amazon is "building new models that are much larger and much more generalized and capable" to make Alexa "more proactive and conversational," according to Amazon's rep.

The internal document, Insider said, provides examples of what this beefed-up Alexa might be able to do. In one, Alexa creates a bedtime story from a kid's prompt, like "cat and a moon." Amazon seems keen on using cameras to aid its AI, with Insider reporting that Amazon is exploring using an Echo Show smart display camera to identify a toy the child is holding and incorporate that into the story. This sort of intimate data collection for the use of voice assistant skills, however, would likely draw concern. Earlier this week, The Verge reported that Amazon workers "expressed pause" about incorporating computer vision into the Halo subscription service.

This storytelling feature could bring revenue by encouraging business partnerships. The leaked document reportedly describes the hypothetical child holding a toy of Olaf from Frozen, which Alexa would add to its story, and names Lego "and others," per Insider, as potential partners.

Amazon has been trying to bring in revenue through Alexa partnerships with the likes of Domino's and Uber; however, this newly reported possible use suggests Amazon is considering questionable techniques (that would hopefully require user permission) in the name of a unique generative AI experience. The internal memo is said to discuss making the stories interactive by asking the user to make a corresponding illustration on an Echo Show display or to add to the AI tale.

The leaked memo also looks at ways to leverage Fire TVs to boost Alexa's helpfulness, Insider reported. Fire TV sets and streaming devices are reportedly selling well for Amazon and have the potential for revenue-driving experiences that are harder to replicate with, for example, smart speakers without screens. This includes letting users interact with digital content while connecting them to Prime Video and third-party streaming services.
