
Where Meta is going next with AI and AR

In an interview, Meta CTO Andrew Bosworth lays out how AI is impacting the company’s big bet on a future after the smartphone

Alex Heath
Alex Heath is a deputy editor and author of the Command Line newsletter. He has been reporting on the tech industry for more than a decade.

Generative AI has changed a lot of product roadmaps this year. At Meta, it has big implications for AR glasses, according to CTO Andrew Bosworth.

I recently caught up with Bosworth, who goes by Boz, to chat about all things AI and AR. Our convo was timed to his annual year-in-review post, in which he writes that “the shift we have seen over the last year suggests there is a path to AI becoming a primary way that people interact with machines.”

From Meta’s perspective, the most immediate example of this shift is its latest Ray-Ban smart glasses, which have quickly managed to break out from the early adopter crowd into broader pop culture. A handful of beta testers recently gained access to Meta’s AI assistant in the glasses, which can identify objects in the world and translate languages. As I reported earlier this year, the next version of these Ray-Bans in 2025 will include a small “viewfinder” display, which Boz told me the AI assistant will use.

Separately, his Reality Labs organization is plugging away behind the scenes — and spending heavily — on developing far more advanced AR glasses. These will exist separately from the Ray-Ban line, and they represent a big, risky bet on a future world that won’t revolve around the smartphone. During our conversation, Boz confirmed that a handful of Meta employees will begin internally testing a prototype of these glasses in 2024, though a commercial release isn’t scheduled until closer to the end of this decade.

It’s clear that Meta now sees its AI assistant, which is also accessible in text form via WhatsApp and Messenger, as a critical piece of its hardware ambitions. Boz told me that he wants to integrate the assistant into products made by other companies. He also explained how the new assistant, combined with the high cost of making full-fledged AR glasses, is altering the product roadmap for Reality Labs. Read to the end for his take on deepfakes and why he doesn’t think AGI will be reached anytime soon.

The following Q&A has been edited for length and clarity:

I was just reading your [end of year] post and noticed that AR glasses weren’t really mentioned. How are those going?

I think that the post has tremendous implications for them. It’s really just the acceleration of AI relative to expectations. Two years ago, we did not think we were going to have AIs this capable. And so, we’re like, “Oh cool, what can we do differently?”

Ray-Ban Meta glasses are obviously an early indicator. This is a pair of glasses that we were excited about. We were building [them] just to have great cameras, great phone calls, great music, video livestreaming. We were pretty pumped about it. The assistant is an entire layer that we started working on six months ago and now might actually be the best feature of these glasses.

Before this, we’re thinking, “Okay, we’re going to have these glasses. They’re going to have a display. What are you going to put on there? You’re going to put your messages. You can see the photo you just took before you share it. You can share it. You can edit captions. Maybe you can scroll feeds. No, you’re going to get the answer from your assistant visually! It’s way more efficient.”

So, my sense is that the AR roadmap changes a lot. I think the products that are in the near term, which we always saw as important milestones, are much better products now. With the full-display glasses, we thought you would end up with virtual objects first, and then later you’d have contextual AI. You might [now] have contextual AI first and later you build in the immersive features. So it has profound implications for one of these major technology pillars to arrive much sooner than expected.

Has the last six months adjusted when things are shipping and the product roadmap?

The roadmap that we had was bizarrely the right roadmap for contextual AI. You want a pair of glasses that has a camera, microphones, speakers. We have that. Then, you want one that has a display. We were planning to do that. We all know that.

So no, the roadmap ended up relatively intact. Now, there may be more we can do to drive hardware access to the assistant at an even lower price point. Maybe more options there.

Getting the Ray-Ban Meta glasses to be launch-ready with the assistant was a huge bit of work. Now that we’re looking at 2024 plus, we’re like, “Okay, how does this roadmap change? How does it evolve? If you have this great assistant, what were the missing pieces? What do you want to do differently?” That work is still ahead of us.

Getting the assistant out there through more hardware, do you see that happening mostly with the Ray-Bans or with Meta’s other products?

We’re trying to get this into the hands of as many people as we can. The good news is with WhatsApp, with Messenger, people have access to the Meta AI assistant. But because these phones are locked down, you can’t get it through your AirPods. You’re not allowed to. You have your Pixel Buds? You have to use their assistant.

I think there are a lot of ways to do it. It doesn’t have to be through a pair of glasses at all. It could be something else entirely. We’re looking at the whole space. We don’t have to build it. You can partner with somebody on that.

So you want to license the Meta AI assistant to other companies? It’s a platform play for you now?

I’m not sure about licensing or not. We don’t know what the model is. Certainly, I think we’re open to partnership on it. The goal is to get people to have access to this great tool.

My assumption was that you would want to keep the AI assistant to the Ray-Bans and to Meta-only products as a way to bring value there.

On the AI side, one of the most important things that we need to work on, one of the major areas of research that we are embarking on, is planning and reasoning. That includes not just sensing and understanding your environment but your own history with it. Not just a big context window or replaying the last 20 queries but the ability to search back in the history of conversation, understand what’s relevant to the current exchange, and bring it to bear. These are areas that we’re working on.

I think the only way you’re going to build a great consumer experience with an assistant is that you build a history with an assistant. It understands better and better over time how to serve you no matter what channel you’re choosing to use it on. Whether you’re texting on WhatsApp or you’re speaking on Ray-Ban Meta, it shouldn’t really matter that much. To build the best consumer experience, it should be everywhere.
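(A quick aside for the technically curious: the “search back in the history of conversation” idea Boz describes is essentially retrieval over past turns rather than stuffing everything into one context window. Below is a minimal, purely illustrative Python sketch of that pattern; the ConversationMemory class, the bag-of-words scoring, and the example turns are hypothetical stand-ins, not Meta’s implementation.)

```python
# Illustrative only: a toy "conversation memory" that surfaces past turns
# relevant to a new query instead of replaying the last N messages.
# The class name, scoring scheme, and sample data are hypothetical.

from collections import Counter
import math


def bag_of_words(text: str) -> Counter:
    """Lowercased word counts; a crude stand-in for a real embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ConversationMemory:
    """Stores every turn and recalls the ones most similar to a new query."""

    def __init__(self) -> None:
        self.turns: list[str] = []

    def add(self, turn: str) -> None:
        self.turns.append(turn)

    def recall(self, query: str, k: int = 2) -> list[str]:
        q = bag_of_words(query)
        ranked = sorted(self.turns, key=lambda t: cosine(q, bag_of_words(t)), reverse=True)
        return ranked[:k]


memory = ConversationMemory()
memory.add("User asked for gluten-free dinner ideas last week.")
memory.add("User mentioned they are training for a marathon in April.")
memory.add("User shared photos from a trip to Lisbon.")

# Pull only the history relevant to the new request, not the whole transcript.
print(memory.recall("suggest a meal plan for marathon training week"))
```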

Is the plan to still have the internal test launch for employees for the first full-display AR glasses in 2024? Will we see that version publicly?

We’ve been working on a number of prototypes, as we always do. We do have a pretty exciting one now. We’ve actually been playing with it this year. It’s probably our most exciting prototype that we’ve had to date. I might get myself in trouble for saying this: I think it might be the most advanced piece of technology on the planet in its domain. In the domain of consumer electronics, it might be the most advanced thing that we’ve ever produced as a species.

It’s internal only because they’re extremely expensive and complicated to build and put together, and there’s a lot of variance between the devices because it’s such a new device. The goal is very much to get software running on it that we use inside the office and to have them on all day and use them as we go about our work. I think it’s true of software broadly, but for Meta in particular, it’s important to just get hands-on and to see real-world usage.

We have all the demos, we have all the proof of experiences, we have all the time machines where you have a big, clunky thing that simulates the thing and how it will be. So we’re not starting from scratch. But there is just no substitute for using this thing in a meeting or while you’re on a call or while you’re walking or doing things. We’re excited to have that opportunity in 2024, to have enough of these so that a cadre of employees can be using them and having that software experience. It’s a real glimpse into the future. These devices are a real time machine, but they have the form factor and the battery life to be worn all day. Maybe not all day.

We’re very proud of them. At the same time, I think it’s important for us to set expectations. These things were built on a prohibitively expensive technology path. For us to return to this capability in a consumer electronics price point and form factor is the real work that we have ahead of us. It’s exciting to have a device that is spectacular in what it’s able to do but it’s also a device that is not on the same technology path that we need to pursue to make it accessible to people. I think there’s a pretty good chance that people will get a chance to play with it in 2024.

I heard you recently told employees that the full-fledged AR glasses feel like they’re a little farther away than you thought, maybe by a couple of years. So I’m wondering, what is the main gating factor that you are still experiencing?

It’s a super constrained space. Having really good, large field-of-view displays that are efficient enough to be powered all day, bright enough to compete with million-to-one contrast ratios, clear enough for you to be excited about them, rich enough in color and texture, and that also don’t create a lot of artifacts when you look at the world — it’s a really tough technical challenge.

We’ve found a path to that. The path that we found is this one that we’re going to use for these internal glasses. It works. It’s prohibitively expensive. There isn’t a real path to making it cost-effective. We thought maybe there would be breakthroughs on the drivers of cost, and that just hasn’t materialized and doesn’t look like it’s going to materialize.

There are quite a few other paths. We understand what those are. They come with real tradeoffs. Materials that we want don’t exist yet, so we have to manufacture those materials, which we’re actively pursuing. Or the materials exist but they have these challenging properties that we have to engineer our way around, which we’re also working on. That’s everything from the substrate, the waveguide, as well as the projector, [and] the display. We have parallel paths for different display systems, entirely different optical and light systems, that we’re pursuing.

So that’s one of the big challenges, for sure. I think another one that we know we have is just the power of the entire device. If you’re taking this thing off for episodic use just to put it on to use it and then putting it back in the charging case, you might as well not have AR glasses. That’s a really niche kind of use case. They have to have a degree of longevity. Maybe you can take it off and charge it during the day once or twice — you’re eating a meal, you’re taking a break. But it really has to have longevity.

It’s a pretty dynamic portfolio of technology risk, and things are always swapping themselves out for pole position. It’s upsetting. You want it to be one thing. It’s not one thing. Even within the displays, it’s not one thing. It’s efficiency trading off against optical clarity. How do you want to balance those equities? A lot of this is also product design questions, like what is the product use case? And what is acceptable then as a consequence, as a tradeoff?

So you don’t see a path to driving down costs to make these an affordable product?

We do. We have quite a few paths to do it. They just take a little more time. My sense is that this is probably a setback of a year or two from our original schedules. The buoy on that is the products that we have in the interim look like they’re going to be much more valuable and useful than we anticipated because of AI.

If you have this great AI assistant, you actually may be able to make different design tradeoffs on the hardware. There may be some ways that we can pull some of this [technology] back in because now we’ve opened up a whole new design space where some of these tradeoffs are less intense because we can rely on the AI to do more heavy lifting.

How concerned are you about deepfakes and AI-generated imagery and video, especially with an election year coming up? It seems like we’re in uncharted territory. Some people think watermarking is the answer. Some people think that an extra identity layer is the answer.

We just announced a policy around indicating that AI was used to assist in the generation of advertising and making it against our policy if you don’t do that. We have a longstanding policy around deepfakes and those things, which I think certainly become more active in this environment.

I’m pretty sure Alexander Hamilton founded the New York Post specifically to write slander about his opponents and have it look legitimate. This is an old societal problem. It’s a serious societal problem that we take seriously for whatever role we play in it.

I’m pretty skeptical on things like watermarking. There are very few things you can do digitally that cannot be reproduced digitally. That’s part of the game. So watermarks that say we’ve got a chain of custody that this is a valid and authentic piece of content can be subverted in a dangerous way. Watermarks that say this is AI-generated could be subverted in a dangerous way.

You and I grew up in a very unusual period for human history. Photos and videos that we saw for about 50 years were almost certainly real. Not never faked. Almost certainly real. Before 50 years ago, when photography became sufficiently mass market and it wasn’t just niche, all accounts we had were textual or oral. We knew that those were suspect.

From today forward, it seems very likely that the youth of our world will know that all accounts they come across, whether textual, oral, photographic, or video, are suspect. There was just this one weird period where it was bizarrely cheaper to produce real images and videos than it was to produce fake ones. That was never true before and won’t be true again.

We can talk about that and lament that. It does seem to be the case that this is a problem that our society, as a set of humans, has conquered before and has persisted through. I’m not judging it as a good or a bad thing. We’re going to take every ounce of responsibility we can for our little corner of the internet and try to do the best that we can for whatever the best practices are to keep consumers informed because they want to be informed. No one wants to be deceived.

So we are going to do the best that we can, absolutely. But I think there are real limits to what you can do here. I want us to be owning as an industry, as a society, that there are real limits to what we can do here.

What is Meta’s view of AGI, and what the hell does that even mean? Is it something that Meta is trying to build?

Because of Yann LeCun [Meta’s chief AI scientist], we do call it AMI, autonomous machine intelligence. So yes, we have our own version of it. What’s interesting about Yann, because he has been such a proponent of open-source large language models, is he’s also one of the fiercest critics of large language models and their limitations. He sees these big gaps in planning and reasoning and memory and theory of mind as these critical, big, meaty gaps, with each of them as big or bigger than large language models in requiring major breakthroughs. We are pursuing those technologies.

I’m of the belief that we are not within a decade of anything resembling AMI or AGI and that what we have is a really cool associative database that can do some really interesting things.


The watercooler

My notes on what else happened in the tech industry recently:

  • While I wasn’t surprised to see Adobe abandon its acquisition of Figma, I did expect the two companies to put up more of a fight. Going their separate ways leaves Figma, which was already doing well financially before getting a $1 billion breakup fee (3x what it has raised to date from investors), in a much stronger position than Adobe. You just don’t pay the multiple Adobe was willing to pony up for Figma unless you are trying to neutralize a future threat. While Figma employees will have to wait a little longer to retire on goat farms, I’m sure they will still get a payday soon. If the VC posts supporting CEO Dylan Field this week are any indication, he’ll have no trouble lining up an attractive tender offer for employee shares early next year.
  • Did anyone really think that Apple wouldn’t quickly block Beeper’s workaround for getting iMessage to work on Android phones? As sure as the sun rises, Apple will protect its lock-in. That US lawmakers are pushing for the DOJ to investigate this whole saga is rich given that Congress has so far refused to pass any meaningful laws on Big Tech regulation.

I’ll be back in early 2024. In the meantime, send me your feedback, tips, and ideas for improving this newsletter in the new year. Happy holidays, and thanks for being a subscriber.