
How AI Deepfakes Are Really Made | Hany Farid

14:18

About The Episode

The deepfake game is getting real. Deepfake detective Hany Farid gets under the hood of AI and explains exactly how it can now make such convincing fake content – so convincing that he himself has trouble identifying his own real voice from an AI-generated fraud. Can you spot the deepfake?

For more, check out the extended interview with Hany Farid.

Hany Farid:

So let's talk about deepfakes, which is this sort of sliver of all of this. Deepfakes is an umbrella term for using machine learning, or AI, to create, from whole cloth, images, audio, and video of things that have never existed or happened. For example, I can go to my favorite deepfake generator and say, "Give me an image of Hakeem in a studio doing a podcast with Professor Hany Farid." Actually, it would do a pretty good job because you have a presence online, I have somewhat of a presence online, it knows what we look like, and it would generate an image that's not exactly this, but something like that, or I can say... By the way, I still say please when I ask AI for things. One of my students told me that this is a good idea because when the AI overlords come, they're going to remember you were polite to them. I actually really liked this advice.

Hakeem Oluseyi:

Wait a minute. So I read an article-

Hany Farid:

That it costs tens of millions of dollars.

Hakeem Oluseyi:

That's right.

Hany Farid:

It is the energy.

Hakeem Oluseyi:

Yes.

Hany Farid:

Just saying please and thank you. I still do it, by the way. Even in my head right there, when I was asked, I still in my head say please.

Hakeem Oluseyi:

Well, listen... I have AI connected to my AI, and so my AI corrects my AI prompts to proper grammar and it's like, "Please..." It puts please in there.

Hany Farid:

I know, and it does cost tens of millions of dollars for that extra token. I will ask it for an image of a unicorn wearing a red clown hat walking down the street of Times Square, and it will generate that image. I can ask, "Generate an audio of Professor Hany Farid saying the following..." I can generate a video of me saying and doing things I never did. You can clearly see the power of that technology from a creative perspective. If you and I are having a conversation and, in post, we said something we didn't mean to, we can just fill it in with AI now.

Hakeem Oluseyi:

Well, here's the thing that makes me... You just mentioned how we're only two, three years into this. However good it is now, what is the-

Hany Farid:

This is the worst it will ever be. I can tell you, by the way, how good it is. In addition to being trained as a computer scientist and applied mathematician, I've been somewhat trained as a cognitive neuroscientist, and we do perceptual studies. What we do is we recruit participants, we show them images, audio clips, and video, and we tell them, "Half of the things you're going to look at are real. Half of the things are AI generated." We explain to them what AI generated is. We give them examples of that. For images, as of last year, people are roughly at chance at distinguishing a real photo from an AI-generated photo.

Hakeem Oluseyi:

So what you mean by that is, if you had a monkey behind the keyboard-

Hany Farid:

Yes, flipping a coin.

Hakeem Oluseyi:

... flipping a coin.

Hany Farid:

Yeah, the monkey's probably better than you, by the way. I'm going to go off and guess. With audio, we play a clip of somebody speaking, like you, and then we play an AI-generated version. They're slightly above chance, 65%.

Hakeem Oluseyi:

So images, at chance, and audio, slightly better than chance.

Hany Farid:

In video, they're a little bit better. But all of those trends are going towards chance.
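
One way to make "at chance" concrete is to ask how likely a given score would be if every answer were a coin flip. The sketch below is a minimal illustration under stated assumptions: the 65% figure comes from the conversation, but the number of trials (200) is invented for the example, and the function name `p_at_least` is hypothetical.

```python
from math import comb


def p_at_least(correct: int, trials: int) -> float:
    """Probability of getting `correct` or more answers right out of
    `trials` if every answer is a fair coin flip (pure guessing)."""
    favorable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favorable / 2 ** trials


# Assumed study size: 200 real-or-fake judgments per condition.
TRIALS = 200

# "At chance": about half right, which guessing produces very often.
print("100/200 by guessing:", p_at_least(100, TRIALS))

# "Slightly above chance, 65%": 130 right, which guessing rarely produces.
print("130/200 by guessing:", p_at_least(130, TRIALS))
```

A large probability means the score is consistent with guessing; a very small one means listeners are picking up on something, even if they are far from reliable.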

So here's what we know. Everything in the next 12 months, 18 months, 24 months, I don't know what the number is, it will be indistinguishable to the average person online, and that is-

Hakeem Oluseyi:

Scary?

Hany Farid:

That's a weird world we're living in because think about how much... First of all, the vast majority of Americans now get the majority of their information from online sources, and unfortunately from social media too, and it is so easy to create this content. Understand, all of this is a text prompt away. I type, "Please, give me an image of this. Generate this audio. Generate this video." There are dozens of services that will do this extremely inexpensively or for free, and you can carpet-bomb the internet with fake images of the conflict in Gaza, fake images of-

Hakeem Oluseyi:

I have seen these.

Hany Farid:

I have seen them too, fake images of the flood in Texas, fake images and video of the fires and... Name it, across the board. Fake images of people stuffing ballot boxes, and now we have a threat to our democracy. Suddenly, our sense of reality, coming back to your first very good question, is up in the air because I can create whatever reality I want... and understand that there are sort of three things happening here when we talk about deepfakes. There's the creation of it, that's what we've been talking about. There's the distribution, which we democratized 20 years ago. So anybody can publish to the world, and that's very powerful and very terrifying because there are no editorial standards on social media. And then there's the amplification: we have become so polarized as a society that, when you see things that conform to your worldview, you are more than happy to click like, reshare.

Now, you have creation, distribution, amplification. That's the ballgame. That's the ballgame for spreading massive lies, conspiracies, and disinformation campaigns that affect our global health, our planet's health, our democracy, our economy, everything. Everything.

Hakeem Oluseyi:

So let's get into how these fakes are generated.

Hany Farid:

Good, yeah.

Hakeem Oluseyi:

Start with images.

Hany Farid:

Good. So let's start with images because, in some ways, it's the easiest one, but all of these have a similar theme. One of my favorite techniques for generating images is called a generative adversarial network, or a GAN, and here's how it works.

Hakeem Oluseyi:

Wait a minute. Adversarial?

Hany Farid:

Adversarial, yeah.

Hakeem Oluseyi:

So that means that you're fighting your computer?

Hany Farid:

Two computer systems are fighting each other, and this is sort of the genius of this technique. So here's how it works. You have two systems. One system's job is to make an image of a person or a landscape or whatever you want. What it does, it starts by... this is literally true, it just splats down a bunch of random pixels. I say, "Generate an image of a person," and it says, "Okay, here's a bunch of..." Think of the monkeys at the keyboard typing randomly. Let's see if this is Shakespeare. And then it takes that image and it hands it to a second system and it says, "Is this a face?" and that system has access to millions and millions of images that it's scraped from the internet that are faces.

Hakeem Oluseyi:

I see.

Hany Farid:

And that system says, "That thing that you generated doesn't look like these things over here," and it gives the feedback to the generator and it says, "Nope, try again." Modify some pixels, send it back to what's called the discriminator. "Is it a face?" "No, try again." And they work in this adversarial loop, so it's like somebody's checking your homework.
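
The adversarial loop described above can be sketched in plain Python. This is a toy under loud assumptions, not how production image generators are built: each "image" is a single number, the real data is a bell curve centered at 2.0, the generator learns one parameter (`shift`), the discriminator is a one-variable logistic classifier, and a small weight-decay term is added only to damp the back-and-forth. In the standard formulation the discriminator's feedback reaches the generator as a gradient, which is what the sketch does. The names `train_toy_gan` and `generate` are hypothetical.

```python
import math
import random


def sigmoid(t: float) -> float:
    """Numerically safe logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def train_toy_gan(real_mean=2.0, steps=3000, batch=64, lr=0.05,
                  decay=0.1, seed=0):
    """Train a one-parameter generator against a logistic discriminator."""
    rng = random.Random(seed)
    shift = 0.0      # generator: G(z) = shift + z, with z random noise
    w, c = 0.0, 0.0  # discriminator: D(x) = sigmoid(w * x + c)
    for _ in range(steps):
        real = [rng.gauss(real_mean, 1.0) for _ in range(batch)]
        fake = [shift + rng.gauss(0.0, 1.0) for _ in range(batch)]

        # Discriminator step: push D(real) toward 1 and D(fake) toward 0.
        gw = gc = 0.0
        for x in real:
            err = sigmoid(w * x + c) - 1.0  # gradient of -log D(x)
            gw += err * x
            gc += err
        for x in fake:
            err = sigmoid(w * x + c)        # gradient of -log(1 - D(x))
            gw += err * x
            gc += err
        w -= lr * (gw / batch + decay * w)  # decay: assumed stabilizer
        c -= lr * (gc / batch + decay * c)

        # Generator step: move `shift` so that D rates the fakes as real.
        gs = sum((sigmoid(w * x + c) - 1.0) * w for x in fake)
        shift -= lr * gs / batch
    return shift, w, c


def generate(shift: float, n: int, seed: int = 1):
    """Inference: one cheap pass through the trained generator."""
    rng = random.Random(seed)
    return [shift + rng.gauss(0.0, 1.0) for _ in range(n)]


if __name__ == "__main__":
    shift, w, c = train_toy_gan()
    samples = generate(shift, 1000)
    print("learned shift:", shift)
    print("mean of generated samples:", sum(samples) / len(samples))
```

In this toy version all of the back-and-forth happens inside `train_toy_gan`; once it finishes, producing a new sample is a single call to `generate`.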

Hakeem Oluseyi:

It seems like it could get stuck never getting to a face.

Hany Farid:

You would think, and what's amazing about GANs is that they converge. They converge. Part of that is the way they've been trained, but the genius of this is that the generator is not very smart because all it's doing is modifying pixels, and the discriminator is actually quite simple. It's simply saying, "Does this thing look like these things?" And because you pit them against each other in this adversarial game, this sort of amazing thing happens out the other side.

Hakeem Oluseyi:

So here's the question, on average, how many iterations does it take? And then how much time does that translate to the real world?

Hany Farid:

Yeah, that's a great question. Typically, the time is in seconds. There are two phases. You train the GAN, and that's a really long process. But then what we call inference, which is, "Run this thing," happens in seconds, and the reason it happens in seconds is... By the way, that is hundreds of thousands of iterations, but it's on a GPU, which is very powerful and very fast. And then there are these tricks to make it even faster. You start with small images and then you make them bigger over time, but it is literally seconds to make that image. The brilliance of that is that the two systems are competing with each other, and then this thing that seems like intelligence comes out, even though it's not. If you think about those two individual components, they're pretty basic.

Hakeem Oluseyi:

They're pretty dumb.

Hany Farid:

But then you have this emergent behavior almost. It's like, "You know how to generate images of people. That's amazing."

Hakeem Oluseyi:

So let's have a little fun. I understand that you brought me some fakes and some real images to put to the test, to see if I can discern the difference.

Hany Farid:

Yeah. I'm going to play a couple of audio clips for you. Before I do this, let me say, I've been doing this for a long time and I'm pretty good at it. I'm pretty good at what I do. I have created three audio samples. I'm going to play them for you.

Hakeem Oluseyi:

Wait, are you allowed to say that, that you're good at what you do? I'll say that. Hany is really good at what he does.

Hany Farid:

I said pretty good, by the way.

Hakeem Oluseyi:

He's amazing.

Hany Farid:

This is a true story, by the way. I made three audio clips for you of me talking, and you and I have been talking for a little while, so you now know what my voice sounds like. I got off the plane and I was in the car coming over here. I wanted to make sure they worked and I played all three of them, and I couldn't tell which one of me was real or fake. I wasn't 100% sure.

Hakeem Oluseyi:

Wow.

Hany Farid:

And I do this for a living, and it's my voice.

Hakeem Oluseyi:

Right, yeah.

Hany Farid:

Okay, so that is... Okay.

Hakeem Oluseyi:

Wait a minute... Which AI did you use? This was something that you created or something that's generally available?

Hany Farid:

Here's the thing you have to understand about AIs, this is so readily available. Here's what I did. I went to a service, it's a commercial service. I uploaded... I think it was about three minutes of my voice. I said, "Please, clone my voice," and it clones my voice. What I mean by that is that it learns the patterns of my voice, what I sound like, the intonation, my cadence, how fast I speak, where I put the pauses. And then I can simply type and have it say anything I want to say. I'm going to have you listen to three sentences. I'm going to give you a hint. One of them is fake and two are real.

Hakeem Oluseyi:

Okay.

Hany Farid:

Let's see what we can do.

Hakeem Oluseyi:

Let's see. All right.

Hany Farid:

Okay, here we go. And in fairness, this is not the best speaker, but okay...

Hakeem Oluseyi:

Are there guardrails in our law?

Hany Farid:

Good. First of all, when I went to do this service, I uploaded my voice and there's a button that says, "Do you have permission to use this person's voice?" I did because it was my voice, but I can upload anybody's voice and click a button. The laws are very complicated and they actually vary state to state and, of course, internationally. So there are almost no guardrails on grabbing people's likeness. Even if there were, there's-

Hakeem Oluseyi:

You could still do it anyway.

Hany Farid:

There's no stopping this. There's no stopping it. Okay. All right, number one... Oh, and, by the way, the three... This is part of a talk I gave recently on deepfakes, so you'll hear a consecutive thing. Okay, ready?

Audio:

And if you invite me back next year, almost certainly everything will have changed, the nature of creation of deepfakes, the risk of deepfakes, and the detection of deepfakes-

Hakeem Oluseyi:

That's the deepfake right there, man.

Audio:

... is changing.

Hany Farid:

Hold on. That was good.

Audio:

It is a fast moving field, and we have to start thinking seriously and carefully about the threat of misinformation.

Hany Farid:

Okay, good, and one more.

Audio:

We are living through an unprecedented time where we are relying more and more on the internet for information, for information that affects our health, our societies, our democracies, and our economies.

Hakeem Oluseyi:

Can I hear number one again?

Hany Farid:

Yeah. You're a little less sure than you were a minute ago.

Hakeem Oluseyi:

Yeah.

Hany Farid:

And if you invite me back next year, almost certainly everything will have changed. The nature of creation of deepfakes, the risk of deepfakes, and the detection of deepfakes is changing.

Hakeem Oluseyi:

I think it's the first one still. I got it right?

Hany Farid:

Yeah. I struggled with it, by the way. Honestly, I couldn't remember.

Hakeem Oluseyi:

I'm from the future.

Hany Farid:

You're the time traveler, it turns out.

Hakeem Oluseyi:

Wow. Well, you know what? I started my media work in audio being a voice actor and, very quickly, I was able to pick up on music and commercials and movies where they were dropping in, pickups versus [inaudible].

Hany Farid:

The reason I figured it out is there's a difference in the background noise. One had more reverb than the other, which is how I then remembered it. You got to admit, all three of them sound like me.
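
A difference in background noise can be put into numbers by comparing the level of the quietest stretches of each clip. The sketch below is a rough illustration under stated assumptions, not a forensic tool and not the method used in the conversation: the clips are synthetic (a tone standing in for speech, with pauses), the frame size and quantile are arbitrary, and it measures only the noise level, not reverberation. The names `frame_rms`, `noise_floor`, and `synthetic_clip` are hypothetical.

```python
import math
import random


def frame_rms(samples, frame):
    """Root-mean-square level of each consecutive, non-overlapping frame."""
    return [
        math.sqrt(sum(s * s for s in samples[i:i + frame]) / frame)
        for i in range(0, len(samples) - frame + 1, frame)
    ]


def noise_floor(samples, frame=400, quantile=0.1):
    """Level of the quietest frames, taken as the background noise."""
    levels = sorted(frame_rms(samples, frame))
    return levels[int(quantile * (len(levels) - 1))]


def synthetic_clip(noise_level, seed, rate=8000, seconds=2):
    """Stand-in for a recording: quarter-second tone bursts separated by
    quarter-second pauses, with Gaussian background noise throughout."""
    rng = random.Random(seed)
    clip = []
    for n in range(rate * seconds):
        speaking = (n // (rate // 4)) % 2 == 0
        tone = math.sin(2 * math.pi * 220 * n / rate) if speaking else 0.0
        clip.append(tone + rng.gauss(0.0, noise_level))
    return clip


if __name__ == "__main__":
    quiet_room = synthetic_clip(noise_level=0.01, seed=1)
    noisy_room = synthetic_clip(noise_level=0.05, seed=2)
    print("quiet clip noise floor:", noise_floor(quiet_room))
    print("noisy clip noise floor:", noise_floor(noisy_room))
```

Two clips that sound like the same voice can still differ measurably in the pauses between words, which is the kind of cue described above.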

Hakeem Oluseyi:

Oh, they all do. They all sound like you.

Hany Farid:

Oh, and, by the way, not only can-

Hakeem Oluseyi:

Let me tell you what has gotten me recently is I'll get these social media announcements, "Oh, there's a new song by Tupac and Eminem," and I started listening to it and, halfway, I'm like, "No. This is AI," but at the beginning they gave-

Hany Farid:

It's coming for music. It's coming for music as well, by the way. This is one of my favorite videos, by the way. Let me just show this to you.

Video:

If you invite me back next year, almost certainly everything will have changed. The nature of the creation of deepfakes, the risk of deepfakes-

Hany Farid:

That's real?

Video:

... and the detection of deepfakes is changing.

Hany Farid:

That's real. Wait... wait for it.

Video:

[Spanish].

Hany Farid:

I don't speak Spanish.

Hakeem Oluseyi:

And your mouth is doing it.

Hany Farid:

I don't speak Japanese.

Video:

[Japanese]

Hany Farid:

Doesn't it sound like me?

Hakeem Oluseyi:

Yes, it does.

Hany Farid:

I know. So, now, I can do full-blown video, any language. By the way, here's what's really cool about this. Here's a really cool application. I like foreign films a lot, but I can't stand bad lip syncing.

Hakeem Oluseyi:

Yeah, I'm with you on that.

Hany Farid:

It makes me crazy.

Hakeem Oluseyi:

Same.

Hany Farid:

But you don't need it anymore.

Hakeem Oluseyi:

You don't need it.

Hany Farid:

We're now going to make videos in any language you want, and it's going to be perfect.

Hakeem Oluseyi:

How did you do that?

Hany Farid:

This is also commercial software. You upload a video, say that you have permission to do it, and you say, "Please, translate this into Japanese, Korean, Spanish, French, German," anything you want. It's amazing.

Hakeem Oluseyi:

That is nuts. The fact that the mouth changed to voice the words-

Hany Farid:

Yeah. By the way, the way this works, this is really amazing, is you upload a video of you talking. What it does is it takes the audio and transcribes it, so it goes from audio to words, and then it translates from English to Spanish, and then it synthesizes a new audio in Spanish, and then it puts that audio back into the video. Every one of those is an AI system, by the way, and it does that in about three minutes, and it's amazing. So if you wanted to take this podcast and distribute it in Spanish, French, German, just upload it and you're done.
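
The chain of steps described above (audio to words, words to another language, new audio, back into the video) can be sketched as a pipeline of four stages. Every function here is a hypothetical stub that only tags a string so the data flow is visible. In a real service each stage would be a separate AI model, and none of these names comes from the commercial software mentioned in the conversation.

```python
def transcribe(audio: str) -> str:
    """Stage 1 (stub): speech recognition, audio to words."""
    return f"text({audio})"


def translate(text: str, target: str) -> str:
    """Stage 2 (stub): machine translation into the target language."""
    return f"{target}[{text}]"


def synthesize(text: str, voice: str) -> str:
    """Stage 3 (stub): speech synthesis in the speaker's cloned voice."""
    return f"speech<{voice}>({text})"


def lip_sync(video: str, audio: str) -> str:
    """Stage 4 (stub): put the new audio back into the video,
    adjusting the mouth to match."""
    return f"{video}+{audio}"


def dub(video: str, audio: str, voice: str, target: str) -> str:
    """Run the four stages in the order described in the conversation."""
    text = transcribe(audio)
    translated = translate(text, target)
    new_audio = synthesize(translated, voice)
    return lip_sync(video, new_audio)


if __name__ == "__main__":
    print(dub("talk.mp4", "talk.wav", voice="hany", target="es"))
```

Because each stage takes only the previous stage's output, any one of them can be swapped for a better model without touching the others.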

Hakeem Oluseyi:

I'm just hitting India, China, Southeast Asia-

Hany Farid:

Two and a half billion people, done.

Hakeem Oluseyi:

10 cents each, we're good to go.