AI Clones Your Voice After Listening for 5 Seconds (2018)

(google.github.io)

911 points | by lukeplato 1637 days ago

74 comments

  • arkades 1637 days ago
    I once got a call while I was lecturing some students. It was repeated three times in three minutes - I assumed it was an emergency and stepped out.

    I was greeted by someone explaining that my father had caused a car accident, and they were calling on his behalf. That someone would need to send over some money for repairs or they’d call the police.

    Sure.

    They added that their cousin, the driver, is a parolee now holding my father at gunpoint. That if I don’t send them money to make them whole, they’ll kill my father.

    This was super fishy, you know? But still, with things like “life of a loved one” at stake, it’s hard to call a bluff.

    I can only imagine what I’d have done if I’d heard my father’s voice pleading for help. They might have been able to get any amount of money out of me.

    Well, if my father hadn’t passed away nine months prior. They were not delighted to hear that.

    • inetknght 1637 days ago
      I live in Houston. I recently read a Houston Chronicle article describing a very similar scenario. I don't have the exact link (it was from 2019), but here's one from 2013 [0].

      That, combined with the inability to verify the phone number displayed on caller ID, has led me to tell all of my family never to accept a phone call from a number they don't recognize. There's literally zero trust in the phone system upon which we've built our modern economy.

      Unfortunately that's not possible for everyone. Some people are legally required to answer the phone, always, even for numbers they don't recognize.

      [0]: https://www.chron.com/news/houston-texas/houston/article/Hou...

      • losteric 1637 days ago
        Given caller ID spoofing, they really shouldn't even accept calls from numbers they do recognize... especially with tech like this. Let it go to voice mail then return the call afterwards.
        • inetknght 1637 days ago
          I wholeheartedly agree. I don't answer the phone for unknown numbers unless I'm expecting a call from an unknown number; the expectation will have been set up via prior correspondence.

          Unfortunately, not everyone can do that. Some people are legally required to answer the phone, even if they don't recognize the number. And unfortunately many businesses only communicate via the phone system.

          So, unfortunately, our entire country is built upon a system we're told to implicitly trust but have no way to verify.

          • LargoLasskhyfv 1637 days ago
            Why are there people who are legally required to answer the phone? On what grounds? Why? (Assuming private persons here)
            • inetknght 1637 days ago
              For example: people who are entangled in the court systems are required to answer their phone. Even if they're not convicted and are out on bail, they still must answer the phone -- it could be their bail bondsman. If someone's on parole, they must answer their parole officer. So as part of the bond contract and the parole contract, you must answer the phone.
        • jiveturkey 1637 days ago
          to someone that will likewise not answer any calls!
          • Tepix 1637 days ago
            Chances are whoever just called you will answer the phone when you call back right away.

            It reminds me of port knocking.

        • drdec 1637 days ago
          So what's the end game there - using voicemail as a poor man's texting?
          • rstuart4133 1636 days ago
            The end game is everyone gets the shits, and there is a noticeable drop in usage of the old POTS (plain old telephone service) network. People use Apple FaceTime, Google Duo or whatever instead. Then the telcos start to notice they are losing customers.

            At that point one of two things happens. One is that the telcos fix their networks. The other is that they decide it isn't worth the effort and let the traditional phone system die. Given phone calls are effectively free, there is stuff-all revenue in them, so I bet it's the latter.

            If that happens it will be painful. Like it is with messaging now, but even more so. Messaging now is either SMS with its limitations (like you can't use it from a computer), or a choice of a zillion walled gardens - Apple, Hangouts, Slack, Signal, Viber, Telegram, WhatsApp, Facebook, ... most of which I don't have installed, so I can't communicate with someone using them. The voice equivalents are FaceTime, Duo, Viber, Signal - many of the same things in fact. The result will be worse than messaging: the ability to communicate universally with anyone dies, but with no SMS fallback.

            But that's not the end point. Universal communication is just too useful to be dispensed with, as the explosion of the internet and the postal system before it have shown. So something will replace it, and once again we will all be able to communicate with anyone we please.

            However, the replacement has to solve the parasite problem. Once the cost of sending a message drops below a certain point, every universal system we've had so far has been overrun with parasites, aka spammers. The postal system has junk mail, email has its spam, now the phone system, and of course SMS.

            A solution may be to allow the recipient to charge the sender any amount they like for successful delivery of a message. Most people would let friends send for free, charge something for messages from unknown senders, and charge spammers more.
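            A toy sketch of that recipient-set pricing, in Python (the names, prices, and delivery rule are all invented for illustration, not any real system):

```python
# Hypothetical sketch of recipient-set delivery pricing: the recipient
# publishes per-sender prices, and a message is delivered only when the
# sender's offer covers the recipient's price for that sender.

DEFAULT_PRICE = 0.50  # assumed price for unknown senders, in dollars

class Inbox:
    def __init__(self):
        self.prices = {}     # sender -> price this recipient charges
        self.delivered = []

    def set_price(self, sender, price):
        self.prices[sender] = price

    def receive(self, sender, body, offered):
        """Deliver only if the sender's offer covers the recipient's price."""
        price = self.prices.get(sender, DEFAULT_PRICE)
        if offered >= price:
            self.delivered.append((sender, body))
            return True   # delivered; sender is charged `price`
        return False      # bounced; nothing charged

inbox = Inbox()
inbox.set_price("friend@example.com", 0.00)  # friends send free
inbox.set_price("spammer.example", 25.00)    # known spammers pay dearly

inbox.receive("friend@example.com", "dinner?", offered=0.00)  # delivered
inbox.receive("unknown.example", "hello", offered=0.10)       # bounced
```

            Spam only pays because sending is free; any positive price for strangers changes the economics.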

            That could happen with the existing phone system of course, but I'd lay long odds the incumbents have too much in common with the dinosaurs for it to even cross their minds. Sadly that means we are in for a very painful transition period. In fact they are already losing customers as people abandon land lines in droves, so I'd say the writing is on the wall.

          • dredmorbius 1636 days ago
            Loss of trust in PSTN, generally. I'd suggested this a few days ago in a similar discussion, bolstered by a recently-discovered quote from an industry engineer:

            https://news.ycombinator.com/item?id=21494300

            [S]ince mid-2015, a consortium of engineers from phone carriers and others in the telecom industry have worked on a way to [stop call-spoofing], worried that spam phone calls could eventually endanger the whole system. “We’re getting to the point where nobody trusts the phone network,” says Jim McEachern, principal technologist at the Alliance for Telecommunications Industry Solutions (ATIS.) “When they stop trusting the phone network, they stop using it.”

            https://nymag.com/intelligencer/2018/05/how-to-stop-spam-rob...

            At the point at which individuals and businesses in sufficient numbers find the downsides of participating in the PSTN exceed the benefits, they'll start defecting to other systems. Likely small and closed networks initially.

            It took decades for the telephone to become established as the principal means of business communication, and as it was, numerous other alternatives existed in parallel: postal mail, telegraph, telex (for what we'd now call B2B communications), fax, and early email systems.

            Email seems to be dying along with telephony, and for much the same reasons.

            It's occurred to me that much of the value in social networks is in trying to corner a sufficiently large directory (that is, user base) to be able to credibly take on telephony. What seems to happen is that as these networks grow in size, they too fall prey to the hygiene factors already affecting telephone and email comms: spam and annoyance messages, with concomitant trust issues in the network as a whole.

            Whether a technical solution to the trust and identity problem can emerge (one that preserves privacy and protects against the surveillance state, surveillance capitalism, and surveillance by other actors: organised crime, racist or fascist oppressors, stalkers, etc.) remains to be seen. I'm starting to think that's a hard, possibly an impossible, problem. An essay of Herbert Simon's I've recently turned up is exceptionally discouraging owing to a critical error Simon made in it (claiming Nazi Germany committed its atrocities without the benefit of mechanical data processing -- it in fact had ample assistance willingly provided by IBM).

            More generally, I'm suspecting that progress in information technology and communications capabilities reduces trust relationships, with some fairly strong historical evidence.

            (Overall risks may be reduced, but the mechanisms by which this occurs replace actual trust with validation, verification, and surveillance.)

      • krustyburger 1637 days ago
        I’ll bite.

        Who is legally required to do that? Are they not allowed to sleep or be otherwise indisposed?

        • gt2 1637 days ago
          One I can think of: people whose job requires it of them. The risk could be to them personally if someone gets their number/extension or gets transferred to them. But the risk could fall on the business as well, which could just as easily be targeted by scams like this.
    • headcanon 1637 days ago
      First off, awesome story.

      I have a friend who had something similar happen: he got a frantic call from his grandmother, who had learned via a scam call that he was in jail across the country and needed bail money. This was a few years ago, so they couldn't have used a duplicate of his voice, but possibly they were relying on imperfect memory.

      Sweeping generalization, but the elderly are, and will likely remain, prime targets for this kind of scam, since they likely have funds and are less likely to be educated in the state of the art for this kind of tech, not to mention a protective instinct.

      • pontifier 1635 days ago
        I received a call with a human on the other end. When I said hello, the person said in a friendly tone, "Grandpa!" and tried to start talking to me.

        That strategy probably works some percentage of the time.

    • toxicFork 1637 days ago
      How do you even prepare for something like that... Do we need to assign identifying keywords to each other when we leave home so we know we are really ourselves? Like a vocal PGP?
      • robbiemitchell 1637 days ago
        I told my wife that if I ever mention <redacted> while on a phone call, she should know that I am in trouble and unable to speak freely.

        Sounds like we'll all need more things like this eventually :(

        • elefantastisch 1637 days ago
          It would make sense to have another word that indicates it is genuinely you and you are genuinely speaking freely.
          • dredmorbius 1636 days ago
            If that's the default situation (a likely scenario for most people), you'd need something other than a single code word.

            In practice, most people can conduct a reasonable verification through a series of challenge/response interactions based on shared knowledge, should they need to do so. Mentioning something done, said, or shared in private recently would suffice in many instances.

            For more robust tradecraft, should you need it, a set of one-time codes (passwords or passphrases) might substitute.
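            Checked off in code, such one-time codes might look like this (a toy sketch; the phrases are invented):

```python
# Hypothetical one-time duress/authenticity codes: both parties hold the
# same ordered list of phrases; each phrase verifies one call and is
# then burned, whether or not the check succeeds.
import hmac

class OneTimeCodes:
    def __init__(self, phrases):
        self.remaining = list(phrases)

    def next_challenge_ok(self, spoken):
        if not self.remaining:
            return False
        expected = self.remaining.pop(0)  # burn the code either way
        # constant-time comparison avoids leaking how close a guess was
        return hmac.compare_digest(expected, spoken)

codes = OneTimeCodes(["blue heron", "paper lantern"])
codes.next_challenge_ok("blue heron")  # True: first code matches
codes.next_challenge_ok("blue heron")  # False: that code is already burned
```

            Burning the code on every attempt is the point: a replayed recording of a previous call never verifies twice.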

            When the former head of InterPol was arrested in China, he managed to alert his wife through the use of a duress signal, an image of a knife:

            https://www.nbcnews.com/news/world/wife-missing-interpol-pre...

            Not subtle, but effective.

            Spoiler:

            In the film Capricorn One (1978), one of the astronauts alerted his wife by referring to a holiday they'd recently taken together, by mis-stating the destination as Disneyland, rather than Hollywood -- the land of make believe -- as it had actually been, which led to the revealing of the hoax mission.

          • baroffoos 1637 days ago
            That would solve the problem of having to find a word that you would never normally use but could slip into a sentence normally.
          • Double_a_92 1637 days ago
            If you can speak freely, you don't need an extra codeword to explain that you are using the codeword in its real meaning. Unless maybe you suspect that somebody is listening to you and might learn your codeword from that.
            • elefantastisch 1637 days ago
              The codeword is to make it clear your words should be taken 100% seriously without considering the risk you are being coerced / spoofed with AI. If I agree on a word in advance with someone that no one could possibly guess and insert into an attempt to coerce / spoof my voice, then if there is truly an emergency in which I need this person to wire money to a random account, they will actually do it because they will know my request is genuine.

              If I'm being coerced, I could have a codeword to indicate that. If I'm being spoofed with AI, I'm not in control of "my" words, so I can't. I need instead to prove when I'm not being spoofed with AI. That's the purpose of this second codeword.

        • animal531 1637 days ago
          I invented two code phrases for pretty much exactly the same reason, but in case I ever met myself from the future.
        • r00fus 1637 days ago
          While it's a great idea, how do you test this? Is it worth the time?

          All my loved ones are on my iCloud, so I would just ping their phone/watch while confirming their location, and ask the assailant to let me hear the phone ping on the line.

          • bjelkeman-again 1637 days ago
            Find My Friend is so unreliable for me it is nearly worthless. Apple isn’t really delivering what I need.
            • hawaiianbrah 1637 days ago
              Really? How so? I use it probably almost daily and haven’t had an issue, at least I don’t think I have!
              • gt2 1637 days ago
                I've seen it fail to update the location pretty often.
        • jiveturkey 1637 days ago
          how do you pronounce the < > symbols? i mean, 'redacted' is already a pretty strange thing to say by itself.
          • Shorel 1637 days ago
            You just make static noises, like a radio signal being lost.

            Also, you are lacking in the abstract thought department. Get that fixed, for your own benefit.

            • throw1234651234 1636 days ago
              It's either a knowledge gap...or he is hopeless if he knew the symbol but didn't pick up on it. Nothing I know of can improve that.
            • protomolecule 1635 days ago
              I'd bet jiveturkey was joking.
          • seanhunter 1636 days ago
            They are pronounced "wakka" and "wakka" respectively.
          • friendlybus 1637 days ago
            less than redacted greater than, though it's clear he's keeping the real word a secret.
          • Double_a_92 1637 days ago
            <...> is a common way of indicating a placeholder.
      • lm28469 1637 days ago
        > How do you even prepare for something like that.

        You don't because it statistically never happens. Just like you don't prepare for a plane crash or a lightning bolt striking you.

        • Smoosh 1636 days ago
          Yet, so many people have made plans for what they will do when they win the lottery.
        • ohithereyou 1636 days ago
          I prepare for the astronomically improbable chance of a plane crash or lightning bolt striking me by having life insurance, so why such a reaction to someone asking this question?
      • LinuxBender 1636 days ago
        Yes. My mom and I had phrases for duress. Since she has long since passed away, I can share ours. "I love you". Sad, right?
      • bobloblaw45 1636 days ago
        I'm broke so it makes things a lot easier.
    • ehsankia 1637 days ago
      It was a real rollercoaster reading this comment! Also, now you have to worry about talking back and being a smartass, because they will record YOUR voice and use it to contact another loved one...
      • LinuxBender 1636 days ago
        In my opinion, all phones should be set to not ring unless the number is in the address book in a specific category. Mine won't make a noise. If it's important, they will leave a message.
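        That policy amounts to a tiny allowlist check, something like this sketch (numbers and categories invented for illustration):

```python
# Sketch of the allowlist policy described above: ring only for contacts
# in designated categories; everyone else goes silently to voicemail.
CONTACTS = {
    "+15551234567": {"name": "Mom", "category": "family"},
    "+15557654321": {"name": "Plumber", "category": "services"},
}
RINGING_CATEGORIES = {"family", "close friends"}

def should_ring(caller_id):
    """Return True only if the caller is a contact in a ringing category."""
    entry = CONTACTS.get(caller_id)
    return entry is not None and entry["category"] in RINGING_CATEGORIES

should_ring("+15551234567")  # True: family rings through
should_ring("+15550000000")  # False: unknown number stays silent
```

        Note this never blocks a call, it only silences the ring; anyone legitimate still reaches voicemail.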
    • jancsika 1637 days ago
      > I can only imagine what I’d have done if I’d heard my fathers voice pleading for help.

      Hm... that certainly gives me pause, and my first reaction was to be very afraid.

      On the other hand-- it still doesn't hold a candle to pyramid scheme sales techniques. I mean in a lot of cases those involve your actual loved ones betraying your trust and love in order to sell merchandise for a third party. Yet somehow in the face of a rising tide of those we still have functioning communities in the U.S.

      • bonestamp2 1637 days ago
        At least in a MLM sales pitch, nobody thinks that one of you is about to die. For that reason, I think the scam call is far worse than any MLM sales pitch.

        Side note, it's not accurate to refer to MLMs as pyramid schemes... Even though they're legal, MLMs are worse than actual pyramid schemes (and I don't think most people know what an actual pyramid scheme is, which is unfortunate because they're fascinating).

        https://simple.wikipedia.org/wiki/Pyramid_scheme

      • firethief 1637 days ago
        It's harder for Eve to get your loved ones to betray you than to speak for 5 seconds.
    • rolltiide 1637 days ago
      I know someone who had a fling abroad, and their fling began asking for money for treatment over Facebook.

      The American assumed it was a scam and the person did die

      I have often found that truth is stranger than fiction, and people are so conditioned by fiction that they can't perceive truth.

      • sethammons 1632 days ago
        Wait, what? Did you miss a word or a sentence? What is "treatment", and how did this escalate to death? Was it a medical treatment, and they were ill and succumbed to the illness?
    • throw1234651234 1637 days ago
      "They were not delighted to hear that." - I used to do that, be a smartass. Then I realized at worst they get more info, at best, you are training them and wasting your time.

      Then there is that whole thing where they are getting your voice.

      • r00fus 1637 days ago
        POTS just needs to die unless they allow for authenticated CID.

        One could wonder if this was some sort of conspiracy to break one of the most successful protocols in the world (or at least not update it so it dies by neglect) to increase profits by other means.

        • zaroth 1637 days ago
          I’m convinced that POTS is already a dead man walking. The younger generation has no interest in talking on the phone in the first place, and with the amount of spam calls, my phone number is absolutely the worst part of my phone.

          Texting and apps are a much more pleasant way to interact with someone, with the bonus of no hold times, and much of it can be automated.

          I think business texting is an upcoming startup unicorn which will be another “trivial idea packaged properly into a billion dollar product”.

          • Benjammer 1637 days ago
            >business texting

            You mean Slack?

            • zaroth 1637 days ago
              B2C - not team internal chats.

              It will have “jumped the shark” once there’s an SMS button on the business listing that Google Maps displays.

              Instead of the wonky/creepy Google demo, which did speech-to-text, then analysis, then text-to-speech relayed over to the business, every business will just communicate directly with customers over text.

              It’s not that this isn’t already done (to some extent). And more so in some countries outside of the US.

              But I have no doubt it will become the primary/preferred way to connect with any business, to the point where you will text with an 800- number long before it would occur to you to dial an 800- number to get service.

              Like for example, the warranty claim I just made on my Dyson handheld vacuum for a battery replacement. Search for “Dyson warranty claim” and they tell you to dial their 800- number. Now their phone helpline is absolutely the best of the best, but even still most people would [will eventually] prefer to interact via text.

              Another example, making a reservation directly with a restaurant (which I prefer to do versus using OpenTable which will take a cut for doing nothing), is a perfect usecase for texting. Also ordering take-out if you already have a favorite order saved, obviously all the notification type things which make sense over SMS instead of email, making an appointment when a dedicated app is too much overhead, etc. etc.

              • BigJono 1637 days ago
                You're way off on this one. A lot of those use cases don't work because texting is an asynchronous communication channel unless it's got some sort of automated system behind it. The reason you can't order takeout or make a reservation through text is because you come to an impasse if the person on the other end gets distracted and doesn't respond. The value in something like UberEats or OpenTable isn't in message passing, it's in state management. With UberEats if a restaurant closes suddenly or doesn't take your order within a timeout period, both you and the restaurant are notified and the state of your communication is updated, so there is no confusion. If you text or slack or whatever your order to a restaurant, and the person on the phone doesn't respond, you're fucked. How do you know whether it's safe to place an order somewhere else or not?

                Sure, every restaurant could build their own automated system that texts you back and manages the communication, but that's never going to happen when there's already a managed, standardized service available.
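                The state management argument sketches out to something like this toy model (the states, timeout, and notification step are invented for illustration, not any real service's API):

```python
# Sketch of order state management: an order either gets confirmed
# within a timeout, or both sides learn it expired, so the customer
# knows it is safe to order elsewhere. No confusion, no dangling texts.
import time

PENDING, CONFIRMED, EXPIRED = "pending", "confirmed", "expired"

class Order:
    def __init__(self, timeout_s=300, clock=time.monotonic):
        self.clock = clock
        self.placed_at = clock()
        self.timeout_s = timeout_s
        self.state = PENDING

    def current_state(self):
        """Expire the order if the restaurant never responded in time."""
        if self.state == PENDING and self.clock() - self.placed_at > self.timeout_s:
            self.state = EXPIRED  # here both parties would be notified
        return self.state

    def confirm(self):
        """The restaurant can only confirm while the order is still pending."""
        if self.current_state() == PENDING:
            self.state = CONFIRMED
        return self.state

# With a fake clock we can show the timeout firing:
t = [0.0]
order = Order(timeout_s=300, clock=lambda: t[0])
t[0] = 301.0
order.current_state()  # "expired": restaurant never confirmed in time
order.confirm()        # still "expired": too late to confirm
```

                Plain texting has no equivalent of that expiry transition, which is exactly the gap UberEats/OpenTable fill.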

                • cataphract 1637 days ago
                  You might think that, but when I was in Brazil, ordering food by WhatsApp was commonplace. The restaurants would, generally speaking, answer very quickly. Some would send you the daily menu every morning.
                  • spats1990 1637 days ago
                    That is so nice! And I thought I had it made a few years ago when I lived two buildings down and three floors up from the best Indian restaurant in town and would call them to place the order then go downstairs to pick it up twenty minutes later. Calling. On the phone. Pfft.

                    The daily menu thing is especially endearing.

              • Benjammer 1637 days ago
                What is stopping this text-only paradigm shift from happening? What developments are needed before this happens? Why hasn't it happened already?

                Twilio has given the ability to programmatically text anyone for years. Why hasn't this hypothetical B2C text business developed yet?

                • fasturdotcom 1637 days ago
                  The work involved in setting up a server/dashboard for Twilio is too high for it to be popular with most independent businesses.

                  This hints that a "Shopify for Twilio" would be popular.

            • tomrod 1637 days ago
              Mobile slack is a terrible UX for short term engagement.
            • Pxtl 1637 days ago
              Or BlackBerry?
        • thedaemon 1637 days ago
          POTS is pretty much dead almost everywhere in the USA; most lines are VoIP these days. They are not replacing the copper wire.
    • x2f10 1637 days ago
      How were you to send money?
    • mrtweetyhack 1637 days ago
      What's your number?
  • AbbasHaiderAli 1637 days ago
    Wow, impressive results! There are already a few examples in the comments of what bad actors could do with this tech. I wanted to share an example of something good.

    I lost my dad about 6 years ago after a Stage 4 cancer diagnosis and a 3-month rapid decline. I have some, but not a lot of, video content of him from over the years. My mom still misses him terribly, so for her 60th birthday I tried to splice together an audio message and greeting from him, saying what I thought he would have said.

    The work was rough and nowhere near what this Google project could produce. She listens to that poor facsimile every year for her birthday. It's therapeutic for her. With some limits for her mental health of course, I'm sure she would love to hear my dad again with this level of fidelity.

    And so would I.

    • drbawb 1637 days ago
      When I was doing a computer repair: I remember a woman coming in with a digital answering machine; the kind that stored its recordings in volatile flash. During a thunderstorm the night prior the machine lost power, and subsequently lost all the stored recordings. As it happens some of those lost recordings included messages from the woman's late mother.

      That moment has stuck with me for many, many years. The heartbreak on her face, combined with my own frustration of knowing that no amount of luck (or skill) will ever be able to flip the bits of that flash chip back to a permutation which contains samples of her loved one's voice.

      Fast forward to the present, my own grandmother passed away shortly after the start of 2019. I was able to salvage some of the many voicemails she had left me over the years, despite having had probably five or six cellphones during that period. Why? I used Google Voice, which is part of their Google Takeout data exfiltration program. I was able to download all those voicemails as MP3s, neatly categorized by caller. My grandma was very terse, so most of them are exactly the same: "Robert, can you please call me?", but in spite of that each one is unique and precious to me. A lot of developers think about getting data into their platform, but it seems to me that not as many think about users getting their data, sometimes precious & irreplaceable, back out of the platform.

      • therealfitz 1637 days ago
        Thanks so much for sharing this story--this never even occurred to us when we created Google Takeout back in the day!

        -Fitz

      • airstrike 1637 days ago
        My pet project I will likely never have the resources to work on would be AI-generated 3D virtual environments based on old photos / videos that you could navigate in VR and relive long lost memories

        I'd pay a good amount of money to be able to relive certain experiences from my childhood with that level of immersion

    • pmoriarty 1637 days ago
      Philip K Dick wrote about people going to commune with artificial personality constructs of their deceased loved ones.

      Unfortunately, it's been a long time since I read it, so I don't remember which book it was in. Maybe someone who's read him more recently can remember.

      Update: Apparently, lots of other people wrote about this too, but PKD wrote about it before any of the ones mentioned so far, in the 1950s or '60s. I'm not sure if he was the absolute first, however. So if anyone knows of any earlier references, it would be interesting to learn about them.

      • ErikAugust 1637 days ago
        Ubik? Though that is not about artificial personality constructs, it's about communicating with loved ones in half-dead states.
        • pmoriarty 1637 days ago
          Yes, Ubik has half-life states.

          But I'm thinking of a different PKD book where there were actual artificial personality constructs instead.

        • iainmerrick 1637 days ago
          This idea crops up in a few of his novels and stories, but I think it’s most fleshed-out in Ubik, yeah.
          • xkcd-sucks 1637 days ago
              Under the hood they are all about religious Gnosticism and the physical universe as a false facade over the "true" universe. VALIS is a pretty good explication as well as a really good book; his Exegesis is a good read only if you are into mental illness + theology.
            • iainmerrick 1625 days ago
              Very belatedly, yes, VALIS is strange and wonderful.

              This is making me want to re-read some PKD!

      • blancNoir 1637 days ago
        Check out the movie Marjorie Prime, with Jon Hamm, screenplay by Jordan Harrison. Without spoiling too much: there's a company that can create holographic projections of loved ones, and a woman's family gives her one as a gift - a hologram of her deceased husband as a young man. The interesting part, narratively, is that while the holograms are near-perfect physical recreations, their personalities and memories must be trained by those who knew them, family and friends. That raises the question of how we're perceived in fragmentary and contradictory pieces depending on who's doing the training, and of the amalgamation of a person that's ultimately constructed from these parallax accounts. The writing is actually quite strong, and the only sci-fi aspect is the holograms, so I wouldn't say there's much of a sci-fi crutch. I know it's not PKD and there are similar Black Mirror episodes, but I thought the drama itself was robust and displayed the range of Jon Hamm to be someone other than Don Draper.
        • Caspy7 1636 days ago
          This movie is available on Amazon Prime.

          It's not bad, but my recommendation is to go into it with the expectation of a Black Mirror episode rather than something you might pay to see in the cinema.

      • blowski 1637 days ago
        There was also a Black Mirror episode.
        • ben_w 1637 days ago
          It’s also present in the Revelation Space series, Neuromancer, Red Dwarf, and Star Trek.
      • sabalaba 1637 days ago
        Don’t know about the Philip K Dick work but William Gibson has this in Neuromancer.
    • ada1981 1637 days ago
      When I interviewed Ray Kurzweil we talked about the obvious-in-hindsight insight that his life’s work was essentially trying to build an AI to bring his father back to life.
      • azinman2 1637 days ago
        Except no matter how good it is, it will only reinforce that it’s not real and that he’s gone. Perhaps that’s the therapy he needs to move on?

        Note: this is different from listening to recordings from the actual person.

        Having loved ones die is one of life’s universal terrible qualities.

        • AbbasHaiderAli 1637 days ago
          I think it would be cathartic to talk to someone you trusted but who is now gone. There have been decision points over the last few years where I would love to have just said my thoughts out loud to my dad and had him nod and ask a couple of open-ended questions so I could get it out. No specific guidance needed, just his particular style of listening.

          Clearly, losing someone and being able to deal with it is an important life skill, but just as we build technology-powered aids for other situations, I don't think this would be any different.

          • pmoriarty 1637 days ago
            "I think it would be cathartic to talk to someone you trusted but who is now gone."

            It would be cathartic, but in this case you wouldn't be talking to them but to a computer, which (at best) is pretending to be them.

            I think it's kind of creepy when you really think about it, and it reminds me of the aversion the creator of ELIZA had to his creation when he found out that people were spilling their guts out to it and treating it as a real person.

            Which isn't to say that talking to something that's not a real person (and especially not a formerly living person you once knew) can't be healing. But if people get confused by these machines into thinking the machines are actually people close to them who died and are now living again, that will make them vulnerable to some really serious manipulation and delusion.

        • bsanr2 1637 days ago
          I think a lot of people's passions are driven by a hole in their heart that they hope their work will somehow help fill. I suspect that no small amount of the enthusiasm for XR is due to a deep and abiding desire to be someone else, somewhere else, among the people developing or early-adopting for it. Of course, it doesn't have to be so high-tech; much non-profit or social work is prompted by personal experience with the presence, or lack thereof, of the service being rendered.

          In the end, I don't know if any of that works. But what's being described doesn't seem too far outside the norm. Deprivation often leads to desperation for even a taste, however imperfect it may be.

    • mottiden 1637 days ago
      I'm deeply sorry for your loss. Thanks for sharing your story.
    • cayblood 1637 days ago
      Beautiful. Thanks for sharing. It's good to point out the positive potential uses as well as the negative.
    • ehsankia 1637 days ago
      This podcast explores how similar tech is used to give voice back to people who have lost it due to voice impairment. Basically allowing all the people using machines that sound like the classic Hawking computer voice to have their own voice instead.

      https://www.npr.org/2019/07/15/741827437/finding-your-voice-...

    • mcdoh 1637 days ago
      It could also be used for recreating voices for people that have lost theirs, like Roger Ebert. I think he benefited by having so much of his voice already recorded, this would make it much easier for regular people.

      https://www.youtube.com/watch?v=hMyxgSLESz8

    • daenz 1637 days ago
      I'm imagining Siri with the voice of your partner.
      • AbbasHaiderAli 1637 days ago
        Either I would wind up being extra nice to a digital assistant or being curt with my partner :)
      • martin-adams 1637 days ago
        Yeah, you probably don't want to mix up the habits of how you talk to a digital assistant and how you talk to the person you love.
        • daenz 1637 days ago
          My partner and I frequently ask each other things, to which the response is "I bet you could google that." Seems fitting :)
    • Aaronstotle 1637 days ago
      Sorry for your loss, thank you for providing an optimistic example.
    • nradov 1637 days ago
      That was part of the original idea behind the Infibond start-up in Israel. Not clear how real it ever was.

      https://sifted.eu/articles/infibond-investigation-israeli-st...

    • etskinner 1637 days ago
      Would be interesting to have a community where people looking to be comforted by their loved one's voice would post whatever snippet of recording they have, then others would listen and see if they know someone who has a similar voice and have them record a message.
    • xfkechyk 1637 days ago
      Reminds me of the LifeAfter podcast: https://www.stitcher.com/podcast/panoply/the-message
    • dejj 1637 days ago
      Seems like,

      > This is just

      >> All can be returned, all can be taken away

      > with extra steps.

  • ihm 1637 days ago
    Reading this headline I begin to understand certain people’s worry about having their soul stolen upon being photographed.
    • nwsm 1637 days ago
      Images of us can be sourced from any number of places: social media, government surveillance, private surveillance. Video less so but from the same sources. Audio from phone companies, VoIP services, surveillance, etc. Health data easily from a number of private companies if you use new-age "health" services, or less easily (illegally) from health records.

      Maybe we can find solace in the fact that it is, or soon will be, infeasible to avoid, so we needn't try to avoid it.

      • ben_w 1637 days ago
        “Don’t worry, just about anyone can steal your soul and there’s nothing you can practically do to stop it”

        That doesn’t seem like a message of solace to me.

        • baroffoos 1637 days ago
          The message is "Just about anyone could replicate your voice, its value in authentication is about as trustworthy as writing your name at the bottom of a letter"
        • sundarurfriend 1637 days ago
          >> Maybe we can find solace in the fact that it is, or soon will be, infeasible to avoid, so we needn't try to avoid it.

          It's the meager solace of the absolution of personal responsibility - there's no way to avoid it, so at least no one can say "why did you allow that to happen to you".

        • aidenn0 1637 days ago
          I am okay with the events that are unfolding currently.

          http://gunshowcomic.com/648

        • nsomaru 1637 days ago
          “When remedies are over griefs are ended”
    • jacquesm 1637 days ago
      I had that exact same thought the other day with respect to biometric data from photographs.
    • code_code 1637 days ago
      Even decades ago I gave such reported concern with photographs more credence than the typical western account of it. I also wondered if the translation was precise enough -- could it (in some cases) have reflected a concern with "essence" more generally? Even without reference to a soul the concern can be a bit immaterial.
    • Dramatize 1637 days ago
      I'm looking forward to moving past just talking about the AI and concentrating on the new products the tech enables.

      I guess it's similar to how most photos are now a means to an end rather than the final product, e.g. satellite imaging or Instagram.

    • jmann99999 1637 days ago
      You mean the abo-diginals? :-)
  • Lucadg 1637 days ago
    At first glance there seem to be more malicious uses for voice than good ones. Yes, hearing someone dear to me say things he/she never said may be comforting. Anything else?

    Maybe some movies with the deceased actor's voice?

    But what if someone who wants to hurt me sends me files (or phone calls) from the deceased person saying horrible things like:

    - "I am still alive but left as I was tired of you"

    - "oh Jan, I love you" [fake phone call from the past, where Jan is a lover which never existed]

    or even from alive people:

    - "I am leaving you"

    - or my live voice saying stuff which gets me fired or in prison.

    We will never be able to trust a voice again... how will we adapt?

    • Defenestresque 1637 days ago
      People had the exact same concerns when digital image manipulation software became popular, including the "we will never be able to trust an image again" question.

      To answer your question, I think the biggest step we took in adapting to the ever-present risk that an image may have been manipulated is acknowledging that it's possible. As soon as people knew that something could be faked, they realized that having a purported photograph wasn't irrefutable proof that it happened and learned to ask for corroboration before making assumptions.

      I think we'll learn to deal with this new development too.

      • Loughla 1637 days ago
        Luckily now people don't ever believe photographs as sole evidence in the court of public opinion, and always corroborate that evidence.

        Wait. No. I had that backward.

        How long has photo manipulation been around? And people still fall for it every minute of every day.

        I have zero faith these tech developments will lead to anything good, or that we'll even learn how to deal with them effectively.

        • baroffoos 1637 days ago
          I don't think it's so much that they fall for it; it's that people create fake images that confirm what viewers already believe, and the viewers deeply want the image to be true and will not think critically.

          Verifying an image is not impossible. You just have to consider:

          * Who took the image

          * Do they have any reasons for wanting to fake it?

          * Was anyone else able to verify they saw or took a photo of the same thing

          * Is anyone in disagreement with the content of the image

          We don't need image manipulation to fool people on facebook. Recently a random image of a park full of rubbish was used with the caption that this was the result of a recent environmental protest but the image was actually months old from a totally unrelated event. People believed and shared it because they wanted to. You could just as easily write a text post saying you saw a bunch of rubbish after the event and you would have almost the same effect.

        • hanspeter 1637 days ago
          > How long has photo manipulation been around? And people still fall for it every minute of every day.

          People fall for headlines every minute of every day.

        • Lucadg 1637 days ago
          Yes. Tech is probably developing faster than our ability to adapt.
        • vunie 1637 days ago
          These tech developments will happen regardless of your stance on their morality. Either adapt like everyone else or stay behind. Your choice.
      • kleim 1636 days ago
        "acknowledging that it's possible" Even then, once the poison of suspicion has been delivered the harm is done.
      • x220 1637 days ago
        Facebook users buy into doctored photos all the time. It's part of this little known phenomenon called "fake news". Video deepfakes are still time consuming and difficult to develop.
    • danaris 1637 days ago
      It doesn't have to be about creating audio of specific text with <specific person>'s voice; it can be much more about creating audio of specific text with a wide variety of believable people's voices.

      I could see this, if it becomes commercially viable, potentially being a huge boon to indie game creation, for instance, since hiring a load of voice actors to record the dialogue for an entire game is vastly more expensive than, say, hiring a bunch of different people to record their voices for 5 seconds—or even, if this ever took off, buying a bunch of samples pre-recorded (or networks pre-trained) for the purpose.

      • Dramatize 1637 days ago
        This 100%. It's not about impersonating someone, it's about providing natural sounding text-to-speech for games, film, robots etc.

        We're currently working on using voice AI to create real products over at https://replicastudios.com

      • Lucadg 1637 days ago
        Again, this is a good use case, but it seems to bring less benefit than the damage a bad use case would cause.

        Cheaper games vs distressing phone calls in this case.

        I'm open to better use cases but for now I haven't heard any.

        • danaris 1637 days ago
          I mean...it's not like we're getting a choice here. The technology is possible, therefore it will happen (and is now starting to happen).

          Like so much else in life, we have to take the bad with the good, but refusing to look for the good in it doesn't make the bad any less likely.

    • a_f 1637 days ago
      Maybe a little less insidious than your examples: my first thought was being able to generate voice actor lines for video games. Maybe not main characters, but background NPCs and such. Might make the VA union a little nervous going forward, though!
    • willmadden 1637 days ago
      >We will never be able to believe voice again...how will we adapt?

      "Say, Grandma, before I wire the money, what's the name of your cat?"

    • MagnumPIG 1637 days ago
      We can barely trust pictures anymore either.

      It seems we'll be going back to judging the likelihood of one's actions based on one's reputation, for better or for worse.

      There soon won't be such a thing as unreasonable doubt.

      • Lucadg 1637 days ago
        Real-life interaction is going to be more and more valuable. So "we won't need to travel again because of video calls" is not going to happen any time soon, it seems.

        And public/private key validation may become invaluable.

    • anextomp 1637 days ago
      I'm looking forward to an AI assistant that can use my voice to make phone calls

      I'm sure there are people with selective mutism who would like to use text-to-speech with their own voice.

      > How will we adapt?

      Digitally signing audio clips

      • Lucadg 1637 days ago
        OK, mutism is a great use case, thanks. Still, that problem is also solved by normal text-to-speech software.
    • shrimp_emoji 1637 days ago
      But these attacks are hilarious.

      Your question, rephrased: if technology can be used in ways I find psychologically hurtful, should IT BE ALLOWED??? xD

  • Lordarminius 1637 days ago
    AI can make decisions, create deep fakes, and now, clone voices.

    It may be that the next big business opportunity lies in creating 'anti-AI' technology, just as it did with antivirus software in the '90s and 2000s.

    • pcmaffey 1637 days ago
      AI that detects AI seems entirely plausible. And like all “anti” measures, is another arms race (and if I put my scifi hat on, is what may lead to AI self-awareness).
      • BOBOTWINSTON 1637 days ago
        Sort of reminds me of a talk Valve gave about creating an anti-cheat for Counter-Strike using AI. When asked if they were worried about people using AI to create cheats to fool the AI, his answer was essentially that it was an arms-race won by the person with more data/processing power. That person would most likely always be Valve.

        Link to talk: https://www.youtube.com/watch?v=ObhK8lUfIlc

        • hr5eaqhera 1637 days ago
          It's a nice sentiment, but there are popular and easy-to-find cheating projects (not sure if I can name them here) that are still widely used. These projects were active for years before that talk was made and are still active today. Based on YouTube videos and comments, it seems many users are still using these cheats with little issue. And afaik, the one I'm referring to (initials P.I.) doesn't use any machine learning at all.
      • Scarblac 1637 days ago
        In Greg Egan's _Permutation City_ this is mentioned in passing (an arms race between AI video call spam bots and AIs screening the calls for spam while impersonating the owner of the phone).

        The anti-spam software loses because eventually having self-aware AI view spam calls 24/7 was considered torture and they weren't allowed to go that far.

      • brandonjm 1637 days ago
        Typically the AI to detect the AI is exactly what they use to train networks like Deep Fake in the first place. GANs are effectively just a local arms race between two machine learning networks.
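
        That adversarial dynamic can be sketched without any neural networks at all. The toy below stands in for a GAN: real data clusters near 1.0, a one-number "discriminator" keeps a running estimate of where real data lives, and a one-number "generator" nudges its output toward whatever the discriminator currently accepts. All the constants and update rules here are invented for illustration; they only mimic the shape of the arms race, not real GAN training.

        ```python
        import random

        # Toy "arms race": real data clusters near 1.0. The discriminator keeps
        # a running estimate of where real data lives; the generator chases
        # whatever region the discriminator currently accepts.

        random.seed(0)

        def discriminator(x, centre, radius=0.3):
            """Return True if x falls inside the region currently believed 'real'."""
            return abs(x - centre) < radius

        centre = 0.0   # discriminator's belief about where real data lies
        g = -2.0       # generator's current output, far from real data

        for step in range(200):
            real = 1.0 + random.gauss(0, 0.05)
            centre += 0.1 * (real - centre)      # discriminator update
            if not discriminator(g, centre):     # generator update: chase acceptance
                g += 0.05 * (centre - g)

        print(f"generator output {g:.2f}, discriminator centre {centre:.2f}")
        ```

        After a couple hundred rounds the generator's output sits inside the discriminator's acceptance region, which is the point brandonjm is making: the "detector" is exactly what trains the forger.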
    • sangnoir 1637 days ago
      Is it that difficult to create/retrofit an AV container format for cryptographically signed audio and video streams? Key management & revocation could be a pain, but it's something that consumer electronics companies like Apple could do: think of it as the MPEG LA, but with signature checking & non-repudiation.
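
      A rough sketch of what per-stream attestation could look like, using HMAC from Python's standard library as a stand-in for the asymmetric signatures a real PKI-backed container format would use (the key, chunking, and frame names here are invented for illustration):

      ```python
      import hashlib
      import hmac

      DEVICE_KEY = b"secret-key-burned-into-the-device"  # stand-in for a private key

      def sign_stream(chunks, key=DEVICE_KEY):
          """Hash the raw A/V chunks in order, then sign the digest."""
          h = hashlib.sha256()
          for chunk in chunks:
              h.update(chunk)
          return hmac.new(key, h.digest(), hashlib.sha256).hexdigest()

      def verify_stream(chunks, signature, key=DEVICE_KEY):
          """Recompute the signature and compare in constant time."""
          return hmac.compare_digest(sign_stream(chunks, key), signature)

      frames = [b"frame-0001", b"frame-0002", b"frame-0003"]
      sig = sign_stream(frames)

      print(verify_stream(frames, sig))                        # True: untampered
      print(verify_stream([b"frame-0001", b"DEEPFAKE"], sig))  # False: tampered
      ```

      The hard parts, as noted above, aren't the signing itself but key management, revocation, and deciding who gets to issue device keys.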
      • justinjlynn 1637 days ago
        Congratulations, you have now created a class of people who can forge audio and video streams at will, relatively cheaply - and an underclass of people who cannot and may not even be able to record genuine footage they wish to record. This is not a road you want to go down.
        • sangnoir 1637 days ago
          No - I just created the equivalent of https for video; the "underclass" can still create, share and play unsigned videos - those would get low-trust warnings[1] (as they should, just as there is an "underclass" with no cert for their site). This wouldn't take away anything from today's tech; it only adds attestation for the person/org behind videos they would like to mark as "official".

          1. (edit) It occurred to me that some people may wish to manage the public keys independently of (say) Apple, and they could distribute via Keybase or key-signing parties, so they wouldn't actually have to suffer low-trust warnings. Now that I think of it, instead of merely signing streams, they could be signed and encrypted using the recipient's public key for 1:1 transmissions. Obviously law enforcement won't be a fan.

          • justinjlynn 1637 days ago
            Law enforcement will just coerce the CA system you've suggested to secretly improperly issue certificates. Just as it does with the current CA system. Problem not solved - it's just hidden.

            You conflate content trustworthiness with origin surety, but a CA system doesn't even provide that.

    • kevinwang 1637 days ago
      Like the booming anti Photoshop industry of the early 2000s?
      • Retric 1637 days ago
        10 years ago this stuff was often easy to notice. https://thelede.blogs.nytimes.com/2008/07/10/in-an-iranian-i...

        Being good at Photoshop is really difficult, and producing good fakes is extremely time-consuming. Today, even with a Hollywood budget, most such effects still look off. However, the industry has gotten much better, with actor enhancement, for example, generally going unnoticed.

        Which I think is the real issue: this stuff is becoming easier over time. AI could be the tipping point where people eventually just stop trusting images and video. But that transition is gonna be difficult.

    • hans1729 1637 days ago
      Good luck with that. The best product you'll come up with is some sort of snake oil. The whole point of GANs is that you can't really "detect" the synthesized components anymore. Not that this would/will stop people from claiming otherwise in the spirit of profitability :-)
      • sdenton4 1637 days ago
        No, GANs train exactly one discriminator, jointly with the generator. There's no guarantee that you can't train another good discriminator out of band.

        Furthermore, GAN discriminators are (as I understand it) often hobbled a bit to ensure that the generator can make progress on the loss function. An always-correct D doesn't provide a useful gradient.

      • kordlessagain 1637 days ago
        GANs may produce imagery or audio which fools humans, but they are unlikely to consistently produce imagery or audio which fools humans over time.
        • candiodari 1637 days ago
          GANs train by fooling AIs, not humans. Fooling humans is a side effect, not the primary thing trained for (mostly because that's cheaper of course). It's just that humans are in some ways different from AIs in terms of fooling.

          Looking at the papers, I must say I think the ability to fool humans is a scale problem, not a fundamental limitation. GAN-produced images and sounds already survive "normal" human scrutiny: if you have no reason to suspect foul play, you won't see it. If you really go looking, you'll see it.

      • TallGuyShort 1637 days ago
        Snake oil-tier anti-ML might do the trick. One of the problems is the level of confidence people put into ML in the first place, when a lot of the time it's also snake oil-quality. Just being able to cast doubt on that again would help prevent a loss of healthy skepticism and critical thinking.
    • filoeleven 1637 days ago
      Better provenance is the way to guard against fakes.

      For video and audio, I imagine a combination of hardware signing, perhaps with the camera itself living on an isolated, Secure Enclave-like chip, and sending hashes of (incoming images/video * deviceID * trustedTimestamp) to a blockchain or some other public distributed ledger. Getting the timestamp from a service that keeps its own record adds further security.

      This obviously requires an internet connection, and would likely be useful mostly for news organizations, government agencies, and law enforcement. But if the culture is affected enough by deepfakes, I can imagine it becoming more ubiquitous. The parts are all there; it's a question of utility.

      • jimmydorry 1637 days ago
        If you are trusting a camera to upload to a blockchain, the same can be simulated on a computer, given enough time.
        • filoeleven 1637 days ago
          Sure, that’s the reason for including the 3rd-party timestamp AND for keeping the device signing keys on a separate secure chip. The idea is to say “these images/sounds were recorded at this time by this device,” and to have that statement both registered publicly at the time of creation and backed by the reputation of the device maker.

          It’s acknowledging that using AI to catch AI fakes is a fool’s errand, and relies instead on the premise that hashing a raw data stream is much faster than producing a good fake, and that a secure device key is secure. You’d need both for it to work, otherwise you can generate a deepfake beforehand and get the device to sign a fake stream. That may be easier to do than I think; this is not my area of expertise.
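
          The hash-then-register step might look something like this (the device ID, timestamp, and sensor bytes are placeholders; a real implementation would fetch the timestamp from a trusted service and publish the resulting digest to a ledger, neither of which is shown):

          ```python
          import hashlib

          def provenance_digest(raw_stream: bytes, device_id: str, trusted_timestamp: str) -> str:
              """Bind a recording to a device and a moment in time via one digest."""
              h = hashlib.sha256()
              h.update(raw_stream)              # the raw sensor data itself
              h.update(device_id.encode())      # which device recorded it
              h.update(trusted_timestamp.encode())  # when, per a third party
              return h.hexdigest()

          # Hypothetical recording: the digest, not the footage, is what gets published.
          digest = provenance_digest(b"raw-sensor-bytes...", "CAM-1234", "2020-01-15T12:00:00Z")
          print(digest)  # 64 hex chars; any later edit to the stream changes it entirely
          ```

          Hashing is cheap enough to run at capture time, which is what makes the "faster than producing a good fake" premise plausible.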

    • rojobuffalo 1637 days ago
      Wouldn't an AI-detector just end up being used like a benchmark for fine-tuning AIs though?
      • corndoge 1637 days ago
        Yep! Adversarial learning. Whoever has the best math and compute wins.
    • andrewfromx 1637 days ago
      Ah yes, if you listen carefully to the samples, you can always tell there are subtle things that make them seem a little off. Maybe if you look at the binary data very carefully, it would be easy to show HUMAN_AUTHENTIC or CREATED_BY_MACHINE and sell this service. Someone has a recording of something you never said? For $99.99 get it checked at AreYouHumanDotCom!
    • b0rsuk 1637 days ago
      The only way to stop a bad guy with an AI is a good guy with an AI...
  • CraneWorm 1637 days ago
    Now we can have audiobooks read by anyone we like!

    They can direct us to our destination!

    They can speak at our funeral, being long dead themselves (as long as there is sufficient training material recorded).

    The future is awesome.

    • thomascgalvin 1637 days ago
      I legitimately think this could be huge for self-published authors. It takes a skilled professional about forty hours of work to produce an audiobook from a novel-length manuscript. Tacotron could do it in minutes.
      • jfengel 1637 days ago
        I don't see that coming soon. The voice is one thing, but the performance goes far, far beyond that. Without understanding the text, you can't get good prosody out of a single sentence, much less developing a character for a whole performance.

        You'd have to "direct" this on a word-by-word basis: "Put the emphasis here. Speed up 10% here. Decrease vocal intensity 25%". You'd end up producing a whole "score", and it would take at least as long as the human actor puts into it.

        Having done that, it would be amusing to switch it from voice to voice, as a party trick. But the result would still be much poorer than you'd get out of an actor. Really solving the work of an actor is strong-AI-complete.

        • animal531 1637 days ago
          What about using a tablet to direct the piece by drawing? You can get values for the intensity, speed and volume (up/down) pretty easily and intuitively.

          Even better if it's linked to the voice generation system in real time; then you can save/redo sentences etc. as you go along.

        • thomascgalvin 1637 days ago
          Audio books with genuinely good performances seem rare to me. There are a handful of voice actors that stand out, but many of the titles I've listened to have very flat delivery; the first sample in the original article has as much inflection as they do.
        • jimmydorry 1637 days ago
          Feed in enough "good" audio books, and you would probably get something passable for smaller titles.
        • sitkack 1637 days ago
          AI is separating the talent from the looks across the board. As it is now, one has to both be able to act and look good, but now the AI will enable those who can act, to be re-skinned, literally, into whatever the client needs.
    • ccsnags 1637 days ago
      I did something like this before my grandmother passed. She was a teacher and loved reading books to kids. I recorded her reading Dr Seuss and the Giving Tree to my cousins so I could give my future children a glimpse of that wonderful woman.

      It seems that we aren’t far from being able to take those recordings and spin it into a reading of anything. Fascinating. It’s kind of scary though. Grandma’s voice can read anything. Anything.

    • jakob223 1637 days ago
      The emphasis on a lot of these sentences is all wrong; I wouldn't want to listen to an audiobook by this engine. It's still super impressive/terrifying though.
    • jsnk 1637 days ago
      I am currently working on an audiobook project, called Odiobooks.com. I hope to release something soon.

      If anyone's interested in the project, feel free to contact me at iamjsonkim@gmail.com.

    • taf2 1637 days ago
      Imagine when they can also generate the visuals of the book, showing it to you as an auto-generated video.
  • Riccardo_G 1637 days ago
    What it is doing is not really cloning, but because it was trained on 18k different voices, it actually finds one that is closest to yours, and uses that one. It can do a bit of interpolation to create an embedding which is closer to your own, but only if it is well represented by a mix of other voices. Real voice cloning like at https://replicastudios.com/ can take just a minute or two of audio, and it does a fairly good job, and it is always improving. With more audio you start being able to also play with emotion and styles, which is very cool!
    • JaRail 1637 days ago
      I'm not really sure where you're getting this. It doesn't pick a specific voice from a database to use.

      From their introduction: "Our approach is to decouple speaker modeling from speech synthesis by independently training a speaker-discriminative embedding network that captures the space of speaker characteristics and training a high quality TTS model on a smaller dataset conditioned on the representation learned by the first network."

      Section 2 of the paper explains how it works. Two minute papers also goes through it if you'd prefer a video. Link: https://youtu.be/0sR1rU3gLzQ
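
      The key idea is that the speaker encoder maps any voice to a point in a continuous embedding space, and the synthesizer is conditioned on that point; nearby points sound like similar voices. A toy sketch of how such embeddings are compared (the 4-dimensional vectors here are made up purely for illustration; the paper's d-vectors are 256-dimensional):

      ```python
      import math

      def cosine_similarity(a, b):
          """Speaker verification scores embedding pairs with this measure."""
          dot = sum(x * y for x, y in zip(a, b))
          norm_a = math.sqrt(sum(x * x for x in a))
          norm_b = math.sqrt(sum(x * x for x in b))
          return dot / (norm_a * norm_b)

      # Toy "speaker embeddings"; a real system derives these from ~5s of audio.
      target_speaker  = [0.9, 0.1, 0.3, 0.2]
      similar_voice   = [0.8, 0.2, 0.3, 0.1]
      different_voice = [0.1, 0.9, 0.1, 0.8]

      print(cosine_similarity(target_speaker, similar_voice))    # close to 1.0
      print(cosine_similarity(target_speaker, different_voice))  # much lower
      ```

      So the synthesizer isn't looking up a nearest stored voice; it conditions on wherever the new speaker's embedding lands, which is also why sparsely-represented regions of the space (as sillysaurusx notes below for some female voices) clone poorly.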

      • sillysaurusx 1637 days ago
        They’re saying that underrepresented voices will have trouble being modeled. That matches my experience with this project: for example, I had a very tough time cloning female voices compared to nerdy-sounding / deep male voices.
        • zamubafoo 1636 days ago
          It's more that the sounds produced during the recordings didn't cover the entire spectrum of possible sounds, so the model had to estimate their sound. All you really need is a paragraph which you can have the person read to get sufficient coverage or just enough recordings that it's not an issue anymore.
    • ehsankia 1637 days ago
      Was it 18k voices or samples? Also, is it finding the closest voice, or is it a continuous parameter space formed from the voices?
  • lukeplato 1637 days ago
  • Angostura 1637 days ago
    My bank's (HSBC) telephone banking offers the option to do away with a PIN and instead use a 'my voice is my password' phrase system.

    I'm glad I never opted in.

    • shostack 1636 days ago
      I recently called Chase and got some message like "your voice may be recorded for verification purposes" or something to that effect. Creeped me out, and I don't recall ever opting into that specifically, so I'm guessing it's an opt-out buried in some agreement.
    • ropiwqefjnpoa 1637 days ago
      Oh god, Sneakers
      • Angostura 1637 days ago
        Just to be clear - I'm Robert Redford
      • mmazing 1636 days ago
        passPORT

        Also, that whole scenario would have been far less cool if they just recorded the dude for 5 seconds doing anything and pulled a whole CSI-style "put his voice into the visual basic GUI neural network and it works, bro!"

        I love Sneakers.

    • bidluo 1637 days ago
      The Tax Office in Australia does the same thing and pushes it every time I call; I imagine it branches out to the other gov bodies too. It's fun listening to the whole spiel about 'Like a fingerprint, your voice is unique to you'.
    • fasturdotcom 1637 days ago
      unchecked, the progress of this technology and the staleness of banking security might cause entire institutions to fail
  • kleer001 1637 days ago
    Saw this on Two Minute Papers last night and had a discussion with my partner about whether we needed a secret password to tell if it were really one of us on the phone. I figured that we had enough shared history that that wouldn't be a problem. Then we realized that there's no such thing as a simulated sense of humor yet, and that that would be the best natural encryption for any communication.
    • toxicFork 1637 days ago
      2023: ai identifies your sense of humour after hearing you breathe for 0.7 milliseconds
      • kleer001 1636 days ago
        Ha. There'd need to be so many more Ai breakthroughs to make that happen it'd be a little thing.
    • nighthawk454 1637 days ago
      Here's the link to the Two Minute Papers video if anyone else is curious: https://www.youtube.com/watch?v=0sR1rU3gLzQ
  • undershirt 1637 days ago
    “When you see something that is technically sweet, you go ahead and do it and you argue about what to do about it only after you have had your technical success.”

    —Oppenheimer

    • robertjwebb 1637 days ago
      The thing is, these topics have already been discussed by philosophers! Questions of authenticity, human subjectivity, reproducibility etc are not new. But for the average joe and the non-philosophically-inclined techie, the thing has to actually exist before they start really talking about it.
  • goodmachine 1637 days ago
    The malign applications of this technology greatly outweigh the benign. Discuss.
    • carapace 1637 days ago
      Yes, I think there are almost no legitimate uses for the Farnsworth Device. https://theinfosphere.org/A_Device_That_Makes_Anyone_Sound_L...

      My personal hell: My mother has dementia and a land line telephone.

      Scammers call all the time. All day long. (Although the last few days have been pretty good, I assume somebody somewhere is doing their jobs. The scammers will adapt.) One thing they do is spoof their number to have the same area code and prefix as the one they're calling, so it's like "Oh, is this a neighbor?" or something, but of course it's not. It's an automated machine abusing the telephone network to try to steal money from a little old lady with dementia.

      Evil men with robots are attacking my mom. Another one called while I was writing this post!

      This is a goddamned sci-fi dystopia.

      And now the robo-thieving bastards can imitate my voice!?

      I'm going to have to get her one of those satellite-linked walkie-talkies or something. Thank God she doesn't use the internet.

      • hurrdurr2 1637 days ago
        Back when I worked as a telemarketer in high school a long time ago, we sold paper subscriptions, and usually the people who didn't cuss us out and hang up right away were the lonely old people who just wanted to chat. I lasted two months and had to quit; it felt like we were just taking advantage of them.
      • nexuist 1637 days ago
        I wonder if there is a market for proxying your mom's calls to you, allowing you to approve/deny each one before it gets to your mom?

        One consistent trend in HN comments is young people complaining about their parents' naivety / incapability to understand the modern scamming world, and wishing they could install something or use some service to keep them from falling into these expensive traps. I know this is a big reason why people get their grandparents iPads instead of full blown laptops, because laptops are much easier to inadvertently install malware on.

      • robertjwebb 1637 days ago
        Land-line telephones are awful because the majority of people who will pick up the phone during the day on Monday-Friday are old or disabled, i.e. easier to manipulate. I don't think I have received a single legitimate phone call during that time. In fact, legit callers know this and know that if they do call for a good reason, the person at the other end will distrust them.
    • SamBam 1637 days ago
      Con: It's now easier than ever to fake someone saying something outrageous, and have that lie spread across the world long before the truth can get its boots on.

      Pro: Humphrey Bogart can direct you to your destination!

      I admit, it's a hard choice.

    • danShumway 1637 days ago
      I'm currently working by myself on a game that will likely launch without voice acting (text only) because I don't have the money or skill to find and pay voice actors.

      If I could act out the dialog myself and then purchase or generate voices other than my own to overlay on top of those performances, the quality and accessibility of my finished product would go up dramatically.

      That would also open up the door for more people to be able to mod the game and add additional dialog options. A big complication with voice acting is that it's essentially static. Even though a big focus of my game is modability, if I do voice acting no-one else can add additional levels or areas or expand on the characters without breaking the recorded dialog.

      It would be amazing if I could ship some kind of compiler so that modders could record themselves talking through new/changed dialog, and then insert it seamlessly into the game with the correct character's voices.

      • skykooler 1637 days ago
        Exactly. I was working on an Air Traffic Control mod for Kerbal Space Program, but the work has gone on hold due to having to find enough people to record all the lines (to have a decent number of airport voices). Being able to record everything once, and only having to find people willing to let me record five seconds of speech rather than a lengthy recording session, would make this much more feasible.
        • smush 1637 days ago
          Hypothetically, if we were interested in donating some voice samples, where would we look to see what lines were needed?
          • skykooler 1637 days ago
            I haven't uploaded the list of lines; I should add that to the github repo.
      • Dramatize 1637 days ago
        Have you tried out Replica? I can hook you up with a beta account to see if it'll help with your voice acting.

        https://replicastudios.com/

        • danShumway 1637 days ago
          This looks really interesting!

          That being said, because mod support is such a huge part of the design of this specific game, I have a policy that I won't use any tools or libraries that aren't either owned by me, that are Open Source, or that are exporting to common, open formats that can be freely read, manipulated and written by Open Source programs.

          If I used a licensed product to generate my voices, I would be in the same position as if I hired a voice actor -- I wouldn't have a tool that I was free to ship with the game that any modder could use to edit or add dialog, or to even create new characters with new voices.

          The few exceptions for proprietary tools I'm willing to tolerate for this specific game are things that generate MIDI output, sounds, fonts, and PNG files. Everything else is either Open Source or completely owned by me. Even for the final assets like mp3 files and fonts, nothing can be licensed, because I want to have full control over when players have the ability to remix and distribute game assets in their mods. I need to know that 20 years from now players will still have access to everything in the game.

          I don't want to derail, so to bring that back around to the current discussion on AI-generated fakes, I believe these kinds of AI techniques should be freely available. A world where AI-fakes are considered so dangerous that only a few select guardians can control them is a world where, to me, this technology stops being useful. I'm not saying Replica is in that position -- I'm just speaking to a broad trend in the conversation around AI.

          I think we'll start to see more calls to have single companies controlling AI under the guise of being able to ban bad actors or prevent abuse. I think that would be a mistake -- if anything, ubiquitous technology makes it easier for society to adapt to that technology. A purely SaaS, licensed model for AI generated faces, voices, and text would be all of the negatives of this technology with none of the positives that come from Open access and creative usage.

          Gatekeeping won't work, we just have to adapt.

    • tgsovlerkhgsel 1637 days ago
      I think there are many benign applications, and definitely a massive potential for abuse. In practice, it will be used mostly for benign applications, I think, but due to the outsized impact, you could still say the malign applications outweigh.

      However, what I found reassuring is that the paper actually addresses these concerns:

      "However, it is also important to note the potential for misuse of this technology, for example impersonating someone's voice without their consent. In order to address safety concerns consistent with principles such as [1], we verify that voices generated by the proposed model can easily be distinguished from real voices"

      This doesn't mean it won't fool humans, especially when used in a carefully crafted setting (low-quality phone call with distressing content).

      • gambler 1637 days ago
        I'm guessing this will just be used as an excuse by Google to prevent this technology from being easily or fully accessible to others.
    • telesilla 1637 days ago
      On a benign level, many VO artists will find themselves out of work now that we can have Don LaFontaine back.

      On more positive outlook, perhaps this, along with deepfakes, propels us faster towards an evidence-based society.

      • rshi 1637 days ago
        On a related note, I can definitely foresee a lot of voice actors having their voices cloned for uses they wouldn't really intend. Seems like a big legal grey area as many countries have personality rights.
    • mistermann 1637 days ago
      Here's an outlandishly optimistic take on the possibilities.....as the internet and media are flooded with increasingly convincing but false representations of reality, a widespread habit of greater skepticism of "the facts" starts to spread throughout society, leading people to alter the speed at which they form new opinions on various issues (possibly degrading confidence in preexisting opinions), and the manner in which they construct their personal mental model of reality. As the frequency with which an individual's brain is fed data inconsistent with directly observable reality increases, might a tipping point eventually be reached where it refuses to continue making snap decisions, and instead delays judgement until more information is available?

      Perhaps loosely similar to being forced into a state of "noting" in meditation: https://www.insightmeditationcenter.org/books-articles/menta...

      • Loughla 1637 days ago
        I doubt that. I think that as we have to have more skepticism of 'facts' we'll see more and wider splintering of viewpoint based on preferred communication/news agency.

        If you KNOW Fox News won't lie to you, just go there, and only there. Everything else is a lie. If you KNOW NPR won't lie to you, just go there, but only go there. Everything else is a lie.

        I think it will only make things worse, because that's the simplest, least 'change' solution for the most people. Society is like water, it always seeks the lowest point.

        • mistermann 1637 days ago
          I imagine, but I feel like there's a point of absurdity where people just aren't buying it anymore. The whole Epstein thing sure seemed to get a lot of coverage and outright bi-partisan mocking just as one example. /r/politics and the "organic" front page of reddit aside, it seems to me the number of people waking up to the possibility that the whole thing is an utter and complete farce is growing.

          > Society is like water, it always seeks the lowest point.

          Let's not forget these things are analogies, not laws. A trend remains in place until it doesn't.

      • bitL 1637 days ago
        Imagine we are actually a product of some advanced civilization bored with its capabilities where nothing seems real anymore, so they constructed us as a reality show to experience something that feels real.
    • clort 1637 days ago
      There is the obvious: using this technology to put words in somebody's mouth. The more nefarious use, though, is that certain people who lie all the time can easily claim that a lie you recorded them saying wasn't them, and now you can't prove otherwise. Certain people can say whatever they want and just deny it later. "Fake News" indeed..
    • AnIdiotOnTheNet 1637 days ago
      Con/Pro, depending on perspective: people will have to give up the illusion that they ever really could definitively tell truth from fiction.
    • TomMckenny 1637 days ago
      Are you asking about cordite or gunpowder in general?
  • rshi 1637 days ago
    I wonder what the legal implications of this alongside similar developments like deepfakes are going to be in the next couple years. We're already having fraudsters impersonate CEOs using Deep learning-aided Voice generation[1] due to just how low the barrier of entry is now. There's already a public implementation of the paper out [2]!

    [1]: https://www.wsj.com/articles/fraudsters-use-ai-to-mimic-ceos... [2]: https://github.com/CorentinJ/Real-Time-Voice-Cloning

    • IshKebab 1637 days ago
      CorentinJ's implementation isn't quite as good as Google's - I think with some of Google's samples I couldn't tell that they weren't real, especially over the phone. But I could easily tell with CorentinJ's.

      That seems to be common with open implementations of Google's voice synthesis and speech recognition work. I guess they hold back some of the secret sauce, or can afford to train it more.

    • qznc 1637 days ago
      Currently watching UK: https://mobile.twitter.com/FutureAdvocacy/status/11942824810...

      Sorry for the Twitter link but Future Advocacys website seems to be down.

    • jermaustin1 1637 days ago
      The latest episode of Blacklist had a dark plot based on deep-fakes.
      • fasturdotcom 1637 days ago
        didn't know new season was out! thx!
  • moyix 1637 days ago
    This is from 2018 – does anyone know if there are pretrained models and code for this? I found https://github.com/CorentinJ/Real-Time-Voice-Cloning , but the generated audio quality was much worse than the samples here.
    • JaRail 1637 days ago
      The biggest missing piece is WaveNet, which is Google's proprietary voice-synthesizer. With only the models trained for this paper, the best you could build is a voice-recognizer. As far as I know, Google only allows people to do TTS with one of their provided voices.

      I don't expect them to open it up until other companies/academics have achieved similar results. It's too much of a competitive advantage right now. Alexa, Siri, etc all sound like robots compared to WaveNet (google assistant).

  • grawprog 1637 days ago
    So....I'm going to paste the abstract here because the headline is incredibly misleading and should be changed.

    >Abstract: We describe a neural network-based system for text-to-speech (TTS) synthesis that is able to generate speech audio in the voice of many different speakers, including those unseen during training. Our system consists of three independently trained components: (1) a speaker encoder network, trained on a speaker verification task using an independent dataset of noisy speech from thousands of speakers without transcripts, to generate a fixed-dimensional embedding vector from seconds of reference speech from a target speaker; (2) a sequence-to-sequence synthesis network based on Tacotron 2, which generates a mel spectrogram from text, conditioned on the speaker embedding; (3) an auto-regressive WaveNet-based vocoder that converts the mel spectrogram into a sequence of time domain waveform samples. We demonstrate that the proposed model is able to transfer the knowledge of speaker variability learned by the discriminatively-trained speaker encoder to the new task, and is able to synthesize natural speech from speakers that were not seen during training. We quantify the importance of training the speaker encoder on a large and diverse speaker set in order to obtain the best generalization performance. Finally, we show that randomly sampled speaker embeddings can be used to synthesize speech in the voice of novel speakers dissimilar from those used in training, indicating that the model has learned a high quality speaker representation.
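      The three components map almost directly onto code. Here's a toy sketch of the pipeline; every function body is a stand-in with made-up internals, and only the shapes (a 256-d embedding, 80-channel mel frames) come from the paper:

```python
import math
import random

EMBED_DIM = 256  # the paper uses a 256-dimensional speaker embedding
N_MELS = 80      # mel-spectrogram channels, as in Tacotron 2

def speaker_encoder(reference_wav):
    """Stage 1: seconds of reference audio -> fixed-dimensional embedding.
    Stand-in for the trained LSTM speaker-verification encoder."""
    rng = random.Random(0)
    mean = sum(reference_wav) / max(len(reference_wav), 1)
    e = [mean + rng.uniform(-0.01, 0.01) for _ in range(EMBED_DIM)]
    norm = math.sqrt(sum(x * x for x in e)) or 1.0
    return [x / norm for x in e]  # embeddings are L2-normalized

def synthesizer(text, embedding):
    """Stage 2: Tacotron-2-style seq2seq net emitting mel frames from
    text, conditioned on the embedding. Stand-in: one frame per char."""
    return [[embedding[i] + ord(c) / 1000.0 for i in range(N_MELS)]
            for c in text]

def vocoder(mel_frames, hop=200):
    """Stage 3: WaveNet-style vocoder, mel frames -> waveform samples
    (~12.5 ms per frame at 16 kHz). Stand-in: repeat each frame's mean."""
    wav = []
    for frame in mel_frames:
        wav.extend([sum(frame) / len(frame)] * hop)
    return wav

ref = [0.0] * (16000 * 5)              # 5 seconds of "reference" audio
emb = speaker_encoder(ref)             # cloned speaker identity
mel = synthesizer("hello world", emb)  # say anything in that identity
wav = vocoder(mel)
```

      The point of the architecture is that the three stages are trained independently, so the synthesizer never needs transcribed audio from the target speaker; five seconds of untranscribed speech is enough to produce the embedding.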

    • aerovistae 1637 days ago
      Do you want to elaborate on how the title is misleading? From reading the abstract it seems accurate to me.
      • corobo 1637 days ago
        "AI Clones Your Voice" implies there might be something on the linked page that involves an AI cloning my voice. Maybe a way to record a few phrases, maybe a text to speech that then uses my voice. Something like that.

        This does not do that - only provides pre-rendered samples, kinda disappointing. Impressive, but disappointing.

    • godelzilla 1637 days ago
      Thanks... Too long of a scroll to find somebody posting the actual science behind the click-bait.
      • azinman2 1637 days ago
        Nit: this is more design/engineering than science. There is no hypothesis being tested about how the world works.
        • JaRail 1637 days ago
          Did you read section 3 of the paper where they evaluate their system?

          > We primarily rely on crowdsourced Mean Opinion Score (MOS) evaluations based on subjective listening tests. All our MOS evaluations are aligned to the Absolute Category Rating scale [14], with rating scores from 1 to 5 in 0.5 point increments. We use this framework to evaluate synthesized speech along two dimensions: its naturalness and similarity to real speech from the target speaker.

          They're testing if the generated speech sounds natural with a well-defined and reproducible experiment. That's science.
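          For concreteness, the headline MOS number is just the mean of those crowdsourced ratings, usually reported with a confidence interval. A sketch with invented ratings, not the paper's data:

```python
import statistics

# Hypothetical ACR ratings: 1 to 5 in 0.5-point increments.
ratings = [4.0, 4.5, 3.5, 4.0, 5.0, 4.5, 4.0, 3.5]
assert all(1.0 <= r <= 5.0 and r * 2 == int(r * 2) for r in ratings)

mos = statistics.mean(ratings)
# 95% confidence interval under a normal approximation.
ci95 = 1.96 * statistics.stdev(ratings) / len(ratings) ** 0.5
print(f"MOS = {mos:.2f} +/- {ci95:.2f}")
```

          Whether that counts as science is the argument here, but the measurement itself is well defined.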

          • azinman2 1637 days ago
            Evaluation doesn’t make it science.

            There’s no investigation of the physical or natural world going on, unless they really think they’re modeling how humans are able to talk. But they’re not — they’re trying to create a system that works no matter how unnatural it is.

            • JaRail 1637 days ago
              I'll take that as a no.

              > There’s no investigation of the physical or natural world going on

              I just quoted them describing their observational method! Do you just not believe psychology is a science?

              > unless they really think they’re modeling how humans are able to talk

              I've lost you. They're not generating birdsong. What do you think WaveNet does exactly?

        • umeshunni 1637 days ago
          Did you know that there's an entire field called Computer Science - https://en.wikipedia.org/wiki/Computer_science
          • amelius 1637 days ago
            Computer science should have been called "Computer math".
    • lukeplato 1637 days ago
      I took the headline from this Two Minute Papers video: https://www.youtube.com/watch?v=0sR1rU3gLzQ
  • jchook 1637 days ago
    Lyrebird has had similar technology for a few years now.

    https://www.descript.com/lyrebird-ai

    • 0xcafecafe 1637 days ago
      An apt name for the technology considering the marvel of nature that the lyre bird is.

      https://www.youtube.com/watch?v=VjE0Kdfos4Y

    • Conlectus 1637 days ago
      Worth noting that Lyrebird is very rough -- at least last I tried -- and produced extremely robotic sounding (though recognizable) audio.

      This method has much clearer audio, but seems to lack generality / TTS capability.

    • wenc 1637 days ago
      Yes, I remember seeing this a while ago. Nice work by the MILA group in Montreal.
  • hurrdurr2 1637 days ago
    Autocratic regimes can rejoice...they can extract public confessions so much more easily now...
    • blacksmith_tb 1637 days ago
      Only ones that care about the appearance of propriety though, presumably most never had to try to produce convincing fakes when their word was already law?
  • gist 1637 days ago
    Will point out that it is cloning after a short sample and with an unknown speaker. So this is great for that type of comparison and in particular when the person listening does not know much or have great experience with the person speaking.

    Now if you were to take something by a well known person (where there is a great deal of audio) it would be much harder to clone anything other than a really short passage.

    This would be similar to faking handwriting. Easier to fake one word than to fake three pages. Easier to fake something where you have little to compare a pattern (less can go wrong).

    Not saying this isn't impressive it is. But it's also a bit of a trick based on the very short clips (both samples and created).

    I would say that a trained person could do a better fake because they could take into account all the info and be less likely to make a mistake.

    Now, sure, you could manually adjust the AI's output as well, doing the same thing.

  • sillysaurusx 1637 days ago
    I used this repository to make Half Life’s Dr. Kleiner sing “I am the very model of a modern major general”:

    https://twitter.com/theshawwn/status/1171806394783326208?s=2...

    https://www.youtube.com/watch?v=koU3L7WBz_s

    Then @jonathanfly deepfaked Dr Kleiner’s face onto a live performance of the song, which was hilariously unexpected. The AI twitter scene is awesome:

    https://twitter.com/jonathanfly/status/1171907301231513605?s...

    There is some promising new work in the GitHub issues. For example, someone has been training on ~10,000 additional speakers.

  • nmeofthestate 1637 days ago
    VCTK p240: duplicates the (a) north of England accent well.

    VCTK p260: all over the place accent-wise.

    LibriSpeech: can't really comment on the American examples, but they seem decent.

    • jdbernard 1637 days ago
      Sample 9 is a good example. The pronunciation of "biographer" is consistent even when it should be very different. All of the examples stress the first syllable but an American would stress the second.
  • namaemuta 1637 days ago
    I wonder if this could be used in RPG games, in which there is so much text and dialogue that having a recorded voice for all of it is impracticable.
  • dation 1637 days ago
    Now a politician can deny any sound bite. "They just deepfaked my voice!"
  • snissn 1637 days ago
    I heard a rumor that robot calls were harnessing your voice prints. Not necessarily true currently but an interesting concern
  • joshmarlow 1637 days ago
    There's a potential upside to malicious uses - synthesized voices (and deepfakes) can give victims of revenge porn some plausible deniability. This would hopefully take some of the sting out of that experience.
  • redsymbol 1637 days ago
    Not looking forward to the phishing that will exploit this.

    Going to call my parents today and warn them. If they ever hear something from me that's not adding up, be skeptical, and verify it some other way before taking any action.

  • werds 1637 days ago
    Anybody else notice how that Scottish male reference voice sounds considerably more English in the synthesized versions?
    • VBprogrammer 1637 days ago
      Yeah, maybe it's just me having a more sensitive ear for the Scottish accent, but that to me was the furthest from the reference by far.
      • luxpir 1637 days ago
        I heard that too; being a Brit (and working in languages) probably helps. It did pick it up occasionally though, which gives hope that increased sampling and training could fix the slight miss there.

        It was that and the Swedish-accented English ("Sentence in Different Voices" section, middle recording) made it struggle. No traces of the Scandi-lilt were left in the synth version.

        Final note would be the French speaker at the bottom of the page seems to be English first language, despite having very good spoken French. Not quite as pure a test of that last part as I'd have liked, despite the ability for the speaker to perhaps read the synthesized version in English back in English. That could be fun.

        • seszett 1637 days ago
          I can't hear any hint of an English accent in the French-language recordings, they just sound like regular Québec French to me.

          However, I'm not convinced at all by these voice transfers across language. I can imagine the second Chinese one being the same speaker in both languages, but not the three others.

          • luxpir 1635 days ago
            That's no Quebecois I've ever heard. Sounds like a Brit who picked it up as a second language in the home or soon after.

            Even struggles to finish the sentence due to the effort of reading in the 2nd to last one. Struggles with an extremely common word, 'grand', as well as stumbling over a simple sentence. To be fair, he has heard enough French (i.e. lived and studied there most likely) to get the intonations mostly right but there are a few other giveaways too... it's just not natural or native from where I'm sitting.

  • epx 1637 days ago
    One thing is, voice over the phone is so compressed that it actually took a long while for this kind of voice cloning (and associated frauds) to be all over the place.

    We are going to need 2FA over voice communications :)
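    A shared secret plus a time-based one-time code (the standard TOTP construction from RFC 6238) would work over voice: the caller reads six digits aloud, the listener checks them. A sketch, with a made-up secret:

```python
import hashlib
import hmac
import struct
import time

def voice_code(secret: bytes, t=None, step=60, digits=6) -> str:
    """Standard HOTP/TOTP: HMAC-SHA1 over a time-step counter,
    dynamically truncated to a short code the caller can say aloud."""
    counter = int((time.time() if t is None else t) // step)
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)

secret = b"family-shared-secret"                # agreed on in person, once
spoken = voice_code(secret, t=1_573_776_000)    # caller reads this aloud
assert spoken == voice_code(secret, t=1_573_776_000)  # listener verifies
```

    Of course the hard part isn't the crypto, it's getting everyone's grandmother to check six digits before wiring money.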

  • SubiculumCode 1637 days ago
    For those who were unable to speak, e.g. Stephen Hawking, it would have been feasible to have the computer speaking system use a voice that sounded like theirs prior to their condition. Amazing.
  • nsxwolf 1637 days ago
    A staple of science fiction comes to life.
    • maxwell 1637 days ago
      Everyone's really pitching in doing their part to get the T-800 ready within 10 years.
    • teraflop 1637 days ago
      And conversely, Star Trek's frequent use of voiceprint authentication looks sillier by the day.
      • visarga 1637 days ago
        It's even sillier when robots in almost all sci-fi movies have a 'robotic voice' and human-level intelligence. It's actually much easier to have a human voice and 'robot intelligence'. They got it backwards. At least the computer voice in Star Trek, and Data's voice, were human.
      • toomuchtodo 1637 days ago
        Fun fact: Several financial institutions (Vanguard, Schwab) allow your voiceprint to be an authentication mechanism.
        • floatrock 1637 days ago
          There was a story a few months back about a VP at a British subsidiary who wired a million dollars to Eastern Europe because the CEO called him up and told him to, or something like that.

          It was the CEO's voice, but it wasn't the CEO.

      • siffland 1637 days ago
        Code 1,1A Code 1,1A,2B Code 1B,2B,3 Zero-Zero-Zero Destruct Zero

        What, that is secure?.......

        • WrtCdEvrydy 1637 days ago
          Now imagine the Borg with their advanced computers not being able to cause all Federation star ships to self destruct (BSG style).
      • A_Parr 1637 days ago
        It rarely even worked in Star Trek.
  • johnsonjo 1637 days ago
    I was literally just researching tts (text to speech) programs yesterday and I believe Mozilla’s open source (open source in this case meaning weak copyleft) TTS [1] also uses Tacotron and is trying to implement multi speaker tts currently [2]. I literally just posted Mozilla’s TTS to hacker news [3] without even seeing this which made me experience a bit of the Baader-Meinhof Phenomenon [4].

    [1]: https://github.com/mozilla/TTS

    [2]: https://github.com/mozilla/TTS/blob/master/README.md#major-t...

    [3]: https://news.ycombinator.com/item?id=21532189

    [4]: https://en.m.wikipedia.org/wiki/List_of_cognitive_biases#Fre...

  • czbond 1637 days ago
    I have not been "wow-ed" by a technology in quite a while on HN. Wow.
  • jonplackett 1637 days ago
    Do papers like this have code to play with anywhere?
    • lukeplato 1637 days ago
      From the Two Minute Papers video description: > An unofficial implementation of this paper is available here. Note that this was not made by the authors of the original paper and may contain deviations from the described technique - please judge its results accordingly! https://github.com/CorentinJ/Real-Time-Voice-Cloning
      • jonplackett 1637 days ago
        Funny, it gives them all a slight American accent!
  • admn2 1637 days ago
    Yikes - don't a lot of financial institutions use your voice as a layer authentication?
  • hakanito 1637 days ago
    It will be interesting to see if these made-up AI voices can deliver jokes with the same tonality and delivery as good comedians can. I'm just a layman but it feels like a hard problem to solve.
    • NoodleIncident 1637 days ago
      The furthest right column in the first table shows that they might be a long way off from getting timing right. The 5-second sample happens to have a comma, at which the speaker pauses; this pause is in most of the generated output, at seemingly random places in the sentence. The one sentence that does have a comma doesn't use the pause, either.
  • seph-reed 1637 days ago
    I think it's time we officially declare we're going the dystopia route, and really commit to it. The sooner we hit the great filter, the less suffering there will be.
  • pier25 1637 days ago
    In a couple of years we won't be able to trust any media.

    I wonder what the cultural implications will be, much like photoshopped models and actors have change the beauty ideals.

    • Riccardo_G 1637 days ago
      There are a lot of security features being put in place to help us all understand what is real and fake. Of course there is still a lot of work to be done and the technology is very new, but at https://replicastudios.com/ work on watermarking audio, as well as on authentication of Replica voices and detection of fake, non-authorised replicas, is already in progress.

      Facebook, YouTube, Twitter, and the likes, will then be able to let users know what is using real (actual real voices, or the authenticated Replica voice) or fake voices.

    • gknoy 1637 days ago
      I'm sure we can always trust Eliza Cassan.
  • kingkawn 1637 days ago
    This is part of the process of truth being removed from all recording. Soon we will be back to a state where the only certainty is the person we speak to in person
    • Hoasi 1637 days ago
      > Soon we will be back to a state where the only certainty is the person we speak to in person

      That is until we are able to tell whether that "reality" is synthesized or not.

  • cs702 1637 days ago
    "Hi Jim, it's Jane. How are you? I'm calling you on the phone to confirm the wire transfer instructions I just sent you via email..."
  • obaid 1637 days ago
    This is awesome (in a crazy way). I was playing with Resemble.ai [1] yesterday and was surprised by how good it was in replicating my voice with just a few minutes of dataset. This technology is going to keep on getting better by the minute.

    As with any piece of technology, there are always good and bad actors.

    1- [Resemble.ai](http://resemble.ai)

  • gregcrv 1637 days ago
    Maybe, after all these years of people disconnecting physically and using more phones and apps to communicate or meet, we will go back to the fundamental real-world human connection, because that's the only one that can be trusted to be genuine. And the digital world will be left out because it's not trustworthy. Is this where we are heading?
  • nergik 1637 days ago
    Awesome! Can't wait to have some code to play with and start feeding it ebooks to produce audiobooks with voices I actually like.
  • SubiculumCode 1637 days ago
    If acting was ever a good career choice, it isn't now. I am becoming convinced that actors will be replaced wholesale.
    • Verdex 1637 days ago
      I wonder if what we'll see is actors being deep faked etc over by other actors. So of the new actors we will say, "He plays a good Pitt, but his Clooney leaves something to be desired."

      Movie analysis will get arbitrarily complex. "Robertson playing Dwayne Johnson playing James Bond really wasn't the right choice for this film. Robertson playing Ferrell playing Bond OR Strong playing Johnson playing Bond would have both been better for the following reasons."

    • visarga 1637 days ago
      No, actors are still going to act. But their voices will be just an input feature to the speech engine.
      • echelon 1637 days ago
        Meaning that this democratizes acting. Celebrity actors won't be required.

        I think this is a good thing for the field and gives a whole lot more people access to the opportunity they strive so hard to achieve.

  • agentultra 1637 days ago
    Impressive but they still sound like robots imitating humans. I can only imagine the chaos this could cause, if used by bad actors, as it continues to improve. If someone took my voice I'm not sure that my partner would know it wasn't me. That would enable all kinds of social engineering attacks.
    • Enginerrrd 1637 days ago
      Exactly what I was thinking. We already have a problem with scammers claiming to be relatives who need some quick cash wired over. Imagine how much more effective that could be if it actually sound indistinguishable from a loved one.
  • luxpir 1637 days ago
    I wonder if we'll need a PGP signature for every kind of recording we might make in the future.
  • badrequest 1637 days ago
    Has anybody tried making an AI that generates 5 seconds of arbitrary speech to feed into this AI?
    • kleer001 1637 days ago
      No need to create an AI for that. Just like with any neural network, random noise can be fed into the detector networks, fed back on itself, and then used to create novel maps. Like DeepDream.
  • reifnir 1637 days ago
    Unfortunately, the source isn't available. I was hoping I could generate my own narration of any book in the voice of anyone I could throw at the trainer. (Heck, even if it was just my own, anything is better than Scott Brick!)
  • soulofmischief 1637 days ago
    My bank is going to need to do better than a 6-digit account number and verbal password. Customer service is in for the ride of its life once this rapidly maturing technology is commoditized for criminal enterprise.
  • i2shar 1637 days ago
    Wow. So is that why I am getting incessant spam calls and they only need to hear me or my voicemail greeting for 5 seconds to be able to impersonate my voice ever after?
  • kevin_thibedeau 1637 days ago
    There was an old piece on Headline News 20 years ago where someone had done this with Whoopi Goldberg's voice. Never heard about it since. Presumably it went dark.
  • thimkerbell 1637 days ago
    Ok, what happens to society when this gets to be really good?
    • tiborsaas 1637 days ago
      Better memes, better apps, better scams
    • fasturdotcom 1637 days ago
      computers will have unique voices.

      someone will invent a conversational interface for programming

      the world's demand for human programmers will go to near 0

      everyone will call themselves an engineer and live on basic income

  • jacquesm 1637 days ago
    One more scene from Terminator that we can now do for real.
    • martin1b 1637 days ago
      What's wrong with Wolfy?
  • sebiw 1637 days ago
    Anybody else thinking about the scene from the Bourne trilogy where Jason opens Noah Vosens safe using a voice sample he took of him on the phone?
  • tenebrisalietum 1637 days ago
    Oh neat. I would totally use this to make ASMR audio in a voice different than my own without having to ask someone else to read a script.
  • anoncow 1637 days ago
    Mission Impossible is now Mission Possible.
    • onion2k 1637 days ago
      It always was. IMF never failed in the TV shows or the more recent films. :)
      • gruez 1637 days ago
        >the more recent films

        Was there ever a film where they didn't succeed at the end?

  • injb 1636 days ago
    They turned the Scottish accent into an English accent. The American ones are very convincing though.
  • irrational 1637 days ago
    So, now we can't trust text, images, videos, or audio. Any of them could be fake. What is left?
    • dole 1637 days ago
      AI-generated scents. After you die, your body odor, sprayed on an article of clothing, as a monthly subscription to your loved ones.
  • costcopizza 1637 days ago
    I was born in the wrong era. Christ.
    • ttul 1637 days ago
      You were born in 0 CE?
  • frandroid 1637 days ago
    The synthesized voices sound similar but would probably not fool a good voice-print system.
    • macca321 1637 days ago
      No, but they would probably fool a friend or family member over a phone line. Yikes.
      • chrisan 1637 days ago
        or some post on social media...
  • jasonbourne1901 1637 days ago
    In the future moms will no longer complain that their sons never call!
  • hindsightbias 1637 days ago
    How long until we have the napster equivalent of voices? Voicester?
  • solveq1 1637 days ago
    why do we need this kind of technology? cheating?

    It seems anti-human

  • datlife 1637 days ago
    Welcome to 2020. Crazy time to be alive!
  • andrefmoniz 1637 days ago
    It means we can talk to anyone forever
    • seph-reed 1637 days ago
      It means you can make someone talk, non-stop, for an indefinite amount of time.

      Now I want to make an art piece that's just a valley girl droning on, and on, and on, and on about believable and obnoxious life experiences. The stores they go to, how they feel about certain colors, what "too spicy" is. It just never stops.

      • visarga 1637 days ago
        Hurry up, or it's going to be done soon by someone else. They just need to feed GPT-2 fine-tuned on selected texts to the speech engine. Bonus points for generating a face, too.
      • judah 1637 days ago
        This sounds like a brilliant idea, and I'm almost wanting to build it myself. If we can tack on a visual head and sync up the lips to the words, it'd be amazing and would likely go viral.
        • seph-reed 1637 days ago
          ShadyWillowCreek@gmail.com if you're actually interested in collaborating on this.
  • nkkollaw 1637 days ago
    This is pretty scary.
  • alpineidyll3 1637 days ago
    To not release the source is a pretty bad distortion of research norms.
  • x220 1637 days ago
    Jordan Peterson got it right months ago. "It’s hard to imagine a technology with more power to disrupt. I’m already in the position (as many of you soon will be as well) where anyone can produce a believable audio and perhaps video of me saying absolutely anything they want me to say. How can that possibly be fought? More to the point: how are we going to trust anything electronically mediated in the very near future (say, during the next presidential election)?"

    https://nationalpost.com/opinion/jordan-peterson-deep-fake

  • kd3 1637 days ago
    This is cool. Hopefully soon I'll be able to use this to do automatic voice overs from written texts using my voice for podcasts and videos. Read pages of text without getting tired.
  • dingoegret 1637 days ago
    And no one will be able to independently reproduce these results. As with most Google speech research publications.
  • retrovm 1637 days ago
    2018, FWIW.
  • accounn 1637 days ago
    Google scam & spy business in plain sight.