OpenVoice: Instant Voice Cloning

(github.com)

270 points | by tosh 10 days ago

23 comments

  • grej 10 days ago
    This story just within the last days:

    "Athletic director used AI to frame principal with racist remarks in fake audio clip, police say"

    https://apnews.com/article/ai-artificial-intelligence-princi...

    • fennecbutt 6 days ago
      Which is why we need to make this technology readily available and well known, so that people are more aware of it, stop trusting everything, and look up sources.

      Ah, who am I kidding, most people will still not fact check.

    • skeaker 9 days ago
      [deleted]
  • nolok 10 days ago
    We're really in an era where laws and their enforcement will have a lot of catching up to do, very fast.

    Fake historical proof, fake leaks, fake endorsements, fake ads ... People couldn't be bothered to double check when it was a mere random text article on facetok; it's going to be so much worse ...

    • jorvi 10 days ago
      From hypernormalisation to hyperreality.

      I’ve been telling my friends that 5-10 years from now, the only things that you’ll be able to ~100% trust is what happens in front of your eyes, in that very moment. You can elect to trust reliable news organizations to vet things for you, but even if you do, due to polarization a huge subset of the world will think you’ve been got and discard everything as fake.

      Look at stuff like Sora, or all the new voice models coming out. Just a few days ago there was a high school athletic coach (!) arrested for cloning the principal’s voice and using it to say vile stuff. He only got caught because he used his own e-mail.

      Now combine that with the fact that Microsoft’s new Phi-mini model approaches GPT-3.5 performance using 3.8 billion parameters, whilst GPT-3.5 uses 175 billion. And we’re only ~5 years of optimization into this tech.

      I want to get off Mr Bones’ wild ride.

      • pfannkuchen 9 days ago
        > the only things that you’ll be able to ~100% trust is what happens in front of your eyes

        Won't this just be a return to the historical norm?

        Prior to photography being invented, there was no guarantee that any retelling of events (whether spoken, written or drawn) was true.

        It will be weird for people alive today, but it doesn't seem that risky from a societal perspective.

        • Breza 3 days ago
          Good point. When I was first exploring the Bible, I'd get bored with the detailed lists of people who saw things happen. Took me years to realize that the lists were people who were still alive at the time and could provide verification.
        • zamalek 9 days ago
          > doesn't seem that risky from a societal perspective.

          I would wager we will see an improvement to society.

      • bombcar 9 days ago
        We already know you cannot trust what you see with your eyes (check any "compare eye-witness reports with trusted video recordings" or watch Penn & Teller).

        We're in for a fun and wild ride.

      • dudefeliciano 10 days ago
        > you’ll be able to ~100% trust is what happens in front of your eyes

        5-10 years from now we may have vision pro gen 5, or whatever system achieves market dominance, between our eyes and what happens in front of us.

        • noAnswer 9 days ago
          That is a plot point of Ghost in the Shell's Laughing Man.

          People were unable to see the face of the hacker. Instead they saw a smiley.

          • Der_Einzige 9 days ago
            Ghost in the Shell anticipated so much. The premise of the first movie is that a model gains sentience (creates its ghost out of thin air), claims asylum, and merges itself with the ghost within the Major.

            Everything in that series depicted is stuff that either exists today or will exist within 30 years. Everything.

        • ben_w 10 days ago
          Even without that, we have at least one crowd funded YouTuber growing mammalian cells in vats with the explicit intention of making a "meat robot".

          https://youtu.be/Z_ZGq8Tah0k?si=GuV1QC50jiv7yHP7

          There's not a huge gap between that and a Westworld style infiltration-by-printed-robot.

          • justapassenger 9 days ago
            The gap is as huge as between ancient Greeks discovering planets and us having a colony on Mars.
            • ben_w 9 days ago
              You think that the gap between "a meat robot" and "a meat robot of a specific shape" is that big?

              (If you're thinking of the robots' brains in the TV show, sure, that's a totally different challenge; but you don't need them for a replacement-takeover scenario, there's a lot of other ways to do that).

        • giantg2 10 days ago
          AR has better real world capabilities and use cases than VR. AR is what you use to enhance reality or make it easier. VR is where you go when you want fiction.

          Aside from entertainment, I don't understand this VR push when AR is arguably better for many of the non-entertainment uses.

          • dudefeliciano 10 days ago
            Wouldn't AR be as susceptible to a sort of "real world" injection attack as well? Vision Pro uses cameras to record the outside world and display it back to the user; it should be relatively easy to add/remove objects in the rendered feed.
            • giantg2 9 days ago
              Vision Pro is VR. It's just using the surrounding environment to build the virtual one. Something like HoloLens is true AR.
              • fennecbutt 6 days ago
                And here I thought it wasn't any of those but "spatial computing". Apple told us so it must be true.
        • daelon 10 days ago
          I won't.
      • bparsons 10 days ago
        You can trust what happens in front of your eyes until everyone starts wearing augmented reality contact lenses.
      • jareklupinski 9 days ago
        > I want to get off Mr Bones’ wild ride.

        the ride never ends

      • giantg2 10 days ago
        "I’ve been telling my friends that 5-10 years from now, the only things that you’ll be able to ~100% trust is what happens in front of your eyes, in that very moment. You can elect to trust reliable news organizations to vet things for you,"

        The time is now. Even the mainstream news organizations are probably only high-90s-percent reliable, as they've been caught selectively editing, not vetting sources or facts, and displaying biases.

    • andsoitis 10 days ago
      Trust is a dependency for human existence. Not just for civilization, but also very small communities and basic exchange of ideas, goods, and services.

      I cannot foretell how the risk of trust destruction from GenAI will unfold, but I'm optimistic our creativity will win out.

      • cthalupa 9 days ago
        If you think about it, though, the window for there being something resembling objective universal truth has been a very very short period of human history. It really didn't exist before the internet and ubiquitous smartphones.

        Before the internet, TV, radio, newspapers were our sources of truth beyond just trusting people in our immediate vicinity, and these were all heavily filtered by what stories they decided to run, the amount of detail they focused on, any human bias that crept into their reporting, etc. I'm not a "FAKE NEWS!!!" kind of guy, but one has always had to ingest news from these sources with some level of filtering in this regard, and understand that there might be other sides to the story, or whole stories of importance going unreported.

        If we revert to subjecting images/video/audio clips to the same level of skepticism we had with random people informing us of pieces of news with no proof, then we're effectively just at the same level of objective universal truth as we had been for the overwhelming majority of human history.

        I'm not arguing this is a good thing - just that it might have been a small and blissful island that some of us had the privilege of enjoying.

      • froh 10 days ago
        yes. that dependency is why it's even in the ten commandments: don't give false testimony / don't slander, don't lie.

        https://en.wikipedia.org/wiki/Thou_shalt_not_bear_false_witn...

        but instead they discuss who I've married. sigh.

        anyhow: I share your optimism.

      • bloopernova 10 days ago
        This is partly why I'm so fascinated and disgusted by trolls and astroturfers. They erode trust in a given forum, which degrades the quality of discourse because no one wants to invest time in untrustworthy discussions.

        Sometimes I wish I could get an honest answer from trolls about what they hope to achieve, but of course that will never happen.

        • tetraca 9 days ago
          > Sometimes I wish I could get an honest answer from trolls about what they hope to achieve, but of course that will never happen.

          It's usually not that complicated: They enjoy provoking people, particularly people that can be reflexively upset by reading words. It's a game to them, against a party that they do not respect. The words they say might upset you, but the words ultimately mean nothing to them outside of provoking you, and the more chaotic they can make the situation the more amusing it is.

          • maksimur 9 days ago
            2 possible additional explanations for trolling:

                1. Spite: damaging a community over perceived or actual injustice/discrimination. I remember this happening on Reddit and Stack Overflow circa 2010-2014.
                2. Actually a neurodivergent person mistaken for a troll. I remember this happening on HN; he was either schizophrenic or autistic. Another example might have been the creator of TempleOS.
        • ranger_danger 9 days ago
          I get/got trolled ALL the time, and it was extremely frustrating. I would get irrationally angry at them. Then I started doing it myself and realized it was fun. Not sure if it was really a coping mechanism or I was just bored or what. IRL I don't think I troll people or say inflammatory things, I tend to be quiet and reserved. But most online chat platforms I have been on have been extremely toxic, with not really any better alternatives, so I guess if you can't beat em, join em... but I think that mentality mostly evolved subconsciously in me over time.
          • andsoitis 9 days ago
            > so I guess if you can't beat em, join em...

            That's not the way. Be the change you want to see.

            • ranger_danger 9 days ago
              I agree with this in principle but in my experience many trolls never change, and trying to act nicer just fuels them to keep going, because they love attention.
    • throwthrowuknow 10 days ago
      A digital audio file is not even close to being proof of anything. Even without voice cloning you can easily edit, clip and compose audio into almost anything you want. It’s also not difficult to simply impersonate someone else’s manner of speaking with practice, something that is commonly done by both amateurs and professional actors. The only thing that changes is the ease with which this can be done, which should help everyone understand how unreliable such “proof” is.
      • telesilla 10 days ago
        Sounds like a remake of Sneakers is needed--with a fresh take on impersonation and social engineering, to remind people what's possible and potentially dangerous.
      • colecut 10 days ago
        When dealing with social conflicts, there is very rarely "proof", just evidence

        In courts of law, digital tapes have been frequently used as evidence.

        • bo1024 10 days ago
          One thing that surprised me, as a juror, was that nobody ever simply submitted a piece of evidence. Each of the hundreds of exhibits was presented with a person on the stand attesting to its provenance, under penalty of perjury.
        • exe34 10 days ago
          The thing about evidence is that it can be contested. I imagine that in the cases you refer to, one side presented them as evidence and the other side said oh bugger, you got me. If they had reason to believe the audio had been tampered with, or entirely fabricated, they would then bring expert witnesses, and eventually the other side would have to offer more evidence that it was not tampered with - often this sort of thing comes down to "under oath, I swear I recorded this at the date/time and the subject on the tape is $name". This might not be enough to convict or exonerate, but it'll then count as part of the case.
          • close04 10 days ago
            > The thing about evidence is that it can be contested.

            The danger is in the court of public opinion where just having the conversation is damage enough. If you have to go out there and fight it then you give it even more visibility and it takes your focus and resources away from more useful activities. At the end of the day many people only remember the accusation, not what comes after.

            • exe34 10 days ago
              > At the end of the day many people only remember the accusation, not what comes after.

              Never a better time to learn the dichotomy of control. What is in your control are your thoughts and your actions. Outside of your control are: your body, your reputation, your wealth, your health, etc. It matters what you choose to do, but the outcomes are out of your control.

      • tyingq 10 days ago
        Legal opinions don't seem to have caught up...

        https://www.justice.gov/archives/jm/criminal-resource-manual...

      • jilijeanlouis 9 days ago
        There is actually a big opportunity for companies like Loccus: https://www.loccus.ai/
      • nolok 10 days ago
        I don't know the political situation in your country, but all I will say is that the Putin-aligned far right in mine makes heavy use of fake quotes, deepfake videos and things like that to propagate ideas, that their "followers" eat it up, and that you then either have to let those untruths stand or spend all your energy fighting them / defending yourself, making you look guilty. And in the past 3-4 years (and it exploded with covid), it's the followers themselves who now start those things.

        As for AI, it moves this from "someone with some dedication can do it" to "anyone with a computer can do it".

        • SoftTalker 10 days ago
          They do it here too, especially obvious in political campaign ads but no doubt the established powers do it to maintain their own image as well.
      • croes 10 days ago
        Easily, for people who knew what to do.

        Now it's easy for everyone.

    • ActionHank 10 days ago
      These are big issues, but I would say that a bigger issue is the case where a spam caller has you on the line talking for ~10 seconds and then calls the bank or family member as you.

      Android and iOS should support real time voice changers as the norm with a quick switch button on the dialer to disable it and an option to have it off for known contacts.

    • andy99 10 days ago
      I've come around to the idea that the hype around criminal or bad actor uses of AI is the same as the hype around other uses. Some real uses will shake out but the delta between what's actually enabled by the tech and what was possible anyway is way smaller than people like to represent.
      • jddj 10 days ago
        I'm not sure. Maybe I'm caught in the hype but I feel like possible is one thing, scalable is another.

        We're at the point now where, for example, a leak of phone number + address book would enable a high quality, large scale automated impersonation pipeline that many here could put together in a weekend.

        Install a compromised app (or just be a once-user of a newly-compromised app), then answer the phone for a minute and then everyone you know receives a somewhat believable phonecall to empty their wallets / your accounts?

        • Eisenstein 10 days ago
          It will happen, then people will adapt. Those that don't will suffer. It is tragic but not new. People are getting scammed now by being told in broken English that they need to buy gift cards in order to fend off Tax evasion charges or whatever.

          The fact is that everyone gets taken for a ride at least once. Many people never admit it to themselves, but we have all been or will be taken advantage of. Some of us end up devastated by it or are an 'eternal mark', but the vast majority of us just learn a lesson.

          What happens to society when trust becomes risky to the point that everything could be a ruse by a bad actor? It ends up doing what it always does in that situation -- authoritarian regimes can be looked at for examples of what will happen: people will rally around family and extremely close friends, and close off all vulnerability and vastly decrease their tolerance for risk around everyone else, unless they have no other choice.

          • sandspar 8 days ago
            Imagine it's your mom being scammed.
            • Eisenstein 7 days ago
              How would that change anything?
        • effluvium 10 days ago
          It is the scalability. A motivated, well-funded bad actor could wreak total havoc.
      • nolok 10 days ago
        I am not talking about criminals or similar, for that I agree with you.

        I am talking about political. There has been a MASSIVE explosion of fakeness since covid, coupled with people who completely lost their ability to actually check information, and anything that makes it easier for anyone to propagate is dangerous in my view.

        We've reached the point where "check by yourself / do your own research" moved from what the proponent of the truth used to tell people to go check, to what the other side is now using to give as much value to their facetok videos and whatnot (I am speaking in the general sense).

        • cruffle_duffle 9 days ago
          > There has been a MASSIVE explosion of fakeness since covid, coupled with people who completely lost their ability to actually check information, and anything that makes it easier for anyone to propagate is dangerous in my view.

          The best part about this statement is it doesn’t even matter what “side” you are describing, because all “sides” are subject to the same issue. Hell, for more than a year every covid article from the New York Times quoted an obscenely high 4% “death rate” despite it being off by multiple orders of magnitude. And if pressed they’d be able to easily “prove” their rate by citing some dodgy data model, using badly collected data, or just not understanding how to interpret the data. And that “truth” propagated by the NYT was cited all over, causing all kinds of wildly over the top responses.

          We live in a post-truth world. You can always find some authoritative-sounding source to support whatever it is you want to believe. There is always a way to massage the data and “facts” in a way that supports your claim as the “true truth”. It doesn’t even matter what “team” you are on… you’ll always be able to craft “the truth” by selectively using whatever source material you want.

          I don’t even know how you’d solve this problem because it is a core issue with trust. It’s very, very hard to establish “knowledge trust” on something like the internet.

          • ranger_danger 9 days ago
            After having very similar observations I accidentally went down the philosophy rabbit hole. I would look up wiki pages like "Newton's flaming laser sword" and get into a neverending cycle of reading all the "see also" links recursively and started to realize just how meta-ly subjective literally everything is.

            And while I was already super logical, being a programmer and such, I don't even think I like having done this, because now I am starting to lose the ability to "posit" or believe something is true, because one can always argue that so many things are subjective that it is impossible to come to an agreement on anything in any reasonable amount of time.

        • psychoslave 9 days ago
          Or another perspective is that politics and religions are just byproducts of propaganda mesmerizing people. So there are already many battle-proven, efficient ways to brainwash people, probably more furtive and cost-effective than this new stuff.

          Thankfully you still can rely on fellow HN commenters like myself to bring you thoughts free of any mind binding manipulations and exempt of sarcasm.

      • unraveller 10 days ago
        It's not a very consistent pearl clutch either; would they deny all access to computers because terrorists will use them to fire heat-seeking missiles?

        I think it's just a lack of imagination: we can do this funny, offensive, misleading thing right now, and it can surely be used in other ways, but you'd have to let your mind wander for one extra moment after your social-media-induced reaction fixation episode. This could be a depression hack: hearing your own voice in different accents to motivate you.

    • corobo 10 days ago
      Honestly it couldn't hurt for people to re-learn not to trust things on the internet.

      I haven't heard someone snarkily say "it's on the internet so it must be true" in like 12 years. Let's bring that back.

      • ranger_danger 9 days ago
        Really? I hear it all the time. I mean, if it's on the Internet it must be true.
    • simion314 10 days ago
      >We're really in an era were laws and their enforcement will have a lot of catching up to do very fast.

      I mean, there are existing laws that would apply; there are laws against scamming or whatever bad stuff you can do with this. The world might need a law, though, to force media to label AI generated content, and IMO a law where media and users would take responsibility for their content.

      What I mean is, if as a random user I claim something in a media post like "eating shit cures you of covid", I should either be forced to add "btw I am not taking any responsibility for my comment and I am not a medic/lawyer/expert", or else I should be forced to pay damages if someone sues me for my bullshit claims.

      So cloning voices should be legal; scamming people is illegal.

  • _joel 10 days ago
    It's not cloning, it's just copying the tonality. It states this in the docs but still calls itself voice cloning.

    I tried it and I ended up sounding American, not my usual dulcet Lancashire tones. Absolutely nothing like me.

    • unraveller 10 days ago
      You should be able to bring it back to your proper accent using https://voiceshopai.github.io

      VoiceShopAi can convert from young to old, male or female, or into any country's accent.

      found via https://github.com/metame-ai/awesome-audio-plaza which tracks most things in the voice space as they come up.

      • _joel 10 days ago
        Can't see any code for that, unless it's there and I've missed it.
        • unraveller 10 days ago
          You're right. No code ever. The demo looks to just be a tease for the commercial offering to come. It seems they think it is more ethical for people to live their lives misinformed about the availability of this tech, because only good people will study the field or run the voice APIs.

          >we share the details of our findings here, but do not plan to publicly release the model checkpoints or implementation at this time [due to deepfake potential]

    • youngNed 10 days ago
      never really thought about how much i need a 'Fred Dibnah' voiced AI until now
      • _joel 10 days ago
        Aye lad!
    • tiborsaas 10 days ago
      Same here, I've tried it with my own voice and luckily it sounds nothing like me.
    • causal 9 days ago
      Yeah, not the best title/name. On a more meta note, I sometimes feel like HN comments are increasingly Reddit-style headline reactions, with little investigation of TFA or peering into the tech itself.
      • sandspar 8 days ago
        When people leave Reddit half of them come here. Lots of people left Reddit recently.
  • screamingninja 10 days ago
    What is a legitimate use case for this? I can think of a hundred applications for deceiving others but struggle to come up with a scenario where one would want their voice cloned or reproduced.
    • dannyw 10 days ago
      You're recording a podcast and want to tweak some of your own words, without the hassle of re-recording.

      You're an indie game developer, and want vibrant NPCs with unique voices and dialogue powered by an LLM.

      You're producing a movie, and want to tweak certain lines of dialogue; with the consent of the talent.

      You suffer from health conditions and are gradually losing your voice, but you still want to communicate.

      There are certainly legitimate use cases of this technology. I personally believe illegitimate use cases overshadow the legitimate use cases, but I don't think it's fair to say there are no legitimate applications.

      We should strictly regulate the use of this technology by criminalizing abuse; not by banning it altogether (which is pretty hard in the case of software and small models).

      • dylan604 10 days ago
        > You're producing a movie, and want to tweak certain lines of dialogue; with the consent of the talent.

        The latest agreement to end the last round of strikes was to prevent this very thing.

        Of your list, the medical condition (giving someone their real voice instead of a Hawking voice) would be the most legit reason. Everything else is a skewed sense of what's morally acceptable, as I think they are shady.

        • dannyw 10 days ago
          I wouldn't go as far as that. Plenty of indie to AAA games are produced using commodity assets / resources (e.g. why make your own tree model, when there's plenty of pre-mades in the marketplace). Yes, that takes away work from artists, but it is part of productivity and game development.

          Centuries ago, elevators were manned. Today, they're all electronic. It is the inevitable march of progress and productivity.

          • dylan604 9 days ago
            There's a difference between starting the project by using OpenVoice and hiring an actor for 90% of the work but then cloning their voice because you can't be bothered to reschedule the same actor for creative changes.

            But if you start that project by making a generated voice sound like Morgan Freeman because you can't afford Morgan Freeman but feel entitled to his voice, then you can go pound sand. Your choice of generated voice should be one that someone else isn't already using.

      • ranger_danger 9 days ago
        I wish I had your creativity when trying to think of something useful to code.
    • whycome 10 days ago
      It’s only a matter of time before Alexa and other agents use better customizable voices.

      Audiobooks could have voices read by characters rather than a single narrator faking it. (If even)

      You have a cold but still want to give a speech without coughing.

      Low bandwidth transmission of audio: transmit just the text and use local voice model to replay it.

      Talk to your loved ones after they’re gone.

      Hilarity and comedy.

      • beretguy 9 days ago
        > Talk to your loved ones after they’re gone.

        Ok, no, that's bad. Have you seen Black Mirror?

      • ranger_danger 9 days ago
        Imagine the day when Alexa speaks back to you in your own voice. People would go insane.
    • r2_pilot 10 days ago
      You may not be trying very hard, then. The first thing I thought of was cloning your voice for use in real-time translations. I can probably think of several others mentioned in comments below, but this is a 100% always-useful, never-nefarious (assuming perfect translation not being maliciously used) application.
      • brigadier132 10 days ago
        This tech makes me not even want to speak.
        • Wistar 9 days ago
          Yes, I have just now been thinking about how few, and how featureless, syllables I can utter to determine whether an unknown caller has a legitimate reason for calling, such as a large-delivery driver, medical lab or other real call.

          It seems the best tactic is to make the caller do as much of the talking as possible.

          Or use a generic cloned voice to interact with unknown callers.

        • beretguy 9 days ago
          This tech makes me not want to have a YouTube channel.
        • figglestar 9 days ago
          Use the tech to speak and hide your real voice.
    • anotherevan 9 days ago
      I have a friend with a paralysed larynx who is often using his phone or a small laptop to type in order to communicate. I know he would love it if it was possible to take old recordings of him speaking and use that to give him back "his" voice, at least in some small measure.

      Unfortunately I have yet to see something that can do this and provide a voice model that you could plug into Android TTS and/or Windows which are what he uses.

      • jilijeanlouis 9 days ago
        Why would you use a dedicated app? Does it have to be natively embedded in Android?
        • anotherevan 8 days ago
          Does not have to be natively embedded. Should work locally without needing an internet connection to perform the speech generation though.
    • colechristensen 9 days ago
      Fixing small errors in narration, voiceovers, or other recorded content.

      Translation of recorded content with the original voice into new languages.

      Comedy as long as it’s obvious that it’s a fake.

      Actually intentionally selling your voice to be the voice of some text to speech product. Maybe I want Alexa to have the voice of Danny Devito, as long as he’s ok with it and getting paid.

    • shortrounddev2 10 days ago
      I play a lot of counter-strike and it's very amusing when people hurl insults at the other team with the voice of Joe Biden
    • lukevp 9 days ago
      My wife has been sick all week and has to communicate over text because her voice has gone. We’ve been talking about making voice clones of ourselves for situations like this. Some people never regain their voices so preserving them before they lose it is super valuable.
    • gotrythis 9 days ago
      I imagine training people, and having everything I say be available in any language, matching my tonality, and being able to reach a global audience. I'm very much looking forward to this.
    • mlboss 10 days ago
      Podcast production without speaking. Audio correction in media.
    • bhickey 10 days ago
      > What is a legitimate use case for this?

      Voice loss.

    • tmaly 9 days ago
      What if you wanted to create audio for your videos without having to have a recording session.
    • smrtinsert 10 days ago
      Tiktok videos for the amusement of millions?
      • dylan604 10 days ago
        For 6 months before it's banned!
    • tjbiddle 9 days ago
      Sending personalized messages to customers
    • codelobe 9 days ago
      Indie gamedevs can do their own voice acting? See also indie film, same use case. An actor dies / gets hit by a bus before a work is finished: create a few more lines posthumously (it'll be in the fine print of the contract that you allow voice & image fakes in the event they're not able to do them). Satire, pranks, and alleged pranks (stuff that makes folk laugh).
  • ChildOfChaos 10 days ago
    Where are the best places to keep up with all of this? I'm very interested in this area as I want to use these tools to create things with and my own voice isn't great for this.

    Speech to speech seems like it might be better than TTS for getting something more natural. I've played around with some tools like RVC etc, but I feel like there are maybe a lot of great AI workflows I am missing amongst all the AI noise; it's the interesting workflows and people doing interesting things with AI that I am more interested in.

    • jilijeanlouis 9 days ago
      Definitely twitter. This is where everything is announced and commented on.
      • ChildOfChaos 9 days ago
        Thanks. It's annoying that some of the best places seem to be the places I am trying to avoid, because I want to steer clear of the more negative aspects of it.

        Do you have any recommendations on twitter of who is good to follow?

        Particularly interested in people doing interesting things with AI; I already subscribe to the usual AI newsletters, such as Ben's Bites etc.

  • skeaker 9 days ago
    Awful lot of doomsaying and drama in here. What makes this release so bad compared to the existing voice cloning AI methods that have been publicly available for ~1 year already?
  • mindcrime 10 days ago
    My voice is my passport, verify me
  • andy_ppp 10 days ago
    I really can’t wait until voice cloning means we get a version of audiobooks read in the author’s voice. Of course it will never be quite as good as them reading it themselves but I think the author’s voice adds something that voice actors can’t- they appear to be too generic and too affected in their pronunciation for me to connect with.
    • smeej 10 days ago
      What the author adds, if they're not also a trained/well-practiced voice actor, is that their inflection exactly matches how they meant the words in the book to be spoken/understood.

      AI isn't going to be able to do that. As good as it may get, it won't be able to read the mind of the author. It's going to be even more generic than a human reader.

      • throwthrowuknow 10 days ago
        Exactly, the improvement will be in rerecording terrible readings into something enjoyable or at least inoffensive. That and personalization so you can choose the voice that you prefer.
      • wddkcs 10 days ago
        Will this be the case when/if books become largely written with the help of AI? Let alone when AI start writing the books themselves.
        • nilsherzig 4 days ago
          There are already a lot of AI generated books on Amazon Kindle. Books for children especially (which require less text) are popular. I think that's a big problem, since LLMs are pretty good at imitating the style (good enough to fool a potential buyer) but don't seem to build the story on some sort of good message for the child.
        • smeej 9 days ago
          It's an interesting question, but I don't think the models are trained to be spoken-language first like humans are. I think we all ("we" meaning "people who communicate by speaking," because I'm less sure if this would be true of sign languages) inherently think of writing as a somewhat lossy graphical representation of speech (which is itself an extremely lossy form of telepathy).

          But the LLMs don't, do they? If anything, they're written-first, or even on some deep level, binary-first. I don't think what they write even has "a way they meant the words to be emphasized in speech."

          • wddkcs 9 days ago
            Seems like a lot of guessing. I'm not convinced AI aren't "thinking" in the same manner we are. Eventually we'll have models trained on speech only, or modes of expression we can't fathom. Humans have no moat.
            • smeej 7 days ago
              I wonder if AIs would even decide to do things in the same way we do. Most of what humans do has come from generations of having to operate within constraints that change over time. AI gets to leapfrog those constraints for a whole different kind.

              Why would we assume what comes from them will even aspire to "being like humans"?

              The number of reasons AIs might not add the same things to an audio version of a book (the context we're talking about in this thread) is essentially infinite. It seems vastly more likely that they won't add what the author adds than that they will.

              Humanity may not have a moat, but each individual human does, especially when it comes to art, where I'd include writing.

              • wddkcs 7 days ago
                If humans have unique capabilities individually, we would have them collectively as well. I have yet to see a single argument that any biological process can't be replicated or synthesized. Until there is such an argument, it's special pleading.

                I can't say anything about an AI's aspirations, but the fact that we're imbuing them with all of our collective data means they will be skewed to perceive the world similarly to us, at least initially.

            • jilijeanlouis 9 days ago
              +1000
      • Kuinox 7 days ago
        So you just need a sample of the author saying the "odd" words.
    • PodgieTar 10 days ago
      Odd, because I actually worry about this. I don't see why you'd want your books read by the author. Trained Voice Actors do a much better job, and can modulate their voices based on tone.

      Autobiographies? Fine, but those are usually read by their authors anyway.

    • Rodeoclash 10 days ago
      If you think that a voice actor reading an audio book is too generic then I've got bad news about an AI trained on the author's voice...
      • andy_ppp 10 days ago
        I was hoping it would be voice transfer so the voice actor would give all the intonation and emotion and the AI would take that and make it sound like the author. Reading text with AI is getting better but yes it’ll be worse for a long time.
    • joshstrange 10 days ago
      I have nearly no desire to have my book read by the author. They are good at writing, and an audiobook is not simply “reading” the words on the page. Maybe something like Descript that the author can use to tweak pronunciation after it’s narrated, but I don’t want the author’s voice.

      I would like to train a model on Allyson Johnson’s voice (she narrated the Honor Harrington books) and then use that to re-narrate the 1-2 books in one of the spinoffs (I think it was the Saganami Island series?) where they used a different narrator (who was horrible).

      I also might be interested in using it to clean up the Wheel of Time series where, while it’s the same 2 narrators, they change the pronunciation of various names/words book-to-book. “Moghedien” being the one that stands out most. They pronounce it at least 3 different ways:

      * Mo-gid-e-on

      * Mo-ga-dean

      * Mog-a-din

      • kornork 9 days ago
        It's curious, to me at least, why they didn't just go back and fix those themselves later. The early ones were on CD (or tape?), so maybe that's why.
        • joshstrange 9 days ago
          I also wonder that. I'm not an audiobook narrator, but if I were, I'd need an audio "library" of names/places/etc that I could refer back to before reading a passage with a word I can't remember how to pronounce. The source of that "library" could be either the author and/or my previous pronunciations. Without that I'd have no idea how I would stay consistent.
    • kornork 9 days ago
      I think I'd prefer to have options for each audiobook. I have favorite narrators, and find others unlistenable. There are also thousands and thousands of books that will never otherwise be turned into audio format unless an AI is used.
    • infoseek12 9 days ago
      Writing and being a voice actor are two quite different skills. My experience with author narrated audiobooks is that there isn’t very much overlap.
    • block_dagger 10 days ago
      Never be as good as human? I disagree, seems like it’ll be nailed, no way to tell from the outside.
    • andrewstuart 9 days ago
      Audiobooks in the author’s voice… fine for non-fiction, usually terrible for fiction.
  • peter_d_sherman 10 days ago
    • exe34 10 days ago
      Do you know of any of these that actually works? So far every time I've tried one, it just sounded like a random new voice, resembling neither the target (me) nor the source (literally any voice).
      • spmurrayzzz 10 days ago
        ElevenLabs[1] is actually very good, and may have gotten better since the last time I tried it, but it's neither free nor open source/weights.

        [1] https://elevenlabs.io/

  • tmaly 9 days ago
    I see there are some python notebooks, but it would have been nice to see some example code in the README.
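
    For anyone else skimming, here is a minimal sketch of what the demo notebooks boil down to, written from the v1 demo flow; the module names, checkpoint paths and exact signatures below are assumptions and may differ between releases:

        import torch
        import se_extractor
        from api import BaseSpeakerTTS, ToneColorConverter

        device = "cuda" if torch.cuda.is_available() else "cpu"

        # Base TTS model: speaks the text in a stock voice
        base_tts = BaseSpeakerTTS("checkpoints/base_speakers/EN/config.json", device=device)
        base_tts.load_ckpt("checkpoints/base_speakers/EN/checkpoint.pth")

        # Tone color converter: re-colors that speech with the target speaker's timbre
        converter = ToneColorConverter("checkpoints/converter/config.json", device=device)
        converter.load_ckpt("checkpoints/converter/checkpoint.pth")

        # Tone color embeddings: one for the stock base voice, one extracted from a short reference clip
        source_se = torch.load("checkpoints/base_speakers/EN/en_default_se.pth").to(device)
        target_se, _ = se_extractor.get_se("reference.mp3", converter, target_dir="processed", vad=True)

        # 1) Synthesize in the base voice, 2) convert its timbre toward the reference voice
        base_tts.tts("This is a voice cloning test.", "tmp.wav", speaker="default", language="English")
        converter.convert(audio_src_path="tmp.wav", src_se=source_se,
                          tgt_se=target_se, output_path="output.wav")

    The "cloning" is essentially that last convert step: tone color transferred onto base-speaker TTS output, which is also why several commenters report the result keeping the base accent rather than their own.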
  • Draiken 10 days ago
    Maybe I'm doing something wrong, but I "cloned" my own voice and the outputs were nothing like my voice.

    I thought I was going to hear myself speaking French, but it's definitely not that.

    A bit misleading if none of the examples actually clone a voice when the title is "instant voice cloning"

  • thimkerbell 10 days ago
    How does a society function, with lookalike tools like this around? How would we prepare for it?
    • squigz 10 days ago
      The problem is it's nearly too late to "prepare" for it - the time for that was years ago

      Instead, we've spent that time doing things that are only going to exacerbate the issues - like doubling down on voice as an ID

      More importantly, trust in both public and private institutions and even between citizens has slowly eroded - understandably so, if you ask me, but still - which as another commenter has pointed out is going to be even more important going forward. I imagine rebuilding that trust is going to be even more painful with stuff like this.

      (The only other alternative I can think of is along the lines of cryptographically-signed content from end-to-end, government ID to access the Internet, etc. This would be highly detrimental to a free society, so I hope we don't go this route)

      • joshstrange 10 days ago
        The movie Sneakers sure would get a lot less interesting with voice cloning. And I’d miss out on one of my favorite/funniest scenes in the movie, “I would really like to hear you say the word ‘Passport’” [0]. I cannot hear the word “Passport” without thinking of that scene, and then I always have to say “My voice is my passport, verify me” in my head to complete the circle.

        [0] https://youtu.be/WdcIqFOc2UE?si=tP2HxVEskl9szuKO

    • courseofaction 10 days ago
      Establish and constantly vet networks of trust? Only the source can be trusted now.
    • j-bos 10 days ago
      I can imagine it working much as it did in olden days: networks of person-to-person trust, but with the added possibility of using public/private key pairs for verification.
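
      A minimal sketch of what that could look like, assuming Ed25519 signatures via the Python "cryptography" package (the hard part is distributing and trusting the public keys, not the crypto itself):

          from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
          from cryptography.exceptions import InvalidSignature

          # The speaker generates a key pair once and shares the public key through their trust network.
          private_key = Ed25519PrivateKey.generate()
          public_key = private_key.public_key()

          # Sign the raw bytes of a recording at capture time.
          with open("statement.wav", "rb") as f:
              audio = f.read()
          signature = private_key.sign(audio)

          # Anyone holding the public key can later check the clip is byte-identical to what was signed.
          try:
              public_key.verify(signature, audio)
              print("valid: clip matches what the key holder signed")
          except InvalidSignature:
              print("invalid: clip was altered or never signed by this key")

      Of course this only proves a clip came, unmodified, from whoever holds the private key; it says nothing about whether what they recorded is true.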
    • oefrha 10 days ago
      Tons of people have already been defrauded on the phone with cloned voices of their family members in recent years. There are a million convincing Trump speeches on YouTube, complete with video, that he never delivered. It’s not something to prepare for, it’s already here.
    • throwthrowuknow 10 days ago
      Don’t take any wooden nickels
  • glandium 10 days ago
    The last example where the generated voice talks in different languages? They don't sound like the same voice at all. I know I don't sound the same when I speak French or Japanese, but the difference in the example is bigger.
  • g4zj 10 days ago
    I would love something like this for musical instruments. To be able to dial in the tone of my favorite artists using AI would be incredible.
    • SoftTalker 10 days ago
      Protools or another DAW will do that easily.
  • andsoitis 10 days ago
    How is it cloning when it doesn't preserve accent? Seems like overselling.
  • teeray 10 days ago
    “Introducing VoicePrint—our new way to secure your account! Speak with a representative to enable this feature today!”
  • deputy 9 days ago
    [deleted]
  • alokjnv10 10 days ago
    [flagged]
  • alokjnv10 10 days ago
    [flagged]
    • chrra 10 days ago
      There are install instructions under “How to Use” and example code in the .ipynb files. It's mostly geared towards TTS, but it seems to work with voice recordings too.
  • cryptonector 9 days ago
    [flagged]
  • youngNed 10 days ago
    [flagged]
  • bparsons 10 days ago
    Your scientists were so preoccupied with whether or not they could, they didn't stop to think if they should.
  • vouaobrasil 10 days ago
    We've got a pretty sick society. We've spent so much money developing this technology to clone voices when we've got 8 billion voices on the planet today. We've developed 4K and even 8K video when we've already got beautiful scenes outside -- oh wait, we ruined those already with technology and roads and cars. We've got the smell of beautiful flowers and yet we have an entire chemical industry trying to replicate more artificial smells.

    We're sick and AI is the apex of that pathology. Can you imagine how nice life would be if we just devoted all this energy into providing for basic needs and focusing on social problems instead of technological ones? We're making the world worse with this shit.

    • echelon 10 days ago
      Your way of thinking is "sick", some kind of glass-is-empty malaise, and you need to find a cure [1].

      If you think art is perfect the way it is, you're not dreaming.

      I want a thousand science fiction epics. The world only has Star Wars crap and bad Dune adaptations.

      I want to take what's in my mind and put it to audio-visual, and play it for myself and friends.

      I don't want the tech and art of today. I was born too early. I want what's next.

      I have so many dreams, and for just once, I'd like them to start coming true.

      This tech is a first glimpse over that hill.

      [1] (Sorry, I'm not attacking you. I just wanted to use your language to grab attention. I believe this is too negative an outlook for something that will have incredible use cases.)

      • relaxing 10 days ago
        There’s so much more SF out there, even in the space opera subgenre, than Star Wars and Dune. If you don’t know how to navigate that, there’s no way you’ll produce anything of worth.

        Start reading.

        • Der_Einzige 9 days ago
          The original Dune movie was excellent and I will die on that hill. It accurately portrays the absurdity of the books.

          You’d give the same complaints to anyone who bothered to try to film Charlie and the Great Glass Elevator.

      • CyberDildonics 10 days ago
        If you think star wars and dune are crap, what makes you think anyone will want to watch your home made AI science fiction?

        > I have so many dreams, and for just once, I'd like them to start coming true.

        Everyone wants a genie they can wish on, but they don't exist.

        • echelon 9 days ago
          > Everyone wants a genie they can wish on, but they don't exist.

          The means to unpack your mind are about to come into play in full force, CyberDildonics.

          You and I keep debating this [1], but everything I'm saying is coming to pass.

          So we can keep score:

          - The tools to make movies in AI will surpass classical tooling and the majority of linear video content that people willingly consume will be powered by AI rather than "photons-on-glass". Most creators will be served by the new Gen AI tools, and though we may still have Wes Anderson shooting in the classical style on set, most action, adventure, drama, etc. films and shows will be produced with Gen AI pipelines.

          - Most porn in the future will be AI. Porn movies, porn images, etc. Pornhub better start incorporating AI, or they'll get mowed over. Real pornographic actors and actresses will be rare relative to the "choose whatever you want" future of Gen AI.

          I enjoy how differently we see the world, and I'm always up for adding more predictions into this mix. Perhaps these:

          - In the future, most music will be made with AI tools. There will be famous musicians that only use AI tooling.

          [1] https://hn.algolia.com/?dateRange=all&page=0&prefix=false&qu...

          • CyberDildonics 9 days ago
            You didn't answer my question, you just made predictions of the future without any rationale or evidence behind them.