Pink Trombone

(dood.al)

575 points | by errozero 1920 days ago

24 comments

modeless 1920 days ago
In a similar vein: http://www.adultswim.com/etcetera/choir/
The most interesting thing about this one is the chord progressions it generates.
- riebschlager 1920 days ago
  If you enjoyed that, the developer, David Li, has a lot of other great projects: http://david.li/
  - mhroth 1920 days ago
    The same team did a squishy Morty head for Adult Swim: http://www.adultswim.com/etcetera/elastic-man/ Chris Heinrichs (Dolphin Club Audio, formerly of Enzien Audio, https://dolphinclub.website/) does the procedural audio models.
- jdietrich 1920 days ago
  In a historical vein, the absolutely magnificent Voder, from the late 1930s:
  https://youtu.be/TsdOej_nC1M?t=16
  https://en.wikipedia.org/wiki/Voder
  - taneq 1920 days ago
    I found it difficult to tell how well it's actually speaking because the announcer is priming the audience for every utterance. Very cool though!
    - krolley 1920 days ago
      After your comment, I muted the audio whenever the announcer told the audience what the machine would say. I couldn't get it straight away, but when they emphasised different words in the sentence I could usually tell what was being said: "Greetings everybody!"
- peterlk 1920 days ago
  If you're interested in chord generation, I've used Cthulhu[0] by Xfer Records to experiment with chord progressions quickly.
  [0] https://xferrecords.com/products/cthulhu
  - ryan-allen 1920 days ago
    Cubase Chord Pads are also pretty sweet!
  - Cthulhu_ 1920 days ago
    Praise Him.
- alanh 1920 days ago
  I didn't expect that to be so beautiful.
- pieterk 1920 days ago
  You’re right, the progressions are beautiful! Do you know how they are generated?
  - veli_joza 1920 days ago
    Very impressive indeed. I dug around and found his tweet[0]: "uses a neural network I trained on choral pieces to generate the harmonization for your melody"
    [0] https://twitter.com/daviddotli/status/1075068713936830464
    - quacked 1920 days ago
      Man, shoot. I compose a bit myself and am really into vocal harmony and I was hoping that he'd hard-coded some theory rules.
      I did notice that it seems to vary with the past 3 notes you've moved to. If you continue repeating the same pattern of jumps among the scale, you can replicate the chords the voices slide to.
      - abrichr 1920 days ago
        > I compose a bit myself and am really into vocal harmony and I was hoping that he'd hard-coded some theory rules.
        Would you find this useful? Is this not already available somewhere? Sounds like a fun project!
        quacked 1919 days ago
        Yeah, I think I'd find it useful. I'd love to be able to ape a few of the tricks that the four voices use to slide to pleasing chords. The music theory "rules" are certainly available, but only in the same sense that the rules of calculus are available somewhere.
        Either way I agree, definitely a fun project.
- castis 1920 days ago
  That is incredibly satisfying to play with.
sgentle 1920 days ago
A little while back I used this combined with a physics simulator to make a toy where you throw polygons in the air and they scream: https://ohgodwhathaveidone.stackblitz.io/
Code's here if anyone wants to play: https://stackblitz.com/edit/ohgodwhathaveidone – I did a fairly medium job of abstracting the synthesis engine away from the UI, but it might be a decent starting point if you're looking to make other Trombone-based web silliness.
lqet 1920 days ago
This reminds me of the Sprechmaschine [1] ("speaking machine") built at the end of the 18th century by Wolfgang von Kempelen (the guy who built the original Mechanical Turk [2]). Here is a YouTube video showing it in action (for example, the machine says "Mama" around 1:14): https://www.youtube.com/watch?v=k_YUB_S6Gpo
[1] https://de.wikipedia.org/wiki/Wolfgang_von_Kempelen#Die_Spre...
[2] https://en.wikipedia.org/wiki/The_Turk
- tech-no-logical 1920 days ago
  Also, the 1939 Voder comes to mind:
  https://en.wikipedia.org/wiki/Voder
  https://www.youtube.com/watch?v=0rAyrmm7vv0
speps 1920 days ago
Literate C version: https://pbat.ch/proj/voc/
Literate depot: https://github.com/PaulBatchelor/voc
Actually compiled source is there: https://github.com/PaulBatchelor/Soundpipe/blob/master/modul...
goodmachine 1920 days ago
Lest we forget that speech synthesis is not just for grotesque but amusing semi-real vocal synths like this, here's a BBC Radio 4 history of speech synthesis as an assistive technology - Klatt's Last Tapes, by Stephen Hawking's daughter, Lucy:
https://www.youtube.com/watch?v=097K1uMIPyQ
jedberg 1920 days ago
Man, this brings back memories. We used this program for my linguistics homework in college. In 1996. Although I think it was an app then, not a web page.
- porphyrogene 1920 days ago
  Referring to something from 1996 as an app and a modern web app as a page is a bizarre conflation of terminology.
  - tempestn 1920 days ago
    I'm guessing the thing from '96 was a Java applet, so indeed an app.
    - djsumdog 1920 days ago
      Or a desktop/Windows app. Back then we'd be more likely to call them programs.
      - codetrotter 1920 days ago
        And also sometimes referred to as applications. And while the abbreviation “app” only became mainstream around the time of the first iPhone, the warez scene had a habit of referring to software applications as “appz” since way before then, which admittedly is not the exact same word as “app”, appz being an intentional misspelling of a pluralization of the abbreviation, but it’s close to it. They sure loved the letter z. Appz, crackz, mp3z, moviez, gamez, ebookz.
        This way of writing gave people some unique words to query search engines for, making it easier to find warez sites, torrent trackers and FTP servers hosting pirated files, but I think it originated long before the web was born, even before any sort of networked computing existed.
        Consider the word “phreaking”, which was invented at the time of mechanical telephone switching systems. This word came from combining the words “phone” and “freaking”. I think that word could have inspired hackers to use ph in place of f in other words, and then once you start making that substitution, other substitutions follow, like using z instead of s.
        I dunno, I grew up in the 90s so there is a lot of hacker culture that precedes my time. What I do know is that a lot of early hacker culture spawned other subcultures, and that several influences of the origins remain central in these. For example, the demoscene.
        baddash 1920 days ago
        How were you involved in the warez scene? How did things work, and how have they changed? Stories that are part of the history of the internet are always intriguing to me so I'm really interested in hearing about that.
        borski 1920 days ago
        To be fair, we also said progz.
  - wavefunction 1920 days ago
    We had applications back in the 1990s. I know, it seems unfathomable now but applications preceded the App Store.
  - tirpen 1920 days ago
    Could you explain in what sense?
emmelaich 1920 days ago
This would be useful to demonstrate the difference between p/f and l/r for those brought up without those distinctions.
I'd also (as an English speaker) like to see/hear Dutch g and Xhosa clicks.
- undershirt 1920 days ago
  Yes, I think it's time for this to be connected to text input for seeing how any word is pronounced.
  Specifically: https://twitter.com/shaunlebron/status/989192507828432896
  - emmelaich 1920 days ago
    Perfect.
smlacy 1920 days ago
Would it be possible to use Reinforcement Learning + Speech Recognition to turn this thing into a real voice synthesizer?
- hannasanarion 1920 days ago
  No need. This thing is already a voice synthesizer. This is how modern synthesizers work, more or less: by generating a sine wave and then modifying it in the same way as the vocal tract does.
  - dreamcompiler 1920 days ago
    Not a sine wave. The vocal tract is subtractive; the larynx has to generate a waveform with lots of harmonics, some of which the vocal tract can then remove.
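A minimal sketch of the subtractive (source-filter) idea described above, in plain Python. It is not Pink Trombone's code: the sawtooth source, the one-pole low-pass filter standing in for the vocal tract, and every function name and parameter value here are assumptions chosen for illustration.

```python
import math

def sawtooth(freq_hz, sample_rate, n_samples):
    """Naive sawtooth in [-1, 1): a harmonically rich stand-in for the glottal source."""
    return [2.0 * ((freq_hz * n / sample_rate) % 1.0) - 1.0 for n in range(n_samples)]

def one_pole_lowpass(samples, cutoff_hz, sample_rate):
    """Very crude 'vocal tract': a one-pole low-pass that attenuates upper harmonics."""
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    out, y = [], 0.0
    for x in samples:
        y += alpha * (x - y)
        out.append(y)
    return out

def harmonic_magnitude(samples, freq_hz, sample_rate):
    """Amplitude of one frequency component, via a direct DFT projection."""
    re = sum(s * math.cos(2.0 * math.pi * freq_hz * n / sample_rate)
             for n, s in enumerate(samples))
    im = sum(s * math.sin(2.0 * math.pi * freq_hz * n / sample_rate)
             for n, s in enumerate(samples))
    return 2.0 * math.hypot(re, im) / len(samples)

if __name__ == "__main__":
    sample_rate, f0, n = 8000, 100, 8000  # illustrative values only
    source = sawtooth(f0, sample_rate, n)
    sine = [math.sin(2.0 * math.pi * f0 * i / sample_rate) for i in range(n)]
    shaped = one_pole_lowpass(source, 300, sample_rate)
    # Compare the energy at the 10th harmonic (1000 Hz) for each signal.
    for name, signal in (("sine", sine), ("sawtooth", source), ("filtered", shaped)):
        print(name, harmonic_magnitude(signal, 10 * f0, sample_rate))
```

The sine has essentially nothing at the 10th harmonic, so a filter has nothing to shape; the sawtooth does, and the filter removes part of it. A real vocal tract model uses resonances (formants) rather than a single low-pass, so treat this only as an illustration of "rich source, then subtract".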
  - smlacy 1920 days ago
    I meant "text to speech engine" then I guess.
    - hannasanarion 1920 days ago
      Yeah, I know what you meant. You use a tool like the CMU pronunciation dictionary[1] to turn words into phonemes, and then you use a model similar to the pink trombone to turn the phoneme string into sound, including the transitions between different phones (which, it turns out, actually matter more than the phones themselves for making it understandable). This is how TTS works.
      [1] http://www.speech.cs.cmu.edu/cgi-bin/cmudict
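A rough sketch of the first half of that pipeline (words to phonemes, then phoneme-to-phoneme transitions), in Python. The three dictionary entries are typed by hand in the CMU dictionary's ARPABET style and should be checked against the real file; `TOY_DICT`, `to_phones`, and `transitions` are hypothetical names, and the step that turns phones into vocal tract shapes is deliberately left out.

```python
# Hand-typed entries in the style of the CMU Pronouncing Dictionary
# (ARPABET phones with stress digits). Assumption: verify against the real dictionary.
TOY_DICT = {
    "HELLO": ["HH", "AH0", "L", "OW1"],
    "PINK": ["P", "IH1", "NG", "K"],
    "TROMBONE": ["T", "R", "AA0", "M", "B", "OW1", "N"],
}

def to_phones(text):
    """Look up each word and concatenate its phones; unknown words raise KeyError."""
    phones = []
    for word in text.upper().split():
        phones.extend(TOY_DICT[word])
    return phones

def transitions(phones):
    """Adjacent phone pairs, i.e. the transitions a synthesizer would have to render."""
    return list(zip(phones, phones[1:]))

if __name__ == "__main__":
    phones = to_phones("pink trombone")
    print(phones)
    print(transitions(phones))
```

Pairing adjacent phones is one simple way to make the transitions explicit; a synthesizer would still need a hand-built or learned mapping from each pair to tongue, lip, and voicing parameters.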
      - smlacy 1920 days ago
        Great man-splaining of how synthesis works.
        Pink Trombone has no phoneme mapping, you'd have to code all that by hand, and that's exactly what my original question was.
- jterwill 1920 days ago
  I wonder if this sort of model would lend itself well to program induction similar to this paper: https://web.mit.edu/cocosci/Papers/Science-2015-Lake-1332-8.... Having a mouth seems like it would enforce a strong inductive bias.
zackmorris 1920 days ago
Reminds me of Xiph's Speex/CELP model of speech as a mix of noise and frequency to achieve high compression, requiring as little as 2.15 kilobits (about 269 bytes) per second. It sounds perceptibly similar to the original recording, even though the difference between the input and output sampled data may be high:
https://www.speex.org/docs/manual/speex-manual/node9.html
Bitrate comparison:
https://www.speex.org/comparison/
Samples:
https://www.speex.org/samples/
Maybe higher compression can be achieved with better prediction, aka machine learning.
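The bits-to-bytes conversion for the 2.15 kbit/s figure is easy to check. The sketch below (Python, with a hypothetical helper name) shows both readings of "kilo": about 269 bytes per second with 1 kilobit = 1000 bits, and about 275 with 1 kilobit = 1024 bits.

```python
def bytes_per_second(bits_per_second):
    """Convert a bitrate in bits per second to bytes per second."""
    return bits_per_second / 8

# 2.15 kbit/s with kilo = 1000 is 2150 bits per second.
decimal_reading = bytes_per_second(2150)         # 268.75
# The same figure read with kilo = 1024.
binary_reading = bytes_per_second(2.15 * 1024)   # roughly 275.2

if __name__ == "__main__":
    print(decimal_reading, binary_reading)
```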
colanderman 1920 days ago
I've actually been looking for the opposite of this (i.e. sound in, mouth representation out) for a while. Does anyone know of such a thing?
- ssewell 1920 days ago
  Yes! Oculus makes an SDK for this. You can use it in Unity 3D, Unreal, or directly in a native app. https://developer.oculus.com/documentation/audiosdk/latest/c...
  - colanderman 1920 days ago
    Thanks! That's like 80% of the way there. It looks to be missing a lot of state internal to the mouth (understandable given that it's targeting avatar lipsyncing), and appears to discretize the values somewhat, making it less useful for linguistics practice. But I bet the underlying technology could be adapted easily.
t0mbstone 1920 days ago
Wow! I was able to successfully recreate all sorts of letters and sounds just by imagining how my own mouth works, and then manipulating the different components on the pink trombone in the same way. I'm impressed!
smlacy 1920 days ago
Pondering what makes this sound "male".
- mobilejdral 1920 days ago
  I have actually used this very tool back in the day to help learn how to speak in a male or female voice. One of the five things I do is manipulation of my tongue to change the cavity of my mouth to make the space bigger (more masculine) or smaller (more feminine) which this tool demonstrates very well.
  Edit: to be clear, I used to sound male 24/7 and now I sound female 24/7. Rather than thinking you are speaking male or female, it helps if you think you are playing a musical instrument with a number of controls that you control (with your mind whahahaha). Then it is just about learning what each control does and how to play them so you get the result you want.
  Your voice is muscle memory so while at the start I had to actively "play" a female voice that is no longer the case and now if I ever want to "play" a male voice I have to actively think about how I am going to speak each word to make it male.
  - evincarofautumn 1919 days ago
    This is also exactly how male countertenor singers produce a female-sounding voice, by making the vocal cavity smaller to adjust the formants upward. If you don’t do that, it just sounds like a male head voice or falsetto.
- tlb 1920 days ago
  You can make it sound female by dragging the voicebox control to the right and down. It requires both higher pitch, and less power in the odd harmonics.
  - smlacy 1920 days ago
    I agree it gets closer, but as is with much of speech synthesis, it still to me ends up sounding like "a male talking in a high-pitched voice" and not "a female".
    In addition, this would imply that only males can talk in a low register, which is patently false. Low register female voices are fairly common.
    - raverbashing 1920 days ago
      It's a bit of a subtle distinction. A lot of voice acting on cartoon boys is done by women (classic example: Bart Simpson)
      So they can make it sound more masculine but still high pitched
- nearbuy 1920 days ago
  It has to do with formants. It's briefly explained in this video, which also demonstrates changing a female voice to male: https://youtu.be/nPAINeIGxMc.
- crb002 1920 days ago
  Vocal cords.
triclops200 1920 days ago
Sudden sound warning.
- 3chelon 1920 days ago
  Sudden disconcerting sound warning!
willchang 1920 days ago
This is amazing. I can get it to make almost any speech sound, but one I can't get is [s], because the model lacks teeth!
- slx26 1920 days ago
  Disable "always voice", and click a bit below "hard palate", slightly to the right towards the lip (below the "at" in "palate"), so there will only be a small gap for the air to go through.
  - willchang 1920 days ago
    You're right. It sounds to me like [s] with lip rounding, where the teeth don't contribute as much acoustically, but it does indeed sound like an [s].
- phamilton 1920 days ago
  The model also lacks mouth width/lip shape, which is crucial for differentiating between Swedish vowels (for example i vs y, which have most of the mouth the same, except that y has the lips protruding and i is more of a smile).
- SamBam 1920 days ago
  Were you able to get "m" and "n"?
  - willchang 1920 days ago
    Yes, just click in the nasal cavity above 'lip' and 'hard palate', respectively, to get [m] and [n].
BFatts 1920 days ago
I feel like I am simulating orgasmic responses!
jarmitage 1920 days ago
This was also ported to C++ inside a modular synth:
https://www.youtube.com/watch?v=PDn7ygnJUfI
https://www.youtube.com/watch?v=3jcqKnIa8T4
https://www.youtube.com/watch?v=bo5ZEgBEapk
https://github.com/giuliomoro/pink-trombone
sonnyblarney 1920 days ago
What someone needs to do is put sensors in people's mouths, record them saying known phrases, then stick the sensor data+phrases into some AI and see if we can't get that Trombone talkin'!
glitcher 1920 days ago
Previous post in Apr 2017: https://news.ycombinator.com/item?id=14135658
philsnow 1920 days ago
The shape of the tongue control reminds me a lot of the rhombus in https://en.wikipedia.org/wiki/International_Phonetic_Alphabe...
... which is because the latter was patterned after the shape of the mouth.
ben-hudson 1920 days ago
I feel embarrassed playing with this
- King-Aaron 1920 days ago
  You should have seen the looks from my coworkers when my computer started shouting AAAAAHHHHHHH
mmjaa 1920 days ago
That's nice and everything, but until I have a hardware version where I can just switch a button on and do this and that, I'm not really going to be truly satisfied. Please hardware'ify.
joshu 1920 days ago
How do I use this to answer the phone?
Digit-Al 1920 days ago
Is the author aware of the other meaning of "pink trombone"?[1]
[1] https://www.urbandictionary.com/define.php?term=pink%20tromb...
- mmjaa 1920 days ago
  I'm fairly sure there is no other meaning than the one to which you refer, so it's perhaps more likely that you are missing the risqué point being made by assuming the prude levels are higher than you might think. A lot of the folks who make these kinds of hacks are perfectly fine with the obscure, obscene, perverse nature of their naming of things ... Those of the anthropological inclination may decide that, in fact, an obscene name for something like this is a requisite.
  - RandomGuyDTB 1920 days ago
    Like how in jargon.txt they used 69 as an example of a big number:
    "69 adj. Large quantity. Usage: Exclusive to MIT-AI. "Go away, I have 69 things to do to DDT before worrying about fixing the bug in the phase of the moon output routine..." (Note: Actually, any number less than 100 but large enough to have no obvious magic properties will be recognized as a "large number". There is no denying that "69" is the local favorite. I don't know whether its origins are related to the obscene interpretation, but I do know that 69 decimal = 105 octal, and 69 hexadecimal = 105 decimal, which is a nice property. - GLS)"
- plants 1920 days ago
  I was scared to click on the parent link at work because I KNEW this term had to have a second "urban dictionary" meaning...
- Karlax 1920 days ago
  For some reason, dirty names are common practice in audio tools. Some examples: "Rectal Anarchy" is another vocal synth for Buzz, grANALizer (emphasis not mine) is a popular granular audio effect, and even in the commercial world, it gets more subtle but it still exists: Image Line sells a plugin called "Gross Beat", which is a bilingual en/fr dirty joke.
- oarabbus_ 1920 days ago
  How anglocentric
  - coldtea 1920 days ago
    The words "pink" and "trombone" are English words, so there's that...