Fakelish – Fake English word generator

(fakelish.nwtgck.org)

110 points | by lioeters 843 days ago

26 comments

  • mrbukkake 843 days ago
    Nice idea, but the naive implementation makes the output unconvincing as hypothetical English words. I had a brief look and it seems to be proportionally selecting and sticking together sequences of letters sampled from English words (lib/word-probability.ts). This doesn't take into account syllable boundaries, the way the English spelling system maps to phones/phonemes, or the phonotactic properties of English, which is why the output looks unconvincing.

    A better approach would be to use a Markov chain built from sampling English text letter-by-letter... An even better approach would be to build your stats from some source of English words in IPA transcription with syllable boundaries etc. marked, then map from IPA to spelling via some kind of lookup table. We use a similar process in reverse in my research group for building datasets for doing Bayesian phylogenies of language families.

    • KennyBlanken 843 days ago
      Clearly you are far more of a linguist than I am, but from such a perspective, I had a similar impression; I reloaded the page several times and none of the words struck me as being remotely plausibly English. These are worse than most Hollywood scifi words/names.
    • rlayton2 843 days ago
      A significant improvement on letter-by-letter, but not that much harder, is to use n-grams: "two letters to predict the third" etc. Still not "industry grade", but the results start making more sense.
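
      A minimal sketch of the "two letters to predict the third" idea, not Fakelish's actual code. The names `build_model`, `generate`, and `CORPUS` are made up for illustration, and the corpus is a toy list of words taken from this thread; a real run would need a large word list.

```python
import random
from collections import defaultdict

def build_model(words, order=2):
    """Map each `order`-letter context to the letters that follow it in the corpus."""
    model = defaultdict(list)
    for word in words:
        # "^" pads the start of the word, "$" marks its end.
        padded = "^" * order + word.lower() + "$"
        for i in range(len(padded) - order):
            model[padded[i:i + order]].append(padded[i + order])
    return model

def generate(model, order=2, max_len=12, rng=random):
    """Sample letters one at a time, each conditioned on the previous `order` letters."""
    context = "^" * order
    out = []
    while len(out) < max_len:
        nxt = rng.choice(model[context])  # repeats in the list act as frequency weights
        if nxt == "$":
            break
        out.append(nxt)
        context = context[1:] + nxt
    return "".join(out)

# Toy corpus (an assumption for this sketch), far too small for convincing output.
CORPUS = ["calmly", "filmlike", "ailment", "sundial", "trident", "minable"]
model = build_model(CORPUS, order=2)
word = generate(model, order=2, rng=random.Random(0))
print(word)
```

      Raising `order` makes the output look more like the training words, at the cost of reproducing real words more often.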
    • bruce343434 843 days ago
      A letter-by-letter Markov chain would lead to similarly unconvincing results. As you said, vocal groups matter much more than single letters. If you know anything about Korean, they actually group letters into characters that way. If one could build such a Markov chain for English it would be very convincing I think.
      • mrbukkake 843 days ago
        You're right, I forgot that Markov chains are memoryless.
        • dminor 843 days ago
          I used a letter by letter Markov chain for this: http://password.supply/

          The output is definitely not convincing as actual words (but reasonable for somewhat more memorable passwords).

    • rajansaini 843 days ago
      You should check out the VOLT paper, I think it would work well. It's a new technique for splitting up a vocabulary into subwords while minimizing entropy. These subwords could then be mixed and matched, maybe by a neural model, for better results.
    • themdonuts 843 days ago
      I got "minable" on my first try and found it impressive, and was surprised that it wasn't a word. After 3 other reloads nothing else came up.
      • tw04 843 days ago
        Definitely not a fake word. Coal, for instance, is a minable resource.

        https://www.dictionary.com/browse/minable

      • phs318u 843 days ago
        Similarly, “shitbin” was the second word on my first try, and I had to internet search to convince myself that it isn’t in fact a word.
      • thaumasiotes 843 days ago
        It definitely is a word, since "mine" is an existing verb.
      • Wistar 843 days ago
        I got "episexic" and, well, I kind of like that one.
  • SavantIdiot 843 days ago
    Speaking of gibberish English: I know this has been on YouTube for 10 years, but there are always newcomers who haven't had their brain melted by it:

    https://www.youtube.com/watch?v=-VsmF9m_Nt8

    • BrandoElFollito 843 days ago
      For a non-native speaker of English - this sounds like lots of songs.

      Tangentially related - this is how I discovered Nightwish some 15 years ago: https://www.youtube.com/watch?v=gg5_mlQOsUQ

      • speedcoder 843 days ago
        Nobody could make up words like Frankie Smith (may he RIP 2019) in the middle of Double Dutch Bus https://youtu.be/fK9hK82r-AM
        • genewitch 843 days ago
          The sound engineer on the loveline show had Dr Drew Pinsky trying to sing this song as an evergreen.
      • mPReDiToR 843 days ago
        Thank you.

        I know this comment doesn't add anything of value to the discussion per se, but that's given me the biggest laugh I've had in months.

        Nightwish came into my life in the 00s, and I couldn't tell you one song meaning, yet I love the sound.

        This is just a perfect video, thank you for sharing.

        • BrandoElFollito 841 days ago
          Unfortunately, despite now knowing the words, the song will never sound right anymore :)

          Great band, though.

    • dustintrex 843 days ago
      Modern version: https://youtu.be/ybcvlxivscw

      "English" starts at around 0:48, but the others are also worth a listen!

    • LordDragonfang 843 days ago
      Here's another similar one, but acted prose instead of a song:

      https://www.youtube.com/watch?v=Vt4Dfa4fOEY

    • formerly_proven 843 days ago
      This is what a parse error feels like.
    • avgcorrection 843 days ago
      My brain isn’t melted. This could just be some obscure Dutch dialect for all I know.
      • SavantIdiot 843 days ago
        I'm sure some people don't hear it, like "the dress", but for some of us it sounds like an Uncanny Valley of English: close but not quite, just enough for our brains to trip over / struggle to comprehend b/c it is so close.
        • avgcorrection 838 days ago
          I wonder how much exposure those creeped-out people have had to other Germanic languages.
  • scubbo 843 days ago
    As well as the associations with [1], this also made me think of one of my favourite essays, "Horsehistory study and the automated discovery of new areas of thought"[2]

    [1] https://www.thisworddoesnotexist.com/

    [2] https://interconnected.org/home/2021/06/16/horsehistory

  • gumby 843 days ago
    <obligatory>

    Jabberwocky

    ’Twas brillig, and the slithy toves
          Did gyre and gimble in the wabe:
    All mimsy were the borogoves,
          And the mome raths outgrabe.

    “Beware the Jabberwock, my son!
          The jaws that bite, the claws that catch!
    Beware the Jubjub bird, and shun
          The frumious Bandersnatch!”

    He took his vorpal sword in hand;
          Long time the manxome foe he sought—
    So rested he by the Tumtum tree
          And stood awhile in thought.

    And, as in uffish thought he stood,
          The Jabberwock, with eyes of flame,
    Came whiffling through the tulgey wood,
          And burbled as it came!

    One, two! One, two! And through and through
          The vorpal blade went snicker-snack!
    He left it dead, and with its head
          He went galumphing back.

    “And hast thou slain the Jabberwock?
          Come to my arms, my beamish boy!
    O frabjous day! Callooh! Callay!”
          He chortled in his joy.

    ’Twas brillig, and the slithy toves
          Did gyre and gimble in the wabe:
    All mimsy were the borogoves,
          And the mome raths outgrabe.
    
    </obligatory>
    • inglor_cz 843 days ago
      I am aware of two translations of this poem into Czech. They are completely different from each other and both very playful.
      • mPReDiToR 842 days ago
        Have you seen the ActionScript version?

        Many years of /. posts and other results might find you a version that's readable if you search.

      • gumby 842 days ago
        I love this.
  • nkrisc 843 days ago
    Sorry, after a few refreshes not a single word was anything that looked remotely like English. It all looked like complete gibberish or words in another language. Most of them weren’t even pronounceable.
    • LordDragonfang 843 days ago
      On my first load, I got "Plailmly", which uses a sequence of consonants that I'm reasonably certain occurs nowhere in the English language.
      • nkrisc 843 days ago
        I think ailml is the offending sequence here. It's pretty difficult to say and doesn't sound like something that you'd find in a native English word.

        There's calmly which is similar, to be fair, but there's something about the tongue positions for ailml that I find noticeably more difficult, it's too far forward.

      • lokl 843 days ago
        Not nowhere, but uncommon: calmly, filmlike, ...
        • thaumasiotes 843 days ago
          Try for -ailm-.
          • Kaibeezy 843 days ago
            Ailment
            • thaumasiotes 843 days ago
              As with flailmen, you've put a syllable break (and a morpheme break!) between the L and the M. This will make continuing the sequence into -ailml- impossible, since an English syllable can't start with ml-.

              Interestingly, there's nothing wrong in general with starting a syllable with ml-; it's fundamentally the same mouth motion as starting with bl- or pl-, both of which are common in English. But ml- isn't allowed.

              This plays into a pet observation of mine, which is that an underappreciated constraint on the space of words that actually exist in a language -- as opposed to the space of words that could conceivably exist -- is that by and large they must descend from older words in an older form of the language, so that even if a word like "plailm" obeys the rules for modern English syllables, it can't exist because its precursor word would have violated the rules for older English sounds. (I don't know if this is actually true as applied to "plailm", but the phenomenon (of possible sounds failing to exist due to their precursors having been impossible) is real.)

              • Kaibeezy 843 days ago
                Very good, mlord / mlady ;)
                • thaumasiotes 843 days ago
                  Milord and milady do not involve syllables starting with ml-. They involve a reduced vowel coming between the /m/ and the /l/, making milord two syllables and milady three. They also aren't spelled "mlord" or "mlady"; your options are "milord", "milady", "m'lord", or "m'lady".
                  • Kaibeezy 842 days ago
                    Yes. That is why I used the word “;)” at the end there. And, yes, I know ;) is not a word.

                    I’ve been splained that one of the reasons “humor” sometimes doesn’t play well on HN is that people here have such a wide diversity of English grok. I didn’t anticipate someone could have too much knowledge, but, huzzah, there 'tis, one small step f’r ’man, and so forth.

                    • thaumasiotes 842 days ago
                      In that case, you might wish to know that putting a smile at the end of a comment like that is also a common way of calling the person you're talking to stupid.
                      • Kaibeezy 842 days ago
                        Welp, it was a wink, not a smile. The intention was good-natured. Just havin' fun on the internet with my new pal, Thaumasiotes, who is plainly the only other person among the swarming billions who found this tiny quirk of language worth blethering about with me. I hear you, mostly people think this sort of mishegas is nutballs. Their loss.

                        τί οὖν τίμιον; τὸ κροτεῖσθαι; οὐχί. οὐκοῦν οὐδὲ τὸ ὑπὸ γλωσσῶν κροτεῖσθαι: αἱ γὰρ παρὰ τῶν πολλῶν εὐφημίαι κρότος γλωσσῶν.

          • genewitch 843 days ago
            Flailmen
            • robbedpeter 843 days ago
              Flailmen: Awkward males, made uncomfortable and rendered incoherent by the close proximity of a romantic interest. Also, medieval warriors wielding flails.
            • Kaibeezy 842 days ago
              Also mailmen, you know, male posties.
    • clavicat 843 days ago
      Runinal Worriably Homenite

      I like these, especially the last.

  • foobarbecue 843 days ago
    Down due to rate limiting so I can't look at it, but sounds similar to the fantastic https://www.thisworddoesnotexist.com/
  • quercusa 843 days ago
    The first word I got was 'scrotal', which is a real word.
    • jstx1 843 days ago
      After a few refreshes I got 'sundial'.
      • echelon 843 days ago
        Should probably do a final pass filter against an English word dictionary.
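
          A sketch of what such a final pass could look like. `filter_real_words` and both word lists are hypothetical; a real filter would load a full dictionary (for example a system word list, where one is available) instead of the four entries used here.

```python
def filter_real_words(candidates, dictionary):
    """Drop any generated candidate that already exists in the dictionary."""
    known = {w.lower() for w in dictionary}  # set gives constant-time lookups
    return [w for w in candidates if w.lower() not in known]

# Tiny stand-in dictionary (an assumption), using real words reported in this thread.
DICTIONARY = ["scrotal", "sundial", "trident", "minable"]
CANDIDATES = ["Scrotal", "Sundial", "Plailmly", "Donsize"]

fake_only = filter_real_words(CANDIDATES, DICTIONARY)
print(fake_only)  # ['Plailmly', 'Donsize']
```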
        • Terretta 843 days ago
          The github example contains “trident” so figure author knows.
  • annetipasto 843 days ago
    Can anyone tell me more about how this works? Most of these don't resemble English words at all to me lol, wondering what the generative procedure/parameters are in the first place
    • jaclaz 843 days ago
      I find much more interesting:

      http://www.thisworddoesnotexist.com/

      as it also fakes the definition.

      But if you want to write some Vogon like poetry, the words generated by Fakelish might be just fine.

      • newsbinator 843 days ago
        dynoderma

        dyn·o·derma

        a slender, membranous musclelike structure, believed to represent a cross between a cranium and the external spaces of fish and invertebrates, supporting the glans in most vertebrates

        "a dynoderma is thought to have existed in all living organisms"

    • dharmaturtle 843 days ago
      https://raw.githubusercontent.com/nwtgck/fakelish-npm/develo...

      Basically a big probability map. I'm guessing this was machine generated though, and it isn't clear to me how that was done.
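
      To make "probability map" concrete, here is a rough sketch of sampling from one. The structure of `PROB_MAP`, the `<s>`/`</s>` markers, and every chunk and probability in it are assumptions for illustration; the real file's layout may differ.

```python
import random

# Hypothetical map: chunk -> {next chunk: probability}.
# "<s>" is the start marker and "</s>" the end marker.
PROB_MAP = {
    "<s>": {"sub": 0.5, "don": 0.5},
    "sub": {"rix": 1.0},
    "rix": {"ate": 0.6, "</s>": 0.4},
    "don": {"size": 0.7, "</s>": 0.3},
    "size": {"</s>": 1.0},
    "ate": {"</s>": 1.0},
}

def sample_word(prob_map, rng=random):
    """Walk the map from the start marker, picking each next chunk by its probability."""
    chunk, parts = "<s>", []
    while True:
        options = prob_map[chunk]
        chunk = rng.choices(list(options), weights=list(options.values()), k=1)[0]
        if chunk == "</s>":
            return "".join(parts)
        parts.append(chunk)

print(sample_word(PROB_MAP, random.Random(0)))
```

      Because each choice depends only on the previous chunk, nothing stops the result from crossing syllable boundaries awkwardly, which fits the complaints elsewhere in the thread.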

  • a9h74j 843 days ago
    The following is the text of a recent HN comment (not my own) on the subject of non-drug highs. As it suggests starting with a nonsense word, the OP fake word generator ought to suit:

    > It is really quite easy: have someone you know provide a nonsense word. It needs to have no logical sense or connections to anything - pure nonsense. Then, with that phrase held in your most present and loudest inner voice you repeat that phrase in your head. Repeat it over and over, forcefully to drive any other thoughts or thought fragments out of your mental conversation(s) (at all mental conversation levels, if you have more than one going at once). After a few minutes of forceful repeating, it echoes on its own, and a few realization moments later 20-30 minutes have passed and it feels like waking from a refreshing dream. When in the "state", it really can't be described because it is whatever your imagination and recent experiences feedback froth back and forth. It's relaxing and refreshing, and a great way to clear one's head when working on difficult complex mental goals.

  • 4ensic 843 days ago
    Quite a few cromulent words, but far from perfect.
    • kaczordon 843 days ago
      I see what you did there
  • aendruk 843 days ago
    Aimlessly flying through Dasher can create some pretty plausible new words. It’s worth playing around with if you haven’t seen it. It’s in most Linux package managers.

    https://www.inference.org.uk/dasher/

  • alanlammiman 843 days ago
    I got Donsize. It's when the family handles the layoffs
  • hyperbovine 843 days ago
    I recently started playing the NYT Spelling Bee game. There you find yourself wishfully inventing a lot of plausibly English-sounding words, only to learn that indeed, (e.g.) "vilicent" is not a part of the language. IMO the quality of these words is low compared to what a human being comes up with.
  • SkipperCat 843 days ago
    So many of the generated names sound like pharmaceutical brands...

    Also, if anyone is playing the NYT's Spelling Bee game, you've probably become pretty familiar with common English three/four-letter combos and then iterate/manipulate them to find words. It's all about the patterns!

  • shoto_io 843 days ago
  • delgaudm 843 days ago
    These read just as plausibly as "Transient companies selling low quality imported products on Amazon." If perhaps a bit too easily pronounced in English.
  • jnellis 843 days ago
    I've seen most of these drugs advertised on television.
    • labster 843 days ago
      Came here to say this.

      For life’s more persistent problems: Ask your doctor about Subrixate today!

  • deegles 843 days ago
    This or pronounceable password generators are great for making usernames for random sites. Sometimes you can even get the .com for them! (if you’re into that)
  • surfingdino 843 days ago
    Coming soon to a Teams meeting in front of you ;-) Amazing!
  • Zenst 843 days ago
    Portmanteaus are absobloodylutely fun. Though a bit cruel upon those learning the language.
  • kottaram 843 days ago
    I'll be honest... the words don't look convincing XD
  • trynumber9 843 days ago
    Strange, most of the words I saw looked Greek or Latin
    • dbavaria 843 days ago
      As an American, I assumed it was more like British English.
  • andrew_ 843 days ago
    I want to register all of these as domain names.
  • tony-allan 843 days ago
    This website has been temporarily rate limited
  • jcmontx 843 days ago
    The website got hackernewsed
  • Orionos 843 days ago
    Markov's chain?