OpenGPT-2: We Replicated GPT-2 Because You Can Too

(medium.com)

201 points | by programd 1708 days ago

11 comments

  • steve19 1707 days ago
    It sounds like all the drama OpenAI made about not releasing the model was just marketing. $50,000 is nothing for a nation-state or even just a motivated third party. I had always assumed OpenAI had spent well into the 6 or even 7 figures to train the full model.

    MSFT has sort of invested $1 billion into OpenAI so I guess it worked!

    • gradys 1707 days ago
      My understanding of their rationale for the disclosure policy is that while this model might not necessarily be so incredibly dangerous,

      1) There was a non-zero chance that their releasing it would do more harm than good by making things like fake news, spam, and sock puppeting cheaper and more scalable, and

      2) It's very likely that eventually they or others will develop more unambiguously dangerous models, and it's valuable to begin experimenting with responsible disclosure policies now so we're better prepared.

      Perhaps some here would disagree with the premise of (2), that it's possible for a model to be dangerous. I can understand that.

      Working in the field, and having been exposed to many of the same arguments that are likely informing OpenAI's concern, I think it's valid. Valid enough at least that I'm prepared to believe that OpenAI is acting in good faith, and that the disclosure policy is not "just marketing".

    • skrebbel 1707 days ago
      I have a beef with this. OpenAI is run by people who believe and spread horror stories about how AI can totally change society, for better or worse.

      A lot of what AI can and cannot do depends on cost and computational power as much as anything else. If I understand correctly, there's a whole bunch of "you could but it's prohibitively expensive" stuff going around.

      I'm convinced that the people behind OpenAI understand this nuance, even if most non-geeks struggle with the difference between possible and feasible. This means that OpenAI mixed the two up on purpose.

      If the people warning us of the robot apocalypse can't be trusted to communicate openly and honestly about odds, then what do we do? They clearly have no qualms about misinforming the public to serve a hidden agenda. It might be pretty harmless in this case, but it's indicative of a cultural pattern.

      Basically I just hope (and kind of believe) that we'll never achieve AGI because that would make all this talk unimportant.

      • Veedrac 1707 days ago
        > I'm convinced that the people behind OpenAI understand this nuance, even if most non geeks struggle with the difference between possible and feasible. This means that OpenAI mixed the two up on purpose.

        You should probably actually read what OpenAI have to say before making claims about what they have to say.

        “We are aware that some researchers have the technical capacity to reproduce and open source our results. We believe our release strategy limits the initial set of organizations who may choose to do this, and gives the AI community more time to have a discussion about the implications of such systems.”

        https://openai.com/blog/better-language-models/

        • Bartweiss 1707 days ago
          It's definitely worth reading OpenAI's statement, but I don't think speculation is out of bounds here either. The statement is reasonable on its own, but doesn't acknowledge that the decision is a sharp break from OpenAI's entire founding ideology.

          Musk said in 2015 that the point of OpenAI was "to empower as many people as possible to have AI. If everyone has AI powers, then there's not any one person or a small set of individuals who can have AI superpower." Sikka argued that "openness" was the fundamental reason he supported the project. OpenAI was heavily criticized by other AI risk researchers for its public approach, and argued strongly that it was doing the right thing. It specifically invoked the threat of AI tools being abused by small groups of people in secret as more pressing than the threats from self-directed or public-use AI.

          But the closest the statement comes to addressing that is describing the restrictions as "an experiment", and mentioning that in 2018 OpenAI added a charter with a caveat saying it might publish less stuff. That caveat and the decision to invoke it here are essentially unexplained; since the charter almost certainly postdates some GPT-2 work it seems to be a circular argument.

          Frankly, GPT-2 seems like exactly the sort of work OpenAI was worried about in the first place. Ten thousand people can troll and spam ideas without any AI support, but with a GPT-2 equivalent it becomes possible for a government, company, or intelligence group to massively scale its impact on public debate. If OpenAI thinks state actors can't replicate their work, the fears invoked at its founding are called into question. If it instead thinks public access to such tools is dangerous and they need to be kept in select hands, the project's entire rationale is invalidated. If they simply hope to spark discussion and give people time to prepare (mentally or technologically) for the idea of compelling auto-generated text, that really ought to be said more clearly.

          • Veedrac 1707 days ago
            Elon Musk isn't part of OpenAI any more, so I don't think his stance should hold that much weight.

            > Sikka argued that "openness" was the fundamental reason he supported the project.

            I believe the relevant quote is this one

            > Sam asked me if I would be ok with the fact that such an endeavor would be untethered and would produce results generally in the greater interests of humanity, and he was somewhat surprised by my reaction, that indeed I would only support this venture if such an openness was a fundamental requirement!

            https://web.archive.org/web/20151222094518/http://www.infosy...

            I'll leave this open to interpretation; I think there are multiple ways of taking it and I'm not convinced which are accurate.

            > OpenAI was heavily criticized by other AI risk researchers for its public approach, and argued strongly that it was doing the right thing. It specifically invoked the threat of AI tools being abused by small groups of people in secret as more pressing than the threats from self-directed or public-use AI.

            I'm not sure who you are referring to here. The only critics I've heard claiming OpenAI are too open are those who believe in Bostrom-style AGI risk, which is the idea that (far-term) AI is intrinsically dangerous, rather than being dangerous predominantly because of malicious use.

            > If OpenAI thinks state actors can't replicate their work,

            I don't believe OpenAI believes this.

          • p1esk 1707 days ago
            If they simply hope to spark discussion and give people time to prepare (mentally or technologically) for the idea of compelling auto-generated text, that really ought to be said more clearly.

            They have been pretty clear about this [1]:

            "We believe our release strategy limits the initial set of organizations who may choose to do this, and gives the AI community more time to have a discussion about the implications of such systems."

            [1] https://openai.com/blog/better-language-models/

      • sytelus 1703 days ago
        Initially I was amused by OpenAI's claim of "too dangerous to release", but over the past few months I have come to agree with the decision they made. Auto-generation of convincing text is actually very disruptive. Our news cycles and the stories that influence us are driven by things like Twitter hashtags and how many tweets there are on a given story. People look for validation of their thoughts in places like 8chan, reddit, HN, etc. There is an undeniable influence from all these discussions that occur online. Now imagine that someone is able to create hundreds of accounts and run autonomous discussions to create an impression of consensus toward a desired goal. Imagine that a large number of tweets making wisecracks are actually autogenerated. Imagine that you, as a person on the borderline, are taking cues from the number of people pro-something to make a decision. Imagine the NYT running an article smearing some activist because thousands of Twitter smart-bots were convincingly calling for his resignation with humorous memes. The better the quality of the AI, the better the chance journalists and others would think it's real. I think GPT-2 will go down as one of the first real large-scale dangers of AI.
      • repolfx 1707 days ago
        OpenAI is run by people who believe and spread horror stories about how AI can totally change society, for better or worse.

        I think this is reflective of a substantial disconnect and polarisation in society, because OpenAI's position here was driven by a very fundamental intuition about human nature which is not universally shared.

        Namely, their rationale was something like this: the world is dominated by people whose opinions are fundamentally shaped by what others are saying, and moreover, by how frequently other people seem to be saying it. They don't really think about things or rely on their own experience: they're just mimics.

        In such a world being able to auto-generate fake messages of support for some political position or another at scale would give you immense power, because the population would automatically swing behind you based merely on the perception that everyone was swinging behind you.

        So that's their fear. But is it realistic?

        Well, this is where we get into the polarisation. "You aren't smart enough to have an opinion" is the sort of viewpoint that leads to an elitist vs populist conflict. There have been reams of analysis about this and it's not really AI specific - e.g. did people vote for Brexit because of Twitter, or because of things they saw in the press, or did they vote based on their own experiences, or what their close friends/family thought, or what mix of "all of the above" is the truest mix?

        When they refused to release GPT-2 the OpenAI researchers took an extreme position on that spectrum, asserting that essentially they had built a mind control device. But one which, of course, didn't work on themselves, only on lesser minds. Not surprisingly this was very controversial, albeit I think the Valley/Hacker News set would be surprised at just how controversial it would have been if anyone outside the AI world had really noticed. You can't tell most of the world their opinions aren't really their own and not expect pushback.

        Personally I don't share those intuitions at all. The controversy over DeepFakes is similar. Photoshop has existed for decades and hasn't led to a dystopia, but suddenly DeepFakes is going to create fundamental social change? I don't think so.

        AI researchers, especially in academia, need big narratives to keep up the funding and suggest they're on a mission to save the world vs e.g. making slightly better playlist recommendations. GPT-2 seems like a classic case of this being taken to absurdity.

        • Veedrac 1707 days ago
          > When they refused to release GPT-2 the OpenAI researchers took an extreme on that spectrum, asserting that essentially they had built a mind control device.

          Having read everything I could find from OpenAI on this topic, I have no recollection at all of such claims by OpenAI. Sources would be appropriate.

          • repolfx 1706 days ago
            They said they wouldn't release the biggest model because it could be abused to make bots that sway politics. Consider the chain of reasoning required to believe that.
            • Veedrac 1705 days ago
              Again, I suggest you give direct sources for your arguments.
              • repolfx 1705 days ago
                Are you sure you read their announcement blog? You seem to believe that they didn't say what they spent an entire blog post saying.

                Here it is again.

                https://openai.com/blog/better-language-models/

                They said

                "Due to concerns about large language models being used to generate deceptive, biased, or abusive language at scale, we are only releasing a much smaller version of GPT-2 along with sampling code."

                And in case it wasn't clear what they mean by "abusive language", they clarified:

                "Today, malicious actors—some of which are political in nature—have already begun to target the shared online commons, using things like “robotic tools, fake accounts and dedicated teams to troll individuals with hateful commentary or smears that make them afraid to speak, or difficult to be heard or believed”"

                "These findings, combined with earlier results on synthetic imagery, audio, and video, imply that technologies are reducing the cost of generating fake content and waging disinformation campaigns. The public at large will need to become more skeptical of text they find online."

                Or summarised:

                • We think people believe things they read online today.

                • We think there are political "disinformation campaigns" being waged, by the sort of people who would use AI to wage them more effectively.

                • We think GPT-2 would let them create "deceptive, biased or abusive language" automatically, and this is too dangerous to release because people aren't skeptical enough of what they read online.

                • We think the entire population needs to change its behaviour for GPT-2 to be safe to release.

                I think that directly supports my arguments. Their whole worldview assumes the population is naive, easily manipulated by text and more importantly the volume of it ... they also draw a bogus analogy to DeepFakes. It's bogus because nothing stops anyone writing words today that OpenAI staff might find "biased" or "deceptive"; literally anyone can do that. But substituting a celebrity's face into porn would require very advanced video editing skills most people don't have: DeepFakes is a truly new capability, whereas GPT-2 is merely a capability of scale.

                To believe GPT-2 changes anything you must believe the majority of people make up their minds on political issues based purely on volumes of anonymous online comments. Hence my argument.

        • derefr 1707 days ago
          > asserting that essentially they had built a mind control device. But of course which didn't work on themselves, only on lesser minds

          More like asserting that they had built a nuclear reactor, which someone else could easily turn into a nuclear bomb. They haven't turned it into a nuclear bomb themselves, so the fact that they haven't been blown up by it doesn't require any special justification.

          • Judgmentality 1707 days ago
            My real issue is they started as a non-profit supposedly developing AI for the benefit of humanity, then quickly became a for-profit with technology so dangerous only they are allowed to wield it.

            If they just said "we are developing AI because AI is awesome" I would have a lot more respect for them than I currently do. I wish they would at least change their name.

            As it stands, it's hard for me to see their release process as truly an experiment and not as just a clever marketing ploy - because people are talking about them a lot more now than before.

        • paggle 1707 days ago
          Why does “the world is dominated by people whose opinions are fundamentally shaped by what others are saying, and moreover, by how frequently other people seem to be saying it“ imply an elitist vs populist class divide, rather than applying to EVERYONE? I can honestly say I have never met a person whose thoughts were not primarily a construct of their environment. Take everyone I’ve ever known, and have them born 2000 years ago in Greece, and they’d probably believe that slavery is fine, child sexual slavery is a perk of being rich, and that it’s perfectly normal to burn heretics. Anyone whose thoughts weren’t dominated by societal cues would probably end up in jail.
          • repolfx 1705 days ago
            You're conflating environment and reading of written opinions.

            People are products of their experiences, sure. That experience is much wider than "possibly bot authored comments on social media", but OpenAI is acting as if it isn't.

            Most people won't agree with their stance. It does however seem to have some weight in certain social circles, the sort of circles that assume most people aren't educated enough to make good decisions.

          • AgentME 1707 days ago
            Yeah, there's very few of my opinions I hold now that I would've been able to invent in a vacuum. I adopted pretty much all of my opinions from seeing them online among others and reasoning my way to decide between the options. If I had never seen the opinions I now hold, then I probably wouldn't have picked them. If these good opinions that I ended up picking had been lost in the noise of a sheer flood of bad opinions, then I would've become a different person easily.

            The idea of a good text-generating AI possibly being able to flood forums is a little scary, and the idea that even the little bit of caution OpenAI has shown is elitist is ridiculous.

    • anchpop 1707 days ago
      Before they release it, they need to prove to themselves it's safe (I'm speaking morally). I'm not sure why so many on HN seem to think it's the other way around, that you have to prove it's dangerous before withholding it. $500k for reproduction is peanuts for a nation state but quite a bit for many other groups who might be interested.

      Even if it is entirely safe, it's still good to withhold it because it helps create a culture where people think about the safety of their projects before releasing them

      • Bartweiss 1707 days ago
        There are two very different questions here: is it good to withhold, and is it sensible for OpenAI in particular to withhold? I think a lot of people on HN view restricting GPT-2 as incoherent coming from OpenAI, where they might not for another group.

        OpenAI's raison d'etre was democratizing access to AI tools, and preventing them from being abused by concentrated powers like governments. If replicating GPT-2 is trivial for state actors but prohibitive for hobbyists and other private citizens, it's creating the same issue Musk described OpenAI as setting out to oppose. Even the general idea of treating AI tools as hazardous-by-default goes a long way to validating the project's original critics.

      • buboard 1707 days ago
        It's really difficult, maybe impossible, to prove it's either safe or unsafe. We're still in the early stages of ML research, so this kind of FUD is a brake on research. Similar to how genetic research has been held back by regulators.
      • jgalt212 1707 days ago
        > Before they release it, they need to prove to themselves it's safe (I'm speaking morally)

        I don't think it was a moral question. I think the lawyers got involved and were uncomfortable with potential litigation exposure.

    • p1esk 1707 days ago
      OpenAI most likely spent 6 or even 7 figures to produce GPT-2 model. The guys who wrote this article mentioned they spent $500k to replicate OpenAI’s results. Each training run costs $50k, and they needed many runs to find the optimal config. Note that the GPT-2 paper had already described many crucial details (eg dataset). Without those details it would be a lot harder to do this.
    • skybrian 1707 days ago
      Nation-state actors aren't the only people who can do damage. You also have to worry about the script kiddies. There are people who will do bad things for the lulz, but only if it's easy.
    • buboard 1707 days ago
      OpenAI is the google of the new ML era. Except that they went from "dont be evil" to typical greedy corporate in less than a year.
      • est31 1707 days ago
        My personal (conspiracy) theory is that the "don't be evil" trick was done to attract ML talent. Many people, if they are convinced they are working for a good cause, are okay with being paid less than if they worked at evil inc or a defense contractor or something. It's a big reason why academia has so many people at the top of their fields accepting little pay (compared to what they could earn in industry).
        • buboard 1707 days ago
          These people are now held hostage by the "100x" promises. I'm hopeful however that all this "Old Tech Money" - FAANGs & OpenAI - won't be able to keep them for long, because AI may be used to destroy their existing revenue streams (advertising - optimization - engagement).
  • anon1253 1707 days ago
    "The cost of training the model from scratch using our code is about $50k."

    Still a substantially steep curve for a bootstrapping startup. It's something I continually run into myself. I have somewhat of a weekend project trying to build a search engine but man ... the cost of just the SSDs and GPUs is daunting on a regular salary. As the complexity of these models grows, so does the barrier to entry for a regular joe like me; which is a shame I think. I know in the US it's fairly normal for a data scientist to pull 100k+ / year, but in the Netherlands salaries pretty much stall at 40k (and angel investment in IT/AI is at an all time low). More generally I fear this will become a bit of a sociotechnical issue if complex AI models will be out of reach for entire economies (especially for cases like language because not everyone speaks English and "minor" languages like those in EU countries are a massive market to explore, yet hard to get into).

    • sytelus 1703 days ago
      Seriously, you do not need a massive amount of money to build a search engine. The V0 version of Google was literally a single cheap desktop running the whole thing end to end. The average webpage size even today is just 2MB. You can start by crawling 1B pages, which you can comfortably store and index on an 8TB HDD (you don't need SSDs at this stage). Make your algorithm work for a single user on this 1B-page crawl. Don't worry about freshness or speed, just focus on relevance.

      If it works beautifully, find a VC and show them the goods. If that works out, you might end up with $100K seed money. Then you go out and buy a dozen good desktops, increase the crawl size to maybe 15B pages, and support a few dozen simultaneous users. Now you can go to the big guns, send them a link to your search engine, and get the next level of funding.

      At that point you hire real employees, and now your job is scaling up to thousands of invite-only users and squeezing out the long tail of performance in terms of relevance. Constantly measure customer satisfaction and complaints, and use a Gmail-like invite-your-friends model to incrementally get high-value users who are willing to try something new. At this point, if your search algo really is much better, you should be able to plot exponential DAU/MAU growth and get funding rounds in the $100M range to really scale up.
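
      To make the "V0 on a single desktop" point concrete, a toy crawl-and-index loop is roughly the sketch below (Python; the requests package, the seed URL, and the naive AND-only query are placeholders, nothing like a real relevance algorithm):

        # Toy sketch of "crawl, index, search" on one machine. The seed URLs
        # and the naive AND query are placeholders, not a relevance algorithm.
        import re
        from collections import defaultdict

        import requests  # third-party HTTP client, assumed installed

        def tokenize(text):
            return re.findall(r"[a-z0-9]+", text.lower())

        def build_index(urls):
            index = defaultdict(set)                   # term -> URLs containing it
            for url in urls:
                try:
                    html = requests.get(url, timeout=10).text
                except requests.RequestException:
                    continue                           # skip pages that fail to fetch
                text = re.sub(r"<[^>]+>", " ", html)   # crude tag stripping
                for term in set(tokenize(text)):
                    index[term].add(url)
            return index

        def search(index, query):
            # Return only the URLs that contain every query term.
            terms = tokenize(query)
            if not terms:
                return set()
            results = set(index.get(terms[0], set()))
            for term in terms[1:]:
                results &= index.get(term, set())
            return results

        idx = build_index(["https://example.com"])     # placeholder seed list
        print(search(idx, "example domain"))

      Relevance ranking (the part that actually matters) goes on top of that, but the storage and plumbing really are that small to start with.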
    • tzapzoor 1707 days ago
      And they're just "two masters students, with no prior experience in language modeling" with $50k lying around for training a huge model.
      • est31 1707 days ago
        The money came from Google: "We would like to thank Google (TensorFlow Research Cloud) for providing the compute for this".
      • p1esk 1707 days ago
        They mentioned they spent $500k (in research credits) on all the experiments to actually find the hyperparameters.
        • godelski 1707 days ago
          Where did you see that?

          Also how do two masters students with no experience in NLP get $50-500k in compute credits? How do I get that deal?

          • p1esk 1707 days ago
            It was in the article last night, they deleted it for some reason after my comment.

            One of the authors has a peer reviewed NLP publication [1], the other has several publications in computer vision. I don’t know how they got research credits from Google.

            [1] https://arxiv.org/abs/1905.13153

            • godelski 1707 days ago
              Well, that is pretty dubious. It completely changes the picture of how much money this costs and how difficult it is. (And I definitely believe you, because there are several other comments with that same number.) Could it be a typo? But I feel like that's something you'd say. Then again, these are masters students.
              • p1esk 1707 days ago
                It's $50k per training run. Their main contribution has been finding the optimal hyperparameters, which were not described in the OpenAI paper. Obviously you need more than one training run to do that.
          • dgacmu 1706 days ago
            They didn't, their advisors -- Stefanie Tellex and George Konidaris -- did.

            And the answer to that part is, they asked: https://www.tensorflow.org/tfrc

            • godelski 1705 days ago
              Interesting. I'm doing ML in a niche area so I'm wondering if I can get funding (for my PhD). Obviously there's politics involved, but honestly I just want to try out methods and experiment (typical grad student I guess). So if they have a funding route I'm stoked. Thanks for the info.
              • dgacmu 1704 days ago
                In most programs, if you get accepted into the Ph.D. program, you're funded. Obviously, you should check first. Different programs treat these things differently with TA requirements, but most Ph.D. students are funded by an "RA" (research assistant, aka, doing your actual eventual-thesis research...).

                For more info, my colleague Mor wrote a very useful doc about the Ph.D. process in CS: https://www.cs.cmu.edu/~harchol/gradschooltalk.pdf

                • godelski 1704 days ago
                  Oh I'm funded. Definitely was told not to go unless I'm fully funded. But $500k for computing resources is bigger than a lot of short-term contracts (I'm mostly on research and usually don't have to TA unless there's a hiccup in funding schedules. Gov money...). I definitely see that amount for long-term work, but this seems like a pretty short project.
                  • dgacmu 1703 days ago
                    Got it. (1) Yes, 500k is a lot, but (2) the TFRC program is making fairly expensive TPUs available at shockingly good prices for researchers willing to do some tire-kicking and feedback-providing. So 500k of TPU time is easier to come by than, say, 500k of generic compute.
    • buboard 1707 days ago
      There is no need to train the model, they already provide the parameters. These transformer models, like BERT, are pretty adaptable for reuse.
      • anon1253 1707 days ago
        There is no need to train this particular model, but adapting this (or any novel model, the field is moving fast) to, say, Dutch, Italian, Hungarian, Icelandic or whatever still requires training. Luckily for most languages pretrained models are provided (at least in the case of BERT, FastText, or regular skipgram). But there is also still quite a bit of leeway in domain-specific adaptation (for example SciBERT for scientific texts, or legal and financial documents); reddit/wikipedia does carry a bias. Each of these not only requires pretraining the model, but also generating a huge and fairly well-formatted corpus. And, although the parameters are usually finetunable, it does sometimes break down on the various sub-word tokenizations used.
  • high_derivative 1707 days ago
    It may be high time to discuss what AI policy has actually done so far. From what I can tell, not much other than letting social scientists get in on the deep learning gravy train.

    Meanwhile, misuses of ML are proliferating without limits, and 'AI policy' is apparently mostly used as a fig-leaf to collect good-will, marketing, and buy a seat at the table for future regulations. As usual, regulations will protect incumbents, so my as-usual cynical read is that OpenAI's policy interests are about protecting its own future interests. From that perspective, the entire GPT-2 stunt was highly effective.

    Now depending on your outlook, that may be an argument that we need more people in policy, or fewer. Or different ones.

    • repolfx 1707 days ago
      Meanwhile, misuses of ML are proliferating without limits

      Are they? Where?

      For all the hype I haven't seen any obvious abuses of AI. I've seen better speech recognition and a few other useful things. I've seen stuff that's wrong but ultimately kind of trivial like auto-generated porn with celebrity faces, but that's the worst stuff so far.

      I haven't seen clear, unambiguous cases of abuse beyond that. I've seen a lot of allegations that AI is being abused e.g. "Russian bots" but on investigation these stories usually evaporate.

      If anything I've been kind of disappointed by AI so far. Amazing demo videos abound, but I'd guess 90% of the impact of AI in my life has been Google improving their already quite good services. Better translations, better search results etc. All very welcome but not really life changing.

      • visarga 1707 days ago
        > better search results

        Funny you say that. I was recently searching for an e-scooter lighter than 10kg and all Google could find was the maximum allowed weight of the person riding the thing (around 100-130kg). Not to mention that it didn't understand how to handle a conjunction and show me light e-scooters with suspension. It's just matching keywords without understanding anything about the relations between them.

        I am disappointed at the current quality of Google search, especially in shopping related queries where money is to be made. Instead of stuffing my pages with irrelevant 'personally' targeted ads and tracking my every move they should make an effort in that particular moment when I actually want to buy something and give me a good suggestion.

        • derefr 1707 days ago
          Coincidentally, this kind of "modelling the question" is what IBM's Watson is/was supposed to be uniquely good at. It seems like Google hasn't even really considered entering the same space. Maybe each query has too high an incremental cost for them to run profitably right now?
      • kuzehanka 1707 days ago
        > I haven't seen clear, unambiguous cases of abuse beyond that.

        That's the power of these things. Would you know you're reading a social media comment generated by something akin to this? No, at best it would be ambiguous.

        There is no way for an everyday person to tell how much of their life is impacted by ML at this point.

        • repolfx 1706 days ago
          But I don't decide anything based on reading social media comments unless it makes a good point, which GPT2 text never does (so far).

          I honestly wouldn't care much if I was reading stuff written by bots, as long as I didn't waste time talking back ;)

    • s_Hogg 1707 days ago
      Hear, hear. This whole trust-us-we're-the-good-guys come-on line that self-regulation is giving us is clearly not good for much of anything beyond an orgy of self-interested PR.

      It doesn't matter how many times people say the market works. There are plenty of self-appointed policy experts but no external force (i.e. regulation) helping people live up to what they're saying. I think AI policy is pretty well catered for in terms of labour at this point, but enforcement is not.

      • buboard 1707 days ago
        Regulation is going to make it worse, by picking winners and losers, and will leave states with weapons-grade AI hidden from the public. The only way to fight this is with open technology, like this: publishing state-of-the-art models for everyone and creating open datasets. It is only a problem today because AI is siloed behind the FAANGs and OpenAI. We need more openness, and I think it will come.
    • buboard 1707 days ago
      The biggest abusers of AI are of course governments. AI is military technology. Yet we're arguing about the merits of gender inclusion.
      • Dobbs 1707 days ago
        And banks, and large rental companies, and credit lenders, and Amazon, and Facebook, and Google, and so many other people. So many of which easily encapsulate their biases into their algorithms, which then perpetuate the biases into society as a whole.
        • ethbro 1707 days ago
          Which is why regulation should be model transparency in sensitive areas: full stop.

          Sure, use it to decrease your operating costs. But it doesn't get to be secret sauce.

          • godelski 1707 days ago
            > Which is why regulation should be model transparency in sensitive areas: full stop.

            It's not the model that makes the bias. It's the data. Garbage in, garbage out.

            The tricky part is that the data is the magic sauce for most ML algos (as GPT-2 really demonstrates). I don't care about what model they use (CNN, transformer, whatever) or their architecture (layers and how things are connected). What I'm concerned about is the implicit bias in the data. This is the hardest thing to figure out, so the issue is: how do you build trust in that data? Handing out that data makes you lose your competitive edge in a marketplace. Not only that, in many cases that data contains sensitive information that people wouldn't want public. So it's hard to say what to do. Third-party auditors? But how do you prevent them from becoming corrupt and complicit like the credit agencies?

            I think it sounds like there's easy answers, but honestly this seems really difficult to me.

            • ethbro 1707 days ago
              My point was to the actual trained model.

              I can't see access to data ever being accepted, for the reasons you mentioned.

              However, publishing the model in a way that third parties can fuzz it seems to allow for discovery of the worst bads.

              • godelski 1707 days ago
                Releasing the trained model has the same consequences as releasing the data. Even worse, people don't have to spend the capital required to train.

                For example let's say big company trains uber translation network. Releasing the trained model means anyone can now use that model.

                Maybe we could have laws about this, like patent laws, but there's still difficulty: does fine tuning the model make it substantially different? How do you deal with countries like China that don't respect patents (especially when we're talking about technology worth hundreds of millions or billions of dollars)?

                I'm not saying we should give up. I'm saying that there isn't a simple answer here. We shouldn't expect one either! But to get a good answer we need to discuss and figure out the nuance of the situation. Because on one side we can't trust these companies too much. It's too easy to make mistakes. On the other side we can't just bankrupt them because we'll lose a huge standing in the world economy. So where's the middle ground? That's what I'm after.

                • ethbro 1706 days ago
                  Releasing a model is in no way like releasing the data. ML models are fundamentally compression algorithms for data sets, when you look at them from an information theory perspective. They are lossy encoding.

                  Furthermore, it's a huge field, of which NLP is a significant, but tiny, subset.

                  The vast majority of models being used in the real world don't generalize thusly, because the data sets and processes they're linked to are bespoke.

                  The middle ground is the black-box model (perhaps with technical safeguards against decompiling-equivalent, or hosted as a service). It provides the ability to statistically prove bias, while protecting the majority of privileged or private information.

                  • godelski 1705 days ago
                    What I'm saying is that there's going to be a lot of pushback against releasing a fully trained model, because anyone with that model can make similarly accurate predictions. Additionally, one can fine tune that model and make something better. So someone spends a few million getting the original model; someone else spends a few thousand to fine tune a better version.

                    What I'm saying is there's a huge downside to releasing the full model compared to holding it tight. Many companies don't patent things for similar reasons.

            • tecleandor 1707 days ago
              It's the data, and it's the model. You can also build a biased model, the same as you can build biased code.
        • buboard 1707 days ago
          The things that a government can do to its citizens are worse than anything a corporation can do.
  • 6gvONxR4sf7o 1707 days ago
    It's worth noting that, like other attempted replications, the perplexities of this model mostly aren't as good as GPT-2. Given that the title of the GPT-2 paper was "Language Models are Unsupervised Multitask Learners," I'd be interested in a lot more metrics before I'd believe GPT-2 has actually been replicated. Especially because every other time someone says this, metrics show otherwise. Until then, this is just a really big model.
    • Smerity 1707 days ago
      I made the WikiText-2 and WikiText-103 datasets they compare against and held state of the art results on language modeling over PTB, WT-2, and WT-103 not too long ago.

      OpenGPT-2's results are near equal to GPT-2's in zero-shot language model perplexity on multiple datasets [1].

      The zero-shot perplexity results are also exactly where we'd expect the 1.5 billion parameter model to be, markedly better than OpenAI's 774M GPT-2 model [2] (the second largest model OpenAI trained) that they released in the last few days.

      To me this is about as close a replication as you could expect, especially given OpenAI didn't release many of the exact training details. If OpenAI retrained the 1.5 billion parameter GPT-2 model I wouldn't be surprised to see the same variance in performance simply due to the random initialization of the parameters.

      [1]: https://miro.medium.com/max/3200/1*h1JoiQq9f1qOHS-rN4u57A.pn...

      [2]: https://www.semanticscholar.org/paper/Language-Models-are-Un...
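
      For anyone who wants to sanity-check numbers like these, zero-shot perplexity is roughly the computation sketched below (using the Hugging Face transformers GPT-2 port as an assumed stand-in for the actual evaluation code; a real WikiText evaluation also streams long articles with a sliding context window rather than short independent strings):

        # Rough sketch of zero-shot perplexity: average next-token cross-entropy
        # over a held-out corpus, exponentiated. The tiny "texts" list is a
        # placeholder for a real dataset such as WikiText-103.
        import math
        import torch
        from transformers import GPT2LMHeadModel, GPT2Tokenizer

        tokenizer = GPT2Tokenizer.from_pretrained("gpt2")   # smallest public checkpoint
        model = GPT2LMHeadModel.from_pretrained("gpt2")
        model.eval()

        texts = ["The quick brown fox jumps over the lazy dog."]

        total_nll, total_tokens = 0.0, 0
        with torch.no_grad():
            for text in texts:
                ids = tokenizer.encode(text, return_tensors="pt")
                # Passing labels=ids makes the model return the mean
                # next-token cross-entropy for the sequence.
                loss = model(ids, labels=ids)[0]
                n = ids.size(1) - 1                         # number of predicted tokens
                total_nll += loss.item() * n
                total_tokens += n

        print("perplexity:", math.exp(total_nll / total_tokens))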

      • 6gvONxR4sf7o 1706 days ago
        That's true, but isn't the point of GPT-2 that it's strong at many tasks? It did really well at a lot more than just the four perplexity measures reported in the OP.
    • happycube 1707 days ago
      The results are (mostly) between the large and extra large GPT2 models, and it's possible to reproduce if you have the resources.

      Between this and the 774M GPT2 release, it's been a pretty good week :)

    • doctorpangloss 1707 days ago
      I suppose you can go and shit on people trying to replicate scientific work, and telling them, extremely reductively, “Well your almost as big number isn’t as big, so fuck you,” like a soccer ultra comparing their team’s score of 3 versus the opposing team’s score of 2, as though the only question that matters is “Whose Soccer Team is Best?” Is that even the right question to ask?

      People who actually do research, they don’t just look at the absolute comparison of published numbers! Do you think that’s how research is done, by chasing whatever has the biggest number? No repeat innovator does that.

      It’s an interesting collision of world views for sure. This is a social media forum for a venture capital firm. They’d hate for anyone to discover that the numbers don’t tell the whole story, that actually everybody starts at zero, and that being second, because of the price premium put on first, is a huge opportunity. So even in some narrow, cynical interpretation, your point of view would lose people a ton of money. But I don’t really know anything about that.

      • 6gvONxR4sf7o 1706 days ago
        Holy cow, dude. That's not even remotely like what I said. If you read the original paper, they show that GPT-2 does all sorts of cool things. In OP, they show that it's almost as good at one thing on a couple data sets.

        It's like if I wrote a paper showing that widgets improve liver health in young men, young women, and adult men. Additionally, widgets make you happy and taller and turn blue. Then you come along and try to replicate my results, showing only that your version of a widget makes young men and women's livers almost as healthy as mine did.

        But sure, pretend I said whatever you want to argue against.

  • Felz 1707 days ago
    What's it take to actually run a model like this, hardware-wise? I've been toying around with a gpt2 discord bot (https://github.com/ScottPeterJohnson/gpt2-discord) using just a CPU calculation, and already it takes up 2 GB RAM (and is slow obviously) on the 345M model. I might be able to get the 774M model running, but there's no way I can afford the full model, assuming linear RAM use. And that's just for CPU compute, I can't even begin to imagine how expensive GPU would be.
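
    For rough sizing: at fp32 the weights alone are about 4 bytes per parameter, so 345M is ~1.4GB, 774M is ~3GB, and 1.5B is ~6GB before activations, which lines up with the ~2GB I'm seeing. Plain CPU inference is roughly the sketch below (Hugging Face transformers port, not my bot's actual code; the model name and sampling settings are just examples):

      # Minimal CPU generation sketch with the Hugging Face GPT-2 port.
      # "gpt2-medium" is the 345M-parameter checkpoint; its fp32 weights
      # alone account for most of the ~2GB resident memory mentioned above.
      import torch
      from transformers import GPT2LMHeadModel, GPT2Tokenizer

      tokenizer = GPT2Tokenizer.from_pretrained("gpt2-medium")
      model = GPT2LMHeadModel.from_pretrained("gpt2-medium")
      model.eval()

      ids = tokenizer.encode("The jig is up. And what now?", return_tensors="pt")
      with torch.no_grad():
          out = model.generate(ids, max_length=100, do_sample=True, top_k=40,
                               pad_token_id=tokenizer.eos_token_id)
      print(tokenizer.decode(out[0], skip_special_tokens=True))
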
    • dgacmu 1707 days ago
      I use it for http://nametango.ai/ (a startup/product name generation assistant). There are more details on my page, but in a nutshell, it takes about 7 seconds to generate a result on CPU (I generate a batch of 40 results for every query and de-dup them), and it runs in about 300 milliseconds on a Titan V GPU.

      That is, however, a very batch friendly application, so a serial generation may perform somewhat better on CPU only.

      (I'm comparing to dual xeon gold 16 core CPUs). When using the GPU to generate the GPT-2 results, the CPU is mostly bored.

      (responding to a comment that someone deleted, yes, it does suggest a lot of existing names, I haven't yet added something to filter out stuff that already exists... noted, and suggestions appreciated, but this probably isn't the thread for it! Feel free to drop me a note)

    • ageitgey 1707 days ago
      Inference on this model works fine on Google Colab, which gives you a Tesla K80 GPU with access to 12GB of GPU RAM. You can buy a used K80 for probably about $850, but it's not really ideal for putting in a home computer because of the cooling requirements.

      [ deleted reference to 2070 Super ]

      • p1esk 1707 days ago
        A used K80 can be had for $350 [1]. Not bad actually (it's probably about as fast as a 1080Ti, and has 24GB of memory).

        https://www.ebay.com/itm/NVIDIA-Tesla-K80-GDDR5-24GB-CUDA-PC...

        • happycube 1707 days ago
          K80 is 2 GPU chips with 12GB, so it's not always as good as one newer/larger GPU. Much more affordable though :)
          • p1esk 1707 days ago
            If I remember correctly K80 memory is actually 24GB, not 2x12GB. This is a pretty important distinction in this context (training GPT-2).

            Also, you can get at least 6 K80s for the price of a single RTX Titan (also 24GB). So it would be faster (I don't think RTX Titan is 6x faster than K80) and 6x more memory for the same price. It's a very good deal.

        • lostmsu 1703 days ago
          300w with passive cooling? o-O

          How does that work?

          • p1esk 1703 days ago
            You would cool the server.
      • acd10j 1707 days ago
        Since when does the RTX 2070 ship with 14GB of GPU RAM? Max memory for the RTX 2070 Super is 8GB.
        • ageitgey 1707 days ago
          Oops, you are correct. I mis-read the spec sheet.
  • minimaxir 1707 days ago
    Twitter thread by a Research Scientist at OpenAI addressing OpenAI's policies in response to this discussion here: https://twitter.com/Miles_Brundage/status/116495932263331840...
  • exabrial 1707 days ago
    Without context, this article reads like something generated by machine learning.
  • p1esk 1708 days ago
    They spent $500k replicating it. But sure, you can do it too /s
    • gwern 1707 days ago
      They used research credits, and even that aside, with their code and training tips, you can redo it for $50k on cloud instances or less on dedicated hardware + patience. And look at ImageNet training progress: you can train a near-SOTA ImageNet CNN in like a minute for $20-40 after a lot of optimization work. We've already seen a lot of improvements in LMs over the past 2 years... (For example, the main barrier to training GPT-2 is just the bloody memory use from the Transformers exploding at runtime, which pushes you into high-end hardware like cloud TPUs on GCP. Do Sparse Transformers fix that?)
      • p1esk 1707 days ago
        Wait, how can I get to near SOTA on Imagenet in a minute (!) for $40?
      • ZhuanXia 1707 days ago
        Are you going to update your poetry engine now that we have this?
        • gwern 1707 days ago
          Hm, maybe. It depends on how easy their training code is to use and how long retraining would take. It presumably will take at least a week because 345M took about a week, but I'm not sure I want to spend the money on a week of a very large cloud instance (which would be what, $300?) for what is probably a substantial but not stunning improvement in generation quality.

          I might rather wait for the next leap, from something like a Sparse Transformer approach which can get global coherency by having a lookback over the entire poem or getting a better poetry corpus with delimited poems (rather than entire books).

    • the8472 1707 days ago
      You're off by an order of magnitude and omit the caveats to that cost estimation. From the article:

      > The cost of training the model from scratch using our code is about $50k. It’s important to note this figure is the estimated value of the cloud compute, and does not reflect the much smaller intrinsic costs involved

      • p1esk 1707 days ago
        They edited the article after I left a comment there. The original text stated they spent $500k to run all the hyperparameter search experiments to replicate OpenAI results. Only after they did all that work you can run their code for $50k.
    • kuzehanka 1707 days ago
      Where did you get 500k from? They said 50k. In estimated cloud compute costs.
      • p1esk 1707 days ago
        They removed this info for some reason. It takes $50k per training run, and they initially said they spent $500k total on experiments. Only after they did all that work you can run their code for $50k.
  • macawfish 1707 days ago
    It's fun to feed it the first paragraph of the Sokal affair paper:

    Prompt:

    There are many natural scientists, and especially physicists, who continue to reject the notion that the disciplines concerned with social and cultural criticism can have anything to contribute, except perhaps peripherally, to their research. Still less are they receptive to the idea that the very foundations of their worldview must be revised or rebuilt in the light of such criticism. Rather, they cling to the dogma imposed by the long post-Enlightenment hegemony over the Western intellectual outlook, which can be summarized briefly as follows: that there exists an external world, whose properties are independent of any individual human being and indeed of humanity as a whole; that these properties are encoded in ``eternal'' physical laws; and that human beings can obtain reliable, albeit imperfect and tentative, knowledge of these laws by hewing to the ``objective'' procedures and epistemological strictures prescribed by the (so-called) scientific method.

    Completion:

    Before one can explore these ``objective'' principles and procedures, one must do something important, namely, invent the objectivity -- or, more precisely, the objectivity-transcendent properties -- of the ``subjective'' functions of ``man" (as a species of ''objective'', or something-in-itself-independent, human, being). Since the 19th century, this has been done on a literal or phenomenological level, the metanarrative of psychoanalysis. However, the ultimate test of what constitutes an objective fact or structure is the statistical discovery of reliable facts or structures residing in the human mind or body. That is what the discipline of psychology is for. The distinction between the subjective and the objective is therefore necessary for the discipline of psychological science, which is by no means a waste of time, which is today a flourishing field of scientific inquiry, and which might contribute a great deal toward furnishing a genuinely complete theoretical understanding of human behavior and psychical experience.

    In short, the philosophical implications of the post-Enlightenment philosophy of the sciences and of psychical psychology are -- when considered in the context of both human existence and human thought -- deeply unsatisfactory. Once we abandon the most basic postulate of scientific rationality -- that subjective knowledge and belief and behavior are not to be said to be existent but to be, not to be, knowable -- there is nothing left to suggest that the content of human knowledge and experience is dependent upon our conscious perceptions or personalities. We might describe that content as ineffable (it is irreducible to language, outside the experience of individual thinkers), or inconceivable (that which is not one's own consciousness, but is beyond the realm of conscious experience). In either case, there is no genuine need for an objective world, or, at least, no need for an objective foundation. In that sense, it is erroneous to assert that the scientific standpoint is a paradigm of objective reality; rather, it is the antidote to the paradigm of irreducible subjectivity and self-consciousness.

    Unsupported by this analysis of the problem of scientific knowledge, the field of psychical science is often rejected by younger scholars, for whom the study of the human condition and its ineffable psychology is regarded as a particularly enticing but very unapproachable form of mysticism. They regard this section of the theory and practice of modern scientific inquiry as a kind of metaphysical theory-cracking rather than as a vehicle for the discovery of objective facts. Such a view can only be sustained on the basis of the strictest adherence to the doctrines of empiricism and fundamental materialism, respectively, and it is by no means fruitful for the real progress of science. Let me explain.

    The residual emphasis on irreducible subjectivity in modern Western thought (and only by the residual emphasis) is a legacy of the metaphysical concept of objective reality which, to my mind, is still embraced by the cognitivist movement of the 20th century. It is an unfortunate legacy. Irreducible subjectivity and subjectivity-dependent, subjective perceptions were at the root of the metaphysics of Buddhism, and these beliefs now constitute the dominant conception of reality in contemporary Western philosophy, particularly Kantian, Descartesian, and Hegelian thought, which are each deeply indebted to Aristotelian theories of subjectivity.

    While both these views of reality are patently absurd, and since the scientific revolutionary of the middle 19th century was able to disregard or repudiate them, the followers of Kant and Descartes were able to maintain that there is no need for a foundation for the science of knowledge. They could do this because they held to a primitive, problematic conception of objectivity, based on the notion of an objective, external world, in which human consciousness, thus independent of any particular body, mind, or culture, was inchoate, mutable, and subject to change or speculation. There was therefore no need to search for a theory of experience. Science and experience were simply different approaches, of which each was as good as the other, and they both...

  • macawfish 1707 days ago
    Prompt:

    The jig is up. And what now? Where do we go from here?

    Completion:

    Where Do We Go From Here: In the aftermath of the fall of the German Republican party, we now have a significant degree of instability across the earth’s systems of government and finance. The almost complete collapse of systemic forces in the Eurozone and limited success at stabilizing the system means the question is not if but when, what do we do next?

    The answer is simple. We must move beyond the localized, bubble-like, and short-termist “get involved,” tactic of getting into the scene and trying to control it in some way. We have to come up with a way of shifting the socio-political power in the world, the prime place for transformation is worldwide at the supra-system level and not just the economy and finance. We must cast out the old dominated system, of which we have been just a part and recognize that we need a new dominant system that serves human interests, and the meta-level global system must serve human interests.

    The fact that the global status quo is collapsing of its own weight shows us that the system is structured in such a way that the group of big players who have dominated and still dominate, are in ever-decreasing danger of losing both power and integrity.

    The question now is, how? How do we avoid degeneration into chaos and conflict when the anarchic nature of the system leads inevitably to greater and greater competition among and frustration and anger in the younger generations?

    This is a society, this is a planet and we live in the first global century of human history, which the young will pass from generation to generation in the next twenty years, or perhaps not. When we see the events of the last weeks and days, you can just imagine what will happen to this planet, to this planet and human society in the century ahead, and you can just imagine what the future will bring to this and subsequent generations.

    When history describes the past, it sees the collapse of an old political establishment, of the traditional hierarchies of power, of society, of economics and finance. It sees a collapse in the old order of power and in the equilibrium it has created, which was grounded in constant growing jobs and the prosperity it produced. We are in the middle of a permanent expansion of capitalism, which also creates ever-growing wealth and prosperity for a small population of wealthy earners, while social polarization and inequality increase and older people depend on each other more and more desperately.

    The forward and downward momentum of all these forces has created a situation in which there is almost no limit to the volume of the day to day, or minute to minute production and consumption, and in which there is no single concern about the future of the planet Earth. We have become so insatiable, in need, addicted to this ever increasing appetite for consumer goods, that we destroy the planet with it.

    You see this just from what we feed our children, the choices we make, and the products we consume. You see it in our greedy attempts to buy as much as we can, even if it leads to ecological ruin. You see it in our drive to consume new and ever more lavish luxury products, materials, tools, devices, insatiable lifestyles, modern-day imperialism, racism, cynicism, competition, greed, consumerism, hubris, and endless pursuit of personal ambitions and leisure.

    See how when push comes to shove, the social and economic growth created by the continued expansion of capitalism is now a life or death matter. See how the political establishment has failed us, all of us, and how we turned in desperation to another self-serving-self-protective-petty, self-interested-philistine mass-mediator, in the form of Mr. Romney, in order to maintain the old sources of power, to make the old social structures fit to serve human needs and the system could be kept going.

    And now he has packed his bags and wants to leave, so there we are, stuck here with those of us who have found a way to provide for ourselves and live peacefully and prosperously, without the brutality and violence visited on us by politicians and corrupted systems. That is, unless we fix these broken systems and deliver an alternative based on human needs and human compassion.

    How do we do it? How do we get there? Stay tuned and we’ll let you know.<|endoftext|>This exciting book is an overview of a phenomenon that started in the 1970’s, and became the most spectacular of all the urban myths. It combines all things the paranormal in this feature length book, from scientists to aliens to experimental reports.

    William Kean is an astronomer working for NASA. One evening he is on his way to a remote overlook on a Martian hill. Suddenly, he is teleported to the top of a fifty story building, two thousand feet in the air. The building

    • repolfx 1707 days ago
      The sudden lurch into paranormal book review at the end is interesting. I guess it's because it went down the path of, "How do we do it? How do we get there? Stay tuned and we’ll let you know." ... that last sentence is probably very likely to occur in conspiracy theory/paranormal/UFO texts.

      It's very interesting that the model generated a basically coherent speech that could have come from any left-wing event or politician, given nothing more than "things are bad, what next" as a starting point. GPT-2 has correctly learned that Marxist thought is based on a form of catastrophism, as anyone who has read Marx will confirm.

      It's going to be fascinating to see how people use this. My guess is "that sounds like an AI wrote it" will become an insult meaning predictable and content-free.

      Even more fun will be putting the model into reverse and calculating a predictability score: if, given the starting point of a real human-written speech, GPT-2 rates each next word as highly likely, the overall speech can be said to be only N% insightful, where N is an actual scientifically defined measurement.

      Many people seem to adopt dystopian catastrophism about AI but I feel somewhat optimistic. In the same way that automated spelling and grammar checkers can help people write better, a GPT-2 run in reverse could help people write clearer prose that gets to the point quicker, or perhaps even force people to accept when they don't really have anything new to say. If a speaker doesn't use it then someone in their audience will, after all.
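
      Mechanically, that "reverse" mode is just reading off the probabilities the model already computes at each step, roughly the sketch below (Hugging Face GPT-2 port; how you map the per-token probabilities onto an "N% insightful" number is left entirely undefined here):

        # Sketch of a predictability score: how likely does GPT-2 think each
        # token of an existing text is? A high average probability suggests
        # predictable prose. The text and model name are placeholders.
        import torch
        from transformers import GPT2LMHeadModel, GPT2Tokenizer

        tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
        model = GPT2LMHeadModel.from_pretrained("gpt2")
        model.eval()

        text = "We must cast out the old dominated system and build a new one."
        ids = tokenizer.encode(text, return_tensors="pt")

        with torch.no_grad():
            logits = model(ids)[0]                      # [1, seq_len, vocab]
        probs = torch.softmax(logits[0, :-1], dim=-1)   # prediction for each next token
        next_ids = ids[0, 1:]
        token_probs = probs[torch.arange(next_ids.size(0)), next_ids]

        print("mean next-token probability:", token_probs.mean().item())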

      • macawfish 1707 days ago
        Notice that "<|endoftext|>" delimiter. A lot of the samples I generated had that, and would then rapidly switch into a whole different tone or style. Maybe there was an error in their training where they somehow didn't separate training samples properly? I don't know enough about machine learning to say.

        I also find it interesting that this sample got -4 points where the Sokal affair sample I posted got +4 points.

        I imagine it has more to do with the emotions each sample evokes in various hackernews readers. Could it be that hackernews readers are likely to have a distaste for postcolonialism, but are likely to be fans of materialist rationalism? I think so, based on years of reading their comments :)

        • AdamDKing 1707 days ago
          On the <|endoftext|>: GPT-2 and this model were trained by sampling fixed-length segments of text from a set of web pages. So if the sample happens to start near the end of one page then it will fill in the rest of the length with the beginning of another page. The model learns to do the same. TalkToTransformer.com hides this by not showing what comes after the <|endoftext|> token.
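
          To make that concrete, the sample-packing step looks roughly like the sketch below (not the authors' actual pipeline; the 1024-token block size and the <|endoftext|> separator are from GPT-2's published setup, and the toy corpus is a placeholder):

            # Sketch of packing documents into fixed-length training samples:
            # concatenate everything with <|endoftext|> separators, then slice
            # into 1024-token blocks (GPT-2's context length).
            from transformers import GPT2Tokenizer

            tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
            BLOCK_SIZE = 1024
            docs = ["First web page text...", "Second web page text..."]  # placeholder corpus

            stream = []
            for doc in docs:
                stream.extend(tokenizer.encode(doc))
                stream.append(tokenizer.eos_token_id)   # the <|endoftext|> token

            # Each training example is a fixed-length slice of the stream, so a
            # sample can start mid-page and run across <|endoftext|> into the next
            # page, which is exactly the behaviour the generated text mimics.
            samples = [stream[i:i + BLOCK_SIZE] for i in range(0, len(stream), BLOCK_SIZE)]
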
          • macawfish 1705 days ago
            That explains why sometimes the talktotransformer samples are so short!
    • jcims 1707 days ago
      Same prompt to OpenAI's 774M model at talktotransformer.com

      The jig is up. And what now? Where do we go from here? Where do we go if the Democratic Party cannot put together a winning coalition? How do we survive a 2018 presidential election that might not have Democrats at the top of either ticket?

      Well, there are two basic paths that could be explored. First, the party leadership in the Senate and House could continue to make the case for continued obstruction and that is what many in the party fear. They would do so under the assumption that this effort would keep Republicans from the White House in 2018. Republicans who fear Trumpcare and what it means for health insurance is likely to vote to continue Obamacare, which will lead them to lose seats as they did in the 2010 midterms. This would, of course, be one way the party could still maintain control in the Senate and, even with a Democratic filibuster, they might still be able to pass some of their legislation with a simple majority in favor.

      The other path would probably give Democrats more ammunition if the country's anger against Trump is greater than the party's concern with continuing their majority in the Senate. As a result of a Democratic

  • Chris2048 1707 days ago
    I skimmed the article and started reading the last paragraph to get an idea of what it was about. I was very confused...