Dota 2 with Large Scale Deep Reinforcement Learning [pdf]

(cdn.openai.com)

116 points | by hongzi 1595 days ago

7 comments

  • gambler 1595 days ago
    I'm so tired of trying to read these "deep learning" AI papers that deliberately obfuscate what they did and didn't do, often by using ambiguous terminology, over-explaining the domain, and flooding you with low-level detail even in high-level descriptions.

    Each paper should start with an unambiguous description of:

    1. What are the inputs of the model.

    2. What are the outputs of the model.

    3. What is the overall size of the model. Size, not parameter count.

    4. What part of the domain has been manually encoded into the architecture and what has been learned over the training period.

    5. What are the restrictions on the domain compared to real life.

    6. How the performance is evaluated.

    This should be in the first few pages, i.e. the description of what the model does should precede the description of how it does it.

    • erlend_sh 1595 days ago
      Also, the following paragraph is very misleading:

      > OpenAI Five won 99.4% of over 7000 games.

      The players accounting for the remaining fraction played repeated games against the AI and eventually started winning more often than not. The AI had only one strategy (the deathball), and once top-skill-tier players learned how to play against it, they had a >50% chance of winning.

      • ionforce 1595 days ago
        That's not misleading if it's literally true.
        • buzzerbetrayed 1595 days ago
          Something being literally true but causing people to think something that isn’t true is pretty much the very definition of misleading.

          You don’t say something is misleading if it isn’t true. You say it is untrue.

        • Mirioron 1595 days ago
          It's misleading because many people will read that and assume that the AI is nigh-unbeatable by players. But that's only the case against players who haven't played against it before.
          • rictic 1595 days ago
            FWIW I read it as just meaning that the AI is very impressive, but might or might not be competitive at the highest levels of play.

            Like, if you took a world-class team and had them play against random opponents I'd be surprised if they lost more than one game in a hundred.

            For comparison with chess: per the Elo guidelines, a beginner has a rating of around 800, an average player is ~1500, a professional is ~2200, and only four people have a rating of ~2800. On those numbers we'd expect Magnus Carlsen to almost never lose against an average player, and to win around 99% of games against a low-tier professional.
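
            For anyone who wants the numbers, the standard Elo expected-score formula (a quick sketch; the ratings below are illustrative):

              # Elo expected score: E_A = 1 / (1 + 10 ** ((R_B - R_A) / 400))
              def elo_expected_score(r_a, r_b):
                  """Expected score (roughly win probability) of A against B."""
                  return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

              print(elo_expected_score(2800, 1500))  # ~0.9994 vs an average player
              print(elo_expected_score(2800, 2200))  # ~0.97 vs a low-tier professional

            So a 600-point gap already puts the stronger player around 97%, in the ballpark of the numbers above.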

          • gcarvalho 1595 days ago
            I don’t remember anymore, but were those 7000 matches against high-level players? Even the default bots in Dota 2 can beat some median-and-below parties, and those bots are very, very bad and outdated.
            • sorenn111 1594 days ago
              Yes, OpenAI Five played OG, who won The International (the biggest Dota 2 tournament in the world) two years in a row. OpenAI Five beat them two games in a row in a best of three.
              • josefx 1594 days ago
                As far as I remember, none of the OpenAI Five games followed the standard Dota 2 rules of the time: things like one courier per player instead of players fighting over a single one, and the same heroes on both teams. Even I can win a game if I get to set the rules beforehand.
    • vagab0nd 1594 days ago
      Agreed. I often find reading the accompanying source code more useful for understanding the model.

      > 3. What is the overall size of the model. Size, not parameter count.

      What's the difference?
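
      If the intended distinction is memory footprint rather than raw count, maybe something like this (my guess, not the parent's definition):

        import numpy as np

        # The same parameter count occupies different amounts of memory
        # depending on precision (and training adds optimizer state on top,
        # e.g. Adam keeps two extra buffers per weight).
        n_params = 10_000_000
        for dtype in (np.float32, np.float16):
            size_mib = n_params * np.dtype(dtype).itemsize / 1024**2
            print(dtype.__name__, f"{size_mib:.0f} MiB")  # ~38 MiB vs ~19 MiB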

  • hongzi 1595 days ago
    I'm really impressed by the "surgery" operations that keep reusing models, as opposed to tossing old models and retraining from scratch when some small part of the game changes. Appendix B has a pretty good deep dive.
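
    As I read it, the basic trick is to initialize the grown network so it computes exactly the same function as the old one. A minimal numpy sketch of one such operation (zero-initializing the weights for newly added observation features; the shapes and names are made up, not from the paper):

      import numpy as np

      def grow_input_dim(W, b, n_new):
          """Add n_new input features to a dense layer y = x @ W + b.

          The new weight rows are zeros, so on old inputs the output is
          unchanged and training can resume instead of starting over.
          """
          n_in, n_out = W.shape
          W_new = np.vstack([W, np.zeros((n_new, n_out), dtype=W.dtype)])
          return W_new, b  # bias is untouched

      rng = np.random.default_rng(0)
      W, b = rng.normal(size=(8, 4)), rng.normal(size=4)
      W2, b2 = grow_input_dim(W, b, n_new=2)  # 8 -> 10 input features
      x_old = rng.normal(size=8)
      x_new = np.concatenate([x_old, rng.normal(size=2)])  # 2 new features
      assert np.allclose(x_old @ W + b, x_new @ W2 + b2)  # same output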
  • stargazing 1595 days ago
    As a long time player and fan of Dota 2, it's really exciting to see a company like OpenAI taking an interest in the game. I still remember when they showcased their AI for the first time by beating some of the best players in the world 1v1. It was an unreal feat at the time.
  • minimaxir 1595 days ago
    For context, this is the just-released paper from OpenAI about OpenAI Five: https://twitter.com/OpenAI/status/1205541134925099008
  • h0bzii 1594 days ago
    I'm deeply disappointed in OpenAI for this. I was looking forward to them playing the full game against the world champions; instead they played a very limited version of the game, and one the world champions had never played before, mind you. And then they call it a win and case closed.

    1. They should play the full game without restrictions.

    2. They should have the same inputs and outputs as humans: the graphics and sound card outputs as input, and a direct link to USB as output (let the AI be a mouse and keyboard driver). I don't think the bot should need any artificial delay under those circumstances.

  • reroute1 1595 days ago
    > "By defeating theDota 2 world champion (Team OG), OpenAI Five demonstrates that self-play reinforcementlearning can achieve superhuman performance on a difficult task."

    Isn't this a bit of a leap, though? It came with massive caveats to the game:

    1. Drafting was from a 17-hero pool. 17 out of 115??

    2. No summons or illusions. Again, this drastically reduces the possibilities in game.

    3. This AI is trained against real players for years, so it has enormous experience against this type of opponent. The opposite applies to humans who never compete against bots, so they have no experience against this type of opponent. If I recall correctly, the more people played against the bots, the better the humans performed in successive games. Winning two games with all these restrictions and caveats is still impressive, but it feels like they overstate things. Not to mention the flawless mechanics and communication between the bots...

    • joefkelley 1595 days ago
      People tend to focus on what has been left out, but think about what they actually did learn about:

      Drafting from that pool, item and skill builds, last-hitting, creep aggro, laning in general, jungling, item and spell usage, ganking, team-fight positioning, pushing objectives, warding, map control, farm priority, when to retreat vs engage. All of these require an understanding of micro vs macro goals and how they relate.

      Surely this qualifies as "a difficult task."

      • hychoi99 1594 days ago
        Dota's built-in bot can already do most of that
        • me_me_me 1592 days ago
          Dota bots are useless. They can be easily exploited and defeated.

          And that’s coming from someone who is on the worse end of the bell curve :)

    • the8472 1595 days ago
      > This AI is trained against real players for years,

      AIUI the bulk of the training was self-play.

      > Not to mention the flawless mechanics and communication between the bots...

      The bots had no communication channel.

      > The opposite applies to humans who never compete against bots

      That was covered by the OpenAI open matches. Humans could play against the AI for several days and look for exploitable flaws. Most didn't find any; a handful did. That is pretty impressive considering that the humans can learn while the AI is frozen during those matches.
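
      For reference, "self-play" here roughly means a loop like the one below; the paper mixes games against the current self with games against frozen past versions. This is a schematic with a hypothetical agent/env API, not OpenAI's actual code:

        import random

        def self_play_training(agent, env, steps, past_frac=0.2):
            """Schematic self-play loop with a pool of frozen past opponents."""
            opponent_pool = [agent.snapshot()]  # frozen copies, never trained
            for step in range(1, steps + 1):
                if random.random() < past_frac:
                    opponent = random.choice(opponent_pool)  # a frozen past self
                else:
                    opponent = agent  # the current, still-learning self
                trajectory = env.play_match(agent, opponent)
                agent.update(trajectory)  # only the live agent learns
                if step % 1000 == 0:
                    opponent_pool.append(agent.snapshot())
            return agent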

      • reroute1 1595 days ago
        > AIUI the bulk of the training was self-play.

        Maybe, but it also played against humans many times across a span of years.

        > The bots had no communication channel.

        The execution between them was flawless though, acting as one mind, whereas a team of humans has to communicate ideas. It’s a clear advantage, but it feels hacky in a way, since it’s not really comparing the same thing: 5 humans are a diverse group of people. Maybe it’s not fair to knock the bots for this behavior, though.

        > That was covered by the openai open matches. humans could play against the AI for several days and find exploitable flaws. Most didn't, a handful did. That is pretty impressive considering that humans can learn while the AI is frozen during those matches.

        I'm talking about playing against OG, who didn't spend days playing against bots. Beating regular players is great, but not the accomplishment they are portraying. The bots had a pretty specific playstyle, and if pros had dedicated time to beating them I think there would be different results.

        • the8472 1595 days ago
          > Maybe, but it also played against humans many times across a span of years.

          Those are evaluation matches; they don't feed into the training data.

          > The bots had a pretty specific playstyle, and if pros had dedicated time to beating them I think there would be different results.

          Well, the pros could have joined the open games too; I don't know if any did.

          • reroute1 1595 days ago
            Yeah, I think you're right about the training data, so that would be more impressive. I'd still like to see a team dedicate some time to it, but that's really not in a pro team's interest. The things that affect real tournaments, like drafting, would also be really interesting to see some day, as drafting is a huge part of the game and considered one of its more cerebral aspects.
        • gwd 1595 days ago
          > The execution between them was flawless though, acting as one mind, whereas a team of humans has to communicate ideas.

          A team of humans that has played together extensively will often intuitively know what the other human will do in a given situation without explicit communication. The fact that humans can also coordinate in unusual circumstances where intuition is insufficient is an advantage that the current version of the AI lacks.

    • dx87 1595 days ago
      I don't think it's overstating it too much. Even with the restrictions, the bots are still performing well above the capabilities of the vast majority of human players. The restriction on illusions and summons was also for the benefit of the players; the OpenAI team didn't want the bots to win through flawless micro skills. For point 3, even though the players didn't get to train against bots, they have the advantage of being able to learn and react accordingly. Since the bots can only learn when their models are being trained, they're trivial to beat if you use a novel technique that they haven't seen before.
      • gdxhyrd 1595 days ago
        If you want to keep the bot from winning on micro alone, you add delays, cooldowns on interactions, clicking imperfection, misclicks, etc. at the level of a human pro.

        In any case, simplifying the game is usually done to make training far cheaper.

        • john-radio 1595 days ago
          But the human competitors also require large amounts of training to play Dota 2 competently, and their training is not simplified in a similar way. I realize that "fairness" is not really the point of having humans play against bots, but doesn't it reduce the usefulness of the comparison when the task is a human activity?
      • reroute1 1595 days ago
        The paper mentions not using illusions and summons because of the added technical complexity required to make them work:

        "We removed these to avoid the added technical complexity of enabling the agent to control multiple units."

        So maybe it would have been to their advantage, but it's also not something that could easily be accomplished technically at this time?

        • willhk 1595 days ago
          The work I've seen recently from them on multi-agent play seems to indicate this is a problem they're very successfully working on. https://youtu.be/kopoLzvh5jY
      • swfsql 1595 days ago
        Just the fact that the players aren't wandering around randomly is already an astonishing accomplishment to my simple eyes.
    • justicezyx 1595 days ago
      Plus, the match happened right after OG had roster changes, at probably the weakest point in the team's history, well below the TI-champion form they achieved long after this match was held.

      Well, anyway, I am OK with some degree of PR embellishment...

  • iamjudged 1595 days ago
    I’ve found various recent complex-strategy-game AI efforts very interesting, but I always have one key complaint: they don't properly constrain the mechanical execution of their AI to realistic human levels for comparison. If you give the AI an unfair mechanical advantage, your model doesn't have to learn nearly as good a strategy. I will say, however, that this is the closest to realistic I have seen, but it is still lacking.

    The two main measurable parameters of performance are:

    1. Reaction time

    2. Rate/volume of actions (i.e. Actions Per Minute)

    And I would argue there should be an additional consideration of some form of:

    3. Mouse-click accuracy

    I read through the details of the implementation, and they did decently on 1 and 2, but overall they need to do better.

    Their reaction times end up as a random draw between 170-270ms. I think raw, simple visual reaction time for a pro gamer could be ~200ms, BUT that’s just for a simple “click this button when the light changes” type of test. There are “complex reaction time” tests where you sometimes click but other times don’t (e.g. a red or green light), and reaction times in that case are around ~400ms. I think if a pro is in a game situation where they anticipate their opponent will take some action and are ready to respond immediately, 200ms is a fair reaction time. But that’s not the usual state throughout a game, and the bot effectively has that perfect-anticipation mindset at all times. So not crazy, superhuman reactions, but definitely not completely realistic/fair either.

    In regard to action rate, they allow the model to take one action every ~133 ms (7.5 actions per second), which translates to 450 APM. The very best pro gamers are in the 300-350 APM range. And I think a human’s actions include various thoughtless click spamming (which an AI doesn’t need to do), as well as visual map movement/unit examination that an AI needs much less of, given a direct, comprehensive feed of the available information. So the sustained 450 APM seems pretty superhuman to me. BUT Dota 2 is much less of an APM-intensive game, and certainly sustained APM isn’t as important. And humans can hit higher APM in important burst moments, whereas this AI sits at an exact fixed rate of 450 APM. So all in all, the APM is maybe fair (at least close to fair).
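
    (Sanity-checking that conversion:)

      apm = 450
      actions_per_second = apm / 60           # 7.5 actions per second
      ms_between_actions = 1000 / actions_per_second
      print(ms_between_actions)               # ~133 ms, i.e. every 4th frame at 30 fps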

    The mouse-click accuracy piece, however, is pretty unfair if the AI can make precise clicks across the screen with no effect on reaction time. This factor isn’t considered at all by the AI team. I feel they should either add in some randomization to simulate inaccuracy, or delay the reaction time based on how far the mouse would have to move.
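
    Something like the sketch below is what I have in mind: Fitts's law for the distance-based delay, plus Gaussian jitter for inaccuracy (the constants are illustrative, not calibrated to real players):

      import math
      import random

      def humanized_click(target_x, target_y, cursor_x, cursor_y, width=20.0):
          """Delay and perturb a click to approximate human pointing.

          Fitts's law: movement time grows with log2(distance / width + 1).
          """
          a, b = 0.05, 0.12  # seconds: base cost and cost per bit of difficulty
          dist = math.hypot(target_x - cursor_x, target_y - cursor_y)
          delay = a + b * math.log2(dist / width + 1)
          jitter = max(1.0, 0.01 * dist)  # farther targets -> sloppier clicks
          click = (random.gauss(target_x, jitter), random.gauss(target_y, jitter))
          return delay, click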

    With all these factors combined, I still feel this is not quite a fair test. But it’s closer than others I’ve seen, and it’s still a very impressive overall achievement! I’d love to see them go the small extra distance of constraining these mechanical performance parameters just a bit more. I feel that would make a BIG difference in the level of strategy required to beat the best humans. They’re SOOO close to amazing me!

    • orbital-decay 1595 days ago
      Yeah, the low-level motor and sensory part is what's actually hard to get right. Current AI is good enough to figure out the sensory stuff, but it still works with the game input much more directly than humans do. However, for that to change it needs a precise biomechanical model of what human players use; is something like that available at all?
      • chii 1595 days ago
        Create an actuated finger/mouse+keyboard combination that moves at realistic human speeds (e.g., signal speed and actuation speed). Have the AI output controls for this device, so the mouse has to be physically moved rather than allowing precise x-y coordinate inputs like the bot has now.