I have been meaning to write up this fun little experiment for a while now, after working on earlier articles on Reinforcement Learning (RL) and Evolution Strategies (ES), which you may have seen in a previous HN discussion.
While trying to solve a few RL/ES problems, I found the BipedalWalkerHardcore task extremely frustrating (actually much harder than all of the standard MuJoCo tasks that most papers are based on), but was eventually able to crack it after some effort. I suspected that the agent's body was not really suited to the task, and that even minor tweaks to the body would make it easier for an RL algorithm to learn a good set of parameters for the agent's controller neural network.
There has been an exciting line of work on Passive Robotics, where researchers such as Tad McGeer and Steve Collins made robots that walk naturally on their own without using any external power, unlike complicated, inefficient robots like the Asimo, which has motors at every joint, all managed by a central computer. In some ways, many standard RL tasks are similar to the Asimo model: we train a neural network to control a fixed, pre-determined robot. I thought it might be an interesting little experiment to let the RL algorithm learn not only the parameters of the neural network controller, but also, at the same time, a set of parameters that describe the structure of the agent's body.
We also see work done using evolution, such as Strandbeests, virtual creatures by Karl Sims, and Soft Robots, where novel morphology is being discovered (see also the excellent course on evolutionary robotics by Josh Bongard). While RL is great at many problems, I feel one of its limitations is in discovering novel structures, although there have been recent attempts. At the same time, RL is much more sample efficient at searching the parameter space of a pre-defined design, which is what this article tries to explore, starting with only the simplest of all RL algorithms. Hopefully it will spark more life and discussion in the area of morphology learning and generative design in the RL community.
Was wondering if you're also releasing your slightly modified version of the OpenAI Gym that lets you tweak the environment? I hacked something together for myself but am wondering if you have a cleaner solution.
There are only 8 params for the biped walker, compared to thousands of params for the policy network. Given that, it should be possible to scale to thousands of environment parameters, for instance if we want to learn the material configuration of each voxel of a soft-body robot assembled from thousands of voxels.
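To make the 8-vs-thousands point a bit more concrete, here is a rough sketch of how body and policy parameters could be packed into one flat vector so that a single optimizer searches over both at once. Everything in it is an assumption for illustration: the policy size `N_POLICY`, the helper names `split`, `init_theta`, and `search`, and especially `toy_fitness`, which is a made-up stand-in for an actual environment rollout. The optimizer is a plain elitist random search, swapped in for simplicity; it is not the algorithm used in the article.

```python
import numpy as np

N_BODY = 8        # body parameters (the thread mentions 8 for the biped walker)
N_POLICY = 1000   # hypothetical policy size, standing in for "thousands"


def split(theta):
    """Split one flat parameter vector into (body, policy) parts."""
    return theta[:N_BODY], theta[N_BODY:]


def init_theta(rng):
    """Hypothetical initial guess: body scales of 0.5, small random policy weights."""
    body = np.full(N_BODY, 0.5)
    policy = rng.normal(0.0, 0.1, N_POLICY)
    return np.concatenate([body, policy])


def toy_fitness(theta):
    """Made-up stand-in for a rollout score (higher is better).

    A real version would build the walker from `body`, run the controller
    defined by `policy`, and return the episode reward.
    """
    body, policy = split(theta)
    return -np.sum((body - 1.0) ** 2) - np.mean(policy ** 2)


def search(n_iters=50, pop=16, sigma=0.05, seed=0):
    """Elitist random search over the joint (body + policy) vector."""
    rng = np.random.default_rng(seed)
    best = init_theta(rng)
    best_fit = toy_fitness(best)
    for _ in range(n_iters):
        # Perturb body and policy parameters together.
        candidates = best + sigma * rng.normal(size=(pop, best.size))
        fits = np.array([toy_fitness(c) for c in candidates])
        i = int(np.argmax(fits))
        # Keep a candidate only if it beats the current best.
        if fits[i] > best_fit:
            best, best_fit = candidates[i], fits[i]
    return best, best_fit


if __name__ == "__main__":
    best, best_fit = search()
    body, policy = split(best)
    print("body params:", body.shape, "policy params:", policy.shape)
    print("best toy fitness:", best_fit)
```

Since the optimizer only ever sees one flat vector, growing `N_BODY` from 8 to thousands (say, one entry per voxel) changes nothing structurally; whether the search stays efficient at that size is a separate question this sketch does not answer.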