I am trying to train the SimpleHumanoid to walk (though I'd be happy to get it working with whatever the reward is currently set to) using the OpenAI baselines algorithms, starting with deepq since it is used in most examples (ultimately I'd like to compare different algorithms and actors).
I tried a basic setup:
from pybullet_envs.bullet.simpleHumanoidGymEnv import SimpleHumanoidGymEnv
from baselines import deepq

env = SimpleHumanoidGymEnv(renders=True)
env.reset()
print(env.action_space)

# first tried the cnn_to_mlp model from the racecarZED example:
# model = deepq.models.cnn_to_mlp(
#     convs=[(32, 8, 4), (64, 4, 2), (64, 3, 1)],
#     hiddens=[256],
#     dueling=False,
# )
model = deepq.models.mlp([64])
act = deepq.learn(env,
                  q_func=model,
                  lr=1e-3,
                  max_timesteps=100000,
                  buffer_size=50000,
                  exploration_fraction=0.1,
                  exploration_final_eps=0.02,
                  print_freq=10,
                  callback=callback)  # callback is defined elsewhere in my script
because the actions provided by deepq are a single integer instead of an array. I guess the model only computes one action and I don't know how to make it generate more. simpleHumanoidGymEnv.py sets action_space to Discrete(9); I don't know if that's related, or why it is 9. Here is some debug output from the env, followed by the error:

self.motors
[0, 1, 3, 5, 6, 7, 9, 12, 13, 14, 16, 19, 20, 22, 24, 25, 27]
num motors
17
actions
0
Traceback (most recent call last):
  File "doodlinghuman.py", line 84, in <module>
    main()
  File "doodlinghuman.py", line 48, in main
    callback=callback)
  File "/usr/lib/python3.6/site-packages/baselines/deepq/simple.py", line 244, in learn
    new_obs, rew, done, _ = env.step(env_action)
  File "/usr/lib/python3.6/site-packages/pybullet_envs/bullet/simpleHumanoidGymEnv.py", line 87, in _step
    self._humanoid.applyAction(action)
  File "/usr/lib/python3.6/site-packages/pybullet_envs/bullet/simpleHumanoid.py", line 124, in applyAction
    forces[m] = self.motor_power[m]*actions[m]*0.082
IndexError: invalid index to scalar variable.
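If I reproduce that indexing in isolation, a scalar action fails the same way: deepq treats a Discrete space as "pick one integer", so env.step() receives a single numpy scalar, while applyAction indexes actions[m] for each of the 17 motors.

```python
import numpy as np

# the kind of scalar action deepq emits for a Discrete(9) space
action = np.int64(3)

try:
    action[0]  # simpleHumanoid.applyAction does actions[m] per motor
except IndexError as e:
    print(e)  # prints: invalid index to scalar variable.
```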
I tried using the cnn_to_mlp deepq model shown in the racecarZED training example and got another error (ValueError: ('Convolution not supported for input with rank', 2)), although I assume the convs parameter would need to be chosen appropriately anyway. [Unrelated: after a few training iterations the racecarZED would just stay in place and not learn anything, so I couldn't train it either; the variant that has access to the ball's coordinates did train properly, but that feels much less realistic and the task doesn't seem too hard.]
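As far as I understand, that ValueError is expected here: conv layers want batched image input of rank 4 (N, H, W, C), while this env returns a flat state vector, which is only rank 2 once baselines adds the batch dimension (the vector length below is made up for illustration):

```python
import numpy as np

obs = np.zeros(43)       # hypothetical flat state vector (rank 1)
batch = obs[None, :]     # baselines adds a batch dimension -> rank 2
print(batch.ndim)        # 2, too low for a convolution layer
```

So the mlp model seems like the right choice for this env, and cnn_to_mlp only makes sense for camera-observation envs like racecarZED.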
I'd greatly appreciate any help with training the humanoid. It must have been done before, since it completes some much harder tasks in the examples, but the training code doesn't seem to be provided.
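In the meantime, my best guess at a workaround is to translate the scalar action into a motor vector before it reaches applyAction. A minimal sketch with made-up names (DiscreteToMotorVector is not a pybullet_envs or baselines API, and a real mapping would presumably need per-motor combinations rather than one shared scale):

```python
import numpy as np

class DiscreteToMotorVector:
    """Hypothetical helper: map a discrete action index to a full
    continuous motor command for all motors."""

    def __init__(self, num_motors=17, scales=(-1.0, -0.5, 0.0, 0.5, 1.0)):
        self.num_motors = num_motors
        self.scales = scales
        self.n = len(scales)  # size of the Discrete space this implies

    def __call__(self, discrete_action):
        # drive every motor with the same preset scale
        return np.full(self.num_motors, self.scales[discrete_action])

convert = DiscreteToMotorVector()
print(convert(4))  # 17 motors, all at +1.0
```

Is something like this the intended way to use deepq with this env, or is a continuous-action algorithm the better fit here?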