I am trying to train the SimpleHumanoid to walk (though I'd be happy to get it working with whatever the reward is currently set to) using the OpenAI baselines algorithms, starting with deepq since it is used in most examples (ultimately I'd like to compare different algorithms and actors).
I tried a basic setup:
from pybullet_envs.bullet.simpleHumanoidGymEnv import SimpleHumanoidGymEnv
from baselines import deepq

env = SimpleHumanoidGymEnv(renders=True)
env.reset()
print(env.action_space)

# first tried the cnn_to_mlp model from the racecarZED example:
# model = deepq.models.cnn_to_mlp(
#     convs=[(32, 8, 4), (64, 4, 2), (64, 3, 1)],
#     hiddens=[256],
#     dueling=False,
# )
model = deepq.models.mlp([64])
act = deepq.learn(env,
                  q_func=model,
                  lr=1e-3,
                  max_timesteps=100000,
                  buffer_size=50000,
                  exploration_fraction=0.1,
                  exploration_final_eps=0.02,
                  print_freq=10,
                  callback=callback)  # callback is defined elsewhere in my script
because the actions provided by deepq are a single integer instead of an array. I guess the model only computes one action and I don't know how to make it generate more. simpleHumanoidGymEnv.py sets action_space to Discrete(9); I don't know if that's related, or why it is 9. Here is some debug output from the env, followed by the error:

self.motors
[0, 1, 3, 5, 6, 7, 9, 12, 13, 14, 16, 19, 20, 22, 24, 25, 27]
num motors
17
actions
0
Traceback (most recent call last):
  File "doodlinghuman.py", line 84, in <module>
    main()
  File "doodlinghuman.py", line 48, in main
    callback=callback)
  File "/usr/lib/python3.6/site-packages/baselines/deepq/simple.py", line 244, in learn
    new_obs, rew, done, _ = env.step(env_action)
  File "/usr/lib/python3.6/site-packages/pybullet_envs/bullet/simpleHumanoidGymEnv.py", line 87, in _step
    self._humanoid.applyAction(action)
  File "/usr/lib/python3.6/site-packages/pybullet_envs/bullet/simpleHumanoid.py", line 124, in applyAction
    forces[m] = self.motor_power[m]*actions[m]*0.082
IndexError: invalid index to scalar variable.
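If I reproduce that indexing in isolation, a scalar action fails the same way: deepq treats a Discrete space as "pick one integer", so env.step() receives a single numpy scalar, while applyAction indexes actions[m] for each of the 17 motors.

```python
import numpy as np

# the kind of scalar action deepq emits for a Discrete(9) space
action = np.int64(3)

try:
    action[0]  # simpleHumanoid.applyAction does actions[m] per motor
except IndexError as e:
    print(e)  # prints: invalid index to scalar variable.
```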
I tried using the cnn_to_mlp deepq model shown in the racecarZED training example and got another error (ValueError: ('Convolution not supported for input with rank', 2)), although I assume the convs parameter would need to be chosen appropriately anyway. [Unrelated: after a few training iterations the racecarZED would just stay in place and not learn anything, so I couldn't train it either; the variant that has access to the ball's coordinates did train properly, but that feels much less realistic and the task doesn't seem too hard.]
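As far as I understand, that ValueError is expected here: conv layers want batched image input of rank 4 (N, H, W, C), while this env returns a flat state vector, which is only rank 2 once baselines adds the batch dimension (the vector length below is made up for illustration):

```python
import numpy as np

obs = np.zeros(43)       # hypothetical flat state vector (rank 1)
batch = obs[None, :]     # baselines adds a batch dimension -> rank 2
print(batch.ndim)        # 2, too low for a convolution layer
```

So the mlp model seems like the right choice for this env, and cnn_to_mlp only makes sense for camera-observation envs like racecarZED.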
I'd greatly appreciate any help with training the humanoid. It must have been done before, since it completes some much harder tasks in the examples, but the training code doesn't seem to be provided.
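In the meantime, my best guess at a workaround is to translate the scalar action into a motor vector before it reaches applyAction. A minimal sketch with made-up names (DiscreteToMotorVector is not a pybullet_envs or baselines API, and a real mapping would presumably need per-motor combinations rather than one shared scale):

```python
import numpy as np

class DiscreteToMotorVector:
    """Hypothetical helper: map a discrete action index to a full
    continuous motor command for all motors."""

    def __init__(self, num_motors=17, scales=(-1.0, -0.5, 0.0, 0.5, 1.0)):
        self.num_motors = num_motors
        self.scales = scales
        self.n = len(scales)  # size of the Discrete space this implies

    def __call__(self, discrete_action):
        # drive every motor with the same preset scale
        return np.full(self.num_motors, self.scales[discrete_action])

convert = DiscreteToMotorVector()
print(convert(4))  # 17 motors, all at +1.0
```

Is something like this the intended way to use deepq with this env, or is a continuous-action algorithm the better fit here?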