Hi,
I am using pybullet environments to benchmark some model-based RL algorithms. In my work I need access to the reward function of the environments, so I usually write a function that takes a state and an action as inputs and outputs the reward. However, for locomotion environments it is not trivial to write this function on my own by looking at the pybullet GitHub code. So my question is: has anyone implemented this kind of function for locomotion tasks?
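For reference, here is a minimal sketch of the kind of function I mean, using the standard Gym HalfCheetah reward shape (forward progress minus a control cost). The dt and ctrl_cost_weight values are assumptions on my part and would need to be checked against the pybullet source for the exact env.

Code: Select all

```python
# Sketch only: the standard MuJoCo-style HalfCheetah reward shape.
# dt and ctrl_cost_weight are assumed values; verify them against the
# pybullet/Gym source for your environment version.
import numpy as np

def half_cheetah_reward(x_before, x_after, action, dt=0.05, ctrl_cost_weight=0.1):
    # forward progress per second minus a quadratic control penalty
    forward_reward = (x_after - x_before) / dt
    ctrl_cost = ctrl_cost_weight * float(np.square(np.asarray(action)).sum())
    return forward_reward - ctrl_cost
```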
Thank you!
Rewards function
- Posts: 4
- Joined: Wed Jun 03, 2020 2:13 pm
Re: Rewards function
I found another way to simulate the reward function of a MuJoCo environment without actually defining it. Basically, I create a new environment as a "copy" of the first one, as explained here viewtopic.php?f=24&t=12855&p=42599. I then apply the action to the second env and collect the reward (for this to work, you also need to set env.robot.pos_after of the copy equal to that of the first env, otherwise the potential term differs). After computing the reward, I close the env, since this step is repeated many times in my RL algorithm.
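In code, the trick looks roughly like this. make_env_copy is just a placeholder for however you build the copy env, and the robot.pos_after sync is the detail mentioned above; the attribute names may differ across pybullet_envs versions.

Code: Select all

```python
# Sketch of the copy-env trick: step a throwaway copy of the environment
# to read off the reward without disturbing the original.
# `make_env_copy` is a placeholder for however you construct the copy env.
def simulate_reward(make_env_copy, pos_after, action):
    env = make_env_copy()
    env.reset()
    # sync the forward-progress bookkeeping so the potential term matches
    env.robot.pos_after = pos_after
    _, reward, _, _ = env.step(action)
    env.close()  # closing on every call many times per iteration
    return reward
```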
However, after computing 5 rewards, my loop is interrupted because pybullet is no longer connected to the physics server. The cause seems to be a delay in env.close(), as reported here viewtopic.php?t=12722.
How can I solve this bug?
I am using PyBullet 2.5.9 and Python 3.6.9
- Site Admin
- Posts: 4221
- Joined: Sun Jun 26, 2005 6:43 pm
- Location: California, USA
Re: Rewards function
That is a very old pybullet version; can you update to the latest and check again?
Code: Select all
pip3 install pybullet --upgrade --user
No, that is not the issue and not a bug.
The Gym env creates a connection during 'reset', not at gym.make, so you cannot use 'p' until after env.reset.
In that 'bug' report, 'p' is used before reset is called.
Also, you should not mix a global pybullet 'p' with a bullet_client 'env._p'.
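If you want to guard against this, a small helper (just a sketch, not part of pybullet) can enforce the rule:

Code: Select all

```python
# Sketch: fetch the env's own bullet_client, failing loudly if reset()
# has not been called yet (the connection is only created inside reset).
def get_client(env):
    client = getattr(env, "_p", None)
    if client is None:
        raise RuntimeError("call env.reset() first: the physics-server "
                           "connection is created there")
    return client
```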
Can you create a small reproduction case and file it at https://github.com/bulletphysics/bullet3/issues?
- Posts: 4
- Joined: Wed Jun 03, 2020 2:13 pm
Re: Rewards function
Hi,
Thank you for your reply. I downgraded to that version of pybullet because, when making an environment (e.g. env = gym.make('HalfCheetahMuJoCoEnv-v0')) with pybullet 2.8.1, I get the error: ImportError: cannot import name 'bullet_client'. This does not happen with pybullet 2.5.9. I am not sure why, but since I am rather new to pybullet, I trusted other people who suggested downgrading.