Rewards function

Official Python bindings with a focus on reinforcement learning and robotics.
garcieri
Posts: 4
Joined: Wed Jun 03, 2020 2:13 pm

Rewards function

Post by garcieri »

Hi,

I am using the pybullet environments to benchmark some model-based RL algorithms. My work requires access to the environments' reward functions, so I usually write a function that takes a state and an action as inputs and outputs the reward. However, for the locomotion environments it is not trivial to write this function on my own by looking at the pybullet GitHub code. So my question is: has anyone implemented this kind of function for the locomotion tasks?
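For reference, this is roughly the interface I have in mind, sketched for a HalfCheetah-style task. The state indexing and the cost weight are placeholders I made up, not pybullet's actual values, and the progress term needs the next state, which is part of why I find this hard to write in closed form.

Code:

import numpy as np

def reward_fn(state, action, next_state, dt=0.0165):
    # Placeholder sketch: forward progress minus a control cost, in the style of
    # the MuJoCo locomotion tasks. Index 0 is assumed to hold the torso
    # x-position and 0.1 is an arbitrary weight; neither matches pybullet.
    progress = (next_state[0] - state[0]) / dt
    ctrl_cost = 0.1 * float(np.square(action).sum())
    return progress - ctrl_cost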

Thank you!
garcieri
Posts: 4
Joined: Wed Jun 03, 2020 2:13 pm

Re: Rewards function

Post by garcieri »

I found another way to evaluate the reward function of a MuJoCo environment without actually defining it. Basically, I create a new environment as a "copy" of the first one, as explained here viewtopic.php?f=24&t=12855&p=42599, then apply the action to the second env and collect the reward (to make this work, you also need to set env.robot.pos_after equal to that of the first env, otherwise the potential is different). After computing the reward, I close the env, since this step is repeated many times in my RL algorithm.
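For clarity, the loop looks roughly like this. The state-copying step itself follows the linked thread and is only indicated by a comment, and env.robot / pos_after are accessed the way I described above:

Code:

import gym
import pybullet_envs  # registers the pybullet gym environments

def simulate_reward(env, action):
    # Make a throwaway "copy" of the environment just to evaluate one reward.
    env_copy = gym.make(env.spec.id)
    env_copy.reset()  # the pybullet connection is created here
    # ... copy the joint/base state of env into env_copy, as in the linked post ...
    env_copy.robot.pos_after = env.robot.pos_after  # keep the potential consistent
    _, reward, _, _ = env_copy.step(action)
    env_copy.close()  # close the copy, since this step is repeated many times
    return reward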
However, after computing 5 rewards my loop is interrupted because pybullet is no longer connected to the physics server. The reason seems to be a delay in env.close(), as reported here viewtopic.php?t=12722.

How can I solve this bug?

I am using PyBullet 2.5.9 and Python 3.6.9
Erwin Coumans
Site Admin
Posts: 4221
Joined: Sun Jun 26, 2005 6:43 pm
Location: California, USA

Re: Rewards function

Post by Erwin Coumans »

garcieri wrote: Fri Jun 05, 2020 12:13 pm I am using PyBullet 2.5.9 and Python 3.6.9
That is a very old pybullet version; can you update to the latest and check again?

Code:

pip3 install pybullet --upgrade --user
>> delay with env.close() as reported here

No, that is not the issue and not a bug.
The Gym env creates a connection during 'reset', not at gym.make, so you cannot use 'p' until after env.reset.
In that 'bug' he uses 'p' before calling reset.
Also, you should not mix a global pybullet 'p' with a bullet_client 'env._p'.
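Something along these lines should work. The env.env._p path is where the base env stores its BulletClient after reset; adjust it if your wrapper stack differs.

Code:

import gym
import pybullet_envs  # registers the pybullet gym environments

env = gym.make('HalfCheetahBulletEnv-v0')
obs = env.reset()  # the connection to the physics server is created here, not in gym.make

# Use the env's own client instead of the global pybullet module 'p'.
client = env.env._p
print(client.getNumBodies())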
Can you create a small reproduction case and file it in https://github.com/bulletphysics/bullet3/issues
garcieri
Posts: 4
Joined: Wed Jun 03, 2020 2:13 pm

Re: Rewards function

Post by garcieri »

Hi,
Thank you for your reply. I downgraded to that version of pybullet because, when making an environment (e.g. env = gym.make('HalfCheetahMuJoCoEnv-v0')) with pybullet 2.8.1, I get the error ImportError: cannot import name 'bullet_client'. This does not happen with pybullet 2.5.9. I am not sure why, but since I am rather new to pybullet, I trusted other people who suggested downgrading the version.
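For what it's worth, this is the quick check I plan to run after upgrading; I am assuming the client wrapper now lives in pybullet_utils (which pybullet_envs imports internally), but I have not verified this:

Code:

from pybullet_utils import bullet_client  # fails with the ImportError if the install is broken

import gym
import pybullet_envs  # registers 'HalfCheetahMuJoCoEnv-v0'

env = gym.make('HalfCheetahMuJoCoEnv-v0')
print(type(env))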