I'm working on opening a door with a humanoid robot using reinforcement learning. To this end, I created a gym environment containing a room with a door (URDF description in room_with_door.txt), in which a humanoid robot (URDF description in humanoid.txt, taken from pybullet_data) is spawned. The environment is included in the runfile (runfile_pybullet_help.txt) as a class definition. The files are uploaded as .txt because the site does not allow me to upload .urdf or .py extensions. To run them locally, change xml_path in the environment class definition to the local folder that contains the URDF files, and change the file extensions back to the correct ones (.urdf and .py respectively).
The door joints are controlled by torque control to implement a hybrid feedback loop (the handle has to be pushed before the door can be opened). The robot is controlled using pybullet.POSITION_CONTROL, where the agent chooses the desired joint positions. When testing the environment with a random agent, the behaviour is usually as expected (the robot starts flailing its limbs around, which is what you would expect from a random agent). However, after a seemingly random amount of time, the humanoid pulls a superman (the state explodes, see https://vimeo.com/user104809910/review/ ... 6d3efba7cd ), after which the environment is reset because the condition on the Cartesian torso/base coordinates is violated.
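To make the hybrid feedback loop more concrete, here is a minimal sketch of the kind of latch logic meant above. It is not the code from the runfile: the function name, the release angle, the tolerance and the gains are all assumptions chosen for illustration.

```python
import math

# Hypothetical values for illustration; not taken from the actual environment.
HANDLE_RELEASE_ANGLE = math.radians(30.0)  # handle must be pushed past this to unlatch
DOOR_CLOSED_TOLERANCE = 0.02               # rad, door counts as "closed" within this
LATCH_STIFFNESS = 200.0                    # N*m/rad, holds the door shut while latched
LATCH_DAMPING = 5.0                        # N*m*s/rad


def door_hinge_torque(handle_angle, door_angle, door_velocity):
    """Return the torque to apply to the door hinge for the current state.

    While the door is (nearly) closed and the handle has not been pushed far
    enough, a spring-damper torque pulls the hinge back to zero. In every
    other case the hinge is left free.
    """
    handle_pushed = abs(handle_angle) >= HANDLE_RELEASE_ANGLE
    door_closed = abs(door_angle) <= DOOR_CLOSED_TOLERANCE
    if door_closed and not handle_pushed:
        return -LATCH_STIFFNESS * door_angle - LATCH_DAMPING * door_velocity
    return 0.0
```

In pybullet, a torque like this would be applied every step with setJointMotorControl2(door_id, hinge_index, pybullet.TORQUE_CONTROL, force=torque), where door_id and hinge_index are placeholder names, after first disabling the default velocity motor on that joint by setting it to VELOCITY_CONTROL with force=0.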
I hypothesized this has to do with collision forces that start to grow unbounded (providing an impulse to the body), so I tried playing around with the solver parameters, e.g. step size, solver iterations and the default contact ERP to try and get a more accurate calculation of the contact forces and thus a more stable simulation. While I subjectively experienced that it took longer for the state to explode (on average) for certain settings than for other settings, I did not manage to remove the problem completely.
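For reference, the kind of parameter sweep described above can be sketched as follows. The candidate values are purely illustrative and are not the settings that were actually tried; only the three keyword names are meant to match arguments of pybullet.setPhysicsEngineParameter.

```python
from itertools import product

# Illustrative candidate values only; not the settings actually tested.
TIME_STEPS = [1.0 / 240.0, 1.0 / 480.0, 1.0 / 1000.0]  # seconds
SOLVER_ITERATIONS = [50, 100, 200]
CONTACT_ERPS = [0.1, 0.2, 0.4]


def solver_configs():
    """Yield keyword-argument dicts, one per combination of the values above."""
    for dt, iters, erp in product(TIME_STEPS, SOLVER_ITERATIONS, CONTACT_ERPS):
        yield {
            "fixedTimeStep": dt,
            "numSolverIterations": iters,
            "contactERP": erp,
        }


configs = list(solver_configs())
```

Each dict could then be passed as pybullet.setPhysicsEngineParameter(**cfg) after connecting and before running a batch of random-agent episodes, recording how many steps pass before the torso leaves its bounds.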
So my questions are
- What is causing this simulated robot behaviour?
- Is tuning the physics parameters an adequate solution to the problem, and if so, are there general guidelines for tuning them? (The user manual only states that changing the time step makes it necessary to change the ERP value, which by default is tuned to work for a broad range of problems.) Or should I try to solve the problem some other way?
Thanks!