Page 1 of 1

How to implement a Continuous Control of a quadruped robot with Deep Reinforcement Learning in Pybullet and OpenAI Gym?

Posted: Thu Dec 27, 2018 5:11 pm
by rubencg195

I have designed this robot in URDF format and its environment in pybullet. Each leg has a minimum and maximum value of movement.

What reinforcement algorithm will be best to create a walking policy in a simple environment in which a positive reward will be given if it walks in the positive X-axis direction?

The expected output from the policy is an array in the range of (-1, 1) for each joint. The input of the policy is the position of each joint, the center of mass of the body, the difference in height between the floor and the body to see if it has fallen and the movement in the x-axis.


left_front_joint => lower="-0.4" upper="2.5" id=0

left_front_leg_joint => lower="-0.6" upper="0.7" id=2

right_front_joint => lower="-2.5" upper="0.4" id=3

right_front_leg_joint => lower="-0.6" upper="0.7" id=5

left_back_joint => lower="-2.5" upper="0.4" id=6

left_back_leg_joint => lower="-0.6" upper="0.7" id=8

right_back_joint => lower="-0.4" upper="2.5" id=9

right_back_leg_joint => lower="-0.6" upper="0.7" id=11

The code above is just a test of the environment with a manual set of movements hardcoded in the robot just to test how it could walk later. The environment is set to real time, but I assume it needs to be in a frame by frame lapse during the policy training.

A video of it can be seen in:


Code: Select all

    import pybullet as p
    import time
    import pybullet_data

    def moveLeg( robot=None, id=0, position=0, force=1.5  ):
        if(robot is None):

    pixelWidth = 1000
    pixelHeight = 1000
    camTargetPos = [0,0,0]
    camDistance = 0.5
    pitch = -10.0
    upAxisIndex = 2
    yaw = 0

    physicsClient = p.connect(p.GUI)#or p.DIRECT for non-graphical version
    p.setAdditionalSearchPath(pybullet_data.getDataPath()) #optionally
    viewMatrix = p.computeViewMatrixFromYawPitchRoll(camTargetPos, camDistance, yaw, pitch, roll, upAxisIndex)
    planeId = p.loadURDF("plane.urdf")
    cubeStartPos = [0,0,0.05]
    cubeStartOrientation = p.getQuaternionFromEuler([0,0,0])
    #boxId = p.loadURDF("r2d2.urdf",cubeStartPos, cubeStartOrientation)
    boxId = p.loadURDF("src/spider.xml",cubeStartPos, cubeStartOrientation)
    # boxId = p.loadURDF("spider_simple.urdf",cubeStartPos, cubeStartOrientation)

    toggle = 1


    for i in range (10000):

        moveLeg( robot=boxId, id=0,  position= toggle * -2 ) #LEFT_FRONT
        moveLeg( robot=boxId, id=2,  position= toggle * -2 ) #LEFT_FRONT

        moveLeg( robot=boxId, id=3,  position= toggle * -2 ) #RIGHT_FRONT
        moveLeg( robot=boxId, id=5,  position= toggle *  2 ) #RIGHT_FRONT

        moveLeg( robot=boxId, id=6,  position= toggle *  2 ) #LEFT_BACK
        moveLeg( robot=boxId, id=8,  position= toggle * -2 ) #LEFT_BACK

        moveLeg( robot=boxId, id=9,  position= toggle *  2 ) #RIGHT_BACK
        moveLeg( robot=boxId, id=11, position= toggle *  2 ) #RIGHT_BACK
        toggle = toggle * -1

        #viewMatrix        = p.computeViewMatrixFromYawPitchRoll(camTargetPos, camDistance, yaw, pitch, roll, upAxisIndex)
        #projectionMatrix  = [1.0825318098068237, 0.0, 0.0, 0.0, 0.0, 1.732050895690918, 0.0, 0.0, 0.0, 0.0, -1.0002000331878662, -1.0, 0.0, 0.0, -0.020002000033855438, 0.0]
        #img_arr = p.getCameraImage(pixelWidth, pixelHeight, viewMatrix=viewMatrix, projectionMatrix=projectionMatrix, shadow=1,lightDirection=[1,1,1])

    cubePos, cubeOrn = p.getBasePositionAndOrientation(boxId)

The complete code can be seen here: ... Bullet_ROS

PyBullet.png (94.99 KiB) Viewed 1873 times
spider(7).jpeg (91.42 KiB) Viewed 1873 times
youtube.png (171.82 KiB) Viewed 1873 times

Re: How to implement a Continuous Control of a quadruped robot with Deep Reinforcement Learning in Pybullet and OpenAI G

Posted: Mon Dec 31, 2018 9:27 pm
by Erwin Coumans
Thanks for sharing, that looks pretty cool!

You could try to use ARS, augmented random search or alternatively PPO.
Implement your environment as a Gym environment, with a reset, step etc.

Here is a simple ARS implementation: ... t_envs/ARS

Also there is a PPO implementation: ... nvs/agents
Edit the accordingly.

You can also use OpenAI baselines instead, it has a PPO implementation.