How to implement a Continuous Control of a quadruped robot with Deep Reinforcement Learning in Pybullet and OpenAI Gym?

rubencg195 · Post by **rubencg195** » Thu Dec 27, 2018 5:11 pm

Description

I have designed this robot in URDF format and its environment in pybullet. Each leg has a minimum and maximum value of movement.

What reinforcement algorithm will be best to create a walking policy in a simple environment in which a positive reward will be given if it walks in the positive X-axis direction?

The expected output from the policy is an array in the range of (-1, 1) for each joint. The input of the policy is the position of each joint, the center of mass of the body, the difference in height between the floor and the body to see if it has fallen and the movement in the x-axis.

Limitations

left_front_joint => lower="-0.4" upper="2.5" id=0

left_front_leg_joint => lower="-0.6" upper="0.7" id=2

right_front_joint => lower="-2.5" upper="0.4" id=3

right_front_leg_joint => lower="-0.6" upper="0.7" id=5

left_back_joint => lower="-2.5" upper="0.4" id=6

left_back_leg_joint => lower="-0.6" upper="0.7" id=8

right_back_joint => lower="-0.4" upper="2.5" id=9

right_back_leg_joint => lower="-0.6" upper="0.7" id=11

The code above is just a test of the environment with a manual set of movements hardcoded in the robot just to test how it could walk later. The environment is set to real time, but I assume it needs to be in a frame by frame lapse during the policy training.

A video of it can be seen in:

https://youtu.be/j9sysG-EIkQ

Code:

Code: Select all

    import pybullet as p
    import time
    import pybullet_data

    def moveLeg( robot=None, id=0, position=0, force=1.5  ):
        if(robot is None):
            return;
        p.setJointMotorControl2(
            robot,
            id,
            p.POSITION_CONTROL,
            targetPosition=position,
            force=force,
            #maxVelocity=5
        )

    pixelWidth = 1000
    pixelHeight = 1000
    camTargetPos = [0,0,0]
    camDistance = 0.5
    pitch = -10.0
    roll=0
    upAxisIndex = 2
    yaw = 0

    physicsClient = p.connect(p.GUI)#or p.DIRECT for non-graphical version
    p.setAdditionalSearchPath(pybullet_data.getDataPath()) #optionally
    p.setGravity(0,0,-10)
    viewMatrix = p.computeViewMatrixFromYawPitchRoll(camTargetPos, camDistance, yaw, pitch, roll, upAxisIndex)
    planeId = p.loadURDF("plane.urdf")
    cubeStartPos = [0,0,0.05]
    cubeStartOrientation = p.getQuaternionFromEuler([0,0,0])
    #boxId = p.loadURDF("r2d2.urdf",cubeStartPos, cubeStartOrientation)
    boxId = p.loadURDF("src/spider.xml",cubeStartPos, cubeStartOrientation)
    # boxId = p.loadURDF("spider_simple.urdf",cubeStartPos, cubeStartOrientation)



    toggle = 1



    p.setRealTimeSimulation(1)

    for i in range (10000):
        #p.stepSimulation()
        

        moveLeg( robot=boxId, id=0,  position= toggle * -2 ) #LEFT_FRONT
        moveLeg( robot=boxId, id=2,  position= toggle * -2 ) #LEFT_FRONT

        moveLeg( robot=boxId, id=3,  position= toggle * -2 ) #RIGHT_FRONT
        moveLeg( robot=boxId, id=5,  position= toggle *  2 ) #RIGHT_FRONT

        moveLeg( robot=boxId, id=6,  position= toggle *  2 ) #LEFT_BACK
        moveLeg( robot=boxId, id=8,  position= toggle * -2 ) #LEFT_BACK

        moveLeg( robot=boxId, id=9,  position= toggle *  2 ) #RIGHT_BACK
        moveLeg( robot=boxId, id=11, position= toggle *  2 ) #RIGHT_BACK
        #time.sleep(1./140.)g
        #time.sleep(0.01)
        time.sleep(1)
        
        toggle = toggle * -1

        #viewMatrix        = p.computeViewMatrixFromYawPitchRoll(camTargetPos, camDistance, yaw, pitch, roll, upAxisIndex)
        #projectionMatrix  = [1.0825318098068237, 0.0, 0.0, 0.0, 0.0, 1.732050895690918, 0.0, 0.0, 0.0, 0.0, -1.0002000331878662, -1.0, 0.0, 0.0, -0.020002000033855438, 0.0]
        #img_arr = p.getCameraImage(pixelWidth, pixelHeight, viewMatrix=viewMatrix, projectionMatrix=projectionMatrix, shadow=1,lightDirection=[1,1,1])

    cubePos, cubeOrn = p.getBasePositionAndOrientation(boxId)
    print(cubePos,cubeOrn)
    p.disconnect()

The complete code can be seen here:

https://github.com/rubencg195/WalkingSp ... Bullet_ROS

Images

PyBullet.png

spider(7).jpeg

youtube.png

Erwin Coumans · Post by **Erwin Coumans** » Mon Dec 31, 2018 9:27 pm

Thanks for sharing, that looks pretty cool!

You could try to use ARS, augmented random search or alternatively PPO.
Implement your environment as a Gym environment, with a reset, step etc.

Here is a simple ARS implementation:
https://github.com/bulletphysics/bullet ... t_envs/ARS

Also there is a PPO implementation:
https://github.com/bulletphysics/bullet ... nvs/agents
Edit the config.py accordingly.

You can also use OpenAI baselines instead, it has a PPO implementation.

Real-Time Physics Simulation Forum

How to implement a Continuous Control of a quadruped robot with Deep Reinforcement Learning in Pybullet and OpenAI Gym?

How to implement a Continuous Control of a quadruped robot with Deep Reinforcement Learning in Pybullet and OpenAI Gym?

Re: How to implement a Continuous Control of a quadruped robot with Deep Reinforcement Learning in Pybullet and OpenAI G