Camera Intrinsics & Extrinsics, Depth to Point Cloud


Post by ASmartRobot »

Hello,

Recently I've been working on a project which requires simulating a camera and obtaining point cloud information from PyBullet. I have looked into the PyBullet way of handling cameras and have some initial results, but they don't quite make sense. The following screenshots show what I mean:
Screenshot from 2021-05-19 23-11-55.png
This shows my attempt to capture a point cloud from the depth image. I have traced rays from the camera center to each of the points, shown as the white lines, but strangely, they don't intersect the table.
Screenshot from 2021-05-19 23-12-21.png
This one shows the point cloud visualized in Open3D. The extracted point cloud looks reasonable, although a bit tilted, and the table is successfully captured.
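For reference, the Open3D visualization is done roughly like this (just a sketch; the random points are a placeholder for my N x 3 array of world-frame points):

import numpy as np
import open3d as o3d

points = np.random.rand(1000, 3)   # placeholder for my N x 3 world-frame points
pcd = o3d.geometry.PointCloud()
pcd.points = o3d.utility.Vector3dVector(points)
o3d.visualization.draw_geometries([pcd])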

My approach is the following:
0. Obtain the intrinsics and extrinsics matrices as follows (assuming the image has the same height and width, denoted by W):
fx = fy = W / tan(fov * PI / 180 / 2)
cx = cy = W / 2 (image center)
(This is the standard intrinsics matrix as defined in http://ksimek.github.io/2013/08/13/intrinsic/)
For the extrinsics matrix I follow:
http://ksimek.github.io/2012/08/22/extrinsic/
and
https://www.scratchapixel.com/lessons/m ... t-function.
Since I just need the pose/transform of the camera in the world frame, I simply build the pose matrix as:

T =
[
right_x, up_x, forward_x, Px
right_y, up_y, forward_y, Py
right_z, up_z, forward_z, Pz
0, 0, 0, 1
]

where right, up and forward are the camera's axes expressed in the world frame (also shown as the RGB axes in the first screenshot), and [Px, Py, Pz] is the camera center position in the world frame.
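In code, step 0 looks roughly like this (just a NumPy sketch, not my exact code; the camera axes and position below are placeholder values):

import numpy as np

W = 640                                   # image width == height
fov = 60.0                                # field of view in degrees
f = W / np.tan(np.deg2rad(fov) / 2.0)     # fx = fy, as computed above
cx = cy = W / 2.0

# intrinsics matrix
K = np.array([[f,   0.0, cx],
              [0.0, f,   cy],
              [0.0, 0.0, 1.0]])

# placeholder camera axes (expressed in the world frame) and camera position
right_vec   = np.array([1.0, 0.0, 0.0])
up_vec      = np.array([0.0, 0.0, 1.0])
forward_vec = np.array([0.0, 1.0, 0.0])
cam_pos     = np.array([0.0, -1.0, 1.0])

# camera pose T (camera -> world); columns are the camera axes
T = np.eye(4)
T[:3, 0] = right_vec
T[:3, 1] = up_vec
T[:3, 2] = forward_vec
T[:3, 3] = cam_pos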

1. From the depth image pixel indices (i = row, j = column), obtain the x, y and z locations in the camera frame:
x_cam = -(j - cx) / fx * depth_image
y_cam = -(i - cy) / fy * depth_image
z_cam = -depth_image
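Continuing the sketch above, step 1 in NumPy (depth_image is a placeholder for my W x W metric depth map):

depth_image = np.ones((W, W))   # placeholder for the real metric depth map
# pixel grid: i indexes rows, j indexes columns
i, j = np.meshgrid(np.arange(W), np.arange(W), indexing='ij')

x_cam = -(j - cx) / f * depth_image
y_cam = -(i - cy) / f * depth_image
z_cam = -depth_image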

2. Transform the locations from the camera frame to the world frame:
T * [x_cam[k], y_cam[k], z_cam[k], 1]
(this is done for each point k, and the results are then concatenated together).
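And step 2, done for all points at once instead of one k at a time:

pts_cam = np.stack([x_cam.ravel(),
                    y_cam.ravel(),
                    z_cam.ravel(),
                    np.ones(x_cam.size)], axis=0)   # 4 x N homogeneous points
pts_world = (T @ pts_cam)[:3].T                     # N x 3 points in the world frame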

I think my approach is reasonable and should work whenever the intrinsic and extrinsic matrices are given, which is usually the case when using ROS. However, as shown in the first screenshot, the traced rays would need to go much further to intersect the table.
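For reference, a minimal sketch of rendering a depth image through PyBullet's camera API (not my exact code; the camera placement, fov and near/far values are placeholders; the last two lines convert the normalized OpenGL depth buffer returned by getCameraImage into metric depth using the near/far planes):

import numpy as np
import pybullet as p

p.connect(p.DIRECT)

W = 640
fov, near, far = 60.0, 0.01, 10.0
view_mat = p.computeViewMatrix(cameraEyePosition=[0, -1, 1],
                               cameraTargetPosition=[0, 0, 0.5],
                               cameraUpVector=[0, 0, 1])
proj_mat = p.computeProjectionMatrixFOV(fov=fov, aspect=1.0,
                                        nearVal=near, farVal=far)
_, _, rgb, depth_buf, _ = p.getCameraImage(W, W, view_mat, proj_mat)

# getCameraImage returns a normalized, non-linear OpenGL depth buffer in [0, 1];
# converting it to metric depth uses the near/far clipping planes
depth_buf = np.reshape(depth_buf, (W, W))
depth_image = far * near / (far - (far - near) * depth_buf)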

I have spent several days on this problem, and online resources on this issue seem limited. Any help or recommendations of resources would be much appreciated! Thank you!