Relation between image point, 3D counterpart and the general plane

Question

So I'm a bit stuck on a problem that wants me to show that the relation between any image point and its corresponding 3D point can be represented by a 3x3 matrix. My idea was to use the general form of the camera model, in which the homogeneous 3D point is multiplied by the extrinsic and intrinsic matrices to get the image point, but there are quite a few unknowns (image center, focal length and the like). I'm having trouble bringing in the general form of a plane and using it to transform the image point into its 3D counterpart. I'm not sure how the general form of a plane fits into this sort of transformation, and I believe that's the sticking point.

Update: I'm using the pinhole camera model in its most general form: $(x_1,x_2,x_3)^T = \mathtt M_{\text{int}}\mathtt M_{\text{ext}}\,(X_w, Y_w, Z_w, 1)^T$

I’m going to guess that the camera is at the origin in the scene, otherwise this can’t be described by a $3\times3$ matrix. When back-projecting from the image, the best you can do is map an image point to a ray emanating from the camera, which can be represented by a 3-D vector. I suggest starting with the projection matrix $P=K[I\mid0]$ and working out the inverse map piece by piece. For a camera that isn’t sighting along the $z$-axis, $I$ gets replaced by some orthogonal transformation, which should be easy to accommodate once you’ve worked out the simpler case. — amd, Apr 04 '18 at 00:44
So I understand the basic idea of mapping an image point to a ray emanating from the camera; it's the central point of what I want to do. I don't believe I've seen the projection matrix used as a concept before, but it's understandable. I'm just trying to think through the inverse map using the projection matrix alongside the general form of a plane. I couldn't really find info on the plane and make it connect in my head. — Marorin Q, Apr 04 '18 at 03:04
It would be helpful if you updated your question with more detail about the camera model that you’re working with. Then someone ought to be able to point you in the right direction. — amd, Apr 04 '18 at 03:57

amd · Answer 1 · 2018-04-05T01:30:52.627

First, some notation: upper-case bold letters for homogeneous coordinate vectors of points in $\mathbb{RP}^3$ and lower-case bold for points in $\mathbb{RP}^2$; a tilde over the symbol will indicate the corresponding inhomogeneous Cartesian coordinate vector in $\mathbb R^3$ and $\mathbb R^2$, respectively. We have the projection $\mathbf x = \mathtt P\mathbf X$ from the world to the image. I’m assuming a finite camera, so that $\mathtt P$ is a full-rank $3\times4$ matrix whose left-hand $3\times3$ block is invertible. The columns of this matrix are designated $\mathbf p_1$ through $\mathbf p_4$.

The back-projection of an image point $\mathbf x$ is a world ray that emanates from the camera center $\mathbf C$. (If you don’t have the center handy, you can compute it from $\mathtt P$ using the fact that $\mathtt P\mathbf C=0$.) By decomposing $\mathtt P$ into $[\mathtt M\mid\mathbf p_4]$, we find that $\tilde{\mathbf C}=-\mathtt M^{-1}\mathbf p_4$ and that $[(\mathtt M^{-1}\mathbf x)^T; 0]^T$ is the point at infinity that projects to $\mathbf x$. The back-projected ray is then the join of this point and the camera center, $\tilde{\mathbf C}+\lambda\mathtt M^{-1}\mathbf x = \mathtt M^{-1}(\lambda \mathbf x-\mathbf p_4)$ in inhomogeneous Cartesian coordinates. In general this back-mapping can’t be represented by a $3\times3$ matrix, but if you assume that $\tilde{\mathbf C}$ is the origin, the inhomogeneous direction vector of the ray is enough to describe it, and that’s just $\mathtt M^{-1}\mathbf x$.
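As a rough illustration of the decomposition $\mathtt P=[\mathtt M\mid\mathbf p_4]$, here is a minimal sketch in plain Python with exact rational arithmetic. The helper names (`mat_vec`, `inverse_3x3`, `back_project`, `project`) and the sample camera (focal length 800, principal point $(320,240)$, no rotation, center at $(1,2,3)$) are assumptions made for the example and do not come from the question.

```python
from fractions import Fraction


def mat_vec(A, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def inverse_3x3(M):
    """Inverse of a 3x3 matrix via the adjugate formula."""
    (a, b, c), (d, e, f), (g, h, i) = M
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if det == 0:
        raise ValueError("left 3x3 block is singular: not a finite camera")
    adj = [
        [e * i - f * h, c * h - b * i, b * f - c * e],
        [f * g - d * i, a * i - c * g, c * d - a * f],
        [d * h - e * g, b * g - a * h, a * e - b * d],
    ]
    return [[entry / det for entry in row] for row in adj]


def back_project(P, x):
    """Split P = [M | p4]; return the camera center -M^{-1} p4 and
    the ray direction M^{-1} x for the homogeneous image point x."""
    M = [row[:3] for row in P]
    p4 = [row[3] for row in P]
    M_inv = inverse_3x3(M)
    center = [-c for c in mat_vec(M_inv, p4)]
    direction = mat_vec(M_inv, x)
    return center, direction


def project(P, X_tilde):
    """Project an inhomogeneous world point with the 3x4 matrix P."""
    return mat_vec(P, list(X_tilde) + [1])


# Assumed sample camera: P = K [I | -C~] with C~ = (1, 2, 3).
P = [[Fraction(v) for v in row] for row in [
    [800, 0, 320, -1760],
    [0, 800, 240, -2320],
    [0, 0, 1, -3],
]]
x = [Fraction(400), Fraction(300), Fraction(1)]

center, direction = back_project(P, x)
for lam in (1, 2, 5):
    # Every point C~ + lam * M^{-1} x should reproject to the same pixel.
    X = [c + lam * d for c, d in zip(center, direction)]
    u, v, w = project(P, X)
    print(lam, u / w, v / w)
```

Every point on the ray reprojects to the pixel $(400,300)$, which is the sense in which the ray is the back-projection of $\mathbf x$. If the camera center is at the origin, `center` is zero and `direction` alone, i.e. the $3\times3$ matrix $\mathtt M^{-1}$ applied to $\mathbf x$, describes the ray.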

There is a different decomposition of $\mathtt P$ that connects it more transparently to the image plane, although it’s not nearly as convenient as the above decomposition. In case you didn’t know, a plane with implicit Cartesian equation $ax+by+cz+d=0$ can be represented by the homogeneous vector $\mathbf\Pi=[a,b,c,d]^T$ in $\mathbb{RP}^3$: just write the equation as $\mathbf\Pi^T\mathbf X=0$ with $\mathbf X=[x,y,z,1]^T$. Central projection onto $\mathbf\Pi$ relative to the viewpoint $\mathbf C$ is given by the matrix $$\mathtt M=\mathbf C\mathbf\Pi^T-(\mathbf C^T\mathbf\Pi)\mathtt I_4.$$ (Note that this $\mathtt M$ is a $4\times4$ matrix, not the $3\times3$ block of $\mathtt P$ from the previous paragraph. When $\tilde{\mathbf C}=0$ this matrix has a particularly simple form.) The camera projection transformation can then be viewed as central projection onto the image plane $\mathbf\Pi$ followed by an affine transformation $\mathtt A$ that maps the image plane onto the $x$-$y$ plane, and finally deletion of the $z$-coordinate, i.e., $$\mathtt P = \begin{bmatrix}1&0&0&0\\0&1&0&0\\0&0&0&1\end{bmatrix} \mathtt A \mathtt M.$$ To back-project an image point $\mathbf x$, we can reverse the last two steps, producing a point on the image plane and then, assuming again that the camera is at the world origin, delete the last coordinate of the result to get the ray’s direction vector in $\mathbb R^3$. (Technically, we should project the point on the image plane onto the plane at infinity first, but that projection is just a matter of setting the last coordinate to zero.) This transformation cascade is accomplished by the $3\times3$ matrix $$\begin{bmatrix}1&0&0&0\\0&1&0&0\\0&0&1&0\end{bmatrix} \mathtt A^{-1} \begin{bmatrix}1&0&0\\0&1&0\\0&0&0\\0&0&1\end{bmatrix},$$ which is just $\mathtt A^{-1}$ with its last row and third column deleted. $\mathtt A$ can be derived from the world-to-camera transformation and the camera’s intrinsic matrix, but I won’t go into the details here.
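The central-projection formula $\mathbf C\mathbf\Pi^T-(\mathbf C^T\mathbf\Pi)\mathtt I_4$ can be checked numerically. The following is a small sketch in plain Python; the function names and the two sample configurations (viewpoint at the origin with the plane $z=1$, and viewpoint $(1,2,3)$ with the plane $z=5$) are assumptions chosen for illustration.

```python
def mat_vec(A, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def central_projection(C, Pi):
    """4x4 matrix C Pi^T - (C^T Pi) I_4 for a homogeneous viewpoint C
    and a homogeneous plane Pi = [a, b, c, d]."""
    s = dot(C, Pi)
    return [[C[i] * Pi[j] - (s if i == j else 0) for j in range(4)]
            for i in range(4)]


# Case 1 (assumed): viewpoint at the origin, image plane z = 1.
C0 = [0, 0, 0, 1]
Pi0 = [0, 0, 1, -1]
M0 = central_projection(C0, Pi0)
for row in M0:
    print(row)

# The world point (2, 4, 2) maps to a homogeneous point on the plane.
image0 = mat_vec(M0, [2, 4, 2, 1])
print(image0, dot(Pi0, image0))

# Case 2 (assumed): viewpoint (1, 2, 3), image plane z = 5.
C1 = [1, 2, 3, 1]
Pi1 = [0, 0, 1, -5]
M1 = central_projection(C1, Pi1)
image1 = mat_vec(M1, [4, 6, 7, 1])
print(image1, dot(Pi1, image1))
```

In the first case the matrix sends $[X,Y,Z,1]^T$ to $[X,Y,Z,Z]^T$, i.e. to the familiar $(X/Z,\,Y/Z,\,1)$, which is the "particularly simple form" for a camera at the origin. In both cases the image satisfies the plane equation, and the viewpoint itself is mapped to the zero vector, as expected of a central projection.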
