3D projection

A 3D projection is a mathematical process that maps a series of 3D shapes onto a 2D surface, usually a computer monitor.

This particular algorithm is often referred to as rendering, although that word usually denotes the more general task of taking some data from computer memory and drawing it, in whatever way, on the computer screen.

The algorithm was a standard in early computer simulations and video games, and it is still in use, heavily modified for each particular case. This article describes the simple, general case.

Data necessary for projection
Data about the objects to render is usually stored as a collection of points, linked together into triangles. Each point is a series of three numbers, representing its X, Y, Z coordinates from an origin relative to the object it belongs to. In addition, each object has three coordinates X, Y, Z and three angles alpha, beta and gamma, describing its position and orientation relative to a "world" reference frame.

Last comes the observer (the term camera is the one commonly used). The camera has a second set of three X, Y, Z coordinates and three angles alpha, beta and gamma, describing the observer's position and the direction along which it is looking.

All this data is usually stored in floating point, although many programs convert it to integers at various points in the algorithm to speed up the calculations.
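As an illustration, the data described above fits into a few plain structures. The following Python sketch uses illustrative names (`cube_points`, `cube_pose`, `camera_pose` are not standard terms) and stores everything in floating point:

```python
import math

# Hypothetical data layout for a single triangle-mesh object.
# Points are (x, y, z) tuples in the object's local reference frame;
# triangles are index triples into the point list.
cube_points = [
    (-1.0, -1.0, -1.0), (1.0, -1.0, -1.0), (1.0, 1.0, -1.0), (-1.0, 1.0, -1.0),
    (-1.0, -1.0, 1.0), (1.0, -1.0, 1.0), (1.0, 1.0, 1.0), (-1.0, 1.0, 1.0),
]
cube_triangles = [(0, 1, 2), (0, 2, 3)]  # two of a cube's twelve triangles

# Object placement in the world: position plus the three angles
# alpha, beta, gamma (rotations about X, Y and Z).
cube_pose = {"pos": (0.0, 0.0, 5.0), "angles": (0.0, math.pi / 4, 0.0)}

# The camera uses the same six numbers.
camera_pose = {"pos": (0.0, 0.0, 0.0), "angles": (0.0, 0.0, 0.0)}
```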

Mathematical tools
The 3D transformation makes heavy use of 4x4 square matrices and of trigonometric functions. Each step of the algorithm is a matrix multiplication, where the elements of the matrices are derived from the coordinates and angles listed above and from various combinations of sines and cosines. The matrices have 4 rows and 4 columns and operate on homogeneous coordinates, in which a vector of three elements is extended to four by appending a "1" at its end.

Thanks to the associativity of matrix multiplication, a program can pre-calculate many matrix products, for example when it knows that certain coordinates will never change.

Sometimes a final "transformation matrix", valid for all points, can be calculated once and then applied to every point. This saves considerable time, since applying a matrix to a point takes only a handful of multiplications, instead of the dozens necessary to multiply the matrices together for each point.

At the very least, a transformation matrix can be calculated for a single object and then applied to all points in that object.
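A minimal Python sketch of the two operations involved, assuming 4x4 matrices stored as row-major lists of lists and the homogeneous column-vector convention (the helper names are illustrative):

```python
def mat_mul(a, b):
    """Multiply two 4x4 matrices stored as row-major lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def transform(m, point):
    """Apply a 4x4 matrix to an (x, y, z) point in homogeneous
    coordinates: the point is extended with a trailing 1 and treated
    as a column vector. Returns (x', y', z', w')."""
    x, y, z = point
    v = (x, y, z, 1.0)
    return tuple(sum(m[i][k] * v[k] for k in range(4)) for i in range(4))
```

Composing matrices with `mat_mul` once, then calling `transform` per point, is exactly the pre-calculation the text describes.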

First step: world transform
The first step is to transform the points' coordinates, taking into account the position and orientation of the object they belong to. This is done using a set of four matrices:

$$ \begin{bmatrix} 1 & 0 & 0 & x \\ 0 & 1 & 0 & y \\ 0 & 0 & 1 & z \\ 0 & 0 & 0 & 1 \end{bmatrix} $$ Object translation

$$ \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & \cos \alpha & -\sin \alpha & 0 \\ 0 & \sin \alpha & \cos \alpha & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} $$ Rotation about the X axis

$$ \begin{bmatrix} \cos \beta & 0 & \sin \beta & 0 \\ 0 & 1 & 0 & 0 \\ -\sin \beta & 0 & \cos \beta & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} $$ Rotation about the Y axis

$$ \begin{bmatrix} \cos \gamma & -\sin \gamma & 0 & 0 \\ \sin \gamma & \cos \gamma & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} $$ Rotation about the Z axis

The four matrices are multiplied together; the result is the world transform matrix: a matrix that, when a point's coordinates are multiplied by it, yields the point's coordinates expressed in the "world" reference frame.
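The four matrices above can be built and composed as in the following Python sketch, which assumes the column-vector convention (so the matrix applied last is written first in the product); the function names are illustrative:

```python
import math

def mat_mul(a, b):
    """Multiply two 4x4 matrices stored as row-major lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def translation(x, y, z):
    """Object translation."""
    return [[1.0, 0.0, 0.0, x],
            [0.0, 1.0, 0.0, y],
            [0.0, 0.0, 1.0, z],
            [0.0, 0.0, 0.0, 1.0]]

def rot_x(a):
    """Rotation about the X axis by angle alpha."""
    c, s = math.cos(a), math.sin(a)
    return [[1.0, 0.0, 0.0, 0.0],
            [0.0,   c,  -s, 0.0],
            [0.0,   s,   c, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def rot_y(b):
    """Rotation about the Y axis by angle beta."""
    c, s = math.cos(b), math.sin(b)
    return [[  c, 0.0,   s, 0.0],
            [0.0, 1.0, 0.0, 0.0],
            [ -s, 0.0,   c, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def rot_z(g):
    """Rotation about the Z axis by angle gamma."""
    c, s = math.cos(g), math.sin(g)
    return [[  c,  -s, 0.0, 0.0],
            [  s,   c, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def world_transform(pos, angles):
    """Rotate about X, Y, then Z, then translate into the world frame.
    With column vectors this reads right to left: T * Rz * Ry * Rx."""
    a, b, g = angles
    m = mat_mul(rot_z(g), mat_mul(rot_y(b), rot_x(a)))
    return mat_mul(translation(*pos), m)
```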

Second step: camera transform
The second step is virtually identical to the first, except that it uses the six coordinates of the observer instead of those of the object, and that the resulting placement must be inverted (translating by the negated coordinates and rotating by the negated angles, in reverse order), since the matrix must transform coordinates from the world reference frame into the observer's, not the other way around.

The two matrices obtained from the first two steps can be multiplied together to get a matrix capable of transforming a point's coordinates from the object's reference frame to the observer's reference frame.
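One way to obtain the world-to-camera matrix is to build the camera's placement matrix exactly as in the first step and then invert it. For a rigid transform (rotation plus translation) the inverse has a cheap closed form, sketched below in Python under the column-vector convention; `rigid_inverse` is an illustrative name:

```python
def rigid_inverse(m):
    """Invert a rigid 4x4 matrix (rotation + translation) without a
    general matrix inverse: transpose the 3x3 rotation block and
    rotate the negated translation column through it."""
    r = [[m[j][i] for j in range(3)] for i in range(3)]   # transpose
    t = [m[i][3] for i in range(3)]                        # translation
    nt = [-sum(r[i][k] * t[k] for k in range(3)) for i in range(3)]
    return [[r[0][0], r[0][1], r[0][2], nt[0]],
            [r[1][0], r[1][1], r[1][2], nt[1]],
            [r[2][0], r[2][1], r[2][2], nt[2]],
            [0.0, 0.0, 0.0, 1.0]]
```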

Third step: perspective transform
The resulting coordinates would already be good enough for an isometric projection or something similar, but realistic rendering requires an additional step to correctly simulate perspective distortion. Indeed, this simulated perspective is the main aid for the viewer to judge distances in the simulated view.

A perspective distortion can be generated using the following 4x4 matrix:

$$ \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 1/D & 1 \end{bmatrix} $$

where D is a positive number representing the distance of the observer from the focal plane. Larger values of D result in a flatter perspective.

All the calculated matrices can be multiplied together to get a final transformation matrix. One can multiply each of the points (represented as a vector of three coordinates, extended to four dimensions using homogeneous coordinates) by this matrix, then divide the resulting X and Y by the fourth (W) component — the perspective division — to obtain the screen coordinates at which the point must be drawn:

$$ \begin{bmatrix} \\ \mbox{...transformation matrix...} \\ \end{bmatrix} \times \begin{bmatrix} x \\ y \\ z \\ 1 \\ \end{bmatrix} $$
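Putting the last step together, the following Python sketch multiplies a camera-space point by the perspective matrix above and performs the division by W (the function names are illustrative):

```python
def perspective_matrix(D):
    """The perspective matrix above: the 1/D entry in the last row
    makes the W component equal to z/D + 1 after multiplication."""
    return [[1.0, 0.0, 0.0, 0.0],
            [0.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 1.0 / D, 1.0]]

def project(point, D):
    """Multiply an (x, y, z) point, extended with a trailing 1, by the
    perspective matrix, then divide X and Y by W."""
    m = perspective_matrix(D)
    x, y, z = point
    v = (x, y, z, 1.0)
    xp, yp, zp, w = (sum(m[i][k] * v[k] for k in range(4)) for i in range(4))
    return (xp / w, yp / w)

# A point at z = D is scaled by half:
# project((2.0, 4.0, 10.0), 10.0) gives (1.0, 2.0).
```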

Other algorithms required
The resulting array of points can be drawn on the screen as-is, but further processing is needed to get a good image. See for example:

 * hidden face removal
 * Z buffer