top of page

Building and Optimizing a Path Tracer for Fun

I feel like I take game engines for granted. It's doing all the rendering work for me. And as I delve deeper into the world of game development, I want to ensure I have a strong understanding of fundamental concepts such as path tracing. In this post I walk through my time building a basic path tracer, for fun!

Understanding Path Tracing

Path tracing is a type of ray tracing algorithm used in computer graphics to simulate global illumination. The algorithm works by tracing rays of light from the camera through each pixel in the image plane and simulating their interactions with the scene's objects. This process generates realistic images by accounting for the complex behavior of light, such as reflection, refraction, and scattering.

Ray Tracing vs Path Tracing

Ray tracing simulates light by tracing individual rays from a camera to objects in a scene, calculating reflections and refractions. Path tracing is a more advanced form of ray tracing, accounting for a wider range of light interactions and producing more realistic global illumination by tracing multiple light paths per ray.

Learnings from "Physically Based Rendering: From Theory to Implementation"

Rendering Equation

Understanding the rendering equation was crucial for grasping the concept of light transport in a scene. This understanding formed the basis for the path tracing algorithm, guiding the implementation of the ray tracing loop and the calculation of the final color at each pixel.

The outgoing radiance at a point p in direction ωo is equal to the integral over the hemisphere of the product of three terms:

  1. The BRDF (Bidirectional Reflectance Distribution Function) f at p for the pair of directions ωo and ωi (where ωi is the direction of incoming light).

  2. The incoming radiance Li at p from direction ωi.

  3. The cosine of the angle θi between the direction ωi and the surface normal at p.

This integral is then summed over all light paths, with the directions ωi sampled from a distribution with respect to solid angle that has a probability density function p(ωi).

Monte Carlo Integration

Learning about Monte Carlo integration allowed me to tackle the complex lighting and materials present in real-world scenes. The technique was employed to approximate the integral form of the rendering equation, enabling the generation of realistic images with global illumination and accurate reflections.

BRDFs and Material Models

Incorporating the knowledge of Bidirectional Reflectance Distribution Functions (BRDFs) and various material models, such as Lambertian, Phong, and GGX, helped me create a diverse set of realistic materials for the path tracer. This resulted in visually appealing images and facilitated the exploration of different material properties and their impact on the rendered output.

Acceleration Structures

Applying the concepts of acceleration structures, like BVH, significantly improved the performance of the path tracer. By optimizing ray-object intersection tests, the rendering time was reduced, allowing for faster iterations and more complex scenes.

Project Structure

I chose C++ as my programming language and the linear algebra library Eigen. The basic structure of my project included the following components:

  • Main.cpp

  • Scene: Represents the 3D world containing the camera and objects.

  • Camera: Defines the perspective from which the image is rendered.

  • Ray: Represents a ray of light.

  • Object: A base class for all 3D objects in the scene.

  • Sphere, Plane, Triangle: Different types of objects, inheriting from Object.

  • Material: Defines how an object interacts with light.

  • Intersection: Holds information about the intersection of a ray with an object.

Implementing the Path Tracer

The core of the path tracer algorithm can be broken down into the following steps:

1. Generate rays from the camera:

For each pixel in the output image, generate a ray that originates from the camera's position and passes through the corresponding point on the image plane.

Ray Camera::generateRay(int x, int y, int width, int height) const {
    float u = (float(x) + 0.5f) / width;
    float v = (float(y) + 0.5f) / height;
    Vector3f dir = (u * right + v * up + distance * forward).normalized();
    return Ray(position, dir);

2. Trace the rays through the scene:

Rays are traced through the scene to determine their interactions with objects recursively, accounting for multiple bounces of light.

Color3f traceRay(const Scene& scene, const Ray& ray, int depth) {
    Intersection intersection;
    if (!scene.intersect(ray, intersection)) {
        return scene.getBackgroundColor();

    Color3f radiance(0);
    if (depth < MAX_DEPTH) {
        Ray scatteredRay = intersection.material->scatter(ray, intersection);
        radiance = intersection.material->emission + 
                   traceRay(scene, scatteredRay, depth + 1) *
                   intersection.material->BRDF(ray, scatteredRay, intersection);
    return radiance;

3. Accumulate radiance values:

Radiance values from all traced rays are accumulated and averaged to compute the final pixel color.

Color3f renderPixel(const Scene& scene, int x, int y, int width, int height, int samples) {
    Color3f accumulatedRadiance(0);
    for (int s = 0; s < samples; s++) {
        Ray ray =, y, width, height);
        accumulatedRadiance += traceRay(scene, ray, 0);
    return accumulatedRadiance / samples;

4. Results and Optimizations

I tested the path tracer on a simple scene containing spheres with different materials, such as diffuse, reflective, and refractive. The results were visually good but the rendering was very slow, as path tracing is computationally expensive.

I was curious on how to improve the performance of my renderer. I wanted to improve the rendering capibility .

To illustrate the impact of the optimizations, I conducted a series of benchmark tests on a scene containing multiple spheres with different materials. The tests were performed on a computer with an Intel Core i7-9700K CPU and 16 GB of RAM. The resolution of the output image was set to 800x600 pixels, and the maximum ray depth was set at 5.

Without any optimizations, the path tracer took approximately 1800 seconds (30 minutes) to render the scene with 256 samples per pixel.

a. Multithreading

Multithreading enables concurrent execution of multiple threads within a single process, allowing tasks to be efficiently distributed across multiple CPU cores.

std::vector<std::thread> threads;
for (int t = 0; t < NUM_THREADS; t++) {
    threads.push_back(std::thread(renderThread, std::ref(scene), std::ref(image), t, NUM_THREADS, width, height, samples));
for (auto& thread : threads) {

By utilizing 8 threads for rendering the image in parallel, the rendering time was significantly reduced to 225 seconds (3 minutes and 45 seconds), representing an 8-fold speedup.

b. Russian Roulette

To reduce the number of ray bounces and improve rendering speed, I implemented Russian Roulette, a technique that probabilistically terminates rays based on their contribution to the final image.

bool russianRoulette(const Color3f& radiance, float& survivalProbability) {
    float maxRadiance = std::max(std::max(radiance[0], radiance[1]), radiance[2]);
    survivalProbability = std::min(1.0f, maxRadiance);
    return rand01() < survivalProbability;

To reduce the number of ray bounces and improve rendering speed, I implemented Russian Roulette, a technique that probabilistically terminates rays based on their contribution to the final image. Implementing Russian Roulette led to an average reduction of 30% in the total number of rays traced, resulting in a rendering time of approximately 157 seconds (2 minutes and 37 seconds). This represents an additional 1.43x speedup compared to the multithreaded implementation without Russian Roulette.

c. Bounding Volume Hierarchy (BVH)

To speed up ray-object intersection tests, I implemented a BVH. This data structure organizes the scene objects in a tree, enabling faster intersection tests by testing only a subset of objects.

The BVH consists of two types of nodes - internal nodes and leaf nodes. An internal node has two child nodes, while a leaf node stores one or more objects from the scene. Each node also stores an AABB that encloses its contents.

struct BVHNode {
    AABB boundingBox;
    std::unique_ptr<BVHNode> left;
    std::unique_ptr<BVHNode> right;
    std::vector<Object*> objects;

To build the BVH, we use a top-down approach, recursively dividing the scene objects into two groups and creating child nodes for each group. The division can be based on different heuristics, such as the median or the surface area of the bounding boxes. The process continues until a stopping criterion is met, such as a maximum depth or a minimum number of objects per leaf node.

std::unique_ptr<BVHNode> buildBVH(std::vector<Object*>& objects, int start, int end, int depth) {
    if (end - start <= MIN_OBJECTS_PER_LEAF || depth >= MAX_DEPTH) {
        return createLeafNode(objects, start, end);

    AABB currentBoundingBox = calculateBoundingBox(objects, start, end);

    int splitIndex = partitionObjects(objects, start, end);

    std::unique_ptr<BVHNode> left = buildBVH(objects, start, splitIndex, depth + 1);
    std::unique_ptr<BVHNode> right = buildBVH(objects, splitIndex, end, depth + 1);

    return createInternalNode(currentBoundingBox, std::move(left), std::move(right));

To perform a ray-object intersection test using the BVH, we start at the root node and traverse the tree, testing the ray against the bounding boxes of the child nodes. If the ray intersects a node's bounding box, we continue the traversal with its child nodes. If the node is a leaf, we perform intersection tests with the objects contained in the node.

bool BVHNode::intersect(const Ray& ray, Intersection& intersection) const {
    if (!boundingBox.intersect(ray)) {
        return false;

    bool hit = false;
    if (isLeaf()) {
        for (const auto& object : objects) {
            hit |= object->intersect(ray, intersection);
    } else {
        hit |= left->intersect(ray, intersection);
        hit |= right->intersect(ray, intersection);

    return hit;

With the BVH optimization in place, the average number of ray-object intersection tests was reduced by 70%, leading to a rendering time of 94 seconds (1 minute and 34 seconds). This indicates a further 1.67x speedup compared to the multithreaded implementation with Russian Roulette.

Combining all optimizations, the rendering time was reduced from 1800 seconds (baseline) to 94 seconds, resulting in a total speedup of approximately 19x.



bottom of page