Hardware support for CUDA is ubiquitous. Most gamer rigs have an NVIDIA GPU:
And nearly every current NVIDIA GPU supports CUDA.
But somehow CUDA projects remain obscure.
I was searching for CUDA projects a while back and kept finding scattered bits & pieces of information without any cohesive organization. It reminded me a bit of the era of x2ftp.oulu.fi, when finding comprehensive information about game development and demoscene techniques was an ordeal.
So I’m collecting some CUDA projects here to provide a breadcrumb trail for the next person to go down this path.
At some point I might write a bit of a UI to make it easier to search & filter these. But for now I’ve only got time for traditional HTML: just one long static vertical page of text and images.
(Note that this list leans toward real-time graphics, not simulations, AI/ML, crypto, or offline rendering.)
Particles
Igor Ševo’s A million particles in CUDA and OpenGL, running on a GeForce GTX 570:
Path tracing / ARTiOW
Simon Brown’s classic path-tracer with explicit direct lighting.
Adventures in CUDA Path Tracing: Part 1
Adventures in CUDA Path Tracing: Part 2
Roger Allen’s Accelerated Ray Tracing in One Weekend in CUDA.
CUDA port of Aras’s ‘toy path tracer’. Described in this blog series.
Henrik Dahlberg’s CUDA / OpenGL path tracer. With real-time update (here’s a blog post about implementing this with GLFW):
Beautiful. Here's some early footage from own #CUDA path tracer with camera movement. Not as optimized ;) https://t.co/kWDY3ill9k
— Henrik Dahlberg (@hdahlb) October 2, 2017
Andy Eder’s path tracer, running on a GTX 1080 Ti:
And here it is running much faster on a Turing-class RTX 2080 Ti (no RTX features were used):
Cyrille Favreau’s Sol-R supports CUDA or OpenCL.
Voxels
Dave Kotfis and Jiawei Wang’s voxel rendering class project compares CUDA rendering perf vs. the VoxelPipe library.
Sparse voxel octree in CUDA. Student work by Dave Kotfis and Jiawei Wang. https://t.co/4UZB8BaPSy pic.twitter.com/VFMVa50GI6
— Patrick Cozzi (@pjcozzi) December 8, 2014
Sven Forstmann’s RLE-based-Voxel-Raycasting / Voxlap method
@voxel-tracer has shared many CUDA rendering projects:
v-elev, which renders voxel elevation models (heightfields?).
a sphere tracer. Has the interesting constraint that it only draws scenes that fit in the 64KB constant memory on the GPU.
CUDA port of ‘Ray Tracing in One Weekend’. Achieves 0.55sec (non-realtime) render on a GTX 1050, with 10 bounces. With a single bounce, this can run real-time at 17 FPS.
Another CUDA port of ‘Ray Tracing in One Weekend’.
With CUDA and SDL, I can now change the position of the camera in real-time
based on @Peter_shirley raytracer
source https://t.co/0ebJZfwIx8 pic.twitter.com/LshxGfowLM
— voxel raytracer (@voxel_tracer) October 2, 2017
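To get a feel for the sphere tracer's 64KB constant-memory constraint mentioned above, here is a rough host-side C++ sketch of the budget arithmetic. The 64KB figure comes from the project description; the `Sphere` layout and the `max_records` helper are hypothetical and not taken from the project's source.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// 64 KB of constant memory, the figure quoted above.
constexpr std::size_t kConstantMemoryBytes = 64 * 1024;

// Hypothetical sphere record (NOT taken from the project's source).
struct Sphere {
    float center[3];         // 12 bytes
    float radius;            //  4 bytes
    std::uint32_t material;  //  4 bytes, index into a material table
};
static_assert(sizeof(Sphere) == 20, "this sketch assumes a packed 20-byte record");

// Number of whole records that fit in a byte budget
// (integer division rounds down).
inline std::size_t max_records(std::size_t budget_bytes, std::size_t record_bytes) {
    assert(record_bytes > 0);
    return budget_bytes / record_bytes;
}
```

With this made-up 20-byte layout, `max_records(kConstantMemoryBytes, sizeof(Sphere))` comes out to 3276 spheres, and that is before reserving room for anything else a kernel might keep in constant memory (camera, materials).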
Fractals
Ashley Hauck’s Clam
floating point rounding shenanigans pic.twitter.com/kzquGwJU0A
— Ashley (@khyperia) July 27, 2019
Clam4 lives! I rewrote, for 8th or 9th time, my fractal raytracer in C# (with CUDA rendering engine) to make it more portable. pic.twitter.com/UTDiWr3IVO
— Ashley (@khyperia) March 25, 2017
Sergii Kharagorgiev’s fractal_demo, described here, is a 3D fractal renderer. There’s not much about it online (Sergii’s site is down) and there are no screenshots.
Clouds, volumes, fluids
Peter Whidden’s Fat-Clouds (animations here) is a fluid simulator in about 600 lines of code.
Raymarching
Miles Lacey’s basic ray marcher (depends on his cuda_math.h).
Even though the filename of the image this snippet generates is cuda_sphere.ppm, it appears to render a torus:
Jonathan Granskog’s simpleCudaRayMarcher, described here, is an unoptimized practice raymarcher that renders 3D fractals.
Jarl Larsson’s KernelSanders is a DX11 raytracer and raymarcher.
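For readers new to the technique, the core loop of a ray marcher fits in a few lines. The following is a generic sketch of sphere tracing against a torus distance function, written as plain host C++ rather than a CUDA kernel. It is not taken from any of the projects above; all names and constants (`torus_sdf`, `march`, the step limit, the epsilon) are assumptions chosen for illustration.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

inline Vec3 add(Vec3 a, Vec3 b) { return Vec3{a.x + b.x, a.y + b.y, a.z + b.z}; }
inline Vec3 scale(Vec3 v, float s) { return Vec3{v.x * s, v.y * s, v.z * s}; }

// Signed distance from p to a torus centered at the origin, lying in the
// XZ plane. `major` is the ring radius, `minor` is the tube radius.
inline float torus_sdf(Vec3 p, float major, float minor) {
    assert(minor > 0.0f && major > minor);  // ring torus only
    float ring = std::sqrt(p.x * p.x + p.z * p.z) - major;
    return std::sqrt(ring * ring + p.y * p.y) - minor;
}

// Sphere tracing: advance along the ray by the distance the SDF reports,
// which can never overshoot the surface. `dir` is assumed to be unit length.
// Returns the distance to the hit, or -1 if the ray misses.
inline float march(Vec3 origin, Vec3 dir, float major, float minor) {
    const int max_steps = 128;     // arbitrary iteration cap
    const float hit_eps = 1e-4f;   // "close enough" threshold
    const float max_dist = 100.0f; // give up beyond this distance
    float t = 0.0f;
    for (int i = 0; i < max_steps; ++i) {
        float d = torus_sdf(add(origin, scale(dir, t)), major, minor);
        if (d < hit_eps) return t;
        t += d;
        if (t > max_dist) break;
    }
    return -1.0f;
}
```

A ray fired straight down at the tube hits it, while a ray fired down through the hole in the middle misses. Swapping `torus_sdf` for a sphere's distance function (`length(p) - radius`) is all it takes to change the shape, which may be how a file named after a sphere ends up showing a torus.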
Tokaspt family tree
Thierry Berger-Perrin is the originator of the ompf2 real-time raytracing forum. Thierry wrote tokaspt (“The Once Known as SmallPT”), a CUDA port of SmallPT.
Tokaspt has had several offshoots:
Sam Lapere (RayTracey) based his 2011 tokap (“The Once Known as Pong”) on tokaspt. Tokap is a real-time path-traced Pong:
Optix-based projects
If I understand correctly, NVIDIA Optix is built on top of CUDA. So technically projects using Optix are running CUDA – albeit through Optix’s abstractions and not calling CUDA directly.
Jacco Bikker’s Lighthouse 2 real-time raytracing framework. LH2 relies on Optix 6.0 for its raytracing (and eventually other raytracing hardware / APIs).
Achieving 33 FPS on a 1060m:
Poor 1060 laptop running Lighthouse2 path tracer with Optix Prime / SVGF.
New demo now on ompf2:https://t.co/4cFSGFZ34c pic.twitter.com/thYxEQzLik
— Jacco Bikker (@j_bikker) March 1, 2019
Relying on Optix makes LH2’s raytracing infrastructure ‘optimal’ to a large degree - in the sense that it’s difficult to achieve faster performance with your own code:
"The ray tracing infrastructure (with related scene management acceleration structure maintenance) should be close to optimal." That's a bold statement or I'm reading it wrong.
— Dominik Susmel (@Keyframe) July 11, 2019
@ProgrammerLin’s voxel w/ GI demo, via OptiX 6.0 running on an NVIDIA RTX 2070 (w/ RTX):
Ingo Wald’s series on an Optix version of Raytracing in One Weekend:
OpenCL projects
OpenCL is the portable analogue to CUDA. It’s reportedly slower than CUDA (although some projects report OpenCL to be faster).
Sven Forstmann’s Voxel Splatting using OpenCL. Achieves 2 Billion splats / second (!), about 20-30 FPS.
Sven Forstmann’s “Sparse Voxel Octree Raycasting with Image Warping exploiting Frame-to-Frame Coherence”. Achieves 30-50 FPS on a GTX 580M:
A slower, OpenGL-based version of Sven’s voxel splatting engine.
David Bucciarelli wrote several OpenCL demos, which he compared to their CPU counterparts:
SmallptGPU, described here. Performance comparison of SmallptGPU vs. the original Smallpt shows:
- 0.45M samples/second for Smallpt (CPU)
- 0.42M samples/second for SmallptGPU (running on CPU only, single-threaded)
- 4.5M samples/second for SmallptGPU (running on GPU)
The GPU runtime is 10x faster than single-threaded CPU. However, multi-threading would close the gap quite a bit. An 8-core machine would ostensibly achieve nearly the same performance.
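The arithmetic behind that comparison can be sketched as follows. The two sample rates are the ones quoted above; the perfectly linear multi-core scaling is an optimistic assumption (an upper bound), and the helper names are made up for this example.

```cpp
#include <cassert>

// Sample rates quoted above, in millions of samples per second.
constexpr double kCpuRate = 0.45;  // Smallpt on the CPU
constexpr double kGpuRate = 4.5;   // SmallptGPU on the GPU

// How many times faster `fast` is than `slow`.
inline double speedup(double fast, double slow) {
    assert(slow > 0.0);
    return fast / slow;
}

// Throughput under ideal (perfectly linear) multi-core scaling.
// Real programs rarely scale this well, so treat it as an upper bound.
inline double ideal_multicore(double single_thread_rate, int cores) {
    assert(cores > 0);
    return single_thread_rate * cores;
}
```

`speedup(kGpuRate, kCpuRate)` gives the 10x figure, and `ideal_multicore(kCpuRate, 8)` gives 3.6M samples/second, or about 80% of the GPU rate. So "nearly the same" holds only if the CPU code scales almost perfectly across cores.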
SmallptGPU (OpenCL) from David Bucciarelli on Vimeo.
MandelGPU (OpenCL) from David Bucciarelli on Vimeo.
JuliaGPU described here, based on QJulia.
JuliaGPU (OpenCL) from David Bucciarelli on Vimeo.
SmallLuxGPU 2.0 Preview from David Bucciarelli on Vimeo.
Official NVIDIA CUDA projects
There’s not a great guide to the CUDA toolkit samples, so I made my own by crawling the sources. It’s implemented as a (barebones) Datasette browser.
(That web app is running on a free Heroku Dyno instance which goes to sleep, so give it 5-20 seconds to boot if it’s slow to launch.)
Voxels, no source code
There are several recent projects without source code available. They’re still worth checking out because they’re newer and show the current state of the art.
Jacco Bikker recently (first half of 2019) wrote a voxel raytracer. Achieves 200+ FPS on a 1060m.
This is a CUDA port of his earlier CPU voxel raytracer.
Voxel sphere ray tracing in CUDA, with a larger dataset. Running on a mobile 1060 here. Framerate should allow for a shadow ray per pixel, maybe a diffuse bounce? With some filtering this could work. :) pic.twitter.com/dhJSpFF5XY
— Jacco Bikker (@j_bikker) March 29, 2019
Ray tracing a voxel sphere. Article on doing so efficiently with bent rays / 3DDDA through voxel chunks organized in shells around the sphere:https://t.co/eW6pmpXG8f pic.twitter.com/JcSotnxsBT
— Jacco Bikker (@j_bikker) March 29, 2019
@ProgrammerLin dropped RTX / Optix support and implemented this directly in CUDA, achieving 100 FPS for single-bounce lighting:
Decided a while ago to drop Optix, so here's my first-ever Cuda program - a voxel path tracer with single-bounce lighting. Runs at ~100fps on my RTX 2070 and isn't using any RTX features. Not optimized yet either. Not bad for the first day! #gamedev #voxels #indiegamedev pic.twitter.com/sBx0e4aaEN
— Lin (@ProgrammerLin) January 6, 2019
Later, @ProgrammerLin ported this to RTX / Optix 6.0:
Got back into Optix with the recent release of 6.0. Here's an experiment with multiple materials. The jagged edges are because the diamond is voxelized without sharp features. pic.twitter.com/hydFnygAsu
— Lin (@ProgrammerLin) April 26, 2019
A more recent version, running at 30 FPS:
Spent a little bit of time today working on this project again. Now voxels emit light properly. There are still a couple minor issues to fix but it's actually usable! This was running constantly at 30 fps at 720p, 10 spp. #rtx #voxels #pathtracing #gamedev #indiegamedev pic.twitter.com/qcbYrwoiKN
— Lin (@ProgrammerLin) July 23, 2019