Thursday, December 30, 2010

2010, an excellent year for raytracing!

What an exciting year this has been, for raytracing at least. There has been a huge buzz around accelerated ray tracing and unbiased rendering, in which the GPU has played a pivotal role. A little overview:

- Octane Render is publicly announced. A demo is released which lets many people experience high-quality unbiased GPU rendering for the first time. Its unparalleled quality and amazing render times, even on an older card like the 8800 GTX, catch many by surprise.

- Arion, the GPU sibling of RandomControl's Fryrender, is announced shortly after Octane. It touts hybrid CPU+GPU unbiased rendering as a distinguishing feature. The product eventually releases at a prohibitively expensive price (1000€ for one multi-GPU license).

- LuxRender's OpenCL-based GPU renderer SmallLuxGPU integrates stochastic progressive photon mapping, a consistent (biased but converging) rendering method which excels at caustic-heavy scenes

- Brigade path tracer is announced, a hybrid (CPU+GPU) real-time path tracer aimed at games. It is very optimized and very fast, offers user-defined quality, and is the first path tracer with support for dynamic objects. Its GI quality greatly surpasses virtual point light/instant radiosity based methods and even photon mapping; it can theoretically handle all types of BRDF, is artefact-free (except for noise) and nearly real-time, with no screen-space limitations. The biggest advantage over other methods is progressive rendering, which instantly gives a good idea of the final converged image (some filtering and LOD scheme, similar to VoxLOD, could produce very high-quality results in real-time). Very promising: it could be the best option for high-quality dynamic global illumination in games in 2 to 3 years.

- release of Nvidia's Fermi GPU: caches and other enhancements (e.g. concurrent kernel execution) give ray tracing tasks an enormous boost, up to 3.5x faster in scenes with many incoherent rays compared to the previous architecture. Design Garage, an excellent tech demo featuring GPU path tracing, is released alongside the cards.

- Siggraph 2010 puts heavy focus on GPU rendering

- GTC 2010: Nvidia organizes a whole bunch of GPU ray tracing sessions covering OptiX, iray, etc.

- John Carmack re-expresses interest in real-time ray tracing as an alternative rendering method for next-generation games (besides sparse voxel octrees). He has even started tweeting about his GPU ray tracing experiments in OpenCL.

- GPU rendering gets more and more criticized by the CPU rendering crowd (Luxology, Next Limit, their user base, ...), who feel threatened by decreased revenue

- release of mental ray's iray

- release of V-Ray RT GPU, the product that started the GPU rendering revolution

- Caustic Graphics is bought by Imagination Technologies, the maker of PowerVR GPUs. A surprising and potentially successful move for both companies. Hardware-accelerated real-time path tracing at very high sampling rates (higher than on Nvidia Fermi) could become possible. PowerVR GPUs are integrated in Apple TV, iPad, iPhone and iPod Touch, so this is certainly something to keep an eye on in 2011. Caustic doesn't disappoint when it comes to hype and drama :)

- one of my most burning questions since GPU path tracing first appeared, "is the GPU capable of more sophisticated and efficient rendering algorithms than brute-force path tracing?", was answered just a few weeks ago, thanks to Dietger van Antwerpen and his superb work on GPU-based Metropolis light transport and energy redistribution path tracing.
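The progressive rendering that makes Brigade (and other GPU path tracers) feel instant can be thought of as a running average: every pass adds one new sample per pixel to the accumulated image, so a usable preview exists after the first pass and the noise keeps shrinking as passes arrive. A minimal sketch, assuming a single hypothetical pixel whose noisy radiance sample is simply a uniform random number (true value 0.5); it illustrates the idea only and is not Brigade's code.

```python
import random

def noisy_sample(rng: random.Random) -> float:
    """Hypothetical per-pixel radiance sample: uniform noise around a true mean of 0.5."""
    return rng.random()

def progressive_render(passes: int, seed: int = 1) -> list[float]:
    """Fold one new sample per pass into a running mean; return the estimate after every pass."""
    rng = random.Random(seed)
    estimate = 0.0
    history = []
    for n in range(1, passes + 1):
        sample = noisy_sample(rng)
        estimate += (sample - estimate) / n  # incremental mean of the first n samples
        history.append(estimate)
    return history

# Every entry is a displayable result (here: one pixel); later entries are less noisy.
previews = progressive_render(1000)
```

The incremental form matters on a GPU because the accumulation buffer only needs the current estimate and the pass count, not every past sample.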

All in all, 2010 was great for me and delivered a lot to write about. Hopefully 2011 will be at least equally exciting. Some wild speculation of what might happen:

- Metropolis light transport starts to appear in commercial GPU rendering software (very high probability for Octane)
- more news about Intel's Knights Corner/Knights Ferry, maybe with some performance numbers (unlikely)
- Nvidia launches Kepler at the end of 2011, offering 3x the path tracing performance of Fermi (too good to be true?)
- the PowerVR GPU maker and Caustic Graphics bring hardware-accelerated real-time path tracing to a mass audience through Apple mobile products (would be great)
- Luxology and Maxwell Render reluctantly embrace GPU rendering (LOL)
- finally a glimpse of OTOY's real-time path tracing (fingers crossed)
- Brigade path tracer gains exposure and awareness with the release of the first path traced game in history (highly possible)
- ...


Monday, December 27, 2010

Global illumination with Markov Chain Monte Carlo rendering in Nvidia OptiX 2.1 + Metropolis Light Transport with participating media on GPUs

OptiX 2.1 was released a few days ago and includes a Markov Chain Monte Carlo (MCMC) sample, which only works on Fermi cards ("New sample: MCMC - Markov Chain Monte Carlo method rendering. A global illumination solution that requires an SM 2.0 class device (e.g. Fermi) or higher").

MCMC rendering methods, such as MLT (Metropolis light transport) and ERPT (energy redistribution path tracing), are partially sequential because each path of a Markov chain depends on the previous path, which makes them more difficult to parallelize for GPUs than standard Monte Carlo algorithms. This is an image of the new MCMC sampler included in the new OptiX SDK.
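A common way to keep a GPU busy despite this sequential dependency is to run many independent Markov chains at once, one per thread: the work is parallel across chains but sequential within each chain. A minimal sketch of that structure, assuming a hypothetical 1D target density on [0, 1] in place of a real path contribution and a symmetric wrap-around mutation; the outer Python loop stands in for GPU threads, and this is not the OptiX sample's code.

```python
import math
import random

def target(x: float) -> float:
    """Hypothetical unnormalized target density on [0, 1], peaked at 0.7."""
    return math.exp(-((x - 0.7) ** 2) / 0.01)

def metropolis_step(x: float, rng: random.Random, size: float = 0.05) -> float:
    """One Metropolis step; it needs the current state x, so steps cannot run in parallel."""
    y = (x + rng.uniform(-size, size)) % 1.0  # symmetric mutation, wrapped to [0, 1)
    if rng.random() < min(1.0, target(y) / target(x)):
        return y
    return x

def run_chains(num_chains: int, steps: int, seed: int = 0) -> list[float]:
    """Run independent chains and return the final state of each."""
    master = random.Random(seed)
    finals = []
    for _ in range(num_chains):        # parallel on a GPU: the chains never interact
        rng = random.Random(master.getrandbits(32))
        x = rng.random()
        for _ in range(steps):         # sequential: each state depends on the previous one
            x = metropolis_step(x, rng)
        finals.append(x)
    return finals
```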

There is also an update on the Kelemen-style Metropolis Light Transport GPU renderer from Dietger van Antwerpen. He has released this new video showing Metropolis light transport with participating media running on the GPU:

This scene is straight from the original Metropolis light transport paper by Veach and Guibas. Participating media (like fog, smoke and god rays) are among the most difficult and compute-intensive phenomena to simulate accurately with global illumination, because they are essentially volumetric effects in which light scatters throughout the medium. Subsurface scattering belongs to the same category of expensive, difficult-to-render volumetric effects. The video shows it can now be done in almost real-time with MLT, which is pretty impressive!
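Part of why participating media are so expensive is that a ray no longer simply travels to the next surface: the renderer also has to sample where along the ray the next scattering event happens. For the simplest case, a homogeneous medium with extinction coefficient sigma_t, transmittance follows the Beer-Lambert law T(t) = exp(-sigma_t t), and a free-flight distance can be drawn by inverting that distribution. A minimal sketch with illustrative parameter values; heterogeneous media like smoke need more elaborate schemes, and this is not the renderer from the video.

```python
import math
import random

def sample_free_flight(sigma_t: float, u: float) -> float:
    """Distance to the next interaction in a homogeneous medium.

    Inverts the CDF P(t) = 1 - exp(-sigma_t * t) for a uniform number u in [0, 1).
    """
    return -math.log(1.0 - u) / sigma_t

def transmittance(sigma_t: float, t: float) -> float:
    """Beer-Lambert law: probability of travelling distance t without interacting."""
    return math.exp(-sigma_t * t)

def mean_free_path_estimate(sigma_t: float, n: int, seed: int = 0) -> float:
    """Average of n sampled free-flight distances; should approach 1 / sigma_t."""
    rng = random.Random(seed)
    return sum(sample_free_flight(sigma_t, rng.random()) for _ in range(n)) / n
```

Every path segment inside the medium needs such a sample (plus a phase-function sample at each scattering event), which is why paths through fog get long and costly.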

Friday, December 24, 2010

Move over OTOY, here comes the new AMD tech demo!

June 2008: Radeon HD 4870 launches with the OTOY/Cinema 2.0/Ruby tech demo featuring voxel raytracing. It can't get much closer to photorealism than this... or can it?

December 2010: Radeon HD 6970 launches with this craptastic tech demo. Talk about progress. Laughable fire effects, crude physics with only a few dozen dynamic objects, pathetic Xbox 1 city model and lighting, uninspired Mecha design. It may be just a tech demo but this is just a disgrace for a high-tech GPU company. Well done AMD! Now where the hell is that Cinema 2.0 Ruby demo you promised dammit? My HD 4890 is almost EOL and already LOL :p

Sunday, December 19, 2010

GPU-accelerated biased and unbiased rendering

Since seeing the facemeltingly awesome YouTube video of Kelemen-style MLT+bidirectional path tracing running on a GPU, I'm quite convinced that most (if not all) unbiased rendering algorithms can be accelerated on the GPU. Here's a list of the most common unbiased algorithms which have been successfully ported to the GPU:

- unidirectional (standard) path tracing: used by Octane, Arion, V-Ray RT GPU, iray, SmallLuxGPU, OptiX, Brigade, Indigo Render, a bunch of renderers integrating iray, etc. Jan Novak was one of the first to report a working implementation of path tracing on the GPU (implemented with CUDA on a GTX 285). The very first paper reporting GPU path tracing is "Stochastic path tracing on consumer graphics cards" from 2008 by Huwe and Hemmerling (implemented in GLSL).
- bidirectional path tracing (BDPT): I think Jan Novak, Vlastimil Havran and Carsten Dachsbacher made this work as well in their paper "Path regeneration for interactive path tracing"
- Metropolis Light Transport (MLT)+BDPT
- energy redistribution path tracing (ERPT)
- (stochastic) progressive photon mapping (SPPM), strictly speaking a consistent rather than unbiased method: used by SmallLuxGPU; there's also a GPU-optimised, parallelised version on Toshiya Hachisuka's website (CUDA, OpenCL)
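The progressive part of (stochastic) progressive photon mapping is a per-hit-point update: after each photon pass, the gather radius shrinks so that only a fraction alpha of the newly found photons is kept, and the accumulated flux is rescaled to the smaller disc. A minimal sketch of the update rule from Hachisuka et al.'s progressive photon mapping papers for a single hit point, assuming alpha = 0.7 (a commonly quoted value) and leaving out photon tracing and the BRDF weighting of the flux; `HitPoint`, `ppm_update` and `radiance_estimate` are illustrative names.

```python
import math
from dataclasses import dataclass

@dataclass
class HitPoint:
    radius: float     # current gather radius R_i
    n: float = 0.0    # accumulated photon count N_i
    tau: float = 0.0  # accumulated (unnormalized) flux tau_i

def ppm_update(hp: HitPoint, m: int, flux: float, alpha: float = 0.7) -> None:
    """Apply one photon pass: m new photons with summed flux `flux` fell inside hp.radius."""
    if m == 0:
        return  # no new photons: radius and statistics stay as they are
    new_n = hp.n + alpha * m               # N_{i+1} = N_i + alpha * M_i
    ratio = new_n / (hp.n + m)             # R_{i+1}^2 / R_i^2
    hp.radius *= math.sqrt(ratio)          # shrink the gather disc
    hp.tau = (hp.tau + flux) * ratio       # rescale flux to the smaller disc (assumes locally uniform density)
    hp.n = new_n

def radiance_estimate(hp: HitPoint, photons_emitted: int) -> float:
    """Density estimate tau / (pi * R^2 * N_emitted)."""
    return hp.tau / (math.pi * hp.radius ** 2 * photons_emitted)
```

Each hit point is updated independently, which is what makes the method map reasonably well to GPUs; the hard part in practice is the photon lookup, not this update.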

Octane, my fav unbiased GPU renderer, will also implement an MLT-like rendering algorithm in the next version (beta 2.3 version 6), which is "coming soon". I gathered some interesting quotes from radiance (Octane's main developer) regarding MLT in Octane:

“We are working on a firefly/caustic capable and efficient rendering algorithm, it's not strictly MLT but a heavily modified version of it. Trust me, this is the last big feature we need to implement to have a capable renderer, so it's our highest priority feature to finish.”

“MLT is an algorithm that's much more efficient at rendering complex scenes, not so efficient at simple, directly lit scenes (eg objects in the open). However MLT does sample away the fireflies.”

“The fireflies are a normal side effect of unbiased rendering, they are reflective or refractive caustics. We're working on new algorithms in the next version that will solve this as it will compute these caustics better.”

“they are caustics, long paths with a high contribution, a side effect of unbiased path tracing. MLT will solve this problem which is in development and slated for beta 2.3”

“the pathtracing kernel already does caustics, it's just not very efficient without MLT, which will be in the next 2.3 release.”

“lights (mesh emitters) are hard to find with our current algorithms, rendertimes will severely improve with the new MLT replacement that's coming soon.”

“it will render more efficiently [once] we have portals/MLT/bidir.”

“All exteriors render in a few minutes clean in octane currently. (if you have a decent GPU like a medium range GTX260 or better). Interiors is more difficult, requires MLT and ultimately bidir path tracing. However, with plain brute force pathtracing octane is the same or slightly faster than a MLT/Bidir complex/heavily matured [CPU] engine, which gives good promise for the future, as we're working on those features asap.”

With all unbiased rendering techniques soon possible and greatly accelerated on the GPU, what about GPU acceleration for biased production rendering techniques (such as photon mapping and irradiance caching)? There have been a lot of academic research papers on this subject (e.g. Purcell, Rui Wang and Kun Zhou, Fabianowski and Dingliana, McGuire and Luebke, ...), but since it's a lot trickier to parallelize photon mapping and irradiance caching than unbiased algorithms while still obtaining production quality, these techniques are not quite ready for integration in commercial software yet. But this will change very soon imo: on the ompf forum I found a link to a very impressive video showing very high-quality CUDA-accelerated photon mapping.

This is a render of Sponza at 800x800 resolution, rendered in 11.5 seconds on one GTX 470!

11 seconds for this quality and resolution on just one GPU is pretty amazing if you ask me. I'm sure that further optimizations could bring the rendertime down to 1 second. The video also shows real-time interaction (scale, rotate, move, delete) with objects from the scenery (something that could be extended to support many dynamic objects via HLBVH). I could see this being very useful for real-time production quality global illumination using a hybrid of path tracing for exteriors and photon mapping for interiors, caustics, point lights.

Just like 2010 was the year of GPU-accelerated unbiased rendering, I think 2011 will become the year of heavily GPU-accelerated biased rendering (photon mapping in particular).

Wednesday, December 15, 2010

Real-time Metropolis Light Transport on the GPU: it works!!!!

This is probably the most significant news since the introduction of real-time path tracing on the GPU. I've been wondering for quite a while if MLT (Metropolis Light Transport) would be able to run on current GPU architectures. MLT is a more complex algorithm than path tracing, but a more efficient one for rendering certain scenes which are predominantly indirectly lit (e.g. light coming through a narrow opening, such as a half-closed door, and illuminating a room), a case in which path tracing has great difficulty finding "important" contributing light paths. For this reason, it is the rendering method of choice for professional unbiased renderers like Maxwell Render, Fryrender, Luxrender, Indigo Render and Kerkythea Render.
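A toy illustration of why path tracing struggles in this case, assuming a hypothetical 1D "scene" in which only parameters inside a narrow interval (the opening, 1% of the domain) lead to the light: independent uniform samples find it only about 1% of the time, whereas small perturbations of a path that is already known to reach the light almost always reach it too. That second observation is the intuition behind the mutations used by MLT and ERPT; the sketch is neither algorithm.

```python
import random

OPENING = (0.500, 0.510)  # hypothetical narrow opening covering 1% of the sample domain

def carries_light(x: float) -> bool:
    """True if the 'path' parameterized by x passes through the opening."""
    return OPENING[0] <= x < OPENING[1]

def uniform_hit_rate(n: int, rng: random.Random) -> float:
    """Path-tracing-like: every sample is drawn independently over the whole domain."""
    return sum(carries_light(rng.random()) for _ in range(n)) / n

def mutation_hit_rate(n: int, rng: random.Random,
                      found: float = 0.505, size: float = 0.001) -> float:
    """Mutation-like: small perturbations of a path already known to carry light."""
    return sum(carries_light(found + rng.uniform(-size, size)) for _ in range(n)) / n
```

The narrower the opening, the lower the uniform hit rate, while the mutation hit rate depends only on the mutation size relative to the opening.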

Dietger van Antwerpen, an IGAD student who co-developed the Brigade path tracer and who also managed to make ERPT (energy redistribution path tracing) run in real-time on a Fermi GPU, has posted two utterly stunning and quite unbelievable videos of his latest progress:

- video 1 showing a comparison between real-time ERPT and path tracing on the GPU:

ERPT on the left, standard path tracing (PT) on the right. Light is coming in through a narrow opening, a scenario in which PT has a hard time finding light paths and converging, because it samples the environment randomly. ERPT shares properties with MLT: once it finds an important light path, it samples nearby paths via small mutations of that path, so convergence is much faster.

- video 2 showing Kelemen-style MLT (an improvement on the original MLT algorithm) running in real-time on the GPU. The video description mentions Kelemen-style MLT on top of bidirectional path tracing (BDPT) with multiple importance sampling, pretty amazing.
Kelemen-MLT after 10 seconds of rendering at 1280x720 on a single GTX 470. The beautiful caustics are possible due to bidirectional path tracing+MLT and are much more difficult to obtain with standard path tracing.
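Kelemen-style MLT mutates paths in "primary sample space": a path is represented by the vector of uniform random numbers that generated it, and mutations act on that vector, either as a small perturbation of every number or as a "large step" that draws a completely new vector. A minimal sketch of these two mutations, assuming the exponentially distributed perturbation with s1 = 1/1024 and s2 = 1/64 described in Kelemen et al.'s paper; the large-step probability of 0.3 is an illustrative choice, not a value from Dietger's renderer. The Metropolis accept/reject step and the construction of a path from the vector are omitted.

```python
import math
import random

S1, S2 = 1.0 / 1024.0, 1.0 / 64.0  # smallest and largest small-step sizes

def perturb(value: float, rng: random.Random) -> float:
    """Small step: offset in (S1, S2], exponentially distributed, random sign, wrapped to [0, 1)."""
    u = rng.random()
    if u < 0.5:
        sign, u = 1.0, 2.0 * u
    else:
        sign, u = -1.0, 2.0 * (u - 0.5)
    dv = S2 * math.exp(-math.log(S2 / S1) * u)
    v = (value + sign * dv) % 1.0
    return v if v < 1.0 else 0.0  # guard against rounding up to exactly 1.0

def mutate(sample: list[float], rng: random.Random,
           large_step_prob: float = 0.3) -> tuple[list[float], bool]:
    """Propose a new primary-sample vector; also report whether it was a large step."""
    if rng.random() < large_step_prob:
        return [rng.random() for _ in sample], True    # large step: an independent new path
    return [perturb(v, rng) for v in sample], False    # small step: explore nearby paths
```

Because mutations only touch random numbers, the same scheme works on top of any path sampler, including the bidirectional path tracer with multiple importance sampling mentioned in the video description.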

These videos are ultimate proof that current GPUs are capable of more complex rendering algorithms than brute-force standard path tracing and can potentially accelerate the very same algorithms used in the major unbiased CPU renderers. This bodes very well for GPU renderers like Octane (which has its own MLT-like algorithm), V-Ray RT GPU, SmallLuxGPU and iray.

If Dietger decides to implement these in the Brigade path tracer we could be seeing (quasi) noise-free, real-time path traced (or better "real-time BDPT with MLT" traced) games much sooner than expected. Verrrry exciting stuff!! I think some rendering companies would hire this guy instantly.

Friday, December 10, 2010

Voxels again

Just encountered a very nice blog about voxel rendering, sparse voxel octrees and massive procedural terrain rendering.

The author has made some videos of the tech using OpenCL, showing the great detail that can be achieved with voxels. It does look a bit like the Atomontage engine.

Wednesday, December 1, 2010

OnLive just works, even for those darn Europeans!

I think this deserves its own post. Someone (Anonymous) told me that the OnLive service can be accessed and played from EU countries as well, so I gave it a try and downloaded and installed the tiny OnLive plug-in. To my surprise, I actually got it working on a pretty old PC (just a Pentium 4 at 3GHz). I was flabbergasted. I'm about 6000 miles away from the OnLive servers and it still runs! The quality of the video stream was more than decent, and smoother than 720p YouTube videos, which my system just cannot decode fast enough.

My first impression: I love it, and now I'm absolutely positive that this is the very near future of video games. It's a joy to watch others play and to start playing the same game within seconds! I've tried some Borderlands, Splinter Cell Conviction and FEAR 2. There is some lag, because I'm about 6000 miles away from the OnLive server (I got a warning during log-in that my connection has huge latency), but I could nevertheless still enjoy the game. About half a second (or less) passes between hitting the shoot key and seeing the gun actually shoot, and the same delay applies when moving your character. I must say, though, that I got used to the delay after a while and anticipated my moves by half a second. My brain notices the delay during the first minutes of play, but I forgot about it after a while and just enjoyed the game. I think that if I can enjoy an OnLive game from 6000 miles away, then US players, who live much closer to the OnLive servers, have got to have an awesome experience. The lag could also be due to my own ancient PC (which is not even dual core) or to the local network infrastructure here in Belgium, even though I have a pretty high-bandwidth connection. I can't wait until they deploy their EU servers. Image quality is very variable; I guess that's partly because of my PC, which cannot decode the video stream fast enough. FEAR 2 looked very sharp, though. The image looks best when you're not moving the camera and just stare at the scene. The recently announced MicroConsole seems to offer very good image quality from what I've read.

I think that cloud gaming will give an enormous boost to the graphics side of games and that photorealistic games will be here much sooner thanks to cloud rendering and its inherent rendering efficiency (especially when using ray tracing, see the interview with Jules Urbach). My biggest gripe with consoles like Xbox and PlayStation is that they stall graphics development for the duration of the console cycle (around 5 years), especially the latest round of consoles. With the exception of Crysis, PC games don't make full use of the latest GPUs, which are much more powerful than the consoles. I ran 3DMark05 a few days ago, and it struck me that this 5-year-old benchmark still looks better than any console game on the market. I truly hope that cloud gaming will get rid of fixed console hardware and free up game developers (and graphics engineers in particular) to go nuts, because I'm sick of seeing yet another Unreal Engine 3 powered game.

I also think that OnLive will not be the only player and that there will be fierce competition between several cloud gaming services, each with their own exclusive games. I can imagine a future with multiple cloud gaming providers such as OnLive, OTOY, Gaikai, PlayStation Little Big Cloud, Activision Cloud of Duty, EA Battlefield Cloud of Honor, UbiCloud, MS Red Ringing Cloud of Death (offering Halo: RROD exclusively), Valve Strrream... To succeed they would have to be accessible for free (just like OnLive is now), without monthly subscription fees.

All in all, it's an awesome experience and it's going to open up gaming for the masses and will give a new meaning to the word "video" game. The incredible ease of use (easier than downloading a song from the iTunes Store) will attract vast audiences and for this reason I think it's going to be much bigger than Wii and will completely shake up the next-gen console landscape (Wii2, PS4 and Xbox 2.5/720/RROD2/...). MS, Sony and Nintendo better think twice before releasing a brand new console.

Be it OnLive, OTOY, Gaikai or any other service, I, for one, welcome our new cloud gaming overlords!

Friday, November 26, 2010

Enter the singularity blog

This is just an awesome blog (it moved to a new address in June 2010).

The author currently has a job at OnLive, the cloud-gaming service, but before working there he used to write about voxel raytracing, sparse voxel octrees and the like. I assume he's working on some super-secret game graphics technology involving voxels and ray tracing, targeted to run on the cloud. Another main theme of the blog is how the brain works and how and when artificial intelligence could match and even surpass the capacity of the human brain (something which I find even more interesting than real-time path tracing). To achieve this dream (or nightmare, depending on your view), you would need a gigantic amount of computing power, and cloud servers will probably be the first to fulfill that requirement. Could it be that one day OnLive will turn into SkyNet? ;-)

Such a superhuman intelligence could help scientists think about cures for cancer, stem cell research, nanotechnology and the Israeli-Palestinian conflict, and could help them understand their wives ;-).

Thursday, November 18, 2010

The OnLive MicroConsole, a prelude to the death of the classic console?

OnLive is launching its MicroConsole on Dec 2, and the few reviews that are out are all raving about the little thing.

Lag is apparently not an issue at all and unnoticeable. Compression artefacts are there, but no dealbreaker and image quality will steadily improve in the coming years with better network speeds.

It's easy to see the huge possibilities that cloud gaming could offer: if it becomes successful, big publishers like EA and Activision could completely bypass the classic consoles from money-sucking Microsoft and Sony by hosting their own cloud game servers and streaming their Battlefield and Call of Duty (2014 edition) directly to consumers, saving a shitload of dollars in the process. They could even host their games exclusively in the cloud, effectively rendering the Xbox and PlayStation obsolete.

And when everything has moved to the cloud, path traced games could finally make a breakthrough and become mainstream, due to the greater efficiency and "unlimited" rendering power of the cloud architecture.

Good times ahead!

Wednesday, November 10, 2010

New interview with Jules Urbach (in German)

Link to an interview with Jules Urbach from October:

Google translation from German

There's an interesting part about latency:

For encoding and decoding of 1080p we just need a millisecond, which is negligible. A data packet travels the nearly 4,000 miles between New York and Los Angeles in 85 milliseconds, in addition there are about eight milliseconds input delay. So we are below our targeted 100 milliseconds which doesn't matter for a 3D shooter like Crysis. If we distribute our hosting offerings to five strategically placed data centers in the U.S., the total latency drops to around 30 milliseconds. Our biggest concern is therefore not so much the connection speed, but server-scaling. After all, it costs a lot of money to provide one GPU per user. This is where virtualization comes into play: for example, on a modern graphics card we can run eight instances of the CryEngine. We also do not stream games 24 hours a day. Only five to ten minutes go by, until enough data is present in the local cache (on the client side) to run it from there. A title such as Lego Batman is streamed in the background in a single minute on the client computer, which again frees up resources on the server. This works on both the Mac and the PC - and we are working on a special mini-hardware that does the job without a local computer.

So one of OTOY's goals is to stream the game data to your local PC or Mac: you can start playing the game remotely right away, while the data streams to your local system, until the game can run locally.
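Adding up the figures from the quote shows how the 100 ms target is met for the coast-to-coast case (this only restates Urbach's numbers; the 30 ms figure for five data centers is his estimate and is not derived here):

```latex
t_{\text{total}} = \underbrace{1\,\text{ms}}_{\text{encode + decode}}
                 + \underbrace{85\,\text{ms}}_{\text{New York} \to \text{Los Angeles}}
                 + \underbrace{8\,\text{ms}}_{\text{input delay}}
                 = 94\,\text{ms} < 100\,\text{ms}
```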

Tuesday, November 2, 2010

A GPU friendly Metropolis algorithm

Just came across this paper on parallelizing the Independent Metropolis-Hastings (IMH) algorithm:
The fundamental idea in the current paper is that one can take advantage of the parallel abilities of arbitrary computing machinery, from cloud computing to graphical cards (GPU), in the case of the generic IMH algorithm, producing an output that corresponds to a much improved Monte Carlo approximation machine at the same computational cost.
This could be very interesting for all the GPU renderers out there.
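IMH is GPU-friendly because its proposals do not depend on the current state of the chain: all proposals, and the expensive target evaluations for them (in rendering, tracing the paths), can be computed at once, leaving only a cheap sequential accept/reject pass over precomputed numbers. A minimal sketch of that split, assuming a hypothetical target density proportional to 2x on [0, 1] and a uniform proposal; it does not include the paper's scheme for reusing the proposals across several permuted chains.

```python
import random

PROPOSAL_PDF = 1.0  # uniform proposal density q(y) on [0, 1]

def target(x: float) -> float:
    """Hypothetical unnormalized target density on [0, 1] (stands in for a path's contribution)."""
    return 2.0 * x

def imh(n: int, seed: int = 0) -> list[float]:
    """Independent Metropolis-Hastings split into a parallel phase and a sequential phase."""
    rng = random.Random(seed)
    # Phase 1 (parallelizable): proposals do not depend on the chain state,
    # so proposals and their importance weights f(y)/q(y) can all be computed at once.
    proposals = [rng.random() for _ in range(n)]
    weights = [target(y) / PROPOSAL_PDF for y in proposals]
    uniforms = [rng.random() for _ in range(n)]
    # Phase 2 (sequential but cheap): accept y with probability min(1, w(y) / w(x)).
    x, wx = proposals[0], weights[0]
    chain = []
    for y, wy, u in zip(proposals[1:], weights[1:], uniforms[1:]):
        if wx == 0.0 or u * wx < wy:
            x, wx = y, wy
        chain.append(x)
    return chain
```

For this target the chain's long-run mean should approach 2/3, the mean of the density 2x on [0, 1].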

Nvidia GTX 580 to launch on Nov 9!

It looks like this card came out of nowhere; I hadn't heard much about it until a few weeks ago. Hopefully this launch event will include a new and exciting ray tracing tech demo à la Design Garage (after the rather disappointing launch of the Radeon HD 6800 cards with no tech demos whatsoever). Maybe something OTOY/voxel ray tracing related ;D?

Saturday, October 30, 2010

In-depth interview with Jules Urbach about OTOY

An anonymous reader of my blog, called RC, (yes, there are actually people who read my blog ;-) pointed me to this very interesting interview with Jules Urbach conducted by Research 2.0:
Many thanks for this, RC!

Some very interesting excerpts from the interview:
We can run 48 first person shooters at 60 fps on a single 1U server through ORBX. That is with legacy games that have not been optimized for our service (i.e. games that we run out of the box, without any modifications). When developers target our platform (through tools such as our raytracing pipeline), concurrent usage gets closer to 100 users per GPU. That is before you factor in local rendering power offloaded on the client.

The greater efficiency for rendering games in the cloud when using ray tracing could be the killer argument to start using ray tracing instead of rasterization (besides the simplicity of the code and the greater realism)!

OTOY's partnership with AMD, Nvidia and Intel:
SW: OTOY has been working closely with AMD. What are the major advantages of AMD’s technology relative to Nvidia and Intel?

JU: We were very deliberate in choosing to go down this path with AMD. We tested early versions of ORBX on Nvidia GPUs, x86 CPUs, and AMD GPUs. We settled on CAL as our core development platform (CAL is AMD's low level computing language). It was very challenging to program a GPU using CAL, which is not officially supported by AMD. But we were seeing amazing speeds which we could not replicate on other architectures.

As our company has evolved, so have our relationships with other major hardware vendors. This year, we've added Intel and Nvidia as partners. More will follow. We announced Intel this summer. Obviously, we need CPUs on our servers as well as GPUs. And, in that respect, Intel has a very compelling offering. Intel is also developing hardware cards made from densely packed x86 cores which we may use in the future.

We are officially announcing our partnership with Nvidia in a few weeks. We have been working with them on a version of ORBX that will be deployed on Nvidia hardware in 2011. This is not trivial, given that ORBX's speed has, up until now, come from functionality that is specific to AMD hardware. But, from a practical perspective, we would be ignoring a significant portion of the professional graphics market if we didn't support CUDA applications. Adobe Photoshop, CS5, and countless other apps only support CUDA.

and about unbiased rendering:
Just as I see rendering moving towards an unbiased model that becomes as simple as photography, I can imagine high performance computing and software development becoming equally democratized.

Monday, October 25, 2010

What happened to all the great graphics blogs?

I'm talking about farrarfocus (Timothy Farrar), repiblog, level of detail, enter the singularity, too many pixels... no more updates in ages. Did Twitter kill blogging? Or is interesting graphics development just slowing down? I think it's the former :( On the other hand, Carmack uses Twitter to chat about his graphics endeavours, so all is not bad.

Tuesday, October 19, 2010

OTOY and path tracing for games in the cloud UPDATE: video presentation!

Here's a link to Jules Urbach's presentation about OTOY at GTC 2010:

Some interesting stuff in there:

- "Games/Apps –100% in the cloud by 2014" !!! this doesn't sound too far-fetched considering the success of OnLive and the growing interest from major game publishers and developers in the cloud platform.

- "High level web services enable path-tracing and LightStage rendering in any 3D engine"

- "Crytek engine on Facebook"

Real-time path tracing and GPU cloud servers: it's a match made in heaven (or in the cloud, actually).

UPDATE: Found a link to the full video presentation from Jules Urbach at GTC in HD + there's also a Q&A session afterwards:

Notice Jules talking about unbiased rendering at 04:26 and path tracing at around 08:20!

Monday, October 18, 2010

Minecraft path traced in real-time with Brigade!

Very nice work again from Bikker and co. A level from Minecraft, the current hype in indie games, that is path traced in near real-time using the Brigade path tracer! Enjoy:

The path traced lighting does look completely different from the lighting in the original Minecraft game, which isn't bad but not nearly as realistic.

UPDATE: here's another video with much improved importance sampling:

This new video shows that even with the current hardware, there is still a lot of potential left to reduce noise and improve the image quality of real-time path traced graphics through better algorithms (importance sampling, maybe ERPT or something similar to MLT) and filtering methods.
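As a textbook example of how better importance sampling reduces noise, consider the irradiance at a diffuse surface under a uniform sky of radiance 1, whose exact value is pi. Sampling directions uniformly over the hemisphere gives a noisy estimate, whereas cosine-weighted sampling, whose pdf cos(theta)/pi matches the integrand, gives every sample the value pi. A minimal sketch of both samplers; it illustrates the principle only and does not describe how Brigade samples.

```python
import math
import random
from typing import Callable, Tuple

Vec3 = Tuple[float, float, float]

def uniform_hemisphere(rng: random.Random) -> Tuple[Vec3, float]:
    """Uniform direction on the +z hemisphere; pdf = 1 / (2*pi)."""
    z = rng.random()  # cos(theta) is uniform in [0, 1] for uniform hemisphere sampling
    r = math.sqrt(max(0.0, 1.0 - z * z))
    phi = 2.0 * math.pi * rng.random()
    return (r * math.cos(phi), r * math.sin(phi), z), 1.0 / (2.0 * math.pi)

def cosine_hemisphere(rng: random.Random) -> Tuple[Vec3, float]:
    """Cosine-weighted direction (uniform disk sample projected up); pdf = cos(theta) / pi."""
    u1, u2 = rng.random(), rng.random()
    r = math.sqrt(u1)
    phi = 2.0 * math.pi * u2
    z = math.sqrt(max(0.0, 1.0 - u1))
    return (r * math.cos(phi), r * math.sin(phi), z), z / math.pi

def estimate_irradiance(sampler: Callable[[random.Random], Tuple[Vec3, float]],
                        n: int, seed: int = 0) -> float:
    """Monte Carlo estimate of the integral of L * cos(theta) over the hemisphere, with L = 1."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        (_, _, cos_theta), pdf = sampler(rng)
        if pdf > 0.0:
            total += cos_theta / pdf
    return total / n
```

Real scenes never match the integrand this perfectly, but the closer the sampling pdf follows the actual light contribution, the less noise remains at a given sample count.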

Thursday, September 30, 2010

Octane Render preparing to smite the competition with its MLT equivalent

Octane Render, the ultra-fast unbiased GPU renderer (made in Belgium just like me :-)) is soon going to introduce a new MLT-(Metropolis light transport)-like algorithm, which will make the rendering of certain difficult scenes with small light sources much more efficient: the scene will converge much faster, with less noise and will kill fireflies (bright pixels as a consequence of long paths from reflective caustics).
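Fireflies come from variance rather than a wrong average: a rare light path, such as a reflective caustic, is sampled with a small probability but receives a huge weight when it is. A minimal toy sketch, assuming a hypothetical pixel with a constant contribution of 1 from ordinary paths plus a caustic path found with probability p = 0.001 and weighted by 1/p. The estimator is unbiased (true value 2.0), yet at 4 samples per pixel most pixels read exactly 1.0 while a few come out at about 250 or more.

```python
import random

def pixel_sample(rng: random.Random, p: float = 0.001, caustic: float = 1.0) -> float:
    """One unbiased sample: ordinary contribution 1, plus a rare caustic path weighted by 1/p."""
    value = 1.0
    if rng.random() < p:
        value += caustic / p  # rare but huge: this is what shows up as a firefly
    return value

def render(num_pixels: int, spp: int, seed: int = 0) -> list[float]:
    """Average spp samples per pixel of a flat image whose true value is 2.0 everywhere."""
    rng = random.Random(seed)
    return [sum(pixel_sample(rng) for _ in range(spp)) / spp for _ in range(num_pixels)]
```

Path mutations attack exactly this case: once such a high-contribution path is found, nearby variations of it are sampled often, so its energy gets spread over many samples instead of landing in a single pixel.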

MLT is the base rendering algorithm used by unbiased CPU renderers like LuxRender, Maxwell Render, Fryrender, Indigo Renderer and Kerkythea.

Making Metropolis light transport (or an equivalent) work on current GPUs was thought by many to be impossible and it was one of the main criticisms from GPU rendering skeptics such as Luxology (Modo) and Next Limit (Maxwell Render), who believe that GPUs can only do dumb, inefficient path tracing and nothing more. Luckily there's Octane Render to prove them wrong. The fact that it has taken the developer such a long time to make it work shows that it's quite tricky to develop. Octane Render is currently also the only renderer (to my knowledge) that will utilise a more sophisticated rendering algorithm.

On a sidenote, ERPT (energy redistribution path tracing) is also possible on the GPU, as described in one of my previous posts. It combines the advantages of Monte Carlo path tracing and Metropolis light transport to allow faster convergence with less noise, and can achieve fantastic results which look indistinguishable from the path-traced reference. Timo Aila, a graphics researcher at Nvidia and GPU ray tracing genius, is also working on real-time Metropolis light transport.

Octane's MLT-like algorithm has been hinted at by its developer since the unveiling of the software in January 2010, and it should be here very soon (within a couple of weeks, post will be updated when that happens). I'm very curious to see the first results.

Future GPU architectures, like Kepler and Maxwell, should make the implementation of MLT-like algorithms on the GPU much easier, but it's nice to see at least one developer trying to squeeze the maximum out of current GPUs, bending their compute capability until it breaks.

Saturday, September 25, 2010

Kepler and Maxwell: ray tracing monsters thanks to CPU and GPU cores on the same chip?

At GTC 2010, Nvidia announced their future GPUs named Kepler and Maxwell. One of the more interesting quotes:
"Between now and Maxwell, we will introduce virtual memory, pre-emption, enhance the ability of the GPU to autonomously process, so that it's non-blocking of the CPU, not waiting for the CPU, relies less on the transfer overheads that we see today. These will take GPU computing to the next level, along with a very large speed up in performance," said Jen-Hsun Huang.

Pre-emption was already revealed in a slide from a presentation by Tony Tomasi at Nvision08, depicting a timeline with pre-emption, full support for function pointers, C++, etc.

The part about "the ability of the GPU to autonomously process, so that it's non-blocking of the CPU, not waiting for the CPU, relies less on the transfer overheads that we see today" is very interesting and suggests the incorporation of CPU cores on the GPU, as shown in a slide from an Nvidia presentation at SC09.

There's also this live chat with Bill Dally:
We all know that Intel and AMD are looking at merging CPU cores and GPUs on the same die.
In my mind, the future is for hybrid computing, where different kind of processors working together and find their own kind of tasks to work on. Currently, multi-core CPU and many-core GPU are working together, tasks are distributed by software schedulers. Data parallel tasks are assigned to GPUs and task-parallel jobs are assigned to CPUs. However, communication between these two kinds of processors is the performance bottleneck. I hope NVIDIA can provide a solution on their desktop GPU product line too.

Bill Dally:
That's exactly right. The future is heterogeneous computing in which we use CPUs (which are optimized for single-thread performance) for the latency sensitive portions of jobs, and GPUs (which are optimized for throughput per unit energy and cost) for the parallel portions of jobs. The GPUs can handle both the data parallel and the task parallel portions of jobs better than CPUs because they are more efficient. The CPUs are only needed for the latency sensitive portions of jobs - the serial portions and critical sections.

Do you believe a time will come when GPU and CPU are on the same chip or "board"? It seems the logical next step to avoid the huge PCI-E latency and have better GPU-CPU interactivity. I know there is ongoing research in this area already... but what is your personal opinion on the possibility and benefits of this?

Bill Dally:
Our Tegra processors already combine CPUs and a GPU on a single chip. For interactivity what's important is not the integration but rather having a shared memory space and low latency synchronization between the two types of cores.
I don't see convergence between latency-optimized cores and throughput optimized cores. The techniques used to optimize for latency and throughput are very different and in conflict. We will ultimately have a single chip with many (thousands) of throughput cores and a few latency-optimized cores so we can handle both types of code.

From the SC09 slide, Nvidia expects to have 16 CPU cores on the GPU by 2017. Extrapolating backwards from that (halving the core count every two years), you would get:

- 2017: GPU with 16 CPU cores
- 2015: GPU with 8 CPU cores
- 2013: Maxwell with 4 CPU cores
- 2011: Kepler with 2 CPU cores

My bet is that Kepler will at least have one and probably two (ARM based) CPU cores and Maxwell will probably have 4 CPU cores on the GPU. The inclusion of true CPU cores on the GPU will make the CPU-GPU bandwidth problem of today obsolete and will enable smarter ray tracing algorithms like Metropolis light transport and bidirectional path tracing on the GPU. Biased rendering methods such as photon mapping and irradiance caching will be easier to implement. It will also give a tremendous performance boost to the (re)building of acceleration structures and to ray tracing of dynamic geometry, which will no longer depend on the slow PCIe bus. Apart from ray tracing, most other general computation tasks will also benefit greatly. I think this CPU/GPU combo chip will be Nvidia's answer to AMD's oft-delayed Fusion and Intel's Sandy Bridge.

Wednesday, September 22, 2010

OTOY at GPU Technology Conference, partnering with Nvidia, Intel and AMD

It's been a long time since OTOY was in the news, but the company will resurface at GTC. Jules Urbach of OTOY will be speaking in a session about emerging companies. Apparently there's no exclusive deal any longer between OTOY and AMD according to this article at Venturebeat.

OTOY will also make use of CUDA in the future, which is great news!!! Hopefully this will speed up adoption of the technology by a factor of 10 to 50 ;-)

UPDATE: here's the full press release:

OTOY to Present Enterprise Cloud Platform at NVIDIA GPU Technology Conference

OTOY will unveil its Enterprise Cloud platform at the GPU Technology Conference this week. The platform is designed to enable developers to leverage NVIDIA CUDA, PhysX and Optix technologies through the cloud.

Santa Clara, CA (PRWEB) September 23, 2010

OTOY announced that it will unveil its Enterprise Cloud platform at the GPU Technology Conference this week. The platform is designed to enable developers to leverage NVIDIA CUDA, PhysX and Optix technologies through the cloud. OTOY's proprietary ORBX GPU codec will enable high performance 3D applications to render on a web server and instantly stream to any thin client.

OTOY is participating in the GTC “Emerging Companies Summit,” a two-day event for developers, entrepreneurs, venture capitalists, industry analysts and other professionals.

OTOY Enterprise Cloud platform
The OTOY Enterprise Cloud platform sandboxes an application or virtual machine image without degrading or limiting GPU performance. CUDA-powered applications, such as Adobe's Creative Suite 5, will be able to take full advantage of GPU acceleration while streaming from a virtual OTOY session.

OTOY bringing GPGPU to the browser
In addition to supporting CUDA through its server platform, OTOY's 4k web plug-in adds CUDA and OpenCL compliant scripting across all major web browsers, including Internet Explorer, Mozilla FireFox, Google Chrome, Apple Safari and Opera. GPU web applets that cannot run locally are executed and rendered using OTOY server side rendering. This ensures that GPU web applets can be viewed on any client, including HTML 4 browsers.

Next generation rendering tools coming to developers
OTOY enables server hosted game engines to render LightStage assets and leverage distributed GPU ray-tracing or path-tracing in the cloud. The OTOY Enterprise Cloud platform can host complete game engine SDKs, making game deployment to Facebook or other web portals simple and instantaneous.

OTOY will add native support for CryEngine content in 2011, starting with Avatar Reality's Blue Mars. Blue Mars is the first virtual world built using the Crytek engine. It is currently in beta testing on the OTOY platform.

About OTOY
OTOY is a leading developer of innovative software solutions for GPU and CPU hardware, as well as a provider of convergence technologies for the video game and film industries. OTOY works with a wide range of movie studios, game developers, hardware vendors and technology companies.

OTOY integrated in CryEngine and supporting distributed GPU ray tracing and path tracing in the cloud!! The dream of real-time ray traced or even path traced games is getting closer every day! I do hope that OTOY will deliver this dream first, they have all the right technology and partners now.

Friday, September 10, 2010

Small update on the Brigade real-time path tracer

Jacco Bikker has released two new videos showing the progress of his real-time path tracer Brigade, demonstrating some kind of game where a truck has to push gold containers or something. Looks fun:

1. direct lighting (32 samples per pixel):
2. one bounce of indirect lighting (16 spp):

There is also an update from Dietger van Antwerpen on the GPU path tracer (a subsystem of the Brigade path tracer) running the more advanced ERPT (energy redistribution path tracing) algorithm. He has improved the ERPT code to produce virtually identical results to the path traced reference and released a high quality image with it (ERPT on the left and path tracing on the right):

Explanation from Dietger van Antwerpen in the video description on YouTube:
"After some complaints pointing out that in the movie, ERPT is significantly darker than path tracing, I fixed the darkening effect of the ERPT image filter, solving the difference in lighting quality. I made an image using ERPT for the left half, while using path tracing for the right half, and waited until the path tracing noise almost vanished. As you can see, the lighting quality between the left and right half is pretty much the same. (The performance and convergence characteristics remain unchanged)"
It would be interesting to know the time for ERPT and for path tracing to achieve these results.
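Such a comparison is really about noise per unit of render time. For plain path tracing, the pixel estimator and its error behave as below. This is a textbook Monte Carlo result, not a measurement of Dietger's renderer:

```latex
\hat{I}_N = \frac{1}{N}\sum_{i=1}^{N}\frac{f(X_i)}{p(X_i)},
\qquad
\operatorname{StdErr}\big[\hat{I}_N\big] = \frac{\sigma}{\sqrt{N}},
\qquad
\sigma^2 = \operatorname{Var}\!\left[\frac{f(X)}{p(X)}\right]
```

Halving the visible noise therefore costs four times as many samples. This is why an algorithm that spreads each sample's energy more evenly, as ERPT aims to, can pay off even if each of its samples is more expensive.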

He also released a new video showing improvements to the GPU ERPT code:

As the videos show, ERPT converges considerably faster than standard path tracing and the noise is significantly reduced. Very cool and very impressive. I wonder if the optimized ERPT code will be used in Brigade for real-time animations and games.

Wednesday, September 8, 2010

VoxLOD: interactive ray tracing of massive polygon models with voxel based LOD + Monte Carlo global illumination!

This is really amazing technology:


Some quotes from the author's blog:
"a real-time massive model visualization engine called VoxLOD, which can handle data sets consisting of hundreds of millions of triangles.

It is based on ray casting/tracing and employs a voxel-based LOD framework. The original triangles and the voxels are stored in a compressed out-of-core data structure, so it’s possible to explore huge models that cannot be completely loaded into the system memory. Data is fetched asynchronously to hide the I/O latency. Currently, the renderer runs entirely on the CPU.


I’ve implemented shadows in VoxLOD, which has thus become a ray tracer. Of course, level-of-detail is applied to the shadow rays too.

While shadows make the rendered image a lot more realistic, the parts in shadow are completely flat, devoid of any details, thanks to the constant ambient light. One possible solution is ambient occlusion, but I wanted to go further: global illumination in real-time.

GI in VoxLOD is very experimental and unoptimized for now. It’s barely interactive: it runs at only 1-2 fps at 640×480 on my dual-core Core i7 notebook. Fortunately, there are lots of optimization opportunities. Now let’s see an example image:

Note that most of the scene is not directly lit, and color bleeding caused by indirect lighting is clearly visible. There are two light sources: a point light (the Sun) and a hemispherical one (the sky). I use Monte Carlo integration to compute the GI with one bounce of indirect lighting. Nothing is precomputed (except the massive model data structure of course).

I trace only two GI rays per pixel, and therefore, the resulting image must be heavily filtered in order to eliminate the extreme noise. While all the ray tracing is done on the CPU, the noise filter runs on the GPU and is implemented in CUDA. Since diffuse indirect lighting is quite low frequency, it is adequate to use low LODs for the GI rays."
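To make the "two GI rays per pixel, then filter heavily" idea concrete, here is a small CPU-only sketch. It assumes a constant white sky, uniform hemisphere sampling and a plain box filter. The VoxLOD author's actual sampling scheme and CUDA noise filter are not described, so every function here (`estimate_irradiance`, `box_filter`, `image_stats`) is hypothetical. Each pixel estimate is correct on average (π for a unit sky) but very noisy, and filtering trades that noise for blur:

```python
import math
import random

def sample_uniform_hemisphere(rng):
    """Uniform direction on the +z hemisphere (uniform z gives uniform solid angle)."""
    z = rng.random()
    phi = 2.0 * math.pi * rng.random()
    r = math.sqrt(max(0.0, 1.0 - z * z))
    return (r * math.cos(phi), r * math.sin(phi), z)

def estimate_irradiance(sky_radiance, n_rays, rng):
    """Monte Carlo estimate of E = integral of L * cos(theta) over the hemisphere.

    The uniform hemisphere pdf is 1 / (2*pi), so each ray contributes
    L * cos(theta) * 2*pi. For a constant sky the exact answer is pi * L.
    """
    total = 0.0
    for _ in range(n_rays):
        _, _, cos_theta = sample_uniform_hemisphere(rng)
        total += sky_radiance * cos_theta * 2.0 * math.pi
    return total / n_rays

def box_filter(image, radius):
    """Average each pixel over a (2r+1)x(2r+1) window, clamped at the borders."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc, count = 0.0, 0
            for yy in range(max(0, y - radius), min(h, y + radius + 1)):
                for xx in range(max(0, x - radius), min(w, x + radius + 1)):
                    acc += image[yy][xx]
                    count += 1
            out[y][x] = acc / count
    return out

def image_stats(image):
    """Mean and standard deviation over all pixels."""
    values = [v for row in image for v in row]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, math.sqrt(var)

# Two GI rays per pixel on a 32x32 image: noisy per pixel, correct on average.
rng = random.Random(1)
noisy = [[estimate_irradiance(1.0, 2, rng) for _ in range(32)] for _ in range(32)]
filtered = box_filter(noisy, 2)
```

A real renderer would use an edge-aware filter so that the blur does not cross geometric edges. The box filter only shows why two rays per pixel can be enough for low-frequency diffuse lighting.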
The author of this engine has also written a paper entitled: "Interactive Out-of-Core Ray Casting of Massive Triangular Models with Voxel-Based LODs"

There is a very interesting graph in this paper, which shows that when using LOD, the cost of ray casting remains constant once a certain number of triangles (0.5M) is reached:

Quoting the paper,
"by using LODs, significantly higher frame rates can be achieved with minimal loss of image quality, because ray traversals are less deep, intersections with voxels are implicit, and memory accesses are more coherent. Furthermore, the LOD framework can also reduce the amount of aliasing artifacts, especially in case of highly tesselated models."

"Our LOD metric could be also used for several types of secondary rays, including shadow, ambient occlusion, and planar reflection rays. One drawback of this kind of metric is that it works only with secondary rays expressible as linear transformations. Because of this, refraction and non-planar reflection rays are not supported."
Implemented on the GPU, this tech could be the ideal solution for real-time raytracing in games:

- it makes heavy use of LOD for primary, shadow and GI rays which greatly reduces their tracing cost

- LOD is generated automatically by the voxel data structure

- nearby geometry is represented by triangles, so there isn't any voxel blockiness on close inspection

- characters and other dynamic geometry can still be represented as triangles and as such avoid the difficulties with animating voxels

- huge, immensely detailed levels are possible thanks to the out-of-core streaming of the voxels and triangles

- it uses Monte Carlo GI, which scales very easily (number of bounces + number of samples per pixel) and can be filtered, while still giving accurate results
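The LOD idea behind these points can be illustrated with a simple ray-cone footprint heuristic. This is a hypothetical sketch, not VoxLOD's actual metric. It assumes the ray footprint grows linearly with distance, which holds for pinhole camera rays and planar reflections and is one way to read the "linear transformations" restriction quoted from the paper. It then picks the coarsest octree level whose voxels still fit inside that footprint. `footprint_width`, `lod_level` and the scene numbers are made up for illustration:

```python
import math

def footprint_width(t, origin_width, spread_per_unit):
    """Width of a ray's cone at distance t; grows linearly, as for pinhole-camera rays."""
    return origin_width + t * spread_per_unit

def lod_level(root_size, max_level, footprint):
    """Coarsest octree level whose voxel size (root_size / 2**level) fits in the footprint."""
    if footprint <= 0.0:
        return max_level                    # no footprint: use the finest data
    level = math.ceil(math.log2(root_size / footprint))
    return max(0, min(max_level, level))    # clamp to the levels that exist

# Hypothetical scene: 1024-unit root voxel, 20 levels, about 0.001 rad spread per pixel.
near_level = lod_level(1024.0, 20, footprint_width(10.0, 0.0, 0.001))
far_level = lod_level(1024.0, 20, footprint_width(1000.0, 0.0, 0.001))
```

Distant hits stop at a much coarser level (10 instead of 17 here). Traversal depth therefore stops growing with model size, which is consistent with the paper's observation that ray casting cost levels off.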

This is certainly something to keep an eye on!

Saturday, September 4, 2010

Nvidia research chats about GPU ray tracing

A couple of days ago there was a live chat with David Luebke and Bill Dally from Nvidia on Nvidia's nTersect blog, with GPU ray tracing as one of the main subjects. Below are some of the questions and answers related to GPU ray tracing and rendering:

Are there any plans to add fixed function raytracing hardware to the GPU?

David Luebke:
Fixed-function ray tracing hardware: our group has definitely done research in this area to explore the "speed of light", but my sense at this time is that we would rather spend those transistors on improvements that benefit other irregular algorithms as well.

ray-triangle intersection maps well to the GPU already, it's basically a lot of dot products and cross products. ray traversal through an acceleration structure is an interesting proxy for lots of irregular parallel computing workloads : there is abundant parallelism but it is branchy and hard to predict. Steps like Fermi's cache and unified memory space are great examples of generic hardware improvements that benefit GPU ray tracing as well as many other workloads (data mining, tree traversal, collision detection, etc)
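To make the "lots of dot products and cross products" remark concrete, here is a sketch of the Möller-Trumbore ray/triangle test. It is a widely used formulation, but nothing in the chat says it is the exact test Nvidia's tracers use, and the helper names are illustrative:

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def intersect_triangle(orig, direction, v0, v1, v2, eps=1e-9):
    """Möller-Trumbore ray/triangle test; returns the hit distance t or None."""
    e1 = sub(v1, v0)
    e2 = sub(v2, v0)
    p = cross(direction, e2)
    det = dot(e1, p)
    if abs(det) < eps:              # ray parallel to the triangle plane
        return None
    inv_det = 1.0 / det
    s = sub(orig, v0)
    u = dot(s, p) * inv_det         # first barycentric coordinate
    if u < 0.0 or u > 1.0:
        return None
    q = cross(s, e1)
    v = dot(direction, q) * inv_det  # second barycentric coordinate
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(e2, q) * inv_det        # distance along the ray
    return t if t > eps else None

tri = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
hit = intersect_triangle((0.25, 0.25, -1.0), (0.0, 0.0, 1.0), *tri)
miss = intersect_triangle((1.0, 1.0, -1.0), (0.0, 0.0, 1.0), *tri)
parallel = intersect_triangle((0.25, 0.25, -1.0), (1.0, 0.0, 0.0), *tri)
```

Only a handful of branches are involved, which is why the intersection itself maps well to GPUs. As Luebke says, the hard part is the divergent traversal that decides which triangles to test.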

When do you think real-time ray tracing of dynamic geometry will become practical for being used in games?
David Luebke:
ray tracing in games: I think Jacopo Pantaleoni's "HLBVH" paper at High Performance Graphics this year will be looked back on as a watershed for ray tracing of dynamic content. He can sort 1M utterly dynamic triangles into a quality acceleration structure at real-time rates, and we think there's more headroom for improvement. So to answer your question, with techniques like these and continued advances in GPU ray traversal, I would expect heavy ray tracing of dynamic content to be possible in a generation or two.
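HLBVH builds on sorting primitives along a Morton (Z-order) curve, the same first step used by the earlier LBVH builder. The sketch below shows only that step on the CPU: 30-bit Morton codes for triangle centroids, then a sort. The GPU radix sort, the hierarchy emission and HLBVH's SAH-optimized upper levels are omitted, and the function names are hypothetical:

```python
def expand_bits_10(v):
    """Spread the low 10 bits of v so that two zero bits separate each original bit."""
    v &= 0x3FF
    v = (v | (v << 16)) & 0x030000FF
    v = (v | (v << 8)) & 0x0300F00F
    v = (v | (v << 4)) & 0x030C30C3
    v = (v | (v << 2)) & 0x09249249
    return v

def morton3d(x, y, z):
    """30-bit Morton code for a point with coordinates normalized to [0, 1]."""
    def quantize(c):
        return min(max(int(c * 1024.0), 0), 1023)
    return ((expand_bits_10(quantize(x)) << 2)
            | (expand_bits_10(quantize(y)) << 1)
            | expand_bits_10(quantize(z)))

def sort_by_morton(centroids, bbox_min, bbox_max):
    """Primitive indices ordered along the Z-order curve of their centroids."""
    extent = [max(hi - lo, 1e-12) for lo, hi in zip(bbox_min, bbox_max)]

    def code(c):
        return morton3d(*[(c[i] - bbox_min[i]) / extent[i] for i in range(3)])

    return sorted(range(len(centroids)), key=lambda i: code(centroids[i]))

order = sort_by_morton([(0.9, 0.9, 0.9), (0.1, 0.1, 0.1), (0.5, 0.5, 0.5)],
                       (0.0, 0.0, 0.0), (1.0, 1.0, 1.0))
```

Because neighbouring primitives end up next to each other after the sort, a tree can be built from contiguous ranges of the sorted array. Sorting parallelizes very well on a GPU, which is what makes rebuilding at real-time rates plausible.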

Currently there is a huge interest in high quality raytracing on the GPU. The number of GPGPU renderers has exploded during the last year. At the same time there are critics saying that GPU rendering is still not mature enough to be used in serious work citing a number of limitations such as not enough memory, shaders are too simple and that you can only do brute force path tracing on the GPU, which is very inefficient compared to the algorithms used in CPU renderers. What is your take on this? Do you think that these limitations are going to be solved by future hardware or software improvements and how soon can we expect them?
David Luebke:
re offline renderers - I do think that GPU performance advantages are becoming too great for studios to ignore. You can definitely get way past simple path tracing. I know of a whole bunch of studios that are doing very deep dives. Stay tuned!

Do you think rasterization is still going to be used in 10 years?
David Luebke:
re rasterization: yes, forward rasterization is a very energy-efficient way to solve single-center-of-projection problems (like pinhole cameras and point light shadow maps) which continue to be important problems and subproblems in rendering. So I think these will stick around for at least another 10 years

There have been a lot of papers about reyes style micropolygon rasterizing at past graphics conferences with the feasibility of hardware implementation. Do you think this is a good idea?
David Luebke:
re: micropolygons - I think all the work on upolys is incredibly interesting. I still have some reservations about whether upolys are REALLY the final answer to rendering in the future. They have many attractive attributes, like the fact that samples are glued to parametric space and thus have good temporal coherence, but they seem kind of ... heavyweight to me. There may be simpler approaches.

am I wrong in thinking that game graphics are limited more by the artist than the graphics or are the game companies just trying to reach a broader market?
David Luebke:
you are not wrong! game developers are limited by artists, absolutely. But this translates to graphics - better graphics means simpler, more robust, easier to control and direct graphics. A good example is cascaded shadow maps, used very widely in games today. These are notoriously finicky and artists have to keep a lot of constraints in their head when designing levels etc. Looking forward, increased GPU performance and programmability both combine to make simpler approaches - like ray tracing - practical. On the flip side, graphics is certainly not solved and there are many effects that we can't do at all in real-time today, so you will continue to see games push forward on graphics innovation, new algorithms, new ways to use the hardware

UPDATE: Some tech websites, such as xbitlabs, have taken the comment about rasterization from the live chat out of context, stating that we will have to wait at least another 10 years before ray tracing will be used in games. Apparently, they didn't read David Luebke's answers very carefully. The way I understood it, rasterization will be used in conjunction with ray tracing techniques, both integrated in novel rendering algorithms such as image space photon mapping. From the ISPM paper:
Image Space Photon Mapping (ISPM) rasterizes a light-space bounce map of emitted photons surviving initial-bounce Russian roulette sampling on a GPU. It then traces photons conventionally on the CPU. Traditional photon mapping estimates final radiance by gathering photons from a k-d tree. ISPM instead scatters indirect illumination by rasterizing an array of photon volumes. Each volume bounds a filter kernel based on the a priori probability density of each photon path. These two steps exploit the fact that initial path segments from point lights and final ones into a pinhole camera each have a common center of projection.
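The "Russian roulette sampling" mentioned in the ISPM abstract is a standard way to end photon and light paths without bias: a path survives with probability p, and survivors are divided by p. The sketch below uses a throughput-based survival probability with a small floor. That choice and the name `russian_roulette` are assumptions; photon mappers often use the surface reflectance instead, and ISPM's exact rule may differ:

```python
import random

def russian_roulette(throughput, rng, min_survival=0.05):
    """Randomly terminate a path/photon; survivors are reweighted by 1/p.

    Returns the new throughput, or None if the path was terminated. Because
    survivors are divided by their survival probability p, the expected
    returned weight (counting termination as 0) equals the input throughput.
    """
    p = min(1.0, max(min_survival, throughput))
    if rng.random() >= p:
        return None
    return throughput / p

# Empirical check of unbiasedness for a dim path (throughput 0.3).
rng = random.Random(42)
trials = 200_000
total = 0.0
for _ in range(trials):
    w = russian_roulette(0.3, rng)
    total += 0.0 if w is None else w
mean_weight = total / trials
```

Most dim paths are killed early, which saves work. The few that survive carry proportionally more energy, so the average stays correct.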

So ray tracing in games will definitely show up well within 10 years and according to Luebke you can expect "heavy ray tracing of dynamic content to be possible in a generation or two". Considering that Fermi was a little bit behind schedule, I would expect Nvidia's next generation GPU to come around March 2011 (on schedule), and the generation after that around September 2012. So only 2 years before real-time raytracing is feasible in games ;-D.

Sunday, August 22, 2010

Is Carmack working on a ray tracing based game engine?

At least that's what his tweets seem to suggest:

# There has GOT to be a better way to exactly test triangle-in-AABB than what I am currently doing.

# Idea: dynamically rewrite tree structures based on branch history to linearize the access patterns. Not thread safe…

# The valuation strategy behind the Surface Area Heuristic (SAH) used in ray tracing is also adaptable for rasterization engines.

# It is often better to use a global spherical sampling pattern and discard samples behind a plane instead of local hemispheres.

# @Wudan07 equal area but not shape. Got a better non iterative one? Poisson disc for higher quality.

# To uniformly sample a sphere, pick sample points on the enclosing cylinder and project inwards. Archimedes FTW!

# I need to decide if I am going to spend some quality time with CUDA or Android. Supercomputing and cell phones, about equally interesting…

# @castano triangle intersection is 33% of the wall clock time and a trace averages 12 triangle tests, so gather SIMD tests could win some.

# Doing precise triangle tests instead of just bounds tests when building KD trees makes a large difference with our datasets.

# For our datasets, my scalar KD tree trace code turned out faster than the SSE BVH tracer. Denser traces would help SSE.
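Two of the sampling tweets combine nicely. The sketch below draws uniform sphere directions with Archimedes' hat-box trick: pick a uniform point on the enclosing cylinder and project it onto the sphere. It builds one global pattern and, per surface, discards the samples behind the tangent plane. The pattern here is plain random for simplicity, whereas Carmack mentions Poisson disc patterns for higher quality. The function names are hypothetical:

```python
import math
import random

def sample_sphere_archimedes(rng):
    """Uniform point on the unit sphere via Archimedes' hat-box theorem.

    A uniform point on the enclosing cylinder (height z in [-1, 1], angle phi)
    projected horizontally onto the sphere keeps its area density uniform.
    """
    z = 2.0 * rng.random() - 1.0
    phi = 2.0 * math.pi * rng.random()
    r = math.sqrt(max(0.0, 1.0 - z * z))
    return (r * math.cos(phi), r * math.sin(phi), z)

def samples_above_plane(normal, pattern):
    """Reuse one global sphere pattern; discard directions behind the plane."""
    return [d for d in pattern
            if d[0] * normal[0] + d[1] * normal[1] + d[2] * normal[2] > 0.0]

rng = random.Random(7)
global_pattern = [sample_sphere_archimedes(rng) for _ in range(4096)]
up_dirs = samples_above_plane((0.0, 0.0, 1.0), global_pattern)
# A spherical cap with z > 0.5 covers 25% of the sphere's area.
cap_fraction = sum(1 for d in global_pattern if d[2] > 0.5) / len(global_pattern)
```

Roughly half of the global pattern survives for any normal. No per-point hemisphere has to be constructed, at the cost of generating about twice as many candidate directions.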

If Carmack's pioneering work still has the same impact and influence on the game industry as it had in the 90's, ray traced games could become the standard "soon" (which in id terms means about 5-8 years :-).

UPDATE: I wasted a couple of hours transcribing the part of Carmack's keynote at QuakeCon 2010 where he specifically talks about raytracing (starting at 3:17). English is not my mother tongue, so there are some gaps here and there. This part is not about real-time raytracing, but about precomputing the lightmaps and megatextures with raytracing instead of rasterization:

"We were still having precision issues with shadow buffers and all this and that, and finally I just sort of bit the bullet and said “Alright, we’re on the software side, let’s just stop rasterizing, let’s just raytrace everything. I’m gonna do a little bit of rasterizing for where I have to, but just throw lots and lots of rays.” And it was really striking from my experience how much better a lot of things got. There are a lot of things that we live with for rasterization as we do in games with making shadows, trying to get shadow buffers working right, finding shadow acne vs Peter Pan effect on here and this balance that never quite get as things move around. Dealing with having to vastly oversize this trying to use environment maps for ambient lighting. And these are the things that people just live and breathe in games, you just accept it, this is the way things are, these are the issues and things will always be like this just getting higher and higher resolution. But it was pretty neat to see a lot of these things just vanish with ray tracing, where the shadows are just, the samples are right. We don’t have to worry about the orientation of some of these things relative to the other things. The radiosity gets done in a much better way and there are far less artifacts. And as we start thinking about things in those terms, we can start thinking about better ways to create all of these different things. So that experience has probably raised in my estimation a little bit the benefit of raytracing in the future of games. Again, there’s no way it’s happening in this coming generation, the current platforms can’t support it.
It’s an open question about whether it’s possible in the generation after that, but I would say that it’s almost a foregone conclusion that a lot of things in the generation after that will wind up being raytraced, because it does look like it’s going to be performance reasonable on there and it makes the development process easier, because a lot of problems that you fight with just aren’t there. There’s a lot of things that, yeah, it’s still you could go ahead and render five times or ten times as many pixels, but we’re gonna reach this point where our ten times as many pixels or fragment samples going to give us that much more benefit or would we really like to have all the local shadows done really really right and have better indirect ambient occlusion. Or you can just have the reflections and highlights go where they’re supposed to, instead of just having a crummy environment map that reflects on the shiny surfaces there. So, I can chalk that up as one of those things where I definitely learned some things in the last six months about this and it modified some of my opinions there and the funny coming back around about that is, so I’m going through a couple of stages of optimizing our internal raytracer, this is making things faster and the interesting thing about the processing was, what we found was, it’s still a fair estimate that the gpu’s are going to be five times faster at some task than the cpu’s. But now everybody has 8 core systems and we’re finding that a lot of the stuff running software on this system turned out to be faster than running the gpu version on the same system. And that winds up being because we get killed by Amdahl’s law there where you’re throwing the very latest and greatest gpu and your kernel amount (?) goes ten times faster. 
The scalability there is still incredibly great, but all of this other stuff that you’re dealing with of virtualizing of textures and managing all of that did not get that much faster, so we found that the 8 core systems were great and now we’re looking at 24 thread systems where you’ve got dual thread six core dual socket systems (?) it’s an incredible amount of computing power and that comes around another important topic where pc scalability is really back now where we have, we went through sort of a ..."
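Carmack's Amdahl's law point can be made concrete with the standard formula: speedup = 1 / ((1 - p) + p / s), where p is the fraction of runtime that gets accelerated by a factor s. The fractions below are made-up illustrative numbers, not id Software measurements. They show how a 10x faster GPU kernel can still lose to an 8-core CPU that speeds up most of the pipeline:

```python
def amdahl_speedup(parallel_fraction, accel):
    """Amdahl's law: overall speedup when only `parallel_fraction` of the
    original runtime is sped up by a factor `accel`."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / accel)

# Illustrative (made-up) numbers:
# GPU path: only the tracing kernel (60% of the time) runs 10x faster;
# texture virtualization and other management work is not accelerated.
gpu_speedup = amdahl_speedup(0.6, 10.0)     # about 2.17x
# CPU path: 90% of the whole pipeline spreads across 8 cores.
cpu_speedup = amdahl_speedup(0.9, 8.0)      # about 4.7x
```

However fast the kernel gets, the GPU path in this example can never exceed 1 / 0.4 = 2.5x, because the untouched 40% dominates.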

Friday, August 13, 2010

Arnold is back!

During the last Siggraph, a lot of presentations were centered around Arnold, the production renderer which is going to kick Renderman's and mental ray's collective asses. The VFX industry is slowly discovering this marvel, which is doing Monte Carlo path tracing at incredible speeds. It eats huge complex scenes with motion blur and DOF for breakfast.

The argument that ray tracing is only used sparingly in the movie industry no longer holds, as almost all big film companies are shifting towards more and more (and sometimes full) raytracing as the method of choice. I'm sure this industry-wide shift will trickle down to the game industry as well.

Some interviews and presentations:

interview with Marcos Fajardo:

HPG2009 Larry Gritz talking about Arnold:

more on Larry Gritz' talk:

interesting interview with Larry Gritz about the decision to move to Arnold:

basic features of Arnold:

many details on Arnold (don't miss this one):

Thursday, August 12, 2010

Carmack loves ray tracing

Some excerpts from his QuakeCon keynote (different live blogs):

"There are a lot of things we live with as rasterization in games...Ray tracing was the better way to go. The shadows and samples were right.. the radiosity gets done in a much better way."

Ray Tracing definitely not coming in this generation, but it's definitely the future of gaming once the technology becomes performance reasonable. Will make development easier. A lot of his opinions on ray tracing have changed in the past six months. Now that multicore CPUs are mainstream, they're starting to catch back up to GPUs in rendering performance.

and from Carmack's own twitter:
"Reading Realistic Ray Tracing 2nd Ed. I discourage the use of .x .y .z style members and favor arrays. Many bugs due to copy-paste."

Wednesday, August 4, 2010

Faster raytraced global illumination by LOD tracing

Just read this Siggraph presentation about PantaRay, a GPU-accelerated raytracer to compute occlusion, which was used in Avatar and developed by Nvidia and Weta:

The idea is simple and elegant: use the full res geometry for tracing primary rays and lower LOD geometry for tracing secondary rays. The number of triangles that have to be tested for intersection is significantly reduced, which speeds up the GI computation considerably.

These papers use the same principle:

"R-LODs: Fast LOD based raytracing for massive models" by Yoon
"Two-Level Ray Tracing with Reordering for Highly Complex Scenes" by Hanika, Keller and Lensch

The idea is similar to Lightcuts and point based color bleeding (which both use a hierarchy to cluster sets of lights or points to reduce the computational cost of tracing) but is used for geometry instead of lights. It is also being used by Dreamworks:

Picture from Siggraph presentation by Tabellion (PDI Dreamworks)


We describe here one of the main ways in which we deal with geometric complexity. When rays are cast, we do not attempt to intersect the geometry that is finely tessellated down to pixel-size micropolygons - this would be very expensive and use a lot of memory. Instead we tessellate geometry using a coarser simplified resolution and the raytracing engine only needs to deal with a much smaller set of polygons. This greatly increases the complexity of the scenes that can be rendered with a given memory footprint, and also helps increase ray-tracing efficiency. To avoid artifacts due to the offset between the two geometries, we use a “smart bias” algorithm described in the next slides.

Since rays originate from positions on the micropolygon tessellation of the surface and can potentially intersect a coarser tessellation of the same surface, self intersection problems can happen.

The image above illustrates cross-section examples of a surface tessellated both into micropolygons and into a coarse set of polygons. It also shows a few rays that originate from a micropolygon with directions distributed over the hemisphere. To prevent self intersection problems to happen, we use a ray origin offsetting algorithm. In this algorithm, we adjust the ray origin to the closest intersection before / after the ray origin along the direction of the ray, within a user-specified distance. The ray intersection returned as a result of the ray intersection query is the first intersection that follows the adjusted ray origin. The full algorithm is described in [Tabellion and Lamorlette 2004].
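The ray origin offsetting described above can be sketched in a few lines. This is a heavily simplified illustration, not the algorithm from Tabellion and Lamorlette 2004. It assumes the tracer has already reported all signed distances at which the ray's line crosses the coarse tessellation (negative means behind the micropolygon origin), and `smart_bias_hit` is a hypothetical name:

```python
def smart_bias_hit(hit_ts, bias):
    """Simplified ray-origin offsetting against a coarse tessellation.

    hit_ts: signed distances along the ray's line where it crosses the coarse
            geometry (negative = behind the micropolygon origin).
    bias:   user-specified search distance around the origin.
    Returns the distance of the reported hit, or None for a miss.
    """
    near = [t for t in hit_ts if abs(t) <= bias]
    # Snap the origin to the closest crossing within the bias distance, if any.
    origin_t = min(near, key=abs) if near else 0.0
    # Report the first intersection that follows the adjusted origin.
    after = [t for t in hit_ts if t > origin_t]
    return min(after) if after else None
```

For example, `smart_bias_hit([0.01, 5.0], 0.05)` returns `5.0`. The crossing at 0.01 is the coarse copy of the surface the ray starts on, which a naive query would report as a false self-occlusion.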

Here is a comparison with a reference image that was rendered while raytracing the micropolygons. The image in the center was rendered with our technique while raytracing the coarser polygon tessellation illustrated in the image on the right.

It has been shown in [Christensen 2003] that it is possible to use even coarser and coarser tessellations of the geometry to intersect rays with larger ray footprints. This approach is also based on using only coarser tessellation and is not able to provide very coarse tessellations with much fewer polygons than there are primitives (fur, foliage, many small overlapping surfaces, etc.), which would be ideal for large ray footprints. This problem is one of the main limitations of ray-tracing based GI in very complex scenes and is addressed by the clustering approach used in point-based GI, as discussed in later slides.

With this technique, you could have a 1 million polygon scene for tracing primary rays, while secondary rays are traced against a 100K or 50K polygon LOD version of the scene. When using voxels, this becomes a walk in the park, since LOD is generated automatically.

Thursday, July 22, 2010

CUDA 3.1 out

Just read that the CUDA 3.1 toolkit is available (it has been for some time, in fact).

As you can see from the release notes, CUDA 3.1 also gives 16-way kernel concurrency, allowing for up to 16 different kernels to run at the same time on Fermi GPUs. Banks said a bunch of needed C++ features were added, such as support for function pointers and recursion to allow for more C++ apps to run on GPUs as well as a unified Visual Profiler that supports CUDA C/C++ as well as OpenCL. The math libraries in the CUDA 3.1 SDK were also goosed, with some having up to 25 per cent performance improvements, according to Banks.

The support for recursion and concurrent kernels should be great for CUDA path tracers running on Fermi and I'm curious to see the performance gains. Maybe the initial claims that Fermi will have 4x the path tracing performance of GT200 class GPUs could become true after all.

Thursday, July 15, 2010

OnLive + Mova vs OTOY + LightStage

I've just read Joystiq's review of OnLive, which is very positive regarding the lag issue: there is none...

As it stands right now, the service is -- perhaps shockingly -- running as intended. OnLive still requires a faster than normal connection (regardless of what the folks from OnLive might tell you), and it requires a wired one at that, but it absolutely, unbelievably works. Notice I haven't mentioned issues with button lag? That's because I never encountered them. Not during a single game (even UE3).

A related recent Joystiq article about OnLive mentions Mova, a sister company of OnLive developing Contour, a motion capture technology using a curved wall of cameras, very reminiscent of OTOY's LightStage (although the LightStage dome is bigger and can capture the actor from 360 degrees at once). The photorealistic CG characters and objects that it produces are the real advantage of cloud gaming (as was hinted at when the Lightstaged Ruby from the latest Ruby demo was presented at the Radeon HD 5800 launch):

What he stressed most, though, was Perlman's other company, Mova, working in tandem with OnLive to create impressive new visual experiences in games. "This face here," Bentley began, as he motioned toward a life-like image that had been projected on a screen before us, "is computer generated -- 100,000 polygons. It's the same thing we used in Benjamin Button to capture Brad Pitt's face. Right here, this is an actress. You can't render this in real time on a standard console. So this is the reason OnLive really exists." Bentley claims that Mova is a big part of the reason that a lot of folks originally got involved with OnLive. "We were mind-boggled," he exclaimed. And mind-boggling can be a tremendous motivator, it would seem -- spurring Bentley to leave a successful startup for a still nascent, unknown company working on the fringes of the game industry.

In fairness, what we saw of Mova was terrifyingly impressive, seemingly crossing the uncanny valley into "Holy crap! Are those human beings or computer games?" territory. Luckily for us, someone, somewhere is working with Mova for games. Though Bentley couldn't say much, when we pushed him on the subject, he laughed and responded, "Uhhhh ... ummm ... there's some people working on it." And though we may not see those games for quite some time, when we do, we'll be seeing the future.

Just like OTOY, I bet that OnLive is developing some voxel ray tracing tech as well, which is a perfect fit for server-side rendering due to its massive memory requirements. Now let's see what OTOY and OnLive will come up with, given their respective cloud servers and capture technologies :-)

Wednesday, July 14, 2010

Real-time Energy Redistribution Path Tracing in Brigade!

A lot of posts about Brigade lately, but that's because development is going at breakneck speed and the intermediate updates are very exciting. Jacco Bikker and Dietger van Antwerpen, the coding brains behind the Brigade path tracer, seem unstoppable. Their latest addition to Brigade is ERPT, or Energy Redistribution Path Tracing. ERPT was presented at SIGGRAPH 2005 and is an unbiased extension of regular path tracing that combines Monte Carlo path tracing with Metropolis Light Transport path mutations to obtain lower-frequency noise and, in general, faster convergence. Caustics benefit greatly, as do scenes that are predominantly lit by indirect lighting. The original ERPT paper offers a very in-depth and understandable insight into the technique, and a practical implementation is described in the paper "Implementing Energy Redistribution Path Tracing".
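To make the energy redistribution idea a bit more concrete, here is a toy 1D sketch in Python. It is not Brigade's GPU code and nowhere near a renderer; it only mimics the general structure of ERPT: every Monte Carlo seed sample gets an energy, that energy is split over a number of Markov chains that each deposit a fixed amount `ed` per mutation, and each chain performs Metropolis mutations that spread the energy around the seed. The function `f`, the Gaussian mutation size `sigma` and all parameter values are hypothetical, and splitting each deposit between the current and proposed state (an expected-value variant) is my own simplification.

```python
import math
import random

def f(x):
    # Hypothetical 1D "image" standing in for the path contribution function.
    return 2.0 + math.sin(2.0 * math.pi * x)

def erpt_1d(num_samples=20000, mutations=10, ed=1e-5, sigma=0.05, bins=4, seed=1):
    """Toy 1D energy redistribution; returns per-bin estimates of the integral of f."""
    rng = random.Random(seed)
    image = [0.0] * bins

    def deposit(x, energy):
        image[min(int(x * bins), bins - 1)] += energy

    for _ in range(num_samples):
        x = rng.random()                  # seed sample drawn uniformly, pdf = 1
        e = f(x) / num_samples            # Monte Carlo energy of this seed
        # Chains carry mutations * ed energy each; stochastic rounding keeps the
        # expected total energy equal to e.
        num_chains = int(rng.random() + e / (mutations * ed))
        for _ in range(num_chains):
            y, fy = x, f(x)
            for _ in range(mutations):
                z = (y + rng.gauss(0.0, sigma)) % 1.0   # symmetric mutation on a circle
                fz = f(z)
                a = min(1.0, fz / fy)                   # Metropolis acceptance probability
                # Expected-value deposit: split ed between proposal and current state.
                deposit(z, a * ed)
                deposit(y, (1.0 - a) * ed)
                if rng.random() < a:
                    y, fy = z, fz
    return image

print(erpt_1d())
```

Because the mutations are symmetric and accepted with probability min(1, f(z)/f(y)), an energy distribution proportional to f stays proportional to f. The estimate therefore stays unbiased, while energy from one seed ends up shared between neighbouring bins, which is where the lower-frequency noise comes from.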

The algorithm seems to be superior to (bidirectional) path tracing and MLT in most cases, while retaining its unbiased character. And they made it work on the GPU! You could say that, algorithm-wise, the addition of ERPT currently makes Brigade more advanced than the other GPU renderers (Octane, Arion, LuxRays, OptiX, V-Ray RT, iray, SHOT, Indigo GPU, ...), which rely on "plain" path tracing.

The following video compares path tracing to ERPT in Brigade at a resolution of 1280x720(!) on a GTX 470:

This image directly compares path tracing on the left with ERPT on the right. The smeary pixel artefacts in the ERPT image are mostly due to YouTube video and JPEG screengrab compression, but I presume that some noise filters, as described in the ERPT paper, are applied as well:
ERPT seems to be a little darker than regular path tracing in this image, which appears to be a by-product of the noise filters according to

On a side note, the Sponza scene in the video renders very fast for the given resolution and hardware. Comparing this with the video of Sponza rendering in the first version of SmallLuxGPU on an HD 4870 (which I thought looked amazing at the time), it's clear that GPU rendering has made enormous advances in just a few months, thanks to more powerful GPUs and optimizations of the path tracing code. I can hardly contain my excitement to see what Brigade is going to bring next! Maybe Population Monte Carlo energy redistribution for even faster convergence? ;)

Monday, July 12, 2010

Sparse voxel octree and path tracing: a perfect combination?

I have been wondering for some time whether SVOs and path tracing would be a perfect solution for real-time GI in games. Cyril Crassin has shown in his paper "Beyond triangles: Gigavoxels effects in videogames" that secondary-ray effects such as shadows can be computed very inexpensively by tracing a coarse voxel resolution mipmap, without tracing all the way down to the finest voxel resolution. This is an almost magical intrinsic property of SVOs and, to my knowledge, not possible when tracing triangles (unless there are multiple LOD levels): there, every ray has to be traced down to the final leaf containing the triangle, which is of course very expensive. A great example of different SVO resolution levels can be seen in the video "Efficient Sparse Voxel Octrees" by Samuli Laine and Tero Karras (the video link is on the right of the project page, and a demo and source code for CUDA cards are also available).
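A minimal sketch of the coarse-mipmap shadow trick, assuming a dense voxel grid instead of a real sparse octree, treating the averaged voxel density directly as opacity, and using hypothetical names (`build_mips`, `shadow_visibility`). A shadow ray marched through a coarser level takes roughly one sample per coarse voxel, so it needs far fewer steps than one marched at the finest level:

```python
import math

def build_mips(grid):
    """Return [finest, ..., 1x1x1]; each coarse voxel is the average of its 8 children."""
    levels = [grid]
    while len(levels[-1]) > 1:
        g = levels[-1]
        h = len(g) // 2
        levels.append([[[sum(g[2 * x + i][2 * y + j][2 * z + k]
                             for i in (0, 1) for j in (0, 1) for k in (0, 1)) / 8.0
                         for z in range(h)] for y in range(h)] for x in range(h)])
    return levels

def shadow_visibility(levels, lod, origin, target):
    """March a shadow ray through mip level `lod` (0 = finest) of the unit cube.

    Returns (visibility in [0, 1], number of voxel samples taken).
    """
    g = levels[lod]
    n = len(g)
    d = [t - o for o, t in zip(origin, target)]
    length = math.sqrt(sum(c * c for c in d))
    steps = max(1, math.ceil(length * n))    # roughly one sample per voxel at this level
    visibility = 1.0
    for s in range(steps):
        t = (s + 0.5) / steps
        ix, iy, iz = (min(int((o + t * c) * n), n - 1) for o, c in zip(origin, d))
        visibility *= 1.0 - g[ix][iy][iz]    # averaged density used as opacity
    return visibility, steps

# A 16^3 grid with a 4-voxel-thick wall between x = 0.5 and x = 0.75.
n = 16
grid = [[[1.0 if 8 <= x < 12 else 0.0 for z in range(n)] for y in range(n)] for x in range(n)]
levels = build_mips(grid)
print(shadow_visibility(levels, 0, (0.1, 0.5, 0.5), (0.9, 0.5, 0.5)))
print(shadow_visibility(levels, 2, (0.1, 0.5, 0.5), (0.9, 0.5, 0.5)))
```

The averaging is also why thin occluders can leak light at coarse levels: a one-voxel-thick wall becomes half-dense one level up.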

I think that this LOD "trick" could work with all kinds of secondary-ray effects, not just shadows. In particular, ambient occlusion and indirect lighting could be calculated efficiently this way, even on glossy materials. One limitation could be high-frequency content, because every voxel is an average of the eight smaller voxels it consists of, but in such cases the voxels could be adaptively subdivided to a higher resolution (the same goes for perfectly specular reflections). For diffuse materials, the cost of computing indirect lighting could be drastically reduced.
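The averaging and adaptive subdivision described above could look like this in a sketch on a dense grid. The refinement criterion (maximum deviation of a child from the parent average above a threshold) and the name `collapse_level` are my own assumptions, not taken from any of the papers mentioned:

```python
def collapse_level(grid, threshold=0.1):
    """Average 2x2x2 children into parent voxels and flag high-frequency parents.

    Returns (coarse_grid, refine_mask); refine_mask[x][y][z] is True where the
    parent's average hides too much detail and the finer voxels should be kept.
    """
    h = len(grid) // 2
    coarse = [[[0.0] * h for _ in range(h)] for _ in range(h)]
    refine = [[[False] * h for _ in range(h)] for _ in range(h)]
    for x in range(h):
        for y in range(h):
            for z in range(h):
                children = [grid[2 * x + i][2 * y + j][2 * z + k]
                            for i in (0, 1) for j in (0, 1) for k in (0, 1)]
                mean = sum(children) / 8.0
                coarse[x][y][z] = mean
                # Keep the children if any of them deviates strongly from the average.
                refine[x][y][z] = max(abs(c - mean) for c in children) > threshold
    return coarse, refine

uniform = [[[0.5] * 2 for _ in range(2)] for _ in range(2)]
checker = [[[float((i + j + k) % 2) for k in range(2)] for j in range(2)] for i in range(2)]
print(collapse_level(uniform))
print(collapse_level(checker))
```

Both inputs average to the same parent value, but only the checkerboard, whose detail the average would wipe out, is flagged for refinement.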

Another idea is to compute primary rays and first-bounce secondary rays at full resolution, and all subsequent bounces at lower voxel LODs with some edge-adaptive sampling, since the 2nd, 3rd, 4th, ... indirect bounces contribute relatively little to the final pixel color compared to the direct lighting and the first indirect bounce. I'm not sure whether this idea is feasible or how to make it work.
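One way to express this bounce-dependent LOD as a policy, purely as a hypothetical sketch: full resolution for the primary hit (bounce 0) and the first indirect bounce (bounce 1), and for later bounces the deepest octree level whose voxels are still at least as wide as the footprint of a cone around the ray. The cone model, the function names and the depth convention (0 = root, voxel size = root_size / 2^depth) are assumptions:

```python
import math

def cone_footprint(distance, half_angle):
    """Width of a cone with the given half-angle (radians) at `distance` along its axis."""
    return 2.0 * distance * math.tan(half_angle)

def octree_depth_for_bounce(bounce, max_depth, root_size, footprint):
    """Hypothetical LOD policy (depth 0 = root, max_depth = finest voxels).

    Bounces 0 and 1 use full resolution; later bounces stop at the deepest level
    whose voxel size root_size / 2**depth is still >= the ray footprint.
    """
    if bounce <= 1 or footprint <= 0.0:
        return max_depth
    depth = math.floor(math.log2(root_size / footprint))
    return max(0, min(max_depth, depth))

fp = cone_footprint(1.0, math.atan(0.05))   # footprint of 0.1 at distance 1
for bounce in range(4):
    print(bounce, octree_depth_for_bounce(bounce, 10, 1.0, fp))
```

In a real tracer the footprint would grow with distance and with the roughness of the surfaces along the path; here it is simply passed in.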

Voxelstein3d has already implemented the idea of path tracing SVOs for global illumination, with some nice results. Once a demo is released, it's going to be interesting to see whether the above holds and doesn't break down with non-diffuse materials in the scene.

UPDATE: VoxLOD has actually done this for diffuse global illumination, and it seems to work nicely:

Thursday, July 8, 2010

Tokaspt, an excellent real-time path tracing app

Just stumbled upon this very impressive CUDA-based path tracer; the exe and source are available (the app itself has been available since January 2009).

Although the scenes are quite simple (spheres and planes only), it's extremely fast and converges to a high-quality image in literally a matter of milliseconds. Navigation is as close to real-time as it gets. There are 4 different scenes to choose from (load a scene with F9), and they can be modified at will: the parameters are sphere size, color, emission, material BRDF (3 options: diffuse (matte), specular (mirror) and refractive (glass)) and sphere position. Path trace depth and spppp (samples per pixel per pass) can also be altered on the fly thanks to the very convenient GUI with sliders. When ghosting artefacts appear while moving around, press the "reset acc" button to clear the accumulation buffer and get a clean image. Definitely worth messing around with!
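An accumulation buffer like the one behind the "reset acc" button typically works roughly like this sketch (my assumption of a common design, not tokaspt's actual code; the class name is hypothetical). Every pass adds spppp samples per pixel to a running mean, and a reset discards the old samples. Moving the camera without a reset averages samples from the old and new viewpoints, which is exactly the ghosting. Weighting each pass by its sample count keeps the mean correct when spppp is changed on the fly:

```python
class AccumulationBuffer:
    """Progressive accumulation buffer: sample-weighted running mean per pixel."""

    def __init__(self, width, height):
        self.width, self.height = width, height
        self.reset()

    def reset(self):
        # Call when the camera or scene changes; otherwise stale samples cause ghosting.
        self.sums = [[0.0] * self.width for _ in range(self.height)]
        self.samples = 0

    def add_pass(self, frame, spp):
        """Add one pass; frame[y][x] is the mean of that pixel's `spp` samples."""
        for y in range(self.height):
            for x in range(self.width):
                self.sums[y][x] += frame[y][x] * spp
        self.samples += spp

    def image(self):
        """Current converged estimate (all zeros right after a reset)."""
        n = max(self.samples, 1)
        return [[v / n for v in row] for row in self.sums]

acc = AccumulationBuffer(2, 1)
acc.add_pass([[1.0, 3.0]], spp=1)
acc.add_pass([[4.0, 6.0]], spp=2)   # spppp slider raised between passes
print(acc.image())
```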

Wednesday, July 7, 2010

New version of Brigade path tracer

Follow the link in this post to download. There are some new features and a performance increase. Rename cudart.dll to cudart32_31_9.dll to make it work.

The next image demonstrates some of the exceptional strengths of path tracing:

- indirect lighting with color bleeding: notice that every surface facing down (yellow arrows) picks up a slightly greenish glow from the floor plane, due to indirect light bouncing off (this picture uses path trace depth 6)
- soft shadows
- indirect shadows
- contact shadows (ambient occlusion)
- superb anti-aliasing
- depth of field
- natural looking light with gradual changes

All of these contribute to the photorealistic look, and it's all interactive (on a high-end CPU+GPU)!
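The color bleeding in the first bullet falls out of the path tracing loop itself. In this sketch (hypothetical scene values; Lambertian surfaces with cosine-weighted sampling assumed, so that BRDF × cosine / pdf reduces to the albedo), the path throughput is multiplied by each surface's RGB albedo. Light that reaches a downward-facing surface via the green floor therefore arrives tinted green. `max_depth` here counts surface hits, which may not match how Brigade defines path trace depth:

```python
def shade_path(hits, max_depth=6):
    """Accumulate RGB radiance along one path.

    hits: list of (albedo_rgb, emission_rgb) in the order the path visits them,
    starting at the surface seen by the camera.
    """
    radiance = [0.0, 0.0, 0.0]
    throughput = [1.0, 1.0, 1.0]
    for albedo, emission in hits[:max_depth]:
        for c in range(3):
            radiance[c] += throughput[c] * emission[c]   # light emitted at this hit
            throughput[c] *= albedo[c]                   # diffuse bounce tints the path
    return radiance

ceiling = ((0.8, 0.8, 0.8), (0.0, 0.0, 0.0))   # white, downward-facing
floor = ((0.1, 0.7, 0.1), (0.0, 0.0, 0.0))     # green floor plane
light = ((0.0, 0.0, 0.0), (10.0, 10.0, 10.0))  # emitter

print(shade_path([ceiling, floor, light]))               # green-tinted contribution
print(shade_path([ceiling, floor, light], max_depth=2))  # light never reached
```

The second call shows the flip side of a depth limit: paths that would reach the light after more bounces than allowed contribute nothing.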

Saturday, July 3, 2010

Friday, July 2, 2010

Brigade path tracer comparison

The following screenshots are taken from the Brigade real-time path tracer demo, available at
Rendered on the CPU only, at a resolution of 832x512
Images with 100 and 800 spp were taken without frame averaging (only 1 iteration)
Images with 2, 8, 16 and 32 spp were taken with frame averaging (averaging the samples of several frames)
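Frame averaging simply adds more samples to the same Monte Carlo estimate, and the RMS error of such an estimate falls roughly as 1/sqrt(N). A toy experiment, using a hypothetical 1D integrand in place of a real pixel, shows the expected reduction of about 10x in noise from 8 to 800 spp (100x the samples):

```python
import math
import random

def f(u):
    # Hypothetical 1D "pixel" integrand; its exact integral over [0, 1) is 1/3.
    return u * u

def pixel_estimate(spp, rng):
    """Monte Carlo estimate of the pixel value from `spp` uniform samples."""
    return sum(f(rng.random()) for _ in range(spp)) / spp

def rmse(spp, trials=400, seed=0):
    """Root-mean-square error of the estimate over many independent renders."""
    rng = random.Random(seed)
    exact = 1.0 / 3.0
    total = 0.0
    for _ in range(trials):
        total += (pixel_estimate(spp, rng) - exact) ** 2
    return math.sqrt(total / trials)

for spp in (8, 100, 800):
    print(spp, "spp -> RMSE", round(rmse(spp), 4))
```

Halving the noise costs four times the samples, which is why filtering low-spp images is such an attractive shortcut.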

2 spp

8 spp

16 spp

32 spp

100 spp

800 spp

To top it off, one big image comparing 800, 8, 16 and 32 spp. It amazes me that the quality of just 8 samples is already great, and with some filtering it could rival the quality of the 800 spp image: