Here are the links to our report, video, and code:
report
video
git repo
Here is the link to Yuanhui's blog:
Yuanhui
Wednesday, April 25, 2012
Saturday, April 21, 2012
Some comparison results
Over the past few weeks, I have been working on improving the performance of the GPU implementation and assisting Yuanhui with rendering.
Let's talk about the GPU implementation of SPH. I have tested it against the CPU implementation.
The FPS of the CPU implementation is 9. Remember that this value was measured with NUM_CPU_THREADS set to 1, which means only one CPU thread runs the program.
I also measured the FPS of the GPU implementation.
The FPS is 32, about 3.6x that of the single-threaded CPU run (32 / 9). If we turn the light source on and display shadows (this feature is already implemented in the framework I use), the FPS of the CPU implementation does not drop, but the GPU's drops to 26.
Since I optimized the CPU implementation with OpenMP, I also tried running the CPU code with 8 CPU threads.
The FPS is 40, so it outperforms the GPU implementation.
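For readers who have not used OpenMP, the multi-threaded CPU run can be pictured roughly as follows. This is only a minimal sketch under assumptions: the `Particle` struct, the `computeDensities` function and the `kernelConst` parameter are hypothetical names for illustration, not the framework's code. Only `NUM_CPU_THREADS` comes from the post.

```cpp
#include <cassert>
#include <vector>

// Hypothetical particle layout for illustration; the framework's real struct differs.
struct Particle {
    float x, y, z;
    float density;
};

// Assumed to mirror the NUM_CPU_THREADS setting mentioned in the post.
constexpr int NUM_CPU_THREADS = 8;

// Naive O(n^2) density pass with the outer loop split across threads.
// Each iteration writes only particles[i].density, so no locking is needed.
// Build with -fopenmp (GCC/Clang); without it the pragma is ignored and the
// loop simply runs on one thread.
void computeDensities(std::vector<Particle>& particles, float h,
                      float mass, float kernelConst) {
    assert(h > 0.0f);
    const float h2 = h * h;
    const long n = static_cast<long>(particles.size());
    #pragma omp parallel for num_threads(NUM_CPU_THREADS) schedule(static)
    for (long i = 0; i < n; ++i) {
        float sum = 0.0f;
        for (long j = 0; j < n; ++j) {
            if (i == j) continue;  // skip the particle itself
            const float dx = particles[i].x - particles[j].x;
            const float dy = particles[i].y - particles[j].y;
            const float dz = particles[i].z - particles[j].z;
            const float d2 = dx * dx + dy * dy + dz * dz;
            if (d2 < h2) {
                const float c = h2 - d2;
                sum += c * c * c;
            }
        }
        particles[i].density = sum * mass * kernelConst;
    }
}
```

Because every thread reads all positions but writes only its own particles' densities, the outer loop parallelizes without any synchronization, which is probably why the OpenMP version scales reasonably well.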
I was quite confused by this result and looked into my code to see what causes it. I think it is related to memory access issues, and I am still working on it to see whether I can improve the performance of the GPU implementation.
Since the framework I use implements the fluid simulation on both the CPU and the GPU, I checked their performance as well. To my surprise, the framework's GPU implementation is beaten by its CPU implementation, even though that CPU implementation is not optimized with OpenMP.
After checking the code, I realized that the framework's original implementation is not computation intensive. In fact, it has significantly shorter loops than mine, but it needs more memory accesses. It has a global vector called the neighbor table that stores the neighbor information of each particle. This neighbor table helps reduce the loop length, but the trade-off is that it increases the number of memory accesses. That is why the GPU implementation of this method is beaten by the CPU implementation. This result strengthens my belief that the optimization of my GPU implementation should focus on memory access.
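The neighbor table trade-off can be sketched like this. It is an illustration under assumptions, not the framework's actual data structure: `Vec3`, `NeighborTable`, `buildNeighborTable`, `densitiesFromTable` and `kernelConst` are hypothetical names, and the table is built by brute force here only to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

// neighbors[i] holds the indices of all particles within radius h of particle i.
using NeighborTable = std::vector<std::vector<std::size_t>>;

// Build the table once per step; every later pass can reuse it.
NeighborTable buildNeighborTable(const std::vector<Vec3>& pos, float h) {
    assert(h > 0.0f);
    const float h2 = h * h;
    NeighborTable neighbors(pos.size());
    for (std::size_t i = 0; i < pos.size(); ++i) {
        for (std::size_t j = i + 1; j < pos.size(); ++j) {
            const float dx = pos[i].x - pos[j].x;
            const float dy = pos[i].y - pos[j].y;
            const float dz = pos[i].z - pos[j].z;
            if (dx * dx + dy * dy + dz * dz < h2) {
                neighbors[i].push_back(j);
                neighbors[j].push_back(i);
            }
        }
    }
    return neighbors;
}

// Density pass over the table: the inner loop is short, but each neighbor
// costs an index read plus a scattered read of pos[j].
std::vector<float> densitiesFromTable(const std::vector<Vec3>& pos,
                                      const NeighborTable& neighbors,
                                      float h, float mass, float kernelConst) {
    const float h2 = h * h;
    std::vector<float> density(pos.size(), 0.0f);
    for (std::size_t i = 0; i < pos.size(); ++i) {
        float sum = 0.0f;
        for (std::size_t j : neighbors[i]) {
            const float dx = pos[i].x - pos[j].x;
            const float dy = pos[i].y - pos[j].y;
            const float dz = pos[i].z - pos[j].z;
            const float c = h2 - (dx * dx + dy * dy + dz * dz);
            sum += c * c * c;
        }
        density[i] = sum * mass * kernelConst;
    }
    return density;
}
```

The inner loop now touches only real neighbors, but the reads of `pos[j]` jump around in memory. On a GPU such scattered global reads are typically harder to coalesce than a linear sweep over all particles, which fits the behavior described above.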
Sunday, April 8, 2012
GPU implementation done
I worked out the GPU implementation this week, and it is much faster than the CPU one. Here is the comparison:
(the left is CPU and the right is GPU, both 2k particles)
The problem is that I am using a straightforward implementation whose complexity is O(n^2). If I increase the number of particles to 4k, the GPU implementation looks slow, though it is still much faster than the CPU one.
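To see why doubling the particle count hurts so much, it helps to count the pair tests of the naive method. This is a rough worked estimate, assuming "2k" and "4k" mean 2000 and 4000 particles and that every particle is tested against every other particle once per pass:

```latex
\begin{align*}
T(n) &= n(n-1) \\
T(2000) &= 2000 \cdot 1999 = 3\,998\,000 \\
T(4000) &= 4000 \cdot 3999 = 15\,996\,000 \\
\frac{T(4000)}{T(2000)} &\approx 4.0
\end{align*}
```

So twice the particles means about four times the work per pass, before counting the multiple passes needed per frame.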
I will try to figure out how to implement SPH using another method, presented in "Simulation and Rendering of a Viscous Fluid using Smoothed Particle Hydrodynamics".
I may also use OpenMP to optimize the CPU implementation and compare these 3 implementations.
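One common way to get below O(n^2) is to bin particles into a uniform grid whose cell size equals the interaction radius, so each particle only checks the 27 cells around it. The sketch below illustrates that general idea and is not necessarily the method of the paper mentioned above. `UniformGrid`, `Vec3` and all member names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

struct Vec3 { float x, y, z; };

// Uniform grid with cell size h. The caller must keep `pos` alive while the
// grid is in use, because only a reference to it is stored.
class UniformGrid {
public:
    UniformGrid(const std::vector<Vec3>& pos, float h) : pos_(pos), h_(h) {
        assert(h > 0.0f);
        for (std::size_t i = 0; i < pos.size(); ++i) {
            cells_[key(cellOf(pos[i].x), cellOf(pos[i].y), cellOf(pos[i].z))]
                .push_back(i);
        }
    }

    // Indices of all particles within h of particle i (excluding i itself).
    std::vector<std::size_t> neighborsOf(std::size_t i) const {
        std::vector<std::size_t> out;
        const Vec3& p = pos_[i];
        const int cx = cellOf(p.x), cy = cellOf(p.y), cz = cellOf(p.z);
        for (int dz = -1; dz <= 1; ++dz)
        for (int dy = -1; dy <= 1; ++dy)
        for (int dx = -1; dx <= 1; ++dx) {
            const auto it = cells_.find(key(cx + dx, cy + dy, cz + dz));
            if (it == cells_.end()) continue;  // empty cell
            for (std::size_t j : it->second) {
                if (j == i) continue;
                const float ex = p.x - pos_[j].x;
                const float ey = p.y - pos_[j].y;
                const float ez = p.z - pos_[j].z;
                if (ex * ex + ey * ey + ez * ez < h_ * h_) out.push_back(j);
            }
        }
        return out;
    }

private:
    int cellOf(float v) const { return static_cast<int>(std::floor(v / h_)); }

    // Pack three 21-bit cell coordinates into one 64-bit key.
    static std::int64_t key(int x, int y, int z) {
        const std::int64_t mask = (std::int64_t{1} << 21) - 1;
        return ((x & mask) << 42) | ((y & mask) << 21) | (z & mask);
    }

    const std::vector<Vec3>& pos_;
    float h_;
    std::unordered_map<std::int64_t, std::vector<std::size_t>> cells_;
};
```

With roughly uniform particle density, each query inspects a bounded number of candidates, so a full pass costs on the order of n times the average cell occupancy instead of n^2.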
Sunday, April 1, 2012
CPU implementation of SPH
I finally got the CPU implementation running, but there are two main problems.
The first is that it is too slow. There are 4k particles in this picture, and I guess the FPS is below 1.0. I used several nested for loops, which I believe are the main reason for such a low speed. If I implement this on the GPU, I believe it will be much faster. If I use OpenMP to parallelize those loops, it should also be somewhat faster.
The other thing that needs to be improved is that the movement of the particles doesn't look like fluid. The SPH method requires the fluid to be simulated at real-world scale, which means the values and units must be physically correct. I think I gave some parameters wrong values, which results in the strange behavior of the particles.
density, pressure, viscosity calculation method
In the SPH method, we basically calculate each particle's pressure and viscosity, then use the gradient of the pressure and the Laplacian of the viscosity to get the force on each particle. Once we have the force on each particle, we can get its acceleration and velocity, and ultimately use the velocity to update each particle's position.
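The last stage of that pipeline, from force to position, might look like the sketch below. It assumes a semi-implicit Euler integrator and the common SPH convention that the summed force is a force density, so the acceleration is force / density plus gravity. The post does not say which integrator is used, and `Particle`, `integrate` and `gravity` are hypothetical names.

```cpp
#include <cassert>
#include <vector>

struct Vec3 { float x, y, z; };

// Hypothetical particle record for illustration.
struct Particle {
    Vec3 pos;
    Vec3 vel;
    Vec3 force;     // pressure + viscosity force density, already summed
    float density;
};

// Semi-implicit Euler step: update the velocity from the acceleration first,
// then move the particle with the new velocity.
void integrate(std::vector<Particle>& particles, Vec3 gravity, float dt) {
    assert(dt > 0.0f);
    for (Particle& p : particles) {
        assert(p.density > 0.0f);
        // Assumption: acceleration = force density / density + gravity.
        const Vec3 a = { p.force.x / p.density + gravity.x,
                         p.force.y / p.density + gravity.y,
                         p.force.z / p.density + gravity.z };
        p.vel.x += a.x * dt;
        p.vel.y += a.y * dt;
        p.vel.z += a.z * dt;
        p.pos.x += p.vel.x * dt;
        p.pos.y += p.vel.y * dt;
        p.pos.z += p.vel.z * dt;
    }
}
```

Boundary handling (for example, bouncing off container walls) is left out here and would have to be added after the position update.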
After studying the paper mentioned in the last post, I believe a serial CPU implementation of the SPH method should be quite straightforward.
Density, pressure and viscosity can each be calculated by the same SPH interpolation sum over the other particles, weighted by the smoothing kernel function W(r - rj, h), or by the gradient or Laplacian of that sum.
Taking the gradient or Laplacian of this sum only affects the smoothing kernel function W(r - rj, h).
A naive way to implement this sum is to use two nested for loops.
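For reference, the standard SPH interpolation in the usual notation of the SPH literature is written out here. It is given as an assumption about which equation is meant, since the equation itself is not shown in this post. Here A is any quantity carried by the particles, m_j and rho_j are the mass and density of particle j, and h is the smoothing radius.

```latex
\begin{align*}
A(\mathbf{r}) &= \sum_j m_j \frac{A_j}{\rho_j}\, W(\mathbf{r} - \mathbf{r}_j, h) \\
\nabla A(\mathbf{r}) &= \sum_j m_j \frac{A_j}{\rho_j}\, \nabla W(\mathbf{r} - \mathbf{r}_j, h) \\
\nabla^2 A(\mathbf{r}) &= \sum_j m_j \frac{A_j}{\rho_j}\, \nabla^2 W(\mathbf{r} - \mathbf{r}_j, h) \\
\rho(\mathbf{r}_i) &= \sum_j m_j\, W(\mathbf{r}_i - \mathbf{r}_j, h)
\end{align*}
```

The last line is the special case A = rho, where the density cancels. The derivatives act only on W because m_j, A_j and rho_j are per-particle constants within a time step.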
For example, to calculate the density:
// Naive O(n^2) density pass: every particle p is tested against every other particle q.
data1_end = particle.data + NUM_PARTICLES*particle.width;
data2_end = data1_end;
for ( data1 = particle.data; data1 < data1_end; data1 += particle.width ) {
    p = data1;
    sum = 0.0;
    for ( data2 = particle.data; data2 < data2_end; data2 += particle.width ) {
        q = data2;
        if ( p == q ) continue;   // skip the particle itself
        dx = ( p->pos.x - q->pos.x )*d;
        dy = ( p->pos.y - q->pos.y )*d;
        dz = ( p->pos.z - q->pos.z )*d;
        d2 = ( dx*dx + dy*dy + dz*dz );   // squared distance between p and q
        if ( sRadius2 > d2 ) {            // q lies inside the smoothing radius
            c = sRadius2 - d2;
            sum += c * c * c;             // (h^2 - r^2)^3 term of the kernel
        }
    }
    p->density = sum * PARTICLE_MASS * kernFun1;
}
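The `c * c * c` term has the shape of the poly6 smoothing kernel that is widely used for density in SPH. The sketch below writes that kernel out in full, under the assumption that `kernFun1` plays the role of its constant factor 315 / (64 * pi * h^9); the post does not confirm this. The function name `poly6` is hypothetical.

```cpp
#include <cassert>

// Poly6 smoothing kernel:
//   W(r, h) = 315 / (64 * pi * h^9) * (h^2 - r^2)^3   for 0 <= r < h, else 0.
// Takes the squared distance r2 so the caller can skip the square root.
double poly6(double r2, double h) {
    assert(h > 0.0);
    const double kPi = 3.14159265358979323846;
    const double h2 = h * h;
    if (r2 >= h2) return 0.0;  // outside the smoothing radius
    const double h9 = h2 * h2 * h2 * h2 * h;
    const double c = h2 - r2;
    return 315.0 / (64.0 * kPi * h9) * c * c * c;
}
```

The constant factor depends only on h, so it can be computed once and multiplied in after the inner loop, as the loop above does.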
SPH method study
SPH is a particle simulation method that can be used to simulate fluid. We assume that the fluid consists of a large number of particles, and every particle interacts with its neighbors. A particle does not need to interact with every other particle: we can define a radius and let a particle interact only with the particles within this radius.
I am currently studying this paper: http://www8.cs.umu.se/education/examina/Rapporter/MarcusVesterlund.pdf
and working on a naive CPU implementation of SPH.
Tuesday, March 13, 2012
Ocean Simulation and Rendering Proposal
Yuanhui Chen, Tao Lei
For the CIS565 final project, we are going to work on ocean simulation and rendering. Yuanhui Chen, a 1st-year master's student in CGGT, will mainly work on the rendering part. Tao Lei, a 1st-year master's student in EE, will focus on simulation. Fluid animation is popular in games and special effects. However, it is hard to get the desired effects because of its computational complexity, so exploiting the computing power of the GPU is an effective solution.
We are going to use the Smoothed Particle Hydrodynamics (SPH) method for the ocean simulation. SPH's drawback compared with grid-based methods is that it requires a large number of particles to produce a simulation of equivalent resolution. But since we are using the GPU, which is good at computation-intensive tasks, this drawback should no longer be a problem. The SPH method is inherently parallel, with little data dependence between particles, which makes it well suited to a GPU implementation. Although physically based fluid animation has historically been the domain of high-quality offline rendering due to its great computational cost (GPU Gems, Chap. 30.1, p. 633), we are going to simulate it in real time.
For the rendering part, we will try several methods and then decide which one to implement. The first option is to use a marching method to generate an isosurface from the density field and then use volume rendering to visualize the isosurface. The marching method is based on the paper "Using the GPU programmable Geometry Pipeline", which combines marching cubes and marching tetrahedra. The second option is photon mapping. Photon mapping is a good choice for adding refraction, reflection and global illumination; however, it is expensive, especially when the view changes frequently.
We also plan to add interactions: between the ocean and the coast, and between the water and floating objects. We refer to "Animating the Interplay Between Rigid Bodies and Fluid", which extends the SPH model by adding rigid body forces and enforcing rigid body motion. We will also take wind effects into consideration.
Although this project is about ocean simulation and rendering, we are not going to confine our work to the ocean. We may also cover cloud and terrain simulation and rendering to make our picture rich and full.
Video:
http://www.youtube.com/watch?v=oNPJKBjuHIY (ocean simulation)
http://www.youtube.com/watch?v=d818Bjef6Yc (ocean render)