Here are the links to our report, video, and code:
report
video
git repo
Here is the link to Yuanhui's blog:
Yuanhui
Wednesday, April 25, 2012
Saturday, April 21, 2012
some comparison results
In recent weeks, I have been working on improving the performance of the GPU implementation and helping Yuanhui with rendering.
Let's talk about the GPU implementation of SPH. I have tested it against the CPU implementation.
The CPU implementation runs at 9 FPS. Note that this value was measured with NUM_CPU_THREADS set to 1, which means only one CPU thread runs the program.
I also measured the FPS of the GPU implementation: it is 32, about 3.6 times the CPU's 9 FPS. If we turn on the light source and display shadows (a function already implemented in the framework I use), the FPS of the CPU implementation does not drop, but the GPU's drops to 26.
Since I had also parallelized the CPU implementation with OpenMP, I tried running the CPU code with 8 CPU threads.
The FPS is 40, so it outperforms the GPU implementation.
I was quite confused by this result and looked into my code to find the cause. I think it is related to memory access, and I am still working on improving the performance of the GPU implementation.
Since the framework I use implements the fluid simulation on both the CPU and the GPU, I also checked the performance of those two versions. To my surprise, its GPU implementation is beaten by its CPU implementation, even though that CPU implementation is not parallelized with OpenMP.
After checking the code, I realized that the framework's original implementation is not computation-intensive. Its loops are significantly shorter than mine, but it needs more memory accesses. It keeps a global vector called the neighbor table, which stores the neighbors of each particle. This table reduces the loop length, but the trade-off is more memory accesses. That's why the framework's GPU implementation is beaten by its CPU implementation. This result strengthens my belief that the optimization of my GPU implementation should focus on memory access.
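To make the trade-off concrete, here is a minimal C++ sketch of a neighbor table. It assumes a compressed layout: an `offsets` array plus one flat `indices` array. The types `Vec3` and `NeighborTable` and both functions are hypothetical and are not the framework's actual data structures. With the table, later passes run short loops over the listed neighbors instead of over all particles. The cost is that every neighbor read becomes an indirect load, `pos[indices[k]]`. On a GPU, such loads tend to hit scattered addresses that cannot be coalesced.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical particle position type; the framework's real layout differs.
struct Vec3 { float x, y, z; };

// Compressed neighbor table: the neighbors of particle i are
// indices[offsets[i]] .. indices[offsets[i + 1] - 1].
struct NeighborTable {
    std::vector<std::size_t> offsets;  // n + 1 entries
    std::vector<std::size_t> indices;  // all neighbor lists, concatenated
};

// Build the table once per time step (all-pairs here for brevity).
NeighborTable buildNeighborTable(const std::vector<Vec3>& pos, float radius) {
    assert(radius > 0.0f);
    const float r2 = radius * radius;
    NeighborTable t;
    t.offsets.push_back(0);
    for (std::size_t i = 0; i < pos.size(); ++i) {
        for (std::size_t j = 0; j < pos.size(); ++j) {
            if (i == j) continue;
            const float dx = pos[i].x - pos[j].x;
            const float dy = pos[i].y - pos[j].y;
            const float dz = pos[i].z - pos[j].z;
            if (dx * dx + dy * dy + dz * dz < r2) t.indices.push_back(j);
        }
        t.offsets.push_back(t.indices.size());
    }
    return t;
}

// Stand-in for a later pass (e.g. the density sum): the loop is short, but
// each neighbor position is fetched through the index array (a gather).
float sumNeighborDist2(const std::vector<Vec3>& pos, const NeighborTable& t, std::size_t i) {
    float s = 0.0f;
    for (std::size_t k = t.offsets[i]; k < t.offsets[i + 1]; ++k) {
        const Vec3& q = pos[t.indices[k]];  // indirect, data-dependent load
        const float dx = pos[i].x - q.x;
        const float dy = pos[i].y - q.y;
        const float dz = pos[i].z - q.z;
        s += dx * dx + dy * dy + dz * dz;
    }
    return s;
}
```

Each time step pays once to build the table. Every later pass over particles then reuses it.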
Sunday, April 8, 2012
GPU implementation done
I worked out the GPU implementation this week, and it's much faster than the CPU one. Here is the comparison:
(the left is CPU and the right is GPU, both 2k particles)
The problem is that I am using a "straightforward" implementation whose complexity is O(n^2). If I increase the number of particles to 4k, the GPU implementation becomes noticeably slow, though it is still much faster than the CPU one.
I will try to figure out how to implement SPH by using another method presented in "Simulation and Rendering of a Viscous Fluid using Smoothed Particle Hydrodynamics".
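With the all-pairs loop, doubling the particle count from 2k to 4k roughly quadruples the number of pair tests, from about 4 million to 16 million per pass. A common way around this is a uniform grid whose cell size equals the smoothing radius h. This is not necessarily the exact method in that paper. With cells of size h, any neighbor closer than h must lie in the particle's own cell or one of the 26 surrounding cells. Below is a minimal C++ sketch that stores the cells in a hash map. The names `UniformGrid` and `cellKey` and the key packing are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical particle position type for this sketch.
struct Vec3 { float x, y, z; };

// Pack integer cell coordinates into one 64-bit key. Assumes each coordinate
// lies in [-2^20, 2^20), which is ample for a bounded simulation box.
long long cellKey(int cx, int cy, int cz) {
    const long long bias = 1LL << 20;  // shift coordinates to be non-negative
    return ((cx + bias) << 42) | ((cy + bias) << 21) | (cz + bias);
}

class UniformGrid {
public:
    UniformGrid(const std::vector<Vec3>& pos, float h) : pos_(pos), h_(h) {
        assert(h > 0.0f);
        // Bin every particle into the cell that contains it.
        for (std::size_t i = 0; i < pos_.size(); ++i)
            cells_[cellKey(cell(pos_[i].x), cell(pos_[i].y), cell(pos_[i].z))].push_back(i);
    }

    // Indices j != i with |pos[i] - pos[j]| < h. Because the cell size is h,
    // every such j lies in i's cell or one of the 26 adjacent cells.
    std::vector<std::size_t> neighbors(std::size_t i) const {
        std::vector<std::size_t> out;
        const Vec3& p = pos_[i];
        const int cx = cell(p.x), cy = cell(p.y), cz = cell(p.z);
        for (int dx = -1; dx <= 1; ++dx)
            for (int dy = -1; dy <= 1; ++dy)
                for (int dz = -1; dz <= 1; ++dz) {
                    auto it = cells_.find(cellKey(cx + dx, cy + dy, cz + dz));
                    if (it == cells_.end()) continue;  // empty cell
                    for (std::size_t j : it->second) {
                        if (j == i) continue;
                        const float ex = p.x - pos_[j].x;
                        const float ey = p.y - pos_[j].y;
                        const float ez = p.z - pos_[j].z;
                        if (ex * ex + ey * ey + ez * ez < h_ * h_) out.push_back(j);
                    }
                }
        return out;
    }

private:
    int cell(float v) const { return static_cast<int>(std::floor(v / h_)); }

    std::vector<Vec3> pos_;
    float h_;
    std::unordered_map<long long, std::vector<std::size_t>> cells_;
};
```

When particles are spread fairly evenly, each query checks a bounded number of candidates. A full pass then costs roughly O(n) instead of O(n^2).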
I may also use OpenMP to optimize the CPU implementation and compare these three implementations.
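Here is a minimal sketch of what an OpenMP version of the naive density loop could look like. It assumes positions stored in a `std::vector` of a hypothetical `Vec3` struct rather than the framework's strided particle buffer. The function name and parameters are also hypothetical. Each outer iteration writes only `density[i]`, so the iterations are independent. A single pragma can therefore split the outer loop across threads. As in the serial density loop, the sketch skips the particle itself.

```cpp
#include <cassert>
#include <vector>

// Hypothetical particle position type for this sketch.
struct Vec3 { float x, y, z; };

// Naive O(n^2) density pass, parallelized over particles with OpenMP.
// h2 is the squared smoothing radius, kernelCoeff the kernel's normalization
// constant (both supplied by the caller).
std::vector<float> computeDensities(const std::vector<Vec3>& pos, float h2,
                                    float mass, float kernelCoeff) {
    assert(h2 > 0.0f);
    const long n = static_cast<long>(pos.size());
    std::vector<float> density(pos.size(), 0.0f);
    // Each iteration writes only density[i], so threads never write the same
    // element and no locking is needed.
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < n; ++i) {
        float sum = 0.0f;  // declared inside the loop, so private to each iteration
        for (long j = 0; j < n; ++j) {
            if (j == i) continue;  // skip the particle itself
            const float dx = pos[i].x - pos[j].x;
            const float dy = pos[i].y - pos[j].y;
            const float dz = pos[i].z - pos[j].z;
            const float r2 = dx * dx + dy * dy + dz * dz;
            if (r2 < h2) {
                const float c = h2 - r2;
                sum += c * c * c;
            }
        }
        density[i] = sum * mass * kernelCoeff;
    }
    return density;
}
```

Compile with `-fopenmp` (GCC/Clang). Without that flag, the compiler ignores the pragma and the same code runs serially with identical results. The standard `OMP_NUM_THREADS` environment variable sets the thread count.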
Sunday, April 1, 2012
CPU implementation of SPH
I finally got the CPU implementation running, but there are two main problems.
The first is that it is too slow: there are 4k particles in this picture, and I estimate the FPS is below 1.0. I used several nested for loops, which I believe are the main reason for such a low speed. If I implement this on the GPU, I believe it will be much faster. Using OpenMP to parallelize those loops should also help somewhat.
The second is that the movement of the particles doesn't look like a fluid. The SPH method requires the fluid to be simulated at real-world scale, which means the values and units must be physically correct. I think I gave some parameters wrong values, which results in the strange behavior of the particles.
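Here is a minimal sketch of why units matter. It assumes SI units and the state equation p = k(ρ − ρ0) used by Müller et al. (2003). The function names and the stiffness `k` are hypothetical. If the particle mass, spacing, and smoothing radius are inconsistent, the computed density can end up far from the rest density ρ0. For water, ρ0 is about 1000 kg/m^3. The resulting pressures are then large and unphysical.

```cpp
#include <cassert>

// Illustrative SI-unit helpers (hypothetical names).

// Mass per particle when particles start on a cubic lattice with the given
// spacing (m): each particle then represents spacing^3 m^3 of fluid at rest
// density rho0 (kg/m^3).
float particleMass(float rho0, float spacing) {
    assert(rho0 > 0.0f && spacing > 0.0f);
    return rho0 * spacing * spacing * spacing;
}

// State equation p = k (rho - rho0): zero pressure at rest density, positive
// (repulsive) when compressed. k is a stiffness constant chosen by the user.
float pressure(float rho, float rho0, float k) {
    return k * (rho - rho0);
}
```

For example, water sampled at a 1 cm spacing gives a mass of about 0.001 kg per particle. Pairing that mass with a spacing or kernel radius in different units is exactly the kind of mismatch that breaks the fluid behavior.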
density, pressure, viscosity calculation method
In the SPH method, we basically calculate each particle's density and pressure. Then we use the gradient of the pressure and the Laplacian of the velocity (the viscosity term) to get the force on each particle. Once we have the force on each particle, we can get its acceleration and velocity, and finally use the velocity to update its position.
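The last stage described above goes from force to acceleration to velocity to position. It can be sketched as one semi-implicit Euler step. The sketch assumes, as in Müller et al. (2003), that the SPH force terms are force densities, so the acceleration is f/ρ. If forces are computed per particle, divide by the mass instead. `Vec3`, `integrate`, and the gravity term are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical vector type for this sketch.
struct Vec3 { float x, y, z; };

// One semi-implicit Euler step: a = f / rho + g, v += a * dt, x += v * dt.
// Assumes `force` holds SPH force densities (pressure + viscosity), so the
// acceleration is force / density; gravity is an assumed external acceleration.
void integrate(std::vector<Vec3>& pos, std::vector<Vec3>& vel,
               const std::vector<Vec3>& force, const std::vector<float>& density,
               const Vec3& gravity, float dt) {
    assert(pos.size() == vel.size() && pos.size() == force.size() &&
           pos.size() == density.size());
    for (std::size_t i = 0; i < pos.size(); ++i) {
        const float inv = 1.0f / density[i];
        const Vec3 a{force[i].x * inv + gravity.x,
                     force[i].y * inv + gravity.y,
                     force[i].z * inv + gravity.z};
        // Update velocity first, then move the particle with the new velocity.
        vel[i].x += a.x * dt;
        vel[i].y += a.y * dt;
        vel[i].z += a.z * dt;
        pos[i].x += vel[i].x * dt;
        pos[i].y += vel[i].y * dt;
        pos[i].z += vel[i].z * dt;
    }
}
```

Updating the velocity before the position is a small change from plain explicit Euler. It is usually more stable for particle systems at the same time step.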
After I studied the paper mentioned in the last post, I believed the CPU serial implementation of the SPH method should be quite straightforward.
Density, pressure, and viscosity can all be calculated with the SPH interpolation formula. Here A is the quantity being interpolated, m_j, ρ_j and r_j are the mass, density and position of particle j, and W is the smoothing kernel with radius h:
$$A_S(\mathbf{r}) = \sum_j m_j \frac{A_j}{\rho_j} W(\mathbf{r} - \mathbf{r}_j, h)$$
and with its gradient and Laplacian:
$$\nabla A_S(\mathbf{r}) = \sum_j m_j \frac{A_j}{\rho_j} \nabla W(\mathbf{r} - \mathbf{r}_j, h), \qquad \nabla^2 A_S(\mathbf{r}) = \sum_j m_j \frac{A_j}{\rho_j} \nabla^2 W(\mathbf{r} - \mathbf{r}_j, h)$$
Taking the gradient or Laplacian of this equation only affects the smoothing kernel function W(r - r_j, h).
To implement this equation, a naive method is to use two nested for loops.
For example, to calculate the density:
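// Naive O(n^2) density loop: for each particle p, accumulate the (h^2 - r^2)^3
// contribution of every other particle q within the smoothing radius.
dat1_end = particle.data + NUM_PARTICLES*particle.width;
for ( dat1 = particle.data; dat1 < dat1_end; dat1 += particle.width ) {
    p = dat1;
    sum = 0.0;
    dat2_end = particle.data + NUM_PARTICLES*particle.width;
    for ( dat2 = particle.data; dat2 < dat2_end; dat2 += particle.width ) {
        q = dat2;
        if ( p == q ) continue;          // skip the particle itself
        // offset between p and q, scaled by the framework's factor d
        dx = ( p->pos.x - q->pos.x ) * d;
        dy = ( p->pos.y - q->pos.y ) * d;
        dz = ( p->pos.z - q->pos.z ) * d;
        d2 = dx*dx + dy*dy + dz*dz;      // squared distance r^2
        if ( sRadius2 > d2 ) {           // q is inside the smoothing radius (sRadius2 = h^2)
            c = sRadius2 - d2;           // h^2 - r^2
            sum += c * c * c;
        }
    }
    p->density = sum * PARTICLE_MASS * kernFun1;  // kernFun1: the kernel's normalization constant
}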
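Setting A = ρ in the interpolation formula A_S(r) = Σ_j m_j (A_j/ρ_j) W(r − r_j, h) cancels the A_j/ρ_j factor, which explains the loop's structure. The worked form below assumes the poly6 kernel of Müller et al. (2003). The post does not name the kernel, but the (h² − r²)³ term in the loop matches it.

```latex
\rho_i = \sum_{j} m_j \, W_{\mathrm{poly6}}(\mathbf{r}_i - \mathbf{r}_j, h),
\qquad
W_{\mathrm{poly6}}(\mathbf{r}, h) =
\begin{cases}
\dfrac{315}{64\pi h^9}\,\bigl(h^2 - \lVert\mathbf{r}\rVert^2\bigr)^3, & \lVert\mathbf{r}\rVert \le h,\\[4pt]
0, & \text{otherwise.}
\end{cases}

% With equal masses m_j = m, only pairs closer than h contribute:
\rho_i = m \cdot \frac{315}{64\pi h^9}
\sum_{j:\ \lVert\mathbf{r}_i - \mathbf{r}_j\rVert < h}
\bigl(h^2 - \lVert\mathbf{r}_i - \mathbf{r}_j\rVert^2\bigr)^3
```

Read this way, `sRadius2` plays the role of h² and `c` is h² − r². `sum` accumulates the last sum, `PARTICLE_MASS` is m, and `kernFun1` presumably holds 315/(64πh⁹). One difference: the `p == q` check drops the self term j = i. The standard formula includes that term as m·W(0, h).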
SPH method study
SPH is a particle simulation method that can be used to simulate fluids. We assume that a fluid consists of a large number of particles, and every particle interacts with its neighbors. In fact, a particle does not need to interact with every other particle: we can define a radius and let each particle interact only with the particles within this radius.
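The "only interact within this radius" idea is built into the smoothing kernels themselves, because they are zero beyond the radius h. Below is a minimal C++ sketch of the three kernels proposed by Müller et al. (2003), written as functions of the distance r. They are poly6 for density, the spiky kernel's gradient for pressure, and the viscosity kernel's Laplacian for viscosity. The function names are hypothetical, and the thesis linked below may use different kernels.

```cpp
#include <cassert>

// Smoothing kernels from Mueller et al. (2003), written as functions of the
// distance r >= 0 between two particles. Each one is zero for r >= h, so
// particles farther apart than the smoothing radius never interact.
const float kPi = 3.14159265358979f;

// poly6 kernel W(r, h): used for the density sum.
float poly6(float r, float h) {
    assert(h > 0.0f);
    if (r >= h) return 0.0f;
    const float h2 = h * h;
    const float h9 = h2 * h2 * h2 * h2 * h;
    const float c = h2 - r * r;
    return 315.0f / (64.0f * kPi * h9) * c * c * c;
}

// Radial derivative dW/dr of the spiky kernel: used for the pressure force.
// The full gradient is dW/dr times the unit vector from particle j to particle i
// (undefined at r = 0, so callers skip the particle itself).
float spikyGradRadial(float r, float h) {
    assert(h > 0.0f);
    if (r >= h) return 0.0f;
    const float h6 = h * h * h * h * h * h;
    const float c = h - r;
    return -45.0f / (kPi * h6) * c * c;
}

// Laplacian of the viscosity kernel: used for the viscosity force.
float viscosityLaplacian(float r, float h) {
    assert(h > 0.0f);
    if (r >= h) return 0.0f;
    const float h6 = h * h * h * h * h * h;
    return 45.0f / (kPi * h6) * (h - r);
}
```

Because every kernel returns zero past h, skipping far-away particles (for example with a neighbor grid) changes only the cost, not the result.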
I am currently studying this paper, http://www8.cs.umu.se/education/examina/Rapporter/MarcusVesterlund.pdf, and working on a naive CPU implementation of SPH.