Sunday, April 8, 2012

GPU implementation done

I worked out the GPU implementation this week, and it is much faster than the CPU one. Here is the comparison:
(CPU on the left, GPU on the right, both with 2k particles)



The problem is that I am using a straightforward implementation whose complexity is O(n^2): every particle is checked against every other particle. Doubling the number of particles from 2k to 4k therefore roughly quadruples the work, and at 4k the GPU implementation looks slow, though it is still much faster than the CPU one.
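For reference, here is a minimal CPU sketch of what an all-pairs density pass can look like. It is not my actual code: the `Particle` struct, the function names, and the choice of the poly6 smoothing kernel are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical particle layout, for illustration only.
struct Particle {
    float x, y, z;   // position
    float density;   // written by computeDensities
};

// Poly6 smoothing kernel (an assumed choice), evaluated from the squared
// distance r2 and the smoothing radius h. It is zero outside the radius.
inline float poly6(float r2, float h) {
    const float kPi = 3.14159265358979f;
    const float h2 = h * h;
    if (r2 > h2) return 0.0f;
    const float diff = h2 - r2;
    const float h9 = h2 * h2 * h2 * h2 * h;
    return 315.0f / (64.0f * kPi * h9) * diff * diff * diff;
}

// Brute-force density pass: each of the n particles loops over all n
// particles (itself included), so the cost is n * n kernel evaluations.
void computeDensities(std::vector<Particle>& ps, float mass, float h) {
    assert(h > 0.0f);
    const std::size_t n = ps.size();
    for (std::size_t i = 0; i < n; ++i) {
        float rho = 0.0f;
        for (std::size_t j = 0; j < n; ++j) {
            const float dx = ps[i].x - ps[j].x;
            const float dy = ps[i].y - ps[j].y;
            const float dz = ps[i].z - ps[j].z;
            rho += mass * poly6(dx * dx + dy * dy + dz * dz, h);
        }
        ps[i].density = rho;
    }
}

// Number of kernel evaluations the pass above performs for n particles.
inline std::size_t pairEvaluations(std::size_t n) { return n * n; }
```

With this loop structure, 2k particles mean 4 million kernel evaluations per pass and 4k particles mean 16 million, which matches the slowdown I see when doubling the particle count.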

Next I will try to figure out how to implement SPH using another method, the one presented in "Simulation and Rendering of a Viscous Fluid using Smoothed Particle Hydrodynamics".

I may also use OpenMP to optimize the CPU implementation and then compare these three implementations.
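As a rough idea of how OpenMP could fit in, here is a minimal sketch that parallelizes an all-pairs neighbor count over the outer loop and times it. The names `Vec3`, `countNeighbors`, and `millisecondsFor` are hypothetical, and the neighbor count stands in for the real SPH passes. Build with OpenMP enabled (for example `-fopenmp` with GCC or Clang); without it the pragma is ignored and the loop runs serially.

```cpp
#include <cassert>
#include <chrono>
#include <vector>

// Hypothetical position type, for illustration only.
struct Vec3 { float x, y, z; };

// Counts, for every particle, how many other particles lie within radius h.
// Still O(n^2), but the outer loop is split across the CPU cores.
// Iteration i only writes counts[i], so the threads never write to the
// same element.
std::vector<int> countNeighbors(const std::vector<Vec3>& pos, float h) {
    assert(h > 0.0f);
    const int n = static_cast<int>(pos.size());
    std::vector<int> counts(pos.size(), 0);
    const float h2 = h * h;

    #pragma omp parallel for schedule(static)
    for (int i = 0; i < n; ++i) {
        int c = 0;
        for (int j = 0; j < n; ++j) {
            if (j == i) continue;  // a particle is not its own neighbor
            const float dx = pos[i].x - pos[j].x;
            const float dy = pos[i].y - pos[j].y;
            const float dz = pos[i].z - pos[j].z;
            if (dx * dx + dy * dy + dz * dz <= h2) ++c;
        }
        counts[i] = c;
    }
    return counts;
}

// Wall-clock time of a callable in milliseconds, for comparing variants.
template <typename F>
double millisecondsFor(F&& work) {
    const auto t0 = std::chrono::steady_clock::now();
    work();
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}
```

Timing the same call with and without OpenMP enabled, via something like `millisecondsFor([&] { countNeighbors(pos, h); })`, would give one of the data points for the comparison.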

2 comments:

  1. Good progress. Do you have hard numbers for this? What is the performance improvement? 10x? 100x? How does it scale with the number of particles?

    Replies
    1. The performance improvement is only 3x.
      I have optimized the CPU implementation using OpenMP.
      I have a quad-core CPU, and the speedup from OpenMP is almost 4x, so it outperforms the GPU implementation. I think I need to optimize the GPU implementation further; it should run faster than the CPU.
