Collision Detection Design - PowerPoint Presentation

About This Presentation


Slides: 56

Provided by: Bee83

Learn more at: https://sbel.wisc.edu


Transcript and Presenter's Notes


1
Collision Detection Design
Final Project Topic

Brandon Smith
November 5, 2008
ME 964

2
contact_data Allocation

Possible ways to allocate the contact_data array
Allocate contact_data with $N(N-1)/2$ entries (one per possible pair)
Allocate contact_data with n_contacts entries (one per actual contact)
To avoid creating a huge array, I chose the
second method
1st Kernel Call
Find the number of contacts.
2nd Kernel Call
Calculate the contact_data for each contact.
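The two-kernel scheme above can be sketched serially in Python, with each loop standing in for one kernel launch over all $N(N-1)/2$ pair tests. This is a minimal sketch, not the author's code: it assumes spherical bodies and stores only the two body indices per contact, because the fields of contact_data are not listed here; `in_contact`, `all_pairs`, and `find_contacts` are hypothetical names.

```python
import math

def in_contact(c1, r1, c2, r2):
    """Sphere-sphere test (assumed body type): centers closer than r1 + r2."""
    return math.dist(c1, c2) < r1 + r2

def all_pairs(n):
    """Yield the N(N-1)/2 pair tests (i < j) without storing them."""
    for i in range(n):
        for j in range(i + 1, n):
            yield i, j

def find_contacts(centers, radii):
    n = len(centers)

    # 1st kernel call: only count the contacts; nothing is stored.
    n_contacts = sum(1 for i, j in all_pairs(n)
                     if in_contact(centers[i], radii[i], centers[j], radii[j]))

    # Allocate exactly n_contacts entries instead of N(N-1)/2.
    contact_data = [None] * n_contacts

    # 2nd kernel call: repeat the tests and fill in contact_data.
    slot = 0
    for i, j in all_pairs(n):
        if in_contact(centers[i], radii[i], centers[j], radii[j]):
            contact_data[slot] = (i, j)   # hypothetical entry: body indices only
            slot += 1
    return contact_data
```

The price of the smaller array is running every contact test twice.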

3
Kernel Call Setup

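The total number of contact tests is
$n_\mathrm{tests} = N(N-1)/2$
The total number of concurrent threads is
$n_\mathrm{concurrent\_threads} = N_\mathrm{SMs} \times \mathrm{BLOCKS\_PER\_SM} \times \mathrm{THREADS\_PER\_BLOCK}$
Each thread will perform several tests
$n_\mathrm{tests\_per\_thread} = \lfloor n_\mathrm{tests} / n_\mathrm{concurrent\_threads} \rfloor + 1$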
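The setup arithmetic above can be checked with a short Python sketch. The hardware numbers in the usage example (16 SMs, 8 blocks per SM, 256 threads per block) are hypothetical values chosen only for illustration, and `kernel_setup` is an illustrative name.

```python
def kernel_setup(n_bodies, n_sms, blocks_per_sm, threads_per_block):
    """Return (n_tests, n_concurrent_threads, n_tests_per_thread)."""
    n_tests = n_bodies * (n_bodies - 1) // 2
    n_concurrent_threads = n_sms * blocks_per_sm * threads_per_block
    # Integer division rounds down, so one extra test per thread
    # covers the remainder.
    n_tests_per_thread = n_tests // n_concurrent_threads + 1
    return n_tests, n_concurrent_threads, n_tests_per_thread

# Hypothetical example: 1000 bodies on 16 SMs x 8 blocks x 256 threads.
print(kernel_setup(1000, 16, 8, 256))
```

The `+ 1` guarantees that `n_concurrent_threads * n_tests_per_thread` covers every test even when the division leaves a remainder; when it divides evenly, the last threads simply get no work.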

4
Collide Kernel Indexing

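Given the block number $b_x$ and thread number $t_x$, a range
of test numbers $(k_\mathrm{i}, k_\mathrm{f})$ is generated
$\mathrm{thread\_id} = b_x \cdot \mathrm{THREADS\_PER\_BLOCK} + t_x$
$k_\mathrm{i} = n_\mathrm{tests\_per\_thread} \cdot \mathrm{thread\_id} + 1$
$k_\mathrm{f} = k_\mathrm{i} + n_\mathrm{tests\_per\_thread} - 1$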

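Given a test number k, the body indices (i, j), with $1 \le i < j$, satisfy
$k = \frac{(j-1)^2-(j-1)}{2} + i$
so j is the unique integer with
$\frac{(j-1)^2-(j-1)}{2} < k \le \frac{j^2-j}{2}$
and i then follows from the first equation.

Test number k for each body pair (i, j):

Body     j=2  j=3  j=4  j=5
i=1        1    2    4    7
i=2             3    5    8
i=3                  6    9
i=4                      10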
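The inverse mapping from k to (i, j) can be computed without a search by solving the quadratic bound on j. This Python sketch uses an integer square root to avoid floating-point rounding. A GPU version could use a float square root followed by a one-step correction, but that is an assumption, not something the slide specifies. `pair_from_test` is a hypothetical name.

```python
from math import isqrt

def pair_from_test(k):
    """Invert k = ((j-1)**2 - (j-1))/2 + i for 1 <= i < j (all 1-based)."""
    # Find the smallest m = j - 1 with m(m+1)/2 >= k, which is the
    # bound ((j-1)^2 - (j-1))/2 < k <= (j^2 - j)/2.
    m = (isqrt(8 * k + 1) - 1) // 2
    if m * (m + 1) // 2 < k:
        m += 1
    j = m + 1
    i = k - ((j - 1) ** 2 - (j - 1)) // 2
    return i, j

print([pair_from_test(k) for k in range(1, 11)])
```

For five bodies, k = 1..10 reproduces the pairs in the slide's table.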
5
Collide Kernel Contact Testing

The __global__ function calls a __device__ function to
actually perform the contact test
In the first pass it simply tests for contact
In the second pass it calculates contact_data
atomicAdd is used to count the number of contacts
Keeps one contact tally for all concurrent threads
No need to condense the results from each
thread
Hassle to compile
nvcc.exe -ccbin "C:\Program Files\Microsoft Visual Studio 8\VC\bin" -c -arch sm_11 ^
  -D_CONSOLE -Xcompiler "/EHsc /W3 /nologo /Wp64 /O2 /Zi /MT" ^
  -I"C:\CUDA\include" ^
  -I"C:\Program Files\NVIDIA Corporation\NVIDIA CUDA SDK\common\inc" ^
  -o Release\collide.obj collide.cu
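A Python stand-in for the shared contact tally. A lock makes each increment indivisible, as atomicAdd does on the GPU, and `add` returns the value before the increment, which matches atomicAdd's return convention. Python threads only emulate the concurrency here. `AtomicCounter` and `count_contacts` are hypothetical names.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

class AtomicCounter:
    """Stand-in for one global counter updated with atomicAdd."""
    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def add(self, amount):
        # Like atomicAdd: indivisible update, returns the *old* value.
        with self._lock:
            old = self._value
            self._value += amount
            return old

    @property
    def value(self):
        return self._value

def count_contacts(test_results, n_workers=4):
    """Workers scan their share of test results and bump one shared tally."""
    counter = AtomicCounter()

    def worker(chunk):
        for hit in chunk:
            if hit:
                counter.add(1)

    chunks = [test_results[w::n_workers] for w in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        list(pool.map(worker, chunks))   # list() surfaces worker exceptions
    return counter.value
```

Because every worker writes into the same tally, no per-thread partial counts need to be combined afterwards.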

6
Final Project: Monte Carlo Radiation Transport

Objective
Compute radiation flux or derived quantities over
a spatial/temporal domain.
Method
Follow the life of individual particles through
the domain.

Quality of Results
Statistical error is proportional to
$1/\sqrt{n_\mathrm{particles}}$
Difficult to get even particle distribution
across the domain
Many particles are required to achieve low
statistical error
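A minimal Monte Carlo sketch of the error scaling. It assumes the simplest possible problem, not the existing Fortran code's physics: a mono-directional beam entering a purely absorbing slab, where the exact uncollided transmission is $e^{-\Sigma_t T}$. `slab_transmission` is a hypothetical name.

```python
import math
import random

def slab_transmission(sigma_t, thickness, n_particles, seed=1):
    """Estimate uncollided transmission through a purely absorbing slab.

    Each particle samples a distance to its first collision,
    d = -ln(1 - U) / sigma_t, and is transmitted if d > thickness.
    Returns (estimate, standard_error).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_particles):
        d = -math.log(1.0 - rng.random()) / sigma_t
        if d > thickness:
            hits += 1
    p = hits / n_particles
    std_err = math.sqrt(p * (1.0 - p) / n_particles)  # shrinks like 1/sqrt(n)
    return p, std_err

print(slab_transmission(1.0, 1.0, 100_000), math.exp(-1.0))
```

The standard error falls as $1/\sqrt{n}$, so 100 times as many particles buy roughly a tenfold reduction in error.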

7
Example: Fusion Reactor Shielding

The GPU Advantage
Increase the number of simulated particles
Decrease statistical error

8
Tasks during a Particle's Life

Birth: particles are created at a source
Ray-cast: the distance to the next surface is
calculated
Collision: the particle interacts with matter
Next volume: the particle crosses a boundary into
another material
Death: if the particle is absorbed, it is killed.
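The life cycle above can be sketched as a loop over events. This is a toy 1-D rod model in which particles move only in the +x or -x direction. It does not use the 3-D box/sphere geometry of the existing code. The region layout, the `p_absorb` collision model, and the extra "leaked" fate for particles that leave the domain are all assumptions made for the illustration.

```python
import math
import random

def track_particle(boundaries, sigma_t, p_absorb, rng):
    """Follow one particle in a 1-D rod until it dies; return its fate.

    boundaries: sorted x positions [x0, ..., xn] delimiting n regions.
    sigma_t[r], p_absorb[r]: total cross section and absorption
    probability per collision in region r (hypothetical material model).
    """
    # Birth: source at the left face, moving in +x.
    x, direction, region = boundaries[0], +1, 0
    while True:
        # Ray-cast: distance to the next surface along the flight direction.
        edge = boundaries[region + 1] if direction > 0 else boundaries[region]
        to_surface = abs(edge - x)
        s = sigma_t[region]
        to_collision = -math.log(1.0 - rng.random()) / s if s > 0 else math.inf
        if to_collision < to_surface:
            # Collision: move to the interaction site and interact with matter.
            x += direction * to_collision
            if rng.random() < p_absorb[region]:
                return "absorbed"               # Death by absorption
            direction = rng.choice((-1, +1))    # otherwise scatter (1-D)
        else:
            # Next volume: cross the surface into the neighboring region.
            x = edge
            region += direction
            if region < 0 or region >= len(sigma_t):
                return "leaked"                 # left the domain (assumed fate)
```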

9
Existing Fortran Code

Geometry
3-D geometry supporting boxes and spheres
Physics
Only neutral particles (neutrons, photons)
No energy dependence
No time dependence
Materials
Simple materials (only a few isotopes)
Sources
point, line, area, volume
Results
mesh tallies and volume tallies

10
Potential for Parallelism

Usually we can assume each particle is
independent, except in cases such as
criticality calculations, weight windows, etc.
Each thread could calculate independent particle
trajectories
embarrassingly parallel
When enough particles are simulated, condense the
results from each thread
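A sketch of the embarrassingly parallel structure. It assumes the same simple slab problem as a stand-in tally. Each worker owns its own generator and counter, and the per-worker tallies are condensed only at the end. Python threads show the structure, not a speedup. Seeding workers with `base_seed + worker_id` gives distinct streams but does not by itself prove that they never overlap, which is the stride concern raised in the implementation challenges. All names are hypothetical.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor

def run_batch(worker_id, n_particles, sigma_t, thickness, base_seed=12345):
    """One worker's independent histories with its own generator and tally."""
    rng = random.Random(base_seed + worker_id)   # distinct seed per worker
    transmitted = 0
    for _ in range(n_particles):
        if -math.log(1.0 - rng.random()) / sigma_t > thickness:
            transmitted += 1
    return transmitted

def parallel_estimate(n_workers, particles_per_worker, sigma_t=1.0, thickness=1.0):
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        tallies = list(pool.map(
            lambda w: run_batch(w, particles_per_worker, sigma_t, thickness),
            range(n_workers)))
    # Condense: combine the per-worker tallies once all histories are done.
    return sum(tallies) / (n_workers * particles_per_worker)
```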

11
Implementation Challenges

Current code is in Fortran 90
1700 lines
Has anyone tried F2C?
F2C is designed for Fortran 77, not Fortran 90
Particles are tracked on a large mesh
1 M mesh elements, accessed once per particle
Mesh will need to be in global memory
Mesh will be accessed with an atomic function for
data sharing?
Ensure that random numbers are not repeated
Use a pseudo-random number generator for each
thread
Each thread will need a different random seed
Check to ensure sufficiently large stride
Could schedule rendezvous to check for solution
convergence
Stop the simulation once the statistical error falls
below a set value (e.g., 5%)
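The convergence-driven stopping rule can be sketched as batches with a check between them, where the check plays the role of the rendezvous. This assumes the relative standard error of a simple transmission tally as the error measure and 5% as the target. `run_until_converged` and its batch size are hypothetical.

```python
import math
import random

def run_until_converged(sigma_t, thickness, rel_tol=0.05, batch=1000,
                        max_particles=10_000_000, seed=7):
    """Simulate batches until the relative standard error < rel_tol.

    Returns (estimate, relative_error, n_particles).
    """
    rng = random.Random(seed)
    hits = n = 0
    p, rel_err = 0.0, math.inf
    while n < max_particles:
        for _ in range(batch):
            if -math.log(1.0 - rng.random()) / sigma_t > thickness:
                hits += 1
        n += batch
        # Rendezvous: pool the results so far and test for convergence.
        p = hits / n
        if hits > 0:
            rel_err = math.sqrt(p * (1.0 - p) / n) / p
        if rel_err < rel_tol:
            break
    return p, rel_err, n
```

Rarer events need far more histories. A thick slab with transmission near $e^{-5}$ needs tens of thousands of particles to reach 5%, while $e^{-1}$ converges within the first batch.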

12
ME 964 Project Proposal
Vikalp Mishra
13
Collision Detection

Aim
Solve collision detection problem given N rigid
spheres in 3D space
Approach
Brute Force
Compare each sphere with every other sphere
$O(N^2)$
If the distance between centers is
more than the sum of radii: no collision
less than the sum of radii: collision
When a collision is detected
compute the normal and the object IDs
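A direct Python sketch of the brute-force test above. It returns the two object IDs and a unit normal pointing from the center of the first sphere toward the second; that direction convention is an assumption. The slide does not say how exact touching (distance equal to the sum of radii) is treated, and this sketch reports it as no collision.

```python
import math

def brute_force_collisions(centers, radii):
    """Test every sphere against every other one: O(N^2) pair tests.

    Returns (id_a, id_b, normal) for each colliding pair, where normal is
    the unit vector from the center of sphere a to the center of sphere b.
    """
    hits = []
    n = len(centers)
    for a in range(n):
        for b in range(a + 1, n):
            delta = [cb - ca for ca, cb in zip(centers[a], centers[b])]
            dist = math.sqrt(sum(d * d for d in delta))
            # Distance not less than the sum of radii -> no collision.
            if dist < radii[a] + radii[b]:
                # Coincident centers leave the normal undefined: use zeros.
                if dist > 0:
                    normal = tuple(d / dist for d in delta)
                else:
                    normal = (0.0, 0.0, 0.0)
                hits.append((a, b, normal))
    return hits
```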

14
Final Project: Bone FEA

Title
GPU based Finite Element Analysis of Femur
Femur
Thigh bone: the bone between the hip and knee joints
Longest and strongest bone in the body

15
Why study femur ?

To better understand bone mechanics and properties
Across species
To understand the impact and extent of injury under
various loadings
Use in sports medicine and surgery
To study the impact of DNA changes on bone formation/growth
Improve the process of cloning to develop better
species
To study the effect of the nutrition cycle on bone
development

16
Background

In the past
Experiments were done to study bone behavior /
material properties
Tests performed
Fracture test
Bending test
Torsion test
Experiments on mice / pigs
Costly and time-consuming
Only one experiment per sample possible
Alternative
Capture bone geometry and material properties
Use computational tools for various analyses
Saves time and money

17
Typical approach

Given
CT scan data of bone (geometry)
Material property distribution
Loading scheme
3- or 4-point loading / torsion test / bending test

18
Use of FEA

Use Finite Element Method
To capture geometry
Physical properties
Hexahedral elements
Tetrahedral elements
Formulate FE problem
Use boundary conditions to define the element-level
stiffness matrix ($K_e$)
load vector ($F_e$)
Assemble elements into the global matrix ($K_g$, $F_g$)
Solve FE problem
Obtain deflection: $u = K_g^{-1} F_g$
Compare with experimental results
Verify model
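The FE workflow above can be illustrated on the smallest possible model: an axial bar of linear two-node elements, fixed at one end with a point load at the other. This is an assumed 1-D stand-in for the 3-D hexahedral/tetrahedral bone model. It solves $K_g u = F_g$ by Gaussian elimination rather than forming $K_g^{-1}$ explicitly, which is the usual practice. Every element load vector $F_e$ is zero here because the only load is the tip force.

```python
def solve_linear(a, b):
    """Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def bar_deflection(n_elems, length, E, A, tip_load):
    """Axial bar fixed at x = 0 with a point load at the free end."""
    k = E * A / (length / n_elems)
    ke = [[k, -k], [-k, k]]                  # element stiffness matrix Ke
    n_nodes = n_elems + 1
    Kg = [[0.0] * n_nodes for _ in range(n_nodes)]
    Fg = [0.0] * n_nodes
    for e in range(n_elems):                 # assemble each Ke into Kg
        nodes = (e, e + 1)
        for a in range(2):
            for b in range(2):
                Kg[nodes[a]][nodes[b]] += ke[a][b]
    Fg[-1] = tip_load                        # only load: force at the tip
    # Boundary condition u(0) = 0: drop the first row and column.
    K = [row[1:] for row in Kg[1:]]
    return [0.0] + solve_linear(K, Fg[1:])   # nodal displacements u
```

For a 2 m bar with E = 200 GPa, A = 1e-4 m^2 and a 1 kN tip load, the tip moves PL/(EA) = 0.1 mm, and linear elements reproduce that value exactly at the nodes.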

19
Bottleneck

Bone geometry is complex
Large number of elements required
For a pig bone: 0.5 to 1 million elements (coarse
mesh)

20
GPU based approach

Potential for GPU based computation
Same set of computation for each element
Stiffness matrix computation ($K_e$)
Load vector computation ($F_e$)
Different data sets for each element
SIMD
Approach
Use GPU for element level computation
Accounts for about 67% of total time
Use CPU for global matrix inversion
Compare results with MATLAB based model

21
ME 964 Midterm and Final Projects

Saigopal Nelaturi

22
CUDA Collision detection

Problem: given n spheres in 3D space, compute
all pairwise collisions
Approach: brute-force algorithm with quadratic
complexity
Idea: every pair of spheres can be tested
independently, and in parallel

23
Task Parallelism pseudo code
24
Final Project

Constructive operators in SE(3)
SE(3) is the group of 4x4 rigid transformation
matrices
Point in SE(3): a matrix
Set in SE(3): a set of matrices
Can devise operators using Boolean algebra and
matrix multiplication (the group operation)

25
Example
How to compute a workspace?
Position and orientation of a coordinate frame on the coupler
Use a set formulation in SE(3)
Intersection of sets
Embarrassingly parallel process!
Many other applications in design / geometric modeling /
motion planning
26
Goals

For very large sets of 4x4 transformation
matrices, implement
Intersection: pairwise comparison between
matrices
Convolution: pairwise multiplication between
matrices
Show some workspace computations (hopefully in
3D)
If possible, implement
Deconvolution: a combination of pairwise
intersection/multiplication
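A Python sketch of the two core operations on sets of 4x4 rigid transforms, using plain tuples. Treating two matrices as equal when every entry agrees within a tolerance is an assumption, because floating-point transforms rarely match exactly. `rot_z`, `convolution`, and `intersection` are hypothetical names. Both operations run over all |A| x |B| combinations, and that is where the embarrassing parallelism comes from.

```python
import math
from itertools import product

def matmul4(a, b):
    """Product of two 4x4 matrices (the SE(3) group operation)."""
    return tuple(tuple(sum(a[r][k] * b[k][c] for k in range(4)) for c in range(4))
                 for r in range(4))

def rot_z(theta, tx=0.0, ty=0.0, tz=0.0):
    """Rigid transform: rotation by theta about z plus a translation."""
    c, s = math.cos(theta), math.sin(theta)
    return ((c, -s, 0.0, tx), (s, c, 0.0, ty),
            (0.0, 0.0, 1.0, tz), (0.0, 0.0, 0.0, 1.0))

def close(a, b, tol=1e-9):
    """Entry-wise comparison of two 4x4 matrices within a tolerance."""
    return all(abs(x - y) <= tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def convolution(set_a, set_b):
    """Pairwise products {a b : a in A, b in B}."""
    return [matmul4(a, b) for a, b in product(set_a, set_b)]

def intersection(set_a, set_b, tol=1e-9):
    """Members of A that match some member of B (pairwise comparison)."""
    return [a for a in set_a if any(close(a, b, tol) for b in set_b)]
```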

27
Midterm Project

Ram Subramanian

28
The Task

To solve a collision detection problem: given an
arbitrary number of rigid spheres with known
radii, distributed in 3D space, find out
which spheres are in contact/penetration with
which other spheres.

29
The Algorithm

One pass over the array to determine collisions.
One pass over all the collided bodies to compute
the required collision values.
Two kernel calls.
$O(n(n-1)/2)$ pair tests, i.e. $O(n^2)$

30
Indexing

Every Thread gets a Reference body (Body A) and a
Comparison body (Body B).
Each block has 512 threads (assumption 1).
Each row in a grid has 512 blocks (assumption 2).
Total number of threads is $n(n-1)/2$.
Compute the index value from the thread ID and
block ID.
Using this index value and the number of bodies
(with div and mod), the indices of Body A
and Body B, respectively, can be determined.
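A Python sketch of the launch arithmetic under the two assumptions above (512 threads per block, 512 blocks per grid row). Because the grid is rounded up to whole rows, some threads get an index past $n(n-1)/2$ and would have to exit without testing. The div/mod mapping from the index to Body A and Body B is not spelled out on the slide, so it is left out here. Function names are hypothetical.

```python
THREADS_PER_BLOCK = 512   # assumption 1
BLOCKS_PER_ROW = 512      # assumption 2

def launch_config(n_bodies):
    """Threads, blocks and grid rows for one thread per pair test."""
    n_threads = n_bodies * (n_bodies - 1) // 2
    n_blocks = -(-n_threads // THREADS_PER_BLOCK)   # ceiling division
    n_rows = -(-n_blocks // BLOCKS_PER_ROW)
    return n_threads, n_blocks, n_rows

def linear_index(block_x, block_y, thread_x):
    """Flatten (blockIdx.y, blockIdx.x, threadIdx.x) into one index.

    Threads whose index is >= n_threads must return without testing.
    """
    return (block_y * BLOCKS_PER_ROW + block_x) * THREADS_PER_BLOCK + thread_x

print(launch_config(1000))   # 1000 bodies need 976 blocks in 2 grid rows
```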

31
Final Project - Image Processing on the GPU

Goal: Implement image processing algorithms for
the GPU. Eventually have an image processing
library for GPUs using CUDA
Motivation: Most image processing tasks involve
operating on individual pixels or a region of the
image. Many of these tasks are embarrassingly
parallel.

32
Proposed Implementations

Harris Corner Detector

Motivation: This algorithm is used in the
first processing stage of
many other image processing
and computer vision algorithms
(e.g. 3D reconstruction, scene stitching,
object tracking,
visual servoing, etc.)
Ambitious Goal
Implement an image stitching algorithm or 3D
reconstruction algorithm that will stitch two
images together using the Harris Corner detector.
33
Harris Corner Detector

At every pixel in the image, place a window W
(the larger the better, e.g. 5x5)
Assume either a 4- or 8-neighborhood of the current
pixel position
Slide the window to each neighboring pixel,
giving W1, W2, ..., Wi (where i = 4 or 8)

34
Harris Corner Detector Contd..

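Compute the sum of squared differences (SSD)
between W and each Wi
A corner is detected when all SSD values are
above a threshold set by the user (equivalently,
when the smallest value is above the threshold).
A small SSD for any shift means a flat region or an edge.
This shifted-window SSD test is Moravec's corner operator,
which the Harris detector refines using image gradients.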
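A Python sketch of the shifted-window SSD test. It uses the criterion that a corner needs a large SSD for every shift, since a small SSD in any direction indicates a flat region or an edge. This is Moravec's shifted-window form rather than the gradient-based Harris response. The window half-width, the 8-neighborhood, and the function names are assumptions.

```python
def ssd(img, r, c, dr, dc, half):
    """SSD between the window W centered at (r, c) and W shifted by (dr, dc)."""
    total = 0
    for y in range(r - half, r + half + 1):
        for x in range(c - half, c + half + 1):
            diff = img[y + dr][x + dc] - img[y][x]
            total += diff * diff
    return total

SHIFTS_8 = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
            (0, 1), (1, -1), (1, 0), (1, 1)]

def corner_response(img, r, c, half=2, shifts=SHIFTS_8):
    """Smallest SSD over all neighbor shifts (Moravec-style response)."""
    return min(ssd(img, r, c, dr, dc, half) for dr, dc in shifts)

def detect_corners(img, threshold, half=2):
    """Pixels whose response exceeds the threshold."""
    h, w = len(img), len(img[0])
    m = half + 1   # margin so that shifted windows stay inside the image
    return [(r, c) for r in range(m, h - m) for c in range(m, w - m)
            if corner_response(img, r, c, half) > threshold]
```

Each pixel's response depends only on its own neighborhood, which is what makes the detector a good fit for one GPU thread per pixel.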

35
Midterm and Final Projects

Toby Heyn
ME 964
11/06/08

36
Midterm Project

Spatial Subdivision
Partition space into uniform grid (cells)
For each object, determine which cells the object
overlaps
Objects can only collide if they occupy the same
cell or adjacent cells
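A Python sketch of the cell-overlap step, assuming spheres and a uniform cubic grid. Each sphere is registered in every cell that its axis-aligned bounding box touches. This is conservative, because it may include a corner cell the sphere itself misses. With every overlapped cell registered, two spheres that actually overlap always share at least one cell, so a shared-cell check covers the adjacent-cell case. `overlapped_cells` and `may_collide` are hypothetical names.

```python
import math
from itertools import product

def overlapped_cells(center, radius, cell_size):
    """Integer (ix, iy, iz) cells touched by the sphere's bounding box."""
    lo = [math.floor((c - radius) / cell_size) for c in center]
    hi = [math.floor((c + radius) / cell_size) for c in center]
    return set(product(*(range(l, h + 1) for l, h in zip(lo, hi))))

def may_collide(s1, s2, cell_size):
    """Broad-phase filter: s = (center, radius); only shared cells matter."""
    return bool(overlapped_cells(*s1, cell_size) & overlapped_cells(*s2, cell_size))
```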

37
Midterm Project

Construct Cell ID Array
Each thread determines the cell IDs of the cells
its sphere occupies and loads them into the Cell ID Array
Sort Cell ID Array
Radix Sort Algorithm
Create Collision Cell List
Scan the sorted Cell ID Array, looking for changes in
cell ID
Write the Collision Cell List with Cell ID Array
indices and the number of objects in each cell
Traverse Collision Cell List
One thread per Collision Cell
Each thread checks all collision pairs in the
Collision Cell
Collisions are written to output
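A serial Python sketch of the four steps, assuming the Cell ID Array is a list of (cell_id, object_id) entries. Python's stable `sorted` stands in for the radix sort. The loop that closes a run at each change of cell ID stands in for the scan, and each iteration over a collision cell stands in for one thread. Deduplicating pairs that share several cells with a set is an assumption; the slide does not say how such repeats are handled.

```python
from itertools import combinations

def build_collision_cells(cell_id_array):
    """Sort the (cell_id, object_id) entries and build the Collision Cell
    List as (start, count) for every cell holding at least two objects."""
    entries = sorted(cell_id_array, key=lambda e: e[0])   # stable by cell ID
    cells = []
    start = 0
    for k in range(1, len(entries) + 1):
        # A change in cell ID (or the end of the array) closes a run.
        if k == len(entries) or entries[k][0] != entries[start][0]:
            if k - start >= 2:
                cells.append((start, k - start))
            start = k
    return entries, cells

def traverse(entries, cells, collide):
    """One 'thread' per collision cell: test every pair inside the cell."""
    found = set()   # a pair can share several cells; keep it only once
    for start, count in cells:
        members = [obj for _, obj in entries[start:start + count]]
        for a, b in combinations(members, 2):
            if collide(a, b):
                found.add((min(a, b), max(a, b)))
    return sorted(found)
```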

38
Midterm Project

Radix Sort
Sorts cell IDs in several passes
Sorts low-order bits before higher-order bits,
retaining the order of entries with the same cell ID (a stable sort)
This helps in a later step
Takes 4 passes (8 bits per pass) to sort the 32-bit (4-byte)
integers
Makes use of the parallel scan operation
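A serial Python sketch of an LSD radix sort on 8-bit digits, which gives the 4 passes for 32-bit keys. Each pass is a counting sort. The exclusive prefix sum over the digit counts is the step that a GPU version would perform with a parallel scan. The (key, payload) layout is an assumption standing in for (cell ID, object ID).

```python
def radix_sort_pairs(pairs, key_bits=32, digit_bits=8):
    """Stable LSD radix sort of (key, payload) pairs by unsigned integer key."""
    radix = 1 << digit_bits
    mask = radix - 1
    for shift in range(0, key_bits, digit_bits):      # low digits first
        counts = [0] * radix
        for key, _ in pairs:
            counts[(key >> shift) & mask] += 1
        # Exclusive scan: where each digit's bucket starts in the output.
        offsets, running = [0] * radix, 0
        for d in range(radix):
            offsets[d], running = running, running + counts[d]
        out = [None] * len(pairs)
        for key, payload in pairs:                    # input order -> stable
            d = (key >> shift) & mask
            out[offsets[d]] = (key, payload)
            offsets[d] += 1
        pairs = out
    return pairs
```

Stability is what lets each later pass keep the order established by earlier passes, and it also keeps entries with equal cell IDs in their original order.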

39
Final Project

Default final project: granular dynamics using the
collision detection from the midterm
Incorporate the midterm collision detection into the
ChronoEngine multibody dynamics engine
Simulate a Mars rover with many (millions of) bodies

40
Final Project

ChronoEngine
C++ API
Commands for creating the simulation environment,
populating it with bodies, creating constraints, etc.
Uses Bullet for collision detection
Has been used to solve systems with 100,000
bodies
Has a CUDA-parallelized dynamics solver (based on an
LCP formulation)

41
Final Project

Each wheel is a union of primitives
Terrain consists of 5000 spheres (much too
coarse)
Obstacles
Non-spherical bodies in wheels
Large mass difference between small grains and the
large rover

42
Final Project

Handling non-spherical bodies
Represent the surface of the body as a composite
of smaller spheres
New representation has more bodies, but only
spheres
Maintain the same dimensions, mass, and inertia properties

43
Final Project

Parallelism
Collision detection
Many bodies/collision pairs to check
Spatial subdivision: geometric decomposition,
task decomposition
Dynamics
Many equations of motion to solve
Geometric decomposition
Potentially many non-spherical bodies to process
in parallel

44
Final Project

Remaining Issues
Re-use of data
After solving the collision detection problem
once, can data be reused to reduce the size of
the problem to be solved in subsequent steps?
Automate handling of non-spherical geometry
Can an automated method be created to represent
arbitrary geometry with spheres?

45
ME 964 Midterm and Final Project

Justin Madsen

46
Outline

Midterm and final are the same project
default scheme
Collision detection method
Baraff
Brief overview of the two-phase algorithm
Ideas for CUDA implementation
Ideas for final project
Integrating CUDA collision detection with other
dynamics programs

47
Efficient collision detection

Baraff method
Axis-aligned bounding boxes (AABB)
Simple yet efficient
Only dealing with spheres
Can be extended to convex polyhedra
(bounding boxes are not actually needed for spheres;
they are a special case)

Figure 1. AABB size and orientation depend on
the local coordinate system
48
Overview of method

One dimensional case (x-axis)
Sort and sweep
Each object has a length along the axis according
to its AABB
Data: beginning and end values (b and e) of each
box
Sorted lowest to highest according to these values

Figure 2. Six objects and their AABB axes [1]
49
Determine possible contacts

After sorting, collision detection happens in two
phases
Phase 1: broad phase
Traverse the axis, adding each object to the possible
contact list when its b_i is encountered and removing it
when its e_i is encountered
In the one-dimensional case, when b_i is added to the
list, contact occurs with all other
objects in the list
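A Python sketch of the one-dimensional sort and sweep, assuming each object is given as its (b, e) interval. Sorting a begin event before an end event at the same coordinate makes touching intervals count as overlapping, which is a choice the slides do not specify. Removing an object from the list at its e value keeps the list limited to intervals that are still open. `sweep_1d` is a hypothetical name.

```python
def sweep_1d(intervals):
    """intervals: list of (b, e) per object. Returns overlapping pairs."""
    events = []
    for obj, (b, e) in enumerate(intervals):
        events.append((b, 0, obj))   # 0 sorts a begin before an end at equal value
        events.append((e, 1, obj))
    events.sort()
    active, pairs = set(), []
    for _, kind, obj in events:
        if kind == 0:
            # b_i reached: obj overlaps every object already in the list.
            pairs.extend((min(obj, o), max(obj, o)) for o in active)
            active.add(obj)
        else:
            active.discard(obj)      # e_i reached: obj leaves the list
    return sorted(pairs)
```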

50
Three dimensional case

Phase 1 for 3-D
Extend the one-dimensional contact check by checking
the b and e values along the y and z axes of the
other objects in the list
If the contact check comes back positive for all 3
axes, add the object to the possible contact
list
"Possible" because only the bounding boxes have been checked, not the objects themselves

51
Need to verify collision

Tested positive for collision along all 3 axes

Figure 3. Left to right: XY, XZ and YZ axes
testing positive for collision
52
Verifying collision

Phase 2: narrow phase
Just because all 3 axes intersect does not
necessarily mean contact has occurred
Remember: the bounding boxes were checked, not the actual
objects
For spheres, compare the distance between centers with the
sum of the radii
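A Python sketch of both phases for spheres. It assumes the sweep runs along x and that each new object's y and z intervals are compared with those of the objects still open on the x axis. The narrow phase then keeps only the pairs whose center distance is less than the sum of the radii. This is a serial sketch with illustrative names, not the planned CUDA implementation.

```python
import math

def aabb(center, r):
    """(b, e) intervals of a sphere's bounding box on x, y and z."""
    return [(c - r, c + r) for c in center]

def overlaps(p, q):
    """Two closed intervals overlap."""
    return p[0] <= q[1] and q[0] <= p[1]

def sphere_contacts(centers, radii):
    boxes = [aabb(c, r) for c, r in zip(centers, radii)]
    # Sort the objects by their b value along x, then sweep.
    order = sorted(range(len(boxes)), key=lambda k: boxes[k][0][0])
    active, candidates = [], []
    for k in order:
        bx = boxes[k][0][0]
        # Drop objects whose x-interval ended before this b.
        active = [a for a in active if boxes[a][0][1] >= bx]
        for a in active:
            # Phase 1 (broad): x overlap is implied; check y and z too.
            if overlaps(boxes[a][1], boxes[k][1]) and overlaps(boxes[a][2], boxes[k][2]):
                candidates.append((min(a, k), max(a, k)))
        active.append(k)
    # Phase 2 (narrow): the boxes overlap, but do the spheres?
    return sorted((a, b) for a, b in candidates
                  if math.dist(centers[a], centers[b]) < radii[a] + radii[b])
```

In the usage test, spheres 0 and 1 pass the broad phase (their boxes overlap) but fail the narrow phase, which is exactly the situation the slide warns about.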

53
Implementation in CUDA

Can parallelize both the broad and narrow phases
Accomplish this by assigning each object a thread
Same method, but requires two broad-phase sweeps
Sweep 1: determine and save the number of collisions,
but don't save the collision pairs
Do a prefix sum to determine the amount of memory and the
memory location to store each collision pair
Sweep 2: determine the collision pairs and save them
to the correct memory location
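The count, scan, and write sequence can be sketched in Python, with one loop iteration per object standing in for one thread. Letting each object test only partners with a larger index, so that every pair is written once, is an assumption. `overlaps` is a caller-supplied broad-phase predicate, and `itertools.accumulate` with `initial=0` (Python 3.8+) produces the exclusive prefix sum plus the total in its last entry.

```python
from itertools import accumulate

def two_sweep_pairs(n_objects, overlaps):
    """overlaps(i, j) -> bool is the broad-phase test for objects i < j."""
    # Sweep 1: each "thread" counts its collisions but stores no pairs.
    counts = [sum(1 for j in range(i + 1, n_objects) if overlaps(i, j))
              for i in range(n_objects)]
    # Exclusive prefix sum: first output slot per thread; last entry = total.
    offsets = list(accumulate(counts, initial=0))
    out = [None] * offsets[-1]
    # Sweep 2: repeat the test and write each pair into its reserved slots.
    for i in range(n_objects):
        slot = offsets[i]
        for j in range(i + 1, n_objects):
            if overlaps(i, j):
                out[slot] = (i, j)
                slot += 1
    return out
```

Because each thread knows its own start offset after the scan, the second sweep needs no atomic operations to place its pairs.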

54
Extending midterm to final project

Collision detection to be used for granular
dynamics
Use existing parallel algorithms to determine
dynamics of a system with many contacts
Integrate my collision detection program into
existing software: Bullet, ChronoEngine

55
References