Title: Teradata%20Architectural%20Summit%202007
1Teradata Architectural Summit 2007
- Yasir Hassan Maken
- Professional Services EMEA (GCC Pakistan)
-
2- What Data Skew can cause?
Why I cannot insert 50 GB evenly distributed data
in a database having 500 GB unused space?
Why we are not able to move only 10 of the
unused space from a database?
Why most of my requests are Spooling out?
Why most of the queries running on this database
usually have high CPU SKEW?
3Because
- Most of the database space is being wasted
- Almost all allocated space for a database on at
least one of the AMPs has been occupied - The chunk of upcoming data that needs to go on
the highly congested AMP, after distribution, do
not finds a space for accommodation - If space needs to be moved, the chunk of space
that is supposed to be fetched from the congested
AMP would not be allowed. As, equal space will be
fetched from each of the AMPs and the activity
will fail if one of the AMPs does not have FREE
SPACETotal Space to be moved/ of AMPs - During processing of any request on the database,
there would be high chances of spooling out on
the congested AMP as MaxPerm CurrPerm will be
the available Spool which will be less due to
high Skew on the AMP - In some cases, the CPU Skew of the sql request on
the skewed tables of the database will be more
due to uneven distribution of data and different
CPU required on different AMPs
4How to make things work normally?
Identify the Skewed tables
Check the row distribution per amp and find the
Skew with the suggested Index
SkewFactor 100 - (AVG(CurrentPerm)/MAX(CurrentPe
rm)100)
Check for Hash collisions
5Improvements
Now I can easily insert enough data in the
database
Now the Spool out issue has improved a lot
Now I can easily move space from the database if
there is any available space
The CPU Skew has decreased a lot
BEFORE
AFTER
xx GB
xx GB
xx GB
xx GB
xx GB
xx GB
xx GB
xx GB
xx GB
xx GB