Multi-tier Data Access and Hierarchical Memory Design: Performance Modeling and Analysis


1
Multi-tier Data Access and Hierarchical Memory
Design: Performance Modeling and Analysis
  • Marwan Sleiman
  • PhD Defense
  • Department of Computer Science & Engineering
  • University of Connecticut
  • 371 Fairfield Road Unit 2155
  • Storrs, CT 06269
  • Major advisor: Dr. Lester Lipsky
  • Associate advisors:
  • Dr. Reda Ammar
  • Dr. Swapna Gokhale
  • Dr. Chun-Hsi Huang

2
Overview of the Presentation
  • Introduction and previous work
  • Motivation and Objectives
  • Markov-Chain model and performance metrics
  • Interdependence of the hit ratios between the
    levels
  • Design Constraints
  • Approximation Function
  • Power-Tailed aspect of the memory access
  • Effect of increasing the cost on performance
  • Improving the performance while maintaining a
    constant cost
  • Optimization techniques
  • Performance measures
  • Conclusion and Future work

3
Hierarchy of Storage Systems
  • Storage systems are present in several forms and
    on different hierarchical levels; they expand the
    concept of the classical hierarchy beyond the
    local machine
  • Registers, caches, main memory (RAM), disks,
    tapes, middle-tiers, network storage, internet
    storage.
  • Storage systems provide the basic functions of
  • storing data permanently
  • holding data until it is accessed and processed

4
Hierarchical Memory Model
5
Storage Systems and Performance
  • Fast memory access is vital to achieving
    superior system performance
  • Because of the gap between the CPU speed and
    memory access time,
  • memory access time is increasingly becoming a
    bottleneck to system performance
  • Thus the applications cannot benefit from a
    processor clock-speed upgrade
  • Speed is expensive! => cost must be optimized

6
Solution
  • Increasing the speed and size of the existing
    levels.
  • Inserting smaller and faster intermediate memory
    levels
  • Which one is better?
  • We need to evaluate the cost and performance of
    each alternative

7
Previous Work
  • Du et al. 00 showed the importance of the depth
    of the memory hierarchy as a primary factor on a
    cluster of workstations, but their results are
    dependent on the workload type.
  • D. G. Dolgikh et al. 01 show the importance of
    developing an analytical model to optimize the
    use of web caches.
  • Jin et al. 02 developed a limited analytical
    model that captures only a two-level cache, but
    we see in their work a big discrepancy between
    the predicted and measured memory performance.
  • El-Zanfaly et al. 04 presented an analytical
    model to study the performance of multi-level
    caches in distributed database systems.
  • Garcia Molina and Rege 76 and Nagi 06
    demonstrated that, in some cases, it is more
    suitable to use a slower CPU for effective
    utilization of memory.
  • E. Robinson and G. Cooperman 06 showed that, in
    certain conditions, it can be more efficient to
    discard the memory and use a disk-based
    architecture than to use the memory, which means
    reducing the memory hierarchy.

8
Motivation
  • Memory hierarchy is becoming more complex.
  • Memory access time differs from application to
    application
  • The average memory access time is a crucial
    factor in system performance, but other
    performance metrics and measures may also be
    important and must be taken into consideration.
  • Despite having a small mean time in certain
    cases, for an infinite hierarchy the time may
    have unbounded higher jth moments,
    $E[(T - T_{avg})^j] \to \infty$ => long queues of
    memory accesses that can take quite some time to
    drain while severely affecting system performance
    (pipelined processors, shared-memory cases)
  • It is necessary to develop a universal model
    able to cover all possible cases.

9
Our Objectives
  • Several objective functions help us improve the
    performance of hierarchical memory systems
  • Minimizing the mean memory access time.
  • Minimizing the memory queueing time for a given
    arrival rate by using the P-K formula for an
    M/G/1 Queue.
  • Minimizing the probability of exceeding a long
    delay time
  • Do these objectives have the same optima? If not,
    what about a Trade-off? What is (are) the best
    optimization technique(s)?
  • We propose an approximation to the above functions:
    maximizing the ratio of time lag to variance
    for a given objective time, to minimize the width
    of the confidence interval and reduce the
    probability of exceeding the target time.

10
Moments of Memory Access Time: MARKOV-CHAIN
Model
11
State Transition Diagram
12
Notation
  • n is the depth of the memory hierarchy, L = n+1 is
    the total number of memory levels
  • P is the sub-stochastic matrix that corresponds
    to the transitions from one state to another.
    Its dimension is (2n+1) × (2n+1)
  • p is the entrance vector that corresponds to the
    state of the system at the first memory request.
    p is a row vector of size 2n+1, where n is the
    number of intermediate levels.
  • p = [1 0 0 ... 0]
  • ε' is the unit column vector of size 2n+1.
  • M is the transition rate matrix; it corresponds
    to the rates of leaving each state. M is a
    diagonal matrix of dimension (2n+1) × (2n+1).
  • I is the identity matrix of the same dimension as
    P and M.
  • h is the hit ratio, $\bar{h} = 1 - h$.

13
Sub-stochastic Matrices
$B = M(I - P)$ and $V = B^{-1}$.
14
Access Time Calculations
  • Let X be the random variable denoting the memory
    access time.
  • Assuming we have memories with exponential
    service times, the pdf of the ith memory level is
    given by
  • The jth moment of the access time is given by
  • The mean access time is given by the first moment
    and does not depend on whether the memory service
    time is exponential or not.
  • The variance of the access time is given by
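
A sketch of the forms these quantities take under the standard matrix-exponential (linear-algebraic queueing theory) conventions, assuming the representation $\langle p, B \rangle$ with $V = B^{-1}$. Here $\epsilon'$ denotes the all-ones column vector and $\mu_i$ is an assumed name for the service rate of level i; the notation on the original slide may differ.

```latex
% pdf of an exponential memory level i with rate \mu_i
f_i(t) = \mu_i \, e^{-\mu_i t}, \qquad t \ge 0

% j-th moment of the access time X
E[X^j] = j! \; p \, V^j \, \epsilon'

% mean (first moment)
E[X] = p \, V \, \epsilon'

% variance
\sigma_X^2 = E[X^2] - (E[X])^2 = 2 \, p \, V^2 \, \epsilon' - (p \, V \, \epsilon')^2
```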

15
Non Exponential Memories
  • Each node is represented by the vector-matrix
    pair <p_i, B_i>
  • its pdf is
  • The mean access time remains the same.
  • However the variance differs by a correction
    term compared to the exponential case ?e

$C_{v_i}^2$ is the squared coefficient of variation
of the non-exponential stage i. For exponential
memories, $C_{v_i}^2 = 1$. This is an innovation! (CATA 2006)
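
The effect of non-exponential stages on the variance can be illustrated with a minimal sketch. It assumes a simplified serial hierarchy (a request visits level 1, then level 2 on a miss, and so on, paying each visited level's service time), which is not necessarily the exact state diagram of the slides. The function name and arguments are hypothetical.

```python
def access_time_stats(hit_ratios, mean_times, cv2=None):
    """Mean and variance of the access time in a simplified serial hierarchy.

    hit_ratios[i]: local hit ratio of level i (the last level must be 1.0)
    mean_times[i]: mean service time of level i
    cv2[i]:        squared coefficient of variation of level i
                   (1.0 corresponds to an exponential stage)
    """
    n = len(hit_ratios)
    if cv2 is None:
        cv2 = [1.0] * n  # assumption: all stages exponential by default
    if len(mean_times) != n or len(cv2) != n:
        raise ValueError("all inputs must have one entry per level")
    if abs(hit_ratios[-1] - 1.0) > 1e-12:
        raise ValueError("the last level must always hit")

    mean = 0.0
    second_moment = 0.0
    reach = 1.0    # probability that a request reaches the current level
    sum_mean = 0.0 # sum of the mean times of the levels visited so far
    sum_var = 0.0  # sum of the variances of the levels visited so far
    for h, t, c in zip(hit_ratios, mean_times, cv2):
        sum_mean += t
        sum_var += c * t * t
        p_stop = reach * h  # probability that the request ends at this level
        mean += p_stop * sum_mean
        second_moment += p_stop * (sum_var + sum_mean ** 2)
        reach *= 1.0 - h
    return mean, second_moment - mean ** 2
```

For a two-level example with hit ratios (0.9, 1.0) and mean times (1, 10), the mean is the same whether the second stage is exponential or has a squared coefficient of variation of 4; only the variance changes, by the visit probability times (4 - 1) times the squared mean time of that stage.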
16
More performance metrics
  • Let Ty be the random variable denoting the mean
    system time.
  • The Pollaczek-Khintchine formula (called the P-K
    formula) is used to calculate the mean waiting
    time spent by a customer in an M/G/1 queue
  • Where,
  • $C_v$ is the coefficient of variation,
  • $\rho$ is the utilization factor,
  • $\lambda$ is the arrival rate.
  • The probability of exceeding a long delay time,
    Pr(X > z), is given by the reliability function for
    our hierarchical memory system
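
As a minimal sketch of the P-K formula in its textbook form (the exact form shown on the slide is not known), the mean waiting time in an M/G/1 queue depends only on the arrival rate and the first two moments of the service time. The function name is hypothetical.

```python
def pk_mean_times(arrival_rate, mean_service, cv2):
    """Pollaczek-Khintchine mean waiting and system times for an M/G/1 queue.

    arrival_rate: lambda, the Poisson arrival rate
    mean_service: E[X], the mean service (memory access) time
    cv2:          squared coefficient of variation of the service time
    """
    rho = arrival_rate * mean_service  # utilization factor
    if not 0.0 <= rho < 1.0:
        raise ValueError("the queue is stable only for 0 <= rho < 1")
    wait = rho * mean_service * (1.0 + cv2) / (2.0 * (1.0 - rho))
    return wait, wait + mean_service   # (time in queue, time in system)
```

With exponential service (cv2 = 1) this reduces to the M/M/1 result: for an arrival rate of 0.5 and a mean service time of 1, the mean system time is 2. A larger variance of the access time increases the queueing time even when the mean access time is unchanged.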

17
Interdependence of the Hit Ratios
  • Let Y be the random variable representing the
    data fetched in memory and $M_i$ the dataset in
    memory level i, $M_i \subset M_{i+1}$. We define the
    following additional terms
  • $P(Y \in M_i)$: the probability of finding data in the
    intermediate memory level i
  • $S_i$: the size of memory level i.
  • $C_i$: the cost per unit of size of each memory level
    i.
  • We assume that
  • , where is a constant
  • The total cost of the L-level hierarchical
    system becomes $C = \sum_i C_i S_i$ (cost per unit
    of size times size, summed over the levels)

18
Interdependence of the Hit Ratios (continued)
  • $h_i$ is the local hit ratio at memory level i,
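
The interdependence can be sketched as follows, assuming the levels are inclusive (the dataset of each level is contained in the next one) and taking the local hit ratio to be the probability of a hit at level i given a miss at all levels above it. This conditional definition and the function name are assumptions made for illustration.

```python
def local_hit_ratios(cumulative):
    """Convert P(Y in M_i) for inclusive levels into local hit ratios.

    cumulative[i] is the probability that the requested data is present in
    level i; it must be non-decreasing because M_i is contained in M_{i+1}.
    """
    ratios = []
    previous = 0.0
    for p in cumulative:
        if p < previous or p > 1.0:
            raise ValueError("probabilities must be non-decreasing and <= 1")
        remaining = 1.0 - previous  # probability of missing all upper levels
        ratios.append((p - previous) / remaining if remaining > 0.0 else 1.0)
        previous = p
    return ratios
```

For example, cumulative probabilities of 0.5, 0.75 and 1.0 give local hit ratios of 0.5, 0.5 and 1.0: enlarging an upper level lowers the local hit ratio of the level below it, even if that lower level is unchanged.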

19
Design Constraints
  • If we consider any L-level hierarchical
    memory with total cost C, there are constraints
    on the sizes of the levels we can select
  • For simplicity of calculations, we assume in
    what follows that the memory access time
    increases geometrically from one level to the
    next and the cost decreases geometrically

20
Power-tailed Aspect of the Memory Access Time
  • A power-tailed distribution (also called a Pareto
    distribution) with parameter α is one that
    has infinite high moments.
  • Its reliability function,
  • We showed that, if we have the same hit ratio, h,
    at all levels, the moments become unbounded as
    the number of hierarchies goes to infinity
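  • $E[X^j] \to \infty$ for $(1-h)\,\gamma^{j} \ge 1$, that is, iff $j \ge \alpha$
  • where $\alpha = -\ln(1-h)/\ln\gamma$ and $\gamma$ is the factor by
    which the access time grows from one level to the next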
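
The unbounded moments can be illustrated numerically. The sketch below assumes that every level has the same hit ratio h and that the mean access time grows by a factor gamma from one level to the next, so that the jth moment is governed by a geometric series with ratio (1 - h) * gamma**j. The names `tail_exponent` and `moment_series` are hypothetical.

```python
import math


def tail_exponent(h, gamma):
    """Exponent alpha at which (1 - h) * gamma**alpha equals 1."""
    return -math.log(1.0 - h) / math.log(gamma)


def moment_series(h, gamma, j, levels):
    """Partial sum of the geometric series that governs the j-th moment.

    The series converges as the number of levels grows only when
    (1 - h) * gamma**j < 1, i.e. when j < tail_exponent(h, gamma).
    """
    ratio = (1.0 - h) * gamma ** j
    return sum(ratio ** i for i in range(levels))
```

With gamma = 2, a hit ratio of 0.7 keeps the first-moment series bounded (it tends to 2.5), whereas a hit ratio of 0.3 makes it grow without limit as levels are added.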

21
Simulations
  • Three plots for the reliability function with
    power tails obtained by simulating 100,000
    memory accesses
  • The system has 10 memory levels with hit-ratios h =
    0.3, 0.5 and 0.7 with µ = 1, ? = 1 and ? = 2
  • The slopes of the plots are equivalent to the
    slopes (α > 0) in the reliability function for
    the power-tailed distribution
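
A simulation of this kind can be sketched as below. It is not the simulator used for the plots: it assumes a serial hierarchy with exponential service times, a base rate mu = 1 and a growth factor gamma = 2 between levels (assumed readings of the parameters above), and all names are hypothetical.

```python
import random
from bisect import bisect_right


def simulate_access(rng, h, levels, gamma, mu=1.0):
    """Simulate one memory access; return its total access time."""
    total = 0.0
    for i in range(levels):
        mean_i = gamma ** i / mu            # assumed mean time of level i
        total += rng.expovariate(1.0 / mean_i)
        if i == levels - 1 or rng.random() < h:
            break                           # hit, or last level reached
    return total


def empirical_reliability(samples, z_values):
    """Estimate R(z) = Pr(X > z) for each z from the simulated samples."""
    ordered = sorted(samples)
    n = len(ordered)
    return [(n - bisect_right(ordered, z)) / n for z in z_values]


if __name__ == "__main__":
    rng = random.Random(2006)
    zs = [1.0, 10.0, 100.0]
    for h in (0.3, 0.5, 0.7):
        xs = [simulate_access(rng, h, 10, 2.0) for _ in range(100_000)]
        print(h, empirical_reliability(xs, zs))
```

Plotting log R(z) against log z for each hit ratio should show an approximately straight section whose slope steepens as h increases, until the truncation at 10 levels takes over.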

22
Log R(z) vs Log(z) plot shows power-tailed aspect
of access time
23
Two-Level Cache Memory
[Figure: state-transition diagram of the two-level cache memory. Labels in the diagram: S0, L1, L2, M, D1, D2, XL1, XL2, XD1, XD2, h1, h2, 1-h1, 1-h2, 1]
24
Effect of doubling the Cost on Exponential and
Non-Exponential Memories in a 2-level cache memory
  • Plots of the mean and variance for exponential
    and non-exponential 2-level memory hierarchies
    versus the size of the outer memory. The lines
    correspond to the original system and the dashes
    correspond to a system with double cost. The
    non-exponential has a gamma of 4.

25
Behavior of the Hit Ratios in a 2-Level Cache
Memory.
  • Behavior of the memory hit ratios as we change
    the size of the lower/outer memory level and
    increase the cost of the memory system. A level
    may become obsolete because it has a low hit
    ratio.

26
Queueing Time vs Access Time
  • Mean memory access time E(X) and mean queueing
    time E(T_λ) versus the size S of the outer-level
    memory in a 2-level hierarchical memory system.
    E(X) has its minimum for S = 71, while E(T_λ) has
    different minima depending on the value of λ.

There is a difference of 8% between min E(T)
and its value at min E(X) for the same outer
memory size! This difference increases as the
arrival rate increases.
27
Inserting an Upper Faster Level
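[Figure: state-transition diagram with an upper, faster level inserted. Labels in the diagram: S0, C0, C1, C2, Cm, D0, D1, D2, XL0, XL1, XL2, XD0, XD1, XD2, h0, h1, h2, 1-h0, 1-h1, 1-h2, a0, a1, a2, 1]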
28
Increasing the Size vs Inserting Intermediate
Levels
  • Increasing the size of the existing levels versus
    inserting intermediate memory levels.

29
Exceeding a Long Delay Time
The probability of exceeding a long delay time is
given by the reliability of our hierarchical
memory system
  • Plots of the mean, variance and reliability
    function for exponential 2-level and 3-level
    memory hierarchies versus the size of the outer
    memory. The straight lines correspond to 2-level
    memory systems and the dotted lines correspond to
    3-level memory system. The mean is plotted in
    blue and the probability of exceeding a target
    time is plotted in green.

30
Effect of the memory levels on the probability of
exceeding a long delay time
  • Effect of the memory levels on the probability
    of exceeding a long delay time on a log scale.
    The curve of the reliability is steeper when the
    system includes an upper level memory system.
    E(X) = 4.78 for the 3-level memory, E(X) = 6.91 for
    the 2-level memory with the upper level removed,
    and E(X) = 5.81 for the 2-level memory with the
    upper level removed.

31
Probability of Exceeding a Long Delay (continued)
  • Probability of exceeding a long delay time for
    2-level and 3-level hierarchical memories on a
    log scale. As the access time becomes greater
    than 100 ns, the reliability curves become tangent
    to their asymptotes. E(X) = 4.78 for the 3-level
    memory, and E(X) = 6.91 for the 2-level memory.

32
Exceeding a Long Delay Time Asymptotic Behavior
  • From the spectral decomposition theorem, R(z) is
    given by


  • (1)
  • Where
  • is the ith Eigenvalue of the matrix B
  • is the ith column Eigenvector of the matrix B,
    that is
  • is the ith row Eigenvector of the matrix B, that
    is
  • The probability of getting memory requests that
    take a relatively long time along this stochastic
    hierarchy is given by the limit of R(z)
    as z becomes very large; it is dominated by the
    mth term of R(z), the one with the smallest eigenvalue.
  • Let thus ,
    and (2)
  • So if we plot the probability of exceeding time x
    on a semi-log scale, we find out that it
    approaches the curve that intercepts
    the y-axis on a semi-log graph at the value
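
A hedged sketch of the spectral form referred to on this slide, assuming that B has distinct eigenvalues and that $\epsilon'$ is the all-ones column vector. The symbols $\beta_i$, $v_i$, $u_i$ and $a_i$ are assumed names and may differ from those on the original slide.

```latex
% reliability function of the matrix-exponential representation <p, B>
R(z) = \Pr(X > z) = p \, e^{-zB} \, \epsilon'
     = \sum_{i} a_i \, e^{-\beta_i z},
\qquad a_i = (p \, v_i)(u_i \, \epsilon')

% right (column) and left (row) eigenvectors of B, normalized so u_i v_i = 1
B \, v_i = \beta_i \, v_i, \qquad u_i \, B = \beta_i \, u_i

% for large z the term with the smallest eigenvalue \beta_m dominates
R(z) \approx a_m \, e^{-\beta_m z}
\quad\Longrightarrow\quad
\ln R(z) \approx \ln a_m - \beta_m z
```

Under these assumptions the asymptote on a semi-log plot is a straight line with slope $-\beta_m$ that crosses the y-axis at $a_m$.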

33
Exceeding a Long Delay Time 3-D
  • 3-D plot of the probability of exceeding a long
    delay time, R(z), for a three-level memory versus
    the size of the intermediate memory levels. Sb is
    the index of the upper memory level and Sc is the
    index of the lower level. We remark here that the
    curve of R(z) is steeper with respect to the
    upper level growth because R(z) is more sensitive
    to it; it is flatter with respect to
    the lower level because it is less sensitive to
    it.

34
Optimization techniques
  • Local search
  • Lagrange Multipliers Method
  • We assume that
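
A local search at constant cost can be sketched as follows. The hit model used here, in which the probability of finding data in a level of size S is (S / dataset) ** exponent, is a hypothetical stand-in and not the function assumed in the slides. The serial access model and all names are likewise illustrative assumptions.

```python
def mean_access_time(sizes, times, main_time, dataset=1024.0, exponent=0.5):
    """Mean access time for cache levels placed in front of main memory."""
    mean = 0.0
    miss = 1.0  # probability of having missed every level so far
    for size, t in zip(sizes, times):
        mean += miss * t  # every request that gets this far pays t
        found = min(1.0, (size / dataset) ** exponent)  # hypothetical model
        miss = min(miss, 1.0 - found)  # inclusive levels never raise misses
    return mean + miss * main_time


def local_search(unit_costs, times, main_time, budget, min_step=1e-3):
    """Hill-climb over the split of a fixed budget among the cache levels."""
    n = len(unit_costs)
    spend = [budget / n] * n  # start from an equal investment per level
    step = budget / 4.0

    def value(split):
        sizes = [money / c for money, c in zip(split, unit_costs)]
        return mean_access_time(sizes, times, main_time)

    best = value(spend)
    while step > min_step:
        improved = False
        for i in range(n):
            for j in range(n):
                if i == j or spend[j] - step <= 0.0:
                    continue
                candidate = list(spend)
                candidate[i] += step  # move `step` units of cost
                candidate[j] -= step  # from level j to level i
                v = value(candidate)
                if v < best:
                    spend, best, improved = candidate, v, True
        if not improved:
            step /= 2.0  # refine the search once no move helps
    return [money / c for money, c in zip(spend, unit_costs)], best
```

For example, `local_search([8.0, 2.0], [1.0, 4.0], 100.0, 512.0)` returns the level sizes and the mean access time it settles on. The total cost stays equal to the budget throughout, because every move only transfers money between levels.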

35
Analytic Solution for Minimizing E(X)
  • Optimizing E(X) versus the total cost
  • Because our model is a Feed-forward Network, the
    total access time for this memory system is given
    as a function of the intermediate memory sizes
    by
  • So we have to optimize subject to the
    following total cost constraint
  • By using the Lagrange multipliers method, we
    have
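
In generic form, and without assuming the specific expression of E(X) used in the slides, the method gives the stationarity conditions below. $S_i$ and $C_i$ are the size and the cost per unit of size of level i, C is the total cost, and $\Lambda$ is an assumed name for the multiplier (chosen to avoid a clash with the arrival rate).

```latex
% Lagrangian of the constrained problem
\mathcal{L}(S_1, \dots, S_n, \Lambda)
  = E(X)(S_1, \dots, S_n) + \Lambda \Big( \sum_{i} C_i S_i - C \Big)

% stationarity with respect to each size
\frac{\partial \mathcal{L}}{\partial S_i}
  = \frac{\partial E(X)}{\partial S_i} + \Lambda \, C_i = 0,
\qquad i = 1, \dots, n

% the cost constraint is recovered from the multiplier
\frac{\partial \mathcal{L}}{\partial \Lambda} = \sum_{i} C_i S_i - C = 0
```

At the optimum, the marginal reduction in mean access time per unit of cost, $-\frac{1}{C_i}\frac{\partial E(X)}{\partial S_i}$, is therefore the same for every level.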

36
Lagrange Multipliers method and constant hit
ratios
  • By solving these equations, we get

37
Plot of the hit ratios at steady state (PT)
  • Hit ratios versus cost for a three-level
    hierarchical memory: h2 and h3 converge to a
    constant determined by

38
Difference between E(X) and E(T_λ) for a 3-level
hierarchy
  • Mean memory access time E(X) and mean queueing
    time E(T_λ) versus the total memory cost for a
    3-level hierarchical memory. The difference
    between the minimal queueing time and the value
    of queueing time at the optimal mean memory time
    is more significant here and is of the order of
    15%.

39
Difference between E(X) and E(T_λ) for a 3-level
hierarchy
  • Optimal system time, min E(T_λ), versus the value
    of E(T_λ) at the optimal mean system time, E(X),
    versus the total memory cost for a 3-Level
    Hierarchical memory. The relative difference
    between the minimal queueing time and the value
    of queueing time at the optimal mean memory time
    decreases as we decrease the cost.

40
Performance Measures
  • Different hierarchical memory architectures with
    intermediate levels at different locations. The
    closer the memory is to the CPU, the smaller and
    faster it is.

41
Performance Measurements
C     C1  C2  C3  Architecture   Min(X)  StdDev @ min X  h1    R(8)
512   32  8   2   3-LVL          4.82    22.26           0.75  0.0958
512   32  8   2   2-LVL, L1 rem  6.96    19.66           N/A   0.231
512   32  8   2   2-LVL, L2 rem  6.13    31.74           0.93  0.0628
512   32  8   2   2-LVL, L3 rem  5.13    28.41           0.83  0.0557
512   50  10  2   3-LVL          5.6     23.29           0.75  0.1152
512   50  10  2   2-LVL, L1 rem  7.13    19.87           N/A   0.2348
512   50  10  2   2-LVL, L2 rem  6.74    29.8            0.88  0.0947
512   50  10  2   2-LVL, L3 rem  6.32    24.06           0.71  0.0895
512   72  12  2   3-LVL          7.35    29.27           0.75  0.1334
512   72  12  2   2-LVL, L1 rem  7.31    20.07           N/A   0.2382
512   72  12  2   2-LVL, L2 rem  8.0749  30.62           0.81  0.1381
512   72  12  2   2-LVL, L3 rem  7.72    33.78           0.59  0.1352
1024  32  8   2   3-LVL          3.73    18.16           0.75  0.0758
1024  32  8   2   2-LVL, L1 rem  6.49    19.11           N/A   0.217
1024  32  8   2   2-LVL, L2 rem  4.19    26.14           0.96  0.0316
1024  32  8   2   2-LVL, L3 rem  3.6     23.64           0.94  0.0312
1024  50  10  2   3-LVL          3.9171  18.9313         0.75  0.0789
1024  50  10  2   2-LVL, L1 rem  6.2     15.59           N/A   0.219
1024  50  10  2   2-LVL, L2 rem  4.04    22.11           0.94  0.0475
1024  50  10  2   2-LVL, L3 rem  3.79    21.46           0.84  0.053
1024  72  12  2   3-LVL          4.09    19.18           0.75  0.0833
1024  72  12  2   2-LVL, L1 rem  6.28    15.71           N/A   0.221
1024  72  12  2   2-LVL, L2 rem  4.69    22.74           0.91  0.0669
1024  72  12  2   2-LVL, L3 rem  4.43    23.92           0.79  0.0695
42
Observations
  • Observation 1: for the same cost, inserting an
    intermediate memory at the upper level results in
    a system with a lower mean time.
  • Observation 2: for the same cost, inserting an
    intermediate memory at the upper level results in
    a system with a smaller probability of exceeding
    a small delay time and higher probability of
    exceeding a high delay time.
  • Observation 3: for the same cost, inserting an
    intermediate memory at the upper level results in
    a system with a worse variance regardless of the
    distribution of the service time of the
    intermediate memory levels.
  • Observation 4: a higher variance corresponds to a
    lower hit ratio at the upper memory levels.
  • Observation 5: the variance of the memory access
    time is relatively high. Such a high variance can
    dramatically affect the performance of some
    architectures sensitive to a high access time
    such as pipelined, decoupled, and multi-grid
    architectures. So it is important to consider
    optimizing the variance of hierarchical storage.
  • Observation 6: doubling the cost of the
    hierarchical memory has a positive effect on all
    the performance metrics but in different ratios.
    Each of the performance metrics responds
    differently to a modification
    of the memory architecture (number of levels,
    size, cost, etc.) because these performance
    metrics have different optimal points.
  • Observation 7: the probability of exceeding a
    target time is more sensitive to the upper memory
    level than the lower level and it improves at a
    faster rate by optimizing the upper level size
    than by optimizing the lower levels.
  • Observation 8: if cost and speed are proportional
    (i.e. there is a geometric relationship between
    the levels), we get an optimal access time when
    we have $C_i S_i = C_{i+1} S_{i+1}$, that is, when we
    invest the same in each level.
  • Observation 9: there is a linear relationship
    between the probability of going to the main
    memory, Pm, and the value of am in equation 2. We
    have found that the ratio is a constant

43
Conclusions
  • Markov-Chains can model the access time of
    hierarchical memories.
  • Our analytical model is very powerful and
    universal and very flexible.
  • The hierarchical memory access can be
    power-tailed.
  • The variance is not the same for non-exponential
    memory stages.
  • The different performance metrics don't have the
    same optima => designing an optimal system is
    application dependent.

44
Contributions
  • Robust Analytical Model (Independent of the
    application, number of levels and architecture)
  • New performance Metrics
  • Effect of location and proximity of the Memory
    Levels
  • Power tailed aspect of the memory access

45
Future Work
  • Running more simulations to validate our model
    and make sure it is realistic and reflects the
    real computing environment.
  • Using memory profiling-related tools (PurifyPlus,
    Valgrind, Insure, VTune)
  • Including models that account for localities and
    working sets.
  • Studying the sensitivity to each performance
    metric and finding its effect on performance.
  • Trying different architectures (like decoupled
    architecture, dual-processor and shared memory).
  • Studying memory hierarchies with more levels.
  • More optimization techniques like NN.

46
Publications
  • Moments of Memory Access Time for Systems with
    Hierarchical Memories, 21st International
    Conference on Computers and Their Applications
    (CATA-2006), Seattle WA, March 2006. With Lester
    Lipsky and Kishori Konwar.
  • Performance Modeling of Hierarchical Memories,
    19th international conference on computer
    applications in industry and engineering
    (CAINE-2006), Las Vegas, Nevada USA, November
    13-15, 2006. With Lester Lipsky and Kishori
    Konwar.
  • Multi-channel Software-Oriented Pulse Width
    Modulation (SPWM), 21st International Conference
    on Computers and Their Applications (CATA-2006),
    Seattle WA, March 2006.
  • Dynamic Resource Allocation of Computer Clusters
    with Probabilistic Workloads, Marwan Sleiman,
    Lester Lipsky, and Robert Sheahan, in the
    proceedings of the 20th IEEE International
    Parallel & Distributed Processing Symposium,
    April 25-29, Rhodes Island, Greece.
  • Multi-Tier Data Access Hierarchical Memory
    Optimization, submitted to the 20th
    International Conference on Parallel and
    Distributed Computing Systems. With Lester
    Lipsky.
  • Moments and Distributions of Response Time for
    Systems with Hierarchical Memories, submitted to
    the International Journal of Computers and Their
    Applications.
  • Performance Metrics of Hierarchical Memories,
    to be submitted to the International Journal of
    Computers and Their Applications.

47
The End
  • Questions? Suggestions?
  • marwan@engr.uconn.edu
  • Thank You!