CS 501: Software Engineering - PowerPoint PPT Presentation

Provided by: wya1

1
CS 501: Software Engineering
Lecture 22: Performance of Computer Systems
2
Administration
Next week:
  Tuesday, April 17: no class
  Thursday, April 19: Quiz 4
Remember that the quiz may contain material from all lectures up to and including today.
3
Performance of Computer Systems
In most computer systems:
  The cost of people is much greater than the cost of hardware.
Yet performance is important:
  Future loads may be much greater than predicted.
  A single bottleneck can slow down an entire system.
The choice of system architecture may lead to a system that places great demands on the skills of the implementers.
4
Performance Challenges
Tasks:
  Predict performance problems before a system is implemented.
  Identify causes and fix problems after a system is implemented.
Basic techniques:
  Understand how the underlying hardware and networking components interact when executing the system.
  For each component, calculate the capacity and load.
  Identify components that are near peak capacity.
5
Understand the Interactions between Hardware and Software
Example: execution of http://www.cs.cornell.edu/
[Figure: a client and servers exchanging a domain name service lookup, a TCP connection, and an HTTP get]
6
Understand the Interactions between Hardware and Software
[Figure: sequence diagram. Objects: Thread, Toolkit, ComponentPeer, target:HelloWorld. Messages: run, run, callbackLoop, handleExpose, paint]
7
Understand Interactions between Hardware and Software
[Figure: activity diagram with a start state, a fork, a join, and a stop state]
8
Look for Bottlenecks
Possible areas of congestion:
  Network load
  Database access: how many joins to build a record?
  Locks and sequential processing
CPU performance is rarely a factor, except in mathematical algorithms. More likely bottlenecks are:
  Reading data from disk (including paging)
  Moving data from memory to CPU
9
Timescale
Operations per second
CPU instruction              1,000,000,000
Disk     latency                        60
         read               25,000,000 bytes
Network  LAN                10,000,000 bytes
         dial-up modem           6,000 bytes
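As a rough illustration of how these rates translate into elapsed time, the sketch below estimates how long each component would take to move a single file at the rates in the table. The 1 GB file size and the names `RATES_BYTES_PER_SEC` and `transfer_seconds` are assumptions chosen for the example; real transfers add latency and protocol overhead that this ignores.

```python
# Rates copied from the Timescale slide (bytes per second).
RATES_BYTES_PER_SEC = {
    "disk read": 25_000_000,
    "network LAN": 10_000_000,
    "dial-up modem": 6_000,
}


def transfer_seconds(size_bytes, rate_bytes_per_sec):
    """Idealized transfer time: size divided by sustained rate."""
    return size_bytes / rate_bytes_per_sec


if __name__ == "__main__":
    size = 1_000_000_000  # hypothetical 1 GB file
    for name, rate in RATES_BYTES_PER_SEC.items():
        print(f"{name:14s} {transfer_seconds(size, rate):12.1f} s")
```

The spread between the fastest and slowest rows is more than three orders of magnitude, which is why the slowest component on a path tends to dominate.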
10
Predicting System Performance
Direct measurement on subsystem (benchmark)
Mathematical models
Simulation
Rules of thumb
All require detailed understanding of the interaction between software and hardware systems.
11
Look for Bottlenecks: Utilization
Utilization is the proportion of the capacity of a service that is used on average.
When the utilization of any hardware component exceeds 30%, be prepared for congestion.
Peak loads and temporary increases in demand can be much greater than the average.
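The rule of thumb can be applied mechanically once average load and capacity are known for each component. The following is a minimal sketch, not part of the lecture: the function names and the sample measurements are hypothetical, and only the 30% threshold comes from the slide.

```python
CONGESTION_THRESHOLD = 0.30  # the 30% rule of thumb from the slide


def utilization(average_load, capacity):
    """Proportion of capacity used on average (same units for both arguments)."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return average_load / capacity


def components_at_risk(components):
    """Return the names whose utilization exceeds the threshold.

    `components` maps a name to a pair (average_load, capacity).
    """
    return [name for name, (load, cap) in components.items()
            if utilization(load, cap) > CONGESTION_THRESHOLD]


if __name__ == "__main__":
    # Hypothetical measurements, for illustration only.
    measured = {
        "disk (MB/s)": (12.0, 25.0),
        "LAN (MB/s)": (2.0, 10.0),
        "CPU (fraction busy)": (0.10, 1.0),
    }
    print(components_at_risk(measured))
```

Because peaks can be far above the average, a component that passes this check may still congest; the check only flags the obvious candidates.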
12
Mathematical Models
Queueing theory
Good estimates of congestion can be made for single-server queues with:
  arrivals that are independent, random events (Poisson process)
  service times that follow families of distributions (e.g., negative exponential, gamma)
Many of the results can be extended to multi-server queues.
13
Mathematical Models: Queues
[Figure: single server queue: arrive, wait in line, service, depart]
14
Queues
[Figure: multi-server queue: arrive, wait in line, service, depart]
15
Behavior of Queues: Utilization
[Figure: plot of mean delay against utilization, with utilization running from 0 to 1]
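The relationship between mean delay and utilization can be reproduced for the simplest case. The sketch below assumes an M/M/1 queue (Poisson arrivals, negative exponential service times, one server), for which the mean time in the system is S / (1 - rho), where S is the mean service time and rho is the utilization. The function name and the 10 ms service time are illustrative assumptions, not values from the lecture.

```python
def mm1_mean_time_in_system(mean_service_time, utilization):
    """Mean time (waiting plus service) in an M/M/1 queue: S / (1 - rho)."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return mean_service_time / (1.0 - utilization)


if __name__ == "__main__":
    service = 0.010  # hypothetical 10 ms mean service time
    for rho in (0.1, 0.3, 0.5, 0.7, 0.9, 0.99):
        delay = mm1_mean_time_in_system(service, rho)
        print(f"utilization {rho:4.2f}: mean time in system {delay * 1000:7.1f} ms")
```

Under this model the delay grows without bound as utilization approaches 1, which is one way to read the advice to watch components well before they are fully loaded.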
16
Software development for high-performance systems
High-performance computing:
  Large data collections (e.g., Amazon)
  Internet services (e.g., Google)
  Large computations (e.g., weather forecasting)
Must balance cost of hardware against cost of software development:
  Some configurations are very difficult to program and debug.
  Sometimes it is possible to isolate applications programmers from the system complexities.
CS 530, Architecture of Large-Scale Information Systems
17
Software development for high-performance systems
[Figure: system diagram. Labels: Internet Archive; Web Collection; Wayback Machine; Text indexes; TeraGrid; Computer clusters; File server; National super-computers; Structure database; Text indexes; Page store; Cornell University]
18
Software development: very large databases
Database: symmetric multiprocessing (SMP) on a shared memory machine.
  WebLab uses half of a 32-CPU machine with 64 GBytes of memory.
Hardware is expensive.
Software development uses a conventional database system, which does not need specialist knowledge.
But a high level of specialist knowledge is needed to organize data on disks, connect disks to memory, and configure the database for backup and restore.
19
Software development: cluster computing
Computer clusters are built on large numbers of commodity computers.
Hardware is cheap(ish).
  Web Lab uses two clusters (Linux and Windows). The Linux cluster is 108 2.4 GHz dual processors, each with 2 GB memory and 72 GB disk, connected by 10 Gbit/sec Ethernet.
Programming requires software techniques that understand the hardware configuration, e.g., MPI (message passing interface).
New techniques (e.g., map/reduce) emphasize simple applications code, so that moderately skilled programmers do not need to understand the complexities of parallel programming.
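To show why map/reduce keeps applications code simple, here is a single-process sketch of the style. It is not how any real framework or the Web Lab is implemented: the driver `run_map_reduce` and the functions `map_words` and `reduce_counts` are hypothetical names, and a real system would run the map and reduce phases on many machines.

```python
from collections import defaultdict


def map_words(document):
    """Map step: emit a (word, 1) pair for each word in one document."""
    for word in document.lower().split():
        yield word, 1


def reduce_counts(word, counts):
    """Reduce step: combine all values emitted for one key."""
    return word, sum(counts)


def run_map_reduce(documents, mapper, reducer):
    """Toy driver; a real framework would distribute these phases over a cluster."""
    groups = defaultdict(list)
    for document in documents:              # map phase
        for key, value in mapper(document):
            groups[key].append(value)       # shuffle: group values by key
    # reduce phase: one reducer call per distinct key
    return dict(reducer(key, values) for key, values in groups.items())


if __name__ == "__main__":
    pages = ["the web is large", "the archive stores the web"]
    print(run_map_reduce(pages, map_words, reduce_counts))
```

The application programmer writes only the two small functions; partitioning, scheduling, and recovery from failures belong to the driver, which is where the parallel programming complexity is hidden.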
20
Measurements on Operational Systems
Benchmarks: Run the system on standard problem sets, sample inputs, or a simulated load on the system.
Instrumentation: Clock specific events.
If you have any doubt about the performance of part of a system, experiment with a simulated load.
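Clocking specific events can be as light as wrapping the suspect code in a timer. The sketch below is one possible way to do this in Python using only the standard library; the names `clocked`, `timings`, and `report` are hypothetical, and the two timed operations are placeholders for real work.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

timings = defaultdict(list)  # event name -> list of elapsed seconds


@contextmanager
def clocked(event):
    """Record the wall-clock duration of the enclosed block under `event`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[event].append(time.perf_counter() - start)


def report():
    """Return {event: (count, total_seconds)} for every clocked event."""
    return {event: (len(samples), sum(samples))
            for event, samples in timings.items()}


if __name__ == "__main__":
    with clocked("build list"):
        data = [i * i for i in range(100_000)]
    with clocked("sort"):
        data.sort(reverse=True)
    for event, (count, total) in report().items():
        print(f"{event}: {count} call(s), {total * 1000:.2f} ms")
```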
21
Example: Web Laboratory
[Figure: benchmark of throughput v. number of CPUs on SMP, showing total MB/s and average / CPU]
22
Techniques: Simulation
Model the system as a set of states and events:
  advance simulated time
  determine which events occurred
  update state and event list
  repeat
Discrete time simulation: Time is advanced in fixed steps (e.g., 1 millisecond).
Next event simulation: Time is advanced to the next event.
Events can be simulated by random variables (e.g., arrival of next customer, completion of disk latency).
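The loop above can be sketched as a next event simulation of a single-server queue. This is a minimal illustration under stated assumptions: arrivals and service times are drawn from exponential distributions, the queue is first-in first-out, and the function name, parameters, and seed are hypothetical choices for the example.

```python
import heapq
import random
from collections import deque

ARRIVAL, DEPARTURE = 0, 1


def simulate_single_server(arrival_rate, service_rate, customers, seed=1):
    """Return the mean time in system (waiting plus service) per customer."""
    if customers <= 0:
        raise ValueError("customers must be positive")
    rng = random.Random(seed)
    events = [(rng.expovariate(arrival_rate), ARRIVAL)]  # event list: (time, kind)
    waiting = deque()    # arrival times of customers waiting in line
    in_service = None    # arrival time of the customer being served, if any
    arrived = departed = 0
    total_time = 0.0
    while departed < customers:
        now, kind = heapq.heappop(events)   # advance simulated time to next event
        if kind == ARRIVAL:
            arrived += 1
            if arrived < customers:         # schedule the following arrival
                heapq.heappush(events, (now + rng.expovariate(arrival_rate), ARRIVAL))
            if in_service is None:          # server idle: start service at once
                in_service = now
                heapq.heappush(events, (now + rng.expovariate(service_rate), DEPARTURE))
            else:
                waiting.append(now)
        else:
            total_time += now - in_service  # time since this customer arrived
            departed += 1
            in_service = waiting.popleft() if waiting else None
            if in_service is not None:      # start serving the next in line
                heapq.heappush(events, (now + rng.expovariate(service_rate), DEPARTURE))
    return total_time / customers


if __name__ == "__main__":
    for rate in (0.3, 0.6, 0.9):
        mean = simulate_single_server(rate, 1.0, 50_000)
        print(f"utilization {rate:.1f}: mean time in system {mean:.2f}")
```

A discrete time version would instead step `now` forward by a fixed increment and check which events fall inside each step; the next event form skips the idle intervals.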
23
Case Study: Performance of Disk Array
When many transactions use a disk array, each transaction must:
  wait for specific disk platter
  wait for I/O channel
  signal to move heads on disk platter
  wait for I/O channel
  pause for disk rotation
  read data
Close agreement between results from queuing theory, simulation, and direct measurement (within 15%).
24
Example: Web Laboratory
Balance of Resources
                  Ideal               Realistic
Networking        500 Mbit/sec        100 Mbit/sec
Data online       all                 few crawls/year
Metadata online   all                 all?
Disk              750 TB              240 TB
Tape archive      all                 few crawls/year
Computers         research separate   shared with storage
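One way to see why the resources need to be balanced is to ask how long the network would take to move a full disk's worth of data. The sketch below pairs the disk and networking rows of the table; that pairing, the function name, and the assumption of a fully sustained link with decimal terabytes are mine, not the lecture's.

```python
def days_to_transfer(terabytes, megabits_per_sec):
    """Idealized days to move `terabytes` over a link sustained at `megabits_per_sec`."""
    bits = terabytes * 1e12 * 8                 # decimal terabytes -> bits
    seconds = bits / (megabits_per_sec * 1e6)   # megabits/sec -> bits/sec
    return seconds / 86_400                     # seconds per day


if __name__ == "__main__":
    print(f"ideal:     {days_to_transfer(750, 500):.0f} days")
    print(f"realistic: {days_to_transfer(240, 100):.0f} days")
```

Under these assumptions both configurations need months of continuous transfer to fill the disk, so network capacity is a plausible limit on how much data can be kept online.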
25
Fixing Bad Performance
If a system performs badly, begin by identifying the cause:
  Instrumentation. Add timers to the code. Often this will reveal that the delays are centered in one specific part of the system.
  Test loads. Run the system with varying loads, e.g., high transaction rates, large input files, many users, etc. This may reveal the characteristics of when the system runs badly.
  Design and code reviews. Have a team review the system design and suspect sections of code for performance problems. This may reveal an algorithm that is running very slowly, e.g., a sort, locking procedure, etc.
Fix the underlying cause or the problem will return!
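Running the same operation at several load sizes often shows how its cost scales, which helps to locate a slow algorithm. The sketch below is a hypothetical harness: `time_under_load` is an assumed helper name, and `has_duplicates_slow` is a deliberately quadratic stand-in for a suspect section of code.

```python
import time


def time_under_load(operation, load_sizes):
    """Run `operation(n)` for each load size and return [(n, seconds)]."""
    results = []
    for n in load_sizes:
        start = time.perf_counter()
        operation(n)
        results.append((n, time.perf_counter() - start))
    return results


def has_duplicates_slow(n):
    """Deliberately quadratic stand-in for a suspect section of code."""
    items = list(range(n))
    return any(items[i] == items[j]
               for i in range(n) for j in range(i + 1, n))


if __name__ == "__main__":
    for n, seconds in time_under_load(has_duplicates_slow, [200, 400, 800]):
        print(f"load {n:5d}: {seconds * 1000:8.2f} ms")
```

If doubling the load roughly quadruples the time, as it should here, the code behaves quadratically and is a candidate for the design and code review step.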
26
Predicting Performance Change: Moore's Law
Original version: The density of transistors in an integrated circuit will double every year. (Gordon Moore, Intel, 1965)
Current version: Cost/performance of silicon chips doubles every 18 months.
27
Moore's Law: Rules of Thumb
Planning assumptions:
Every year:
  cost/performance of silicon chips improves 25%
  cost/performance of magnetic media improves 30%
10 years = 100:1
20 years = 10,000:1
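The 10-year and 20-year ratios can be checked against the "doubles every 18 months" version of Moore's Law. The sketch below is a small compounding calculation; the function name and the default doubling period argument are assumptions for the example.

```python
def growth_factor(years, doubling_period_years=1.5):
    """Improvement factor after `years` if the quantity doubles every period."""
    return 2 ** (years / doubling_period_years)


if __name__ == "__main__":
    for years in (10, 20):
        print(f"{years} years: about {growth_factor(years):,.0f}:1")
```

With an 18-month doubling period the factors come out slightly above 100 and 10,000, consistent with the rounded planning figures.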
28
Moore's Law and System Design
                   Design system   Production use   Withdrawn from production
                   2006            2009             2019
Processor speeds   1               1.9              28
Memory sizes       1               1.9              28
Disk capacity      1               2.2              51
System cost        1               0.4              0.01
29
Moore's Law Example
Will this be a typical personal computer?
             2007        2019
Processor    2 GHz       50 GHz
Memory       512 MB      14 GB
Disk         50 GB       2 TB
Network      100 Mb/s    1 Gb/s
Surely there will be some fundamental changes in how this power is packaged and used.
30
Parkinson's Law
Original: Work expands to fill the time available. (C. Northcote Parkinson)
Planning assumptions:
  (a) Demand will expand to use all the hardware available.
  (b) Low prices will create new demands.
  (c) Your software will be used on equipment that you have not envisioned.
31
False Assumptions from the Past
Unix file system will never exceed 2 Gbytes (2^31 bytes).
AppleTalk networks will never have more than 256 hosts (2^8, i.e., 8 bits).
GPS software will not last 1024 weeks.
Nobody at Dartmouth will ever earn more than $10,000 per month.
etc., etc., .....
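Several of these limits come from fixed-width fields. The sketch below shows the arithmetic behind three of them; the function names are hypothetical, and attributing the 2 Gbyte limit to a signed 32-bit file offset is my reading of the usual cause rather than something stated in the lecture.

```python
def distinct_values(bits):
    """Number of distinct values representable in `bits` bits."""
    return 2 ** bits


def gps_broadcast_week(weeks_since_epoch, bits=10):
    """Week number as carried in a fixed-width counter; wraps to 0 on overflow."""
    return weeks_since_epoch % distinct_values(bits)


if __name__ == "__main__":
    print(distinct_values(31))   # bytes reachable with a signed 32-bit offset (assumed cause)
    print(distinct_values(8))    # hosts addressable with 8 bits
    print(gps_broadcast_week(1023), gps_broadcast_week(1024))
    print(distinct_values(10) * 7 / 365.25)  # years until a 10-bit week counter wraps
```

A 10-bit week counter wraps after a little under 20 years, which is well within the service life of much deployed software.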
32
Moore's Law and the Long Term
[Figure: timeline running from 1965 to 2005, annotated "What level?"]
33
Moore's Law and the Long Term
[Figure: timeline starting in 1965, annotated "What level?", "Within your working life?", "When?", and "2006?"]