Title: TPB Models Development Status Report
Slide 1: TPB Models Development Status Report
- Presentation to the Travel Forecasting Subcommittee
- Ron Milone
- National Capital Region Transportation Planning Board (TPB)
- January 23, 2009
- File: tfs_2009-01-23_modelsDevStatus_cubeCluster5.ppt
Slide 2: Version 2.3 Model Development
- Activities in motion since the last meeting:
- The 2007/8 household travel survey file has been cleaned
- The updated (3,700-zone) TAZ system remains under review
- The project for improving the TPB's use of GIS technology to facilitate network development remains in progress
- Approaches for reducing Version 2.3 model execution times have been explored
Slide 3: Speeding up model executions
- Four approaches identified:
- Faster hardware?
- New traffic assignment solution algorithms under development by Citilabs, Inc.?
- Decrease the number of speed feedback iterations?
- Implement the distributed processing (DP) capability that currently exists in Cube Voyager?
Slide 4: Reducing Speed Feedback Iterations in Version 2.3
- Investigation: Pre-existing model outputs summarized by iteration (year 2030)
- VMT by facility type
- Total transit trips by trip purpose
Slide 5: Version 2.3 VMT by Iteration (chart)
Slide 6: Version 2.3 Transit Trips by Iteration (chart)
Slide 7: Conclusions based on iteration summaries
- Global metric summaries indicate that the results of iterations 3-4 are not substantially different from those of iteration 6
- Finer-detail iteration summaries should be analyzed (e.g., screenline, link level, possibly i/j level)
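One way to quantify "not substantially different" is to compute the percent difference of a global metric against the final iteration. The sketch below uses invented VMT totals purely for illustration; the actual summaries are the charts on slides 5-6:

```python
def pct_diff(a, b):
    """Percent difference of a relative to b."""
    return abs(a - b) / b * 100

# Hypothetical year-2030 regional VMT by speed-feedback iteration
# (illustrative numbers only, not actual model output)
vmt = {3: 182.1e6, 4: 181.6e6, 5: 181.4e6, 6: 181.3e6}

# Compare each earlier iteration against the final (6th) iteration
for it in (3, 4, 5):
    print(f"iteration {it}: {pct_diff(vmt[it], vmt[6]):.2f}% off final")
```

With differences well under 1%, stopping the feedback loop at iteration 3 or 4 would be defensible at the regional level; the finer-detail checks above would confirm whether that holds at the screenline and link level.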
Slide 8: Implementing Distributed Processing
- Current hardware/software specifications used by TPB
- Preliminary DP work: Identifying the most time-consuming modeling steps
- Overview of key points regarding the deployment of DP with Cube Voyager (a.k.a. Cube Cluster)
- Experience gained by TPB thus far
Slide 9: Travel model server specifications
- Hardware (cogtms002)
- High-end workstation (not a true server)
- Two CPUs, each with four cores = 8 cores
- Intel Xeon X5365; chip speed 3.0 GHz
- System bus 1,333 MHz; L2 cache 8 MB per CPU
- Memory: 4 GB RAM
- Hard drive: 2.27 TB direct-attach storage array
- Software
- Server operating system, so that the computer can operate like a server (Windows Server 2003)
- Citilabs Cube Base 5.0.2; Citilabs Cube Voyager 5.0.2
- Note: We have moved from TP+ 4.1.1 to Voyager 5.0.2 for the Ver. 2.3 model
- The server (cogtms002) is shared by 4-6 modelers
Slide 10: Cube Cluster: Preliminary work
- Identify the most time-consuming modeling steps (Ver. 2.3 model)
- Model execution time by iteration (cogtms002)
Slide 11: Cube Cluster: Preliminary work (continued)
- Model execution time by modeling step (iteration 6)
- [Chart: execution time in minutes by model step]
Slide 12: Cube Cluster: Overview
- Spread the computing load across:
- Multiple computers connected via a LAN, or
- Multiple CPUs within one computer, or multiple cores within a CPU or a set of CPUs in a computer (the current approach being tested by TPB staff), or
- Both
- Each processor or core is referred to as a node
- There is generally a main process and one or more sub-processes
- Cube Cluster works with Voyager, not with TP+
- Two flavors of distributed processing in Voyager:
- Intra-step distributed processing (IDP)
- Multi-step distributed processing (MDP)
Slide 13: Cube Cluster: Overview (continued)
- Intra-step distributed processing (IDP)
- IDP breaks up zone-based processing of vectors or matrices into zone groups that can be processed concurrently on multiple computing nodes
- Works for only two modules: MATRIX and HIGHWAY
- Multi-step distributed processing (MDP)
- More general than IDP
- Can be used to break up processing conducted by any module in Voyager, as well as any user-written program (e.g., Fortran)
- Caveat: the distributed blocks and the mainline process must be logically independent of each other
- For example, you cannot run path-skimming procedures before you update the speeds on the network that will be skimmed. However, you can assign peak and off-peak networks concurrently in most models, since these steps are generally independent of each other.
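The IDP mechanism (splitting zone-based work into groups processed concurrently, then combining the results) can be sketched in Python. This is a hypothetical illustration, not Cube's implementation: real IDP hands zone groups to separate Voyager sub-processes, while this sketch uses threads and a dummy work function for simplicity:

```python
from concurrent.futures import ThreadPoolExecutor

def process_zone_group(zones):
    # Stand-in for the zone-based matrix work Cube would do for the
    # origin zones in this group (skim building, assignment, etc.).
    # Here we just return a dummy per-group total.
    return sum(z * z for z in zones)

def run_idp_style(num_zones, num_nodes):
    # Split the zone list into contiguous groups, one per node,
    # process the groups concurrently, and combine the results.
    step = -(-num_zones // num_nodes)  # ceiling division
    groups = [range(i, min(i + step, num_zones))
              for i in range(0, num_zones, step)]
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        return sum(pool.map(process_zone_group, groups))

# 3,700 zones (the updated TAZ system) split across 4 nodes
result = run_idp_style(3700, 4)
```

The zone-independence caveat shows up directly in the sketch: `process_zone_group` must not depend on results from any other group, or the concurrent runs would race.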
Slide 14: Cube Cluster: Key points
- Because of the zone-independence requirement on IDP and the step-independence requirement on MDP, implementing DP requires careful planning and setup by the user
- Cube Cluster has limited error-handling capabilities
- It uses a file-based signaling method to communicate between the main process and the sub-process(es)
- If a sub-process crashes, the main process will wait indefinitely
- Best to use DP on a model that has been cleaned of syntax errors
- In general, DP works well for computationally intensive applications (e.g., doing hundreds of matrix computations for each zone in a mode choice step), but will yield less time savings for disk-intensive procedures (e.g., combining 3 matrix files into one matrix file)
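The file-based signaling, and why a crashed sub-process hangs the run, can be illustrated with a small Python sketch. The flag-file names and polling scheme here are invented for illustration; they are not Cube Cluster's actual protocol:

```python
import os
import time
import tempfile

def wait_for_done(done_path, poll=0.1, timeout=None):
    # The main process polls for a "done" flag file written by a
    # sub-process. With timeout=None this waits forever, which is why
    # a crashed sub-process (which never writes the file) stalls the
    # whole model run.
    waited = 0.0
    while not os.path.exists(done_path):
        if timeout is not None and waited >= timeout:
            return False
        time.sleep(poll)
        waited += poll
    return True

with tempfile.TemporaryDirectory() as d:
    # Simulate a sub-process that finished: its flag file exists.
    flag = os.path.join(d, "node1.done")
    open(flag, "w").close()
    print(wait_for_done(flag, timeout=1.0))    # True

    # Simulate a crashed sub-process: the flag file never appears.
    missing = os.path.join(d, "node2.done")
    print(wait_for_done(missing, timeout=0.5))  # False (timed out)
```

Cube Cluster itself has no timeout, so the practical advice on this slide follows: debug the script single-threaded first, then turn DP on.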
Slide 15: Cube Cluster: TPB experience
- AECOM did some work in this area while under contract with us
- The work is undocumented, but we have reviewed some of it
- Lesson: Things can get complicated
- TPB staff:
- Tested IDP (for MATRIX and HIGHWAY) on the highway assignment step, since this is the most time-consuming step
Slide 16: Cube Cluster: TPB experience (Adding code for DP)
- Global control of DP options (in your script):
- DISTRIBUTE INTRASTEP=T MULTISTEP=F
- Initiate IDP of the current MATRIX or HIGHWAY step (in your script):
- DistributeINTRASTEP ProcessID='mwcog', ProcessList=1-4
- Open up one or more cluster nodes:
- Interactively: Cube Base > Utilities > Cluster Node Management
- Command line:
- Voyager: start Voyager.exe mwcog1.script /wait -Pvoya
- Cluster utility: start Cluster ProcID ProcList Start/Close/List Exit
- These commands would generally be in the batch file used to launch your script. Up to now, we have used the interactive approach and the Voyager command.
Slide 17: Cube Cluster: TPB experience
- Highway assignment: Running IDP with 3-4 sub-processes results in a 50% time savings
- 83 minutes should now take about 42 min.
- Time savings:
- 42 x 7 = 294 min (4.9 hours)
- So 18.5 hours becomes 13.6 hours (a 25% savings)
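The savings arithmetic on this slide checks out; the sketch below simply makes the slide's own calculation explicit, using the numbers quoted above:

```python
# Highway assignment per speed-feedback iteration: 83 min before IDP,
# about 42 min with 3-4 sub-processes (the ~50% savings on the slide)
saved_per_iter_min = 42          # the slide rounds the saving to 42 min
iterations = 7

total_saved_min = saved_per_iter_min * iterations
print(total_saved_min / 60)      # 4.9 hours

run_before_hr = 18.5             # full model run before IDP
run_after_hr = run_before_hr - total_saved_min / 60
print(round(run_after_hr, 1))    # 13.6 hours
print(round(total_saved_min / (run_before_hr * 60) * 100))  # 26 (~25%)
```

The exact figure is closer to 26%; the slide's 25% is a reasonable round number given that the per-iteration saving is itself an estimate.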
Slide 18: Conclusions
- Running time can be reduced by:
- Reducing the number of speed feedback iterations
- Implementing DP
- New hardware (?): We need to use benchmarks to assess speed, not simply the rated speed of the CPU in GHz
- Future steps:
- Optimize the use of IDP and/or MDP in executing traffic assignment and other steps (mode choice and transit fare building)
- Investigate improved traffic assignment algorithms
- These methods reach a higher level of convergence in fewer iterations, so they have the potential to save us time