Grid Computing at Texas Tech University using SAS

1
Grid Computing at Texas Tech University using SAS
  • Ron Bremer
  • Jerry Perez
  • Phil Smith
  • Peter Westfall
  • Director, Center for Advanced Analytics and
    Business Intelligence
  • Texas Tech University

2
What is Grid Computing?
  • Grid computing means using multiple resources
    connected by the net to perform demanding
    calculations.
  • Example

3
Economies of High Performance Computing
  • Current fastest machine: ~40 Teraflops ($300M)
  • 10 Tflops machines: ~$50M
  • Fastest cluster at TTU: 0.1 Tflops ($0.1M)
  • Speed of a PC: 0.003 Tflops ($0.001M)
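The figures on this slide can be compared as cost per Tflops. The following is a minimal sketch, assuming the parenthesized numbers are millions of dollars; the function and variable names are illustrative and not from the presentation.

```python
# Figures as listed on the slide: (label, speed in Tflops, cost in $M).
# Assumption: the parenthesized slide values are millions of dollars.
machines = [
    ("Current fastest machine", 40.0, 300.0),
    ("10 Tflops machine", 10.0, 50.0),
    ("Fastest cluster at TTU", 0.1, 0.1),
    ("Single PC", 0.003, 0.001),
]


def cost_per_tflops(speed_tflops, cost_musd):
    """Return the cost in $M for each Tflops of capacity."""
    return cost_musd / speed_tflops


ratios = {name: cost_per_tflops(speed, cost) for name, speed, cost in machines}

for name, ratio in ratios.items():
    print(f"{name:<26} {ratio:6.2f} $M per Tflops")
```

On these numbers a commodity PC is the cheapest source of raw Tflops, which is the economic argument for harvesting idle PCs.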

4
Underused Resources
  • Computers are everywhere, mostly idle!
  • Grid computing leverages unused resources to
    create an effective Supercomputer
  • Teraflops = (N computers) x (Tflops per computer)
  • For Free! (Almost)
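As a rough illustration of the formula above, the sketch below multiplies a machine count by the per-PC speed quoted on the previous slide (0.003 Tflops). The 100-machine count is the student lab described later in the deck. The `efficiency` parameter is a hypothetical addition to acknowledge that real grids lose some capacity to communication and scheduling; the presentation gives no value for it.

```python
import math


def aggregate_tflops(n_computers, tflops_per_computer, efficiency=1.0):
    """Theoretical grid capacity: N computers x Tflops per computer.

    `efficiency` (0..1) is a hypothetical fudge factor, not from the slides.
    """
    return n_computers * tflops_per_computer * efficiency


PC_TFLOPS = 0.003  # per-PC speed quoted in the deck

# A 100-machine lab at the quoted per-PC speed.
lab_tflops = aggregate_tflops(100, PC_TFLOPS)

# Number of such PCs needed to match a 40 Tflops machine at ideal efficiency.
pcs_for_40_tflops = math.ceil(40.0 / PC_TFLOPS)

print(f"100-PC lab: {lab_tflops:.3f} Tflops")
print(f"PCs to reach 40 Tflops: {pcs_for_40_tflops}")
```

This is an upper bound only: it assumes every machine is idle and that the workload splits into fully independent pieces.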

5
Grid Initiatives at TTU and in Texas
  • HipCAT: High Performance Computing Across Texas
  • TIGRE: Texas Internet Grid for Research and
    Education
  • SORCER: Service-ORiented Computing EnviRonment
    (TTU CS dept.)
  • SAS/Connect grid

6
HipCAT
  • Consortium of Texas institutions working together
    to use:
      • High performance computing
      • Clusters
      • Massive data storage
      • Scientific visualization
      • Grid computing
  • Director: Phil Smith, Texas Tech University
  • Members:
      • Baylor College of Medicine
      • Rice University
      • Texas A&M University
      • Texas Tech University
      • University of Houston
      • University of Texas
      • University of Texas at Austin
      • University of Texas at Arlington
      • University of Texas at El Paso
      • University of Texas Southwestern Medical Center

7
TIGRE
  • Texas Internet Grid for Research and Education
  • Two-year project involving UT, TTU, UH, Rice,
    and TAMU
  • Funding announced by the Governor in September
  • TIGRE will develop a grid software stack and
    policies and procedures to facilitate Texas grid
    computing efforts.

8
Grid Software Products Used at TTU
  • AVAKI
  • Globus
  • Jini Networking Technology
  • SAS/Connect (MPConnect), Distribute macro

9
Benefits of SAS
  • Ease of Use (relative to other grid products)
  • Available and applicable for many scientists in
    their respective fields
  • Flexibility:
      • Database (DATA step, PROC SQL)
      • Math/Optimization (SAS/IML, SAS/OR)
      • Stat (SAS/STAT, SAS/ETS)

10
Problems Amenable to SAS Grid
  • Replicates of a fundamental task
  • Fundamental tasks are time consuming, lots of
    replicates
  • Examples:
      • Simulation
      • Astrophysics
      • Bioinformatics
      • Ensembles of predictive models

11
Success Story
  • Financial Event Studies
  • Developed simulation tool to detect events
  • Simulated its performance
  • 25 hours finished in 40 minutes
  • Published in J. Fin. Econometrics
  • Old system: Sneaker grid

12
Another Success Story: Portfolio Analysis
  • 300 portfolios, 50 securities each by randomly
    sampling securities from CRSP daily database
    (7.23 Gigabytes)
  • 15 models created for each of 50 securities (PROC
    AUTOREG of SAS/ETS), under 169 treatment
    settings.
  • 126,750 models and associated data steps per
    portfolio.
  • 500 days of continuous computing time reduced to
    two weeks.
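The counts and timings quoted in the two success stories can be checked with simple arithmetic. The sketch below is illustrative only: it assumes "two weeks" means 14 days, and the variable names are not from the presentation.

```python
# Portfolio analysis: model count per portfolio, as stated on the slide.
models_per_security = 15
securities_per_portfolio = 50
treatment_settings = 169
portfolios = 300

models_per_portfolio = (
    models_per_security * securities_per_portfolio * treatment_settings
)
total_models = models_per_portfolio * portfolios


def speedup(serial_time, grid_time):
    """Ratio of single-machine time to grid time (same units for both)."""
    return serial_time / grid_time


# Event study: 25 hours reduced to 40 minutes (both in minutes).
event_study_speedup = speedup(25 * 60, 40)

# Portfolio analysis: 500 days reduced to two weeks (assumed to be 14 days).
portfolio_speedup = speedup(500, 14)

print(f"Models per portfolio: {models_per_portfolio:,}")
print(f"Models over all portfolios: {total_models:,}")
print(f"Event study speedup: {event_study_speedup:.1f}x")
print(f"Portfolio analysis speedup: {portfolio_speedup:.1f}x")
```

The product 15 x 50 x 169 reproduces the 126,750 models per portfolio stated on the slide, and both stories work out to a speedup of roughly 35x to 38x.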

13
Notoriety
  • Web articles appeared in SAS, Grid today,
    Next-Gen Data forum
  • Interviewed by DataBase Trends and Applications

14
SAS Grid Structure
  • Client connects to host machines
  • Client sends replicates of fundamental task
    (chunks) to hosts
  • Hosts process chunks, send back to client
  • Client combines chunks and summarizes

15
The SAS Grid

16
SAS Farm
  • 100 SAS machines in student lab
  • 2.66 GHz per node
  • All have SAS software installed
  • SAS Spawner must be started on all
  • Avaki also installed - diagnoses problems

17
Student Lab

18
Load Balancing
  • Automatically supports load balancing by farming
    out independent tasks to the next available
    resource.
  • Students never noticed that their machines were
    being used!
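Farming tasks out to the next available resource can be sketched with a shared work queue: each host takes a new task as soon as it finishes the previous one, so faster or less busy hosts naturally do more. This is a generic Python illustration with threads as stand-in hosts, not a description of how SAS implements it; `run_balanced` and the task durations are hypothetical.

```python
import queue
import threading
import time


def run_balanced(task_durations, n_hosts=3):
    """Let each host pull the next task when free.

    Returns a dict mapping host id to the list of task ids it completed.
    """
    tasks = queue.Queue()
    for task_id, duration in enumerate(task_durations):
        tasks.put((task_id, duration))

    # Each host appends only to its own list, so no lock is needed.
    assignments = {host_id: [] for host_id in range(n_hosts)}

    def host(host_id):
        while True:
            try:
                task_id, duration = tasks.get_nowait()
            except queue.Empty:
                return  # no work left for this host
            time.sleep(duration)  # stand-in for the real computation
            assignments[host_id].append(task_id)

    threads = [threading.Thread(target=host, args=(h,)) for h in range(n_hosts)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return assignments


# One long task and several short ones (durations in seconds).
assignments = run_balanced([0.05, 0.01, 0.01, 0.01, 0.01, 0.01])
completed = sorted(t for done in assignments.values() for t in done)
print(assignments)
```

Which host ends up with which task depends on timing, so only the set of completed tasks is deterministic.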

19
Simulation-Based Methods
  • PROC MULTTEST of SAS/STAT (first hard-coded
    bootstrap?)

20
Simulation-Based Methods, II
  • ADJUST=SIMULATE in GLM and MIXED
  • Posterior simulation in MIXED

21
Toy Example: Testing Random Number Generators
  • Random number generators often fail to provide
    independent numbers.
  • Test case: U1, U2 are Uniform on (0,1).
  • If independent, then E[6(U1-U2)^2] = 1.00.
  • Check: Generate many pairs, report average
    (should be 1.000000)
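The expected value follows from Var(U) = 1/12 for a Uniform(0,1) variable: for independent U1 and U2, E[(U1-U2)^2] = 2/12 = 1/6, so multiplying by 6 gives 1. The sketch below runs the check in Python using the standard library generator; it is not the code from the presentation's "Code" slide, which is not reproduced in this transcript. The dependent pair (U2 = 1 - U1) is an added illustration of what a failure looks like: its average tends to 2 rather than 1.

```python
import random


def average_statistic(n_pairs, draw_pair):
    """Average of 6*(U1-U2)^2 over n_pairs pairs produced by draw_pair()."""
    total = 0.0
    for _ in range(n_pairs):
        u1, u2 = draw_pair()
        total += 6.0 * (u1 - u2) ** 2
    return total / n_pairs


rng = random.Random(2024)  # fixed seed so the run is repeatable


def independent_pair():
    """Two consecutive draws from the generator under test."""
    return rng.random(), rng.random()


def dependent_pair():
    """Deliberately dependent pair, U2 = 1 - U1 (illustration only)."""
    u1 = rng.random()
    return u1, 1.0 - u1


independent_avg = average_statistic(200_000, independent_pair)
dependent_avg = average_statistic(200_000, dependent_pair)

print(f"Independent pairs: {independent_avg:.6f}  (expect about 1)")
print(f"Dependent pairs:   {dependent_avg:.6f}  (expect about 2)")
```

With a finite number of pairs the average only approximates 1; more pairs tighten the estimate, which is what makes the check a natural candidate for splitting across grid hosts.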

22
Code

23
Results

24
Startup (Windows)
1. Start Spawner:
   C:\Program Files\SAS\SAS 9.1> spawner -i -comamid tcp
2. Activate Spawner
3. Set batch log in permissions

25
The Distribute Macro
  • Written by Cheryl Doninger and Randy Tobias
  • File: http://support.sas.com/rnd/scalability/papers/distribute.zip
  • Supporting document:
    http://support.sas.com/rnd/scalability/papers/distConnect0401.pdf

26
Problems We Have Experienced
  • Random crashes (client as well as hosts)
  • Diagnosing errors
  • I/O problems
  • Windows Service Pack 2 Firewall
  • Social issues (grid involves people!)

27
Future Plans
  • Support from business and government:
      • Grid-enabled bioinformatics
      • Business intelligence/data mining
  • Support HPC at TTU and in Texas