Slide 1: SINGA: Putting Deep Learning into the Hands of Multimedia Users
http://singa.apache.org/
Wei Wang, Gang Chen, Tien Tuan Anh Dinh, Jinyang Gao, Beng Chin Ooi, Kian-Lee Tan, and Sheng Wang
Slide 2: Outline
- Introduction
- Multimedia data and applications
- Motivations
- Deep learning models and training, and design principles
- SINGA
- Usability
- Scalability
- Implementation
- Experiments
Slide 3: Introduction
Multimedia data and deep learning startups:
- Audio: VocalIQ (acquired by Apple)
- Image/video: Madbits (acquired by Twitter), Perceptio (acquired by Apple), LookFlow (acquired by Yahoo! Flickr), Deepomatic (e-commerce product search), Descartes Labs (satellite images), Clarifai (tagging)
- Text: AlchemyAPI (acquired by IBM), Semantria (NLP tasks, >10 languages), Idibon, ParallelDots
Deep learning has been noted for its effectiveness for multimedia applications!
Slide 4: Motivations
Model categories:
- Feedforward models (CNN, MLP, auto-encoder): image/video classification
CNN references: Krizhevsky, Sutskever, and Hinton, 2012; Szegedy et al., 2014; Simonyan and Zisserman, 2014a
Slide 5: Motivations
Model categories:
- Feedforward models (CNN, MLP, auto-encoder): image/video classification
- Energy models (DBN, RBM, DBM): speech recognition
DBN/RBM reference: Dahl et al., 2012
Slide 6: Motivations
Model categories:
- Feedforward models (CNN, MLP, auto-encoder): image/video classification
- Energy models (DBN, RBM, DBM): speech recognition
- Recurrent models (RNN, LSTM, GRU): natural language processing
RNN references: Mikolov et al., 2010; Cho et al., 2014
Slide 7: Motivations
Model categories:
- Feedforward models (CNN, MLP, auto-encoder): image/video classification
- Energy models (DBN, RBM, DBM): speech recognition
- Recurrent models (RNN, LSTM, GRU): natural language processing
Design Goal I (Usability): easy to implement various models
Slide 8: Motivations: Training Process
- Training process
  - Update model parameters to minimize prediction error
- Training algorithm
  - Mini-batch Stochastic Gradient Descent (SGD)
- Training time
  - (time per SGD iteration) x (number of SGD iterations)
  - Long time to train large models over large datasets, e.g., 2 weeks for training OverFeat (Sermanet et al.), as reported by Intel (https://software.intel.com/sites/default/files/managed/74/15/SPCS008.pdf)
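For concreteness, the SGD update and the training-time decomposition can be written out explicitly; the symbols below are introduced here and do not appear on the slide:

  \theta_{t+1} = \theta_t - \frac{\eta}{|B_t|} \sum_{(x,y) \in B_t} \nabla_\theta \, \ell(f_\theta(x), y)

  T_{\text{train}} = t_{\text{iter}} \times n_{\text{iter}}

where \eta is the learning rate, B_t the mini-batch at iteration t, t_{\text{iter}} the time per SGD iteration, and n_{\text{iter}} the number of iterations.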
Slide 9: Motivations: Distributed Training Frameworks
- Synchronous training (Google Sandblaster, Dean et al., 2012; Baidu AllReduce, Wu et al., 2015)
  - Reduces the time per iteration
  - Scalable for a single node with multiple GPUs
  - Cannot scale to large clusters
- Asynchronous training (Google Downpour, Dean et al., 2012; Hogwild!, Recht et al., 2011)
  - Reduces the number of iterations per machine
  - Scalable for big clusters of commodity machines (CPU)
  - Not stable
- Hybrid frameworks
Design Goal II (Scalability): not just flexible, but also efficient and adaptive enough to run different training frameworks
Slide 10: SINGA: A Distributed Deep Learning Platform
Slide 11: Usability: Abstraction
[Figure: core abstractions -- NeuralNet, Layer, and the TrainOneBatch loop with a stop condition]

class Layer {
  vector<Blob> data, grad;
  vector<Param> param;
  ...
  void Setup(LayerProto conf, vector<Layer> src);
  void ComputeFeature(int flag, vector<Layer> src);
  void ComputeGradient(int flag, vector<Layer> src);
};
Driver::RegisterLayer<FooLayer>("Foo");  // register new layers

Layer categories:
- Input layers: load raw data (and labels)
- Output layers: output features (and prediction results)
- Neuron layers: transform features, e.g., convolution and pooling
- Loss layers: measure training loss, e.g., cross-entropy loss
- Connection layers: connect layers when the neural net is partitioned
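As a minimal sketch of how a user-defined layer would plug into the abstraction above (FooLayer and the method bodies are illustrative assumptions continuing the class shown on this slide, not actual SINGA code):

// Hypothetical user-defined layer built on the Layer class above;
// method bodies are placeholders, not the real SINGA implementation.
class FooLayer : public Layer {
 public:
  void Setup(LayerProto conf, vector<Layer> src) {
    // allocate data/grad blobs and Param objects from conf and src shapes
  }
  void ComputeFeature(int flag, vector<Layer> src) {
    // read features of the src layers, write the transformed features into data
  }
  void ComputeGradient(int flag, vector<Layer> src) {
    // compute gradients w.r.t. param and w.r.t. the src layers' features
  }
};

Driver::RegisterLayer<FooLayer>("Foo");  // makes "Foo" usable in job configurations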
Slide 12: Usability: Neural Net Representation
[Figure: a NeuralNet is composed of Layer objects and trained by the TrainOneBatch loop until a stop condition is met]
Slide 13: Usability: TrainOneBatch
[Figure: a NeuralNet with input, hidden, and loss layers (labels fed to the loss layer); TrainOneBatch iterates until a stop condition is met]
Training algorithms per model category:
- Feedforward models (e.g., CNN): Back-Propagation (BP)
- RBM: Contrastive Divergence (CD)
- RNN
Just need to override the TrainOneBatch function to implement other algorithms (see the sketch below)!
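To illustrate how TrainOneBatch composes the Layer API, here is a rough BP-style sketch; the stand-in types, the layers()/srclayers() accessors, and the kForward/kBackward flags are assumptions made for this example, not the exact SINGA interfaces:

#include <vector>
using std::vector;

enum Flag { kForward, kBackward };

// Toy stand-ins for the Layer and NeuralNet abstractions from slide 11.
class Layer {
 public:
  virtual ~Layer() {}
  virtual void ComputeFeature(int flag, const vector<Layer*>& src) = 0;
  virtual void ComputeGradient(int flag, const vector<Layer*>& src) = 0;
  const vector<Layer*>& srclayers() const { return src_; }
 protected:
  vector<Layer*> src_;
};

class NeuralNet {
 public:
  const vector<Layer*>& layers() const { return layers_; }  // topological order
 private:
  vector<Layer*> layers_;
};

// Hypothetical BP-style TrainOneBatch.
void TrainOneBatchBP(NeuralNet* net) {
  // forward pass: compute features layer by layer
  for (Layer* layer : net->layers())
    layer->ComputeFeature(kForward, layer->srclayers());
  // backward pass: compute gradients in reverse (output-to-input) order
  const vector<Layer*>& layers = net->layers();
  for (auto it = layers.rbegin(); it != layers.rend(); ++it)
    (*it)->ComputeGradient(kBackward, (*it)->srclayers());
  // the resulting parameter gradients would then go to the servers for the SGD update
}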
Slide 14: Scalability: Partitioning for Distributed Training
- NeuralNet partitioning strategies:
  1. Partition layers into different subsets.
  2. Partition each single layer on the batch dimension.
  3. Partition each single layer on the feature dimension.
  4. Hybrid partitioning combining 1, 2, and 3.
Users just need to CONFIGURE the partitioning scheme and SINGA takes care of the real work (e.g., slicing and connecting layers); see the sketch after this slide.
[Figure: the partitioning schemes, with layers or layer slices assigned to Worker 1 and Worker 2]
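The arithmetic below (illustrative only, not SINGA code) shows what schemes 2 and 3 mean for one layer whose data blob is batch x feature, split over two workers:

#include <cstdio>

int main() {
  const int batch = 256, feature = 1024, workers = 2;

  // Scheme 2 (batch dimension): each worker holds a slice of the examples
  // but the full feature vector, so workers stay independent per layer.
  std::printf("batch partition  : %d x %d per worker\n",
              batch / workers, feature);

  // Scheme 3 (feature dimension): each worker holds all examples but only a
  // slice of each feature vector; connection layers must exchange the
  // remaining features between workers at layer boundaries.
  std::printf("feature partition: %d x %d per worker\n",
              batch, feature / workers);
  std::printf("features fetched from the other worker per example: %d\n",
              feature - feature / workers);
  return 0;
}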
Slide 15: Scalability: Training Frameworks
[Figure: training framework topology (legend omitted)]
Synchronous training cannot scale to a large group size.
Slide 16: Scalability: Training Frameworks
[Figure: training framework topology (legend omitted)]
Communication is the bottleneck!
Slide 17: Scalability: Training Frameworks
[Figure: synchronous and asynchronous framework topologies (legend omitted)]
SINGA is able to configure most known frameworks.
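One way to picture this configurability is through the cluster topology described in the SINGA paper; the field names and the group semantics below should be treated as assumptions for illustration rather than the exact configuration format:

// Illustrative sketch: how worker/server group counts select a framework.
struct ClusterTopology {
  int nworker_groups;      // >1 worker groups  -> asynchronous training across groups
  int nserver_groups;      // >1 server groups  -> distributed parameter servers
  int nworkers_per_group;  // >1 workers/group  -> synchronous training inside a group
  int nservers_per_group;
};

// Example settings (hypothetical):
//   {1, 1, 4, 1} : one group of 4 workers -> purely synchronous (Sandblaster/AllReduce style)
//   {4, 1, 1, 1} : 4 groups of 1 worker   -> purely asynchronous (Downpour/Hogwild! style)
//   {2, 1, 2, 1} : 2 groups of 2 workers  -> hybrid framework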
Slide 18: Implementation
[Figure: SINGA software stack (legend omitted)]
Slide 19: Deep Learning as a Service (DLaaS): SINGA's Rafiki
[Architecture figure: third-party apps (web, mobile, ...) call an API and developers use a browser GUI; HTTP requests reach the Rafiki Server, which handles user, job, model, and node management, is backed by a database and a file storage system (e.g., HDFS), and routes requests (load balancing) to Rafiki Agents; each agent drives a SINGA instance through Timon (a C wrapper)]
Goals:
1. To improve the usability of SINGA.
2. To level the playing field by taking care of complex system plumbing work, and its reliability, efficiency, and scalability.
Slide 20: Comparison: Features of the Systems
(MXNet on 28/09/15)

Feature                                         | SINGA       | Caffe | CXXNET | cuda-convnet | H2O
Deep learning models: feed-forward (CNN)        | yes         | yes   | yes    | yes          | MLP
Deep learning models: energy model (RBM)        | yes         | no    | no     | no           | no
Deep learning models: recurrent networks (RNN)  | yes         | yes   | no     | no           | no
Distributed training frameworks: synchronous    | yes         | yes   | yes    | yes          | yes
Distributed training frameworks: asynchronous   | yes         | yes   | no     | no           | no
Distributed training frameworks: hybrid         | yes         | no    | no     | no           | no
Hardware: CPU                                   | yes         | yes   | yes    | no           | yes
Hardware: GPU                                   | V0.2.0      | yes   | yes    | yes          | no
Cloud software: HDFS                            | yes         | no    | no     | no           | yes
Cloud software: resource management             | yes         | no    | no     | no           | yes
Cloud software: virtualization                  | yes         | no    | no     | no           | yes
Binding: Python (P), Matlab (M), R              | ongoing (P) | P, M  | P      | P            | P, R

Comparison with other open source projects.
Slide 21: Experiment: Usability
- Used SINGA to train three well-known models and verify the results
Models: RBM and deep auto-encoders
Hinton, G. E. and Salakhutdinov, R. R. (2006). Reducing the dimensionality of data with neural networks. Science, Vol. 313, no. 5786, pp. 504-507, 28 July 2006.
Slide 22: Experiment: Usability
Model: deep multi-modal neural network (CNN + MLP)
W. Wang, X. Yang, B. C. Ooi, D. Zhang, Y. Zhuang. Effective Deep Learning Based Multi-Modal Retrieval. VLDB Journal, special issue of VLDB'14 best papers, 2015.
W. Wang, B. C. Ooi, X. Yang, D. Zhang, Y. Zhuang. Effective Multi-Modal Retrieval Based on Stacked Auto-Encoders. Int'l Conference on Very Large Data Bases (VLDB), 2014.
Slide 23: Experiment: Usability
Model: RNN language model
Mikolov Tomáš, Karafiát Martin, Burget Lukáš, Černocký Jan, Khudanpur Sanjeev. Recurrent neural network based language model. INTERSPEECH 2010, Makuhari, Chiba, JP.
Slide 24: Experiment: Efficiency and Scalability
Train a DCNN over CIFAR-10 (https://code.google.com/p/cuda-convnet)
- Cluster
  - Quad-core Intel Xeon 3.1 GHz CPU and 8 GB memory per node, 1 Gbps switch
  - 32 nodes, 4 workers per node
- Single node
  - 4 NUMA nodes (Intel Xeon 7540, 2.0 GHz), 6 cores each, hyper-threading enabled
  - 500 GB memory
[Figure: synchronous training results; Caffe on a GTX 970 as a baseline]
Slide 25: Experiment: Scalability
Train a DCNN over CIFAR-10 (https://code.google.com/p/cuda-convnet)
[Figure: asynchronous training, Caffe vs. SINGA]
Slide 26: Conclusions
- Programming model, abstraction, and system architecture
  - Easy to implement different models
  - Flexible and efficient to run different frameworks
- Experiments
  - Trained models from different categories
  - Scalability tests for different training frameworks
- SINGA
  - Usable, extensible, efficient, and scalable
  - Apache SINGA v0.1.0 has been released
  - v0.2.0 (with GPU-CPU support, DLaaS, and more features) out next month
  - Being used for healthcare analytics, product search, ...
Slide 27: Thank You!
Acknowledgements: the Apache SINGA team (ASF mentors, contributors, committers, and users) and funding agencies (NRF, MOE, A*STAR).