1
Linear Prediction Filters and Neural Networks
  • Paul O'Brien
  • ESS 265, UCLA
  • February 8, 1999

2
Outline
  • Linear Prediction Filters
  • Uses
  • System Description
  • Time Evolution
  • Green's Functions
  • Construction
  • Filter Analysis
  • Examples
  • Dst-ΔH
  • Dst-VBs
  • Neural Networks
  • Uses
  • System Description
  • Time Evolution
  • Theory
  • Training
  • Network Analysis
  • Examples
  • Dst-ΔH
  • Dst-VBs

3
Linear Prediction Filter Uses
  • Map one Input (or many) to one Output
  • Convert Dst to a Single Ground-Station ΔH
  • Make a Forecast
  • Convert Solar Wind Measurements To Geomagnetic
    Indices
  • Determine System Dynamics
  • Use Impulse Response To Determine Underlying
    Ordinary Differential Equation

4
What an LPF Looks Like
  • An LPF can have an autoregressive (AR) part and a
    moving average (MA) part
  • AR part describes internal dynamics
  • MA part describes external dynamics
  • Ambiguity occurs when used separately
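A sketch of the ARMA form described on this slide (the equation itself is
not reproduced in this transcript; the symbols below are assumed):

    O(t_k) = \sum_{i=1}^{N_{AR}} a_i\, O(t_k - i\Delta t)
           + \sum_{j=0}^{N_{MA}} b_j\, I(t_k - j\Delta t)

The a_i are the AR coefficients (internal dynamics) and the b_j are the MA
coefficients (response to the external input I).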

5
Other Names for LPFs
  • Convolution Filter
  • Moving Average
  • Infinite Impulse Response Filter (IIR)
  • Finite Impulse Response Filter (FIR)
  • Recursive Filter
  • ARMA Filter
  • LPFs are a subset of Linear Filters which relate
    the Fourier spectra of two signals

6
MA Filters Are Green's Functions
  • Ordinary Differential Equations Can Be Solved
    with Green's Functions
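A hedged sketch of the Green's-function relationship stated above (G and the
symbols below are assumed, not transcribed from the original slide):

    O(t) = \int_{-\infty}^{t} G(t - \tau)\, I(\tau)\, d\tau
         \approx \sum_{j=0}^{N} b_j\, I(t - j\Delta t),
    \qquad b_j \approx G(j\Delta t)\, \Delta t

so the MA coefficients b_j sample the Green's function of the underlying ODE.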

7
AR Filters Are Differential Equations
  • Ordinary Differential Equations Can Be Rewritten
    As AR Filters
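A minimal illustration of the AR/ODE correspondence, assuming a first-order
linear equation with decay time \tau (symbols assumed):

    \frac{dO}{dt} = -\frac{O}{\tau} + I(t)
    \;\Rightarrow\;
    O(t) \approx \left(1 - \frac{\Delta t}{\tau}\right) O(t - \Delta t)
                 + \Delta t\, I(t)

i.e. an ARMA filter with a_1 = 1 - \Delta t/\tau and b_0 = \Delta t.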

8
Determining LPF Coefficients
  • The a's and b's are found by solving an
    overdetermined matrix equation

Often t_k = t_0 + kΔt
Solved using Least Squared Error Optimization or
Singular Value Decomposition
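A minimal Python sketch of this least-squares construction (function and
variable names are illustrative, not from the original presentation):

    import numpy as np

    def fit_arma(output, inp, n_ar, n_ma):
        """Fit ARMA coefficients from uniformly sampled data (t_k = t_0 + k*dt).

        output, inp : 1-D arrays of the output O and input I
        n_ar, n_ma  : number of AR coefficients a_i and MA coefficients b_j
        Returns (a, b) with a = [a_1 .. a_n_ar] and b = [b_0 .. b_{n_ma-1}].
        """
        output = np.asarray(output, dtype=float)
        inp = np.asarray(inp, dtype=float)
        start = max(n_ar, n_ma - 1)
        rows, targets = [], []
        for k in range(start, len(output)):
            ar_terms = [output[k - i] for i in range(1, n_ar + 1)]
            ma_terms = [inp[k - j] for j in range(n_ma)]
            rows.append(ar_terms + ma_terms)
            targets.append(output[k])
        # Overdetermined system solved by SVD-based least squares
        coeffs, *_ = np.linalg.lstsq(np.array(rows), np.array(targets), rcond=None)
        return coeffs[:n_ar], coeffs[n_ar:]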
9
More on Linear Filters
  • The Linear Filter formalism can be extended to
    multiple inputs
  • There is an ambiguity between the MA and AR parts
    for certain kinds of differential equations. An
    ARMA filter can greatly simplify some MA filters

10
Handling Data Gaps
  • Missing Inputs
  • Omit intervals with data gaps, if possible
  • Interpolation over data gaps will smear out MA
    coefficients
  • Missing Outputs
  • Omit intervals with data gaps
  • Interpolation over data gaps can ruin the AR
    coefficients
  • Less sensitive to interpolation in the lagged outputs

11
Linear Filter Analysis
  • Once the filter coefficients are determined, we
    relate them to ODEs

12
LPF Localization
  • Localization adds Nonlinearity
  • Nonlocal LPFs cannot handle nonlinear dynamics
  • Localize in Time
  • Continuously reconstruct LPF based only on most
    recent data
  • Can be very accurate but hard to interpret
  • Localize in State Space
  • Construct different LPF for each region of state
    space
  • Can provide multivariate nonlinearity

13
LPF Example: Dst-ΔH
  • Dst is a weighted average of ΔH measured at 4
    stations around the globe.
  • Depending on activity and the local time of a
    station, there is a different relationship
    between Dst and ΔH
  • We will localize this filter in 24 1-hour bins of
    local time
  • The filter will be very simple: ΔH(t) = b0(lt) ·
    Dst(t) (see the sketch below)
  • lt is the local time of the station at time t
  • We solve this equation in each 1-hour bin of
    local time
  • By plotting b0 vs lt, we can infer the local
    current systems
  • This local current system is believed to be the
    partial ring current
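A short Python sketch of the localized one-point fit described above (array
names are illustrative):

    import numpy as np

    def local_time_filter(dst, dH, lt, n_bins=24):
        """Fit dH(t) = b0(lt) * Dst(t) separately in each 1-hour local-time bin."""
        dst, dH, lt = (np.asarray(x, dtype=float) for x in (dst, dH, lt))
        b0 = np.full(n_bins, np.nan)
        bins = lt.astype(int) % n_bins            # 1-hour local-time bins
        for m in range(n_bins):
            mask = bins == m
            if mask.any():
                # zero-intercept least-squares slope of dH against Dst in this bin
                b0[m] = np.dot(dst[mask], dH[mask]) / np.dot(dst[mask], dst[mask])
        return b0

Plotting b0 (or 1/b0) against local time then reveals the local-time
asymmetry discussed on the next slide.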

14
Partial Ring Current
[Figure: ΔH(SJG)-Dst one-point filter by local time; 1/b0 plotted vs.
local time (0-24 h), with the local-dusk sector marked]
  • Localized Filter: ΔH = b0(lt) · Dst
  • Dst is less intense than ΔH near Dusk due to an
    enhanced Ring Current in the Dusk sector

15
LPF Example: Dst-VBs
  • The Ring Current (Dst) is largely driven by the
    solar wind
  • One good coupling function for the solar wind
    driver is the interplanetary electric field (VBs)
  • We construct a long MA filter

16
Dst-VBs Filter Coefficients
  • Note the roughly exponential decay
  • The differential equation could be a first-order
    decay driven by VBs (sketched below)
  • We can, therefore, build a trivial ARMA filter to
    do the same job
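A hedged guess at the differential equation implied by the exponentially
decaying coefficients (the decay time \tau and coupling constant \alpha are
assumed symbols, not values from the slides):

    \frac{d\,Dst}{dt} = -\frac{Dst}{\tau} + \alpha\, VBs
    \quad\Rightarrow\quad
    Dst(t) \approx \left(1 - \frac{\Delta t}{\tau}\right) Dst(t - \Delta t)
                   + \alpha\,\Delta t\, VBs(t)

which is the trivial ARMA filter mentioned above: one AR coefficient and one
MA coefficient replace the long MA filter.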

17
Neural Network Uses
  • Map multiple Inputs to one Output (or many)
  • Excellent nonlinear interpolation but not
    reliable for extrapolation
  • Make a forecast
  • Excellent for forecasting complex phenomena
  • Determine System Dynamics
  • NN is a black box model of the system
  • Run NN on simulated data to isolate system
    response to individual inputs
  • Many exotic NNs exist to perform other tasks

18
NN Theory
  • Based on biological neural systems
  • Biological NN composed of connections between
    individual neurons
  • Artificial NN composed of weights between
    perceptrons
  • We don't know exactly how biological neural
    systems learn, so we have made some approximate
    training schemes
  • Artificial NNs excel at quantifying
  • Biological NNs excel at qualifying

19
NN Topology
  • A standard Feed Forward NN has no recursive
    connections
  • Arrows represent weights (w,v) and biases (b,c)
  • hi and Oi are perceptrons
  • Typically only one hidden layer is necessary
  • More hidden units allow for more complexity in
    fitting (not always good)
  • Nonlinearity is achieved through an activation
    function
  • tanh(x) or (1 + e^(-x))^(-1)

  • This is equivalent to an MA Filter
  • AR behavior can be achieved through recurrence
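A minimal Python sketch of the feed-forward mapping described on this slide,
with one tanh hidden layer and a linear output layer (the output activation
is an assumption; the slide does not state it):

    import numpy as np

    def feedforward(I, w, b, v, c):
        """One-hidden-layer feed-forward network.

        I : input vector; w, b : hidden-layer weights and biases;
        v, c : output-layer weights and biases.
        """
        h = np.tanh(w @ I + b)   # hidden perceptrons with tanh activation
        return v @ h + c         # linear output perceptrons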

20
NN Recurrence
[Diagram: recurrent network topologies; the fed-back values appear as
pseudo-inputs]
  • An Output Recurrent network is useful when O(t)
    depends on O(t-Δt)
  • The recurrence is usually only implicit during
    training: I3 is taken from actual data rather
    than from the previous O1
  • An Elman Network is useful when O(t) depends on
    the time history of the Inputs
  • This makes training rather difficult
  • Continuous time series are needed
  • Batch optimization is impossible

21
NN Training Theory
  • A NN is initialized with a random set of weights
  • It is trained to find the optimal weights
  • Iteratively adjust the weights
  • Target least squared error for most data
  • Target relative error for data with rare large
    events
  • There must be far more training samples than
    there are weights
  • At least a factor of 10
  • The goal is to achieve a fit which will work well
    out of sample as well as in sample
  • Poor out of sample performance is a result of
    overfitting

22
Gradient Descent
  • Simplest non-linear optimization
  • Weights are corrected in steps down the error
    gradient
  • A learning rate η is used to ensure smooth
    convergence to error minimum
  • Descent can be stabilized by adding momentum m,
    which recalls ΔW(s-1) from last step
  • m and η should be between 0 and 1, sometimes
    functions of s
  • For recurrent networks, gradient descent should
    be done serially for each tk
  • This type of Gradient Descent replaces
    Backpropagation

[Equations omitted: the NN output and the weight-update rule. W is a
vector holding all the NN weights and biases.]
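A sketch of the momentum update implied by the bullets above, where E(W) is
the squared error of the NN output (the slide's exact expression is not
reproduced here):

    \Delta W^{(s)} = -\eta\, \nabla_W E\!\left(W^{(s)}\right)
                     + m\, \Delta W^{(s-1)},
    \qquad W^{(s+1)} = W^{(s)} + \Delta W^{(s)}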
23
Levenberg-Marquardt Training
  • LM training is much faster than gradient descent
  • LM training is not very appropriate for recurrent
    networks
  • Algorithm
  • 1. Increase μ until a step can be taken without
    increasing the error
  • 2. Decrease μ while error decreases with each
    step
  • 3. Repeat 1 and 2 until μ exceeds a threshold or
    another training limit is met

[Equation omitted: the LM weight update for the NN output error.]
LM is based on Newton's Method, but, when μ is
large, LM training becomes gradient descent
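For reference, the standard Levenberg-Marquardt step, consistent with the
description above (J is the Jacobian of the output errors e with respect to
the weights W; not transcribed from the original slide):

    \Delta W = -\left(J^{T} J + \mu I\right)^{-1} J^{T} e

For small μ the step approaches a Gauss-Newton (approximate Newton) step,
matching the note above.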
24
NN Generalization
  • Overfitting can render NNs useless
  • Always reserve some of your training data
    (20-50%) for out-of-sample testing
  • Identical network topologies can perform
    differently, depending on their initial weights
    (assigned randomly)
  • Train several networks (e.g., 5) and keep the one
    with the best out-of-sample performance
  • Starve the NN by reducing the number of hidden
    units until fit quality begins to plummet

25
NN Analysis
  • The quality of an NN output depends on the
    training-set density of points near the
    associated inputs
  • Always plot histograms of the input and output
    parameters in the training set to determine the
    high-training-density region
  • Regions of input space which are sparsely
    populated are not well determined in the NN and
    may exhibit artificial behavior
  • Try several different input combinations
  • Indicates what influences the system
  • Analyzing weights directly is nearly impossible
  • Instead, we run the trained NN on artificial
    inputs so that we can isolate the effects of a
    single variable on the multivariate system
  • Vary one input (or two) while holding all other
    inputs constant (see the sketch below)
  • Simulate a square-wave input for a time series
    (pseudo-impulse)
  • To identify real and artificial behavior, plot
    training points in the neighborhood of the
    simulated data to see what the NN is fitting
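A short Python sketch of the simulated-input analysis above: sweep one input
while holding the others constant (net stands for any trained network
callable; names are illustrative):

    import numpy as np

    def scan_one_input(net, fixed_inputs, index, values):
        """Run a trained NN over a sweep of one input, others held constant."""
        outputs = []
        for v in values:
            x = np.array(fixed_inputs, dtype=float)
            x[index] = v                # vary only the chosen input
            outputs.append(net(x))
        return np.array(outputs)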

26
NN Example: Dst-ΔH
  • We will repeat the Dst-ΔH analysis nonlinearly
  • Inputs: Dst, VBs, sin(ω·lt), cos(ω·lt); Output: ΔH
  • Train with Levenberg-Marquardt
  • The VBs input allows us to specify the intensity
    of (partial) ring current injection
  • The lt inputs allow us to implicitly localize the
    NN
  • By plotting ΔH vs lt for fixed values of Dst and
    VBs, we can infer the local current systems and
    determine what role the VBs electric field plays
  • By using an NN instead of an LPF, we add
    non-linear interpolation at the expense of linear
    extrapolation

27
Partial Ring Current (2)
  • We ran the NN on simulated data
  • (Dst, VBs, lt) = (-100 to 10, 0 and 5, 0 to 18)
  • Psw was constant at 3 nPa
  • We plot the ΔH-Dst relationship at constant lt,
    VBs, and Psw to see its characteristics
  • A localized current system is creating an
    asymmetry (Dawn-Dusk)
  • Otherwise, the relationship is linear
  • Comparing the recovery phase (VBs = 0) to the
    main phase (VBs = 5 mV/m), we see that at larger
    Dst, the local-time asymmetry is weaker for the
    recovery phase than for the main phase
  • It is generally impossible to make direct
    measurements of the local-time ΔH-Dst
    relationship at fixed VBs

28
NN Example: Dst-VBs
  • There is some speculation that the dynamics of
    the ring current are nonlinear, so try an NN!
  • Inputs: Dst(t-1), VBs(t), Psw(t-1), Psw(t)
  • Output: Dst(t)
  • Dst(t-1) provides implicit output recurrence
  • We can still train with Levenberg-Marquardt!
  • VBs(t) provides the new injection
  • Psw allows the NN to remove the magnetopause
    contamination

29
Dst-VBs NN Analysis
  • We ran the network on simulated data
  • (Dst, VBs) = (-200 to 0, 0 to 7)
  • Psw was constant at 3 nPa
  • ΔDst = Dst(t+1) - Dst(t)
  • The phase-space trajectories are very linear in
    the high-training-density (HTD) area
  • Curvature outside of the HTD area may be
    artificial
  • Note how VBs affects the trajectories
  • The dynamic equation is sketched below
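A hedged reconstruction of the form suggested by the linear phase-space
trajectories above (the decay time \tau and injection function Q(VBs) are
assumed symbols; the slide's exact equation is not reproduced):

    \frac{\Delta Dst}{\Delta t} \approx -\frac{Dst}{\tau} + Q(VBs)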

30
Summary
  • Linear Prediction Filters
  • Uses
  • System Description
  • Time Evolution
  • Construction
  • Choose AR, MA or ARMA
  • Least Squares Solution of Overdetermined Matrix
  • Localize for Nonlinearity
  • Filter Analysis
  • Green's Function Analogue
  • Local Filters Reveal Local Processes
  • Neural Networks
  • Uses
  • System Description
  • Time Evolution
  • Theory
  • Based on Biology
  • Training
  • Iterative Adjustment of Weights
  • Network Analysis
  • Consider Training Density
  • Run on Simulated Data
  • Examine Phase Space

31
Further Reading
  • All you ever wanted to know about LPFs in space
    physics: Solar Wind-Magnetosphere Coupling
    Proceedings (Kamide and Slavin, eds.), Terra
    Scientific Publishing, 1986. Articles by Clauer
    (p. 39), McPherron et al. (p. 93), and Fay et al.
    (p. 111)
  • An excellent source for all kinds of NN material
    is the Matlab Neural Network Toolbox User's Guide
    or Neural Network Design by Beale, Hagan, and
    Demuth, PWS Publishing, 1995
  • Learning internal representations by error
    propagation by Rumelhart, Hinton, and Williams,
    in Parallel Distributed Processing, Vol. 1,
    Ch. 8, p. 318, MIT Press (Rumelhart and
    McClelland, eds.)
  • Proceedings of the International Workshop on
    Artificial Intelligence Applications in
    Solar-Terrestrial Physics 1993 and 1997 meetings