Title: The Lepton+jets Channel
1. Measuring the Top Quark Mass with Neural Networks
Carlos Sanchez, Ohio State University
April 3, 2003 Seminar
- Introduction
- The Lepton+jets Channel
- Mass Measurement
- Templates
- Standalone NN
- Gluon Radiation
- Run II
2. The Standard Model
- Standard Model Particles
- quarks (u, c, t, d, s, b)
- leptons (e, μ, τ, νe, νμ, ντ)
- gauge bosons (g, γ, W±, Z0)
- Standard Model Interactions
- strong gluon g exchange
- weak W± and Z0 boson exchange
- electromagnetic photon γ exchange
- Successes
- predicts a wide range of phenomena
- valid down to distances of ~10^-18 m
- Unresolved issues
- EW symmetry breaking (Higgs?)
- fermion masses and mixing
- gravitational interaction
3. Collider Detector at Fermilab
- Fermi National Accelerator Laboratory (Fermilab)
- Tevatron collides protons p on antiprotons p̄ at 1.8 TeV (Run I, 1992-1996)
- Two collision detectors: CDF and DØ
- Run II started March 2001 (by 2005 it will accumulate 20 times more data)
- The CDF detector
- silicon layers
- b-jet identification
- central drift chamber
- electromagnetic calorimeters
- hadronic calorimeters
- muon chambers
4. Top Production
- Top quarks are produced
- in pairs (top-antitop) via the strong interaction: quark-antiquark annihilation (dominant at the Tevatron) and gluon-gluon fusion (dominant at the LHC)
- individually via the electroweak interaction
- Huge amount of background
- Top pairs in Run I
- 5 trillion collisions
- 50 million events written to tape (40 terabytes of data)
- 35 top-antitop events (in a sample of 76 events)
5. Top Decay
- The top quark decays into a W boson and a b quark with a branching ratio of nearly 100%.
- The lifetime of the top is very small (~5 x 10^-25 s).
- The decays of the top quark are classified according to the W boson decays:
- Hadronic: both W bosons decay into a quark-antiquark pair.
- Dilepton: both W bosons decay into a lepton-neutrino pair.
- Lepton+jets: one W decays into a quark-antiquark pair while the other one decays into a lepton-neutrino pair.
6. Event Selection
- All events in the mass analysis must pass the following cuts:
- an isolated lepton with PT > 20 GeV/c
- missing ET > 20 GeV
- at least three jets with ET > 15.0 GeV and |η| < 2.0
- an additional jet with ET > 15.0 GeV and |η| < 2.0, or ET > 8.0 GeV and |η| < 2.4
- events that fall within the Z mass window and dilepton events are removed
- after the mass reconstruction is performed, events are required to pass a goodness-of-fit cut, χ2 < 10.0
- We divide the top mass sample into four non-overlapping subsamples (to take advantage of their different S/B ratios):
- SVX Double: events with two SVX tags
- SVX Single: events with one and only one SVX tag
- SLT: events with one or two SLT tags, but no SVX tags
- No Tags: 4 or more jets with ET > 15.0 GeV and |η| < 2.0
7. Top Mass Sample
- Run I events in the different mass subsamples
- Background processes
- W+multijet, non-W events, mistags, single top events, diboson events, and Drell-Yan.
8. Mass Reconstruction (1)
- The top mass is calculated by reconstructing the 4-momenta of the top decay particles.
- The hypothesis is the Standard Model ttbar production process, p p̄ → t t̄ + X, followed by the decays t → W+ b and t̄ → W- b̄.
- The final assignment of the decay partons is determined by the mass reconstruction algorithm.
9. Mass Reconstruction (2)
- There are many ways to combine the top decay products to form the mass of the top quark.
- The number of combinations is reduced if we use b-tagging information.
- We define a χ2 function based on a series of energy and kinematic constraints to calculate the top mass.
- All possible combinations are used.
- We choose the reconstructed mass (Mrec) that corresponds to the lowest χ2 (see the sketch below).
- We get the correct combination about 50% of the time in the Double SVX subsample.
- Incorrect combinations still carry mass information.
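A minimal sketch of the lowest-χ2 idea, not the CDF kinematic fitter: assign the four leading jets to the ttbar decay partons in every possible way, score each assignment with a toy chi2, and keep the best one. The jet four-vectors, resolution, and the toy_chi2 constraint below are illustrative placeholders; the real fit also constrains the leptonic W and both top masses.

```python
from itertools import permutations
import numpy as np

MW = 80.4  # GeV/c^2, W mass used as a constraint

def inv_mass(*four_vectors):
    """Invariant mass of the sum of (E, px, py, pz) four-vectors."""
    e, px, py, pz = np.sum(four_vectors, axis=0)
    return np.sqrt(max(e**2 - px**2 - py**2 - pz**2, 0.0))

def toy_chi2(q1, q2, sigma_w=10.0):
    """Toy goodness of fit: how close the untagged di-jet is to the W mass."""
    return ((inv_mass(q1, q2) - MW) / sigma_w) ** 2

def best_combination(jets):
    """Return (Mrec, chi2) for the jet assignment with the lowest chi2."""
    best = None
    for b_had, b_lep, q1, q2 in permutations(jets, 4):
        chi2 = toy_chi2(q1, q2)
        m_rec = inv_mass(b_had, q1, q2)   # hadronic-top mass for this assignment
        if best is None or chi2 < best[1]:
            best = (m_rec, chi2)
    return best

# toy jets as (E, px, py, pz) in GeV
jets = [np.array([120.0, 80.0, 50.0, 60.0]), np.array([90.0, -60.0, 40.0, -30.0]),
        np.array([60.0, 20.0, -45.0, 25.0]), np.array([45.0, -15.0, -30.0, 20.0])]
print(best_combination(jets))
```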
10. Gaussian + Gamma Templates
- We have generated a set of ttbar Monte Carlo samples ranging from 120 to 230 GeV/c2.
- They tend to peak around their generated mass, and they have asymmetric tails.
- We should be able to fit all the templates to a single function that only depends on the top mass Mtop:
- Finite number of MC events.
- A continuous form allows us to obtain the Mrec distribution for any given Mtop.
- We fit the signal templates with a combination of a Gaussian and a Gamma function (see the sketch below).
- The background samples are fitted in a similar way, but the mass dependence is removed.
SVX Single distributions
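A sketch of one possible Gaussian-plus-Gamma parameterization, illustrative rather than the exact CDF form: an asymmetric Gamma piece for the tail plus a Gaussian core near the generated top mass. The function template_pdf and its parameter values are placeholders; in the real template fit the shape parameters are themselves functions of Mtop determined by the fit.

```python
import numpy as np
from scipy.stats import gamma, norm

def template_pdf(m_rec, m_top, frac=0.4, shape=3.0, scale=25.0,
                 offset=80.0, width=22.0):
    """p(Mrec | Mtop) as frac * Gamma + (1 - frac) * Gaussian (toy parameters)."""
    gamma_part = gamma.pdf(m_rec, a=shape, loc=offset, scale=scale)
    gauss_part = norm.pdf(m_rec, loc=m_top, scale=width)
    return frac * gamma_part + (1.0 - frac) * gauss_part

# Example: the expected Mrec shape for Mtop = 175 GeV/c^2 over the template range
masses = np.linspace(80.0, 380.0, 61)   # 80-380 GeV/c^2 in 5 GeV/c^2 steps
shape_175 = template_pdf(masses, 175.0)
```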
11. Extracting Mtop
- Compare the shape of the data Mrec distribution to Monte Carlo expectations.
- A continuous likelihood procedure is used to extract Mtop (see the sketch below).
- It uses the functional forms for signal and background.
- In the fit, Mtop is the only free parameter, and the background fractions are constrained to be within their expected values.
- The median Mtop and error from 2000 pseudoexperiments is
Mtop = 175.1 ± 7.3 GeV/c2
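A sketch of the continuous likelihood, assuming stand-in signal and background shapes (p_sig, p_bkg) and a fixed background fraction; the real fit constrains the background fraction within its expected value rather than fixing it.

```python
import numpy as np
from scipy.stats import gamma, norm
from scipy.optimize import minimize_scalar

def p_sig(m_rec, m_top):
    # placeholder signal template: Gaussian core plus Gamma tail (see above)
    return 0.6 * norm.pdf(m_rec, loc=m_top, scale=22.0) + \
           0.4 * gamma.pdf(m_rec, a=3.0, loc=80.0, scale=25.0)

def p_bkg(m_rec):
    # placeholder background template: no Mtop dependence
    return gamma.pdf(m_rec, a=3.5, loc=80.0, scale=22.0)

def neg_log_likelihood(m_top, events, bkg_frac=0.4):
    dens = (1.0 - bkg_frac) * p_sig(events, m_top) + bkg_frac * p_bkg(events)
    return -np.sum(np.log(dens))

# Example: fit a toy "data" sample of reconstructed masses (GeV/c^2)
events = np.array([142.0, 155.0, 168.0, 171.0, 175.0, 179.0, 183.0, 202.0])
fit = minimize_scalar(neg_log_likelihood, bounds=(120.0, 230.0),
                      args=(events,), method="bounded")
print("fitted Mtop =", round(fit.x, 1))
```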
12. Improvements
- The Gaussian+Gamma fit is motivated by the shape of the distributions.
- Any function that can fit the templates can be used.
- Neural Networks are able to approximate any function.
- We do not need to make an a priori decision about the underlying function describing the distribution.
- We are not limited to 1D distributions.
- Including more information to measure Mtop:
- There are other kinematic variables which carry mass information.
- A Neural Network provides a simple and elegant way of combining many variables into a single analysis.
- It takes into account correlations between the different variables.
- It can classify events into top signal or background.
13. Introduction to ANN
- An ANN is a function of N variables.
- It has a useful graphical representation.
- All nodes above the input layer perform a simple calculation (see the sketch below).
- Architecture: the grid of nodes (input, hidden, and output layers).
- Weights: the connections between nodes.
- Activation function g(x): non-linear, e.g. g(x) = tanh(x).
Diagram: Input1, Input2, Input3 feeding a hidden layer through weights w1 ... w3, with a single output node.
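A minimal sketch of the node calculation and a tiny feed-forward network with a tanh activation; the weights and inputs are arbitrary illustrative numbers.

```python
import numpy as np

def node_output(inputs, weights, bias=0.0):
    """One node: g(w . x + b) with g = tanh."""
    return np.tanh(np.dot(weights, inputs) + bias)

def feed_forward(x, w_hidden, w_out):
    """A tiny 3-input, 2-hidden-node, 1-output network."""
    hidden = np.array([node_output(x, w) for w in w_hidden])
    return node_output(hidden, w_out)

x = np.array([0.2, -1.0, 0.5])               # Input1, Input2, Input3
w_hidden = np.array([[0.1, 0.4, -0.3],       # weights into hidden node 1
                     [-0.6, 0.2, 0.8]])      # weights into hidden node 2
w_out = np.array([0.7, -0.5])                # weights into the output node
print(feed_forward(x, w_hidden, w_out))
```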
14. ANN Learning Rule
- A Neural Network has to be configured such that the application of a set of input values produces the desired output values.
- Supervised learning uses learning samples to train the network to perform a given task.
- Neural Network learning rule (a minimal sketch follows this list):
- Start with random weights for the connections.
- Select an input vector from the learning samples.
- Modify all the weights so that the Neural Network output is as close to the desired output as possible.
- Weights are modified by minimizing an error function.
- Back-propagation algorithm.
- Return to step 2.
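A minimal sketch of the learning rule above, for a single tanh node trained by gradient descent on a toy learning sample (logical AND encoded as ±1); for a multi-layer network the same gradient step is propagated backwards through the layers, which is the back-propagation algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
inputs = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])  # learning sample
targets = np.array([-1., -1., -1., 1.])                      # desired outputs (AND)

weights = rng.normal(scale=0.5, size=2)   # start with random weights
bias, eta = 0.0, 0.1                      # eta = learning rate

for _ in range(2000):                     # repeat: return to "select an input vector"
    i = rng.integers(len(inputs))         # select an input vector from the learning sample
    x, t = inputs[i], targets[i]
    y = np.tanh(weights @ x + bias)       # network output for this input
    # modify the weights by minimizing the squared-error function E = (y - t)^2 / 2
    delta = (y - t) * (1.0 - y**2)        # dE/d(net), using d tanh/d(net) = 1 - y^2
    weights -= eta * delta * x            # move the weights downhill
    bias -= eta * delta

print([round(float(np.tanh(weights @ x + bias)), 2) for x in inputs])
```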
15. Simple Example: the AND Problem
16. ANN Performance
- Information
- Use the best variables available to train the network.
- Architecture
- Input and output nodes are determined by the problem you want to solve.
- The AND problem can be solved with 2 input nodes and 1 output node.
- Hidden nodes
- Try different numbers and choose the architecture that gives the best performance.
- Learning sample
- Use a learning sample at least 10 times the number of weights in the network.
- Number of iterations
- Make sure the network has been trained properly:
- χ2 method to decide when to stop the training (function approximation).
- Look at a testing sample (pattern classification).
17. ANN Template Fitter
- We use the MLPfit package to fit the Mrec templates as a function of the top mass, Mtop.
- To solve this problem we need (see the sketch below):
- 2 input nodes
- one associated with Mrec
- one associated with Mtop
- 1 output node
- the desired output value is set to the number of events in each bin.
- The architecture chosen to fit the signal distributions is 2-4-4-1.
- To fit the background we choose an architecture of 1-5-1.
- No Mtop dependence.
- The Mrec distributions range from 80 GeV/c2 to 380 GeV/c2 and are divided into 5 GeV/c2 bins.
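A sketch of how the 2-input training set is built, with (Mrec bin centre, Mtop) as inputs and the bin content as the desired output. MLPfit is the package actually used; scikit-learn's MLPRegressor appears here only as a readily available stand-in with the same 2-4-4-1 architecture, and the template contents are faked with a Gaussian bump so the example runs end to end.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

bin_edges = np.arange(80.0, 385.0, 5.0)            # 80-380 GeV/c^2, 5 GeV/c^2 bins
bin_centres = 0.5 * (bin_edges[:-1] + bin_edges[1:])
generated_masses = np.arange(120.0, 235.0, 5.0)    # MC template masses

X, y = [], []
for m_top in generated_masses:
    # stand-in for the histogrammed MC Mrec template at this Mtop
    counts = np.exp(-0.5 * ((bin_centres - m_top) / 22.0) ** 2)
    for centre, content in zip(bin_centres, counts):
        X.append([centre, m_top])                  # 2 input nodes
        y.append(content)                          # 1 output node

net = MLPRegressor(hidden_layer_sizes=(4, 4), activation="tanh",
                   max_iter=5000, random_state=0)
net.fit(np.array(X), np.array(y))

# The trained network gives the expected Mrec shape for any Mtop:
print(net.predict([[175.0, 172.5]]))               # content at Mrec=175, Mtop=172.5
```

Because the network is a smooth function of both inputs, one trained network replaces the full set of discrete templates.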
18. Template Comparison
- We perform a χ2 fit to make sure both fitting techniques are working correctly.
- The individual χ2 values of the fits to each mass template are also lower for the NN fits than for the Gaussian+Gamma (GG) fits.
SVX Single distributions
19. Pseudoexperiments
- Uses the same continuous likelihood procedure described previously (a pseudoexperiment sketch follows this list).
- MC results from 2000 simulated experiments are shown below.
- Both fitting methods use exactly the same templates.
- The NN fitting method gives a result which is 12% better than the GG fitting technique.
- Applicable to any distribution.
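A sketch of the pseudoexperiment loop: draw a Run-I-sized sample from the templates at a known true Mtop, fit it, repeat, and look at the median and spread of the fitted masses. The Gaussian template and the simple mean "fit" below are stand-ins for the real templates and likelihood fit.

```python
import numpy as np

rng = np.random.default_rng(1)
TRUE_MTOP, WIDTH, N_EVENTS, N_EXPERIMENTS = 175.0, 22.0, 76, 2000

def fit_mtop(sample):
    # stand-in for the continuous likelihood fit: for a pure Gaussian template
    # the maximum-likelihood top mass is just the sample mean
    return sample.mean()

results = []
for _ in range(N_EXPERIMENTS):
    pseudo_data = rng.normal(TRUE_MTOP, WIDTH, size=N_EVENTS)  # draw from the template
    results.append(fit_mtop(pseudo_data))

results = np.array(results)
print("median fitted Mtop:", np.median(results))
print("spread (expected statistical error):", results.std())
```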
20. Mass Information
- Other analyses have shown that other kinematic quantities carry mass information.
- A variable with good mass information will have a small RMS/slope (see the sketch below):
- Narrow distributions.
- Large separation between means.
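A sketch of the RMS/slope figure of merit, using made-up mean and RMS values of a hypothetical variable (labelled HT here) at several generated top masses.

```python
import numpy as np

m_top = np.array([150.0, 160.0, 170.0, 180.0, 190.0])  # generated masses (GeV/c^2)
means = np.array([238.0, 252.0, 266.0, 281.0, 295.0])  # e.g. mean HT per sample (GeV)
rms   = np.array([ 55.0,  57.0,  59.0,  61.0,  63.0])  # RMS of HT per sample (GeV)

slope = np.polyfit(m_top, means, 1)[0]   # d<HT>/dMtop from a straight-line fit
figure_of_merit = rms.mean() / slope     # smaller is better
print("slope =", slope, "; RMS/slope =", figure_of_merit, "GeV/c^2")
```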
21. HT Mass Analysis
- We can use a Neural Network to fit the HT distributions.
- We ran 2000 pseudoexperiments and we find
Mtop = 174.6 ± 11.3 GeV/c2
- How do we combine the Mrec and HT mass results?
- Simple if the variables are uncorrelated.
- Difficult to find a way to include the correlations in the likelihood.
- Combining the results a posteriori is not straightforward.
- Look at other options.
SVX Single distributions
22. Standalone Neural Network (1)
- A Neural Network provides a simple and elegant way to add more information.
- New variables are added as input nodes.
- Correlations between the variables are naturally taken into account.
- We want to train a NN to classify events into the different top masses generated for this study, as well as background.
- We have generated 23 different top samples. Our NN has 24 outputs (23 signal + 1 background).
- We train the NN using the BFGS method.
- In training and testing, all 23 different top mass samples (signal) as well as background are used.
- During the training, we set the output target value for each class to 1.0 (see the encoding sketch below).
- In our 24-dimensional output space, Mtop = 175 GeV/c2 corresponds to (0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0). The background target value is given by (0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1).
- Output values can be interpreted as a posteriori Bayes probabilities for each class.
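A sketch of the one-hot target encoding described above, assuming the 23 signal classes correspond to the 120-230 GeV/c2 templates in 5 GeV/c2 steps; the Mtop = 175 GeV/c2 target then has its 1.0 in the 12th slot, matching the vector on the slide.

```python
import numpy as np

generated_masses = np.arange(120.0, 235.0, 5.0)       # 23 signal classes
N_CLASSES = len(generated_masses) + 1                 # + 1 background class

def target_vector(m_top=None):
    """One-hot target: the slot for m_top, or the last slot for background."""
    target = np.zeros(N_CLASSES)
    if m_top is None:                                  # background event
        target[-1] = 1.0
    else:
        target[np.argmin(np.abs(generated_masses - m_top))] = 1.0
    return target

print(target_vector(175.0))   # 1.0 in the 12th slot, as on the slide
print(target_vector())        # 1.0 in the 24th (background) slot
```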
23. Standalone Neural Network (2)
- We use the MLPfit package.
- We have generated a Neural Network for each of the mass subsamples.
- Training is stopped when the testing curve begins to increase:
- the NN begins to learn the specific features of the training sample.
- The sum of the outputs should be equal to 1.
NN performance
24. Extracting Mtop
- The NN output probabilities are used to construct a discrete likelihood to extract Mtop.
- Each point in the likelihood is associated with a generated top mass.
- Each point contains an admixture of signal and background.
- To account for asymmetric errors we fit the log-likelihood with a 3rd degree polynomial (see the sketch below).
- Mtop is given by the minimum.
- The Mtop errors are given by the 0.5 unit increase in the log-likelihood.
- Can we construct a continuous likelihood that uses this information?
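A sketch of the cubic-polynomial step: fit the -log(likelihood) points versus generated mass, take Mtop at the minimum, and read the asymmetric errors where -log L rises by 0.5. The -log L values below are made up for illustration.

```python
import numpy as np

m_points = np.array([165.0, 170.0, 175.0, 180.0, 185.0])   # generated masses
nll      = np.array([  3.1,   1.2,   0.4,   0.9,   2.6])   # -log L at each point

coeffs = np.polyfit(m_points, nll, 3)                       # 3rd degree polynomial
m_scan = np.linspace(m_points[0], m_points[-1], 2001)
nll_fit = np.polyval(coeffs, m_scan)

i_min = np.argmin(nll_fit)
m_best, nll_min = m_scan[i_min], nll_fit[i_min]

# asymmetric errors: crossings of nll_min + 0.5 on either side of the minimum
above = nll_fit > nll_min + 0.5
m_lo = m_scan[:i_min][above[:i_min]][-1]   # last point above threshold below the minimum
m_hi = m_scan[i_min:][above[i_min:]][0]    # first point above threshold beyond the minimum
print(f"Mtop = {m_best:.1f} +{m_hi - m_best:.1f} -{m_best - m_lo:.1f} GeV/c^2")
```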
25. Pseudoexperiments
- We ran 2000 pseudoexperiments with the Run I statistics.
- The Standalone NN technique does 30% better than the GG-fitted template method.
- The Standalone NN technique does 16% better than the NN-fitted template technique.
26. Systematic Errors
- Jet energy scale: apply +1σ and -1σ shifts to the jet momenta in ttbar signal and background. The error is half the difference between the medians of the -1σ and +1σ distributions.
- ISR/FSR: turn off ISR in PYTHIA. For FSR we choose only the events with 4 jets which are uniquely matched to partons. We add the errors to obtain the systematic uncertainty.
- b-tagging: create two samples, one with only fake SLT tags and one with only real SLT tags. The error is half the difference.
- PDF: change the PDF in PYTHIA to CTEQ3L.
- MC generators: compare HERWIG samples to PYTHIA templates.
27. Gluon and Non-gluon Events
Non-gluon event: all 4 highest-ET jets are matched to MC partons.
Gluon event: at least one of the highest-ET jets is not matched to a MC parton.
28. Gluon Effect on Top Mass
- Use the subsamples with the smallest amount of background.
- From our Monte Carlo sample we see:
- 52.7 ± 0.66% of the events in the single SVX-tagged sample contain at least one gluon among their 4 highest-ET jets.
- 48.7 ± 1.1% of the events in the double SVX-tagged sample contain gluons.
- We construct mass templates which are divided into two sets:
- Gluon templates only contain events with at least one gluon jet among their 4 highest-ET jets.
- Non-gluon templates only contain events in which the 4 highest-ET jets have been uniquely matched to the top decay partons.
- There are features of the Run I mass analysis consistent with the data containing less gluon radiation than the Monte Carlo predicts.
- Would a better knowledge of the gluon content in the mass sample lead to a better top mass measurement?
29. Templates
- Gluon events will be mismeasured.
- Gluon templates peak at a lower mass than the non-gluon distributions.
- Non-gluon templates show better separation than gluon templates do.
- Non-gluon templates are narrower than the templates containing gluon events.
- We look at RMS/slope for the different templates.
- Events which do not contain gluon jets will provide a better top mass measurement.
30. Pseudoexperiments
- We have generated a new set of templates in which we vary the percentage of gluon events.
- The new templates have 20%, 40%, 60% and 80% gluon events in them.
- We ran a series of pseudoexperiments in which we draw the events from the above templates and compare them to the default Monte Carlo.
- The results from the Single SVX subsample are given below.
31. NN Input Variables
- We want to develop a Neural Network that can distinguish gluon events from non-gluon ones.
- This study uses the following three variables:
- Di-jet invariant mass: in the case of single-tagged events there are 3 ways you can combine the untagged jets to form the di-jet mass. We only use 2 of them, since the one constructed from the two least energetic jets differs very little. (A sketch of this calculation follows the list.)
- Number of extra jets: the number of jets with ET > 8.0 GeV and |η| < 2.4 besides our four highest jets.
- χ2: a goodness-of-fit parameter returned by MINUIT after we reconstruct the top mass for a given event.
- All these variables carry good gluon-content information.
- A Neural Network is well suited for this analysis since it provides a natural way of combining all of the variables.
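A minimal sketch of the di-jet invariant mass input, using illustrative (E, px, py, pz) jet four-vectors in GeV.

```python
import numpy as np

def dijet_mass(jet1, jet2):
    """Invariant mass of a two-jet system from (E, px, py, pz) four-vectors."""
    e, px, py, pz = np.asarray(jet1) + np.asarray(jet2)
    return np.sqrt(max(e**2 - px**2 - py**2 - pz**2, 0.0))

# e.g. the two pairings kept for a single-tagged event (toy jets):
jets = [np.array([95.0,  60.0,  40.0,  55.0]),
        np.array([70.0, -45.0,  30.0, -40.0]),
        np.array([40.0,  10.0, -35.0,  15.0])]
print(dijet_mass(jets[0], jets[1]), dijet_mass(jets[0], jets[2]))
```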
32. NN Input Variables
Single SVX subsample
Double SVX subsample
33. Neural Network
- We are using the JETNET subroutines interfaced to ROOT via the Root_Jetnet package.
- We have two separate Neural Networks:
- NN_1SVX (4-8-1): the Single SVX NN has four input variables.
- NN_2SVX (3-6-1): the Double SVX NN has three input variables.
- We trained the Neural Networks using two different MC samples: one containing only gluon events, and the other one without them.
- During training, the desired NN output for non-gluon events was set to 1, while the output corresponding to gluon events was set to 0.
34. Run I and Run II Pseudoexperiments
- We perform 5000 pseudo-experiments, drawing events from our Monte Carlo samples following the shape of the distributions.
- After our pseudo-experiments we find:
- The average statistical uncertainty returned by the fitter for the gluon and non-gluon content is 19.1% and 17.5%, respectively.
- The median statistical uncertainty for Run II will be 5.6% for the gluon content and 4.3% for the non-gluon content.
35. Data-like Templates
- We apply our fitting technique to the actual Run I data and we find:
- The non-gluon content in the SVX Run I data is higher than what we expect from the Monte Carlo.
- The errors on the measurement are rather large (roughly 20%).
- We use the mean of the gluon content measurement to construct data-like templates.
- Results from MC pseudoexperiments:
36. Run II Mass Analysis
- The biggest gain in the Standalone NN method comes from the combination of HT and Mrec.
- We propose the use of a NN to fit a series of 2-D histograms of Mrec versus HT (see the sketch below).
- High statistics.
- Adaptive binning.
- These 2-D surfaces would be used much like the Mrec templates in the standard mass analysis.
- We would use the continuous log-likelihood approach to calculate the top mass.
Single SVX distributions
Mrec vs. HT surfaces generated with the NN functional form.
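A sketch of what one 2-D Mrec-versus-HT template could look like, using a toy correlated sample in place of the reconstructed MC; a NN with inputs (Mrec, HT, Mtop) could then be fitted to such surfaces in the same way as the 1-D templates above. The HT model and its binning here are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(2)
m_top = 175.0
# toy correlated (Mrec, HT) pairs in GeV; real values come from the MC samples
m_rec = rng.normal(m_top, 22.0, size=5000)
h_t   = rng.normal(2.0 * m_top - 80.0, 55.0, size=5000) + 0.3 * (m_rec - m_top)

surface, m_edges, ht_edges = np.histogram2d(
    m_rec, h_t, bins=[np.arange(80.0, 385.0, 5.0),     # Mrec: 5 GeV/c^2 bins
                      np.arange(100.0, 610.0, 10.0)])  # HT: toy 10 GeV bins
print(surface.shape)   # (Mrec bins, HT bins) surface for this Mtop
```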
37. Run II Expectations
- The different methods used to obtain the mass of the top quark are presented.
- Using the 2D surfaces gives
- a result which is 30% better than using the Mrec templates fitted with a Gaussian+Gamma function.
- a result which is 11% better than using the Mrec templates fitted with a Neural Network.
- The goal for Run II is to measure Mtop with an error of less than 2.0 GeV/c2.
- Systematic errors will dominate the error in the top mass.
Statistical error only
38. Conclusions
- NN template fitter
- Provides a better measurement than the previous Gaussian+Gamma fitting technique.
- Applicable to any set of kinematic distributions.
- Standalone NN
- Combines different information to obtain a better top mass estimate.
- The 2D NN fit gives similar results.
- The techniques presented here will help improve the top mass measurement in Run II.
- Although these multivariate techniques were developed for the top mass analysis, they can be applied to many different physical processes:
- single top, Higgs boson, Supersymmetry, etc.