Title: The Lepton+jets Channel
1. Measuring the Top Quark Mass with Neural Networks
Carlos Sanchez, Ohio State University
April 3, 2003 Seminar
- Introduction
- The Lepton+jets Channel
- Mass Measurement
- Templates
- Standalone NN
- Gluon Radiation
- Run II
2. The Standard Model
- Standard Model Particles
- quarks (u, c, t, d, s, b)
- leptons (e, μ, τ, νe, νμ, ντ)
- gauge bosons (g, γ, W±, Z0)
- Standard Model Interactions
- strong gluon g exchange
- weak W± and Z0 boson exchange
- electromagnetic photon γ exchange
- Successes
- predicts a wide range of phenomena
- valid down to distances of ~10^-18 m
- Unresolved issues
- EW symmetry breaking (Higgs?)
- fermion masses and mixing
- gravitational interaction
3. Collider Detector at Fermilab
- Fermi National Accelerator Laboratory (Fermilab)
- Tevatron collides protons p on antiprotons p̄ at 1.8 TeV (Run I, 1992-1996)
- Two collision detectors: CDF and DØ
- Run II started March 2001 (by 2005 it will accumulate 20 times more data)
- The CDF detector
- silicon layers
- b-jet identification
- central drift chamber
- electromagnetic calorimeters
- hadronic calorimeters
- muon chambers
4. Top Production
- Top quarks are produced
- in pairs (top-antitop) via the strong interaction: quark-antiquark annihilation (dominant at the Tevatron) and gluon-gluon fusion (dominant at the LHC)
- individually via the electroweak interaction
- Huge amount of background
- Top pairs in Run I
- 5 trillion collisions
- 50 million events written to tape (40 terabytes of data)
- 35 top-antitop events (in a sample of 76 events)
5. Top Decay
- The top quark decays into a W boson and a b quark with a branching ratio of nearly 100%.
- The lifetime of the top is very small (~5 x 10^-25 s).
- The decays of the top quark are classified according to the W boson decays:
- Hadronic: both W bosons decay into a quark-antiquark pair.
- Dilepton: both W bosons decay into a lepton-neutrino pair.
- Lepton+jets: one W decays into a quark-antiquark pair while the other one decays into a lepton-neutrino pair.
6. Event Selection
- All events in the mass analysis must pass the following cuts:
- an isolated lepton with PT > 20 GeV/c
- missing ET > 20 GeV
- at least three jets with ET > 15.0 GeV and |η| < 2.0
- an additional jet with ET > 15.0 GeV and |η| < 2.0, or ET > 8.0 GeV and |η| < 2.4
- events that fall within the Z mass window and dilepton events are removed
- after the mass reconstruction is performed, events are required to pass a goodness-of-fit cut, χ2 < 10.0
- We divide the top mass sample into four non-overlapping subsamples (to take advantage of their different S/B ratios):
- SVX Double: events with two SVX tags
- SVX Single: events with one and only one SVX tag
- SLT: events with one or two SLT tags, but no SVX tags
- No Tags: 4 or more jets with ET > 15.0 GeV and |η| < 2.0
7. Top Mass Sample
- Run I events in the different mass subsamples
- Background processes
- W+multijet, non-W events, mistags, single top events, diboson events, and Drell-Yan.
8. Mass Reconstruction (1)
- The top mass is calculated by reconstructing the 4-momenta of the top decay particles.
- The hypothesis is the Standard Model ttbar production process, p p̄ → t t̄ + X, followed by the decays t → W+ b and t̄ → W- b̄.
- The final assignment of the decay partons is determined by the mass reconstruction algorithm.
9. Mass Reconstruction (2)
- There are many ways to combine the top decay products to form the mass of the top quark.
- The number of combinations is reduced if we use b-tagging information.
- We define a χ2 function based on a series of energy and kinematic constraints to calculate the top mass.
- All possible combinations are used.
- We choose the reconstructed mass (Mrec) that corresponds to the lowest χ2 (see the sketch below).
- We get the correct combination about 50% of the time in the Double SVX subsample.
- Incorrect combinations still carry mass information.
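A minimal sketch of the lowest-χ2 idea, not the CDF kinematic fitter: assign the four leading jets to the ttbar decay partons in every possible way, score each assignment with a toy chi2, and keep the best one. The jet four-vectors, resolution, and the toy_chi2 constraint below are illustrative placeholders; the real fit also constrains the leptonic W and both top masses.

```python
from itertools import permutations
import numpy as np

MW = 80.4  # GeV/c^2, W mass used as a constraint

def inv_mass(*four_vectors):
    """Invariant mass of the sum of (E, px, py, pz) four-vectors."""
    e, px, py, pz = np.sum(four_vectors, axis=0)
    return np.sqrt(max(e**2 - px**2 - py**2 - pz**2, 0.0))

def toy_chi2(q1, q2, sigma_w=10.0):
    """Toy goodness of fit: how close the untagged di-jet is to the W mass."""
    return ((inv_mass(q1, q2) - MW) / sigma_w) ** 2

def best_combination(jets):
    """Return (Mrec, chi2) for the jet assignment with the lowest chi2."""
    best = None
    for b_had, b_lep, q1, q2 in permutations(jets, 4):
        chi2 = toy_chi2(q1, q2)
        m_rec = inv_mass(b_had, q1, q2)   # hadronic-top mass for this assignment
        if best is None or chi2 < best[1]:
            best = (m_rec, chi2)
    return best

# toy jets as (E, px, py, pz) in GeV
jets = [np.array([120.0, 80.0, 50.0, 60.0]), np.array([90.0, -60.0, 40.0, -30.0]),
        np.array([60.0, 20.0, -45.0, 25.0]), np.array([45.0, -15.0, -30.0, 20.0])]
print(best_combination(jets))
```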
10. Gaussian + Gamma Templates
- We have generated a set of ttbar Monte Carlo samples ranging from 120 to 230 GeV/c2.
- They tend to peak around their generated mass, and they have asymmetric tails.
- We should be able to fit all the templates to a single function that only depends on the top mass Mtop:
- Finite number of MC events.
- A continuous form allows us to obtain the Mrec distribution for any given Mtop.
- We fit the signal templates with a combination of a Gaussian and a Gamma function (see the sketch below).
- The background samples are fitted in a similar way, but the mass dependence is removed.
SVX Single distributions
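A sketch of one possible Gaussian-plus-Gamma parameterization, illustrative rather than the exact CDF form: an asymmetric Gamma piece for the tail plus a Gaussian core near the generated top mass. The function template_pdf and its parameter values are placeholders; in the real template fit the shape parameters are themselves functions of Mtop determined by the fit.

```python
import numpy as np
from scipy.stats import gamma, norm

def template_pdf(m_rec, m_top, frac=0.4, shape=3.0, scale=25.0,
                 offset=80.0, width=22.0):
    """p(Mrec | Mtop) as frac * Gamma + (1 - frac) * Gaussian (toy parameters)."""
    gamma_part = gamma.pdf(m_rec, a=shape, loc=offset, scale=scale)
    gauss_part = norm.pdf(m_rec, loc=m_top, scale=width)
    return frac * gamma_part + (1.0 - frac) * gauss_part

# Example: the expected Mrec shape for Mtop = 175 GeV/c^2 over the template range
masses = np.linspace(80.0, 380.0, 61)   # 80-380 GeV/c^2 in 5 GeV/c^2 steps
shape_175 = template_pdf(masses, 175.0)
```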
11. Extracting Mtop
- Compare the shape of the data Mrec distribution to Monte Carlo expectations.
- A continuous likelihood procedure is used to extract Mtop (see the sketch below).
- It uses the functional forms for signal and background.
- In the fit, Mtop is the only free parameter, and the background fractions are constrained to be within their expected values.
- The median Mtop and error from 2000 pseudoexperiments is
Mtop = 175.1 ± 7.3 GeV/c2
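A sketch of the continuous likelihood, assuming stand-in signal and background shapes (p_sig, p_bkg) and a fixed background fraction; the real fit constrains the background fraction within its expected value rather than fixing it.

```python
import numpy as np
from scipy.stats import gamma, norm
from scipy.optimize import minimize_scalar

def p_sig(m_rec, m_top):
    # placeholder signal template: Gaussian core plus Gamma tail (see above)
    return 0.6 * norm.pdf(m_rec, loc=m_top, scale=22.0) + \
           0.4 * gamma.pdf(m_rec, a=3.0, loc=80.0, scale=25.0)

def p_bkg(m_rec):
    # placeholder background template: no Mtop dependence
    return gamma.pdf(m_rec, a=3.5, loc=80.0, scale=22.0)

def neg_log_likelihood(m_top, events, bkg_frac=0.4):
    dens = (1.0 - bkg_frac) * p_sig(events, m_top) + bkg_frac * p_bkg(events)
    return -np.sum(np.log(dens))

# Example: fit a toy "data" sample of reconstructed masses (GeV/c^2)
events = np.array([142.0, 155.0, 168.0, 171.0, 175.0, 179.0, 183.0, 202.0])
fit = minimize_scalar(neg_log_likelihood, bounds=(120.0, 230.0),
                      args=(events,), method="bounded")
print("fitted Mtop =", round(fit.x, 1))
```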
12. Improvements
- The Gaussian+Gamma fit is motivated by the shape of the distributions.
- Any function that can fit the templates can be used.
- Neural Networks are able to approximate any function.
- We do not need to make an a priori decision about the underlying function describing the distribution.
- We are not limited to 1D distributions.
- Including more information to measure Mtop:
- There are other kinematic variables which carry mass information.
- A Neural Network provides a simple and elegant way of combining many variables into a single analysis.
- It takes into account correlations between the different variables.
- It can classify events into top signal or background.
13. Introduction to ANN
- An ANN is a function of N variables.
- It has a useful graphical representation.
- All nodes above the input layer perform a simple calculation (see the sketch below).
- Architecture: the grid of nodes (input, hidden, and output layers).
- Weights: the connections between nodes.
- Activation function g(x): non-linear, e.g. g(x) = tanh(x).
Diagram: Input1, Input2, Input3 feeding a hidden layer through weights w1 ... w3, with a single output node.
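A minimal sketch of the node calculation and a tiny feed-forward network with a tanh activation; the weights and inputs are arbitrary illustrative numbers.

```python
import numpy as np

def node_output(inputs, weights, bias=0.0):
    """One node: g(w . x + b) with g = tanh."""
    return np.tanh(np.dot(weights, inputs) + bias)

def feed_forward(x, w_hidden, w_out):
    """A tiny 3-input, 2-hidden-node, 1-output network."""
    hidden = np.array([node_output(x, w) for w in w_hidden])
    return node_output(hidden, w_out)

x = np.array([0.2, -1.0, 0.5])               # Input1, Input2, Input3
w_hidden = np.array([[0.1, 0.4, -0.3],       # weights into hidden node 1
                     [-0.6, 0.2, 0.8]])      # weights into hidden node 2
w_out = np.array([0.7, -0.5])                # weights into the output node
print(feed_forward(x, w_hidden, w_out))
```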
14. ANN Learning Rule
- A Neural Network has to be configured such that the application of a set of input values produces the desired output values.
- Supervised learning uses learning samples to train the network to perform a given task.
- Neural Network learning rule (a minimal sketch follows this list):
- Start with random weights for the connections.
- Select an input vector from the learning samples.
- Modify all the weights so that the Neural Network output is as close to the desired output as possible.
- Weights are modified by minimizing an error function.
- Back-propagation algorithm.
- Return to step 2.
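A minimal sketch of the learning rule above, for a single tanh node trained by gradient descent on a toy learning sample (logical AND encoded as ±1); for a multi-layer network the same gradient step is propagated backwards through the layers, which is the back-propagation algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
inputs = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])  # learning sample
targets = np.array([-1., -1., -1., 1.])                      # desired outputs (AND)

weights = rng.normal(scale=0.5, size=2)   # start with random weights
bias, eta = 0.0, 0.1                      # eta = learning rate

for _ in range(2000):                     # repeat: return to "select an input vector"
    i = rng.integers(len(inputs))         # select an input vector from the learning sample
    x, t = inputs[i], targets[i]
    y = np.tanh(weights @ x + bias)       # network output for this input
    # modify the weights by minimizing the squared-error function E = (y - t)^2 / 2
    delta = (y - t) * (1.0 - y**2)        # dE/d(net), using d tanh/d(net) = 1 - y^2
    weights -= eta * delta * x            # move the weights downhill
    bias -= eta * delta

print([round(float(np.tanh(weights @ x + bias)), 2) for x in inputs])
```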
15. Simple Example: the AND Problem
16. ANN Performance
- Information
- Use the best variables available to train the network.
- Architecture
- Input and output nodes are determined by the problem you want to solve.
- The AND problem can be solved with 2 input nodes and 1 output node.
- Hidden nodes
- Try different numbers and choose the architecture that gives the best performance.
- Learning sample
- Use a learning sample at least 10 times the number of weights in the network.
- Number of iterations
- Make sure the network has been trained properly:
- χ2 method to decide when to stop the training (function approximation).
- Look at a testing sample (pattern classification).
17. ANN Template Fitter
- We use the MLPfit package to fit the Mrec templates as a function of the top mass, Mtop.
- To solve this problem we need (see the sketch below):
- 2 input nodes
- one associated with Mrec
- one associated with Mtop
- 1 output node
- the desired output value is set to the number of events in each bin.
- The architecture chosen to fit the signal distributions is 2-4-4-1.
- To fit the background we choose an architecture of 1-5-1.
- No Mtop dependence.
- The Mrec distributions range from 80 GeV/c2 to 380 GeV/c2 and are divided into 5 GeV/c2 bins.
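A sketch of how the 2-input training set is built, with (Mrec bin centre, Mtop) as inputs and the bin content as the desired output. MLPfit is the package actually used; scikit-learn's MLPRegressor appears here only as a readily available stand-in with the same 2-4-4-1 architecture, and the template contents are faked with a Gaussian bump so the example runs end to end.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

bin_edges = np.arange(80.0, 385.0, 5.0)            # 80-380 GeV/c^2, 5 GeV/c^2 bins
bin_centres = 0.5 * (bin_edges[:-1] + bin_edges[1:])
generated_masses = np.arange(120.0, 235.0, 5.0)    # MC template masses

X, y = [], []
for m_top in generated_masses:
    # stand-in for the histogrammed MC Mrec template at this Mtop
    counts = np.exp(-0.5 * ((bin_centres - m_top) / 22.0) ** 2)
    for centre, content in zip(bin_centres, counts):
        X.append([centre, m_top])                  # 2 input nodes
        y.append(content)                          # 1 output node

net = MLPRegressor(hidden_layer_sizes=(4, 4), activation="tanh",
                   max_iter=5000, random_state=0)
net.fit(np.array(X), np.array(y))

# The trained network gives the expected Mrec shape for any Mtop:
print(net.predict([[175.0, 172.5]]))               # content at Mrec=175, Mtop=172.5
```

Because the network is a smooth function of both inputs, one trained network replaces the full set of discrete templates.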
18. Template Comparison
- We perform a χ2 fit to make sure both fitting techniques are working correctly.
- The individual χ2 values of the fits to each mass template are also lower for the NN fits than for the Gaussian+Gamma (GG) fits.
SVX Single distributions
19. Pseudoexperiments
- Uses the same continuous likelihood procedure described previously (a pseudoexperiment sketch follows this list).
- MC results from 2000 simulated experiments are shown below.
- Both fitting methods use exactly the same templates.
- The NN fitting method gives a result which is 12% better than the GG fitting technique.
- Applicable to any distribution.
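A sketch of the pseudoexperiment loop: draw a Run-I-sized sample from the templates at a known true Mtop, fit it, repeat, and look at the median and spread of the fitted masses. The Gaussian template and the simple mean "fit" below are stand-ins for the real templates and likelihood fit.

```python
import numpy as np

rng = np.random.default_rng(1)
TRUE_MTOP, WIDTH, N_EVENTS, N_EXPERIMENTS = 175.0, 22.0, 76, 2000

def fit_mtop(sample):
    # stand-in for the continuous likelihood fit: for a pure Gaussian template
    # the maximum-likelihood top mass is just the sample mean
    return sample.mean()

results = []
for _ in range(N_EXPERIMENTS):
    pseudo_data = rng.normal(TRUE_MTOP, WIDTH, size=N_EVENTS)  # draw from the template
    results.append(fit_mtop(pseudo_data))

results = np.array(results)
print("median fitted Mtop:", np.median(results))
print("spread (expected statistical error):", results.std())
```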
20. Mass Information
- Other analyses have shown that other kinematic quantities carry mass information.
- A variable with good mass information will have a small RMS/slope (see the sketch below):
- Narrow distributions.
- Large separation between means.
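A sketch of the RMS/slope figure of merit, using made-up mean and RMS values of a hypothetical variable (labelled HT here) at several generated top masses.

```python
import numpy as np

m_top = np.array([150.0, 160.0, 170.0, 180.0, 190.0])  # generated masses (GeV/c^2)
means = np.array([238.0, 252.0, 266.0, 281.0, 295.0])  # e.g. mean HT per sample (GeV)
rms   = np.array([ 55.0,  57.0,  59.0,  61.0,  63.0])  # RMS of HT per sample (GeV)

slope = np.polyfit(m_top, means, 1)[0]   # d<HT>/dMtop from a straight-line fit
figure_of_merit = rms.mean() / slope     # smaller is better
print("slope =", slope, "; RMS/slope =", figure_of_merit, "GeV/c^2")
```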
21. HT Mass Analysis
- We can use a Neural Network to fit the HT distributions.
- We ran 2000 pseudoexperiments and we find
Mtop = 174.6 ± 11.3 GeV/c2
- How do we combine the Mrec and HT mass results?
- Simple if the variables are uncorrelated.
- Difficult to find a way to include the correlations in the likelihood.
- Combining the results a posteriori is not straightforward.
- Look at other options.
SVX Single distributions
22. Standalone Neural Network (1)
- A Neural Network provides a simple and elegant way to add more information.
- New variables are added as input nodes.
- Correlations between the variables are naturally taken into account.
- We want to train a NN to classify events into the different top masses generated for this study, as well as background.
- We have generated 23 different top samples. Our NN has 24 outputs (23 signal + 1 background).
- We train the NN using the BFGS method.
- In training and testing, all 23 different top mass samples (signal) as well as background are used.
- During the training, we set the output target value for each class to 1.0 (see the encoding sketch below).
- In our 24-dimensional output space, Mtop = 175 GeV/c2 corresponds to (0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0). The background target value is given by (0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1).
- Output values can be interpreted as a posteriori Bayes probabilities for each class.
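A sketch of the one-hot target encoding described above, assuming the 23 signal classes correspond to the 120-230 GeV/c2 templates in 5 GeV/c2 steps; the Mtop = 175 GeV/c2 target then has its 1.0 in the 12th slot, matching the vector on the slide.

```python
import numpy as np

generated_masses = np.arange(120.0, 235.0, 5.0)       # 23 signal classes
N_CLASSES = len(generated_masses) + 1                 # + 1 background class

def target_vector(m_top=None):
    """One-hot target: the slot for m_top, or the last slot for background."""
    target = np.zeros(N_CLASSES)
    if m_top is None:                                  # background event
        target[-1] = 1.0
    else:
        target[np.argmin(np.abs(generated_masses - m_top))] = 1.0
    return target

print(target_vector(175.0))   # 1.0 in the 12th slot, as on the slide
print(target_vector())        # 1.0 in the 24th (background) slot
```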
23. Standalone Neural Network (2)
- We use the MLPfit package.
- We have generated a Neural Network for each of the mass subsamples.
- Training is stopped when the testing curve begins to increase:
- the NN begins to learn the specific features of the training sample.
- The sum of the outputs should be equal to 1.
NN performance
24. Extracting Mtop
- The NN output probabilities are used to construct a discrete likelihood to extract Mtop.
- Each point in the likelihood is associated with a generated top mass.
- Each point contains an admixture of signal and background.
- To account for asymmetric errors we fit the log-likelihood with a 3rd degree polynomial (see the sketch below).
- Mtop is given by the minimum.
- The Mtop errors are given by the 0.5 unit increase in the log-likelihood.
- Can we construct a continuous likelihood that uses this information?
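A sketch of the cubic-polynomial step: fit the -log(likelihood) points versus generated mass, take Mtop at the minimum, and read the asymmetric errors where -log L rises by 0.5. The -log L values below are made up for illustration.

```python
import numpy as np

m_points = np.array([165.0, 170.0, 175.0, 180.0, 185.0])   # generated masses
nll      = np.array([  3.1,   1.2,   0.4,   0.9,   2.6])   # -log L at each point

coeffs = np.polyfit(m_points, nll, 3)                       # 3rd degree polynomial
m_scan = np.linspace(m_points[0], m_points[-1], 2001)
nll_fit = np.polyval(coeffs, m_scan)

i_min = np.argmin(nll_fit)
m_best, nll_min = m_scan[i_min], nll_fit[i_min]

# asymmetric errors: crossings of nll_min + 0.5 on either side of the minimum
above = nll_fit > nll_min + 0.5
m_lo = m_scan[:i_min][above[:i_min]][-1]   # last point above threshold below the minimum
m_hi = m_scan[i_min:][above[i_min:]][0]    # first point above threshold beyond the minimum
print(f"Mtop = {m_best:.1f} +{m_hi - m_best:.1f} -{m_best - m_lo:.1f} GeV/c^2")
```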
25. Pseudoexperiments
- We ran 2000 pseudoexperiments with the Run I statistics.
- The Standalone NN technique does 30% better than the GG-fitted template method.
- The Standalone NN technique does 16% better than the NN-fitted template technique.
26. Systematic Errors
- Jet energy scale: apply +1σ and -1σ shifts to the jet momenta in ttbar signal and background. The error is half the difference between the medians of the -1σ and +1σ distributions.
- ISR/FSR: turn off ISR in PYTHIA. For FSR we choose only the events with 4 jets which are uniquely matched to partons. We add the errors to obtain the systematic uncertainty.
- b-tagging: create two samples, one with only fake SLT tags and one with only real SLT tags. The error is half the difference.
- PDF: change the PDF in PYTHIA to CTEQ3L.
- MC generators: compare HERWIG samples to PYTHIA templates.
27. Gluon and Non-gluon Events
Non-gluon event: all 4 highest-ET jets are matched to MC partons.
Gluon event: at least one of the highest-ET jets is not matched to a MC parton.
28. Gluon Effect on Top Mass
- Use the subsamples with the smallest amount of background.
- From our Monte Carlo sample we see:
- 52.7 ± 0.66% of the events in the single SVX-tagged sample contain at least one gluon among their 4 highest-ET jets.
- 48.7 ± 1.1% of the events in the double SVX-tagged sample contain gluons.
- We construct mass templates which are divided into two sets:
- Gluon templates only contain events with at least one gluon jet among their 4 highest-ET jets.
- Non-gluon templates only contain events in which the 4 highest-ET jets have been uniquely matched to the top decay partons.
- There are features of the Run I mass analysis consistent with the data containing less gluon radiation than the Monte Carlo predicts.
- Would a better knowledge of the gluon content in the mass sample lead to a better top mass measurement?
29. Templates
- Gluon events will be mismeasured.
- Gluon templates peak at a lower mass than the non-gluon distributions.
- Non-gluon templates show better separation than gluon templates do.
- Non-gluon templates are narrower than the templates containing gluon events.
- We look at RMS/slope for the different templates.
- Events which do not contain gluon jets will provide a better top mass measurement.
30. Pseudoexperiments
- We have generated a new set of templates in which we vary the percentage of gluon events.
- The new templates have 20%, 40%, 60% and 80% gluon events in them.
- We ran a series of pseudoexperiments in which we draw the events from the above templates and compare them to the default Monte Carlo.
- The results from the Single SVX subsample are given below.
31. NN Input Variables
- We want to develop a Neural Network that can distinguish gluon events from non-gluon ones.
- This study uses the following three variables:
- Di-jet invariant mass: in the case of single-tagged events there are 3 ways you can combine the untagged jets to form the di-jet mass. We only use 2 of them, since the one constructed from the two least energetic jets differs very little. (A sketch of this calculation follows the list.)
- Number of extra jets: the number of jets with ET > 8.0 GeV and |η| < 2.4 besides our four highest jets.
- χ2: a goodness-of-fit parameter returned by MINUIT after we reconstruct the top mass for a given event.
- All these variables carry good gluon-content information.
- A Neural Network is well suited for this analysis since it provides a natural way of combining all of the variables.
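A minimal sketch of the di-jet invariant mass input, using illustrative (E, px, py, pz) jet four-vectors in GeV.

```python
import numpy as np

def dijet_mass(jet1, jet2):
    """Invariant mass of a two-jet system from (E, px, py, pz) four-vectors."""
    e, px, py, pz = np.asarray(jet1) + np.asarray(jet2)
    return np.sqrt(max(e**2 - px**2 - py**2 - pz**2, 0.0))

# e.g. the two pairings kept for a single-tagged event (toy jets):
jets = [np.array([95.0,  60.0,  40.0,  55.0]),
        np.array([70.0, -45.0,  30.0, -40.0]),
        np.array([40.0,  10.0, -35.0,  15.0])]
print(dijet_mass(jets[0], jets[1]), dijet_mass(jets[0], jets[2]))
```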
32. NN Input Variables
Single SVX subsample
Double SVX subsample
33. Neural Network
- We are using the JETNET subroutines interfaced to ROOT via the Root_Jetnet package.
- We have two separate Neural Networks:
- NN_1SVX (4-8-1): the Single SVX NN has four input variables.
- NN_2SVX (3-6-1): the Double SVX NN has three input variables.
- We trained the Neural Networks using two different MC samples: one containing only gluon events, and the other one without them.
- During training, the desired NN output for non-gluon events was set to 1, while the output corresponding to gluon events was set to 0.
34. Run I and Run II Pseudoexperiments
- We perform 5000 pseudo-experiments, drawing events from our Monte Carlo samples following the shape of the distributions.
- After our pseudo-experiments we find:
- The average statistical uncertainty returned by the fitter for the gluon and non-gluon content is 19.1% and 17.5%, respectively.
- The median statistical uncertainty for Run II will be 5.6% for the gluon content and 4.3% for the non-gluon content.
35. Data-like Templates
- We apply our fitting technique to the actual Run I data and we find:
- The non-gluon content in the SVX Run I data is higher than what we expect from the Monte Carlo.
- The errors on the measurement are rather large (roughly 20%).
- We use the mean of the gluon content measurement to construct data-like templates.
- Results from MC pseudoexperiments:
36. Run II Mass Analysis
- The biggest gain in the Standalone NN method comes from the combination of HT and Mrec.
- We propose the use of a NN to fit a series of 2-D histograms of Mrec versus HT (see the sketch below).
- High statistics.
- Adaptive binning.
- These 2-D surfaces would be used much like the Mrec templates in the standard mass analysis.
- We would use the continuous log-likelihood approach to calculate the top mass.
Single SVX distributions
Mrec vs. HT surfaces generated with the NN functional form.
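A sketch of what one 2-D Mrec-versus-HT template could look like, using a toy correlated sample in place of the reconstructed MC; a NN with inputs (Mrec, HT, Mtop) could then be fitted to such surfaces in the same way as the 1-D templates above. The HT model and its binning here are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(2)
m_top = 175.0
# toy correlated (Mrec, HT) pairs in GeV; real values come from the MC samples
m_rec = rng.normal(m_top, 22.0, size=5000)
h_t   = rng.normal(2.0 * m_top - 80.0, 55.0, size=5000) + 0.3 * (m_rec - m_top)

surface, m_edges, ht_edges = np.histogram2d(
    m_rec, h_t, bins=[np.arange(80.0, 385.0, 5.0),     # Mrec: 5 GeV/c^2 bins
                      np.arange(100.0, 610.0, 10.0)])  # HT: toy 10 GeV bins
print(surface.shape)   # (Mrec bins, HT bins) surface for this Mtop
```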
37. Run II Expectations
- The different methods used to obtain the mass of the top quark are presented.
- Using the 2D surfaces gives
- a result which is 30% better than using the Mrec templates fitted with a Gaussian+Gamma function.
- a result which is 11% better than using the Mrec templates fitted with a Neural Network.
- The goal for Run II is to measure Mtop with an error of less than 2.0 GeV/c2.
- Systematic errors will dominate the error in the top mass.
Statistical error only
38. Conclusions
- NN template fitter
- Provides a better measurement than the previous Gaussian+Gamma fitting technique.
- Applicable to any set of kinematic distributions.
- Standalone NN
- Combines different information to obtain a better top mass estimate.
- The 2D NN fit gives similar results.
- The techniques presented here will help improve the top mass measurement in Run II.
- Although these multivariate techniques were developed for the top mass analysis, they can be applied to many different physical processes:
- single top, Higgs boson, Supersymmetry, etc.