Title: SUSY 1-lepton background multidimensional fits, CSC Note 1
1. SUSY 1-lepton background multidimensional fits
CSC Note 1, ATLAS SUSY WG
- A. Koutsman, W. Verkerke, NIKHEF
- 23 August 2007
2. One-lepton mode SUSY
[Figure: effective mass (GeV) distribution, 1 fb⁻¹]
- Dominant backgrounds
  - Top pair
  - W+jets
  - QCD
  - Z+jets
- GOAL: estimate and understand backgrounds from data
- TARGET: develop methods to discover/exclude SUSY with 1 fb⁻¹
3. Multidimensional method
- MT method
  - Extrapolate the W+jets/ttbar background from the control region (low MT) to the signal region (high MT)
- Main idea: improve the MT method
  - Try to use additional observables for the extrapolation (e.g. mtop)
  - Explicitly account for SUSY contamination in the control region
[Figure annotation: overestimated by a factor 2.5]
- Key issues to understand
  - Amount and type of correlations between observables
  - Amount and shape of SUSY in the control region
4. Fitting the background without correlations
- In the absence of correlations, we can construct relatively simple multidimensional models to describe the background data
  - E.g. $P_{t\bar{t}}(M_T, E_T^{miss}, m_{top}) = P_1(M_T)\cdot P_2(E_T^{miss})\cdot P_3(m_{top})$
- New observable: the reconstructed hadronic top mass, mtop
  - Defined as the invariant mass of the 3-jet system with the highest sum pT
- Next step: write a model that describes the combined background in the control region and use it to extrapolate to the signal region
  - $P_{total}(M_T, E_T^{miss}, m_{top}) = N_{tt1l}\,P_{tt1l}(M_T, E_T^{miss}, m_{top}) + N_{tt2l}\,P_{tt2l}(M_T, E_T^{miss}, m_{top}) + N_{wnj}\,P_{wnj}(M_T, E_T^{miss}, m_{top}) + N_{susy}\,P_{susy}(M_T, E_T^{miss}, m_{top})$ (Ansatz model)
- Idea: we hope for an improved determination of the SM backgrounds due to
  - Additional observables used in the procedure
  - A generic SUSY component included in the fit to account for non-zero SUSY contamination in the control region
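The mtop definition above can be sketched in Python as follows. This is a minimal illustration and not the analysis code. It makes two assumptions: "highest sum pT" is taken to mean the highest vector-summed pT of the three jets, and the `(pt, eta, phi, m)` jet format and the function names are hypothetical.

```python
import itertools
import math


def four_vector(pt, eta, phi, m):
    """Convert (pT, eta, phi, mass) to Cartesian (E, px, py, pz)."""
    px = pt * math.cos(phi)
    py = pt * math.sin(phi)
    pz = pt * math.sinh(eta)
    energy = math.sqrt(px * px + py * py + pz * pz + m * m)
    return (energy, px, py, pz)


def hadronic_top_mass(jets):
    """Invariant mass of the 3-jet combination with the highest vector-summed pT.

    jets: list of (pt, eta, phi, m) tuples (hypothetical format).
    Returns None if fewer than three jets are available.
    """
    if len(jets) < 3:
        return None
    vectors = [four_vector(*jet) for jet in jets]
    best_pt, best_mass = -1.0, None
    for trio in itertools.combinations(vectors, 3):
        # Sum the three four-vectors component by component
        e, px, py, pz = (sum(comp) for comp in zip(*trio))
        sum_pt = math.hypot(px, py)
        if sum_pt > best_pt:
            best_pt = sum_pt
            # Guard against tiny negative values from rounding
            best_mass = math.sqrt(max(e * e - px * px - py * py - pz * pz, 0.0))
    return best_mass
```

For example, three massless 50 GeV jets spaced by 120° in phi at eta = 0 have zero total momentum, so the function returns their summed energy, 150 GeV.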
5. First iteration of combined background fit
- Start with the simplest exercise: shapes of the components fixed
  - Determined from fits to individual background MC samples
- Shapes chosen for the various backgrounds
  - TTbar semileptonic (1+1+2+1+4+1 = 10 parameters)
    - exponential in missing ET
    - exponential+Gaussian in MT
    - Landau+Gaussian in mtop
  - TTbar dileptonic (1+2+2 = 5 parameters)
    - exponential in missing ET
    - Gaussian in MT
    - Landau in mtop
  - W+jets (1+1+2+1+2 = 7 parameters)
    - exponential in missing ET
    - exponential+Gaussian in MT
    - Landau in mtop
[Figure: fitted shapes for TTbar semileptonic, TTbar dileptonic and W+jets]
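As an illustration, the "exponential+Gaussian" MT shape can be written as a normalized pdf on a finite fit range. The slides do not give the exact form used, so everything here is an assumption made for this sketch: the parameterization (a Gaussian with fraction `frac` added to an exponential with slope `slope`) and all numerical values.

```python
import math


def exp_plus_gauss(x, slope, mean, sigma, frac, lo, hi):
    """Normalized pdf on [lo, hi]: (1 - frac) * exponential + frac * Gaussian.

    slope is the exponent coefficient (negative for a falling spectrum).
    Each component is normalized to unit area on [lo, hi] before mixing.
    """
    if not lo <= x <= hi:
        return 0.0
    if slope == 0.0:
        exp_pdf = 1.0 / (hi - lo)
    else:
        exp_norm = (math.exp(slope * hi) - math.exp(slope * lo)) / slope
        exp_pdf = math.exp(slope * x) / exp_norm
    # Gaussian area on [lo, hi] from the error function
    root2 = math.sqrt(2.0)
    area = 0.5 * (math.erf((hi - mean) / (sigma * root2))
                  - math.erf((lo - mean) / (sigma * root2)))
    gauss_norm = area * sigma * math.sqrt(2.0 * math.pi)
    gauss_pdf = math.exp(-0.5 * ((x - mean) / sigma) ** 2) / gauss_norm
    return (1.0 - frac) * exp_pdf + frac * gauss_pdf


def midpoint_integral(f, lo, hi, n=20000):
    """Simple midpoint-rule integral, used here only to check normalization."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))


if __name__ == "__main__":
    # Hypothetical W+jets-like parameters: falling exponential plus a peak near the W mass
    pdf = lambda mt: exp_plus_gauss(mt, -0.02, 80.0, 10.0, 0.3, 0.0, 300.0)
    print(midpoint_integral(pdf, 0.0, 300.0))  # close to 1.0
```

The sketch normalizes each component separately before mixing. That way `frac` keeps its meaning as the Gaussian's share of the events inside the fit range.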
6. First iteration of combined background fit without SUSY
- Now fit the model for the combined background, with fixed shapes, to a mix of background samples and check whether
  - we have enough information in the fit to constrain the various fractions
  - we recover the fractions of background that went into the fit (no bias etc.)
- Fits on 1 fb⁻¹ of data
Fit vs. truth:
- Ndi: 123 ± 17 (truth 141)
- Nsemi: 567 ± 40 (truth 578)
- Nwjets: 168 ± 35 (truth 140)
OK!
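The fixed-shape yield fit can be illustrated with a one-dimensional toy. In place of a general-purpose likelihood minimizer, this sketch uses an EM (expectation-maximization) iteration for the extended-likelihood yields. The toy shapes, the sample sizes and the function names are hypothetical, and no uncertainties are computed. One difference is worth noting: this update can never return a negative yield, whereas a free minimizer can, as with the slightly negative SUSY yield in cross-check 1.

```python
import math
import random


def fit_yields(events, pdfs, n_iter=500):
    """Extended maximum-likelihood yields for components with fixed shapes.

    Uses the EM fixed-point update
        N_j <- sum_i N_j P_j(x_i) / sum_k N_k P_k(x_i),
    whose stationary points with positive yields satisfy the extended-ML conditions.
    """
    k = len(pdfs)
    yields = [len(events) / k] * k
    # Shapes are fixed, so each component density is evaluated only once
    dens = [[pdf(x) for pdf in pdfs] for x in events]
    for _ in range(n_iter):
        new = [0.0] * k
        for row in dens:
            total = sum(n * d for n, d in zip(yields, row))
            if total <= 0.0:
                continue  # event outside every component's support
            for j in range(k):
                new[j] += yields[j] * row[j] / total
        yields = new
    return yields


def exp_pdf(x):
    """Hypothetical background shape: unit-slope exponential normalized on [0, 10]."""
    return math.exp(-x) / (1.0 - math.exp(-10.0)) if 0.0 <= x <= 10.0 else 0.0


def gauss_pdf(x):
    """Hypothetical peaking shape: Gaussian at 6 with unit width (tails beyond [0, 10] neglected)."""
    return math.exp(-0.5 * (x - 6.0) ** 2) / math.sqrt(2.0 * math.pi)


def toy_sample(rng, n_exp, n_gauss):
    """Draw n_exp exponential and n_gauss Gaussian events inside [0, 10]."""
    events = []
    while len(events) < n_exp:
        x = rng.expovariate(1.0)
        if x <= 10.0:
            events.append(x)
    while len(events) < n_exp + n_gauss:
        x = rng.gauss(6.0, 1.0)
        if 0.0 <= x <= 10.0:
            events.append(x)
    return events


if __name__ == "__main__":
    events = toy_sample(random.Random(1), 700, 300)
    print(fit_yields(events, [exp_pdf, gauss_pdf]))  # roughly [700, 300]
```

The fitted yields always add up to the number of events. The two components are well separated here, so each yield comes out close to its generated value.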
7. Next iteration of combined background fit
- Include a generic SUSY contribution in the fit (flat in ET, gentle slope in MT, Landau in mtop) and fit to data with SU3 SUSY contamination
- Combined fit with SUSY on 1 fb⁻¹ of data
Fit vs. truth:
- Ndi: 146 ± 25 (truth 141)
- Nsemi: 557 ± 43 (truth 578)
- Nwjets: 168 ± 43 (truth 140)
- Nsu3: 216 ± 24 (truth 228)
OK
8. Combined background fit cross-checks
- Cross-check 1: fit the model with a floating SUSY component to data without SUSY
- Cross-check 2: fit the model without a SUSY component to data with SUSY contamination
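Cross-check 1 (fit vs. truth):
- Ndi: 125 ± 18 (truth 141)
- Nsemi: 565 ± 40 (truth 578)
- Nwjets: 172 ± 35 (truth 140)
- Nsu3: -3 ± 4.7 (truth 0)
OK
Cross-check 2 (fit vs. truth):
- Ndi: 331 ± 26 (truth 141)
- Nsemi: 427 ± 37 (truth 578)
- Nwjets: 329 ± 37 (truth 140)
- Nsu3: 0 (fixed) (truth 228)
OK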
9. How well does the generic SUSY shape work?
- Are we sure the fit is not biased? Run the fit 1000 times on toy MC samples drawn from the combined background p.d.f. fitted to MC data and look at the pull distributions
- In the fit we have taken a generic shape for the SUSY component. How well does it portray other SUSY points?
- Fit to the pull distributions of the SUSY fraction in the combined fit
  - SU1: mean -0.027 ± 0.052, σ 0.993 ± 0.037
  - SU2: mean -0.067 ± 0.057, σ 1.040 ± 0.037
  - SU3: mean -0.0006 ± 0.051, σ 0.958 ± 0.032
[Figure: pull distributions from 1000 toy MC fits]
Fits are unbiased
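The pull check can be outlined in code. For each toy fit the pull is (fitted − true)/error, and the pulls are then summarized by their mean and width. The slides fit a Gaussian to the pull distribution. This sketch instead uses sample moments with the usual large-sample uncertainties σ/√n and σ/√(2n), so its numbers would not reproduce the quoted ones exactly. The function name and inputs are hypothetical.

```python
import math


def pull_summary(fitted, errors, truth):
    """Return (mean, mean_err, width, width_err) of the pulls (fit - truth) / error.

    Uses sample moments rather than a Gaussian fit; the uncertainties are the
    large-sample approximations width/sqrt(n) and width/sqrt(2n).
    """
    pulls = [(f - truth) / e for f, e in zip(fitted, errors)]
    n = len(pulls)
    mean = sum(pulls) / n
    width = math.sqrt(sum((p - mean) ** 2 for p in pulls) / (n - 1))
    return mean, width / math.sqrt(n), width, width / math.sqrt(2 * n)
```

An unbiased fit with correctly estimated errors gives a mean compatible with 0 and a width compatible with 1. That is the criterion applied to the SU1 to SU3 pull results above.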
10. Summary: simple fit with generic SUSY component
- We have enough information in MT, ET, mtop to constrain the individual background components (tt1l, tt2l, W+jets)
- We can account for an unknown SUSY contribution in the control region with a generic SUSY component in the fit
- In the current simplified approach, the generic SUSY component allows an unbiased determination of the amount of SM background in the presence of an unknown amount of SUSY in the data
- Checked with multiple SUSY points that the procedure essentially works for all SUSY points
11. How to deal with correlations?
- 2D histograms give a clue, but no quantitative account
- In the previous iteration we dealt with simple factorizable PDFs
- What if ET and MT have a small correlation? How can we understand it? How do we model it?
- CONDITIONAL PDFs
  - Model the dependence of ET and MT as functions of each other
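A conditional pdf P(ET | MT) must be normalized over ET separately for every value of MT. The joint density is then P(ET | MT)·P(MT). This minimal sketch assumes an exponential in ET whose slope is a first-order polynomial in MT; the coefficients `a0`, `a1` and the function names are hypothetical. In RooFit this kind of product is typically built as a RooProdPdf with a `Conditional()` argument. The code below only illustrates the normalization logic.

```python
import math


def et_slope(mt, a0, a1):
    """Hypothetical linear dependence of the ET slope on MT."""
    return a0 + a1 * mt


def cond_exp_et(et, mt, a0, a1, lo, hi):
    """P(ET | MT): exponential in ET on [lo, hi], normalized for each MT value."""
    if not lo <= et <= hi:
        return 0.0
    lam = et_slope(mt, a0, a1)
    if lam == 0.0:
        return 1.0 / (hi - lo)
    norm = (math.exp(lam * hi) - math.exp(lam * lo)) / lam
    return math.exp(lam * et) / norm


def joint_pdf(et, mt, mt_pdf, a0, a1, lo, hi):
    """Joint density P(ET, MT) = P(ET | MT) * P(MT), with mt_pdf normalized in MT."""
    return cond_exp_et(et, mt, a0, a1, lo, hi) * mt_pdf(mt)
```

Because the ET normalization is recomputed for every MT, changing `a1` alters the correlation between the two variables. It does not change the overall MT distribution, which stays entirely in `mt_pdf`.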
12. Are ET, MT correlated in signal/background?
- Procedure
  - Slice the sample in bins of MT and look at the ET distribution
  - Fit the distribution in each slice and see whether the fit parameter changes vs. MT
  - Make sure the fit model can describe the data in every slice
[Figure: W+jets missing ET distributions in 10 GeV MT slices, from 0 < MT < 10 GeV to 110 < MT < 120 GeV]
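The slicing procedure can be sketched as follows: events are grouped into MT slices and the ET slope is estimated in each. For brevity the sketch uses the closed-form maximum-likelihood estimate for an exponential that starts at `et_min` and is unbounded above: slope = −1/⟨ET − et_min⟩, with uncertainty |slope|/√n. It therefore ignores the upper edge of the ET range, whereas the per-slice fits on the slides are full fits. The function name is hypothetical.

```python
import math


def slope_per_slice(events, edges, et_min=0.0):
    """Estimate the exponential ET slope in each MT slice.

    events: list of (mt, et) pairs; edges: increasing MT slice boundaries.
    Returns a list of (slice_centre, slope, slope_error, n_events); slices with
    fewer than two events are skipped.
    """
    results = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        ets = [et for mt, et in events if lo <= mt < hi]
        n = len(ets)
        if n < 2:
            continue
        mean_excess = sum(et - et_min for et in ets) / n
        if mean_excess <= 0.0:
            continue
        slope = -1.0 / mean_excess
        results.append((0.5 * (lo + hi), slope, abs(slope) / math.sqrt(n), n))
    return results
```

A slope that drifts from slice to slice, beyond its per-slice uncertainty, is the signature of an ET-MT correlation that the next slide examines.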
13. Model the correlation
- First look at the ET slope dependence vs. MT slice
- Conclusion
  - There is a correlation, because the slope is not constant
- Next step: try to model this dependence by replacing the constant slope with an ET slope expressed as a polynomial in MT
  - Fit the 2D distribution with a 2D conditional product PDF and see whether the ET slope dependence on MT is correctly described
For MT > 160 GeV, statistics are the bottleneck and the fits are not trustworthy
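Once per-slice slope estimates are available, the polynomial in MT can be obtained, for example, by a weighted least-squares fit of slope versus slice centre. This is a stand-alone sketch that solves the normal equations; on the slides the polynomial coefficients are instead determined directly in the 2D conditional fit. The function name is hypothetical. The same helper also covers the 1st- and 2nd-order polynomials used later for the mtop dependence.

```python
def weighted_polyfit(x, y, sigma, order):
    """Weighted least-squares fit of y = c0 + c1*x + ... + c_order*x**order.

    Points are weighted by 1/sigma**2. Returns [c0, ..., c_order].
    """
    m = order + 1
    # Build the normal equations A c = b
    a = [[0.0] * m for _ in range(m)]
    b = [0.0] * m
    for xi, yi, si in zip(x, y, sigma):
        w = 1.0 / si ** 2
        powers = [xi ** k for k in range(m)]
        for r in range(m):
            b[r] += w * powers[r] * yi
            for c in range(m):
                a[r][c] += w * powers[r] * powers[c]
    # Forward elimination with partial pivoting
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, m):
            factor = a[r][col] / a[col][col]
            for c in range(col, m):
                a[r][c] -= factor * a[col][c]
            b[r] -= factor * b[col]
    # Back substitution
    coef = [0.0] * m
    for r in range(m - 1, -1, -1):
        coef[r] = (b[r] - sum(a[r][c] * coef[c] for c in range(r + 1, m))) / a[r][r]
    return coef
```

Slices with few events carry large per-slice errors, so the 1/σ² weights automatically down-weight the statistics-limited high-MT region.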
14. Fit the 2D distribution
- Now fit the 2D model with conditional dependence
  - W+jets
    - conditional exponential in ET
    - exponential+Gaussian in MT
- Check the results of the fit by comparing the conditional PDF with data → very good agreement
[Figure: 2D fit result vs. sliced data]
OK
15. Alternate correlation coordinates
- What if we turn the observables around?
  - Slice the sample in bins of ET and look at the MT distribution
- W+jets: exponential+Gaussian in MT
  - 4 parameters: slope (1) of the exponential; mean (2), sigma (3) and fraction (4) of the Gaussian
- Multiple free parameters cause more difficulty: combined with low statistics, they make the fits unstable
[Figure: W+jets MT distributions in 10 GeV ET slices, from 80 < ET < 90 GeV to 230 < ET < 240 GeV]
We therefore keep the mean of the Gaussian constant and float all other parameters.
The MT peak reflects the W mass.
16. Is MT dependent on ET?
- Plot each parameter of the MT shape as a function of ET
- Statistics are insufficient for ET > 300 GeV (fewer than 20 events)
- Sigma of the Gaussian: constant (no dependence)
- Fraction of the Gaussian: gentle slope
- Exponential slope: very gentle slope
Now we can fit
17. Double conditional pdf (W+jets)
- But what we are really after:
  - The shape of MT depends on ET, and the shape of ET depends on MT
- W+jets try-out
  - Model the evolution of the shapes with polynomials
  - Fit W+jets in ET, MT to the data with two simple conditional dependences
    - conditional exponential in ET
    - conditional exponential+Gaussian in MT
  - Check whether the correlations come out of the fit correctly
N.B. binned fits shown here
OK
18. Triple conditional dependence
- Now that the double conditional fit works for ET, MT, we also studied the correlations with the third variable, mtop, using the same procedure
- Idea: once we have studied all the correlations, we want to fit our model in all three variables (ET, MT, mtop). Where necessary, every parameter of the plain model is given a dependence on the other variables. This yields a triple conditional pdf describing each individual background sample.
- Example: how well does the triple conditional correlation model fit our data?
- Checks done for every background sample
  - Make sure that all correlation coefficients are significant
  - The global correlation of each coefficient should be reasonable
  - Keep all correlations that pass the checks and move on to the combined fit
OK
[Figure: W+jets example]
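Two of these checks are easy to state in code. A coefficient's significance is |value|/error. The global correlation coefficient of parameter i (its largest correlation with any linear combination of the other parameters) follows from the fit covariance matrix V as ρ_i = sqrt(1 − 1/(V_ii (V⁻¹)_ii)). The helpers below are hypothetical, and the slides do not state which thresholds were applied.

```python
import math


def invert(matrix):
    """Gauss-Jordan inverse of a small square matrix (list of lists)."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def global_correlations(cov):
    """rho_i = sqrt(1 - 1 / (V_ii * (V^-1)_ii)) for each parameter i."""
    inv = invert(cov)
    return [math.sqrt(max(0.0, 1.0 - 1.0 / (cov[i][i] * inv[i][i])))
            for i in range(len(cov))]


def significance(value, error):
    """Number of standard deviations a fitted coefficient lies from zero."""
    return abs(value) / error
```

With two parameters the global correlation reduces to the magnitude of their ordinary correlation coefficient. A value close to 1 signals that a coefficient is nearly degenerate with the others.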
19. Combined background fit
- Summary
  - For every background sample (W+jets, tt̄ → ℓνℓν, tt̄ → ℓνqq) we now have a conditionally dependent multidimensional model that fits the data
- Next step
  - Construct a combined model that describes the combined background and a non-zero SUSY contamination in the control region:
    $P_{data}(M_T, E_T^{miss}, m_{top}) = N_{tt1l}\,P_{tt1l}(M_T, E_T^{miss}, m_{top}) + N_{tt2l}\,P_{tt2l}(M_T, E_T^{miss}, m_{top}) + N_{wjets}\,P_{wjets}(M_T, E_T^{miss}, m_{top}) + N_{susy}\,P_{susy}(M_T, E_T^{miss}, m_{top})$ (Ansatz model)
  - How well does the new correlated model fit the data?
  - Is it better than the simplified model?
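The slides do not write out the likelihood. Assuming a standard extended maximum-likelihood fit of the combined model, the yields N_j would be obtained by minimizing

$$
-\ln L(N) = \sum_{j} N_j - \sum_{i=1}^{n} \ln\Big[\sum_{j} N_j\, P_j\big(M_T^{(i)}, E_T^{miss,(i)}, m_{top}^{(i)}\big)\Big],
\qquad j \in \{tt1l,\ tt2l,\ wjets,\ susy\},
$$

where n is the number of selected events and each P_j is normalized over the fit region. With the shapes fixed, only the four N_j float.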
20. First iteration of correlated combined background fit
- Start with the simplest exercise: shapes of the components fixed
  - Determined from fits to individual background MC samples
- Shapes chosen for the various backgrounds
  - TTbar semileptonic
    - Correlated model: 15 parameters
    - Simple model: 10 parameters
  - TTbar dileptonic
    - Correlated model: 7 parameters
    - Simple model: 5 parameters
    - Low statistics make correlation studies difficult
  - W+jets
    - Correlated model: 15 parameters
[Figure: fitted shapes for TTbar semileptonic, TTbar dileptonic, W+jets and SU3]
21. Combined background fit without SUSY
- First fit the model for the combined background, without a SUSY component and with fixed shapes, to a mix of background samples
  - Do we recover the fractions of background that went into the fit?
  - How do the fractions compare to the fit with the simplified model?
- Fit on 1 fb⁻¹ of background data
Correlated fit / plain fit / truth:
- Ndi: 220 ± 26 / 235 ± 25 / 229
- Nsemi: 1073 ± 62 / 1074 ± 63 / 1072
- Nwjets: 416 ± 61 / 401 ± 61 / 408
OK
N.B. the following results are shown for release 11, but are comparable with release 12
22. Combined background fit with SUSY
- Now we include SU3 contamination in our data and a generic SUSY contribution in our model (Ansatz model: SUSY flat in ET, gentle slope in MT, Landau in mtop)
- Fit with the shapes of the components fixed, on 1 fb⁻¹ of data
Correlated fit / plain fit / truth:
- Ndi: 180 ± 42 / 158 ± 39 / 229
- Nsemi: 1095 ± 65 / 1127 ± 67 / 1072
- Nwjets: 434 ± 66 / 382 ± 68 / 408
- Nsu3: 379 ± 36 / 420 ± 36 / 378
Correlated fit a bit better
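To make "a bit better" quantitative, one crude figure of merit is the sum of squared deviations (fit − truth)/error over the four yields. The numbers below are copied from the table of the fit with SUSY. The metric itself is only an illustration, and it ignores correlations between the fitted yields.

```python
# Yields from the fit with SU3 contamination: (fitted value, error) and MC truth
TRUTH = {"Ndi": 229, "Nsemi": 1072, "Nwjets": 408, "Nsu3": 378}
CORRELATED = {"Ndi": (180, 42), "Nsemi": (1095, 65), "Nwjets": (434, 66), "Nsu3": (379, 36)}
PLAIN = {"Ndi": (158, 39), "Nsemi": (1127, 67), "Nwjets": (382, 68), "Nsu3": (420, 36)}


def deviations(fit):
    """(fit - truth) / error for each yield."""
    return {name: (value - TRUTH[name]) / error for name, (value, error) in fit.items()}


def sum_sq_dev(fit):
    """Sum of squared deviations over all yields."""
    return sum(d * d for d in deviations(fit).values())


if __name__ == "__main__":
    print(round(sum_sq_dev(CORRELATED), 2))  # 1.64
    print(round(sum_sq_dev(PLAIN), 2))       # 5.5
```

Both fits underestimate Ndi. The correlated fit's advantage comes mostly from Ndi (1.36 vs. 3.31) and Nsu3 (0.001 vs. 1.36).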
23. Combined background fit cross-checks
- Cross-check 1: fit the model with a floating SUSY component to data without SUSY
- Cross-check 2: fit the model without a SUSY component to data with SUSY contamination
Cross-check 1 (correlated fit / plain fit / truth):
- Ndi: 239 ± 32 / 219 ± 31 / 229
- Nsemi: 1066 ± 62 / 1080 ± 64 / 1072
- Nwjets: 421 ± 61 / 396 ± 61 / 408
- Nsu3: -17 ± 16 / 14 ± 18 / 0
OK
Cross-check 2 (correlated fit / plain fit / truth):
- Ndi: 592 ± 38 / 623 ± 38 / 229
- Nsemi: 972 ± 61 / 979 ± 61 / 1072
- Nwjets: 524 ± 64 / 484 ± 64 / 408
- Nsu3: 0 (fixed) / 0 (fixed) / 378
OK
24. How well does the generic SUSY shape work?
- Are we sure the correlated fit is not biased? Again run the fit 1000 times on toy MC samples drawn from the combined background p.d.f. fitted to MC data and look at the pull distributions
- How well does the combined correlated fit portray other SUSY points?
- Fit to the pull distributions of the SUSY fraction in the combined fit
  - SU1: mean -0.096 ± 0.056, σ 1.005 ± 0.036
  - SU2: mean -0.021 ± 0.055, σ 1.031 ± 0.037
  - SU3: mean -0.023 ± 0.055, σ 1.030 ± 0.036
[Figure: pull distributions from 1000 toy MC fits]
Fits are unbiased
25. Summary on fits with correlations
- We have shown that a simplified multidimensional model without correlations correctly determines the amount of the individual background components and of a generic SUSY contamination
- Possible correlations of parameters in ET, MT and mtop have been studied for all background samples
- Non-negligible correlations between variables have been introduced into our model using triple conditional pdfs
- The correlated model has enough information in MT, ET, mtop to constrain the individual background components (tt1l, tt2l, W+jets) and to account for unknown SUSY contamination in the control region
- The effect of the correlations is not huge → aim to introduce a subset of the most important correlations in the final model
- Next
  - Exclude the signal region from the fit and test the procedure of extrapolation from the control region
26. Extrapolation: control region → signal region
- We have shown that we can distinguish the amounts of W+jets, tt(1l) and tt(2l) background in data using the full observable space
- Now repeat the exercise without the signal region
  - Fit the W + tt(1l) + tt(2l) + generic SUSY model to data in the control region
  - Extrapolate the amounts of W, tt(1l), tt(2l) to the signal region
  - Compare the predicted amounts in the signal region to MC truth
- Shapes still fixed (releasing the shape parameters is the last step of the whole exercise)
27. First iteration of fit extrapolation
- Define two sidebands in MT, ET
  - Example
    - SB1: 0 < MT < 120, ET full range
    - SB2: 120 < MT < 300, 0 < ET < 300
- Fit the combined model in both ranges
- When extrapolating, make sure the fractions are correctly defined
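With fixed shapes, "correctly defined fractions" comes down to this: a yield fitted in the control region counts only the events inside that region. The signal-region prediction is therefore N_SR = N_CR · ∫_SR P / ∫_CR P. Below is a one-dimensional sketch using the MT ranges of the MT-only try-out (0–70 and 70–150) and a hypothetical exponential MT shape; the function names are also hypothetical. For extrapolation in both observables, the integrals would run over the 2D regions instead.

```python
import math


def integrate(f, lo, hi, n=10000):
    """Midpoint-rule integral of f over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))


def extrapolate_yield(n_control, shape, control, signal):
    """Scale a control-region yield to the signal region using a fixed shape.

    shape need not be normalized: only the ratio of its integrals enters.
    control, signal: (lo, hi) ranges in the same observable.
    """
    return n_control * integrate(shape, *signal) / integrate(shape, *control)


if __name__ == "__main__":
    shape = lambda mt: math.exp(-mt / 50.0)  # hypothetical falling MT spectrum
    print(extrapolate_yield(100.0, shape, (0.0, 70.0), (70.0, 150.0)))  # about 26.1
```

Since only a ratio of integrals enters, any error in the shape's tail directly biases the prediction. This is why releasing the shape parameters remains the last step of the exercise.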
28. Combined fit extrapolation
- First try-out: extrapolation only in MT
  - SB1: 0 < MT < 70
  - SB2: 70 < MT < 150
- Extrapolation in both observables
  - SB1: 0 < MT < 70, 100 < ET < 200
  - SB2: 70 < MT < 150, 200 < ET < 250
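Fit vs. truth (first table):
- Ndi: 51 ± 12 (truth 49)
- Nsemi: 4.6 ± 0.7 (truth 4)
- Nwjets: 4.5 ± 1.2 (truth 3)
- Nsu3: 114 ± 12 (truth 118)
OK
Fit vs. truth (second table):
- Ndi: 2.6 ± 2 (truth 5)
- Nsemi: 0.1 ± 0.04 (truth 0)
- Nwjets: 1e-4 ± 0.03 (truth 1)
- Nsu3: 67 ± 2 (truth 64)
OK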
29. Summary and outlook
- We can correctly determine the amount of background in the control region using only part of the observable space
- The extrapolation to the signal region works accurately using fit results from the control region
  → The control region is enough to determine the amounts of W, tt(1l) and tt(2l) background
  → We do not need the signal region to determine the amount of SUSY contamination in the control region
- LAST STEP: float as many shape parameters of the distribution as possible and show that the fitting procedure still works
30. Back-up slides
31. Method-1 (S. Asai, K. Oe), CSC Note 1/2
- Main idea: separate the data into two sets
  - MT > 100: signal region
  - MT < 100: control region
- Assumption 1: the shape of the background in the control region is the same as in the signal region → we just need to scale by the number of events
- Assumption 2: SUSY is negligible in the control region
[Figure: MT distributions for Top + W+n jets, and for Top + W+n jets + SUSY; signal region estimated by scaling]
Works without SU3 in the game → Assumption 1 is fairly good: the actual background can be estimated correctly
Problems with SU3 → Assumption 2 does not hold: the estimated background is overestimated by a factor 2
(Kenta Oe, Shoji Asai)
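Both assumptions enter the Method-1 estimate directly. The signal-region background is predicted as N_SR^est = N_CR^data × N_SR^MC / N_CR^MC, so any SUSY events in the control region are scaled up along with the background. The numbers in this sketch are hypothetical. They show the mechanism only and are not meant to reproduce the factor 2 quoted above.

```python
def mt_method_estimate(n_control_data, n_signal_mc, n_control_mc):
    """Scale the control-region data yield by the MC signal/control ratio."""
    return n_control_data * n_signal_mc / n_control_mc


if __name__ == "__main__":
    # Hypothetical background MC: 1000 events at low MT, 50 at high MT
    bkg_control, bkg_signal = 1000.0, 50.0
    susy_control = 500.0  # hypothetical SUSY contamination at low MT
    clean = mt_method_estimate(bkg_control, bkg_signal, bkg_control)
    contaminated = mt_method_estimate(bkg_control + susy_control, bkg_signal, bkg_control)
    print(clean, contaminated)  # 50.0 75.0: contamination inflates the estimate by 1.5
```

The overestimate equals (background + SUSY)/background in the control region. This is exactly the bias the generic SUSY component in the multidimensional fit is meant to absorb.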
32. Release 12 samples
- W + 0,1,2,3,4,5 partons
  - Wenu+nJets (n = 2..5): 5223-5226
  - Wmunu+nJets (n = 3..5): 8203-8205
  - Wtaunu+nJets (n = 2..5): 8208-8211
- T1 (MC@NLO): 5200
  - separated at truth level into
    - semi-leptonic (e, mu, tau)
    - di-leptonic (ee, mumu, tautau, emu, etau, mutau)
- SUSY
  - SU1: 5401
  - SU2: 5402
  - SU3: 5403
  - SU4: 6400
  - SU6: 5404
  - SU8: 5406
- All samples normalized to 1 fb⁻¹
- All following plots are for ELECTRONS
33. Top mass correlation
[Figure: mtop sliced in ET]
- While studying the correlations, we came across one unexpected result concerning the reconstructed hadronic top mass: it seemed to depend on ET
- To see whether this was an effect of the reconstruction software, we repeated the study with an extra requirement: the reconstructed hadronic top had to match a truth top (ΔR < 0.2)
- The result of this study shows that the top mass does indeed depend on ET, even when matched to truth
- MORE STUDIES NEED TO BE DONE BEFORE A CONCLUSION CAN BE DRAWN!!!
[Figure: truth-matched mtop sliced in ET]
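The truth-matching criterion uses ΔR = sqrt(Δη² + Δφ²), with Δφ wrapped into (−π, π]. This is a minimal sketch: the function names are hypothetical, and objects are given as (eta, phi) pairs.

```python
import math


def delta_phi(phi1, phi2):
    """Azimuthal difference wrapped into (-pi, pi]."""
    d = (phi1 - phi2) % (2.0 * math.pi)
    if d > math.pi:
        d -= 2.0 * math.pi
    return d


def delta_r(eta1, phi1, eta2, phi2):
    """Distance in the (eta, phi) plane."""
    return math.hypot(eta1 - eta2, delta_phi(phi1, phi2))


def matches_truth(reco, truth, max_dr=0.2):
    """True if the reconstructed (eta, phi) lies within max_dr of the truth (eta, phi)."""
    return delta_r(*reco, *truth) < max_dr
```

Without the wrapping, a reconstructed top at φ = 3.1 and a truth top at φ = −3.1 would appear 6.2 apart in φ, although they are only about 0.08 apart.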
34. Are ET, mtop correlated?
- See whether the mtop shape depends on ET
  - mtop is described by a Landau in our model → 2 parameters
[Figure: mtop sliced in ET]
- Sigma of the Landau in mtop: very gentle slope
- Mean of the Landau in mtop: correlated
  - Model the correlation as a 1st-order polynomial
35. Are ET, mtop correlated?
- Procedure as before
  - Slice the 2D sample in bins of mtop and look at the ET distribution
  - Fit the distribution in each slice and see whether the fit parameter changes vs. mtop
[Figure: fitted slope per slice vs. mtop]
- Clear correlation between the slope of the exponential in ET and mtop, since the slope is not constant vs. mtop
  - Model the correlation as a 2nd-order polynomial