Title: Feature Extraction
1Feature Extraction
Dmitry Chirkin, LBNL
IceCube Collaboration meeting in Berkeley, March 2005
2What is Feature Extraction
- Given an ATWD or FADC waveform, determine the arrival times of all photons that contributed: the hit series
- FEInfo: a combination of leading edge, width, and charge (or amplitude)
- Also applicable to AMANDA TWR
3Feature Extraction last fall (DFL data)
Fitted function: $p_0 A_0 \exp(-(t-t_0)/s_0)\,(1-\exp(-(t-t_0)^2/s_0^2))$
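As a rough illustration, the pulse shape above can be written as a short Python function. This is a sketch under stated assumptions: the prefactor that appears as p0A0 on the slide is represented by a single `amplitude` argument because the slide does not make the separate roles of p0 and A0 clear, the pulse is taken to be zero before t0, and the name `spe_function` is illustrative rather than taken from the FeatureExtractor code.

```python
import math

def spe_function(t, amplitude, t0, s0):
    """Pulse shape: amplitude * exp(-(t-t0)/s0) * (1 - exp(-(t-t0)^2/s0^2)).

    Assumptions (not from the slide): the pulse is zero for t <= t0, and the
    slide's p0A0 prefactor is folded into the single `amplitude` argument.
    """
    if t <= t0:
        return 0.0
    dt = t - t0
    return amplitude * math.exp(-dt / s0) * (1.0 - math.exp(-(dt * dt) / (s0 * s0)))

# Evaluate the shape on a coarse grid of sample times (arbitrary units).
samples = [spe_function(t, amplitude=1.0, t0=10.0, s0=5.0) for t in range(40)]
```

The second factor suppresses the pulse just after t0, which gives a smooth rising edge, while the first factor gives the exponential tail.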
4New features discovered since
5Multi-peak fit
6Multi-peak fit (cont.)
- Fit the sum of two SPE functions to the waveform
- repeat for all SPE terms with amplitude above
the threshold until the quality of the fit stops
improving
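The "add terms until the fit stops improving" loop can be sketched in Python. This is not the fit used by the FeatureExtractor: as a simplified stand-in it uses greedy template subtraction, placing one SPE term at a time on the bin grid with a fixed width and a least-squares amplitude, instead of refitting the sum of all terms simultaneously. The names `greedy_multi_peak` and `min_gain` and the default values are assumptions made for illustration.

```python
import math

def spe(t, amplitude, t0, s0):
    """SPE pulse shape, assumed zero before t0."""
    if t <= t0:
        return 0.0
    dt = t - t0
    return amplitude * math.exp(-dt / s0) * (1.0 - math.exp(-dt * dt / (s0 * s0)))

def sum_sq(values):
    """Sum of squared residuals, used here as the fit-quality measure."""
    return sum(v * v for v in values)

def greedy_multi_peak(waveform, s0, threshold, max_hits=20, min_gain=0.01):
    """Add SPE terms one at a time until the quality stops improving.

    Stops when no candidate term has amplitude >= threshold, when the best
    term improves the quality by less than a fraction min_gain, or when
    max_hits terms have been found.
    """
    n = len(waveform)
    residual = list(waveform)
    hits = []  # list of (t0_bin, amplitude)
    while len(hits) < max_hits:
        current = sum_sq(residual)
        best = None
        for start in range(n):  # candidate t0 values on the bin grid
            shape = [spe(i, 1.0, start, s0) for i in range(n)]
            norm = sum_sq(shape)
            if norm == 0.0:
                continue
            # Least-squares amplitude of this template against the residual.
            amp = sum(r * v for r, v in zip(residual, shape)) / norm
            if amp < threshold:
                continue
            new_residual = [r - amp * v for r, v in zip(residual, shape)]
            quality = sum_sq(new_residual)
            if best is None or quality < best[0]:
                best = (quality, start, amp, new_residual)
        if best is None or current - best[0] <= min_gain * current:
            break  # nothing above threshold, or the fit stopped improving
        hits.append((best[1], best[2]))
        residual = best[3]
    return hits

# Synthetic two-pulse waveform for a quick check.
wave = [spe(i, 5.0, 10, 4.0) + spe(i, 3.0, 30, 4.0) for i in range(64)]
hits = greedy_multi_peak(wave, s0=4.0, threshold=0.5)
```

A real fit would also float the width and the time of each term; the greedy version is only meant to show the stopping logic.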
7In-Ice Fits (low PEs)
8In-Ice Fits (high PEs)
9IceTop Fits
10Other feature extraction features
- other fitting functions were tried: log-normal (by Tom McCauley) → provides a different description of the rising leading edge
- the undershooting is now fitted, so higher ATWD channels should be used not only for saturated values, but also for values close to 0
- the zero-suppression road-grader algorithm needs to be modified to suppress the most-repeated value instead of 0
- the higher ATWD channels are narrower, creating an extra mismatch peak at the trailing edge
  - → higher-channel peaks need to be widened before they are combined with channel 0
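The channel-selection rule in the second bullet (fall back to a higher channel when the sample is saturated or close to 0) can be sketched as follows. This is an illustration only, not the ATWDChannelMerger code: it assumes all channels have the same number of bins, and the function name, the `low`/`high` limits, the pedestals, and the gains are hypothetical values chosen for the example. The peak widening mentioned in the last bullet is not modelled.

```python
def merge_channels(raw, pedestals, gains, low=10, high=1000):
    """Combine raw ATWD channels into one calibrated waveform.

    raw: list of per-channel sample lists, channel 0 (highest gain) first.
    For each bin, the first channel whose raw count lies strictly between
    `low` and `high` is used, so a sample clipped near 0 (undershoot) or
    saturated falls through to the next channel. If no channel qualifies,
    the last channel is used.
    """
    n_bins = len(raw[0])
    merged = []
    for i in range(n_bins):
        value = None
        for ch in range(len(raw)):
            sample = raw[ch][i]
            value = (sample - pedestals[ch]) / gains[ch]
            if low < sample < high:
                break  # this channel is neither clipped at 0 nor saturated
        merged.append(value)
    return merged

# Hypothetical two-channel example: bin 1 saturates and bin 2 clips at 0
# in channel 0, so both are taken from channel 1.
merged = merge_channels(
    raw=[[100, 1023, 0], [60, 200, 40]],
    pedestals=[50, 50],
    gains=[16.0, 2.0],
)
```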
11Other feature extraction features
- a slewing correction (a shift of the leading edge proportional to width) may need to be made to the leading edge to describe electronics delays
  - Laser DFL or flasher in-ice calibration?
- another correction, proportional to the high voltage, needs to be made to describe the high-voltage-dependent delay of the developing signal in the PMT
  - → Laser DFL calibration should be sufficient?
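Both corrections are linear shifts of the leading edge, so they can be sketched in a single function. The coefficients are placeholders that would have to come from calibration (the slide leaves open whether laser DFL or in-ice flasher data would provide them); their names, signs, and the values in the example are assumptions.

```python
def corrected_leading_edge(leading_edge, width, high_voltage, slew_coeff, hv_coeff):
    """Apply the two linear timing corrections to a leading-edge time.

    slew_coeff: shift per unit pulse width (slewing correction).
    hv_coeff:   shift per unit high voltage (PMT signal-development delay).
    Both coefficients are hypothetical calibration constants; their signs
    are absorbed into the values passed in.
    """
    return leading_edge + slew_coeff * width + hv_coeff * high_voltage

# Example with made-up numbers (time in ns, high voltage in V).
t_corr = corrected_leading_edge(100.0, width=20.0, high_voltage=1300.0,
                                slew_coeff=-0.1, hv_coeff=0.002)
```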
12IceTray FE implementation
- FeatureExtractor is a project on glacier, a part of
  - OFFLINE-SOFTWARE
  - FATDATA
  - → an example script is in the fat-reader/resources/ directory
- → you can control
  - MaxNumHits: maximum number of separate SPE functions to be fit, if necessary (default 20)
  - through the DataOptions of the fat-reader: select only hits that pass a certain fraction of the SPE threshold (--thrs)
- At this time hidden in the source code
  - maximum SPE waveform width: reduce it to split up large pulses into smaller ones (default 6 bins)
  - fixed parameters for the description of undershooting
13FeatureExtractor usage and dataclasses
- ATWDChannelMerger must be plugged in to produce the CombinedATWD waveform used by the FeatureExtractor
- the I3DOMCalibration class was modified to accommodate calibration and combining of ATWD channels of different sizes
  - → Set methods now set by ATWD bin name, 0-127 in reversed time order, as before
  - → Get methods now get by the time-ordered ATWD bin number, 0-127 in correct time order → this changed
  - → no need to worry about this if only combined ATWD traces or Feature-Extracted hits are used
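The difference between the two indexing conventions can be illustrated with a small sketch. It assumes that "reversed time order" means a plain reversal of the 128 bins; the helper names are hypothetical and are not part of the I3DOMCalibration interface.

```python
N_BINS = 128  # bins 0-127, as quoted on the slide

def time_ordered_index(bin_name, n_bins=N_BINS):
    """Map an ATWD bin name (reversed time order) to a time-ordered bin number."""
    return n_bins - 1 - bin_name

def to_time_order(samples_by_bin_name):
    """Return a copy of the samples rearranged into correct time order."""
    return samples_by_bin_name[::-1]

# Bin name 0 is the last sample in time under this assumption.
last_in_time = time_ordered_index(0)
```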
14Conclusions
- the possibility to fit multi-peak waveforms was a highly anticipated feature, which should be considered a major improvement
- the precision of the multi-peak fits for complicated waveforms scales with the time one is willing to spend on extracting features from the waveforms: from a few milliseconds for 2-3 peaks, to a few seconds for 10, to a few dozen seconds for 20
- ATWDChannelMerger and the I3DOMCalibration class were modified to accommodate hits with different ATWD-channel sizes (e.g., currently 128, 32, 32 for in-ice)
- FeatureExtractor is a part of both OFFLINE-SOFTWARE and FATDATA. For FeatureExtractor development, FATDATA provides a more versatile environment, allowing fast selection of high- or low-PE events.
15Road-grader zero-suppression
- Common SPE-like waveform
- the pedestal is shifted down compared to the value expected from calibration. This is a well-known (by now) effect and is corrected by the fat-reader
- ok to use road-grader as is
16Road-grader zero-suppression
- A large-amplitude, saturated waveform
- undershoot is not recorded by the current
road-grader implementation, but is a part of the
waveform features
17Road-grader zero-suppression
- Highly saturated multi-PE waveform
- the undershooting and the small pulses on top of the undershot tail are all suppressed by the road-grader. Both the amount of undershooting and the small pulses are features of the waveform and are used/reconstructed by the FeatureExtractor
18Road-grader proposed modifications
- find the most-repeated value, and compress all values above and below it (no more than a threshold setting away)
- this requires one pass over the incoming waveform and a small (256-byte) memory buffer
- the zero-suppressed value itself should be encoded into the compressed data
- to make the word length more uniform (11 bits all the time), prepend the 10-bit number of the next zero-suppressed words with 1, and all other (10-bit) values with 0. This is more uniform (and possibly more efficient) than the current road-grader Huffman-encoding algorithm
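The proposed scheme can be sketched in Python to make the word layout concrete. This is a sketch under assumptions, not the DOM firmware algorithm: it buffers the waveform and walks it twice (once for the histogram, once for the encoding), it stores the suppressed value as the first output word, it splits runs longer than 1023 samples, and it restores suppressed samples as the baseline value, so the round trip is lossy within the threshold. Samples are assumed to be 10-bit values (0-1023), and all names are hypothetical.

```python
from collections import Counter

FLAG = 1 << 10             # 11th bit set: word holds a run length
MAX_RUN = (1 << 10) - 1    # largest count that fits in 10 bits

def encode(samples, threshold=1):
    """Suppress samples within `threshold` of the most-repeated value."""
    baseline = Counter(samples).most_common(1)[0][0]  # histogram pass
    words = [baseline]     # first word carries the suppressed value itself
    run = 0
    for s in samples:
        if abs(s - baseline) <= threshold:
            run += 1
            if run == MAX_RUN:          # run no longer fits in 10 bits
                words.append(FLAG | run)
                run = 0
        else:
            if run:
                words.append(FLAG | run)  # flag 1 + 10-bit run length
                run = 0
            words.append(s)               # flag 0 + 10-bit sample value
    if run:
        words.append(FLAG | run)
    return words

def decode(words):
    """Expand run-length words back into baseline-valued samples."""
    baseline = words[0]
    out = []
    for w in words[1:]:
        if w & FLAG:
            out.extend([baseline] * (w & MAX_RUN))
        else:
            out.append(w)
    return out

samples = [100, 100, 101, 100, 300, 500, 100, 100, 99, 100]
words = encode(samples, threshold=1)
restored = decode(words)
```

Because the baseline is chosen from the data rather than fixed at 0, samples below it (the undershoot) are kept as ordinary value words whenever they differ from the baseline by more than the threshold.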
19Modified road-grader compression ratio