1
Efficient algorithms of multidimensional γ-ray
spectra compression
  • V. Matoušek and M. Morháč
  • Institute of Physics, Slovak Academy of Sciences,
    Bratislava, Slovakia
  • Vladislav.Matousek_at_savba.sk
    Miroslav.Morhac_at_savba.sk
  • ACAT 2005, Zeuthen May 22
    - 27, 2005

2
Measurements in nuclear physics experiments are
oriented towards gathering large amounts of
multidimensional data.
  • The data are collected in the form of events.
  • In a typical experiment with spectrometers
    (Gammasphere, Euroball), each coincidence event
    consists of a set of n integers (e1, e2, …, en),
    which are proportional to the energies of the
    coincident γ-rays.
  • Such a coincidence specifies a point in an
    n-dimensional hypercube.
  • Storing multidimensional data very frequently
    goes beyond the capacity of available storage
    media.

3
Multiparameter nuclear data taken from
experiments are typically stored
  • directly as events, indexing the coincidences -
    list mode storage, or
  • analyzed and stored as multidimensional
    histograms (hypercubes) - nuclear spectra.
  • The list-mode storage has several disadvantages
  • an enormous amount of information that has to be
    written onto storage media (primarily tapes),
  • a long time needed to process the data.

4
Multidimensional histograms - nuclear spectra
  • Advantages
  • Possibility of interactive handling with data.
  • It allows easily to create slices of lower
    dimensionality.
  • Disadvantages
  • The multidimesional amplitude analysis must be
    done.
  • Storage requirements for multidimensional
    hypercubes are enormous
  • e.g. 3-D ?-?-? coincidence nuclear spectrum with
    resolution of 14 bits (16 384 channels) per axis
    and 2 Bytes per channel requires 8 TB of memory.
  • Data often need to be stored in RAM for
    interactive handling.
  • It is needed to compress the multidimensional
    nuclear spectra to the size of available memory.

5
Suitable data compression techniques must satisfy
these requirements
  • Less storage space after compression of the
    multidimensional nuclear spectra
  • Preservation of as much information as possible -
    minimum data distortion
  • Fast enough to be suitable for on-line
    compression during the experiment
  • Constraints
  • The size of the original multidimensional
    spectrum goes beyond the capacity of available
    memory.
  • Data from nuclear experiments are received as a
    train of events - they need to be analyzed and
    compressed separately event by event
  • Thus, the multidimensional amplitude analysis
    must be performed together with compression,
    event by event in on-line acquisition mode.

6
Suitable methods widely used
  • Binning - neighboring channels are summed
    together - loss of information.
  • Employing natural properties of data - e.g.
    symmetry removal from the multidimensional γ-ray
    spectra from Gammasphere - no loss of
    information.
  • Use of fast orthogonal transformation algorithms.
  • Storing the descriptors of events with counts of
    occurrences.
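As an illustration of the binning operation, a minimal sketch (the function name and the toy spectrum are ours, not part of the talk):

```python
import numpy as np

def bin_spectrum(spectrum, k):
    """Sum each group of k neighboring channels (lossy binning)."""
    spectrum = np.asarray(spectrum)
    n = len(spectrum) - len(spectrum) % k   # drop an incomplete last group
    return spectrum[:n].reshape(-1, k).sum(axis=1)

s = np.arange(8)            # toy 8-channel spectrum: 0, 1, ..., 7
print(bin_spectrum(s, 2))   # [ 1  5  9 13]
```

Total counts are preserved, but the channel resolution drops by a factor of k, which is exactly the loss of information mentioned above.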

7
Symmetry removal
  • For instance, in multidimensional γ-ray spectra
    from Gammasphere one can utilize the symmetry of
    the data. It holds
  • for 2-dimensional spectra: E(γ1, γ2) = E(γ2, γ1),
  • for 3-dimensional spectra: E(γ1, γ2, γ3) is
    invariant under all permutations of γ1, γ2, γ3.

8
Principle of storage of 2-dimensional symmetrical
data
Two-dimensional symmetrical spectra with
resolution R = 4.
  • The size of the reduced space can be simply
    expressed as R(R + 1)/2.

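The triangular storage scheme can be sketched as follows (a minimal illustration; the indexing convention and the function name are ours):

```python
def sym2d_index(x, y):
    """Linear index of channel (x, y) in the reduced triangular storage
    of a symmetric 2-D spectrum, exploiting E(x, y) = E(y, x)."""
    if x < y:                       # symmetry removal: keep only x >= y
        x, y = y, x
    return x * (x + 1) // 2 + y

R = 4
reduced_size = R * (R + 1) // 2     # 10 channels instead of R*R = 16
indices = {sym2d_index(x, y) for x in range(R) for y in range(R)}
print(reduced_size, indices == set(range(reduced_size)))  # 10 True
```

Symmetric pairs (x, y) and (y, x) map to the same location, so only the triangle below the diagonal is stored.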
9
By composition of triangles of the sizes R, R-1,
..., 2, 1 we get the geometrical shape called a
tetrahedron.
  • An example of storing 3-dimensional symmetrical
    data in the form of a tetrahedron.
  • The size (volume) of the reduced space of the
    tetrahedron is R(R + 1)(R + 2)/6.

10
In case of 4-dimensional data, by composition of
tetrahedrons we obtain the hyperhedron of the 4-th
order for R = 4.
The volume of the hyperhedron of the 4-th order
can be expressed as R(R + 1)(R + 2)(R + 3)/24.
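The reduced-space volumes for any dimensionality can be checked numerically (a small sketch; the function name is ours):

```python
from math import comb

def reduced_volume(R, n):
    """Channels stored for an n-dimensional symmetric spectrum of
    resolution R: triangle, tetrahedron, ... ('hyperhedron')."""
    return comb(R + n - 1, n)

print(reduced_volume(4, 2), reduced_volume(4, 3), reduced_volume(4, 4))
# 10 20 35 for R = 4
# The compression ratio R**n / reduced_volume(R, n) tends to n! for large R:
print(16384**3 / reduced_volume(16384, 3))
```

For n = 3 and R = 16384 the ratio is very close to 3! = 6, matching the table on the next slide.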
11
The achievable compression ratios and storage
requirements for typical spectra (14 bit ADCs and
2 Bytes per channel)
Dimensionality of spectra | Compression ratio CR | Storage requirements [MB]
2 | 2 | 256
3 | 6 | 1.25 × 10⁶
n | n! |
  • Radware package - the author combines utilizing
    the property of symmetry with binning. Three-fold
    coincidences are stored in the form of cubes with
    the sizes 8 x 8 x 8. Inside each cube the data
    are binned so that they span the entire
    resolution of 8192 channels in each dimension.

12
Compression methods using orthogonal
transformations
  • The multidimensional array (hypercube) is
    transformed into a new data array in the
    transform domain, where the maximum amount of
    information is concentrated in a smaller number
    of elements.
  • The basic premise is that the transform-domain
    representation of a signal has an energy
    distribution more amenable to retaining the shape
    of the data than the spatial-domain
    representation.
  • Because of the inherent element-to-element
    correlation, the energy of the signal in the
    transform domain tends to be clustered in a
    relatively small number of transform
    coefficients.

13
The advantages of using fast orthogonal
transforms
  • Fast algorithms exist, allowing on-line
    implementation.
  • Linearity of the transforms. The signal that is
    being compressed need not be stored statically
    in the memory. Each event can be transformed
    separately in time. The predetermined transform
    coefficients are summed (analysis with on-line
    compression).

14
Fixed kernel orthogonal transforms usually
employed in data compression
  • Discrete Cosine, Walsh-Hadamard, Fourier, Hartley
    and other transforms.
  • Haar transform - based on the first and simplest
    mother wavelet, suitable for generating an
    orthonormal wavelet basis.
  • The use of classical orthogonal transforms is
    very efficient provided that the form of
    compressed data resembles the form of the
    transform base functions.
  • The efficiency of the compression strongly
    depends on the nature of the experimental data.
  • Fourier transform and DCT are well suited to
    compress cosine and sine data shapes, whereas the
    Walsh-Hadamard transform is suitable to compress
    rectangular shapes in the input data.

15
The idea arose to modify the shape of the base
functions of the orthogonal transform so that the
maximum possible compression of the
multidimensional spectra can be achieved
  • We have proposed a fast orthogonal transform
    with a transform kernel adaptable to the
    reference vectors representing the processed
    data.
  • The structure of the signal flow graph is of the
    Cooley-Tukey type.
  • The principle of the method consists in direct
    modification of the multiplicative coefficients
    a, b, c, d of the signal flow graph in such a way
    that the base functions approximate the shape of
    the reference vector.

16
Let us illustrate the method for the case of the
transform size N = 4.
  • Signal flow graph of the fast adaptive orthogonal
    transform.

17
Basic element of the signal flow graph.
  • The coefficients of the basic element of the
    signal flow graph are calculated as
  • a = x0/r, b = x1/r, c = x1/r, d = -x0/r,
    where r = sqrt(x0² + x1²)
  • and x0, x1 are values of the reference vector.
  • The values y0, y1 at the output are
  • y0 = a·x0 + b·x1, y1 = c·x0 + d·x1.

18
The transform coefficients are calculated in such
a way that the reference vector at the input is
transformed into a single point at the output.
  • We have proposed a fast algorithm of on-line
    multidimensional amplitude analysis with
    compression using the adaptive Walsh transform
  • it removes the necessity to store the whole
    spectrum before compression; compression is
    performed event by event,
  • it is optimized so that only a minimum number of
    operations is needed.
  • The above mentioned principle of adaptability can
    be applied also for other transform structures.
  • The compression is achieved by discarding
    pre-selected elements in the transformed
    multidimensional array.
  • Two basic methods for element selection
  • zonal sampling
  • threshold sampling.

19
Block data compression using orthogonal
transforms with symmetry removal
  • In case of 3-dimensional space, it is divided
    into cubes. Each cube of the size S × S × S will
    be compressed to a cube of the size C × C × C.
  • We assume
  • The sizes of cubes are equal in all dimensions.
  • The number of cubes in each dimension and their
    sizes S, C are powers of 2.

20
The number of cubes in the tetrahedron is
t(t + 1)(t + 2)/6, where t = R/S,
  • R is the number of channels (e.g. resolution of
    the ADC) and S is the size of a cube before
    compression.
  • For each cube we have to define adaptive
    transform and consequently we need to store its
    coefficients.
  • The number of transform coefficients for one
    dimension is

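Under the assumption that the cube count follows the tetrahedron-volume formula with t = R/S cubes per edge (a hedged sketch; the function name is ours):

```python
def cubes_in_tetrahedron(R, S):
    """Number of S-sized cubes covering the tetrahedral (symmetry-reduced)
    part of a 3-D spectrum with R channels per axis."""
    t = R // S                          # cubes per edge
    return t * (t + 1) * (t + 2) // 6

print(cubes_in_tetrahedron(16384, 256))   # 45760
```

Each of these cubes carries its own set of adaptive-transform coefficients, which dominates the storage budget for small S.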
21
The elements are stored in the float format (4
Bytes). The transform coefficients must be stored
for each dimension; thus to store 3-dimensional
compressed data we need
  • Bytes of storage media.
  • Then in general for D-dimensional data the size
    of needed memory is
  • We have to adhere to the following rules
  • the size of the cube of original data, S, should
    be as small as possible,
  • the size of the cube after compression, C, should
    be as large as possible (C close to S), i.e., we
    desire the smallest possible compression,
  • the data volume for the chosen combination C, S
    must fit the size of the available memory.

22
The following sizes of cubes were chosen for
block transform compression of multidimensional
γ-ray spectra of 16 384 channels per axis and 4
Bytes for each channel.
Dim. of spectra | S [channels] | C [channels] | Storage [MB] | Compression ratio CR
3 | 256 | 8 | 366 | 8010
4 | 1024 | 8 | 189.5 | 63.3 × 10⁶
5 | 2048 | 8 | 168.4 | 2.33 × 10¹¹
  • We have compressed histograms for 3-, 4-, 5-fold
    γ-ray coincidences of the event data from
    Gammasphere.

23
Examples achieved by employing compression on
3-fold γ-ray spectra with symmetry removal
  • Slice from original data (thin line) and
    decompressed slice (thick line) from data
    compressed by employing binning operation
    (Radware)

24
Slice from original data (thin line) and
decompressed slice (thick line) from data
compressed by employing adaptive Walsh transform.
25
Two-dimensional slice from original data.
26
Two-dimensional slice from data compressed by
employing binning operation (Radware).
27
Two-dimensional decompressed slice from data
compressed via adaptive Walsh transform.
28
Three-dimensional original spectrum (sizes of
spheres are proportional to counts the channels
contain).
29
Three-dimensional spectrum decompressed from data
compressed via adaptive Walsh transform. Due to
the smoothing effect of the adaptive transform
some information is lost.
30
Similar experiments were done with 4-fold
coincidence γ-ray spectra.
  • One-dimensional slice from original 4-dimensional
    spectrum (thin line) and the same slice
    decompressed from data compressed via adaptive
    Walsh transform (thick line). Due to enormous
    compression ratio the distortion of data in some
    regions is considerable. On the other hand in
    some regions the fidelity of the method is
    satisfactory.

31
Two-dimensional slice from original 4-dimensional
spectrum.
32
Two-dimensional slice decompressed from
4-dimensional data compressed via adaptive Walsh
transform
33
Compression of multidimensional γ-ray coincidence
spectra using a list of descriptors.
  • The input data describing an external event can
    be expressed using a descriptor. Each descriptor
    fully describes the event.
  • This method is based on maintaining a list of
    descriptors.
  • The number of different descriptors that actually
    occurred during an experiment is much smaller
    than the number of all possible descriptors.
  • So, the multidimensional space has empty regions.
  • Conventional analyzer - The descriptor defines
    the location in the memory at which the counts
    (number of occurrences of the descriptor) is
    stored. The range of descriptors is defined by
    the size of the memory.

34
An alternative technique - Store only those
descriptors that actually occurred in the
experiment.
  • The correspondence between the location and the
    descriptor is lost, so it is necessary to store
    the descriptor as well as the associated counts.
  • When a new event comes, it must be sorted into
    its channel in a list by using its descriptor
  • The problem is to devise a procedure for
    assigning the descriptor location number so that
    the time needed to store or read out a descriptor
    is minimized.
  • There exist several retrieval algorithms

35
Sequential method - An obvious routine for
searching the list in the memory is to compare
the descriptor of a new event with the descriptor
in each location, starting at the first one. When
a match is found, the associated count is
increased by one. Such an algorithm is time
consuming and cannot be accepted for on-line
applications.
  • Sequential retrieval of events

36
Tree method - A considerable reduction of access
time can be achieved by using a tree search
algorithm. The descriptor of a new event is
compared repeatedly with descriptors arranged in
a tree. The main disadvantages of this technique
are its complexity and the amount of redundant
information given by address pointers.
  • Tree search algorithm of event retrieval.

37
Partitioning and indexing method - It is the
combination of the two previous methods and is
implemented e.g. in the database Blue for
high-fold ?-ray coincidence data.
  • The hypercube is partitioned into high- and
    low-density regions. Each node of the tree
    represents a subvolume of the n-dimensional
    hypercube. The left and right child nodes
    represent the bisected volume of the parent.
    Associated with each leaf node is a sublist of
    descriptors falling into the appropriate
    geometric volume. They are arranged according to
    the sequential retrieval algorithm.
  • Cromaz M. et al., Blue: a database for
    high-fold γ-ray coincidence data, NIM A 462
    (2001) 519.

38
Pseudo-random transformation of addresses of
locations of descriptors. Requirements
  • Uniform (or quasi-uniform) distribution of
    descriptors over memory addresses for any shape
    of multidimensional spectra.
  • Clusters of descriptors in physical field,
    hypercube, must be spread over the whole range of
    possible addresses and adjacent descriptors must
    go to addresses far away from each other.
  • Transformation must be fast, so that it can be
    applied on-line for high-rate experiments.
  • There is an unlimited number of methods of
    generating pseudorandom numbers
  • residues of modulo operations, the Hamming code
    technique, transformation through the division of
    polynomials, etc.

39
One of the methods satisfying the above stated
criteria and giving a pseudorandom distribution is
based on the assignment of the inverse number (in
the sense of modulo arithmetic) to each address in
the original space,
  • a → a⁻¹ (mod M), where M is prime.
  • This operation can be carried out through the use
    of look-up table of pre-computed inverse numbers.
  • Through the transformation each descriptor
    uniquely derives its storage address.
  • There is a possibility of several descriptors
    being transformed to the same address. To
    overcome this serious limitation, the
    transformation is used only to generate an
    address at which to start searching in the bucket
    of descriptors.

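The look-up table of pre-computed inverse numbers can be sketched as follows (a minimal illustration; the table layout is ours):

```python
def inverse_table(M):
    """Look-up table of modular inverses: inv[a] * a ≡ 1 (mod M), M prime."""
    inv = [0] * M
    for a in range(1, M):
        inv[a] = pow(a, M - 2, M)   # Fermat's little theorem (M prime)
    return inv

M = 601                              # the prime module used later in the talk
inv = inverse_table(M)
print(inv[2], inv[3], inv[4])        # 301 401 451: neighbors scatter widely
```

Adjacent original addresses map to inverses that are far apart, which is exactly the scattering property required above.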
40
A list of d successive locations, where d is the
depth of searching, is checked
  • If the descriptor in a location coincides with
    the read-out descriptor, the counts in this
    location are incremented.
  • If no descriptor coincides with the read-out
    descriptor and there is an empty location within
    the search depth, the descriptor is written to
    this location and its count is set to 1.
  • If there is no empty location within the search
    depth and no descriptor coincides with the
    read-out descriptor, additional processing is
    done.
  • During the experiment, the events with higher
    counts (statistics) occur earlier and therefore
    there is a higher probability that free positions
    will be occupied by statistically relevant
    events.
  • One can utilize additional information and store
    the events with the highest weights, i.e.,
    the highest probability of occurrence.

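The three cases above, together with the probability-based replacement described on the next slide, can be sketched as follows (a hedged sketch; the data layout and names are ours):

```python
def store_event(table, desc, weight, start, depth):
    """Store a descriptor using a bounded linear search from the
    transformed start address.  table[i] is None or a
    [descriptor, counts, weight] entry."""
    n = len(table)
    slots = [(start + k) % n for k in range(depth)]
    for i in slots:
        if table[i] is not None and table[i][0] == desc:
            table[i][1] += 1                   # match: increment counts
            return True
        if table[i] is None:
            table[i] = [desc, 1, weight]       # empty slot: store new event
            return True
    j = min(slots, key=lambda i: table[i][2])  # region full: weakest entry
    if weight > table[j][2]:
        table[j] = [desc, 1, weight]           # replace less probable event
        return True
    return False                               # processed event is ignored
```

A quick usage example: repeated events increment their counts, and once the search region is full, only events with a higher weight than the weakest stored entry displace it.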
41
Provided that all locations for the depth d are
occupied and the descriptor did not occur in this
region, we scan the region once more and find the
event with the smallest probability of occurrence
pj, stored at position j
  • Then we compare the probability of the processed
    event pk with pj. If pk > pj we replace the
    descriptor at position j with the descriptor of
    the processed event and we set the counts of the
    event to 1. Otherwise, the processed event is
    ignored.
  • How to determine the probabilities of the
    occurrences of events?
  • Several approaches are possible in practice.
  • One of them is to utilize marginal (projection)
    spectra for each dimension. Then for
    n-dimensional event with the event values

42
this probability can be defined
  • where si is marginal spectrum for dimension i.
  • However many other definitions and approaches are
    possible.
  • Example of 3-fold coincidence γ-ray spectra
    storing
  • The descriptor of each event contains the
    addresses x, y, z and the counts (short
    integers), i.e., each event takes 8 Bytes.
  • We utilize again the property of symmetry of the
    multidimensional γ-ray spectra. The chosen prime
    module then has to satisfy the condition

43
For the 384 MB memory we have chosen the prime
module M = 601.
Assignment between numbers from the ring <1, 600>
and their modulo inverse numbers.
44
Spectrum of distances between two adjacent modulo
inverse numbers.
  • One can observe great scattering in these
    distances. This provides a quasi-uniform
    distribution of descriptors in the transformed
    area.

45
We utilize the property of symmetry in γ-ray
coincidence spectra.
  • The algorithm of calculation of the address of an
    event in the transformed area
  • arrange the coordinates so that x ≥ y ≥ z,
  • transform the coordinates through the look-up
    table of modulo inverse numbers,
  • calculate the address in the transformed area.
  • This defines the beginning position of the
    searching for a given descriptor.

46
The whole linear array of descriptors (36 361 808
items) have been mapped to the 16384 channels
spectrum. One can observe quasi-constant
distribution, which witnesses about quasi-uniform
distribution of descriptors over all memory
addresses in the transform domain.
  • Distribution of descriptor counts in the
    transformed domain.

47
Prime module M, memory requirements and achieved
compression ratio for 3-, 4- and 5-fold γ-ray
spectra (16 384 channels in each dimension)
Dim. of spectra | Prime module M | Storage [MB] | Compression ratio CR
3 | 601 | 290.9 | 30 239
4 | 157 | 262.9 | 33 452
5 | 73 | 237.0 | 37 100
  • The searching depth for all cases is 1000 events.

48
Three-fold coincidence spectra.
  • High-counts region of a 1-dimensional slice from
    original data (thick line) and the corresponding
    region from compressed data (thin line).

49
Low-counts region of a slice from original data
(thick line) and the corresponding region from
compressed data (thin line).
50
Influence of searching depth on quality of
decompressed spectra
  • Increasing the searching depth in the buffer of
    compressed events improves the preservation of
    the peak heights.
  • In all spectra the background was subtracted.

51
Narrow (one peak wide) 1-dimensional slice from
non-compressed original data (thick line) and
compressed 3-dimensional array (thin line), for
the searching depth of 1000 events.
52
Two-dimensional slice from original 3-dimensional
data.
53
Reconstructed 2-dimensional slice from compressed
3-dimensional data.
54
Three-dimensional slices from both original and
compressed events.
  • Original 3-dimensional data.

55
Decompressed 3-dimensional data.
56
Four-fold coincidence events.
  • Part of 1-dimensional slices from non-compressed
    original (thick line) and compressed
    4-dimensional (thin line) arrays.

57
  • Two-dimensional slice from original 4-dimensional
    data.

58
Two-dimensional slice from compressed
4-dimensional data.
59
  • Three-dimensional slice from original
    4-dimensional data.

60
Three-dimensional slice from compressed
4-dimensional data.
61
Examples of 4-dimensional slices from 4-fold
coincidence data in pies display mode.
  • Original 4-dimensional data. The sizes of balls
    are proportional to the volumes and the colors in
    the pies correspond to the content of channels in
    the 4-th dimension (64 channels in x, y, z
    dimensions and 16 channels in v dimension).

62
Four-dimensional slice from compressed
4-dimensional data.
  • Decompressed slice. Large peaks correspond in
    both data sets; however, in small peaks some
    differences can be observed.

63
Examples of applying compression methods to
5-fold coincidence data.
  • A part of one-dimensional slices from original
    (thick line) and compressed (thin line) 5-fold
    coincidence events.

64
  • Two-dimensional slice from original 5-dimensional
    data.

65
Two-dimensional slice from compressed
5-dimensional data.
66
Conclusion
  • In this talk, new methods of multidimensional
    coincidence γ-ray spectra compression were
    presented
  • using fast adaptive orthogonal transforms,
  • using the method of retaining the list of
    descriptors scattered in the compressed area by
    the pseudorandom address transformation method.
  • The processed data have the property of symmetry
    in their nature. In both cases the symmetry
    removal methods are implemented directly in the
    compression algorithms.
  • A new class of adaptive transforms, with the
    transformation kernel modifiable to the reference
    vectors that reflect the shape of the compressed
    data, was presented.

67
Methods of compression used
  • Orthogonal transforms - the compression is
    achieved by removal of redundant and irrelevant
    data components.
  • List of descriptors - the compression is achieved
    on account of quasi-uniform distribution of the
    data in the transform domain space and thus by
    its more efficient utilization.
  • The algorithms are designed to be employed for
    on-line compression during experiment.
  • After the experiment the operator can decompress
    any slices of equal or lower dimensionality from
    the compressed data.
  • For nuclear spectra, both methods proved to give
    better results than the classical ones and allow
    higher compression ratios to be achieved with
    less distortion.

68
Some relevant publications
  1. Morháč M., Matoušek V., Multidimensional nuclear
    spectra compression using fast adaptive
    Fourier-based transforms, Computer Physics
    Communications 165 (2005) 127.
  2. Matoušek V., Morháč M., Kliman J., Turzo I.,
    Krupa L., Jandel M., Efficient storing of
    multidimensional histograms using advanced
    compression techniques, NIM A 502 (2003) 725.
  3. Morháč M., Matoušek V., A new method of on-line
    multiparameter analysis with compression, NIM A
    370 (1996) 499.
  4. Morháč M., Matoušek V., Turzo I., Multiparameter
    data acquisition and analysis system with on-line
    compression, IEEE Transactions on Nuclear Science
    43 (1996) 140.
  5. Morháč M., Kliman J., Matoušek V., Turzo I.,
    Integrated multi-parameter nuclear data analysis
    package, NIM A 389 (1997) 89.