Title: CSC 613.2 Lecture 4
1 - CSC 613.2 Lecture 4
- Performance and Verification Models
- Software network modeling
- Correctness checking
- Functional transformations
- Theory of verification
- Theories of program invariance
- Fundamental programs for searching
- Hashing, sorting, file structures
2 - Performance Verification Models
The construction and analysis of models suited for performance and reliability studies of real-world phenomena is a difficult task. To a large extent this problem is attacked using human intelligence and experience. Due to the increasing size and complexity of systems, this tendency seems to be growing: performance as well as reliability modelling becomes a task dedicated to specialists, in particular for systems exhibiting a high degree of irregularity. Traditional performance models such as queuing networks lack hierarchical composition and abstraction means, significantly hampering the modelling of systems that are developed nowadays.
3 - Some notable results and concepts have been
developed by several authors, but they remain
isolated from the system design cycle, due to the
lack of a well-founded theory of hierarchy,
composition and abstraction. On the other hand,
for describing the plain functional behaviour of
systems various specification formalisms have
been developed that are strongly focussed on
modelling systems in a compositional,
hierarchical manner. A prominent example of such
specification formalisms is process algebra which
has emerged as an important framework to achieve
compositionality. Process algebra provides a
formal apparatus for reasoning about structure
and behaviour of systems in a compositional way.
4 - Correctness Checking
The process of checking the correctness of time-critical computations involves a fundamental and inherently conflicting trade-off: the precision of the checking vs. the time required to perform the checking process. This trade-off is particularly relevant when the computations are intensive and can require appreciable time for completion.
Purpose: Correctness checking enables us to confirm that the business area model is an accurate representation of the area and that it conforms to the IEM rules and conventions. In particular, consider the following questions:
- Have the attribute types been assigned to the correct entity types?
- Do the models exclude all unnecessary elements?
- Are the information views consistent across processes?
- Does the model look right to the end-users?
The result of correctness checking is a "correct" business area model on which to base the Business System Design stage.
5 - Guideline 1 - Normalization
- Normalization is based on finding functional dependencies between attributes. You will need to make use of users' knowledge for this.
- It is best to work through all the steps of normalization on all entity types. Some entity types are not as obvious as they seem.
- It is easier to carry out normalization if examples of attribute values are available, but remember that you are normalizing the attributes, not their values. It is very easy to be misled by apparent dependencies between values when none exists between the respective attributes.
- Do not formally document the results of normalization.
- When constructing the various normal forms, you produce new, smaller collections of attributes. Every 3NF collection of attributes should correspond to an entity type. In 4NF collections, any new collections are subtypes of the 3NF entity type.
- A useful ditty to remember the normal forms: "Every attribute is dependent on the identifier (1NF), the whole identifier (2NF), and nothing but the identifier (3NF)." A short hypothetical example follows this list.
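As a hypothetical illustration of the ditty (the entity and attribute names here are invented, not taken from any particular model): suppose ORDER LINE is identified by (Order Number, Product Number) and carries the attributes Quantity and Product Description. Quantity depends on the whole identifier, so it stays. Product Description depends only on Product Number, a 2NF violation, so it moves to a PRODUCT entity type identified by Product Number alone.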
6 - Guideline 2 - Process Dependency Checking
- You should do process dependency checking when building up the diagrams. In correctness checking, you are concerned only with checking that this has been done.
- The most important checks are that every elementary process appears in a dependency diagram and that information views crossing boundaries are consistent. The other checks are important, but if time is short these checks should be chosen above the others.
- Information views can be decomposed into other information views. If this is done, checks must be made that the decomposition is consistent (i.e., that the total of the lower-level views is the same as the higher-level views).
- You need to know what can trigger an elementary process. This is why you check that there is some incoming information view with an identifiable source, or an event.
7 - Guideline 3 - Redundancy Checking
You should do redundancy checking when building up the diagrams. In correctness checking, you are concerned only with ensuring that this has been done.
Synonyms: Thoroughly check for synonymous objects.
Duplicate Attributes: Check that attributes are not duplicated in other entity types. If they are and the entity types are related, remove the attribute from one entity type. If they are and the entity types are not related, even indirectly, there is probably a missing relationship.
Overlapping Entity Types: Another possibility for duplicated attributes is that two or more entity types overlap (i.e., are not sufficiently defined).
Redundant Relationships: Relationships may be redundant if all the information expressed by one relationship can be expressed by others in all circumstances; if this is the case, remove the redundant relationship.
8 - Derived Attributes: Find all derived attributes and check that a derivation algorithm has been defined. This can be done automatically by computerized tools.
Duplicate Processes: Check process decomposition and dependency diagrams for any processes that are duplicated. Some will have already been found, others not. Mark any such processes as duplicated and consolidate any documentation.
Processes with different names: Processes with the same name are easy to find (but check that they are the same process). Processes with different names can be difficult. This can happen in a large project in which analysts are working on different parts of the business area. The two processes must be given the same name so that future modifications to one are also applied to the other. Unless you recognize that certain processes are essentially the same, you will waste effort in the design stage by designing two procedures (or more) for the same process.
9 - Candidate duplicate processes: Candidate duplicate processes can also be found by scanning the process/entity type matrix to find processes that have the same set of actions upon entity types. Processes with identical sets of actions are not necessarily duplicates, so next check the process logic models and the definitions of the two processes to determine whether they are the same. If they are, choose a common name and make appropriate changes to the diagrams. When two processes are similar, but not identical, the chances are that some of their subprocesses are the same.
Automated facilities: Searching for duplicate processes can be improved by an automated facility. Such a facility can report:
- Processes that have the same (or similar) entity, relationship, and attribute actions
- Processes that have the same (or similar) input and output information views
10 - Guideline 4 - Quantity Cross-Checking
- Quantity cross-checking is tedious and time consuming. Only use the technique when there is concern over a mismatch between volumes and cardinality. Some checks of this sort can be accomplished through automated facilities.
- A "many" in a process dependency diagram implies that there should be a "many" in an entity relationship diagram (though not always).
- If the volumes of related entity types are about the same, this suggests that the cardinality is about one-to-one. If the volumes are vastly different, there is a one-to-many (smaller-volume-to-larger-volume) cardinality, as illustrated below.
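For example (illustrative volumes only): if CUSTOMER has an estimated volume of 5,000 occurrences and ORDER a volume of 250,000, the large mismatch suggests a one-to-many relationship from CUSTOMER to ORDER, whereas roughly equal volumes for ORDER and INVOICE would suggest a cardinality of about one-to-one.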
11 - Guideline 5 - Structured Walkthroughs
- Structured walkthroughs must be carefully managed and controlled or they may not succeed.
- If users are involved in a structured walkthrough, you must either spend time beforehand introducing them to the techniques whose results are to be reviewed, or you must start the walkthrough by gently taking uneducated users through the models. If the latter is done, the initial objects must be reviewed again before the end of the walkthrough to ensure that the users have a full opportunity to question everything.
12 - Verification Theory
The verification theory (of meaning) is a philosophical theory proposed by the logical positivists of the Vienna Circle. A simplified form of the theory states that a proposition's meaning is determined by the method through which it is empirically verified. In other words, if something cannot be empirically verified, it is meaningless. For example, the statement "It is raining" is meaningless unless there is a way whereby one could, in principle, verify whether or not it is in fact raining. The theory has radical consequences for traditional philosophy: if correct, it would render much of past philosophical work meaningless, for example metaphysics and ethics.
13 - Network-modeling tools (IT Guru, NetRule, Shunra/Storm)
Simple questions sometimes don't have simple answers. When questions like "Can we increase the bandwidth to our regional offices?", "Can we provide more redundancy for our critical links?" and "Can we do this and reduce costs at the same time?" are asked of enterprise network executives, the answers become more difficult. Keeping the complex beast of the various network sections running efficiently at maximum bandwidth and minimum cost is a monumental task. Network-modeling tools let designers or operators test changes to network topology before they are implemented in a production network.
14 - Modeling accuracy: The key to network modeling is the ability to closely match the generated network model map to the real network topology.
Configuration and performance: The ability to, e.g., import configuration directly from Cisco and Juniper devices.
Installation issues
Documentation
Deployment considerations
15 - HASHING
A hash table is simply an array that is
addressed via a hash function. For example, in
Figure 1, HashTable is an array with 8 elements.
Each element is a pointer to a linked list of
numeric data. The hash function for this example
simply divides the data key by 8, and uses the
remainder as an index into the table. This yields
a number from 0 to 7. Since the range of indices
for HashTable is 0 to 7, we are guaranteed that
the index is valid. To insert a new
item in the table, we hash the key to determine
which list the item goes on, and then insert the
item at the beginning of the list. For example,
to insert 11, we divide 11 by 8 giving a
remainder of 3. Thus, 11 goes on the list
starting at HashTable(3). To find a number, we
hash the number and chain down the correct list
to see if it is in the table. To delete a number,
we find the number and remove the node from the
linked list.
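A minimal Visual Basic sketch of this scheme (the CNode class and the InsertKey routine are illustrative assumptions, not code from the article):

' class CNode (hypothetical): Public Key As Long, Public NextNode As CNode
Private Const TableSize As Long = 8
Private HashTable(0 To TableSize - 1) As CNode  ' each element heads a linked list

Public Sub InsertKey(ByVal Key As Long)
    Dim p As CNode
    Set p = New CNode
    p.Key = Key
    Set p.NextNode = HashTable(Key Mod TableSize)  ' e.g. 11 Mod 8 = 3
    Set HashTable(Key Mod TableSize) = p           ' insert at the head of the list
End Sub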
16 - Entries in the hash table are dynamically
allocated and entered on a linked list associated
with each hash table entry. This technique is
known as chaining. If the hash function is
uniform, or equally distributes the data keys
among the hash table indices, then hashing
effectively subdivides the list to be searched.
Worst-case behavior occurs when all keys hash to
the same index. Then we simply have a single
linked list that must be sequentially searched.
Consequently, it is important to choose a good
hash function. The following sections describe
several hashing algorithms.
17 - Table Size
Assuming n data items, the hash table
size should be large enough to accommodate a
reasonable number of entries. Table 1 shows the
maximum time required to search for all entries
in a table containing 10,000 items. A
small table size substantially increases the time
required to find a key. A hash table may be
viewed as a collection of linked lists. As the
table becomes larger, the number of lists
increases, and the average number of nodes on
each list decreases. If the table size is 1, then
the table is really a single linked list of
length n. Assuming a perfect hash function, a
table size of 2 has two lists of length n/2. If
the table size is 100, then we have 100 lists of
length n/100. This greatly reduces the length of
the list to be searched. There is considerable
leeway in the choice of table size.
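As a rough calculation: with n = 10,000 items and a table size of 128, a uniform hash function produces lists averaging about 10,000 / 128 ≈ 78 nodes, so a sequential search of one list examines at most about 78 nodes rather than 10,000.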
18 - Hash Functions
In the previous example, we determined a hash value by examining the remainder after division. In this section we'll examine several algorithms that compute a hash value.
Division Method (TableSize = Prime)
A hash value, from 0 to (HashTableSize - 1), is computed by dividing the key value by the size of the hash table and taking the remainder:

Public Function Hash(ByVal Key As Long) As Long
    Hash = Key Mod HashTableSize
End Function

Selecting an appropriate HashTableSize is important to the success of this method. For example, a HashTableSize divisible by two would yield even hash values for even keys, and odd hash values for odd keys. This is an undesirable property, as all keys would hash to an even value if they happened to be even. If HashTableSize is a power of two, then the hash function simply selects a subset of the key bits as the table index. To obtain a more random scattering, HashTableSize should be a prime number not too close to a power of two.
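As a usage sketch (the figures here are an arbitrary illustration): for roughly 4,500 expected entries one might declare Private Const HashTableSize As Long = 4507, a prime that is not close to a power of two; Hash(10011) then returns 10011 Mod 4507 = 997.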
19 - Multiplication Method (TableSize = 2^N)
The multiplication method may be used for a HashTableSize that is a power of 2. The key is multiplied by a constant, and then the necessary bits are extracted to index into the table. One method uses the fractional part of the product of the key and the golden ratio, or (√5 - 1)/2. For example, assuming a word size of 8 bits, the golden ratio is multiplied by 2^8 to obtain 158. The product of the 8-bit key and 158 results in a 16-bit integer. For a table size of 2^5, the 5 most significant bits of the least significant word are extracted for the hash value. The following definitions may be used for the multiplication method:

' 8-bit index
Private Const K As Long = 158
' 16-bit index
Private Const K As Long = 40503
' 32-bit index
Private Const K As Long = 2654435769

' bitwidth(index) = w, size of table = 2^m
Private Const S As Long = 2 ^ (w - m)
Private Const N As Long = 2 ^ w - 1

Hash = ((K * Key) And N) \ S
20 - For example, if HashTableSize is 1024 (2^10), then a 16-bit index is sufficient and S would be assigned a value of 2^(16 - 10) = 64. Constant N would be 2^16 - 1, or 65535. Thus, we have:

Private Const K As Long = 40503
Private Const S As Long = 64
Private Const N As Long = 65535

Public Function Hash(ByVal Key As Long) As Long
    Hash = ((K * Key) And N) \ S
End Function
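As a quick check of the constants above: the fractional part of the golden ratio is about 0.6180339887, and 0.6180339887 * 2^16 = 40503.48, which truncates to K = 40503 (similarly, 0.618... * 2^8 gives 158). Tracing a sample key, Hash(123) = ((40503 * 123) And 65535) \ 64 = (4981869 And 65535) \ 64 = 1133 \ 64 = 17, which falls in the table range 0 to 1023.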
21 - Variable String Addition Method (TableSize = 256)
To hash a variable-length string, each character is added, modulo 256, to a total. A hash value in the range 0-255 is computed.

Public Function Hash(ByVal S As String) As Long
    Dim h As Long
    Dim i As Long
    h = 0
    For i = 1 To Len(S)
        h = (h + Asc(Mid(S, i, 1))) Mod 256
    Next i
    Hash = h
End Function
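For example, Hash("AB") = (65 + 66) Mod 256 = 131; note that Hash("BA") yields the same value, 131, which is precisely the weakness with anagrams that the next method addresses.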
22 - Variable String Exclusive-Or Method (TableSize = 256)
This method is similar to the addition method, but successfully distinguishes similar words and anagrams. To obtain a hash value in the range 0-255, all bytes in the string are exclusive-or'd together. However, in the process of doing each exclusive-or, a random component is introduced.

Private Rand8(0 To 255) As Byte

Public Function Hash(ByVal S As String) As Long
    Dim h As Byte
    Dim i As Long
    h = 0
    For i = 1 To Len(S)
        h = Rand8(h Xor Asc(Mid(S, i, 1)))
    Next i
    Hash = h
End Function

Rand8 is a table of 256 unique 8-bit random numbers. The exact ordering is not critical. The exclusive-or method has its basis in cryptography, and is quite effective.
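One possible way to fill Rand8 (the article leaves this to the download, so this initializer is an assumption; any fixed permutation of 0..255 will do):

Private Sub InitRand8()
    Dim i As Long, j As Long, t As Byte
    For i = 0 To 255               ' start with the identity permutation
        Rand8(i) = i
    Next i
    Randomize
    For i = 255 To 1 Step -1       ' Fisher-Yates shuffle
        j = Int(Rnd * (i + 1))
        t = Rand8(i)
        Rand8(i) = Rand8(j)
        Rand8(j) = t
    Next i
End Sub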
23 - Variable String Exclusive-Or Method (TableSize < 65536)
If we hash the string twice, we may derive a hash value for an arbitrary table size up to 65536. The second time the string is hashed, one is added to the first character. Then the two 8-bit hash values are concatenated together to form a 16-bit hash value.

Private Rand8(0 To 255) As Byte

Public Function Hash(ByVal S As String, ByVal HashTableSize As Long) As Long
    Dim h1 As Byte
    Dim h2 As Byte
    Dim c As Byte
    Dim i As Long
    If Len(S) = 0 Then
        Hash = 0
        Exit Function
    End If
    h1 = Asc(Mid(S, 1, 1))
    h2 = (h1 + 1) Mod 256          ' second hash starts one higher; wrap to stay in Byte range
    For i = 2 To Len(S)
        c = Asc(Mid(S, i, 1))
        h1 = Rand8(h1 Xor c)
        h2 = Rand8(h2 Xor c)
    Next i
    ' Hash is in range 0 .. 65535 (CLng avoids Integer overflow when h1 > 127)
    Hash = CLng(h1) * 256 + h2
    ' scale Hash to table size
    Hash = Hash Mod HashTableSize
End Function

Hashing strings is computationally expensive, as we manipulate each byte in the string. A more efficient technique utilizes a DLL, written in C, to perform the hash function. Included in the download is a test program that hashes strings using both C and Visual Basic. The C version is typically 20 times faster.
24 - Node Representation
If you plan to code your own hashing algorithm, you'll need a way to store data in nodes, and a method for referencing the nodes. This may be done by storing nodes in objects and arrays. I'll use a linked list to illustrate each method.
25 - Objects
References to objects are implemented as pointers in Visual Basic. One implementation simply defines the data fields of the node in a class, and accesses the fields from a module:

' in class CObj
Public NextNode As CObj
Public Value As Variant

' in module Main
Private hdrObj As CObj
Private pObj As CObj

' add new node to list
Set pObj = New CObj
Set pObj.NextNode = hdrObj
Set hdrObj = pObj
pObj.Value = value

' find value in list
Set pObj = hdrObj
Do While Not pObj Is Nothing
    If pObj.Value = value Then Exit Do
    Set pObj = pObj.NextNode
Loop

' delete first node
Set pObj = hdrObj.NextNode
Set hdrObj = pObj
Set pObj = Nothing

In the above code, pObj is internally represented as a pointer to the class. When we add a new node to the list, an instance of the node is allocated, and a pointer to the node is placed in pObj. The expression pObj.Value actually de-references the pointer, and accesses the Value field. To delete the first node, we remove all references to the underlying class.
26 - Arrays
An alternative implementation allocates an array of nodes, and the address of each node is represented as an index into the array.

' list header
Private hdrArr As Long
' next free node
Private nxtArr As Long
' fields of node
Private NextNode(1 To 100) As Long
Private Value(1 To 100) As Variant

' initialization
hdrArr = 0
nxtArr = 1

' add new node to list
pArr = nxtArr
nxtArr = nxtArr + 1
NextNode(pArr) = hdrArr
hdrArr = pArr
Value(pArr) = value

' find value in list
pArr = hdrArr
Do While pArr <> 0
    If Value(pArr) = value Then Exit Do
    pArr = NextNode(pArr)
Loop
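For symmetry with the object version on the previous slide, deleting the first node in the array representation is just a subscript update (a sketch; the original listing stops at the find loop):

' delete first node
pArr = hdrArr
hdrArr = NextNode(pArr)
' pArr is now unreferenced; re-use of freed elements is discussed on the next slide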
27 - Each field of a node is represented as a separate array, and referenced by subscripts instead of pointers. For a more robust solution, there are several problems to solve. In this example, we've allowed for 100 nodes, with no error checking. Enhancements could include dynamically adjusting the array's size when nxtArr exceeds the array bounds. Also, no provisions have been made to free a node for possible re-use. This may be accomplished by maintaining a list of subscripts referencing free array elements, and providing functions to allocate and free subscripts. Included in the download is a class designed to manage node allocation, allowing for dynamic array resizing and node re-use.
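A minimal sketch of that free-list idea, assuming the 100-element arrays above (the names FreeList, AllocNode, and FreeNode are illustrative, not the class in the download):

Private FreeList(1 To 100) As Long   ' stack of subscripts available for re-use
Private FreeCount As Long

Private Function AllocNode() As Long
    If FreeCount > 0 Then
        AllocNode = FreeList(FreeCount)  ' re-use a freed subscript first
        FreeCount = FreeCount - 1
    Else
        AllocNode = nxtArr               ' otherwise take the next unused element
        nxtArr = nxtArr + 1              ' (no bounds checking, as in the example)
    End If
End Function

Private Sub FreeNode(ByVal p As Long)
    FreeCount = FreeCount + 1
    FreeList(FreeCount) = p
End Sub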
28 - Comparison
Table 2 illustrates resource
requirements for a hash table implemented using
three strategies. The array method represents
nodes as elements of an array, the object method
represents nodes as objects, while the collection
method utilizes the built-in hashing feature of
Visual Basic collections.
29 - Memory requirements and page faults are shown for
insertion only. Hash table size for arrays and
objects was 1/10th the total count. Tests were
run using a 200 MHz Pentium with 64 MB of memory
on a Windows NT 4.0 operating system. Statistics
for memory use and page faults were obtained from
the NT Task Manager. Code may be downloaded that
implements the above tests, so you may repeat the
experiment. It is immediately apparent that
the array method is fastest, and consumes the
least memory. Objects consume four times as much
memory as arrays. In fact, overhead for a single
object is about 140 bytes. Collections take about
twice as much room as arrays. An interesting
anomaly is the high deletion time associated with
objects. When we increase the number of nodes
from 50,000 to 100,000 (a factor of 2), the time
for deletion increases from 13 to 48 seconds (a
factor of 4). During deletion, no page faults
were noted. Consequently, the extra overhead was
compute time, not I/O time. One implementation
used at run-time for freeing memory involves
maintaining a list, ordered by memory location,
of free nodes. When memory is freed, the list is
traversed so that memory to be released can be
returned to the appropriate place in the list.
This is done so that memory chunks may be
recombined when adjacent chunks are freed.
Unfortunately, this algorithm runs in O(n^2) time,
where execution time is roughly proportional to
the square of the number of items being
released. I encountered a similar problem while
working on compilers for Apollo. In this case,
however, the problem was exacerbated by page
faults that occurred while traversing links. The
solution involved an in-core index that reduced
the number of links traversed.
30 - Conclusion
Hashing is an effective method to quickly access data using a key value. Fortunately, Visual Basic includes collections, an effective solution that is easy to code. In
this article, we compared collections with
hand-coded solutions. Along the way we discovered
that storing data in objects for large datasets
can incur substantial penalties in both execution
time and storage requirements. In this case, you
can make significant gains by coding your own
algorithm, utilizing arrays for node storage. For
smaller datasets, however, collections remain a
good choice.