Symbol Table Data Structures

Symbol Table Operations Include:
1) retrieve some symbol's attribute
2) update some symbol's attribute
3) insert a new symbol consisting of a name and attribute value
4) sort the symbol table
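As a rough sketch, these operations might look like the following C interface (the names, the fixed-size name field, and the integer attribute are illustrative assumptions):

/* Hypothetical symbol table entry: a name plus one attribute value. */
typedef struct {
    char name[32];     /* symbol name */
    int  attribute;    /* attribute value, e.g. a type code or an offset */
} Symbol;

Symbol *retrieve(const char *name);               /* 1) look up a symbol's attribute  */
void    update(const char *name, int attribute);  /* 2) change a symbol's attribute   */
int     insert(const char *name, int attribute);  /* 3) add a new name/attribute pair */
void    sortTable(void);                          /* 4) sort the table by name        */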

Possible Data Structure Choices:

1) Unsorted Linear Array
Searching (Best, Worst, Average): O(1), O(n), O(n)
Insertion (Best, Worst, Average): O(1), O(1), O(1) -- append at the end

2) Ordered Linear Array
Searching (Best, Worst, Average): O(1), O(log n), O(log n) -- binary search
Insertion (Best, Worst, Average): O(1), O(n), O(n) -- entries must be shifted

3) Binary Search Tree
Searching (Best, Worst, Average): O(1), O(n), O(log n)
Insertion (Best, Worst, Average): O(1), O(n), O(log n)

4) AVL Tree
Searching (Best, Worst, Average): O(1), O(log n), O(log n)
Insertion (Best, Worst, Average): O(log n), O(log n), O(log n)

5) Hash Table - computes the symbol's address from the symbol itself
Searching (Best, Worst, Average): O(1), O(n), O(1)
Insertion (Best, Worst, Average): O(1), O(n), O(1)

Data structures 1-4 rely on efficient comparison-based search methods; the hash table instead computes the symbol's address directly.

Hashing calculates where key K is stored using a function of K itself; that is, some function h(K) maps the domain of keys K into the range of addresses 0, 1, 2, ..., M-1. Some problems to solve include:
(1) finding a suitable function h
(2) how to determine a suitable M
(3) handling keys that hash to the same location.

Hash Methods:

1) Mid Square - this method first squares the key. Next, a certain number of bits are taken from the middle of the squared key to form a bucket address. The idea is that every character making up the key contributes to the middle bits of the square, so keys with similar character makeups still tend to produce different bucket addresses.
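A minimal sketch of mid square in C, assuming we take 10 middle bits starting 8 bits up from the low end (the method itself does not fix which middle bits to use or how many):

/* Square the key, then take 10 bits from the middle of the square. */
unsigned midSquare(unsigned key) {
    unsigned long long sq = (unsigned long long)key * key;  /* 64-bit square */
    return (unsigned)((sq >> 8) & 0x3FFu);                  /* bits 8..17 give 0..1023 */
}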

2) Division - this method simply divides the key by some number N, where N is a prime number corresponding to the number of buckets in the hash table; the remainder gives a bucket address in the range 0 to N-1.
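In C this is just the remainder operator (here n is assumed to be the prime bucket count):

/* Divide the key by the prime n; the remainder is the bucket address. */
unsigned divisionHash(unsigned key, unsigned n) {
    return key % n;    /* in the range 0..n-1 */
}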

3) Multiplicative - start with the golden ratio g = (sqrt(5)-1)/2 ≈ 0.61803399, and let {x} denote the fractional part x - floor(x). Examine the multiples i*g for 1 <= i <= 10, their fractional parts {i*g}, and the values floor(10*{i*g}).
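A short C program to generate that table, printing i, i*g, {i*g}, and floor(10*{i*g}) for 1 <= i <= 10:

#include <stdio.h>
#include <math.h>

int main(void) {
    const double g = (sqrt(5.0) - 1.0) / 2.0;    /* golden ratio, ~0.61803399 */
    for (int i = 1; i <= 10; i++) {
        double frac = i * g - floor(i * g);      /* {i*g} = i*g - floor(i*g) */
        printf("%2d  %9.6f  %8.6f  %d\n", i, i * g, frac, (int)floor(10.0 * frac));
    }
    return 0;
}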

Let's examine the fractional parts {i*g} over the interval [0,1), and specifically where each newly added point falls. This phenomenon generalizes as follows: let g be any irrational number. The fractional parts of the first N multiples of g divide the interval [0,1) into N+1 segments; that is, {g}, {2*g}, {3*g}, ..., {N*g} divide the interval into N+1 segments, and the next point {(N+1)*g} will fall into one of the largest segments. Research has found that choosing g equal to the golden ratio gives the most uniform interval splitting of all possible irrational values over [0,1).

Let's choose some arbitrary K as a multiplier for g, compute the fractional part {K*g}, and scale it by the table size M, giving M*{K*g}. We will then choose the hash address to be the greatest integer <= M*{K*g}.

Note: This computes a hash address in the range of 0 to (M-1). Convince yourself that this is true. That is, we have the following function: h(K)=floor(M*{K*g})

h(K) needs to have two properties:
1) needs to be easy to compute
2) needs to be good enough so as to minimize the number of collisions

Consider choosing h(K) as follows:

h(K)=trunc(M*(g*K-trunc(g*K))) where:
M is the prime table size
g is the golden ratio (0.61803399)
K is the key value such as the sum of the characters converted to an integer

Note: To find K you could simply:
1) sum up the ASCII values of the characters that make up the key
2) ord(first) + ord(last) + symbol size
3) use a mid-square type of procedure
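Putting the pieces together, here is a sketch of h(K) in C with K formed as in option 1 above (the table size M = 101 is an assumed prime):

#include <math.h>

#define M 101          /* prime table size (assumed) */
#define G 0.61803399   /* golden ratio */

/* Option 1: form K by summing the ASCII values of the key's characters. */
unsigned keyValue(const char *s) {
    unsigned k = 0;
    while (*s != '\0')
        k += (unsigned char)*s++;
    return k;
}

/* h(K) = trunc(M * (g*K - trunc(g*K))), an address in 0..M-1 */
int hash(const char *s) {
    double gk = G * keyValue(s);
    return (int)(M * (gk - floor(gk)));
}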

Collision Handling Techniques

Open Addressing
Suppose we have a key K such that h(K) maps to the same location as some arbitrary key K' which is distinct from K. To resolve this conflict in open addressing, we must find another unoccupied space for this key.

If b0 = h(K), then let b0, b1, b2, ..., b(M-1) be the probe sequence, where M is the table size.

Hash Table Search by Open Addressing is as follows:

Let T be some table with M entries which looks like: T[0],T[1],T[2],...,T[M-1]. We will assume without loss of generality that all keys inserted have positive values and empty entries are signified by a value of 0. We will search for the key K. The following algorithm will perform this search:

i := h(K);
j := i;                              { remember the starting slot }
while (T[i].key <> K) and (T[i].key <> 0) do
  begin
    i := i - p(K);                   { step down by the probe decrement }
    if (i < 0) then i := i + M;      { wrap around to the top of the table }
    if (j = i) then tablefull(i)     { probed every slot: the table is full }
  end;
return i                             { slot holding K, or an empty slot if K is absent }
The question is: How do we choose p(K)?

1) Linear Probing

With linear probing we choose p(K) = 1, which implies the probe sequence h(K), h(K)-1, h(K)-2, ..., 0, M-1, M-2, ..., h(K)+1.
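As a sketch, insertion under linear probing mirrors the search loop above with p(K) = 1 (the table size M = 11 and h(k) = k mod M are assumptions chosen to match the problem below):

#define M 11       /* table size */
int T[M];          /* T[i] = 0 marks an empty slot */

/* Insert key k (k > 0) using h(k) = k mod M and the decrement p(K) = 1. */
int insert(int k) {
    int i = k % M;                /* home address h(k) */
    int j = i;                    /* remember where probing began */
    while (T[i] != 0) {           /* probe until an empty slot is found */
        i = i - 1;                /* linear probe: step down by 1 */
        if (i < 0) i = i + M;     /* wrap from slot 0 to slot M-1 */
        if (i == j) return -1;    /* back at the start: the table is full */
    }
    T[i] = k;
    return i;                     /* slot where k was placed */
}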

Problem: Using the hash function h(Kn)=n mod 11 show what the hash table would look like after inserting the following keys: M13,G7,Q17,Y25,R18. Just use the number following the letter as the key value.

What happens if you add Z26 and then F6?

We will define the following terms:
1) primary clustering - this implies that all keys that collide at address b will extend the cluster that contains b. In the linear probing example, we had key F6 colliding with key Q17. This is an example of primary clustering because the collision of F6 and Q17 extended the primary cluster.

2) secondary clustering - is when adjacent clusters join to form a composite cluster.

We would like the collision handling technique to avoid both primary and secondary clustering.

Let us consider the previous example where we inserted the keys M13, G7, Q17, Y25, R18, Z26, and F6. Given a new key K to be inserted into the hash table using h(Kn) = n mod 11, what is the chance of location 9 being filled with K? What is the chance of location 0 being filled with K?

In analyzing a given hash method and collision handling technique, it is good to compute the average number of probes necessary to find an arbitrary key K. To compute the average, the following formula is used:
avg = (summation of the # of probes to locate each key in the table) / # of keys in the table
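For example, if a table holds three keys that are located in 1, 1, and 3 probes respectively, then avg = (1 + 1 + 3) / 3 ≈ 1.67 probes.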

Problem: For the previous hash method and linear probing, compute the average number of probes.


© Douglas J. Ryan / ryandj@pacificu.edu