Let's examine the fractional part {i*g} over the interval [0,1). Specifically, let's examine where each newly added point falls. This phenomenon generalizes as follows: Let g be any irrational number. The fractional parts of N multiples of g over the interval [0,1) divides the interval into N+1 segments. That is {g}, {2*g}, {3*g},
..., {N*g} divides the interval into N+1 segments and the next point {(N+1)*g} will fall into the next largest segment. Research has found that choosing g equal to the golden ratio gives the most uniform interval splitting of all possible values of irrational numbers over the interval [0,1).
Let's choose some arbitrary K as a multiplier for g and then compute the fractional part of K*g scaled by the tablesize. This implies the following: M*{K*g}. We will then choose a hash address to be the greatest integer <= M*{K*g}.
Note: This computes a hash address in the range of 0 to (M-1). Convince yourself that this is true. That is, we have the following function: h(K)=floor(M*{K*g})
h(K) needs to have two properties:
1) needs to be easy to compute
2) needs to be good enough so as to minimize the number of collisions
Consider choosing h(K) as follows:
h(K)=trunc(M*(g*K-trunc(g*K))) where:
M is the prime table size
g is the golden ratio (0.61803399)
K is the key value such as the sum of the characters converted to an integer
Note: To find K you could simply:
1) sum up the ascii values of the characters that make up the key
2) ord(first)+ord(last)+symbol size
3) use mid square type of procedure
If b0=h(K), then let b0,b1,b2,...,bM-1 be a probe sequence where (M=tablesize)
Hash Table Search by Open Addressing is as follows:
Let T be some table with M entries which looks like: T[0],T[1],T[2],...,T[M-1]. We will assume without loss of generality that all keys inserted have positive values and empty entries are signified by a value of 0. We will search for the key K. The following algorithm will perform this search:
i := h(K) j := i while((T[i].key <> K) and (T[i].key <> 0)) do begin i := i - p(K) if(i < 0) then i := i + M if(j = i) then tablefull(i) end return iThe question is: How do we choose p(K)?
Problem: Using the hash function h(Kn)=n mod 11 show what the hash table would look like after inserting the following keys: M13,G7,Q17,Y25,R18. Just use the number following the letter as the key value.
What happens if you add Z26 and then F6?
We will define the following terms:
1) primary clustering - this implies that all keys that collide at address b will extend the cluster that contains b. In the linear probing example, we had key Q17 colliding with key F6. This is an example of primary clustering because the collision of F6 and Q17 extended the primary cluster.
2) secondary clustering - is when adjacent clusters join to form a composite cluster.
We would like the collision handling technique to avoid both primary and secondary clustering.
Let us consider the previous example where we inserted the keys M13, G7, Q17, Y25, R18, Z26, and F6. Given a new key K to be inserted into the hash table using h(Kn) = n mod 11, what is the chance of location 9 being filled with K? What is the chance of location 0 being filled with K?
In analyzing a given hash method and collision handling technique, it is good to compute the average number of probes necessary to find an arbitrary key K. To compute the average, the following formula is used:
avg = (summation of the # of probes to locate each key in the table) / # of keys in the table
Problem: For the previous hash method and linear probing, compute the average number of probes.