  
  [1X4 Hashing techniques[0X
  
  
  [1X4.1 The idea of hashing[0X
  
  If  one wants to store a certain set of similar objects and wants to quickly
  access  a  given  one (or come back with the result that it is unknown), the
  first  idea  would  be  to  store them in a list, possibly sorted for faster
  access.  This  however  still  would need log(n) comparisons to find a given
  element or to decide that it is not yet stored.
  
  Therefore  one  uses a much bigger array and uses a function on the space of
  possible  objects with integer values to decide, where in the array to store
  a  certain  object. If this so called hash function distributes the actually
  stored  objects  well  enough over the array, the access time is constant in
  average.  Of  course,  a hash function will usually not be injective, so one
  needs  a  strategy  what to do in case of so-called "collision", that is, if
  more than one object with the same hash value has to be stored.
  
  The  basic  functions  to  work  with  hash  tables are [2XNewHT[0m ([14X4.3-1[0m), [2XAddHT[0m
  ([14X4.3-2[0m), and [2XValueHT[0m ([14X4.3-3[0m). They are described in Section [14X4.3[0m. In the next
  section, we first describe the infrastructure for hash functions.
  
  
  [1X4.2 Hash functions[0X
  
  In  the  [5Xorb[0m  package  hash  functions  are chosen automatically by giving a
  sample  object together with the length of the hash table. This is done with
  the following operation:
  
  [1X4.2-1 ChooseHashFunction[0m
  
  [2X> ChooseHashFunction( [0X[3Xob, len[0X[2X ) ___________________________________[0Xoperation
  [6XReturns:[0X  a record
  
  The first argument [3Xob[0m must be a sample object, that is, an object like those
  we  want to store in the hash table later on. The argument [3Xlen[0m is an integer
  that  gives  the  length  of  the hash table. Note that this might be called
  later  on  automatically,  when  a  hash  table  is  increased  in size. The
  operation  returns a record with two components. The component [10Xfunc[0m is a [5XGAP[0m
  function  taking  two  arguments,  see below. The component [10Xdata[0m is some [5XGAP[0m
  object.  Later  on, the hash function will be called with two arguments, the
  first  is  the object for which it should call the hash value and the second
  argument must be the data stored in the [10Xdata[0m component.
  
  The  hash  function  has  to return values between 1 and the hash length [3Xlen[0m
  inclusively.
  
  This setup is chosen such that the hash functions can be global objects that
  are  not  created  during  the execution of [2XChooseHashFunction[0m but still can
  change their behaviour depending on the data.
  
  In the following we just document, for which types of objects there are hash
  functions that can be found using [2XChooseHashFunction[0m.
  
  [1X4.2-2 ChooseHashFunction[0m
  
  [2X> ChooseHashFunction( [0X[3Xob, len[0X[2X ) ______________________________________[0Xmethod
  [6XReturns:[0X  a record
  
  This  method is for compressed vectors over the field [10XGF(2)[0m of two elements.
  Note  that  there  is no hash function for non-compressed vectors over [10XGF(2)[0m
  because those objects cannot efficiently be recognised from their type.
  
  Note  that  you can only use the resulting hash functions for vectors of the
  same length.
  
  [1X4.2-3 ChooseHashFunction[0m
  
  [2X> ChooseHashFunction( [0X[3Xob, len[0X[2X ) ______________________________________[0Xmethod
  [6XReturns:[0X  a record
  
  This  method  is  for  compressed vectors over a finite field with up to 256
  elements.  Note  that  there  is  no  hash  function for non-compressed such
  vectors  because  those  objects cannot efficiently be recognised from their
  type.
  
  Note  that  you can only use the resulting hash functions for vectors of the
  same length.
  
  [1X4.2-4 ChooseHashFunction[0m
  
  [2X> ChooseHashFunction( [0X[3Xob, len[0X[2X ) ______________________________________[0Xmethod
  [6XReturns:[0X  a record
  
  This method is for compressed matrices over the field [10XGF(2)[0m of two elements.
  Note  that  there is no hash function for non-compressed matrices over [10XGF(2)[0m
  because those objects cannot efficiently be recognised from their type.
  
  Note  that you can only use the resulting hash functions for matrices of the
  same size.
  
  [1X4.2-5 ChooseHashFunction[0m
  
  [2X> ChooseHashFunction( [0X[3Xob, len[0X[2X ) ______________________________________[0Xmethod
  [6XReturns:[0X  a record
  
  This  method  is  for compressed matrices over a finite field with up to 256
  elements.  Note  that  there  is  no  hash  function for non-compressed such
  vectors  because  those  objects cannot efficiently be recognised from their
  type.
  
  Note  that you can only use the resulting hash functions for matrices of the
  same size.
  
  [1X4.2-6 ChooseHashFunction[0m
  
  [2X> ChooseHashFunction( [0X[3Xob, len[0X[2X ) ______________________________________[0Xmethod
  [6XReturns:[0X  a record
  
  This method is for integers.
  
  [1X4.2-7 ChooseHashFunction[0m
  
  [2X> ChooseHashFunction( [0X[3Xob, len[0X[2X ) ______________________________________[0Xmethod
  [6XReturns:[0X  a record
  
  This method is for permutations.
  
  [1X4.2-8 ChooseHashFunction[0m
  
  [2X> ChooseHashFunction( [0X[3Xob, len[0X[2X ) ______________________________________[0Xmethod
  [6XReturns:[0X  a record
  
  This method is for lists of integers.
  
  [1X4.2-9 ChooseHashFunction[0m
  
  [2X> ChooseHashFunction( [0X[3Xob, len[0X[2X ) ______________________________________[0Xmethod
  [6XReturns:[0X  a record
  
  This method is for kernel Pc words.
  
  [1X4.2-10 ChooseHashFunction[0m
  
  [2X> ChooseHashFunction( [0X[3Xob, len[0X[2X ) ______________________________________[0Xmethod
  [6XReturns:[0X  a record
  
  This method is for lists of integers.
  
  [1X4.2-11 ChooseHashFunction[0m
  
  [2X> ChooseHashFunction( [0X[3Xob, len[0X[2X ) ______________________________________[0Xmethod
  [6XReturns:[0X  a record
  
  This method is for lists of matrices.
  
  
  [1X4.3 Using hash tables[0X
  
  The following functions are needed to use hash tables. For details about the
  data structures see Section [14X4.4[0m.
  
  [1X4.3-1 NewHT[0m
  
  [2X> NewHT( [0X[3Xsample, len[0X[2X ) _____________________________________________[0Xfunction
  [6XReturns:[0X  a new hash table object
  
  A new hash table for objects like [3Xsample[0m of length [3Xlen[0m is created. Note that
  it  is  a  good  idea to choose a prime number as the hash length due to the
  algorithm for collision handling which works particularly well in that case.
  The  hash function is chosen automatically. The resulting object can be used
  with  the  functions  [2XAddHT[0m  ([14X4.3-2[0m) and [2XValueHT[0m ([14X4.3-3[0m). It will start with
  length [3Xlen[0m but will grow as necessary.
  
  [1X4.3-2 AddHT[0m
  
  [2X> AddHT( [0X[3Xht, ob, val[0X[2X ) _____________________________________________[0Xfunction
  [6XReturns:[0X  an integer or fail
  
  Stores  the  object  [3Xob[0m  into  the  hash  table  [3Xht[0m and stores the value [3Xval[0m
  together with [3Xob[0m. The result is [9Xfail[0m if an error occurred, which can only be
  that the hash table is already full. This can only happen, if the hash table
  cannot grow automatically.
  
  If  no  error  occurs,  the result is an integer indicating the place in the
  hash  table  where the object is stored. Note that once the hash table grows
  automatically this number is no longer the same!
  
  If  the  value  [3Xval[0m  is [9Xtrue[0m for all objects in the hash, no extra memory is
  used for the values. All other values are stored in the hash. The value [9Xfail[0m
  cannot be stored as it indicates that the object is not found in the hash.
  
  See  Section  [14X4.4[0m  for  details  on the data structures and especially about
  memory requirements.
  
  [1X4.3-3 ValueHT[0m
  
  [2X> ValueHT( [0X[3Xht, ob[0X[2X ) ________________________________________________[0Xfunction
  [6XReturns:[0X  the stored value, [9Xtrue[0m, or [9Xfail[0m
  
  Looks  up  the  object  [3Xob[0m in the hash table [3Xht[0m. If the object is not found,
  [9Xfail[0m  is  returned. Otherwise, the value stored with the object is returned.
  Note that if this value was [9Xtrue[0m no extra memory is used for this.
  
  The  following  function is only documented for the sake of completeness and
  for emergency situations, where [2XNewHT[0m ([14X4.3-1[0m) tries to be too intelligent.
  
  [1X4.3-4 InitHT[0m
  
  [2X> InitHT( [0X[3Xlen, hfun, eqfun[0X[2X ) _______________________________________[0Xfunction
  [6XReturns:[0X  a new hash table object
  
  This  is usually only an internal function. It is called from [2XNewHT[0m ([14X4.3-1[0m).
  The  argument [3Xlen[0m is the length of the hash table, [3Xhfun[0m is the hash function
  record  as  returned by [2XChooseHashFunction[0m ([14X4.2-1[0m) and [3Xeqfun[0m is a comparison
  function taking two arguments and returning [9Xtrue[0m or [9Xfalse[0m.
  
  Note  that  automatic  growing  is  switched on for the new hash table which
  means  that  if  the  hash  table grows, a new hash function is chosen using
  [2XChooseHashFunction[0m  ([14X4.2-1[0m).  If  you do not want this, change the component
  [10Xcangrow[0m to [9Xfalse[0m after creating the hash table.
  
  [1X4.3-5 GrowHT[0m
  
  [2X> GrowHT( [0X[3Xht, ob[0X[2X ) _________________________________________________[0Xfunction
  [6XReturns:[0X  nothing
  
  This  is  a more or less internal function. It is called when the space in a
  hash  table  becomes  scarce.  The  first  argument  [3Xht[0m must be a hash table
  object, the second a sample point. The function increases the hash size by a
  factor  of 2. This makes it necessary to choose a new hash function. Usually
  this  is  done  with  the  usual [10XChooseHashFunction[0m method. However, one can
  assign  the  two  components  [10Xhfbig[0m  and  [10Xhfdbig[0m  to a function and a record
  respectively.  In  that  case, upon growing the hash, a new hash function is
  created  by  taking  the  function [10Xhfbig[0m together with [10Xhfdbig[0m as second data
  argument  and reducing the resulting integer modulo the hash length. In this
  way  one  can  specify a hash function suitable for all hash sizes by simply
  producing big enough hash values.
  
  
  [1X4.4 The data structures for hash tables[0X
  
  A hash table object is just a record with the following components:
  
  [8X[10Xels[0m[8X[0m
        A  [5XGAP[0m  list  storing  the  elements. Its length can be as long as the
        component  [10Xlen[0m indicates but will only grow as necessary when elements
        are stored in the hash.
  
  [8X[10Xvals[0m[8X[0m
        A  [5XGAP[0m  list  storing  the  corresponding  values.  If a value is [9Xtrue[0m
        nothing is stored here to save memory.
  
  [8X[10Xlen[0m[8X[0m
        Length of the hash table.
  
  [8X[10Xnr[0m[8X[0m
        Number of elements stored in the hash table.
  
  [8X[10Xhf[0m[8X[0m
        The  hash function (value of the [10Xfunc[0m component in the record returned
        by [2XChooseHashFunction[0m ([14X4.2-1[0m)).
  
  [8X[10Xhfd[0m[8X[0m
        The  data  for  the second argument of the hash function (value of the
        [10Xdata[0m component in the record returned by [2XChooseHashFunction[0m ([14X4.2-1[0m)).
  
  [8X[10Xeqf[0m[8X[0m
        A  comparison  function  taking  two  arguments and returning [9Xtrue[0m for
        equality or [9Xfalse[0m otherwise.
  
  [8X[10Xcollisions[0m[8X[0m
        Number of collisions (see below).
  
  [8X[10Xaccesses[0m[8X[0m
        Number of lookup or store accesses to the hash.
  
  [8X[10Xcangrow[0m[8X[0m
        A  boolean value indicating whether the hash can grow automatically or
        not.
  
  [8X[10Xishash[0m[8X[0m
        Is [9Xtrue[0m to indicate that this is a hash table record.
  
  
  [1X4.4-1 Memory requirements[0X
  
  Due to the data structure defined above the hash table will need one machine
  word  (4 bytes on 32bit machines and 8 bytes on 64bit machines) per possible
  entry  in  the  hash  if all values corresponding to objects in the hash are
  [9Xtrue[0m and two machine words otherwise. This means that the memory requirement
  for  the hash itself is proportional to the hash table length and not to the
  number of objects actually stored in the hash!
  
  In addition one of course needs the memory to store the objects themselves.
  
  
  [1X4.4-2 Handling of collisions[0X
  
  If  two  or more objects have the same hash value, the following is done: If
  the  hash  value  is  coprime to the hash length, the hash value is taken as
  "the increment", otherwise 1 is taken. The code to find the proper place for
  an  object just repeatedly adds the increment to the current position modulo
  the hash length. Due to the choice of the increment this will eventually try
  all  places  in  the  hash  table. Every such increment step is counted as a
  collision  in  the  [10Xcollisions[0m  component  in the hash table. This algorithm
  explains why it is sensible to choose a prime number as the length of a hash
  table.
  
  
  [1X4.4-3 Efficiency[0X
  
  Hashing is efficient as long as there are not too many collisions. It is not
  a  problem if the number of collisions (counted in the [10Xcollisions[0m component)
  is smaller than the number of accesses (counted in the [10Xaccesses[0m component).
  
  A  high  number  of collisions can be caused by a bad hash function, because
  the  hash  table  is  too small (do not fill a hash table to more than about
  80%),  or because the objects to store are just not well enough distributed.
  Hash tables will grow automatically if too many collisions are detected.
  
