Multidimensional Indexing Techniques

B.Sc Thesis Progress Report

Version Type Description Progress
--v1.0 Stable Kernel Mode version.Basics get done succesfully but it appears to exist further optimization for insertion. DONE
--v1.1 Testing Adding functionality to insertion.Checking lot of techniques
to achieve optimized insertion.
DONE
--v1.2 Testing Grading of optimization techniques and choosing the best of them when insertion slows down. DONE
--v1.3 Testing
  • Rebuild the internal Structure of nonleafnodes.
  • Implementing efficient way of insertion
  • Change structure of nonleafnodes and as a result the way gctree inserts objects.

The difference between the older versions and this one is based on the way entries are getting inserted in nonleafnodes.
Lets visualise the way entries are getting inserted till now.
  ---------   -----------          ---------    ---------
| Entry N | Entry N-1 |.....| Entry 2 | Entry 1 |
  ---------   -----------          ---------    ---------

And now we insert entryK:

  ---------   -----------          ---------    ---------
| Entry K | Entry N |.....| Entry 2 | Entry 1 |
  ---------   -----------          ---------    ---------

In previous versions the entries were inserted at the start of the nonleafnode which caused continuous shifts of the rest entries. Lets say now that we have to insert a new entry entryK
In this version lets visualise the nonleafnode before insertion of entryK

   -----------   -----------     -----------       ---------    ---------
|...EMPTY...| Entry N | Entry N-1 |.....| Entry 2 | Entry 1 |
   -----------   -----------     -----------       ---------    ---------

And now we insert entryK:

   -----------   ----------- -----------       ---------    ---------
|...EMPTY...| Entry K | Entry N |.....| Entry 2 | Entry 1 |
   -----------   ----------- -----------       ---------    ---------

In previous versions the insertion of entry K would cause the shift of all the entries in nonleafnode so that it was possible to insert entry in correct(first) position. This change in the internal structure caused a happy effect.In case that there is no place for inserting a new entry,we have to call SplitNonLeaf. If the previous entry pointed outlier node then we have to insert the entry in a nonleafnode and point the overflow page.This happens by making the old nonleafnode an overflow page and creating a new nonleafnode where we insert the Eo and the Ec and make the newly allocated nonleafnode point to the old nonleafnode. Using this strategy lets us have insertions only to real parent and not to overflow pages.In previous versions we had to read all overflow pages to see if there is space to insert the new entry but now we keep in memory the real parent (nonleafnode) and perhaps the overflow nonleafnode and check the situations we face during insertions more efficiently.
DONE
--v1.4 Testing
  • Efficient way of partioning entire domain
  • Changing Initial Insert making it have two modes of building the root (The one i already have implement and another one to make insertion easier)
  • Optimizing partition of a leafnode.When the partition cant find a cluster this means that in next insertion we will have to re-partition,by recalculating the new cells and compute in which cell each object lies.With the new strategy we are about to keep some history(lru),of the cells so that the next time we insert an object which would cause repartition will find the cells already calculated in memory. I will do this due to the observation that continuous repartition causes a big overhead during insertion..
DONE
--v2.0 Stable Last Bug Fixes and further functionality addition.
DONE
--v2.1 Testing Building GCtree Va-file and Linear techniques together
to make compare easier.
DONE
--v2.2 Testing Last optimizing method for insertion with concurrent minimize of disk usage. What will change here is the way we store the cells for each nonleafnode. Till now for each nonleafnode we compute full-length cell approximation. This means that if the parent has cell approximation 01010101 then one of his child will might have approximation 011011011011.This maximizes disk usage because we could have the opportunity store it in a costless way.Suppose again parent has cell approximation 01010101 and his child was supposed to have 011011011011,what we do now is to store only the last bit from the bpd for the child.This means that we will store the approximation 1111 and when we traverse the gctree we will take parent's cell approximation and add to it child's approximation to build the real approximation.This method will minimze the disk usage and also the time of insertion because till now if we wanted to check if two cells are the same we had to check full-length approximations but now we are about to check only the last bit from the bpd.We are able to do so because the comparisson between cells gets done only in the same partitioning level. DONE
--v3.0 Stable The combination of all possible tecnhiques under the building procedure make GCtree work really fast.
Insertion's bound and GCTree structure are both as optimized as possible. More optimizations and Dosumentations coming soon...
DONE
--v3.3 Stable Statistics of GCtree's usage can be inserted into mysql table in order to be exported and manipulated easily.Also the current version of GCTree can be declared as GCtree+ as the knn_search and its internal structure are totally different from the original GCTree in order to optimize its speed and bounds. I am not going to name it in a different way as far the philosophy of the main tasks remain the same. DONE


© Copyright 2010
Created by Psallidas Fotis (std06170[at]di[dot]uoa[dot]gr)
All Rights Reserved