Krishna Bhargava VangapanduB.E(I.T)
Department Of Computer Science
University Of Georgia

CSCI 4370/6370 Database Management

Project 3

Due Date: 16th November 11:59 PM

This project builds on the first project. In the first project, you have developed your own implementation of a relation which uses List to store the table data. We will extend the Relation to improve the Query Execution time. You do this by using an efficient way to look up tuples based on the primary key. For this assignment, you are required to build index using primary key only. You are not required to construct indices for secondary keys.

Improving Query Execution

As discussed in the class, Data Structures like B+ Trees, Linear Hash Tables, Double Hash Tables optimize the search process. You will build one such data structure for your Relation class. You can use either B+ Tree, B-Tree, Linear Hashing or Extendible Hashing. Note that you should implement the data structure on your own and using HashMap available in Java would not account for Linear Hashing/Extendible Hashing. Your data structure would be implemented as a class that either implements the java.util.Map interface or extends the java.util.AbstractMap interface, as shown

public class MyMap extends AbstractMap implements Map, Cloneable, Serializable
{
//your implementation goes here
}

In the Relation class, your code would be used in the following way.

private final Map table;

Since the 'table' member is of type Map, you can instantiate table either as MyMap or java.util.TreeMap or java.util.HashMap. Methods like table.get(key), table.put(key,attrValues) can be used to look up for keys to avoid inserting duplicates, faster retrieval of tuples, etc.

The 'insert' and 'select' operators should be supported. They might need some modification so as to support the new data structure. 'delete' operator does not need to be supported. Serialization should still be supported and you should be able to store your new Relation objects to persistant storage. If your Serialization was done right in the first Project, then it would work as it is without making any new changes to the code.

It is recommended but not required that you implement B+ Trees since they optimize range queries to great extent and optimizes point queries to some extent. Point queries are greatly optimized when Extendible Hashing is used.

Performance Evaluation

In the first project, you compared your Relation performance against MySql database. You would do a similar kind of performance evaluation for this optimized-Relation. Instead of comparing with MySql database, you would conduct a study of performance evaluation of using various implementations of table.

  1. MyMap (your own implementation of B-Tree/B+Tree/Linear Hashing/Extendible Hashing)
  2. List (implemented in Project 1)
  3. java.util.TreeMap
  4. java.util.HashMap

Collect timing data (in milliseconds using System.currentTimeInMillis() vs. the number of tuples). Please do your performance test and analysis using "point queries" of selection, for the following
1. Point Queries using Primary Key 2. Point Queries on non-key attributes

When done right, point queries on primary key should be executed faster compared to queries using non-key attributes. Here is an assumed performance data for a point query: (unit - seconds)

Here is an assumed performance data for a point query: (unit: second)

#tuples MyMap TreeMap HashMap
500 0.1 0.1 0.11
1000 0.2 0.21 0.2
2000 0.71 0.39 0.42
3000 1.23 0.62 0.64
4000 3.1 0.96 0.94

You are also required to plot your results as a chart(using Excel) like you did in the first project. Other points that are of interest
You can assume that a primary key will consist of only one attribute.

For both TreeMap and HashMap the line in your Relation.java file should be:
private final Map table;

The file that uses TreeMap, would define table as a TreeMap with lines like:
table = new TreeMap();

The file that uses HashMap, would define table as a HashMap with lines like:
table = new HashMap();

Joining the mailing list

Firstly, it is very important and mandatory that you join the Mailing List as I might post some useful information which can help your implementation and other updated instructions. From this project, any news about the project posted on the mailing list would be considered at the time of grading the project. You would not see a complete change of instructions but might see some changes on what to submit and how to submit. It has already been mentioned earlier that important information would be posted on mailing list and I see no reason on why one should not join the group. Look out for updated information on Term Project too.

What to submit?
  1. Pack all the materials into a Zip file and email it to ugadbf07@gmail.com with subject as "Project 3"
  2. Your submission should include four versions of Relation class, one for each implementation of table - List(same as project 1),MyMap(your implementation of table),HashMap,TreeMap. Also include test cases (main()) for each version.
  3. Performance data in an xls sheet. Charts plotting that data. Compare both point and range queries involving key and non-key attributes.
  4. A summary of your implementation. Especially describe the algorithm that you have implemented. The description can be simple and be explained with example.
  5. As usual a READ ME with instructions to the TA and any assumptions that you made. Your assumptions should not override given Project requirements. Also specify what data structure you have implemented. You should implement one of Linear Hashing, Extendible Hashing, B-Tree or B+ Tree
  6. Please do not expect me to fetch your Project 1 Relation class. You should include four versions of Relation class
Grading
Standard tests for compilation on Atlas, normal execution of the program would be performed. You would be graded on the implementation of your data structure (B-Tree or B+ Tree or Linear Hashing or Extendible Hashing), performance data/charts, your analysis (a short description of what you implemented) and of course the Relation class should be functional just like in the first project except that it is more optimized for point-queries now. Your Relation class would still be tested on various (reasonable) queries (point, range queries, etc) and should not break. You can use test.java to create a large database. Note that this is not the only class that would be used to test your code. So making this class work and failing in other cases would not be a good program.

Course web page: http://cs.uga.edu/~krishna/DBF07/index.html
Project web page: http://cs.uga.edu/~krishna/DBF07/p3.html
Join the mailing list: http://groups.google.com/group/ugadbf07?hl=en


TAs: Krishna Bhargava Vangapandu (bhargav@uga.edu), Kavitha Anandan (kavitha@uga.edu)
Krishna Bhargava Vangapandu© 2007 University of Georgia