CSCI 4370/6370 Database ManagementProject 3Due Date: 16th November 11:59 PMThis project builds on the first project. In the first project, you have developed your own implementation of a relation which uses List to store the table data. We will extend the Relation to improve the Query Execution time. You do this by using an efficient way to look up tuples based on the primary key. For this assignment, you are required to build index using primary key only. You are not required to construct indices for secondary keys. Improving Query ExecutionAs discussed in the class, Data Structures like B+ Trees, Linear Hash Tables, Double Hash Tables optimize the search process. You will build one such data structure for your Relation class. You can use either B+ Tree, B-Tree, Linear Hashing or Extendible Hashing. Note that you should implement the data structure on your own and using HashMap available in Java would not account for Linear Hashing/Extendible Hashing. Your data structure would be implemented as a class that either implements the java.util.Map interface or extends the java.util.AbstractMap interface, as shown
public class MyMap extends AbstractMap implements Map, Cloneable, Serializable In the Relation class, your code would be used in the following way. private final Map table; Since the 'table' member is of type Map, you can instantiate table either as MyMap or java.util.TreeMap or java.util.HashMap. Methods like table.get(key), table.put(key,attrValues) can be used to look up for keys to avoid inserting duplicates, faster retrieval of tuples, etc. The 'insert' and 'select' operators should be supported. They might need some modification so as to support the new data structure. 'delete' operator does not need to be supported. Serialization should still be supported and you should be able to store your new Relation objects to persistant storage. If your Serialization was done right in the first Project, then it would work as it is without making any new changes to the code. It is recommended but not required that you implement B+ Trees since they optimize range queries to great extent and optimizes point queries to some extent. Point queries are greatly optimized when Extendible Hashing is used. Performance EvaluationIn the first project, you compared your Relation performance against MySql database. You would do a similar kind of performance evaluation for this optimized-Relation. Instead of comparing with MySql database, you would conduct a study of performance evaluation of using various implementations of table.
Collect timing data (in milliseconds using System.currentTimeInMillis() vs. the number of tuples).
Please do your performance test and analysis using "point queries" of selection, for the following When done right, point queries on primary key should be executed faster compared to queries using non-key attributes. Here is an assumed performance data for a point query: (unit - seconds) Here is an assumed performance data for a point query: (unit: second)
You are also required to plot your results as a chart(using Excel) like you did in the first project. Other
points that are of interest Joining the mailing listFirstly, it is very important and mandatory that you join the Mailing List as I might post some useful information which can help your implementation and other updated instructions. From this project, any news about the project posted on the mailing list would be considered at the time of grading the project. You would not see a complete change of instructions but might see some changes on what to submit and how to submit. It has already been mentioned earlier that important information would be posted on mailing list and I see no reason on why one should not join the group. Look out for updated information on Term Project too. What to submit?
GradingStandard tests for compilation on Atlas, normal execution of the program would be performed. You would be graded on the implementation of your data structure (B-Tree or B+ Tree or Linear Hashing or Extendible Hashing), performance data/charts, your analysis (a short description of what you implemented) and of course the Relation class should be functional just like in the first project except that it is more optimized for point-queries now. Your Relation class would still be tested on various (reasonable) queries (point, range queries, etc) and should not break. You can use test.java to create a large database. Note that this is not the only class that would be used to test your code. So making this class work and failing in other cases would not be a good program.Course web page: http://cs.uga.edu/~krishna/DBF07/index.html TAs: Krishna Bhargava Vangapandu (bhargav@uga.edu), Kavitha Anandan (kavitha@uga.edu) |