CSCI 4370/6370 Database ManagementProject 2Due Date: October 25 (8 am)In the first project you stored a large amount of random data into a mySql database, this time the data will be meaningful. You will use the CIA Fact Book as the source of information to store in the database. The CIA Fact Book contains a large amount of information of every country in the world. You are going to create a database using the data contained in the CIA Fact Book to populate the tables and perform queries on this data. Parsing HTML DataFirst you will have to extract the data for several countries contained on the CIA Fact Book. The way you extract this data from the HTML is up to you. You can use an existing program or build your own. If you choose to build your own parsing program you can download each countries HTML file to your hard drive and parse it from there. This method is recommended for those of you who have had little exposure to parsing programs or XML. If you want to use an existing program an example of a program you could use is Web-Harvest. It can be found on source forge. It parses an HTML page and transforms the results into text files or XML files. You are required to extract at least the following attributes:
United States, Canada, Mexico, Belize, Costa Rica, El Salvador, Guatemala, Honduras, Nicaragua, Panama, Argentina, Bolivia, Brazil, Chile, Colombia, Ecuador, Guyana, Paraguay, Peru, Suriname, Uruguay, Venezuela After the parsing is complete you should have a file, either text or XML containing the values of all the attributes mentioned above for each country you select that is ready to be inserted into a mySQL database. Entering Data into the DatabaseYou must have at least two tables in your database. For example: Countries and Borders. The data you parsed in the first part must be saved in some format so that it can be quickly inserted into the tables. Once all the data is entered in the database queries will be performed. Countries Table
The Queries
SubmissionYou will need to describe how you extracted the data from the web. If you used an existing application you will need to explain which one and any config files or settings you used. If you wrote your own you must submit the source code. You will also need to submit the raw parsed data. Turn in the mySQL instructions for all the queries along with the results of each. Email everything in .zip or .tar format to ugadbf07@gmail.com by 8 am of the due date.GradingYou will be graded on the data extraction, the correctness of the mySQL instructions, and correctness of the resultsCourse web page: http://cs.uga.edu/~krishna/DBF07/index.html TAs: Krishna Bhargava Vangapandu (bhargav@uga.edu), Kavitha Anandan (kavitha@uga.edu) |