Krishna Bhargava VangapanduB.E(I.T)
Department Of Computer Science
University Of Georgia

CSCI 4370/6370 Database Management

Project 2

Due Date: October 25 (8 am)

In the first project you stored a large amount of random data into a mySql database, this time the data will be meaningful. You will use the CIA Fact Book as the source of information to store in the database. The CIA Fact Book contains a large amount of information of every country in the world. You are going to create a database using the data contained in the CIA Fact Book to populate the tables and perform queries on this data.

Parsing HTML Data
First you will have to extract the data for several countries contained on the CIA Fact Book. The way you extract this data from the HTML is up to you. You can use an existing program or build your own. If you choose to build your own parsing program you can download each countries HTML file to your hard drive and parse it from there. This method is recommended for those of you who have had little exposure to parsing programs or XML. If you want to use an existing program an example of a program you could use is Web-Harvest. It can be found on source forge. It parses an HTML page and transforms the results into text files or XML files. You are required to extract at least the following attributes:
  • The name of the country
  • the total area
  • the countries that border it
  • the lowest elevation
  • the highest elevation
  • the population
  • the birth rate
  • the death rate
  • the total population's life expectancy
  • literacy rate of the total population
  • GDP (official exchange rate)
  • unemployment rate
From the following countries:
United States, Canada, Mexico, Belize, Costa Rica, El Salvador, Guatemala, Honduras, Nicaragua, Panama, Argentina, Bolivia, Brazil, Chile, Colombia, Ecuador, Guyana, Paraguay, Peru, Suriname, Uruguay, Venezuela After the parsing is complete you should have a file, either text or XML containing the values of all the attributes mentioned above for each country you select that is ready to be inserted into a mySQL database.
Entering Data into the Database
You must have at least two tables in your database. For example: Countries and Borders. The data you parsed in the first part must be saved in some format so that it can be quickly inserted into the tables. Once all the data is entered in the database queries will be performed.

Countries Table

CountryName TotalArea LowestElevation HighestElevation -----
United States --- --- --- ---

Borders Table

CounrtyName BorderCountry
United States Mexico
--- ----

The Queries
  • Find the country with the greatest difference in elevation.
  • Find all countries with more than three border countries that have a GDP higher than $2.055 trillion.
  • Find the country with the highest life expectancy.
  • Find the country with the highest life expectancy in relation to its neighbors. (i.e. Country A's Life Expectancy / Lowest Life Expectancy Of its Neighbors)
  • Find any countries with an area greater than 75,000 sq km and with a population less than 4,000,000.
  • Formulate 3 of your own queries that you think will produce an interesting relationship between countries.
Submission
You will need to describe how you extracted the data from the web. If you used an existing application you will need to explain which one and any config files or settings you used. If you wrote your own you must submit the source code. You will also need to submit the raw parsed data. Turn in the mySQL instructions for all the queries along with the results of each. Email everything in .zip or .tar format to ugadbf07@gmail.com by 8 am of the due date.
Grading
You will be graded on the data extraction, the correctness of the mySQL instructions, and correctness of the results

Course web page: http://cs.uga.edu/~krishna/DBF07/index.html


TAs: Krishna Bhargava Vangapandu (bhargav@uga.edu), Kavitha Anandan (kavitha@uga.edu)
Krishna Bhargava Vangapandu© 2007 University of Georgia