Program 2: Machine Learning

Due Date

This assignment is due at the beginning of the lecture on Thursday, March 11th.

Partners

You are required to work with one other person on this assignment. Please submit just one solution with both of your names on it.

Purpose

The purpose of this assignment is to introduce you to the Naive Bayes method, the k-nearest neighbors algorithm, and decision stumps augmented with AdaBoost.

Data Set

For this assignment, we will be using the automobile database. The .names file describes the attributes, and the .data file contains the data itself.

Learning Techniques To Implement

• Naive Bayes Method (p. 718). You will need to decide how to deal with continuous attributes. Conduct an experiment to find out how accurate this method is.
• k-Nearest Neighbors with k = 5 (p. 773). You will need to decide how to deal with continuous attributes. Conduct an experiment to find out how accurate this method is.
• Decision Stumps (p. 666) with AdaBoost (p. 667). Conduct experiments with M = 1, 5, 10, and 20 to find out how accurate this method is. You must use one decision stump for each attribute. Stumps for discrete attributes must have one branch for each value. Stumps for continuous attributes must have two branches: one branch if the value is within a particular range [X..Y] and one branch for all other values.
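
To make the continuous-attribute stump concrete, here is a minimal Python sketch of a two-branch range stump boosted over a single continuous attribute. It is only an illustration under stated assumptions, not a reference solution: all function names are hypothetical, the candidate ranges [X..Y] are restricted to observed values, the weight update is one common textbook form of AdaBoost (which may differ in detail from the version you implement), and discrete attributes and multiple attributes are left out entirely.

```python
import math
from collections import defaultdict


def weighted_majority(labels, weights):
    """Return the label with the largest total weight (None if there are no labels)."""
    totals = defaultdict(float)
    for y, w in zip(labels, weights):
        totals[y] += w
    return max(totals, key=totals.get) if totals else None


def fit_range_stump(values, labels, weights, lo, hi):
    """Two-branch stump: one branch for values inside [lo, hi], one for all others."""
    rows = list(zip(values, labels, weights))
    inside = [(y, w) for v, y, w in rows if lo <= v <= hi]
    outside = [(y, w) for v, y, w in rows if not lo <= v <= hi]
    default = weighted_majority(labels, weights)
    in_label = weighted_majority(*zip(*inside)) if inside else default
    out_label = weighted_majority(*zip(*outside)) if outside else default
    return lambda v: in_label if lo <= v <= hi else out_label


def best_range_stump(values, labels, weights):
    """Assumption: only ranges whose endpoints are observed values are tried."""
    points = sorted(set(values))
    best, best_err = None, float("inf")
    for i, lo in enumerate(points):
        for hi in points[i:]:
            stump = fit_range_stump(values, labels, weights, lo, hi)
            err = sum(w for v, y, w in zip(values, labels, weights) if stump(v) != y)
            if err < best_err:
                best, best_err = stump, err
    return best, best_err


def adaboost(values, labels, rounds):
    """Boost range stumps for `rounds` rounds; returns a list of (stump, vote weight)."""
    n = len(values)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        stump, err = best_range_stump(values, labels, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)  # guard against division by zero and log(0)
        # Shrink the weights of correctly classified examples, then renormalize.
        weights = [w * err / (1 - err) if stump(v) == y else w
                   for v, y, w in zip(values, labels, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
        ensemble.append((stump, math.log((1 - err) / err)))
    return ensemble


def predict(ensemble, v):
    """Weighted vote over all stumps in the ensemble."""
    votes = defaultdict(float)
    for stump, z in ensemble:
        votes[stump(v)] += z
    return max(votes, key=votes.get)
```

For example, `predict(adaboost(values, labels, rounds=5), 3.2)` would classify the value 3.2 using five boosted stumps. How you choose the candidate ranges is a design decision worth describing in your report.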

Report

Write a professional report that includes the following sections:

1. A description of your k-nearest neighbors algorithm and a report on its effectiveness. Use graphs and tables where appropriate.
2. A description of your Naive Bayes method and a report on its effectiveness. Use graphs and tables where appropriate.
3. A description of your decision stump (augmented with AdaBoost) algorithm and a report on its effectiveness. Use graphs and tables where appropriate.

General Requirements

• You may use any programming language you like for this assignment.
• Use 10-fold cross-validation (p. 663) for all of your experiments.
• Be sure to explain your experiments carefully: describe each experiment, display the results in a meaningful way (graphs, tables, etc.), and interpret the results.
• Design, conduct, and report on two meaningful experiments in addition to the ones that are required.
• In the report, be sure to emphasize any non-standard choices you made (for example, how you dealt with continuous values in the Naive Bayes method).
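
As a rough illustration of the 10-fold cross-validation requirement, the Python sketch below shuffles the examples, splits them into k folds, and averages the accuracy obtained when each fold in turn is held out for testing. The names `k_fold_accuracy` and `train_majority` are hypothetical, the sketch assumes at least k examples, and the majority-class learner is only a stand-in for your own learners.

```python
import random
from collections import Counter


def k_fold_accuracy(examples, train, k=10, seed=0):
    """Average held-out accuracy over k folds.

    examples: list of (features, label) pairs; assumed to contain at least k items.
    train: function taking a training list and returning a classify(features) function.
    """
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)  # fixed seed so runs are repeatable
    folds = [shuffled[i::k] for i in range(k)]
    accuracies = []
    for i, test_fold in enumerate(folds):
        # Train on every fold except the one held out for testing.
        training = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        classify = train(training)
        correct = sum(1 for x, y in test_fold if classify(x) == y)
        accuracies.append(correct / len(test_fold))
    return sum(accuracies) / k


def train_majority(training):
    """Baseline learner: always predict the most common training label."""
    label = Counter(y for _, y in training).most_common(1)[0][0]
    return lambda x: label
```

Calling `k_fold_accuracy(examples, train_majority)` gives a baseline accuracy against which your three learners can be compared.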

What to Submit

1. A printout of the source code that you produce.
2. A printout of representative output from a run of your program.
3. The report.