Program 2: Document Spell Checker

Due Date and Submission Requirements

Background and Directions

In this program, you are going to use Java’s built-in HashSet and HashMap classes to build a document-checking application. Your program will identify misspelled words and print out some interesting information about the frequency of the words that appear in the document.

Part I: Implementing a menu-like functionality

The user will be prompted with a menu of options. When the user enters a choice, the appropriate action will be executed.
If the user enters an invalid choice, the program should print the menu again and ask the user for another choice.
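
The re-prompting behavior described above can be sketched as follows. This is only one possible layout, not a required design: the class name MenuSketch, the method names, the menu wording, and the fourth "Quit" option are all assumptions made for illustration (the assignment only names options #1 through #3).

```java
import java.util.Scanner;

class MenuSketch {
    // Hypothetical menu text; option 4 (quit) is an assumption, not part of the assignment.
    static final String MENU =
        "1. Spellcheck document\n" +
        "2. Print word frequencies alphabetically\n" +
        "3. Print word frequencies by count\n" +
        "4. Quit";

    // Returns true when the text is exactly one of the valid menu choices.
    static boolean isValidChoice(String text) {
        return text.matches("[1-4]");
    }

    // Prints the menu and keeps asking until a valid choice is entered.
    // Returns 4 (quit) if the input ends before a valid choice arrives.
    static int readChoice(Scanner in) {
        while (true) {
            System.out.println(MENU);
            System.out.print("Enter your choice: ");
            if (!in.hasNextLine()) {
                return 4;
            }
            String line = in.nextLine().trim();
            if (isValidChoice(line)) {
                return Integer.parseInt(line);
            }
            System.out.println("Invalid choice, please try again.");
        }
    }
}
```

A caller would typically create one `new Scanner(System.in)`, call `readChoice` in a loop, and dispatch on the returned number until the quit option is chosen.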

Part II: Loading your HashMap and HashSet

You must use a Java HashMap to calculate the number of occurrences of each unique word in the input file input.txt. This input file is the document that the user wants spellchecked, and it may be several lines long. Your code should go through each word in the input file and count how many times that word appears in the document. The keys of your HashMap will be the words (String), and the value linked to each key will be that word's frequency. For example, the following output would mean that the word “a” appears four times in the document, “and” appears once, “check” three times, and so on:
a: 4
and: 1
check: 3
code: 1
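
A minimal sketch of the counting step is shown here. It assumes the tokens are already lowercase and free of punctuation, so the capitalization and punctuation handling the assignment asks for is deliberately left out. The class name WordCountSketch and the method name countWords are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Scanner;

class WordCountSketch {
    // Counts how often each whitespace-separated token appears.
    // Assumption: tokens are already lowercased and stripped of punctuation.
    static Map<String, Integer> countWords(Scanner in) {
        Map<String, Integer> counts = new HashMap<>();
        while (in.hasNext()) {
            String word = in.next();
            // merge() stores 1 for a new word, or adds 1 to the existing count.
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }
}
```

To read the real document you could pass `new Scanner(new File("input.txt"))`, which requires handling FileNotFoundException.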

Capitalization and punctuation matter here, and you will need to do some String manipulation to handle this. “The”, “the”, “the,” and “tHe?” should all count towards the number of occurrences for the word “the”. The only punctuation marks you have to worry about are periods, commas, question marks, and exclamation points. You can assume no other symbols or punctuation will appear in the input.
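
One way to do this String manipulation is sketched below. The class name NormalizeSketch and the method name normalize are hypothetical; the sketch simply lowercases a token and deletes the four punctuation marks listed above.

```java
import java.util.Locale;

class NormalizeSketch {
    // Lowercases a token and removes periods, commas, question marks,
    // and exclamation points. String.replace() matches literally, so
    // "." and "?" need no escaping here.
    static String normalize(String token) {
        return token.toLowerCase(Locale.ROOT)
                    .replace(".", "")
                    .replace(",", "")
                    .replace("?", "")
                    .replace("!", "");
    }
}
```

Note that a token made up only of punctuation normalizes to an empty String, so you may want to skip empty results before counting.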


You will need a database of all known words in the dictionary. These words will come in from words.txt, which is another input file for your program. You must load these words into a Java HashSet. words.txt doesn't have every word in the English language, but it has most of them. You should not modify words.txt.
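
Loading the dictionary might look like the sketch below. It assumes the words in words.txt are separated by whitespace (for example, one word per line) and it lowercases each word so that later lookups can use lowercase text; both are assumptions, since the file's exact layout is not described here. The names DictionarySketch and loadWords are hypothetical.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Scanner;
import java.util.Set;

class DictionarySketch {
    // Reads every whitespace-separated word from the scanner into a HashSet.
    // Assumption: lowercasing the dictionary is acceptable for this program.
    static Set<String> loadWords(Scanner in) {
        Set<String> words = new HashSet<>();
        while (in.hasNext()) {
            words.add(in.next().toLowerCase(Locale.ROOT));
        }
        return words;
    }
}
```

Because a HashSet offers constant-time `contains` on average, checking each word of the document against it stays fast even for a large dictionary.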

Part III: Spellchecking the document (Menu option #1)

With your loaded HashSet from part II, you will read in from input.txt again and check each word to see whether it appears in the HashSet. If a word does not appear in the HashSet, it is considered misspelled and needs to be flagged in the output. Once again, capitalization and punctuation matter here, and you will likely need to do some similar String manipulation to ensure the spellchecking works. The original input should be printed out, but any misspelled words need to be wrapped in angle brackets < >.

For example:

Original Input: Hai there, what is yuor nameee?

Output of program: <Hai> there, what is <yuor> <nameee>?
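
The example above can be reproduced with a sketch like the following. It assumes that words on a line are separated by single spaces, that punctuation only appears at the start or end of a word, and that the dictionary holds lowercase words. The class and method names are hypothetical.

```java
import java.util.Locale;
import java.util.Set;

class SpellCheckSketch {
    // True for the four punctuation marks the assignment names.
    static boolean isPunctuation(char c) {
        return ".,?!".indexOf(c) >= 0;
    }

    // Wraps the word part of a token in < > when it is not in the dictionary,
    // leaving its capitalization and surrounding punctuation untouched.
    static String markToken(String token, Set<String> dictionary) {
        int start = 0;
        int end = token.length();
        while (start < end && isPunctuation(token.charAt(start))) {
            start++;
        }
        while (end > start && isPunctuation(token.charAt(end - 1))) {
            end--;
        }
        String core = token.substring(start, end);
        if (core.isEmpty() || dictionary.contains(core.toLowerCase(Locale.ROOT))) {
            return token;
        }
        return token.substring(0, start) + "<" + core + ">" + token.substring(end);
    }

    // Checks every space-separated token on a line and rebuilds the line.
    static String checkLine(String line, Set<String> dictionary) {
        String[] tokens = line.split(" ", -1); // -1 keeps trailing empty tokens
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < tokens.length; i++) {
            if (i > 0) {
                out.append(' ');
            }
            out.append(markToken(tokens[i], dictionary));
        }
        return out.toString();
    }
}
```

Only the lookup uses the lowercase, punctuation-free form of the word; the text that gets printed is always built from the original token, which is how the capitalization and punctuation survive.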

Part IV: Printing out word frequency alphabetically (Menu option #2)

With your loaded HashMap from part II, you will print out the contents of the HashMap sorted by the key values (alphabetically).
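
Since a HashMap has no guaranteed order, the keys have to be copied into something sortable first. A minimal sketch, assuming the map was filled as in part II and using the hypothetical names AlphabeticalSketch and formatAlphabetically:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

class AlphabeticalSketch {
    // Builds "word: count" lines ordered alphabetically by word.
    static List<String> formatAlphabetically(Map<String, Integer> counts) {
        // Copy the keys into a list, because a Set cannot be sorted in place.
        List<String> keys = new ArrayList<>(counts.keySet());
        Collections.sort(keys);
        List<String> lines = new ArrayList<>();
        for (String key : keys) {
            lines.add(key + ": " + counts.get(key));
        }
        return lines;
    }
}
```

Printing is then a matter of looping over the returned list and calling `System.out.println` on each line.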

Part V: Printing out word frequency by number of occurrences (Menu option #3)

With your loaded HashMap from part II, you will print out the contents of the HashMap sorted by the word frequencies from greatest to least.

For example, in the sample input, "a" appears 6 times. "the" and "to" appear 5 times, and "is" appears 4 times. See the sample output for a better understanding.

6: a
5: the, to
4: is
3: spell, program, your, check
2: ...
1: ...


Hint: This one can be tricky. Consider creating a new HashMap. Instead of a HashMap that stores the count per word (this is what you built in part II), build a HashMap that stores the words per count. Each key will be an Integer, and its value will be a HashSet of the Strings that have that count.
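
The words-per-count idea from the hint could be sketched like this. The names FrequencySketch, invert, and formatByFrequency are hypothetical, and sorting the words within each count is an extra assumption made only so the output is predictable; the sample output does not appear to require it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class FrequencySketch {
    // Turns a word -> count map into a count -> words map.
    static Map<Integer, Set<String>> invert(Map<String, Integer> counts) {
        Map<Integer, Set<String>> wordsByCount = new HashMap<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            // Create the HashSet for this count the first time it is seen.
            wordsByCount.computeIfAbsent(entry.getValue(), k -> new HashSet<>())
                        .add(entry.getKey());
        }
        return wordsByCount;
    }

    // Builds "count: word, word" lines from greatest count to least.
    static List<String> formatByFrequency(Map<String, Integer> counts) {
        Map<Integer, Set<String>> wordsByCount = invert(counts);
        List<Integer> keys = new ArrayList<>(wordsByCount.keySet());
        keys.sort(Collections.reverseOrder());
        List<String> lines = new ArrayList<>();
        for (Integer count : keys) {
            List<String> words = new ArrayList<>(wordsByCount.get(count));
            Collections.sort(words); // assumption: alphabetical within a count
            lines.add(count + ": " + String.join(", ", words));
        }
        return lines;
    }
}
```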

Input files

Sample Output

This sample output shows the different menu options being executed. Your program does not need to match this exactly, but it should be similar.

Optional Hints

You can use the String method replace() to remove certain characters from a String.

You may need to convert a HashSet or HashMap to something that can be sorted. You can create an ArrayList from an existing Set.

You can use the HashMap method keySet() to get a Set of just the keys.

You may find the official HashMap and HashSet documentation to be useful.

Restrictions

You must use a HashSet to store the English words, and you must use a HashMap to keep track of word frequencies.

Grading (100 points)

Criteria Points
HashMap and HashSet are loaded correctly 20
(Menu option #1) Your program correctly identifies misspelled words, regardless of capitalization, and correctly prints them in the output 30
Punctuation and capitalization are maintained correctly during spellchecking 5
Menu option #2 correctly prints out the frequency of words, sorted alphabetically, and handles any punctuation and capitalization 20
Menu option #3 correctly prints out the frequency of words, sorted by the number of occurrences, and handles any punctuation and capitalization 20
Your program has a menu that repeatedly asks the user for their choice 5
NOTE: If your code does not compile, correctness cannot be verified, and you won’t receive any points for your code. Turn in code that compiles!





Program 2 solution