Program 3: Movie Actors Graph

Due Date and Submission Requirements

This assignment will take a good chunk of your time. This assignment will be miserable if you wait until the final day. Get started early and ask for help if you need it. This program is worth slightly more points than the other programs

Background and Directions

In this assignment, you will be creating a graph that captures relationships between actors across many movies. Consider actors Margot Robbie and Ryan Gosling. Both acted together in the Movie "Barbie". We can create a graph were the vertices are actors, and edges between actors represent they have acted together in a movie. So, we could construct a very basic graph using this information.

We could continue this process for many movies and actors, and create a graph that looks something like this.

Obviously, that graph is missing many actors and many edges. In reality, this graph would consist of millions of vertices and edges, therefore we will be working with a much smaller dataset.
With a graph like this. We can compute some interesting information about our actors and movies. After the graph is built, you will write some code that will compute interesting information about this graph and achieve similar functionality to The Oracle of Bacon

Part I: Creating the Graph

The first part is to create a Adjacency List (aka our Graph) from an input file that have movie and actor information. This will be an undirected graph, and the edges will all be weighted the same, but each edge will need to keep track of a Movie name. There are a few input files you can use, but I would recommend using actors.txt, which has information for about 30 actors and a handful of movies.
The input file will always follow the same format:

Actor|MovieName

You will parse this file (hint: you will need to use .split("\\|") to split on the "|" character) and create a HashMap<String, LinkedList<Edge>> , where the key is an actor, and the value is a linked list of Edge. Each edge object holds the two connecting actors, and the movie they acted together in.

Here is how one entry in your HashMap should look (see graph above as well). Your order might be different, and that is fine. It is important that for the adjacency list for Vertex1, that for all edges in the linked list, V1 is the first vertex in the edge.

Margot Robbie: [Margot Robbie - Ewan McGregor (Birds of Prey), Margot Robbie - Ryan Gosling (Barbie), Margot Robbie - Will Ferrell (Barbie), Margot Robbie - Brad Pitt (Once Upon a Time in Hollywood), Margot Robbie - Leonardo DiCaprio (Once Upon a Time in Hollywood)]

Margot Robbie - Ewan McGregor (Birds of Prey), Margot Robbie - Ryan Gosling (Barbie), Margot Robbie - Will Ferrell (Barbie), Margot Robbie - Brad Pitt (Once Upon a Time in Hollywood), Margot Robbie - Leonardo DiCaprio (Once Upon a Time in Hollywood)] are all Edge objects. Note that for each of those edges, Margot Robbie was the first vertex listed.

This is what my HashMap looked like before I wrote the toString() method in my Edge class.

This part can be tricky. I would recommend first making a HashMap that maps Strings (Movies) to a LinkedList of Strings (Actors that acted in that movie). This HashMap would look something like this (once again, the order might be different, and that is fine). Once you have that HashMap, you can traverse it and create Edge objects to add to your adjacency list.

By the end of part 1, you should have an adjacency list that looks similar to this output. Your program does not need to print this out when you are ready to submit.

Part II: Computing the Minimum Spanning Tree

Once you have you graph created, You will create menu-like functionality like you did for program 2 (see sample output). For menu option #1, you will compute the minimum spanning tree for the graph. You should use Kruskal's Algorithm to compute the MST of your graph. You should use the code from lecture, and apply it your program 3 code. You will need to make a few minor changes to make it work for your graph. The Connected Component Marker (CCM) array is no longer an array, it will need to be a HashMap. Additionally, our Kruskal's algorithm needed edge weights for the graph. You will need to add edge weights to your graph. You should assign an edge weight of 1 to every edge. Once you have the MST calculated, you should print out the edges of the MST.

Then, you should print all the unique movies from the edges in the MST. This represents the movies one would need to watch if they wanted to see all the actors in the dataset. This list of movies will not be the minimum number of movies to watch. Your list of movies might be slightly longer or shorter than the sample output, and that is fine. You might need to verify that your MST is correct, so it might be easier to test with the smaller input file first. Here is one example MST using the actors.txt file:



When you run menu option #1, it should look something like this

Part III: Computing a shortest path

For menu option #2, you will prompt the user for two actors. Your program should compute the shortest path between the two actors. It is possible this is not a direct connection, and might need to travel through several actors to get to the other actor. Then, you will print out the number of hops (ie number of edges in the path) in the shortest path.You must use Dijkstra's Algorithm to compute the shortest path, and you should be able to copy and paste the code we write in lecture. It is possible that there are several shortest paths with the same value, but you only need to print out one of them. You might need to verify your answer by looking at the graph pictured above.This method should function similar to the The Oracle of Bacon There will be some small changes you need to make to make the algorithm work for your graph. The distance and previous array will need to be a HashMap.

You can assume that the graph will always be connected, but it is possible the user could enter an actor that does not exist in your graph. In this case, there obviously will not be a path to that actor, so "No path exists" should be printed out.

Part IV: Finding longest path in MST

For menu option 3, you will compute the longest path in the minimum spanning tree. You will likely need to convert your MST HashSet to be an adjaceny list of HashMap>. You must use depth first search or breadth first search for this. You should use the aproach the was discussed during lecture on April 1st. It should print out the Actors of the longest path, and the number of hops for the longest path. Note that movies are not used at all for this part, so you can totally ignore them. Your longest path will depend on the MST that you created. It is possible for your longest to be shorter or longer than what is seen in the sample output. You should verify you answer and make sure it makes sense for your MST.

Coding Style and Readability

I want to ensure that you are writing quality, readable code, so 10% of your grade will come from code readability and coding style. These are the things that the TAs and I will be checking for:

Input files

Sample Output

Optional Hints

Grading (100 points)

When grading, the TAs will use actors.txt
Criteria Points
Your program has menu that loops and asks for user input 5
An undirected graph (adjacency list) is properly created from the input file 20
Menu option #1 computes a valid, optimal minimum spanning tree, and the unique movies of the MST 20
Menu option #2 computes a valid, optimal (shortest) path from one actor to another 20
Menu option #2 identifies that there is not a path if the user provides an actor that does not exists 5
Menu option #3 Finds the longest path in the MST correctly 20
Your code follows good style and readability practices 10


NOTE: If your code does not compile, correctness cannot be verified, and you won’t receive any points for your code. Turn in code that compiles!





Program 3 solution