Top 10 Data Mining Algorithms
4. Apriori Algorithm
- Finds frequent itemsets in a transaction dataset and derives
association rules from them.
- Figure 3 shows the Apriori algorithm.
- Example
- Suffers from a number of inefficiencies, such as repeated passes
over the dataset and large sets of candidate itemsets.
- Improvements have been made. To learn more, take the data mining
course (CS 530) next fall.
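The level-wise search can be sketched in Python as follows. This is a minimal illustration, not a reproduction of Figure 3: the function names apriori and association_rules, the use of an absolute count for min_support, and the grocery transactions in the usage note are all assumptions made for the example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for every frequent itemset.

    min_support is an absolute count here (an assumption of this sketch).
    """
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count individual items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        candidates = set()
        # Join step: merge pairs of frequent (k-1)-itemsets into k-itemsets.
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) != k:
                continue
            # Prune step: every (k-1)-subset of a candidate must be frequent.
            if all(frozenset(s) in current for s in combinations(union, k - 1)):
                candidates.add(union)
        # Count the support of each surviving candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
        k += 1
    return frequent


def association_rules(frequent, min_confidence):
    """Return (lhs, rhs, confidence) triples derived from frequent itemsets."""
    rules = []
    for itemset, count in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(itemset, r):
                lhs = frozenset(lhs)
                # confidence(lhs -> rhs) = support(itemset) / support(lhs)
                confidence = count / frequent[lhs]
                if confidence >= min_confidence:
                    rules.append((lhs, itemset - lhs, confidence))
    return rules
```

For instance, calling apriori on a handful of grocery baskets with min_support=3 and then association_rules on the result with min_confidence=1.0 returns only the rules that hold in every basket containing the left-hand side. The prune step is what separates Apriori from brute-force enumeration: a candidate is discarded without counting as soon as one of its subsets is known to be infrequent.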
5. EM (Expectation Maximization) Algorithm
6. PageRank
- Published by Sergey Brin and Larry Page, the founders of Google, in 1998.
- PageRank is a search ranking algorithm using hyperlinks on the Web. It
is the basis for the Google search engine.
- It is a static (query-independent) method, so it can be computed offline.
- A hyperlink from page x to page y casts a vote. The value of
the vote is based on the "importance" of page x.
- Terminology: in-links of page i (hyperlinks pointing to i from other
pages), out-links of page i (hyperlinks from i to other pages).
- Equation 12 shows the general idea.
- Equation 17 shows the original idea in its full form. d is
the damping factor, which was set to 0.85.
- Figure 4 shows the power iteration method used to compute
PageRank values.
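A minimal power-iteration sketch in Python is given below. It does not reproduce Figure 4 or Equation 17. It assumes the commonly used normalized update, P(i) = (1 - d)/n + d * (sum over in-links j of P(j) / O_j), where O_j is the number of out-links of page j, so that the scores sum to 1. The function name pagerank, the out_links dictionary, and the tol and max_iter parameters are hypothetical. The sketch also assumes that every page appears as a key and has at least one out-link.

```python
def pagerank(out_links, d=0.85, tol=1e-8, max_iter=200):
    """Power iteration over a graph given as {page: [pages it links to]}.

    Assumes every page is a key of out_links and has at least one out-link.
    """
    pages = list(out_links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from the uniform distribution

    for _ in range(max_iter):
        # Every page receives the baseline (1 - d) / n ...
        new_rank = {p: (1.0 - d) / n for p in pages}
        # ... plus a share of the rank of each page that links to it.
        for page, targets in out_links.items():
            share = d * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        # Stop once the L1 change between iterations is small enough.
        delta = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if delta < tol:
            break
    return rank
```

On a three-page graph such as {"a": ["b", "c"], "b": ["c"], "c": ["a"]}, page c ends up with the highest score because it receives votes from both a and b, while b ranks lowest because its only in-link carries half of a's vote.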
7. AdaBoost
- Introduced by Freund and Schapire in 1995. They received the
2003 Gödel Prize.
- Ensemble learning method.
- Solid theoretical foundation.
- Accurate.
- Simple.
- Figure 5 shows the AdaBoost algorithm for a two-class problem.
- AdaBoost example
- AdaBoost has been extended to multi-class learning problems.
- A weak learning algorithm that performs just slightly better
than random guessing can be boosted into an arbitrarily
accurate strong learning algorithm.
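The boosting loop can be illustrated with a short Python sketch for the two-class case with labels +1 and -1. It is not a transcription of Figure 5: the choice of one-feature threshold stumps as the weak learner, the function names train_stump, stump_predict, adaboost and predict, and the rounds parameter are all assumptions made for the example.

```python
import math


def train_stump(X, y, w):
    """Return (weighted error, feature, threshold, polarity) of the best stump."""
    best = None
    for f in range(len(X[0])):
        for thr in sorted({x[f] for x in X}):
            for polarity in (1, -1):
                # The stump predicts `polarity` when x[f] <= thr, else -polarity.
                preds = [polarity if x[f] <= thr else -polarity for x in X]
                err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, f, thr, polarity)
    return best


def stump_predict(stump, x):
    _, f, thr, polarity = stump
    return polarity if x[f] <= thr else -polarity


def adaboost(X, y, rounds=10):
    """Return a list of (alpha, stump) pairs; labels in y must be +1 or -1."""
    n = len(X)
    w = [1.0 / n] * n  # start with uniform example weights
    ensemble = []
    for _ in range(rounds):
        stump = train_stump(X, y, w)
        err = stump[0]
        if err >= 0.5:
            break  # the weak learner is no better than random guessing
        err = max(err, 1e-10)  # avoid division by zero for a perfect stump
        alpha = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((alpha, stump))
        # Raise the weight of misclassified examples, lower the rest.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi))
             for wi, xi, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, x):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1
```

A one-dimensional dataset whose labels switch sign three times along the axis cannot be separated by any single threshold stump, yet a weighted vote of a few stumps can fit it. This is the sense in which several weak classifiers are combined into a stronger one.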