Class PageRank
- java.lang.Object
  - org.apache.flink.examples.java.graph.PageRank
public class PageRank extends Object
A basic implementation of the Page Rank algorithm using a bulk iteration.
This implementation requires a set of pages and a set of directed links as input and works as follows.
In each iteration, the rank of every page is evenly distributed to all pages it points to. Each page collects the partial ranks of all pages that point to it, sums them up, and applies a dampening factor to the sum. The result is the new rank of the page. A new iteration is started with the new ranks of all pages. This implementation terminates after a fixed number of iterations.
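The iteration described above can be sketched in plain Java, without any Flink APIs. This is a minimal illustration rather than the class's actual code: the name `PageRankSketch`, the 0-based page indices, the uniform initial rank, the damping value of 0.85, and the update rule `d * sum + (1 - d) / numPages` are all assumptions based on the common formulation of the algorithm.

```java
import java.util.Arrays;

class PageRankSketch {
    // Assumed damping value (0.85 is the commonly cited choice); not stated in the text above.
    static final double DAMPENING = 0.85;

    /**
     * Runs a fixed number of iterations.
     * Each entry of links is {source, target}, using 0-based page indices (hypothetical layout).
     * Every page is expected to have at least one outgoing link.
     */
    static double[] run(int numPages, int[][] links, int iterations) {
        // Count outgoing links so a page's rank can be split evenly among its targets.
        int[] outDegree = new int[numPages];
        for (int[] link : links) {
            outDegree[link[0]]++;
        }

        // Assumed initial rank: uniform over all pages.
        double[] ranks = new double[numPages];
        Arrays.fill(ranks, 1.0 / numPages);

        for (int it = 0; it < iterations; it++) {
            // Each page collects the partial ranks of all pages that point to it.
            double[] sums = new double[numPages];
            for (int[] link : links) {
                sums[link[1]] += ranks[link[0]] / outDegree[link[0]];
            }
            // Apply the dampening factor to the sum to obtain the new rank.
            double[] next = new double[numPages];
            for (int p = 0; p < numPages; p++) {
                next[p] = DAMPENING * sums[p] + (1.0 - DAMPENING) / numPages;
            }
            ranks = next;
        }
        return ranks;
    }

    public static void main(String[] args) {
        // Small example graph in which every page has an incoming and an outgoing link.
        int[][] links = {{0, 1}, {1, 2}, {2, 0}, {0, 2}};
        System.out.println(Arrays.toString(run(3, links, 10)));
    }
}
```

Under these assumptions the ranks keep summing to 1 across iterations, as long as every page has at least one outgoing link.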
See the Wikipedia entry for the Page Rank algorithm for background.
Input files are plain text files and must be formatted as follows:
- Pages are represented as a (long) ID and are separated by new-line characters.
  For example, "1\n2\n12\n42\n63" gives five pages with IDs 1, 2, 12, 42, and 63.
- Links are represented as pairs of page IDs which are separated by space characters. Links are separated by new-line characters.
  For example, "1 2\n2 12\n1 12\n42 63" gives four (directed) links (1)->(2), (2)->(12), (1)->(12), and (42)->(63).
For this simple implementation it is required that each page has at least one incoming and one outgoing link (a page can point to itself).
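As a rough illustration of the input format and the link requirement, the following plain-Java sketch parses both kinds of input and reports pages that lack an incoming or an outgoing link. The class `PageRankInput` and its method names are hypothetical helpers, not part of Flink.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class PageRankInput {
    /** Parses page IDs: one long per line. */
    static List<Long> parsePages(String text) {
        List<Long> pages = new ArrayList<>();
        for (String line : text.split("\n")) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                pages.add(Long.parseLong(trimmed));
            }
        }
        return pages;
    }

    /** Parses links: one "source target" pair per line, separated by spaces. */
    static List<long[]> parseLinks(String text) {
        List<long[]> links = new ArrayList<>();
        for (String line : text.split("\n")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            String[] parts = trimmed.split(" +");
            links.add(new long[] {Long.parseLong(parts[0]), Long.parseLong(parts[1])});
        }
        return links;
    }

    /** Returns the pages that have no incoming link or no outgoing link. */
    static Set<Long> pagesMissingLinks(List<Long> pages, List<long[]> links) {
        Set<Long> sources = new HashSet<>();
        Set<Long> targets = new HashSet<>();
        for (long[] link : links) {
            sources.add(link[0]);
            targets.add(link[1]);
        }
        Set<Long> missing = new TreeSet<>();
        for (Long page : pages) {
            if (!sources.contains(page) || !targets.contains(page)) {
                missing.add(page);
            }
        }
        return missing;
    }
}
```

Applied to the sample strings above, this check would report pages 1, 12, 42, and 63, which suggests those strings illustrate the file format only and are not a complete valid input graph.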
Usage:
PageRankBasic --pages <path> --links <path> --output <path> --numPages <n> --iterations <n>
If no parameters are provided, the program is run with default data from PageRankData and 10 iterations.
This example shows how to use:
- Bulk Iterations
- Default Join
- User-defined functions configured through constructor parameters
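To illustrate the last point, here is a minimal plain-Java sketch of a function object that receives its configuration through the constructor. `DampenerSketch` is a hypothetical stand-in, not Flink's `PageRank.Dampener`, and the formula it applies is the commonly used one, assumed here for illustration.

```java
import java.util.function.DoubleUnaryOperator;

class DampenerSketch implements DoubleUnaryOperator {
    private final double dampening;
    private final double randomJump;

    // Configuration is fixed once at construction time, so the function object
    // carries everything it needs when it is later applied to each value.
    DampenerSketch(double dampening, long numPages) {
        this.dampening = dampening;
        this.randomJump = (1.0 - dampening) / numPages;
    }

    @Override
    public double applyAsDouble(double summedRank) {
        // Assumed formula: dampened sum plus an equal share of the remaining rank mass.
        return summedRank * dampening + randomJump;
    }
}
```

A caller would construct it once, for example `new DampenerSketch(0.85, numPages)`, and then apply it to every summed rank.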
Note: All Flink DataSet APIs are deprecated since Flink 1.18 and will be removed in a future Flink major version. You can still build your application in DataSet, but you should move to the DataStream API and/or the Table API. This class is retained for testing purposes.
Nested Class Summary
Nested Classes (modifier and type, class, description)
- static class PageRank.BuildOutgoingEdgeList: A reduce function that takes a sequence of edges and builds the adjacency list for the vertex where the edges originate.
- static class PageRank.Dampener: The function that applies the page rank dampening formula.
- static class PageRank.EpsilonFilter: Filter that filters vertices where the rank difference is below a threshold.
- static class PageRank.JoinVertexWithEdgesMatch: Join function that distributes a fraction of a vertex's rank to all neighbors.
- static class PageRank.RankAssigner: A map function that assigns an initial rank to all pages.
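The roles of three of these functions can be sketched with plain Java collections. This is a hedged illustration only: the class `NestedFunctionSketches` and its method names are hypothetical, the signatures do not match Flink's function interfaces, and the reading of the epsilon filter (keep a vertex only if its rank changed by more than the threshold) is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class NestedFunctionSketches {
    /** Groups edges {source, target} by source, giving one adjacency list per vertex. */
    static Map<Long, List<Long>> buildOutgoingEdgeLists(long[][] edges) {
        Map<Long, List<Long>> adjacency = new TreeMap<>();
        for (long[] edge : edges) {
            adjacency.computeIfAbsent(edge[0], k -> new ArrayList<>()).add(edge[1]);
        }
        return adjacency;
    }

    /** Splits a vertex's rank evenly among its neighbors and returns the partial ranks. */
    static Map<Long, Double> distributeRank(double rank, List<Long> neighbors) {
        Map<Long, Double> partialRanks = new TreeMap<>();
        double share = rank / neighbors.size();
        for (Long neighbor : neighbors) {
            // merge() accumulates shares if the same neighbor appears more than once.
            partialRanks.merge(neighbor, share, Double::sum);
        }
        return partialRanks;
    }

    /** Assumed filter semantics: true only if the rank moved by more than epsilon. */
    static boolean rankChanged(double oldRank, double newRank, double epsilon) {
        return Math.abs(oldRank - newRank) > epsilon;
    }
}
```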
Constructor Summary
Constructors
- PageRank()