Class ConnectedComponents
- java.lang.Object
-
- org.apache.flink.examples.java.graph.ConnectedComponents
-
public class ConnectedComponents extends Object
An implementation of the connected components algorithm, using a delta iteration.Initially, the algorithm assigns each vertex an unique ID. In each step, a vertex picks the minimum of its own ID and its neighbors' IDs, as its new ID and tells its neighbors about its new ID. After the algorithm has completed, all vertices in the same component will have the same ID.
A vertex whose component ID did not change needs not propagate its information in the next step. Because of that, the algorithm is easily expressible via a delta iteration. We here model the solution set as the vertices with their current component ids, and the workset as the changed vertices. Because we see all vertices initially as changed, the initial workset and the initial solution set are identical. Also, the delta to the solution set is consequently also the next workset.
Input files are plain text files and must be formatted as follows:
- Vertices represented as IDs and separated by new-line characters.
For example"1\n2\n12\n42\n63"gives five vertices (1), (2), (12), (42), and (63). - Edges are represented as pairs for vertex IDs which are separated by space characters.
Edges are separated by new-line characters.
For example"1 2\n2 12\n1 12\n42 63"gives four (undirected) edges (1)-(2), (2)-(12), (1)-(12), and (42)-(63).
Usage:
ConnectedComponents --vertices <path> --edges <path> --output <path> --iterations <n>
If no parameters are provided, the program is run with default data fromConnectedComponentsDataand 10 iterations.This example shows how to use:
- Delta Iterations
- Generic-typed Functions
Note: All Flink DataSet APIs are deprecated since Flink 1.18 and will be removed in a future Flink major version. You can still build your application in DataSet, but you should move to either the DataStream and/or Table API. This class is retained for testing purposes.
- Vertices represented as IDs and separated by new-line characters.
-
-
Nested Class Summary
Nested Classes Modifier and Type Class Description static classConnectedComponents.ComponentIdFilterEmit the candidate (Vertex-ID, Component-ID) pair if and only if the candidate component ID is less than the vertex's current component ID.static classConnectedComponents.DuplicateValue<T>Function that turns a value into a 2-tuple where both fields are that value.static classConnectedComponents.NeighborWithComponentIDJoinUDF that joins a (Vertex-ID, Component-ID) pair that represents the current component that a vertex is associated with, with a (Source-Vertex-ID, Target-VertexID) edge.static classConnectedComponents.UndirectEdgeUndirected edges by emitting for each input edge the input edges itself and an inverted version.
-
Constructor Summary
Constructors Constructor Description ConnectedComponents()
-