Class WebLogAnalysis
- java.lang.Object
  - org.apache.flink.examples.java.relational.WebLogAnalysis
public class WebLogAnalysis extends Object
This program processes web logs and relational data. It implements the following relational query:

    SELECT
        r.pageURL,
        r.pageRank,
        r.avgDuration
    FROM documents d JOIN rankings r
        ON d.url = r.url
    WHERE CONTAINS(d.text, [keywords])
        AND r.rank > [rank]
        AND NOT EXISTS (
            SELECT * FROM Visits v
            WHERE v.destUrl = d.url
                AND v.visitDate < [date]
        );

Input files are plain text CSV files using the pipe character ('|') as field separator. The tables referenced in the query can be generated using the WebLogDataGenerator and have the following schemas:

    CREATE TABLE Documents (
        url VARCHAR(100) PRIMARY KEY,
        contents TEXT );

    CREATE TABLE Rankings (
        pageRank INT,
        pageURL VARCHAR(100) PRIMARY KEY,
        avgDuration INT );

    CREATE TABLE Visits (
        sourceIP VARCHAR(16),
        destURL VARCHAR(100),
        visitDate DATE,
        adRevenue FLOAT,
        userAgent VARCHAR(64),
        countryCode VARCHAR(3),
        languageCode VARCHAR(6),
        searchWord VARCHAR(32),
        duration INT );

Usage:
WebLogAnalysis --documents <path> --ranks <path> --visits <path> --result <path>
If no parameters are provided, the program is run with default data from WebLogData.

This example shows how to use:
- tuple data types
- projection and join projection
- the CoGroup transformation for an anti-join
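The anti-join mentioned in the list above corresponds to the NOT EXISTS clause of the query: a left-side record is kept only if no right-side record shares its key. The following is a minimal plain-Java sketch of that idea, written against in-memory lists rather than the Flink DataSet API. The class name `AntiJoinSketch`, the method `antiJoin`, and the sample URLs are hypothetical and are not part of the Flink example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical illustration only; this is not the Flink CoGroup API.
class AntiJoinSketch {

    /**
     * Groups both inputs by key (as a co-group would) and emits the
     * left-side records of every key that has no right-side records.
     */
    static <L, R, K> List<L> antiJoin(List<L> left, Function<L, K> leftKey,
                                      List<R> right, Function<R, K> rightKey) {
        // Build one group per key for each input, preserving first-seen order.
        Map<K, List<L>> leftGroups = new LinkedHashMap<>();
        for (L l : left) {
            leftGroups.computeIfAbsent(leftKey.apply(l), k -> new ArrayList<>()).add(l);
        }
        Map<K, List<R>> rightGroups = new LinkedHashMap<>();
        for (R r : right) {
            rightGroups.computeIfAbsent(rightKey.apply(r), k -> new ArrayList<>()).add(r);
        }

        // Keep a left group only when the matching right group is absent.
        List<L> result = new ArrayList<>();
        for (Map.Entry<K, List<L>> entry : leftGroups.entrySet()) {
            if (!rightGroups.containsKey(entry.getKey())) {
                result.addAll(entry.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample data: candidate page URLs and URLs that were visited.
        List<String> candidateUrls = List.of("url_a", "url_b", "url_c");
        List<String> visitedUrls = List.of("url_b");

        List<String> unvisited =
                antiJoin(candidateUrls, url -> url, visitedUrls, url -> url);
        System.out.println(unvisited);
    }
}
```

In the query, the right side would first be restricted to the visits that satisfy the date condition, so only those visits can exclude a page.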
Note: All Flink DataSet APIs are deprecated since Flink 1.18 and will be removed in a future Flink major version. You can still build your application with the DataSet API, but you should move to the DataStream API, the Table API, or both. This class is retained for testing purposes.
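To make the pipe-separated input format and the rank condition of the query more concrete, here is a small plain-Java sketch that parses Rankings lines and keeps the rows whose rank exceeds a threshold. It assumes that the fields appear in the order given by the CREATE TABLE statement (pageRank, pageURL, avgDuration). The class `RankingsInputSketch`, its methods, and the sample lines are hypothetical and do not come from the Flink example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; names and sample data are assumptions.
class RankingsInputSketch {

    /** One Rankings row; field order assumed from the CREATE TABLE statement. */
    static final class Ranking {
        final int pageRank;
        final String pageURL;
        final int avgDuration;

        Ranking(int pageRank, String pageURL, int avgDuration) {
            this.pageRank = pageRank;
            this.pageURL = pageURL;
            this.avgDuration = avgDuration;
        }
    }

    /** Parses a single line such as "85|url_b|12". */
    static Ranking parseLine(String line) {
        // '|' is a regex metacharacter, so it has to be escaped for String.split.
        String[] fields = line.split("\\|");
        if (fields.length < 3) {
            throw new IllegalArgumentException("Expected 3 fields but got: " + line);
        }
        return new Ranking(
                Integer.parseInt(fields[0].trim()),
                fields[1].trim(),
                Integer.parseInt(fields[2].trim()));
    }

    /** Keeps only the rows whose pageRank is strictly greater than the threshold. */
    static List<Ranking> ranksAbove(List<String> lines, int threshold) {
        List<Ranking> kept = new ArrayList<>();
        for (String line : lines) {
            Ranking ranking = parseLine(line);
            if (ranking.pageRank > threshold) {
                kept.add(ranking);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Made-up sample lines in the assumed field order.
        List<String> lines = List.of("30|url_a|43", "85|url_b|12");
        for (Ranking ranking : ranksAbove(lines, 40)) {
            System.out.println(ranking.pageURL + " " + ranking.pageRank + " " + ranking.avgDuration);
        }
    }
}
```

Because `String.split` discards trailing empty strings, a line that ends with a field separator parses the same way as one that does not.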
Nested Class Summary
Nested Classes (Modifier and Type, Class: Description)
- static class WebLogAnalysis.AntiJoinVisits: CoGroupFunction that realizes an anti-join.
- static class WebLogAnalysis.FilterByRank: MapFunction that filters for records where the rank exceeds a certain threshold.
- static class WebLogAnalysis.FilterDocByKeyWords: MapFunction that filters for documents that contain a certain set of keywords.
- static class WebLogAnalysis.FilterVisitsByDate: MapFunction that filters for records of the visits relation where the year (from the date string) is equal to a certain value.
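The keyword and date filters described above can be pictured as simple predicates. The sketch below is a plain-Java approximation and not the code of the nested classes: it assumes that a document matches only if it contains every keyword, and that the visit date string starts with a four-digit year (for example "2007-03-12"). The class `FilterSketch`, its method names, and the sample values are hypothetical.

```java
// Hypothetical illustration only; the actual nested classes may differ.
class FilterSketch {

    /** Returns true if the text contains every one of the given keywords. */
    static boolean containsAllKeywords(String text, String[] keywords) {
        for (String keyword : keywords) {
            if (!text.contains(keyword)) {
                return false;
            }
        }
        return true;
    }

    /**
     * Returns true if the year of the visit date equals the given year.
     * Assumes the date string begins with a four-digit year.
     */
    static boolean visitYearEquals(String visitDate, int year) {
        return Integer.parseInt(visitDate.substring(0, 4)) == year;
    }

    public static void main(String[] args) {
        // Made-up sample values.
        String[] keywords = {"editors", "oscillations"};
        System.out.println(containsAllKeywords("the editors saw oscillations", keywords));
        System.out.println(visitYearEquals("2007-03-12", 2007));
    }
}
```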
Constructor Summary
Constructors (Constructor: Description)
- WebLogAnalysis()