Class TPCHQuery3


  • public class TPCHQuery3
    extends Object
    This program implements a modified version of the TPC-H query 3. The example demonstrates how to assign names to fields by extending the Tuple class. The original query can be found at http://www.tpc.org/tpch/spec/tpch2.16.0.pdf (page 29).
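The idea of naming tuple fields by subclassing can be sketched in plain Java. The sketch assumes a hypothetical stand-in `Tuple4` class with public positional fields `f0`..`f3`, modeled loosely on Flink's `org.apache.flink.api.java.tuple.Tuple4`, so that it compiles without Flink on the classpath. The subclass `Lineitem` and its getter names are illustrative, not necessarily what this example class defines.

```java
public class NamedTupleSketch {

    // Hypothetical stand-in for a generic tuple: fields are reachable only by position.
    public static class Tuple4<T0, T1, T2, T3> {
        public T0 f0;
        public T1 f1;
        public T2 f2;
        public T3 f3;

        // No-argument constructor (fields stay null until set).
        public Tuple4() {}

        public Tuple4(T0 f0, T1 f1, T2 f2, T3 f3) {
            this.f0 = f0;
            this.f1 = f1;
            this.f2 = f2;
            this.f3 = f3;
        }
    }

    // Subclass that gives the positional fields domain names:
    // f0 = l_orderkey, f1 = l_extendedprice, f2 = l_discount, f3 = l_shipdate.
    public static class Lineitem extends Tuple4<Long, Double, Double, String> {
        public Lineitem() {}

        public Lineitem(long orderKey, double extendedPrice, double discount, String shipDate) {
            super(orderKey, extendedPrice, discount, shipDate);
        }

        public Long getOrderkey() { return f0; }
        public Double getExtendedprice() { return f1; }
        public Double getDiscount() { return f2; }
        public String getShipdate() { return f3; }
    }

    public static void main(String[] args) {
        Lineitem li = new Lineitem(1L, 1000.0, 0.05, "1995-03-20");
        // The named getters read the same storage as the positional fields.
        System.out.println(li.getOrderkey() + " " + li.getDiscount() + " " + li.f2);
    }
}
```

Because the subclass is still a tuple, code that works positionally keeps working, while user code can call descriptive getters instead of `f0`..`f3`.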

    This program implements the equivalent of the following SQL query:

    
     SELECT
          l_orderkey,
          SUM(l_extendedprice*(1-l_discount)) AS revenue,
          o_orderdate,
          o_shippriority
     FROM customer,
          orders,
          lineitem
     WHERE
          c_mktsegment = '[SEGMENT]'
          AND c_custkey = o_custkey
          AND l_orderkey = o_orderkey
          AND o_orderdate < date '[DATE]'
          AND l_shipdate > date '[DATE]'
     GROUP BY
          l_orderkey,
          o_orderdate,
          o_shippriority;
     

    Compared to the original TPC-H query, this version does not sort the result by revenue and orderdate.
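To make the SQL semantics concrete, the following plain-Java sketch applies the same filters, joins, and aggregation to small in-memory lists. It is not the Flink DataSet implementation of this class. The record types, the method name `run`, and the representation of dates as ISO `yyyy-MM-dd` strings (which compare correctly as plain strings) are assumptions made for illustration. Like this version of the query, it returns an unsorted result.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

public class Query3Sketch {

    // Hypothetical minimal row types holding only the columns the query reads.
    public record Customer(long custKey, String mktSegment) {}
    public record Order(long orderKey, long custKey, String orderDate, int shipPriority) {}
    public record LineItem(long orderKey, double extendedPrice, double discount, String shipDate) {}

    // Grouping key: (l_orderkey, o_orderdate, o_shippriority).
    public record GroupKey(long orderKey, String orderDate, int shipPriority) {}

    public static Map<GroupKey, Double> run(List<Customer> customers, List<Order> orders,
                                            List<LineItem> lineItems, String segment, String date) {
        // c_mktsegment = segment
        Set<Long> custKeys = customers.stream()
                .filter(c -> c.mktSegment().equals(segment))
                .map(Customer::custKey)
                .collect(Collectors.toSet());

        // c_custkey = o_custkey AND o_orderdate < date (o_orderkey is unique per order)
        Map<Long, Order> ordersByKey = orders.stream()
                .filter(o -> o.orderDate().compareTo(date) < 0)
                .filter(o -> custKeys.contains(o.custKey()))
                .collect(Collectors.toMap(Order::orderKey, o -> o));

        // l_orderkey = o_orderkey AND l_shipdate > date, then
        // SUM(l_extendedprice * (1 - l_discount)) per group
        Map<GroupKey, Double> revenue = new HashMap<>();
        for (LineItem l : lineItems) {
            Order o = ordersByKey.get(l.orderKey());
            if (o == null || l.shipDate().compareTo(date) <= 0) {
                continue;
            }
            GroupKey key = new GroupKey(o.orderKey(), o.orderDate(), o.shipPriority());
            revenue.merge(key, l.extendedPrice() * (1 - l.discount()), Double::sum);
        }
        return revenue; // unsorted, like the modified query
    }

    public static void main(String[] args) {
        List<Customer> customers = List.of(
                new Customer(1, "AUTOMOBILE"), new Customer(2, "BUILDING"));
        List<Order> orders = List.of(
                new Order(10, 1, "1995-03-10", 0),
                new Order(11, 2, "1995-03-01", 0),
                new Order(12, 1, "1995-03-20", 0));
        List<LineItem> lineItems = List.of(
                new LineItem(10, 100.0, 0.25, "1995-03-16"),
                new LineItem(10, 200.0, 0.0, "1995-03-20"),
                new LineItem(10, 50.0, 0.0, "1995-03-14"),
                new LineItem(11, 300.0, 0.0, "1995-03-20"),
                new LineItem(12, 400.0, 0.0, "1995-03-25"));
        System.out.println(run(customers, orders, lineItems, "AUTOMOBILE", "1995-03-15"));
    }
}
```

With the sample data above, only order 10 passes the segment and order-date filters, and only its first two line items ship after the cutoff, so its revenue is 100.0 × 0.75 + 200.0 = 275.0. Filtering customers and orders first keeps the lookup tables small before the lineitem input, the largest of the three, is scanned once.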

    Input files are plain-text CSV files that use the pipe character ('|') as the field separator, as generated by the TPC-H data generator (available at http://www.tpc.org/tpch/).
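A minimal sketch of reading one pipe-separated lineitem line in plain Java. The column positions (0 = L_ORDERKEY, 5 = L_EXTENDEDPRICE, 6 = L_DISCOUNT, 10 = L_SHIPDATE) follow the lineitem layout in the TPC-H specification. The record `LineitemRow`, the method `parseLineitem`, and the values in the sample line are hypothetical, chosen only to match that layout.

```java
public class PipeParsingSketch {

    // Hypothetical row type with only the lineitem columns the query uses.
    public record LineitemRow(long orderKey, double extendedPrice, double discount, String shipDate) {}

    public static LineitemRow parseLineitem(String line) {
        // String.split takes a regex, so the pipe must be escaped. Trailing empty
        // strings (e.g. from a '|' at the end of the line) are dropped by split.
        String[] f = line.split("\\|");
        return new LineitemRow(
                Long.parseLong(f[0]),       // L_ORDERKEY
                Double.parseDouble(f[5]),   // L_EXTENDEDPRICE
                Double.parseDouble(f[6]),   // L_DISCOUNT
                f[10]);                     // L_SHIPDATE (yyyy-MM-dd)
    }

    public static void main(String[] args) {
        // Illustrative line in the lineitem column order, with a trailing separator.
        String line = "1|100|10|1|17|21168.23|0.04|0.02|N|O|1996-03-13|1996-02-12|1996-03-22"
                + "|DELIVER IN PERSON|TRUCK|illustrative comment|";
        System.out.println(parseLineitem(line));
    }
}
```

Keeping only the needed columns mirrors the projection the SQL implies: the query never reads the remaining lineitem fields.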

    Usage: TPCHQuery3 --lineitem <path> --customer <path> --orders <path> --output <path>
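Flink's bundled examples commonly read such options with Flink's `ParameterTool`; the following dependency-free sketch only illustrates the `--key <value>` convention from the usage line and fails fast when one of the four options is missing. The class name `UsageArgsSketch` and its error messages are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class UsageArgsSketch {

    // The four options named in the usage line.
    static final List<String> REQUIRED = List.of("lineitem", "customer", "orders", "output");

    // Parses "--key value" pairs; rejects malformed input and missing required keys.
    public static Map<String, String> parse(String[] args) {
        Map<String, String> opts = new HashMap<>();
        for (int i = 0; i < args.length; i++) {
            String a = args[i];
            if (!a.startsWith("--") || i + 1 >= args.length) {
                throw new IllegalArgumentException("Expected --key <value> at: " + a);
            }
            opts.put(a.substring(2), args[++i]); // consume the value as well
        }
        for (String key : REQUIRED) {
            if (!opts.containsKey(key)) {
                throw new IllegalArgumentException("Missing --" + key + ". Usage: TPCHQuery3"
                        + " --lineitem <path> --customer <path> --orders <path> --output <path>");
            }
        }
        return opts;
    }

    public static void main(String[] args) {
        Map<String, String> opts = parse(args);
        System.out.println("lineitem input: " + opts.get("lineitem"));
    }
}
```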

    This example shows how to use:

    • custom data types derived from tuple data types
    • inline-defined functions
    • built-in aggregation functions

    Note: All Flink DataSet APIs have been deprecated since Flink 1.18 and will be removed in a future Flink major version. You can still build your application with the DataSet API, but you should migrate to the DataStream API and/or the Table API. This class is retained for testing purposes.

    • Constructor Detail

      • TPCHQuery3

        public TPCHQuery3()