Class TPCHQuery10


  • public class TPCHQuery10
    extends Object
    This program implements a modified version of TPC-H query 10. The original query can be found at http://www.tpc.org/tpch/spec/tpch2.16.0.pdf (page 45).

    This program implements the following SQL equivalent:

    
     SELECT
            c_custkey,
            c_name,
            c_address,
            n_name,
            c_acctbal,
            SUM(l_extendedprice * (1 - l_discount)) AS revenue
     FROM
            customer,
            orders,
            lineitem,
            nation
     WHERE
            c_custkey = o_custkey
            AND l_orderkey = o_orderkey
            AND YEAR(o_orderdate) > '1990'
            AND l_returnflag = 'R'
            AND c_nationkey = n_nationkey
     GROUP BY
            c_custkey,
            c_name,
            c_acctbal,
            n_name,
            c_address
     

    Compared to the original TPC-H query, this version does not print c_phone and c_comment, filters on order years greater than 1990 instead of a three-month period, and does not sort the result by revenue.
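
    The core of the query (filter orders by year, keep returned line items, sum the discounted price per customer) can be sketched in plain Java without any Flink dependency. This is an illustrative sketch only, not the implementation used by this class: the names Query10Sketch, Order, LineItem, and revenuePerCustomer are hypothetical, order dates are assumed to be formatted as yyyy-MM-dd, and the joins against customer and nation are omitted for brevity.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class Query10Sketch {

    /** Hypothetical holder for the three order columns the query needs. */
    public static final class Order {
        final long orderKey;
        final long custKey;
        final String orderDate; // assumed format: yyyy-MM-dd

        public Order(long orderKey, long custKey, String orderDate) {
            this.orderKey = orderKey;
            this.custKey = custKey;
            this.orderDate = orderDate;
        }
    }

    /** Hypothetical holder for the four lineitem columns the query needs. */
    public static final class LineItem {
        final long orderKey;
        final double extendedPrice;
        final double discount;
        final String returnFlag;

        public LineItem(long orderKey, double extendedPrice, double discount, String returnFlag) {
            this.orderKey = orderKey;
            this.extendedPrice = extendedPrice;
            this.discount = discount;
            this.returnFlag = returnFlag;
        }
    }

    /** Sums l_extendedprice * (1 - l_discount) per customer key. */
    public static Map<Long, Double> revenuePerCustomer(List<Order> orders, List<LineItem> items) {
        // YEAR(o_orderdate) > 1990: keep only qualifying orders, indexed by order key.
        Map<Long, Long> custKeyByOrderKey = new HashMap<>();
        for (Order o : orders) {
            int year = Integer.parseInt(o.orderDate.substring(0, 4));
            if (year > 1990) {
                custKeyByOrderKey.put(o.orderKey, o.custKey);
            }
        }

        Map<Long, Double> revenue = new HashMap<>();
        for (LineItem l : items) {
            // l_returnflag = 'R'
            if (!"R".equals(l.returnFlag)) {
                continue;
            }
            // l_orderkey = o_orderkey (inner join: skip items without a qualifying order)
            Long custKey = custKeyByOrderKey.get(l.orderKey);
            if (custKey == null) {
                continue;
            }
            // GROUP BY customer, SUM(revenue)
            revenue.merge(custKey, l.extendedPrice * (1 - l.discount), Double::sum);
        }
        return revenue;
    }
}
```

    Building a hash map from the filtered orders and probing it with the line items is one simple way to express the join; it is chosen here only for readability.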

    Input files are plain-text CSV files that use the pipe character ('|') as the field separator, as generated by the TPC-H data generator, which is available at http://www.tpc.org/tpch/.
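
    As a rough illustration of the input format, the following plain-Java sketch splits one pipe-delimited row into fields. The class name PipeRowSketch and the sample row are made up for this example, and the column positions used in the comments (0 = l_orderkey, 5 = l_extendedprice, 6 = l_discount, 8 = l_returnflag) are an assumption based on the column order of the TPC-H lineitem table.

```java
public class PipeRowSketch {

    /**
     * Splits a pipe-delimited row into its fields.
     * The pipe is a regex metacharacter, so it must be escaped.
     * String.split drops trailing empty strings, so a terminating
     * '|' at the end of the row does not produce an extra field.
     */
    public static String[] splitFields(String row) {
        return row.split("\\|");
    }

    public static void main(String[] args) {
        // Made-up lineitem row with 16 columns and a terminating '|'.
        String row = "1|2|3|1|17|100.00|0.25|0.02|R|F|"
                + "1994-01-01|1994-01-02|1994-01-03|NONE|MAIL|comment|";
        String[] f = splitFields(row);

        long orderKey = Long.parseLong(f[0]);            // l_orderkey
        double extendedPrice = Double.parseDouble(f[5]); // l_extendedprice
        double discount = Double.parseDouble(f[6]);      // l_discount
        String returnFlag = f[8];                        // l_returnflag

        System.out.println(orderKey + " " + returnFlag + " "
                + extendedPrice * (1 - discount));
    }
}
```

    Note that the same split behaviour would also drop genuinely empty trailing columns, so a production reader should validate the field count.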

    Usage: TPCHQuery10 --customer <path> --orders <path> --lineitem <path> --nation <path> --output <path>
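
    The usage line follows a "--key value" convention. A minimal plain-Java sketch of reading such arguments into a map is shown here; the class name UsageArgsSketch and its parse method are hypothetical stand-ins and are not how this class itself reads its parameters.

```java
import java.util.HashMap;
import java.util.Map;

public class UsageArgsSketch {

    /** Parses alternating "--key value" arguments into a map keyed without the dashes. */
    public static Map<String, String> parse(String[] args) {
        if (args.length % 2 != 0) {
            throw new IllegalArgumentException("Every --key needs a value");
        }
        Map<String, String> params = new HashMap<>();
        for (int i = 0; i < args.length; i += 2) {
            if (!args[i].startsWith("--")) {
                throw new IllegalArgumentException("Expected --key but got: " + args[i]);
            }
            // Strip the leading "--" and store the following token as the value.
            params.put(args[i].substring(2), args[i + 1]);
        }
        return params;
    }

    public static void main(String[] args) {
        Map<String, String> params = parse(args);
        for (String key : new String[] {"customer", "orders", "lineitem", "nation", "output"}) {
            if (!params.containsKey(key)) {
                System.err.println("Missing required parameter --" + key);
            }
        }
    }
}
```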

    This example shows how to use:

    • tuple data types
    • inline-defined functions
    • projection and join projection
    • built-in aggregation functions

    Note: All Flink DataSet APIs are deprecated since Flink 1.18 and will be removed in a future Flink major version. You can still build your application with the DataSet API, but you should move to the DataStream and/or Table API. This class is retained for testing purposes.

    • Constructor Detail

      • TPCHQuery10

        public TPCHQuery10()