Class AbstractOrcFileInputFormat.OrcVectorizedReader<T,BatchT>

  • Type Parameters:
    T - The type of the records returned by the reader.
    All Implemented Interfaces:
    Closeable, AutoCloseable, org.apache.flink.connector.file.src.reader.BulkFormat.Reader<T>
    Enclosing class:
    AbstractOrcFileInputFormat<T,BatchT,SplitT extends org.apache.flink.connector.file.src.FileSourceSplit>

    protected static final class AbstractOrcFileInputFormat.OrcVectorizedReader<T,BatchT>
    extends Object
    implements org.apache.flink.connector.file.src.reader.BulkFormat.Reader<T>
    A vectorized ORC reader. This reader reads one ORC batch (VectorizedRowBatch) at a time and converts it into one or more records to be returned. A row-wise ORC reader would convert the batch into a set of rows, while a reader for a vectorized query processor might return the whole batch as a single record.

    The conversion of the VectorizedRowBatch into records is performed by the concrete AbstractOrcFileInputFormat.OrcReaderBatch implementation.
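    To make the two conversion styles concrete, here is a minimal, self-contained Java sketch. ColumnBatch, Row, and BatchConversions are hypothetical names: ColumnBatch only imitates the idea of ORC's VectorizedRowBatch (one array per column plus a size field), and nothing here is Flink's or ORC's actual API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for ORC's VectorizedRowBatch: one array per column,
// plus the number of valid rows (which may be smaller than the array capacity).
final class ColumnBatch {
    final long[] ids;
    final String[] names;
    final int size;

    ColumnBatch(long[] ids, String[] names, int size) {
        this.ids = ids;
        this.names = names;
        this.size = size;
    }
}

// Hypothetical row type produced by the row-wise conversion.
final class Row {
    final long id;
    final String name;

    Row(long id, String name) {
        this.id = id;
        this.name = name;
    }
}

final class BatchConversions {
    // Row-wise conversion: one record per valid row in the batch.
    static List<Row> toRows(ColumnBatch batch) {
        List<Row> rows = new ArrayList<>(batch.size);
        for (int i = 0; i < batch.size; i++) {
            rows.add(new Row(batch.ids[i], batch.names[i]));
        }
        return rows;
    }

    // Vectorized conversion: the whole batch is handed on as a single record.
    static List<ColumnBatch> asSingleRecord(ColumnBatch batch) {
        return List.of(batch);
    }
}
```

    The row-wise variant yields batch.size records per batch; the vectorized variant yields exactly one and leaves column access to the downstream operator.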

    The reader tracks its current position using ORC's row numbers. Each record in a batch is addressed by the starting row number of the batch plus the number of records in that batch that precede it (and must therefore be skipped to reach it).
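    The addressing scheme above reduces to simple arithmetic. The sketch below assumes a row-wise conversion (one record per ORC row); RowPosition and PositionTracking are hypothetical names used only for illustration, not Flink classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical position: the row number at which a batch starts, plus the
// number of records in that batch that come before the addressed record.
final class RowPosition {
    final long batchStartRow;
    final long recordsToSkip;

    RowPosition(long batchStartRow, long recordsToSkip) {
        this.batchStartRow = batchStartRow;
        this.recordsToSkip = recordsToSkip;
    }

    // Absolute ORC row of the addressed record (valid only for one record per row).
    long absoluteRow() {
        return batchStartRow + recordsToSkip;
    }
}

final class PositionTracking {
    // Positions of every record in a batch that starts at batchStartRow and holds batchSize rows.
    static List<RowPosition> positionsInBatch(long batchStartRow, int batchSize) {
        List<RowPosition> out = new ArrayList<>(batchSize);
        for (int i = 0; i < batchSize; i++) {
            // The i-th record is preceded by i records of the same batch.
            out.add(new RowPosition(batchStartRow, i));
        }
        return out;
    }
}
```

    For a batch starting at row 1024, the third record is addressed as (1024, 2), which corresponds to absolute row 1026.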

    • Method Detail

      • readBatch

        @Nullable
        public org.apache.flink.connector.file.src.reader.BulkFormat.RecordIterator<T> readBatch()
            throws IOException
        Specified by:
        readBatch in interface org.apache.flink.connector.file.src.reader.BulkFormat.Reader<T>
        Throws:
        IOException
      • seek

        public void seek(org.apache.flink.connector.file.src.util.CheckpointedPosition position)
            throws IOException
        The argument passed to RecordReader.seekToRow(long) must be a value previously returned by RecordReader.getRowNumber(); it should not be computed independently. ORC's internal row numbering is non-obvious, and it behaves specially when a Predicate is applied.
        Throws:
        IOException
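    The following sketch shows how readBatch and seek might fit together, under loudly simplified assumptions. FakeRowReader is an in-memory stand-in that only mimics the names getRowNumber() and seekToRow(long); it is not org.apache.orc.RecordReader. SketchVectorizedReader is a hypothetical class, not the real OrcVectorizedReader, and its seek takes two longs as a simplified stand-in for CheckpointedPosition. It assumes, as the @Nullable annotation on readBatch suggests, that null signals the end of the input.

```java
import java.util.ArrayList;
import java.util.List;

// In-memory stand-in that only mimics getRowNumber()/seekToRow(long).
// It is NOT org.apache.orc.RecordReader and applies no predicates.
final class FakeRowReader {
    private final List<String> rows;
    private final int batchSize;
    private long next; // row number of the next row to be read

    FakeRowReader(List<String> rows, int batchSize) {
        this.rows = rows;
        this.batchSize = batchSize;
    }

    long getRowNumber() {
        return next;
    }

    void seekToRow(long row) {
        next = row;
    }

    // Returns the next batch of rows, or an empty list once the input is exhausted.
    List<String> nextBatch() {
        if (next >= rows.size()) {
            return List.of();
        }
        int from = (int) next;
        int to = Math.min(rows.size(), from + batchSize);
        next = to;
        return new ArrayList<>(rows.subList(from, to));
    }
}

// Hypothetical simplified reader: readBatch() returns null at end of input,
// and seek() restores a (batch start row, records to skip) position.
final class SketchVectorizedReader {
    private final FakeRowReader orc;
    private long skipOnNextBatch; // records to drop from the first batch after a seek

    SketchVectorizedReader(FakeRowReader orc) {
        this.orc = orc;
    }

    // Returns the records of the next batch, or null when there is no more input.
    List<String> readBatch() {
        List<String> batch = orc.nextBatch();
        if (batch.isEmpty()) {
            return null;
        }
        int skip = (int) Math.min(skipOnNextBatch, batch.size());
        skipOnNextBatch = 0;
        return batch.subList(skip, batch.size());
    }

    // batchStartRow must be a value previously reported by getRowNumber().
    void seek(long batchStartRow, long recordsToSkip) {
        orc.seekToRow(batchStartRow);
        skipOnNextBatch = recordsToSkip;
    }
}
```

    Recording getRowNumber() before each readBatch() call and passing that value back to seek follows the rule stated for seek; computing the row number by hand would only work here because the fake reader applies no predicates.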