Class Record

  • All Implemented Interfaces:
    Serializable, IOReadableWritable, CopyableValue<Record>, Value

    @Public
    public final class Record
    extends Object
    implements Value, CopyableValue<Record>
    The Record represents a multi-valued data record, that is, a tuple of arbitrary values. It implements a sparse tuple model: the record can contain many fields that are null and therefore not physically represented in the record. Internally, a bitmap marks which fields are set and which are not.

    For efficient data exchange, a record that is read from any source holds its data in serialized binary form. Fields are deserialized lazily upon first access. Modified fields are cached and the modifications are incorporated into the binary representation upon the next serialization or any explicit call to the updateBinaryRepresenation() method.
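
    The lazy deserialization and write-back described above can be illustrated with a small stand-alone model. This is only a sketch: LazyIntFieldSketch, its decodeCount counter, and the 4-byte big-endian encoding are hypothetical and are not taken from the Record implementation.

```java
/** Hypothetical model of one lazily deserialized field; not the Record implementation. */
final class LazyIntFieldSketch {
    private byte[] binary;      // serialized form (assumed: 4 bytes, big-endian)
    private Integer cached;     // deserialized value, filled on first access
    private boolean modified;   // true while the cached value differs from the bytes
    int decodeCount;            // how often the bytes were actually decoded

    LazyIntFieldSketch(byte[] binary) {
        this.binary = binary.clone();
    }

    /** Decodes the bytes on first access only; later calls reuse the cached value. */
    int get() {
        if (cached == null) {
            cached = java.nio.ByteBuffer.wrap(binary).getInt();
            decodeCount++;
        }
        return cached;
    }

    /** Caches the modification; the bytes are not touched yet. */
    void set(int value) {
        cached = value;
        modified = true;
    }

    /** Incorporates a cached modification into the bytes, if there is one. */
    byte[] toBinary() {
        if (modified) {
            binary = java.nio.ByteBuffer.allocate(4).putInt(cached).array();
            modified = false;
        }
        return binary.clone();
    }
}
```

    In this model, reading the same field twice decodes the bytes once, and a call to set only becomes visible in the bytes after toBinary has run.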

    IMPORTANT NOTE: Records must be used as mutable objects and be reused across user function calls in order to achieve good performance. The record is a heavy-weight object, designed to minimize calls to the individual fields' serialization and deserialization methods. It holds quite a bit of state and consumes a comparatively large amount of memory (> 200 bytes in a 64-bit JVM) due to several pointers and arrays.

    This class is NOT thread-safe!

    See Also:
    Serialized Form
    • Constructor Summary

      Constructors 
      Constructor Description
      Record()
      Required nullary constructor for instantiation by serialization logic.
      Record​(int numFields)
      Creates a new record, containing the given number of fields.
      Record​(Value value)
      Creates a new record containing only a single field, which is the given value.
      Record​(Value val1, Value val2)
      Creates a new record containing exactly two fields, which are the given values.
    • Method Summary

      Modifier and Type Method Description
      void addField​(Value value)  
      void clear()
      Clears the record.
      void concatenate​(Record record)  
      Record copy()
      Performs a deep copy of this object into a new instance.
      void copy​(DataInputView source, DataOutputView target)
      Copies the next serialized instance from source to target.
      void copyFrom​(Record source, int[] sourcePositions, int[] targetPositions)
      Bin-copies fields from a source record to this record.
      void copyTo​(Record target)
      Performs a deep copy of this object into the target instance.
      Record createCopy()
      Creates an exact copy of this record.
      void deserialize​(DataInputView source)  
      boolean equalsFields​(int[] positions, Value[] searchValues, Value[] deserializationHolders)
      Checks the values of this record and a given list of values at specified positions for equality.
      int getBinaryLength()
      Gets the length of the data type when it is serialized, in bytes.
      <T extends Value> T getField(int fieldNum, Class<T> type)
      Gets the field at the given position from the record.
      <T extends Value> T getField(int fieldNum, T target)
      Gets the field at the given position.
      boolean getFieldInto​(int fieldNum, Value target)
      Gets the field at the given position.
      boolean getFieldsInto​(int[] positions, Value[] targets)
      Gets the fields at the given positions into an array.
      void getFieldsIntoCheckingNull​(int[] positions, Value[] targets)
      Gets the fields at the given positions into an array.
      int getNumFields()
      Gets the number of fields currently in the record.
      boolean isNull​(int fieldNum)  
      void makeSpace​(int numFields)
      Reserves space for at least the given number of fields in the internal arrays.
      void read​(DataInputView in)
      Reads the object's internal data from the given data input view.
      void removeField​(int fieldNum)
      Removes the field at the given position.
      long serialize​(DataOutputView target)
      Writes this record to the given output view.
      void setField​(int fieldNum, Value value)
      Sets the field at the given position to the given value.
      void setNull​(int field)
      Sets the field at the given position to null.
      void setNull​(long mask)
      Sets the fields to null using the given bit mask.
      void setNull​(long[] mask)
      Sets the fields to null using the given bit mask.
      void setNumFields​(int numFields)
      Sets the number of fields in the record.
      void unionFields​(Record other)
      Unions the other record's fields with this record's fields.
      void updateBinaryRepresenation()
      Updates the binary representation of the data, such that it reflects the state of the currently stored fields.
      void write​(DataOutputView out)
      Writes the object's internal data to the given data output view.
    • Constructor Detail

      • Record

        public Record()
        Required nullary constructor for instantiation by serialization logic.
      • Record

        public Record​(Value value)
        Creates a new record containing only a single field, which is the given value.
        Parameters:
        value - The value for the single field of the record.
      • Record

        public Record​(Value val1,
                      Value val2)
        Creates a new record containing exactly two fields, which are the given values.
        Parameters:
        val1 - The value for the first field.
        val2 - The value for the second field.
      • Record

        public Record​(int numFields)
        Creates a new record containing the given number of fields. All fields are initially null.
        Parameters:
        numFields - The number of fields for the record.
    • Method Detail

      • getNumFields

        public int getNumFields()
        Gets the number of fields currently in the record. This also includes null fields.
        Returns:
        The number of fields in the record.
      • setNumFields

        public void setNumFields​(int numFields)
        Sets the number of fields in the record. If the new number of fields is larger than the current number of fields, null fields are appended. If the new number of fields is smaller than the current number of fields, the last fields are truncated.
        Parameters:
        numFields - The new number of fields.
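
        The grow-or-truncate behavior can be modeled on a plain array. This is a sketch of the documented semantics only; SetNumFieldsSketch is a hypothetical helper and does not reflect how the record stores its fields.

```java
/** Hypothetical model of the documented setNumFields semantics. */
final class SetNumFieldsSketch {
    /** Returns a field array of the requested length, padded with nulls or truncated. */
    static Object[] setNumFields(Object[] fields, int numFields) {
        // Arrays.copyOf pads with null when growing and drops trailing elements when shrinking.
        return java.util.Arrays.copyOf(fields, numFields);
    }
}
```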
      • makeSpace

        public void makeSpace​(int numFields)
        Reserves space for at least the given number of fields in the internal arrays.
        Parameters:
        numFields - The number of fields to reserve space for.
      • getField

        public <T extends Value> T getField​(int fieldNum,
                                            Class<T> type)
        Gets the field at the given position from the record. This method checks internally whether this record instance has previously returned a value for this field. If so, it reuses that object; if not, it creates one from the supplied class.
        Type Parameters:
        T - The type of the field.
        Parameters:
        fieldNum - The logical position of the field.
        type - The type of the field as a class. This class is used to instantiate a value object, if none had previously been instantiated.
        Returns:
        The field at the given position, or null, if the field was null.
        Throws:
        IndexOutOfBoundsException - Thrown, if the field number is negative, or larger than or equal to the number of fields in this record.
      • getField

        public <T extends Value> T getField​(int fieldNum,
                                            T target)
        Gets the field at the given position. The method tries to deserialize the field into the given target value. If the field has been changed since the last (de)serialization, or is null, then the target value is left unchanged and the changed value (or null) is returned.

        In all cases, the returned value contains the correct data (or is correctly null).

        Parameters:
        fieldNum - The position of the field.
        target - The value to deserialize the field into.
        Returns:
        The value with the contents of the requested field, or null, if the field is null.
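
        Because the target may be left untouched, callers should always work with the returned value rather than with the target they passed in. The following stand-alone model sketches the three documented outcomes. GetFieldTargetSketch and IntHolder are hypothetical stand-ins, and the 4-byte big-endian encoding is an assumption.

```java
/** Hypothetical model of the documented getField(int, T target) contract for one field. */
final class GetFieldTargetSketch {
    /** Stand-in for a mutable Value type. */
    static final class IntHolder {
        int value;
        IntHolder(int value) { this.value = value; }
    }

    private final byte[] binary;  // serialized field, or null if the field is null
    private IntHolder changed;    // value set since the last (de)serialization, if any

    GetFieldTargetSketch(byte[] binary) {
        this.binary = binary;
    }

    void setField(IntHolder value) {
        this.changed = value;
    }

    IntHolder getField(IntHolder target) {
        if (changed != null) {
            return changed;   // changed field: target is left unchanged
        }
        if (binary == null) {
            return null;      // null field: target is left unchanged
        }
        target.value = java.nio.ByteBuffer.wrap(binary).getInt();
        return target;        // unchanged field: deserialized into target
    }
}
```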
      • getFieldInto

        public boolean getFieldInto​(int fieldNum,
                                    Value target)
        Gets the field at the given position. If the field at that position is null, then this method leaves the target field unchanged and returns false.
        Parameters:
        fieldNum - The position of the field.
        target - The value to deserialize the field into.
        Returns:
        True, if the field was deserialized properly, false, if the field was null.
      • getFieldsInto

        public boolean getFieldsInto​(int[] positions,
                                     Value[] targets)
        Gets the fields at the given positions into an array. If at any position a field is null, then this method returns false. All fields that have been successfully read until the failing read are correctly contained in the record. All other fields are not set.
        Parameters:
        positions - The positions of the fields to get.
        targets - The values into which the content of the fields is put.
        Returns:
        True if all fields were successfully read, false if some read failed.
      • getFieldsIntoCheckingNull

        public void getFieldsIntoCheckingNull​(int[] positions,
                                              Value[] targets)
        Gets the fields at the given positions into an array. If at any position a field is null, then this method throws a NullKeyFieldException. All fields that have been successfully read until the failing read are correctly contained in the record. All other fields are not set.
        Parameters:
        positions - The positions of the fields to get.
        targets - The values into which the content of the fields is put.
        Throws:
        NullKeyFieldException - in case of a failing field read.
      • setField

        public void setField​(int fieldNum,
                             Value value)
        Sets the field at the given position to the given value. If the field position is larger than or equal to the current number of fields in the record, then the record is expanded to host as many columns.

        The value is kept as a reference in the record until the binary representation is synchronized. Until that point, all modifications to the value's object will change the value inside the record.

        The binary representation is synchronized at the latest when the record is emitted. The synchronization may be triggered manually at an earlier point, but doing so is generally neither necessary nor advisable. Because the synchronization triggers the serialization of all modified values, it may be an expensive operation.

        Parameters:
        fieldNum - The position of the field, starting at zero.
        value - The new value.
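
        The consequence of keeping the value as a reference is easy to overlook when value objects are reused. The stand-alone model below sketches the documented behavior only: SetFieldReferenceSketch, IntHolder, and synchronize are hypothetical names, and the encoding is an assumption.

```java
/** Hypothetical model of the documented reference semantics of setField for one field. */
final class SetFieldReferenceSketch {
    /** Stand-in for a mutable Value type. */
    static final class IntHolder {
        int value;
    }

    private IntHolder pending;            // reference kept until synchronization
    private byte[] binary = new byte[4];  // assumed encoding: 4 bytes, big-endian

    void setField(IntHolder value) {
        pending = value;                  // no copy: the caller's object is shared
    }

    /** Stand-in for synchronizing the binary representation. */
    byte[] synchronize() {
        if (pending != null) {
            binary = java.nio.ByteBuffer.allocate(4).putInt(pending.value).array();
            pending = null;
        }
        return binary.clone();
    }
}
```

        In this model, a holder that is set with value 1 and then mutated to 2 before synchronization is serialized as 2. Mutations made after synchronization no longer affect the bytes.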
      • addField

        public void addField​(Value value)
        Parameters:
        value -
      • removeField

        public void removeField​(int fieldNum)
        Removes the field at the given position.

        This method should be used carefully. Be aware that as the field is actually removed from the record, the total number of fields is modified, and all fields to the right of the field removed shift one position to the left.

        Parameters:
        fieldNum - The position of the field to be removed, starting at zero.
        Throws:
        IndexOutOfBoundsException - Thrown, when the position is not between 0 (inclusive) and the number of fields (exclusive).
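
        The index shift caused by a removal can be modeled on a plain array. This is a sketch of the documented effect, not of the record's internal layout; RemoveFieldSketch is a hypothetical helper.

```java
/** Hypothetical model of the documented removeField semantics. */
final class RemoveFieldSketch {
    /** Returns a copy of the fields without the one at fieldNum; later fields move left by one. */
    static Object[] removeField(Object[] fields, int fieldNum) {
        if (fieldNum < 0 || fieldNum >= fields.length) {
            throw new IndexOutOfBoundsException("fieldNum: " + fieldNum);
        }
        Object[] result = new Object[fields.length - 1];
        System.arraycopy(fields, 0, result, 0, fieldNum);
        System.arraycopy(fields, fieldNum + 1, result, fieldNum, fields.length - fieldNum - 1);
        return result;
    }
}
```

        Code that removes several fields by position needs to account for this shift, for example by removing from the highest position downwards.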
      • isNull

        public final boolean isNull​(int fieldNum)
      • setNull

        public void setNull​(int field)
        Sets the field at the given position to null.
        Parameters:
        field - The field index.
        Throws:
        IndexOutOfBoundsException - Thrown, when the position is not between 0 (inclusive) and the number of fields (exclusive).
      • setNull

        public void setNull​(long mask)
        Sets the fields to null using the given bit mask. The bits correspond to the individual columns: (1 == nullify, 0 == keep).
        Parameters:
        mask - Bit mask, where the i-th least significant bit represents the i-th field in the record.
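
        A mask for this method can be assembled with shifts. The sketch below assumes zero-based counting, so that bit 0 (the least significant bit) corresponds to field 0. NullMaskSketch and its methods are hypothetical helpers that operate on a plain array instead of a record.

```java
/** Hypothetical helpers illustrating the single-long null mask. */
final class NullMaskSketch {
    /** Builds a mask with the bit for each given field index set (indices 0..63). */
    static long maskFor(int... fields) {
        long mask = 0L;
        for (int f : fields) {
            if (f < 0 || f > 63) {
                throw new IllegalArgumentException("field out of range: " + f);
            }
            mask |= 1L << f;
        }
        return mask;
    }

    /** Nullifies every field whose bit is 1; fields whose bit is 0 are kept. */
    static void setNull(Object[] fields, long mask) {
        for (int i = 0; i < fields.length && i < 64; i++) {
            if ((mask & (1L << i)) != 0) {
                fields[i] = null;
            }
        }
    }
}
```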
      • setNull

        public void setNull​(long[] mask)
        Sets the fields to null using the given bit mask. The bits correspond to the individual columns: (1 == nullify, 0 == keep).
        Parameters:
        mask - Bit mask, where the i-th least significant bit in the n-th bit mask represents the (n*64) + i-th field in the record.
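
        The (n*64) + i mapping means that field f lives in array element f / 64 at bit f % 64. The sketch below assumes zero-based counting for both n and i. NullMaskArraySketch and its methods are hypothetical helpers.

```java
/** Hypothetical helpers illustrating the long[] null mask layout. */
final class NullMaskArraySketch {
    /** Builds a mask array large enough for numFields, with the given fields marked. */
    static long[] maskFor(int numFields, int... fields) {
        long[] mask = new long[(numFields + 63) / 64];
        for (int f : fields) {
            mask[f / 64] |= 1L << (f % 64);
        }
        return mask;
    }

    /** Returns true if the bit for the given field is set in the mask array. */
    static boolean isMarked(long[] mask, int field) {
        return (mask[field / 64] & (1L << (field % 64))) != 0;
    }
}
```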
      • clear

        public void clear()
        Clears the record. After this operation, the record will have zero fields.
      • concatenate

        public void concatenate​(Record record)
      • unionFields

        public void unionFields​(Record other)
        Unions the other record's fields with this record's fields. After the method invocation with record B as the parameter, this record A will contain at field i:
        • Field i from record A, if that field is within record A's number of fields and is not null.
        • Otherwise, field i from record B, if that field is within record B's number of fields.
        It is not necessary that both records have the same number of fields. This record will have the number of fields of the larger of the two records. Naturally, if both A and B have field i set to null, this record will have null at that position.
        Parameters:
        other - The record whose fields to union with this record's fields.
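
        The union rule can be written down compactly on plain arrays. This is a sketch of the documented result only; UnionFieldsSketch is a hypothetical helper and says nothing about how the record performs the union internally.

```java
/** Hypothetical model of the documented unionFields result. */
final class UnionFieldsSketch {
    /** Field i is taken from a if it is present and non-null there, otherwise from b. */
    static Object[] union(Object[] a, Object[] b) {
        Object[] result = new Object[Math.max(a.length, b.length)];
        for (int i = 0; i < result.length; i++) {
            Object fromA = i < a.length ? a[i] : null;
            Object fromB = i < b.length ? b[i] : null;
            result[i] = fromA != null ? fromA : fromB;
        }
        return result;
    }
}
```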
      • copyTo

        public void copyTo​(Record target)
        Description copied from interface: CopyableValue
        Performs a deep copy of this object into the target instance.
        Specified by:
        copyTo in interface CopyableValue<Record>
        Parameters:
        target -
      • getBinaryLength

        public int getBinaryLength()
        Description copied from interface: CopyableValue
        Gets the length of the data type when it is serialized, in bytes.
        Specified by:
        getBinaryLength in interface CopyableValue<Record>
        Returns:
        The length of the data type, or -1, if variable length.
      • copy

        public Record copy()
        Description copied from interface: CopyableValue
        Performs a deep copy of this object into a new instance.

        This method is useful for generic user-defined functions to clone a CopyableValue when storing multiple objects. With object reuse a deep copy must be created and type erasure prevents calling new.

        Specified by:
        copy in interface CopyableValue<Record>
        Returns:
        New object with copied fields.
      • copy

        public void copy​(DataInputView source,
                         DataOutputView target)
                  throws IOException
        Description copied from interface: CopyableValue
        Copies the next serialized instance from source to target.

        This method is equivalent to calling IOReadableWritable.read(DataInputView) followed by IOReadableWritable.write(DataOutputView) but does not require intermediate deserialization.

        Specified by:
        copy in interface CopyableValue<Record>
        Parameters:
        source - Data source for serialized instance.
        target - Data target for serialized instance.
        Throws:
        IOException
        See Also:
        IOReadableWritable
      • createCopy

        public Record createCopy()
        Creates an exact copy of this record.
        Returns:
        An exact copy of this record.
      • copyFrom

        public void copyFrom​(Record source,
                             int[] sourcePositions,
                             int[] targetPositions)
        Bin-copies fields from a source record to this record. The following caveats apply:

        If the source field is in a modified state, no binary representation will exist yet. In that case, this method is equivalent to setField(..., source.getField(..., <class>)). In particular, if setValue is called on the source field Value instance, that change will propagate to this record.

        If the source field has already been serialized, then the binary representation will be copied. Further modifications to the source field will not be observable via this record, but attempting to read the field from this record will cause it to be deserialized.

        Finally, bin-copying a source field requires calling updateBinaryRepresenation() on this instance in order to reserve space in the binaryData array. If none of the source fields are actually bin-copied, then updateBinaryRepresenation() won't be called.

        Parameters:
        source -
        sourcePositions -
        targetPositions -
      • equalsFields

        public final boolean equalsFields​(int[] positions,
                                          Value[] searchValues,
                                          Value[] deserializationHolders)
        Checks the values of this record and a given list of values at specified positions for equality. The values of this record are deserialized and compared against the corresponding search values. The positions specify which values are compared. The method returns true if the values at all positions are equal and false otherwise.
        Parameters:
        positions - The positions of the values to check for equality.
        searchValues - The values against which the values of this record are compared.
        deserializationHolders - An array to hold the deserialized values of this record.
        Returns:
        True if all the values on all positions are equal, false otherwise.
      • updateBinaryRepresenation

        public void updateBinaryRepresenation()
        Updates the binary representation of the data, such that it reflects the state of the currently stored fields. If the binary representation is already up to date, nothing happens. Otherwise, this function triggers the modified fields to serialize themselves into the record's buffer and afterwards updates the offset table.
      • read

        public void read​(DataInputView in)
                  throws IOException
        Description copied from interface: IOReadableWritable
        Reads the object's internal data from the given data input view.
        Specified by:
        read in interface IOReadableWritable
        Parameters:
        in - the input view to read the data from
        Throws:
        IOException - thrown if any error occurs while reading from the input stream