<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <parent>
        <groupId>io.confluent</groupId>
        <artifactId>kafka-connect-storage-common-parent</artifactId>
        <version>5.2.5</version>
    </parent>

    <artifactId>kafka-connect-hdfs</artifactId>
    <packaging>jar</packaging>
    <name>kafka-connect-hdfs</name>
    <organization>
        <name>Confluent, Inc.</name>
        <url>http://confluent.io</url>
    </organization>
    <url>http://confluent.io</url>
    <description>A Kafka Connect HDFS connector for copying data between Kafka and Hadoop HDFS.</description>

    <licenses>
        <license>
            <name>Confluent Community License</name>
            <url>http://www.confluent.io/confluent-community-license</url>
            <distribution>repo</distribution>
        </license>
    </licenses>

    <scm>
        <connection>scm:git:git://github.com/confluentinc/kafka-connect-hdfs.git</connection>
        <developerConnection>scm:git:git@github.com:confluentinc/kafka-connect-hdfs.git</developerConnection>
        <url>https://github.com/confluentinc/kafka-connect-hdfs</url>
        <tag>HEAD</tag>
    </scm>

    <properties>
        <confluent.maven.repo>http://packages.confluent.io/maven/</confluent.maven.repo>
        <apacheds-jdbm1.version>2.0.0-M2</apacheds-jdbm1.version>
        <kafka.connect.maven.plugin.version>0.11.1</kafka.connect.maven.plugin.version>
    </properties>

    <repositories>
        <repository>
            <id>confluent</id>
            <name>Confluent</name>
            <url>${confluent.maven.repo}</url>
        </repository>
    </repositories>

    <dependencies>
        <dependency>
            <groupId>org.apache.kafka</groupId>
            <artifactId>connect-api</artifactId>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.kafka</groupId>
            <artifactId>connect-json</artifactId>
            <version>${kafka.version}</version>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>io.confluent</groupId>
            <artifactId>kafka-connect-storage-common</artifactId>
            <version>${confluent.version}</version>
        </dependency>
        <dependency>
            <groupId>io.confluent</groupId>
            <artifactId>kafka-connect-storage-core</artifactId>
            <version>${confluent.version}</version>
        </dependency>
        <dependency>
            <groupId>io.confluent</groupId>
            <artifactId>kafka-connect-storage-format</artifactId>
            <version>${confluent.version}</version>
        </dependency>
        <dependency>
            <groupId>io.confluent</groupId>
            <artifactId>kafka-connect-storage-partitioner</artifactId>
            <version>${confluent.version}</version>
        </dependency>
        <dependency>
            <groupId>io.confluent</groupId>
            <artifactId>kafka-connect-storage-wal</artifactId>
            <version>${confluent.version}</version>
        </dependency>
        <dependency>
            <groupId>com.github.spotbugs</groupId>
            <artifactId>spotbugs-annotations</artifactId>
        </dependency>
        <dependency>
            <groupId>com.fasterxml.jackson.core</groupId>
            <artifactId>jackson-databind</artifactId>
            <version>${jackson.databind.version}</version>
        </dependency>
        <dependency>
            <groupId>com.fasterxml.jackson.core</groupId>
            <artifactId>jackson-core</artifactId>
            <version>${jackson.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-minicluster</artifactId>
            <version>${hadoop.version}</version>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-minikdc</artifactId>
            <version>${hadoop.version}</version>
            <scope>test</scope>
            <exclusions>
                <exclusion>
                    <groupId>org.apache.directory.jdbm</groupId>
                    <artifactId>apacheds-jdbm1</artifactId>
                </exclusion>
            </exclusions>
        </dependency>
        <dependency>
            <groupId>org.apache.directory.jdbm</groupId>
            <artifactId>apacheds-jdbm1</artifactId>
            <version>${apacheds-jdbm1.version}</version>
            <scope>test</scope>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>io.confluent</groupId>
                <version>${kafka.connect.maven.plugin.version}</version>
                <artifactId>kafka-connect-maven-plugin</artifactId>
                <executions>
                    <execution>
                        <goals>
                            <goal>kafka-connect</goal>
                        </goals>
                        <configuration>
                            <title>Kafka Connect HDFS</title>
                            <documentationUrl>https://docs.confluent.io/kafka-connect-hdfs/current/index.html</documentationUrl>
                            <description>
                                The HDFS connector allows you to export data from Kafka topics to HDFS files in a variety of formats and integrates with Hive to make data immediately available for querying with HiveQL.

                                The connector periodically polls data from Kafka and writes it to HDFS. The data from each Kafka topic is partitioned by the provided partitioner and divided into chunks. Each chunk of data is represented as an HDFS file whose filename encodes the topic, the Kafka partition, and the start and end offsets of the chunk. If no partitioner is specified in the configuration, the default partitioner, which preserves the Kafka partitioning, is used. The size of each data chunk is determined by the number of records written to HDFS, the time elapsed since the last write, and schema compatibility.

                                The HDFS connector integrates with Hive: when Hive integration is enabled, the connector automatically creates an external partitioned Hive table for each Kafka topic and updates the table according to the available data in HDFS.
                            </description>
                            <supportProviderName>Confluent, Inc.</supportProviderName>
                            <supportSummary>Confluent supports the HDFS sink connector alongside community members as part of its Confluent Platform offering.</supportSummary>
                            <supportUrl>https://docs.confluent.io/current/</supportUrl>
                            <supportLogo>logos/confluent.png</supportLogo>
                            <ownerUsername>confluentinc</ownerUsername>
                            <ownerType>organization</ownerType>
                            <ownerName>Confluent, Inc.</ownerName>
                            <ownerUrl>https://confluent.io/</ownerUrl>
                            <ownerLogo>logos/confluent.png</ownerLogo>
                            <dockerNamespace>confluentinc</dockerNamespace>
                            <dockerName>cp-kafka-connect</dockerName>
                            <dockerTag>${project.version}</dockerTag>
                            <componentTypes>
                                <componentType>sink</componentType>
                            </componentTypes>
                            <tags>
                                <tag>hadoop</tag>
                                <tag>hdfs</tag>
                                <tag>hive</tag>
                            </tags>
                            <confluentControlCenterIntegration>true</confluentControlCenterIntegration>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <configuration>
                    <compilerArgs>
                        <arg>-Xlint:all</arg>
                        <arg>-Xlint:-deprecation</arg>
                        <arg>-Werror</arg>
                    </compilerArgs>
                    <showWarnings>true</showWarnings>
                    <showDeprecation>false</showDeprecation>
                </configuration>
            </plugin>
            <plugin>
                <artifactId>maven-assembly-plugin</artifactId>
                <configuration>
                    <descriptors>
                        <descriptor>src/assembly/development.xml</descriptor>
                        <descriptor>src/assembly/package.xml</descriptor>
                    </descriptors>
                    <attach>false</attach>
                </configuration>
                <executions>
                    <execution>
                        <id>make-assembly</id>
                        <phase>package</phase>
                        <goals>
                            <goal>single</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-surefire-plugin</artifactId>
                <configuration>
                    <reuseForks>false</reuseForks>
                    <forkCount>1</forkCount>
                </configuration>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-checkstyle-plugin</artifactId>
                <executions>
                    <execution>
                        <id>validate</id>
                        <phase>validate</phase>
                        <configuration>
                            <suppressionsLocation>checkstyle/suppressions.xml</suppressionsLocation>
                        </configuration>
                        <goals>
                            <goal>check</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
            <plugin>
                <artifactId>maven-clean-plugin</artifactId>
                <version>3.0.0</version>
                <configuration>
                    <filesets>
                        <fileset>
                            <directory>.</directory>
                            <includes>
                                <include>derby.log</include>
                                <include>metastore_db/</include>
                            </includes>
                        </fileset>
                    </filesets>
                </configuration>
            </plugin>
        </plugins>
        <resources>
            <resource>
                <directory>src/main/resources</directory>
                <filtering>true</filtering>
            </resource>
        </resources>
    </build>

    <profiles>
        <profile>
            <id>standalone</id>
            <build>
                <plugins>
                    <plugin>
                        <artifactId>maven-assembly-plugin</artifactId>
                        <configuration>
                            <descriptors>
                                <descriptor>src/assembly/standalone.xml</descriptor>
                            </descriptors>
                        </configuration>
                    </plugin>
                </plugins>
            </build>
        </profile>
    </profiles>
</project>