
Getting Started With Apache Kafka

Learn how to get started with Apache Kafka, including installation, configuration, and basic operations. This guide is perfect for beginners looking to dive into the world of real-time data streaming.

Apache Kafka is a powerful, distributed event streaming platform capable of handling trillions of events a day. Originally developed by LinkedIn and open-sourced in early 2011, Kafka has evolved into a central backbone for many modern data architectures. In this guide, we will walk you through everything you need to get started with Apache Kafka, from understanding its architecture to setting it up and performing basic operations.

Introduction to Apache Kafka

Apache Kafka is designed to handle real-time data feeds. It works as a high-throughput, low-latency platform for handling data streams. Kafka is often used to build real-time streaming data pipelines and applications that react to streams of data. Common use cases include log aggregation, real-time analytics, and stream processing.

Key Concepts and Terminology

Before diving into the setup and operations, it’s essential to understand some key concepts and terminology in Kafka:

  • Producer: An application that sends messages to a Kafka topic.
  • Consumer: An application that reads messages from a Kafka topic.
  • Topic: A category or feed name to which messages are sent by producers.
  • Broker: A Kafka server that stores and serves Kafka topics.
  • Partition: A division of a topic for scalability and parallel processing.
  • Offset: A sequential ID that uniquely identifies each message within a partition.
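
To make these terms concrete, here is a toy, in-memory sketch in plain Python. It is an illustration of the concepts only, not real Kafka: actual brokers persist partitions to disk and replicate them across servers.

```python
# Toy in-memory model of Kafka's topic/partition/offset concepts.
# Illustration only -- real Kafka brokers persist and replicate
# partitions across servers.

class Topic:
    def __init__(self, name, partitions=2):
        self.name = name
        # Each partition is an append-only log; a message's offset
        # is simply its position within that partition.
        self.partitions = [[] for _ in range(partitions)]

    def produce(self, key, value):
        # Messages with the same key land in the same partition,
        # which preserves per-key ordering.
        p = hash(key) % len(self.partitions)
        self.partitions[p].append(value)
        offset = len(self.partitions[p]) - 1
        return p, offset

    def consume(self, partition, offset=0):
        # A consumer reads a partition sequentially from a given offset.
        return self.partitions[partition][offset:]

topic = Topic("test-topic")
p, _ = topic.produce("user-1", "first event")
topic.produce("user-1", "second event")
print(topic.consume(p))  # both events, in order
```

Note how the offset is nothing more than a position in an append-only log; this is what lets consumers replay a stream from any point.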

Setting Up Apache Kafka

Setting up Apache Kafka involves several steps, including downloading the necessary software, configuring it, and starting the services. In this section, we’ll provide a detailed walkthrough to ensure you can get your Kafka environment up and running smoothly.

Prerequisites

Before you start setting up Kafka, make sure your system meets the following prerequisites:

  1. Java Development Kit (JDK): Kafka requires Java 8 or later. You can check your Java version with the following command:

    java -version
    

    If Java is not installed, you can download and install it from the Oracle website or use a package manager like apt for Debian-based systems or brew for macOS:

    # For Debian-based systems
    sudo apt update
    sudo apt install openjdk-11-jdk
    
    # For macOS
    brew install openjdk@11
    
  2. Apache ZooKeeper: Kafka uses ZooKeeper to manage distributed configurations and synchronization. ZooKeeper is bundled with Kafka, so you don’t need to install it separately. (Newer Kafka releases, 3.3 and later, can also run without ZooKeeper in KRaft mode; this guide follows the classic ZooKeeper-based setup.)

Download and Install Kafka

  1. Download Kafka: Visit the official Apache Kafka download page and download the latest version of Kafka. As of writing, Kafka 2.8.0 is the latest stable release.

    wget https://downloads.apache.org/kafka/2.8.0/kafka_2.13-2.8.0.tgz
    
  2. Extract the Downloaded File: Extract the tar file to a directory of your choice.

    tar -xzf kafka_2.13-2.8.0.tgz
    cd kafka_2.13-2.8.0
    
  3. Start ZooKeeper: Kafka requires ZooKeeper to run. Start the ZooKeeper service using the provided configuration file.

    bin/zookeeper-server-start.sh config/zookeeper.properties
    

    ZooKeeper should start on the default port 2181. You should see log messages indicating that ZooKeeper is up and running.

  4. Start Kafka Broker: Open a new terminal window and start the Kafka broker using the provided configuration file.

    bin/kafka-server-start.sh config/server.properties
    

    Kafka should start on the default port 9092. You should see log messages indicating that the Kafka broker is up and running.

Kafka Configuration

While the default configurations are suitable for development and testing, you may need to customize the settings for a production environment. Some key configuration files include:

  • server.properties: This file contains configurations for the Kafka broker, such as broker ID, log directory, and listeners.
  • zookeeper.properties: This file contains configurations for ZooKeeper, such as data directory and client port.

You can edit these configuration files to suit your needs. For example, to change the log directory, you can edit the log.dirs property in the server.properties file:

log.dirs=/path/to/your/kafka-logs
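
Beyond log.dirs, a few other broker settings are commonly tuned for production. The property names below are standard Kafka broker settings; the values are illustrative examples only, not recommendations for your workload:

```ini
# Illustrative overrides for server.properties (example values)
num.partitions=3                # default partition count for new topics
log.retention.hours=168         # keep data for 7 days before deletion
log.segment.bytes=1073741824    # roll log segments at 1 GiB
```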

Creating Systemd Service Files

For ease of management, especially on Linux servers, you can create systemd service files for ZooKeeper and Kafka. This allows you to start, stop, and restart these services using systemctl.

  1. ZooKeeper Service File: Create a file named zookeeper.service in the /etc/systemd/system/ directory:

    [Unit]
    Description=Apache ZooKeeper
    After=network.target
    
    [Service]
    Type=simple
    ExecStart=/path/to/kafka/bin/zookeeper-server-start.sh /path/to/kafka/config/zookeeper.properties
    ExecStop=/path/to/kafka/bin/zookeeper-server-stop.sh
    Restart=on-abnormal
    
    [Install]
    WantedBy=multi-user.target
    
  2. Kafka Service File: Create a file named kafka.service in the /etc/systemd/system/ directory:

    [Unit]
    Description=Apache Kafka
    Requires=zookeeper.service
    After=zookeeper.service
    
    [Service]
    Type=simple
    ExecStart=/path/to/kafka/bin/kafka-server-start.sh /path/to/kafka/config/server.properties
    ExecStop=/path/to/kafka/bin/kafka-server-stop.sh
    Restart=on-abnormal
    
    [Install]
    WantedBy=multi-user.target
    
  3. Enable and Start Services: Enable and start the services using systemctl:

    # Reload systemd so it picks up the new unit files
    sudo systemctl daemon-reload
    
    sudo systemctl enable zookeeper
    sudo systemctl start zookeeper
    
    sudo systemctl enable kafka
    sudo systemctl start kafka
    

    You can now manage ZooKeeper and Kafka using standard systemctl commands (start, stop, status, restart).

Verifying the Installation

To verify that your Kafka setup is working correctly, you can perform some basic operations such as creating a topic, producing messages, and consuming messages.

  1. Creating a Topic:

    bin/kafka-topics.sh --create --topic test-topic --bootstrap-server localhost:9092 --partitions 1 --replication-factor 1
    

    You should see a confirmation message indicating that the topic has been created successfully.

  2. Producing Messages:

    bin/kafka-console-producer.sh --topic test-topic --bootstrap-server localhost:9092
    

    Type a few messages in the console and press Enter after each message.

  3. Consuming Messages: Open a new terminal window and run:

    bin/kafka-console-consumer.sh --topic test-topic --from-beginning --bootstrap-server localhost:9092
    

    You should see the messages you produced in the previous step.
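
Beyond the console tools, real applications use a client library. The sketch below is a minimal, hypothetical example using the third-party kafka-python package (installable with pip install kafka-python, which is not part of the Kafka distribution itself); it assumes a broker listening on localhost:9092 and the test-topic created above.

```python
# Minimal producer/consumer sketch using the third-party kafka-python
# package. Assumes `pip install kafka-python` and a broker on
# localhost:9092; adjust the topic and server to match your setup.
import json


def encode_event(value):
    """Serialize a Python object to UTF-8 JSON bytes for Kafka."""
    return json.dumps(value).encode("utf-8")


def main():
    # Imported here so the sketch reads cleanly even without the
    # kafka-python package installed.
    from kafka import KafkaProducer, KafkaConsumer

    producer = KafkaProducer(
        bootstrap_servers="localhost:9092",
        value_serializer=encode_event,
    )
    producer.send("test-topic", {"event": "hello"})
    producer.flush()  # block until the message is acknowledged

    consumer = KafkaConsumer(
        "test-topic",
        bootstrap_servers="localhost:9092",
        auto_offset_reset="earliest",  # read from the beginning
        consumer_timeout_ms=5000,      # give up if no message arrives
    )
    for record in consumer:
        print(record.value.decode("utf-8"))
        break


if __name__ == "__main__":
    main()
```

The value_serializer lets the producer accept plain Python objects, while the consumer receives raw bytes to decode as it sees fit; this separation of serialization from transport is a common pattern in Kafka clients.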

By following these steps, you should have a fully functional Apache Kafka environment set up on your system. This setup forms the foundation for developing and deploying real-time data streaming applications using Kafka.

Conclusion

Getting started with Apache Kafka can seem daunting, but with the right guidance, you can quickly get up to speed. This guide provided a practical introduction to Kafka, from installation and configuration to basic operations with the console producer and consumer. As you continue to explore Kafka, you will uncover its full potential for building robust, real-time data pipelines.

By following this guide, you’ve taken the first steps in mastering Apache Kafka.
