Query Real Time Data in Kafka Using SQL


News category: Programming
Source: dev.to

Apache Kafka is a distributed streaming platform that allows you to store and process real-time data streams. It is commonly used in modern data architectures to capture and analyze user interactions with web and mobile applications, as well as IoT device data, logs, and system metrics. It is often used for real-time data processing, data pipelines, and event-driven applications. However, querying data stored in Kafka can be challenging, especially for users who are more comfortable with SQL than with Kafka's native APIs. This is where streaming SQL engines and streaming databases can help: they make it possible to run SQL directly on streaming data.

In this article, you are going to learn why streaming SQL is beneficial, how to use tools such as RisingWave to query data in a Kafka topic with SQL, and how to perform some basic analysis on the real-time data.

Learning objectives

You will learn the following throughout the article:

  • What SQL-based stream processing is.
  • Why SQL is your key to querying Kafka.
  • What other benefits SQL offers over Kafka's native APIs.
  • Real-world example use cases for querying Kafka with SQL.
  • Analyzing order delivery performance using materialized views and SQL queries.

Why SQL-Based Stream Processing?

There are many ways to process and manipulate data. We can do it via a scripting language, in an Excel spreadsheet, or even by writing lower-level code in Java, C++, Python, or some other language.

In effect, there are three options in stream processing:

  • Low-level code or APIs.
  • SQL-based processing.
  • UI-based building blocks that perform transformations at higher levels of definition.

In my opinion, SQL is the best solution: a great compromise between the other two choices when you consider overall power, speed, and ease of use. This brings us to the streaming SQL approach, which extends SQL with the ability to process real-time data streams.

Additionally, one of the challenges of working with Kafka is how to efficiently analyze and extract insights from the large volumes of data stored in Kafka topics. Traditional batch processing approaches, such as Hadoop MapReduce or Apache Spark, can be slow and expensive, and may not be suitable for real-time analytics.
To address this challenge, you can use SQL queries with Kafka to analyze and extract insights from the data in real time.
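
For a taste of what this looks like, here is a minimal sketch of a continuous aggregation in streaming SQL. The clicks stream and its page_id column are hypothetical, and the dialect follows the RisingWave/PostgreSQL style used later in this article:

CREATE MATERIALIZED VIEW clicks_per_page AS
SELECT
    page_id,
    COUNT(*) AS click_count  -- continuously updated as new events arrive
FROM
    clicks                   -- a stream, not a static table
GROUP BY
    page_id;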

SQL over Kafka provides several benefits

Querying Kafka with SQL can provide several benefits over using Kafka's native APIs. Here are a few reasons why you might want to use SQL to query Kafka:

Familiarity with SQL: Many developers and data analysts are more familiar with SQL than with Kafka's native APIs. Using SQL can make it easier for these users to interact with Kafka and query the data stored in Kafka topics.

Abstraction from Kafka's API complexity: Kafka's native APIs can be complex and require a lot of boilerplate code to read, write, and manipulate data. Using SQL can provide an abstraction layer that simplifies the interaction with Kafka and hides the complexity of the Kafka API.

Standardization: SQL is a standard query language that is widely used in the industry. Using SQL to query Kafka can provide a standardized way to interact with Kafka, making it easier to integrate Kafka with other systems and tools that support SQL.

Flexibility: SQL provides a wide range of query capabilities, including filtering, sorting, aggregating, and joining data. Using SQL to query Kafka can provide more flexibility in querying and analyzing the data stored in Kafka topics.

SQL is also very rich. It's easy to define filtering with WHERE clauses, define column transformations, and do conditional manipulations using CASE statements. Different types of objects can be joined (JOIN), grouped (GROUP BY), and aggregated. Whereas with databases you're typically joining tables, in streaming cases you're joining streams, windows, and caches to produce results. It's very easy to do that in SQL.
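
For illustration, a single query can combine all of these constructs. This is a hypothetical sketch; the orders stream and restaurants table are placeholders:

CREATE MATERIALIZED VIEW large_orders_by_city AS
SELECT
    r.city,
    COUNT(*) AS order_count,
    -- conditional logic with CASE
    SUM(CASE WHEN o.amount > 100 THEN 1 ELSE 0 END) AS large_order_count
FROM
    orders AS o
    -- joining a stream against reference data
    JOIN restaurants AS r ON o.restaurant_id = r.id
WHERE
    o.order_state = 'CREATED'  -- filtering with WHERE
GROUP BY
    r.city;                    -- aggregating with GROUP BY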

Most streaming database technologies use SQL for these reasons: RisingWave, Materialize, ksqlDB, Apache Flink, and others all offer SQL interfaces. This post explains how to choose the right streaming database.

What are the real-world example use cases for querying Kafka with SQL?

There are plenty of real-world use cases for querying streaming data with SQL, and here are a few of them, each with a demo built on RisingWave.

RisingWave is an open-source distributed SQL database for stream processing. It accepts data from sources such as Apache Kafka, Apache Pulsar, Amazon Kinesis, and Redpanda, as well as from MySQL and PostgreSQL databases via native change data capture (CDC) connections. It uses the concept of a materialized view, which caches the outcome of your query operations and is quite efficient for long-running stream processing queries.


  1. Real-time ad performance analysis based on user interactions with websites or mobile applications.

  2. Automated anomaly detection for server performance, which can be life-changing for DevOps teams.

  3. Processing events from social media platforms and analyzing real-time activities.

  4. Monitoring live stream metrics, such as video quality and view count.

In short, you can use this approach to build almost any real-time application faster.

Analyzing order delivery performance (Demo)

Analyzing order delivery performance is critical for any e-commerce business. Understanding how quickly and effectively orders are being delivered can help identify bottlenecks, improve customer satisfaction, and ultimately drive revenue. For example, to analyze food order delivery performance, we can leverage Kafka streams and SQL queries on the RisingWave streaming database to extract and analyze data in real time.

In the demo tutorial, we'll use the following GitHub repository, where we assume that everything necessary is set up via Docker Compose.

With this configuration, Docker initiates a demo cluster with all RisingWave components, including the frontend node, compute node, metadata node, and MinIO. The workload generator starts generating random mock data and feeding it into Kafka topics. In this demo cluster, the data of materialized views is stored in the MinIO instance.

We have a Kafka topic named delivery_orders that contains events for every order placed on an e-commerce website. Each event includes information about the order, such as the order ID, restaurant ID, and delivery status.

Before You Begin

To complete this tutorial, you need the following:

  • Git, to clone the demo repository.
  • Docker and Docker Compose, to run the demo cluster.
  • The psql client (shipped with PostgreSQL), to connect to RisingWave.

Step 1: Set up the demo cluster

First, clone the RisingWave repository to your local environment.

git clone https://github.com/risingwavelabs/risingwave.git

Then, navigate to the integration_tests/delivery directory and start the demo cluster from the Docker Compose file.

cd risingwave/integration_tests/delivery
docker compose up -d

Make sure that all containers are up and running!
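
You can list the demo's containers and their status with:

docker compose ps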

Step 2: Create a data stream source for Kafka

To connect RisingWave to Kafka, we need to configure a new data ingestion source.

Open a new SQL shell. We are going to use the PostgreSQL interactive terminal, psql, to run queries and retrieve results from RisingWave.
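
Assuming the demo cluster's defaults (frontend on localhost:4566, database dev, user root; check the demo's README if yours differ), you can connect like this:

psql -h localhost -p 4566 -d dev -U root

Once connected, create a Kafka source to allow RisingWave to access messages in the delivery_orders topic.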

CREATE SOURCE delivery_orders_source (
    order_id BIGINT,
    restaurant_id BIGINT,
    order_state VARCHAR,
    order_timestamp TIMESTAMP
) WITH (
    connector = 'kafka',                                   -- ingest from Kafka
    topic = 'delivery_orders',                             -- the topic to read
    properties.bootstrap.server = 'message_queue:29092',   -- broker address inside the Docker network
    scan.startup.mode = 'earliest'                         -- start from the earliest offset
) ROW FORMAT JSON;
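
If the statement succeeds, the new source should show up in RisingWave's catalog, which you can check with:

SHOW SOURCES;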

Step 3: Define a materialized view

Now we have connected RisingWave to the Kafka stream, but RisingWave has not started to consume data yet. To extract only the data we are interested in and to speed up queries, we need to define materialized views. After a materialized view is created, RisingWave will start to consume data from the Kafka topic.

Let's assume we want to calculate, in real time, the total number of orders created for a specific restaurant within the last 15 minutes. We can use the following SQL query to achieve this:

CREATE MATERIALIZED VIEW restaurant_orders AS
SELECT
    window_start,
    restaurant_id,
    COUNT(*) AS total_order
FROM
    -- 15-minute windows that slide forward every 1 minute
    HOP(delivery_orders_source, order_timestamp, INTERVAL '1' MINUTE, INTERVAL '15' MINUTE)
WHERE
    order_state = 'CREATED'
GROUP BY
    restaurant_id,
    window_start;

Let's look at the materialized view we have just created. It uses the HOP() time window function to assign each order, based on its order_timestamp, to overlapping 15-minute windows that start every minute.
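
If you wanted non-overlapping windows instead, RisingWave also offers a TUMBLE() function with a similar shape. A sketch, with only a window size because tumbling windows do not slide:

SELECT window_start, restaurant_id, COUNT(*) AS total_order
FROM TUMBLE(delivery_orders_source, order_timestamp, INTERVAL '15' MINUTE)
WHERE order_state = 'CREATED'
GROUP BY restaurant_id, window_start;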

Step 4: Run a streaming query on the Kafka topic

Now we can write a simple streaming query that reads from the materialized view, which RisingWave keeps up to date with the messages arriving from Kafka.

SELECT * FROM restaurant_orders WHERE restaurant_id = 1;

You should see that RisingWave has executed the query and returned the results:

    window_start     | restaurant_id | total_order
---------------------+---------------+-------------
 2023-03-18 16:50:00 |             1 |         120
 2023-03-18 17:20:00 |             1 |          80
 2023-03-18 17:30:00 |             1 |          14
 2023-03-18 18:17:00 |             1 |          18
 2023-03-18 18:41:00 |             1 |         166
 2023-03-18 18:49:00 |             1 |         176
 2023-03-18 19:24:00 |             1 |           1
 2023-03-19 12:22:00 |             1 |           5
 2023-03-19 12:55:00 |             1 |         188
 2023-03-19 13:02:00 |             1 |         214
 2023-03-19 13:46:00 |             1 |         191
 2023-03-18 16:35:00 |             1 |           8
 2023-03-18 16:55:00 |             1 |         147
 2023-03-18 19:08:00 |             1 |          70
 2023-03-18 19:18:00 |             1 |          16
 2023-03-19 12:32:00 |             1 |          42
 2023-03-19 13:19:00 |             1 |         207
 2023-03-19 13:55:00 |             1 |         195
 2023-03-19 14:01:00 |             1 |         188
 2023-03-18 17:06:00 |             1 |         172
 2023-03-18 17:15:00 |             1 |         120
 2023-03-18 17:28:00 |             1 |          22
 2023-03-18 18:36:00 |             1 |         139
 2023-03-18 18:46:00 |             1 |         188
 2023-03-18 18:58:00 |             1 |         144
 2023-03-18 19:12:00 |             1 |          44
 2023-03-19 12:52:00 |             1 |         174

We can also use SQL queries to compute more complex metrics, such as the delivery success rate. Here's an example SQL query that calculates the delivery success rate for each restaurant:

CREATE MATERIALIZED VIEW restaurant_delivery_success_rate AS
SELECT
    restaurant_id,
    -- cast to avoid integer division, which would truncate every rate to 0 or 1
    SUM(CASE WHEN order_state = 'DELIVERED' THEN 1 ELSE 0 END)::DOUBLE PRECISION
        / COUNT(*) AS delivery_success_rate
FROM
    delivery_orders_source
GROUP BY
    restaurant_id;

This query groups the events by restaurant_id and calculates the delivery success rate for each restaurant: the number of delivered orders divided by the total number of orders. The cast avoids integer division, which would truncate every rate to 0 or 1.
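
Querying the view returns one row per restaurant:

SELECT * FROM restaurant_delivery_success_rate;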

 restaurant_id | delivery_success_rate
---------------+-----------------------
             0 |                     0
             1 |                     1
             2 |                     0

You can do further analysis depending on your streaming data structure. For example, you might calculate the average delivery time and cost for orders within a certain time period.
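
As a sketch of what that could look like, assuming each order emits exactly one CREATED event and later one DELIVERED event (the view name avg_delivery_time is hypothetical), you could self-join the source on order_id:

CREATE MATERIALIZED VIEW avg_delivery_time AS
SELECT
    c.restaurant_id,
    -- average seconds between order creation and delivery
    AVG(EXTRACT(EPOCH FROM (d.order_timestamp - c.order_timestamp))) AS avg_delivery_seconds
FROM
    delivery_orders_source AS c
    JOIN delivery_orders_source AS d ON c.order_id = d.order_id
WHERE
    c.order_state = 'CREATED'
    AND d.order_state = 'DELIVERED'
GROUP BY
    c.restaurant_id;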

Optional Step: Stop the demo cluster

When you finish, you can remove the containers and the data they generated by running:
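
docker compose down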

Conclusion

By using SQL queries to extract and analyze the data in real-time, we can gain valuable insights into the delivery performance and identify areas for improvement. We can also use materialized views to store and update the computed metrics in real-time, allowing us to quickly retrieve and visualize the data.

Next Steps

Throughout the demo, we only covered some simple queries over the stream of data in Kafka. It is also possible to create a sink from a RisingWave materialized view to export the results to another Kafka topic, and there are many other capabilities: you can join two streams, handle late events, analyze large numbers of sub-streams, and aggregate output. You can explore them through the use case scenarios provided earlier.
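
For instance, a Kafka sink for the success-rate view might look roughly like this. This is a sketch based on RisingWave's Kafka sink connector; the FORMAT ... ENCODE clause and the force_append_only option vary by RisingWave version, and the output topic name is made up, so check the documentation before relying on it:

CREATE SINK delivery_rate_sink
FROM restaurant_delivery_success_rate
WITH (
    connector = 'kafka',
    properties.bootstrap.server = 'message_queue:29092',
    topic = 'delivery_success_rates'
) FORMAT PLAIN ENCODE JSON (force_append_only = 'true');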


Community

Join the RisingWave Community

About the author

Visit my personal blog: www.iambobur.com
