Building a Real-Time Telemetry Platform for Ride-Hailing Services with IoT and Big Data

8 min read · Oct 8, 2025

In today’s connected world, the ability to capture, process, and analyze data in real time has become a key differentiator for mobility services. This article presents a Proof of Concept (POC) for a real-time telemetry platform for ride-hailing services, combining IoT, event streaming, stream processing, and time-series storage to deliver operational and analytical insights as events occur.

The project demonstrates how simulated IoT devices installed in taxis, private cars, and motorcycles can send telemetry data through mTLS-secured MQTT, process it with Apache Spark Streaming, store it in Cassandra, and visualize it in Grafana. The architecture is modular, scalable, and secure, using entirely open-source technologies.

Problem Statement

Ride-hailing services face several challenges when it comes to monitoring and analyzing real-time telemetry:

  1. Limited visibility into ride events
    Without a centralized system to track events such as vehicle type, ride status, number of passengers, fare, and surge multipliers, operators cannot fully understand ride dynamics or identify anomalies quickly.
  2. High-volume data processing
    Each vehicle generates multiple events per minute, and the platform must efficiently ingest, process, and store this telemetry data to maintain timely insights.
  3. Secure IoT communication
    Sensitive telemetry data must be transmitted securely, using mutual authentication and encryption, to prevent unauthorized access and man-in-the-middle attacks.
  4. Real-time analytical insights
    Unless data is processed and stored in real time, it is difficult to promptly detect patterns such as demand surges, anomalous rides, or changes in passenger behavior.
  5. Scalability and fault tolerance
    The system must handle thousands of simultaneous events without losing reliability or performance.

Platform Architecture

The Real-Time Ride-Hailing Telemetry Platform is designed to handle high-volume IoT events, process them in real time, and provide actionable insights for ride-hailing telemetry. The architecture emphasizes secure ingestion, streaming processing, and time-series storage, all orchestrated via Docker Compose.

Architecture Diagram

Key Components

IoT Devices (Ride Nodes)
Simulated ride-hailing vehicles (taxis, private cars, and motorcycles) act as IoT nodes, emitting telemetry events. Each event contains:

  • Device ID and vehicle type
  • Trip ID
  • Pickup and dropoff zones
  • Distance traveled, fare amount, and surge multiplier
  • Number of passengers
  • Ride status (active or completed)
  • Timestamp of the event
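
To make the event structure concrete, here is a minimal sketch of how one such event could be modeled and serialized in Python. The field names (`device_id`, `distance_km`, `event_ts`, and so on) and the sample values are assumptions chosen for illustration; the article lists the fields but does not give the project's exact schema.

```python
import json
import time
import uuid
from dataclasses import asdict, dataclass


@dataclass
class RideEvent:
    """One telemetry event. Field names are illustrative, not the project's schema."""
    device_id: str
    vehicle_type: str        # e.g. "taxi", "private_car", "motorcycle"
    trip_id: str
    pickup_zone: str
    dropoff_zone: str
    distance_km: float
    fare_amount: float
    surge_multiplier: float
    passenger_count: int
    ride_status: str         # "active" or "completed"
    event_ts: int            # epoch seconds

    def to_json(self) -> str:
        """Serialize the event as the JSON payload a ride node would publish."""
        return json.dumps(asdict(self))


# Hypothetical sample event from a simulated taxi.
event = RideEvent(
    device_id="taxi-001",
    vehicle_type="taxi",
    trip_id=str(uuid.uuid4()),
    pickup_zone="Downtown",
    dropoff_zone="Airport",
    distance_km=12.4,
    fare_amount=31.5,
    surge_multiplier=1.2,
    passenger_count=2,
    ride_status="active",
    event_ts=int(time.time()),
)
payload = event.to_json()
```

A simulated ride node would publish `payload` to an MQTT topic at a fixed interval.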

MQTT Broker (Mosquitto)
All telemetry events are sent via MQTT, a lightweight protocol suitable for IoT. Communication is secured using mutual TLS (mTLS), ensuring encrypted and authenticated message transmission from ride nodes to the broker.
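
As a rough sketch of the client side of mTLS, the snippet below builds a TLS context with Python's standard `ssl` module. The function name and the idea of passing file paths as arguments are assumptions for illustration; the certificate paths used by the actual project are not shown in the article. With the paho-mqtt client, a context like this can be handed over through `tls_set_context`, and on the broker side Mosquitto's `require_certificate true` listener option is what makes the client certificate mandatory.

```python
import ssl
from typing import Optional


def build_mtls_context(
    ca_file: Optional[str] = None,
    cert_file: Optional[str] = None,
    key_file: Optional[str] = None,
) -> ssl.SSLContext:
    """Build a client-side TLS context for mutual authentication.

    PROTOCOL_TLS_CLIENT enables server certificate verification and
    hostname checking by default. Paths are optional here only so the
    sketch can run without certificate files on disk.
    """
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    if ca_file:
        # CA that signed the broker certificate (verifies the server).
        context.load_verify_locations(cafile=ca_file)
    if cert_file and key_file:
        # Device certificate and key (lets the broker verify the client).
        context.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return context


# Without paths this only demonstrates the verification defaults.
context = build_mtls_context()
```

In a real device, all three paths would be supplied so that both directions of the handshake are authenticated.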

Kafka & Kafka Connect
MQTT messages are forwarded to Apache Kafka through Kafka Connect, providing a durable, scalable, and decoupled streaming backbone. Kafka buffers the events and enables downstream consumers to process them in parallel.

Spark Streaming (Ride Stream Processor)
The Spark Streaming application consumes events from Kafka and performs the following operations:

  • Base64 decoding of the payload and parsing into structured JSON
  • Data validation, filtering out events with zero distance, fare, or passengers
  • Conversion of epoch timestamps to Spark TimestampType
  • Aggregations by pickup zone and hour, calculating metrics such as total rides, total fare, average distance, average passengers, average surge, active/completed rides, revenue per km, and revenue per passenger
  • Writing both clean events and aggregated metrics into Cassandra
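
The first three operations can be illustrated in plain Python, outside Spark, to show the per-event logic. This is a sketch under stated assumptions, not the project's Spark job: the field names (`distance_km`, `fare_amount`, `passenger_count`, `event_ts`) are hypothetical, and the validation here is slightly stricter than described, since it also rejects missing or negative values.

```python
import base64
import json
from datetime import datetime, timezone
from typing import Optional

# Hypothetical names for the fields that must be greater than zero.
REQUIRED_POSITIVE = ("distance_km", "fare_amount", "passenger_count")


def decode_event(raw_value: str) -> Optional[dict]:
    """Decode a Base64 payload, validate it, and attach a UTC timestamp.

    Returns None for payloads that cannot be parsed or fail validation.
    """
    try:
        event = json.loads(base64.b64decode(raw_value))
    except ValueError:
        # Covers invalid Base64, invalid UTF-8, and invalid JSON.
        return None
    if not isinstance(event, dict):
        return None
    try:
        valid = all(float(event.get(name, 0)) > 0 for name in REQUIRED_POSITIVE)
        event_time = datetime.fromtimestamp(int(event["event_ts"]), tz=timezone.utc)
    except (KeyError, TypeError, ValueError, OverflowError, OSError):
        return None
    if not valid:
        return None
    return {**event, "event_time": event_time}


# Example: a well-formed event and one with a zero fare.
good_raw = base64.b64encode(json.dumps(
    {"distance_km": 3.0, "fare_amount": 9.5, "passenger_count": 1, "event_ts": 0}
).encode()).decode()
bad_raw = base64.b64encode(json.dumps(
    {"distance_km": 3.0, "fare_amount": 0, "passenger_count": 1, "event_ts": 0}
).encode()).decode()

clean = decode_event(good_raw)
rejected = decode_event(bad_raw)
```

In Spark, the same steps would typically be expressed as column transformations over the Kafka `value` column rather than as a Python function applied row by row.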

Cassandra
Apache Cassandra stores both raw events and aggregated metrics. Its write-optimized, partitioned storage model suits time-series telemetry, allowing fast writes and low-latency queries over large volumes of ride events.

Grafana Dashboards
Cassandra data is visualized in Grafana, providing operational dashboards that display:

  • Ride counts by pickup zone and hour
  • Average fare, distance, and passengers
  • Surge multiplier trends
  • Active vs. completed rides
  • Revenue per kilometer and per passenger
Grafana Ride Hailing — Dashboard

Architecture Flow

The full pipeline is summarized as:

Ride Nodes → MQTT (mTLS-secured) → Kafka → Spark Streaming → Cassandra → Grafana

This design ensures secure, real-time processing and analytics, while remaining modular and extensible for additional telemetry sources, metrics, or analytics services in the future.

Step-by-Step Data Flow with Metrics Examples

The ride-hailing telemetry pipeline captures, processes, and visualizes events end to end in real time. Each step in the flow is broken down below:

Telemetry Emission → MQTT

Simulated IoT devices installed in taxis, private cars, and motorcycles periodically send ride events to the Mosquitto MQTT broker. Each event includes:

  • Device ID and vehicle type
  • Trip ID
  • Pickup and dropoff zones
  • Distance traveled (km)
  • Fare amount
  • Surge multiplier
  • Number of passengers
  • Ride status (active or completed)
  • Timestamp

All communication is protected using mutual TLS (mTLS), ensuring only authenticated devices can publish events.

Event Forwarding → Kafka via Kafka Connect

Kafka Connect bridges the MQTT broker to Apache Kafka, providing a scalable, decoupled, and durable buffer. This allows multiple downstream consumers to process the same events independently without interfering with the ingestion process.
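
The article does not say which MQTT connector is used. As one possible shape for this bridge, the sketch below builds a connector definition assuming Confluent's MQTT source connector; the connector name, topic names, and host names are hypothetical, and the TLS keystore and truststore settings needed for mTLS are left out.

```python
import json


def build_mqtt_source_config(
    mqtt_uri: str, mqtt_topic: str, kafka_topic: str, bootstrap_servers: str
) -> dict:
    """Build a Kafka Connect connector definition for an MQTT source.

    Assumes Confluent's MQTT source connector; names and values are
    illustrative, not taken from the project.
    """
    return {
        "name": "mqtt-ride-source",  # hypothetical connector name
        "config": {
            "connector.class": "io.confluent.connect.mqtt.MqttSourceConnector",
            "tasks.max": "1",
            "mqtt.server.uri": mqtt_uri,
            "mqtt.topics": mqtt_topic,
            "kafka.topic": kafka_topic,
            # Pass the MQTT payload through to Kafka as raw bytes.
            "value.converter": "org.apache.kafka.connect.converters.ByteArrayConverter",
            "confluent.topic.bootstrap.servers": bootstrap_servers,
            "confluent.topic.replication.factor": "1",
        },
    }


connector = build_mqtt_source_config(
    mqtt_uri="ssl://mosquitto:8883",   # hypothetical broker address
    mqtt_topic="rides/telemetry",      # hypothetical MQTT topic
    kafka_topic="ride-events",         # hypothetical Kafka topic
    bootstrap_servers="kafka:9092",    # hypothetical Kafka address
)
request_body = json.dumps(connector, indent=2)
```

A definition like `request_body` is registered by sending it in a POST request to the `/connectors` endpoint of the Kafka Connect REST API.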

Real-Time Processing → Spark Streaming

The Ride Stream Processor consumes Kafka events and executes several key steps:

  1. Decode and parse the Base64 payload into a structured JSON format
  2. Validate data, filtering out events where distance, fare, or passengers are zero
  3. Convert timestamps from epoch seconds to Spark TimestampType
  4. Aggregate metrics by pickup zone and hour, computing:
      • Total rides (total_rides)
      • Total fare (total_fare)
      • Average distance (avg_distance)
      • Average passengers per ride (avg_passengers)
      • Average surge multiplier (avg_surge)
      • Active and completed rides (active_rides / completed_rides)
      • Revenue per kilometer (revenue_per_km)
      • Revenue per passenger (revenue_per_passenger)

Both clean events and aggregated metrics are written to Cassandra for storage and further analysis.
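
To show what the aggregation step computes, here is a plain-Python sketch of the same grouping over a small batch of already validated events. It is not the Spark implementation. The input field names are hypothetical, `total_rides` simply counts events in each group, and the two revenue ratios are assumed to be total fare divided by total distance and by total passengers, since the article names the metrics without giving formulas.

```python
from collections import defaultdict
from datetime import datetime, timezone


def aggregate_by_zone_and_hour(events):
    """Group events by (pickup zone, hour) and compute the metrics named above.

    Assumes events are already validated, so total distance and total
    passengers in every group are greater than zero.
    """
    groups = defaultdict(list)
    for e in events:
        hour = e["event_time"].replace(minute=0, second=0, microsecond=0)
        groups[(e["pickup_zone"], hour)].append(e)

    results = {}
    for key, rows in groups.items():
        n = len(rows)
        total_fare = sum(r["fare_amount"] for r in rows)
        total_distance = sum(r["distance_km"] for r in rows)
        total_passengers = sum(r["passenger_count"] for r in rows)
        results[key] = {
            "total_rides": n,
            "total_fare": total_fare,
            "avg_distance": total_distance / n,
            "avg_passengers": total_passengers / n,
            "avg_surge": sum(r["surge_multiplier"] for r in rows) / n,
            "active_rides": sum(r["ride_status"] == "active" for r in rows),
            "completed_rides": sum(r["ride_status"] == "completed" for r in rows),
            "revenue_per_km": total_fare / total_distance,
            "revenue_per_passenger": total_fare / total_passengers,
        }
    return results


# Two hypothetical events in the same zone and hour.
sample = [
    {"pickup_zone": "Downtown", "fare_amount": 10.0, "distance_km": 2.0,
     "passenger_count": 1, "surge_multiplier": 1.0, "ride_status": "active",
     "event_time": datetime(2025, 10, 8, 9, 5, tzinfo=timezone.utc)},
    {"pickup_zone": "Downtown", "fare_amount": 20.0, "distance_km": 4.0,
     "passenger_count": 2, "surge_multiplier": 2.0, "ride_status": "completed",
     "event_time": datetime(2025, 10, 8, 9, 40, tzinfo=timezone.utc)},
]
metrics = aggregate_by_zone_and_hour(sample)
```

In a streaming job the grouping would run over a time window rather than a fixed list, but the per-group arithmetic is the same.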

Persistent Storage → Cassandra

Cassandra tables are structured to handle time-series telemetry data efficiently:

  • Raw events table stores individual ride events for historical reference and detailed analytics
  • Aggregates table stores pre-computed metrics by pickup zone and hour for fast queries and visualization

This setup allows the system to handle high-throughput ingestion while ensuring low-latency access to both detailed and aggregated data.
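
The article does not include the table definitions, so the following is only a sketch of what a time-series layout for these two tables might look like, written as CQL statements held in Python strings. The keyspace, table, and column names are all hypothetical. The design idea is to partition raw events by pickup zone and day so partitions stay bounded, and to cluster by time in descending order so the newest rows are read first.

```python
# Hypothetical CQL for the raw events table: one partition per zone and day.
RAW_EVENTS_DDL = """
CREATE TABLE IF NOT EXISTS ride_telemetry.ride_events (
    pickup_zone text,
    event_date date,
    event_time timestamp,
    trip_id text,
    device_id text,
    vehicle_type text,
    dropoff_zone text,
    distance_km double,
    fare_amount double,
    surge_multiplier double,
    passenger_count int,
    ride_status text,
    PRIMARY KEY ((pickup_zone, event_date), event_time, trip_id)
) WITH CLUSTERING ORDER BY (event_time DESC, trip_id ASC);
"""

# Hypothetical CQL for the aggregates table: one row per zone and hour.
HOURLY_METRICS_DDL = """
CREATE TABLE IF NOT EXISTS ride_telemetry.zone_hourly_metrics (
    pickup_zone text,
    hour_start timestamp,
    total_rides bigint,
    total_fare double,
    avg_distance double,
    avg_passengers double,
    avg_surge double,
    active_rides bigint,
    completed_rides bigint,
    revenue_per_km double,
    revenue_per_passenger double,
    PRIMARY KEY ((pickup_zone), hour_start)
) WITH CLUSTERING ORDER BY (hour_start DESC);
"""

SCHEMA_STATEMENTS = [RAW_EVENTS_DDL.strip(), HOURLY_METRICS_DDL.strip()]
```

With this layout, a dashboard query for one zone's recent hours reads a single partition of the aggregates table.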

Visualization → Grafana Dashboards

Grafana connects to Cassandra and visualizes both raw and aggregated metrics in interactive dashboards. Example metrics displayed include:

  • Hourly ride counts per pickup zone
  • Average fare, distance, and passengers per ride
  • Surge multiplier trends across zones
  • Active vs. completed rides
  • Revenue per kilometer and per passenger

Dashboards allow operators or analysts to quickly identify trends, anomalies, and performance metrics in real time.

Grafana Ride Hailing — Dashboard

This flow provides a secure, end-to-end telemetry pipeline that transforms raw IoT events into actionable operational insights and is fully reproducible using open-source technologies.

Technologies Used and Their Role in the Ecosystem

The Real-Time Ride-Hailing Telemetry Platform leverages a set of open-source technologies to build a secure, scalable, and observable IoT streaming pipeline. Each component plays a specific role in the architecture:

Python

Used as the main language for simulating ride-hailing devices and developing the Spark Streaming application. Python enables quick development of IoT clients, data parsing, and integration with Spark and Cassandra.

MQTT (Mosquitto)

A lightweight messaging protocol for IoT telemetry. The Mosquitto broker receives ride events from simulated devices. Communication is protected using mutual TLS (mTLS) to ensure encryption and authentication.

Apache Kafka & Kafka Connect

Kafka acts as a distributed, durable event bus. Kafka Connect bridges MQTT to Kafka, decoupling ingestion from downstream processing. This allows Spark Streaming to consume events in parallel, ensuring reliable, scalable, and fault-tolerant processing.

Apache Spark Streaming

Spark Streaming is the real-time processing engine of the platform. It performs:

  • Base64 decoding and JSON parsing
  • Data validation and cleaning
  • Timestamp conversion
  • Aggregations by pickup zone and hour
  • Computation of metrics such as total rides, total fare, average distance, average passengers, surge, active/completed rides, revenue per km, and revenue per passenger

Spark ensures low-latency processing of high-volume telemetry streams.

Apache Cassandra

Cassandra stores both raw telemetry events and aggregated metrics in a time-series optimized schema. Its horizontal scalability and fault tolerance make it suitable for large volumes of ride-hailing data, providing fast writes and efficient reads for dashboards and analytics.

Grafana

Grafana visualizes telemetry data stored in Cassandra. Dashboards display operational metrics and aggregates in interactive charts and graphs, enabling real-time monitoring and analytical insights.

Docker & Docker Compose

The entire platform is containerized, allowing consistent deployment across environments. Docker Compose orchestrates all services, including Kafka, Zookeeper, Cassandra, Grafana, MQTT broker, Spark Streaming, and ride node simulations.

Key Benefits of This Tech Stack

  • End-to-end security: mTLS ensures authenticated and encrypted telemetry communication
  • Real-time analytics: Spark Streaming computes operational metrics with low latency as events arrive
  • High throughput and reliability: Kafka and Cassandra handle large volumes of events with fault tolerance
  • Extensibility: Modular architecture allows new devices, telemetry types, or analytics to be added easily
  • Observability: Grafana dashboards make key metrics actionable and easy to monitor

Conclusion

The Real-Time Ride-Hailing Telemetry Platform demonstrates how IoT telemetry, event streaming, real-time processing, and time-series storage can be combined to create a secure and scalable analytics pipeline. By simulating ride-hailing vehicles, this POC shows how raw telemetry can be ingested, validated, aggregated, and visualized in real time, providing actionable insights for operational and analytical purposes.

Key takeaways include:

  • Secure ingestion with mTLS ensures that only authenticated devices can publish telemetry, protecting data integrity and privacy.
  • Kafka and Kafka Connect decouple ingestion from processing, providing durability and scalability for high-volume events.
  • Spark Streaming enables low-latency processing, data validation, and metric aggregation, transforming raw events into meaningful analytics.
  • Cassandra provides a robust, time-series optimized backend for both raw and aggregated telemetry data.
  • Grafana dashboards deliver real-time insights, making it easy to monitor performance metrics, detect anomalies, and analyze operational trends.

This platform also highlights a modular and extensible design, allowing future expansion to additional IoT devices, new metrics, or custom analytics pipelines.

While this POC is not intended for production deployment, it provides a strong foundation for learning how to integrate IoT, streaming, and analytics technologies in real-time operational scenarios. It demonstrates that with open-source tools, complex telemetry pipelines can be built, observed, and analyzed efficiently.

Written by Sergio Sánchez Sánchez
