DSpace Repository

A Distributed Stream Processing Middleware Framework for Real-Time Analysis of Heterogeneous Data on Big Data Platform: Case of Environmental Monitoring

dc.contributor.author Akanbi, Adeyinka
dc.contributor.author Masinde, Muthoni
dc.date.accessioned 2023-05-08T05:45:01Z
dc.date.available 2023-05-08T05:45:01Z
dc.date.issued 2020-06-03
dc.identifier.other https://doi.org/10.3390/s20113166
dc.identifier.uri http://hdl.handle.net/11462/2454
dc.description Article en_US
dc.description.abstract In recent years, the application and wide adoption of Internet of Things (IoT)-based technologies have increased the proliferation of monitoring systems, which has in turn exponentially increased the amount of heterogeneous data generated. Processing and analysing the massive amount of data produced is cumbersome and is gradually moving from classical ‘batch’ processing, i.e., the extract, transform, load (ETL) technique, to real-time processing. For instance, in the environmental monitoring and management domain, time-series data and historical datasets are crucial for prediction models. However, the environmental monitoring domain still utilises legacy systems, which complicates real-time analysis of the essential data and integration with big data platforms, and prolongs the reliance on batch processing. Herein, as a solution, a distributed stream processing middleware framework for real-time analysis of heterogeneous environmental monitoring and management data is presented and tested on a cluster using open source technologies in a big data environment. The system ingests datasets from legacy systems and sensor data from heterogeneous automated weather systems, irrespective of the data types, into Apache Kafka topics using Kafka Connect APIs for processing by the Kafka streaming processing engine. The stream processing engine executes the predictive numerical models and algorithms represented in event processing (EP) languages for real-time analysis of the data streams. To prove the feasibility of the proposed framework, we implemented the system using a case study scenario of drought prediction and forecasting based on the Effective Drought Index (EDI) model. Firstly, we transform the predictive model into a form that can be executed by the streaming engine for real-time computing. Secondly, the model is applied to the ingested data streams and datasets to predict drought through persistent querying of the infinite streams to detect anomalies. Finally, a performance evaluation of the distributed stream processing middleware infrastructure is carried out to determine the real-time effectiveness of the framework. en_US
dc.language.iso en en_US
dc.publisher Sensors 2020, 20(11), 3166 en_US
dc.relation.ispartofseries Sensors;2020, 20(11), 3166
dc.subject Big Data en_US
dc.subject Stream processing en_US
dc.subject Middleware en_US
dc.subject Internet of Things en_US
dc.subject Apache Kafka en_US
dc.subject Drought en_US
dc.title A Distributed Stream Processing Middleware Framework for Real-Time Analysis of Heterogeneous Data on Big Data Platform: Case of Environmental Monitoring en_US
dc.type Article en_US

