dc.description.abstract |
In recent years, the wide adoption of Internet of Things (IoT)-based technologies has driven a
proliferation of monitoring systems, which in turn has exponentially increased the volume of
heterogeneous data generated. Processing and analysing these massive volumes of data is cumbersome,
and practice is gradually shifting from classical 'batch' processing, such as the extract, transform,
load (ETL) technique, to real-time processing. For instance, in the environmental monitoring and
management domain, time-series data and historical datasets are crucial for prediction models.
However, this domain still relies on legacy systems and batch processing, which complicates
real-time analysis of essential data and integration with big data platforms. Herein, we present a
distributed stream processing middleware framework for real-time analysis of heterogeneous
environmental monitoring and management data, built with open-source technologies and tested on a
cluster in a big data environment. Using the Kafka Connect APIs, the system ingests datasets from
legacy systems and sensor data from heterogeneous automated weather systems, irrespective of data
type, into Apache Kafka topics for processing by the Kafka stream processing engine. The stream
processing engine executes predictive numerical models and algorithms expressed in event processing
(EP) languages for real-time analysis of the data streams. To demonstrate the feasibility of the
proposed framework, we implemented the system for a case study of drought prediction and
forecasting based on the Effective Drought Index (EDI) model. First, we transform the predictive
model into a form that can be executed by the streaming engine for real-time computation. Second,
the model is applied to the ingested data streams and datasets to predict drought by persistently
querying the infinite streams to detect anomalies. Finally, we evaluate the performance of the
distributed stream processing middleware infrastructure to determine the real-time effectiveness
of the framework. |
en_US |
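The step of turning the EDI model into something a streaming engine can evaluate per event can be sketched as follows. This is a minimal, hypothetical Python illustration of the arithmetic only, not the authors' Kafka Streams implementation. It assumes the usual EDI definition (effective precipitation EP summed over a duration of i days, DEP = EP - MEP, EDI = DEP / SD(DEP)). It also assumes that the climatological mean MEP and standard deviation of DEP have been precomputed from historical data and supplied as single values. The real index typically uses per-calendar-day climatology and a 365-day duration. The names `effective_precipitation`, `StreamingEDI`, `is_drought` and the -1.0 threshold are illustrative assumptions.

```python
from collections import deque
from typing import Deque, Iterable, Optional


def effective_precipitation(recent_first: Iterable[float]) -> float:
    """EP = sum over n = 1..i of (sum of the n most recent daily totals) / n."""
    ep = 0.0
    running = 0.0
    for n, p in enumerate(recent_first, start=1):
        running += p       # total precipitation of the n most recent days
        ep += running / n  # mean daily precipitation over those n days
    return ep


class StreamingEDI:
    """Hypothetical per-station operator emitting one EDI value per daily total.

    Assumption: mep and sd_dep are precomputed from historical datasets and
    passed in as single values (a simplification of per-day climatology).
    """

    def __init__(self, duration: int, mep: float, sd_dep: float) -> None:
        if duration < 1 or sd_dep <= 0:
            raise ValueError("duration must be >= 1 and sd_dep must be > 0")
        self.duration = duration
        # Bounded window: appendleft() drops the oldest value once full.
        self.window: Deque[float] = deque(maxlen=duration)
        self.mep = mep
        self.sd_dep = sd_dep

    def update(self, daily_precip: float) -> Optional[float]:
        """Consume one daily precipitation event and return EDI, if defined."""
        self.window.appendleft(daily_precip)  # index 0 = most recent day
        if len(self.window) < self.duration:
            return None  # not enough history for a full summation period yet
        dep = effective_precipitation(self.window) - self.mep
        return dep / self.sd_dep


def is_drought(edi: float, threshold: float = -1.0) -> bool:
    """Flag an anomaly when EDI falls to or below an assumed drought threshold."""
    return edi <= threshold
```

For example, with `StreamingEDI(duration=2, mep=5.0, sd_dep=1.0)`, the first call `update(4.0)` returns `None`. The call `update(2.0)` returns `0.0`, and the call `update(0.0)` returns `-4.0`, which `is_drought` flags. Keeping a bounded window per station means each incoming event costs O(i) work, independent of stream length. In a Kafka Streams deployment, that state would live in a keyed state store rather than in a Python object.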