The I-BiDaaS platform is illustrated below:
The overall concept of the I-BiDaaS platform is as follows:
Data from heterogeneous sources are ingested into the solution. For early development scenarios in which sufficient real data is not available, we will use IBM's data fabrication platform. All data is ingested into the batch-processing module (structured (non-)convex optimization – UNSPMF) and the streaming-analytics modules (complex event processing – SAG; GPU-accelerated engine – FORTH) via the message broker (SAG), which is also responsible for transforming and pre-processing the raw data. The streaming-analytics modules perform analytics on the ingested streaming data, referencing historic information where necessary, to identify business patterns that have happened or are about to happen. The batch and real-time analytics results are fed to the advanced visualization tools (AEGIS). One innovation is that part of the analytics can be offloaded to the parallel GPU-accelerated engine to further speed up the execution of streaming analytics.

The collected data can be stored in Hecuba (BSC), which uses Apache Cassandra as a back-end; the stored data can then be processed by the COMPSs pool of distributed machine-learning algorithms, to be implemented during the course of the project (BSC's COMPSs model; UNSPMF's structured (non-)convex optimization algorithms for analytics). A further innovation is that the correlations produced by the analysis are fed back to the data fabrication platform, where they are used for training and help build the rules that drive future data generation. A final innovation is that the real-time processing module feeds the batch-processing module with inputs that enable periodic refinement of the models used by the machine-learning methods (the structure of non-convex regularizers, etc.). In terms of interleaving batch and stream processing, the proposed solution therefore goes beyond the traditional lambda architecture.
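The two feedback loops described above (streaming results refining the batch models, and analysis correlations feeding the data fabrication rules) can be sketched in plain Python. This is a minimal illustration only: the class and attribute names (`MessageBroker`, `refinement_hints`, `model_version`) are assumptions for the sketch, not actual I-BiDaaS or SAG/UNSPMF APIs, and real pre-processing and model refinement are far richer than shown here.

```python
from collections import deque

class MessageBroker:
    """Ingests raw records, applies toy pre-processing, and fans them
    out to the batch and streaming modules (SAG's role in the platform)."""
    def __init__(self):
        self.batch_queue = deque()
        self.stream_queue = deque()

    def ingest(self, raw_record):
        record = {k: str(v).strip() for k, v in raw_record.items()}
        self.batch_queue.append(record)
        self.stream_queue.append(record)
        return record

class StreamingModule:
    """Detects a simple 'pattern' and emits refinement hints for the batch
    module plus correlations for the data fabrication platform."""
    def __init__(self):
        self.refinement_hints = []   # fed to the batch-processing module
        self.correlations = []       # fed back to data fabrication rules

    def process(self, record):
        if record.get("amount", "0").isdigit() and int(record["amount"]) > 100:
            self.refinement_hints.append(("high_amount", record["amount"]))
            self.correlations.append("amount > 100 co-occurs with flagged events")

class BatchModule:
    """Periodically refines its model using hints from the streaming side."""
    def __init__(self):
        self.model_version = 0

    def refine(self, hints):
        if hints:
            self.model_version += 1  # stand-in for re-fitting the model structure
        return self.model_version

broker, stream, batch = MessageBroker(), StreamingModule(), BatchModule()
for raw in [{"amount": 50}, {"amount": 150}, {"amount": 200}]:
    stream.process(broker.ingest(raw))
new_version = batch.refine(stream.refinement_hints)
```

The point of the sketch is the wiring, not the analytics: the same ingested record reaches both processing paths, and the streaming side produces two kinds of feedback rather than only end results, which is what distinguishes this design from a plain lambda architecture.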
It uses a complex-event-analysis system combined with a hardware-accelerated implementation of streaming analytics that exploits a range of many-core accelerators (GPUs, Intel Xeon Phi, etc.).
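One way to realise such offloading is a dispatcher that routes large event batches to the accelerator and keeps small ones on the CPU, where kernel-launch overhead would dominate. The sketch below simulates the accelerated path in pure Python; the threshold value and function names are illustrative assumptions, and in the actual platform the accelerated path would run on FORTH's GPU-accelerated engine.

```python
# Hypothetical threshold above which offloading pays off.
CPU_BATCH_LIMIT = 64

def cpu_filter(events, predicate):
    """Baseline path: evaluate the predicate on the host CPU."""
    return [e for e in events if predicate(e)]

def accelerated_filter(events, predicate):
    """Placeholder for a many-core kernel (GPU / Intel Xeon Phi).
    Simulated here in plain Python so the sketch is self-contained."""
    return [e for e in events if predicate(e)]

def dispatch(events, predicate):
    """Offload large batches to the accelerator, keep small ones on the CPU."""
    if len(events) > CPU_BATCH_LIMIT:
        return "accelerator", accelerated_filter(events, predicate)
    return "cpu", cpu_filter(events, predicate)

path, hits = dispatch(list(range(100)), lambda e: e % 2 == 0)
```

A size-based heuristic is only the simplest policy; a production dispatcher could also weigh predicate complexity or current accelerator load.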
To make our solution easy for end-users to use, we will build a multi-purpose user interface (AEGIS) that serves different categories of users. The interface will provide different levels of abstraction, tailored to different levels of user expertise.

First, we will offer a programming API giving access to every level of our software stack. This will give experienced IT users the flexibility to utilise every aspect of our solution and to fine-tune their applications. The API will expose the high-level application components, such as the advanced machine-learning modules and the streaming analytics, as well as the low-level infrastructure layer, such as the scheduling and resource management of the underlying infrastructure.

Second, we will provide a domain language for access to the application layer. The purpose of this language is to offer an easy way to program data analytics (either batch or stream processing) without worrying about scalability or infrastructure placement. The user implements a task, which is then automatically distributed across nodes, scheduled onto the available resources, and performance-tuned at runtime, removing these burdens from the software developer. The domain language will further serve as a medium for a declarative query language with a simple, platform-independent SQL syntax.

Finally, in its simplest form, the interface will provide pre-defined analytics tasks that a non-IT user can easily combine and multiplex with the desired data sources to form a data-processing pipeline. The list of pre-defined tasks can easily be updated by the experienced IT users, either statically (e.g., during the deployment of our platform) or ad hoc (e.g., by monitoring the queries submitted by the experienced IT users).
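This simplest abstraction level can be sketched as a registry of pre-defined tasks that a non-IT user chains by name into a pipeline. The task names and the registry shape are illustrative assumptions for the sketch, not the actual AEGIS interface.

```python
# Registry of pre-defined analytics tasks, maintained by experienced IT users.
# IT users add or replace entries; non-IT users only pick names from the list.
TASKS = {
    "drop_nulls": lambda rows: [r for r in rows
                                if all(v is not None for v in r.values())],
    "to_float":   lambda rows: [{k: float(v) for k, v in r.items()}
                                for r in rows],
    "mean":       lambda rows: sum(r["value"] for r in rows) / len(rows),
}

def build_pipeline(task_names):
    """Chain pre-defined tasks, by name, into one data-processing pipeline."""
    def run(data):
        for name in task_names:
            data = TASKS[name](data)
        return data
    return run

# A non-IT user composes tasks without writing any analytics code.
pipeline = build_pipeline(["drop_nulls", "to_float", "mean"])
result = pipeline([{"value": "1"}, {"value": None}, {"value": "3"}])
```

Updating the task list statically or ad hoc then amounts to adding entries to the registry, which existing pipelines can pick up without being rewritten.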