Software environment

In our work, we develop data systems used to collect, transform, ingest and analyse data of different modalities. These include data streams such as sensor readings (e.g. weather data from meteo stations) and public transport delay data, spatial data such as road network layers, tabular data, trip matrices from transport models, and more.
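As a minimal illustration of one preprocessing step for such streams, the sketch below averages weather-station readings per station over fixed, non-overlapping (tumbling) 10-minute windows. The `Reading` record layout, its field names, and the window length are hypothetical. The sketch runs over a finite Python list rather than inside a stream processing engine, and it is not code taken from the group's platforms.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    # Hypothetical record layout for a single meteo-station reading.
    station_id: str
    timestamp: int      # seconds since the Unix epoch
    temperature: float  # degrees Celsius


def tumbling_window_means(readings, window_s=600):
    """Mean temperature per (station, window start); windows are [k*window_s, (k+1)*window_s)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for r in readings:
        # Align each reading to the start of the window that contains it.
        window_start = r.timestamp - r.timestamp % window_s
        key = (r.station_id, window_start)
        sums[key] += r.temperature
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


if __name__ == "__main__":
    stream = [
        Reading("ST1", 0, 10.0),
        Reading("ST1", 300, 12.0),
        Reading("ST1", 650, 14.0),
        Reading("ST2", 100, 5.0),
    ]
    print(tumbling_window_means(stream))
    # {('ST1', 0): 11.0, ('ST1', 600): 14.0, ('ST2', 0): 5.0}
```

In a production pipeline the same grouping would typically be expressed as keyed tumbling windows in a stream processing engine, which also has to deal with late and out-of-order events that this sketch ignores.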

To make the processing of such varied data possible, we have developed extensible software platforms that combine relational database management systems (RDBMS) with Big Data platforms for data collection, storage, and event streaming, including Apache NiFi, Apache Hadoop, Apache Flink, and Apache Kafka. Important parts of these platforms are custom modules, including batch and stream jobs running on stream processing engines such as Apache Flink, and RESTful services that calculate features for machine learning tasks such as travel mode choice prediction. We use these platforms, inter alia, to detect changes in data streams with stream mining methods, to model delays, and to predict travel mode choices.
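Change detection in data streams can be illustrated with the Page-Hinkley test, a classic drift detector that raises an alarm when the cumulative deviation of observations above their running mean grows beyond a threshold. This is a sketch of that textbook test only, not necessarily one of the stream mining methods used in the group's platforms; the class name, the parameter values (`delta`, `threshold`, `min_instances`), and the synthetic delay stream are assumptions chosen for illustration.

```python
class PageHinkley:
    """Page-Hinkley test for detecting an increase in the mean of a stream."""

    def __init__(self, delta=0.005, threshold=50.0, min_instances=30):
        self.delta = delta                # tolerated magnitude of change
        self.threshold = threshold        # alarm threshold (lambda)
        self.min_instances = min_instances
        self.reset()

    def reset(self):
        self.n = 0
        self.mean = 0.0                   # running mean of observations
        self.cumulative = 0.0             # m_t = sum of (x_i - mean_i - delta)
        self.minimum = float("inf")       # M_t = min over m_1..m_t

    def update(self, x):
        """Consume one observation; return True if a change is detected."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cumulative += x - self.mean - self.delta
        self.minimum = min(self.minimum, self.cumulative)
        if self.n >= self.min_instances and self.cumulative - self.minimum > self.threshold:
            self.reset()                  # start monitoring the new regime
            return True
        return False


if __name__ == "__main__":
    # Synthetic delay stream (seconds): stable at 0, then a jump to 10.
    delays = [0.0] * 100 + [10.0] * 50
    detector = PageHinkley()
    alarms = [i for i, d in enumerate(delays) if detector.update(d)]
    print(alarms)  # [105]
```

With these settings the alarm fires six samples after the jump; a larger `threshold` trades longer detection delay for fewer false alarms on noisy streams.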

Fig. 1 - Sample system architecture used to calculate level-of-service attributes for trip data, enabling improved prediction of travel mode choices.
Source: Grzenda, M., Luckner, M., Wrona, P. (2023). Urban Traveller Preference Miner: Modelling Transport Choices with Survey Data Streams. In: Amini, MR., et al. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2022. LNCS, vol. 13718. Springer. https://doi.org/10.1007/978-3-031-26422-1_50
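Level-of-service attributes describe the conditions of a trip with each available travel mode, e.g. its distance or travel time. The sketch below computes crude versions of such attributes from origin and destination coordinates only: the great-circle (haversine) distance and a travel time per mode at fixed average speeds. The function names and the speed values are hypothetical placeholders, not figures from the cited paper; a real feature service would instead rely on routing over road networks and public transport timetables.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius

# Hypothetical average speeds in km/h; NOT values from the cited paper.
ASSUMED_SPEEDS_KMH = {"walk": 5.0, "bike": 15.0, "car": 30.0}


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp guards against a tiny overshoot above 1.0 from floating-point rounding.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def naive_los_attributes(origin, destination, speeds=ASSUMED_SPEEDS_KMH):
    """Straight-line distance plus a travel time (minutes) per mode at constant speed."""
    distance = haversine_km(*origin, *destination)
    attrs = {"distance_km": distance}
    for mode, speed_kmh in speeds.items():
        attrs[f"time_min_{mode}"] = 60.0 * distance / speed_kmh
    return attrs


if __name__ == "__main__":
    # Hypothetical origin/destination pair (latitude, longitude in degrees).
    print(naive_los_attributes((52.2297, 21.0122), (52.2200, 21.0100)))
```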

StreamUDataSys Cluster

This environment hosts virtual machines that run:

the production environment, used for 24/7 data collection and preprocessing for urban use cases and including, inter alia, Apache NiFi, Apache Hadoop, Apache Kafka, Apache Flink, and RDBMS instances;
the development and testing environment, used as a testbed for research on novel data processing methods;
and the analytical environment, used for computationally demanding data processing, e.g. parallel development of multiple machine learning models.

StreamUDataSys cluster

CPU: 2x AMD EPYC 7413, 48 physical cores in total
RAM: 1.5 TiB
STORAGE: 64 TiB HDD and 1 TiB SSD

EDEN Cluster

When even more extensive computing resources are needed, we rely on the EDEN cluster, hosted at the Faculty of Mathematics and Information Science (FMIS), where our group is located. This cluster, managed by the FMIS HPC center, includes, inter alia:

1 x NVIDIA DGX A100 node

CPU: 2x AMD Rome 7742, 128 physical cores in total
GPU: 8x NVIDIA A100 40 GB
RAM: 2 TiB
STORAGE: 3.8 TiB + 15 TiB SSD
GPU Interconnect: 200 Gb/s
Network: 100 Gb/s

3 x NVIDIA DGX A100 nodes

CPU: 2x AMD Rome 7742, 128 physical cores in total
GPU: 8x NVIDIA A100 40 GiB
RAM: 1 TiB
STORAGE: 3.8 TiB + 15 TiB SSD
GPU Interconnect: 200 Gb/s
Network: 100 Gb/s

Disk arrays: DDN SS9012 and DDN AI400X

Storage space: 1.6 PiB
Cache (DDN AI400X): 256 TiB
Write speed: 34 GB/s
Read speed: 48 GB/s
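As a rough worked example of these figures, the snippet below estimates how long a full sequential pass over the 256 TiB cache would take at the quoted speeds. It assumes that GB/s uses decimal units (10^9 bytes) while TiB is binary (2^40 bytes), and that peak throughput is sustained for the whole transfer, which real workloads rarely achieve; the result is therefore a lower bound.

```python
TIB = 2**40  # bytes per tebibyte (binary prefix), as used for the cache size
GB = 10**9   # bytes per gigabyte (decimal prefix), assumed for the quoted speeds


def hours_to_transfer(size_bytes: float, rate_bytes_per_s: float) -> float:
    """Idealised transfer time in hours at a constant throughput."""
    return size_bytes / rate_bytes_per_s / 3600


if __name__ == "__main__":
    cache_bytes = 256 * TIB
    print(f"read:  {hours_to_transfer(cache_bytes, 48 * GB):.2f} h")  # read:  1.63 h
    print(f"write: {hours_to_transfer(cache_bytes, 34 * GB):.2f} h")  # write: 2.30 h
```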

3 x Lenovo ThinkSystem SR665 nodes

CPU: 2x AMD EPYC 7413 Processor, 48 physical cores in total
RAM: 3 TiB
STORAGE: 56 TiB HDD
Network: 100 Gb/s

1 x Lenovo ThinkSystem SR675 node

CPU: 2x AMD EPYC 9534 Processor, 64 physical cores in total
GPU: 8x NVIDIA H100 PCIe 80 GiB
RAM: 1 TiB
STORAGE: 14 TiB HDD
Network: 100 Gb/s
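For an overview of the listed EDEN resources, the per-node figures above can be aggregated. The sketch below encodes only the node counts, GPUs per node, nominal per-GPU memory, and RAM per node stated on this page; the `NodeType` structure and the `totals` helper are hypothetical, and GB and GiB are not distinguished in the GPU memory sum.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeType:
    # Hypothetical summary record for one group of identical nodes.
    name: str
    count: int
    gpus_per_node: int
    gpu_mem_gb: int  # nominal per-GPU memory as listed (40 for A100, 80 for H100)
    ram_tib: float   # RAM per node


EDEN_NODES = [
    NodeType("NVIDIA DGX A100 (2 TiB RAM)", 1, 8, 40, 2.0),
    NodeType("NVIDIA DGX A100 (1 TiB RAM)", 3, 8, 40, 1.0),
    NodeType("Lenovo ThinkSystem SR665", 3, 0, 0, 3.0),
    NodeType("Lenovo ThinkSystem SR675", 1, 8, 80, 1.0),
]


def totals(nodes):
    """Sum node, GPU, GPU-memory, and RAM figures over all node groups."""
    return {
        "nodes": sum(n.count for n in nodes),
        "gpus": sum(n.count * n.gpus_per_node for n in nodes),
        "gpu_mem_gb": sum(n.count * n.gpus_per_node * n.gpu_mem_gb for n in nodes),
        "ram_tib": sum(n.count * n.ram_tib for n in nodes),
    }


if __name__ == "__main__":
    print(totals(EDEN_NODES))
    # {'nodes': 8, 'gpus': 40, 'gpu_mem_gb': 1920, 'ram_tib': 15.0}
```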

Get In Touch

Warsaw University of Technology
Faculty of Mathematics and Information Science
Koszykowa 75, Warsaw, Poland

maciej.grzenda at pw.edu.pl
