Software environment

In our work, we develop data systems used to collect, transform, ingest, and analyse data of different modalities. We work with data streams, e.g. sensor readings such as weather data from meteorological stations and public transport delay data; spatial data, e.g. road network layers; tabular data; trip matrices from transport models; and more.

To make the processing of varied data possible, we have developed extensible software platforms that combine relational database management systems (RDBMS) with Big Data platforms for data collection, storage, and event streaming. These include Apache NiFi, Apache Hadoop, Apache Flink, and Apache Kafka. Custom modules are an important part of our platforms. They include batch and stream jobs relying on stream processing engines such as Apache Flink, and RESTful services calculating features used for machine learning tasks such as travel mode choice prediction. We use these platforms for, inter alia, detecting changes in data streams with stream mining methods, modelling delays, and predicting travel mode choices.

Fig. 1 - Sample system architecture used to calculate level-of-service attributes for trip data to enable improved prediction of travel mode choices.
Source: Grzenda, M., Luckner, M., Wrona, P. (2023). Urban Traveller Preference Miner: Modelling Transport Choices with Survey Data Streams. In: Amini, MR., et al. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2022. LNCS, vol 13718. Springer. https://doi.org/10.1007/978-3-031-26422-1_50
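
As an illustration of what detecting changes in a data stream can look like, the sketch below implements the Page-Hinkley test, a classic stream mining method for spotting an increase in the mean of a signal. It is a minimal example under stated assumptions and not necessarily the method used on our platforms: the class and function names, the parameter values, and the synthetic delay readings are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PageHinkley:
    """Page-Hinkley test for an increase in the mean of a stream (illustrative)."""

    delta: float = 0.05      # tolerated magnitude of change (assumed value)
    threshold: float = 5.0   # alarm threshold (assumed value)
    n: int = 0               # number of observations seen so far
    mean: float = 0.0        # running mean of the observations
    cum_sum: float = 0.0     # cumulative deviation from the running mean
    min_cum_sum: float = 0.0 # smallest cumulative deviation seen so far

    def update(self, x: float) -> bool:
        """Consume one observation; return True if a change is signalled."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum_sum += x - self.mean - self.delta
        self.min_cum_sum = min(self.min_cum_sum, self.cum_sum)
        return self.cum_sum - self.min_cum_sum > self.threshold


def first_change(stream, detector):
    """Return the index of the first signalled change, or None if there is none."""
    for i, x in enumerate(stream):
        if detector.update(x):
            return i
    return None


# Hypothetical delay readings in minutes: stable at 1.0, then a shift to 6.0.
readings = [1.0] * 50 + [6.0] * 50
change_index = first_change(readings, PageHinkley())
stable_index = first_change([1.0] * 100, PageHinkley())
```

On the synthetic readings, the detector signals a change shortly after the shift at index 50, while the stable stream raises no alarm. In a streaming job, the same `update` call would be applied to each incoming record rather than to a list held in memory.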

Hardware environment

StreamUDataSys Cluster

This environment hosts virtual machines that run:

  • the production environment used for 24x7 data collection and preprocessing for urban use cases, including inter alia Apache NiFi, Apache Hadoop, Apache Kafka, Apache Flink, and RDBMS instances,
  • the development and testing environment used as a testbed for research on novel data processing methods,
  • the analytical environment used for computationally demanding data processing, e.g. parallel development of multiple machine learning models.

StreamUDataSys cluster

  • CPU: 2x AMD EPYC 7413, 48 physical cores in total
  • RAM: 1.5 TiB
  • STORAGE: 64 TiB HDD and 1 TiB SSD

EDEN Cluster

When even more extensive computing resources are needed, we rely on the EDEN cluster at the Faculty of Mathematics and Information Science (FMIS), where our group is located. This cluster, managed by the FMIS HPC center, includes inter alia:

1 x NVIDIA DGX A100 node

  • CPU: 2x AMD Rome 7742, 128 physical cores in total
  • GPU: 8x NVIDIA A100 40 GB
  • RAM: 2 TiB
  • STORAGE: 3.8 TiB + 15 TiB SSD
  • GPU Interconnect: 200Gb/s
  • Network: 100Gb/s

3 x NVIDIA DGX A100 nodes

  • CPU: 2x AMD Rome 7742, 128 physical cores in total
  • GPU: 8x NVIDIA A100 40 GiB
  • RAM: 1 TiB
  • STORAGE: 3.8 TiB + 15 TiB SSD
  • GPU Interconnect: 200Gb/s
  • Network: 100Gb/s

DDN SS9012 and DDN AI400X disk array

  • Storage space: 1.6 PiB
  • Cache (DDN AI400X): 256 TiB
  • Write speed: 34 GB/s
  • Read speed: 48 GB/s

3 x Lenovo ThinkSystem SR665 nodes

  • CPU: 2x AMD EPYC 7413 Processor, 48 physical cores in total
  • RAM: 3 TiB
  • STORAGE: 56 TiB HDD
  • Network: 100Gb/s

1 x Lenovo ThinkSystem SR675 node

  • CPU: 2x AMD EPYC 9534 Processor, 64 physical cores in total
  • GPU: 8x NVIDIA H100 PCIe 80 GiB
  • RAM: 1 TiB
  • STORAGE: 14 TiB HDD
  • Network: 100Gb/s

Get In Touch

Warsaw University of Technology
Faculty of Mathematics and Information Science
Koszykowa 75, Warsaw, Poland

maciej.grzenda at pw.edu.pl

© Stream Mining and Urban Data Systems Group. All Rights Reserved.