Which historical backend should I use?

The ThinkingCity platform provides several backends where historical datasets related to real-time context information can be created and persisted. Among these backends we can find:

  • PostgreSQL, the well-known relational database manager.
  • MongoDB, a highly scalable NoSQL database used for large-scale historical storage.

Currently, these are the two most commonly used backends. However, there are a large number of backends available, the key question is: which one best suits my use case? This usually depends on the following variables:

  • Amount of data. If you can estimate the volume of data in the short, medium, and long term, this will be one of the most relevant decisions.
  • Privacy considerations. Maybe your data can be made public as Open Data; or not.
  • Desired analysis. The main goal of storing data is to analyze it to obtain valuable information.
  • Visualization tools. Closely related to the above, graphical tools are useful when handling large volumes of information.

A fifth variable could be whether the backend allows static data loading, meaning uploading other datasets independent from the real-time contextual data.

Let's analyze these two main options.

PostgreSQL

PostgreSQL is the foundation for the solution's relational data persistence, due to its robustness, extensibility, and strict adherence to SQL standards.

This backend is ideal for managing large volumes of data. PostgreSQL ensures transactional integrity and offers powerful extensions that significantly enhance its functionality.

For example, PostGIS is the most widely used extension for geospatial data analysis and visualization in dashboards, while TimescaleDB is designed for the efficient handling of time-series data, such as those generated by sensors, performance metrics, or event logs, optimizing fast data ingestion, time-based aggregations, and historical data compression.

These extensions make PostgreSQL a high-performance database specialized in the storage, querying, and analysis of both geospatial and time-series data.

PostgreSQL provides standard aggregate functions, along with the ability to define custom aggregates, work with complex expressions, and use window functions to perform advanced analysis on large and complex aggregated data sets.

The stored data originates both from the platform's internal processes and from periodic batch loading processes (ETL/ELT). This data can then be analyzed or visualized, facilitating interpretation and utilization by analysts and end users.

MongoDB

MongoDB is currently the most widely used persistence solution in the Big Data environment. As a document-oriented NoSQL database, it allows managing massive volumes of data with high efficiency and horizontal scalability.

This backend is ideal for managing large amounts of data. Moreover, MongoDB is flexible in terms of data schema, which makes it easier to adapt to changes in input formats without redesigning structures.

Regarding analysis, although MongoDB does not provide visual analytics tools by itself, it is used to store operational information for the Dashboard tool, not as a direct data source. It allows complex aggregate queries to be run through its aggregation framework or easily integrating with external tools such as Grafana, Metabase, among others.

In terms of privacy, MongoDB manages access through roles and authentication at the collection and database level, allowing fine-grained control over permissions.

MongoDB also supports static data loading, so you can integrate your own datasets (e.g., historical records, catalogs, etc.) and analyze them along with the data generated by the ThinkingCity platform.

Summary

Backend Capacity Privacy Analysis Visualization Static Load
PostgreSQL High Role-based access control, Row-level security Standard & custom aggregates Window functions Geospatial analysis(PostGIS) Integration with Grafana/Metabase Yes
MongoDB High Per MongoDB roles and users Complex aggregations
BI integration
Integration with Grafana/Metabase Yes

Top