As enterprises expand into big data and IoT, the 15-minute data collection intervals that were common in the past are giving way to scenarios in which data is collected every minute, every 30 seconds, or even every second. At the same time, the number of devices generating data in IoT scenarios is reaching the millions. With this many devices sending this much data to the cloud, it is no easy task for database management systems to ingest, store, and query it.
These changes have caused the following drawbacks of traditional real-time databases to become evident:
- They lack horizontal scalability, forcing them to scale up as data sets increase in size.
- Their architectures are outdated, reliant on hard disk arrays, and mostly run on Windows.
- They do not have strong support for data analysis or relevant interfaces.
- They cannot be deployed in the cloud or as a PaaS.
Many enterprises have moved to a general-purpose architecture based on Hadoop with other components such as Kafka, Redis, and Spark. However, in time-series applications, these architectures suffer from inefficient development and operations, complex system maintenance, and slow time to market.
For a scenario like IoT, what you really need is a modern, purpose-built time-series database. That said, none of the TSDBs on the market today has been able to take over the IoT market segment due to the following drawbacks:
- Insufficient feature set: integration with third-party components is required to support time-series applications
- No support for standard SQL: users are forced to learn a new language
- Limited scalability: solutions are not truly cloud native
TDengine 3.0 is the first time-series database management system that is designed for IoT. It can support billions of time series and includes built-in caching, stream processing, and data subscription components. This article describes how the TDengine 3.0 architecture enables high performance and low costs in IoT scenarios.
Unique Distributed Architecture
As shown in the figure, each data node (dnode) in a TDengine 3.0 cluster is virtualized into virtual nodes (vnodes), management nodes (mnodes), query nodes (qnodes), and stream nodes (snodes). Qnodes are the compute component of a TDengine cluster. The scheduler assigns one or more qnodes to perform computations for each query task. Snodes are the stream processing component of a TDengine cluster.
For metadata management at the database level, TDengine 3.0 ensures the consistency of key operations by means of two-phase commits. As an example, consider the supertable creation shown in the preceding figure. The SQL statement used to create the supertable is parsed into a request that is sent to the mnode. The mnode forwards the request to the appropriate vnodes. The vnodes complete the request and notify the mnode, which sends a response to the client.
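The request flow above can be illustrated with a minimal two-phase commit sketch. This is not TDengine's implementation: the `Mnode` and `Vnode` classes, their method names, and the `healthy` flag are all hypothetical, chosen only to show how a coordinator collects votes before making a supertable visible on any participant.

```python
from dataclasses import dataclass, field


@dataclass
class Vnode:
    """Hypothetical participant that stages, then applies, a schema change."""
    name: str
    healthy: bool = True
    staged: dict = field(default_factory=dict)       # txn id -> pending change
    supertables: dict = field(default_factory=dict)  # committed supertables

    def prepare(self, txn_id, stable, schema):
        # Phase 1: stage the change and vote. Nothing is visible yet.
        if not self.healthy or stable in self.supertables:
            return False
        self.staged[txn_id] = (stable, schema)
        return True

    def commit(self, txn_id):
        # Phase 2 (all voted yes): make the staged change visible.
        stable, schema = self.staged.pop(txn_id)
        self.supertables[stable] = schema

    def abort(self, txn_id):
        # Phase 2 (any vote was no): discard whatever was staged.
        self.staged.pop(txn_id, None)


class Mnode:
    """Hypothetical coordinator for database-level metadata changes."""

    def __init__(self, vnodes):
        self.vnodes = vnodes
        self.next_txn = 0

    def create_supertable(self, stable, schema):
        self.next_txn += 1
        txn = self.next_txn
        votes = [v.prepare(txn, stable, schema) for v in self.vnodes]
        if all(votes):
            for v in self.vnodes:
                v.commit(txn)
            return True   # the client is told the supertable exists
        for v in self.vnodes:
            v.abort(txn)
        return False      # the client is told the request failed
```

In this sketch, a single unhealthy vnode causes every participant to discard its staged change, so the supertable either exists on all vnodes or on none.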
However, a real-world scenario has much more metadata at the table level than at the database level. Unlike previous versions of TDengine, in which this metadata was centrally managed, TDengine 3.0 stores it on the vnodes themselves, enabling horizontal scalability. When an application queries or writes to a table, the request that it generates is sent to the appropriate vnode based on the hash value of the specified table name.
Another issue that can affect performance is the vast amount of tag data that a TDengine deployment can contain. For example, in a scenario with 1 million devices and 1,000 tags per device, the metadata comes to around a gigabyte, and keeping all of it in memory would make the system extremely slow.
To address this issue, TDengine 3.0 uses a least recently used (LRU) replacement policy so that only data accessed frequently is stored in memory. A TDengine 3.0 deployment with tens of millions of metadata entries can return table queries in milliseconds, even when the memory is not sufficient to cache all metadata. TDengine also includes a TTL mechanism so that tables intended for short-term storage can be deleted upon expiration.
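As a rough illustration of LRU replacement for table metadata, the sketch below keeps only a bounded number of entries in memory and falls back to a loader on a miss. The class name `TableMetaCache` and the `load_from_disk` callback are assumptions made for this example and do not correspond to TDengine internals.

```python
from collections import OrderedDict


class TableMetaCache:
    """Hypothetical bounded cache of table metadata with LRU eviction."""

    def __init__(self, capacity, load_from_disk):
        self.capacity = capacity
        self.load_from_disk = load_from_disk  # called only on a cache miss
        self.entries = OrderedDict()          # least recently used entry first
        self.misses = 0

    def get(self, table_name):
        if table_name in self.entries:
            # Hit: mark the entry as most recently used.
            self.entries.move_to_end(table_name)
            return self.entries[table_name]
        # Miss: load the metadata, then evict the coldest entry if over capacity.
        self.misses += 1
        meta = self.load_from_disk(table_name)
        self.entries[table_name] = meta
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)
        return meta


# Usage: a dict stands in for on-disk metadata storage.
disk = {f"d{i}": {"location": f"site-{i}"} for i in range(5)}
cache = TableMetaCache(capacity=2, load_from_disk=disk.__getitem__)
for name in ("d0", "d1", "d0", "d2"):
    cache.get(name)
```

After these four lookups, `d1` has been evicted because `d0` was touched more recently, which is why frequently accessed tables stay in memory.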
With LRU cache replacement, TDengine can manage tens of millions of tables. Unlike Prometheus, TDengine supports multiple fields per table. With 10 metrics per table, this means TDengine can support billions of timelines in a deployment.
Routing information is another important aspect for consideration. In previous versions, TDengine recorded the routing information for each table on the mnode, which would then forward it to vnodes as needed. However, this lengthens the system warmup time as the number of transactions increases. Furthermore, the required tables must all be created upon deployment, putting a burden on the system.
TDengine 3.0 innovates in this regard by using a consistent hashing algorithm for table routing information. The range of this hash value can also be adjusted to scale the system in or out. In environments containing more than 10,000 devices, data is distributed in an extremely even manner, and any skew in smaller environments can be managed by configuring an appropriate number of vgroups.
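To make the routing idea concrete, here is a minimal sketch of hash-range routing. It assumes a 32-bit hash space split into equal contiguous ranges, one per vgroup, and uses MD5 purely as a stand-in hash function. The names `table_hash` and `VgroupRouter` are hypothetical, and the hash function and range layout that TDengine actually uses may differ.

```python
import bisect
import hashlib

HASH_SPACE = 2 ** 32  # assumed size of the hash space


def table_hash(table_name):
    """Map a table name to a point in the hash space (MD5 as a stand-in)."""
    digest = hashlib.md5(table_name.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


class VgroupRouter:
    """Hypothetical router: each vgroup owns one contiguous hash range."""

    def __init__(self, vgroup_count):
        # range_starts[i] is the inclusive lower bound of vgroup i's range.
        self.range_starts = [
            i * HASH_SPACE // vgroup_count for i in range(vgroup_count)
        ]

    def route(self, table_name):
        # The owner is the last range whose lower bound is <= the hash value.
        return bisect.bisect_right(self.range_starts, table_hash(table_name)) - 1


# Usage: count how 10,000 hypothetical device tables spread over 4 vgroups.
router = VgroupRouter(vgroup_count=4)
counts = [0] * 4
for i in range(10_000):
    counts[router.route(f"d{i}")] += 1
```

Because the owner is computed from the table name alone, a client in this sketch needs only the small list of range boundaries rather than a per-table routing entry.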
Query Optimizations
Reducing the number of disk reads and writes is the basis of all query performance optimization. With disk read and write speeds hovering around 100 MB/s and 70 MB/s, respectively, the only way to push database performance beyond those numbers is to downsample and reduce the amount of stored data.
TDengine 3.0 introduces the following three query optimizations:
Blockwise SMA
- Trigger write to disk when cached data reaches a specified size
- Convert from row-oriented to column-oriented storage
- Organize data on disk in blocks
- Store aggregation results (max, min, sum, and count) for each data block in the SMA file
- Determine which aggregation results to use based on the query plan
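The following sketch shows why per-block aggregates help. It assumes each block keeps its timestamps sorted and stores min, max, sum, and count alongside the data. A range query then reuses the stored sum for any block that lies entirely inside the range and decodes rows only for blocks that straddle a boundary. The `Block` class and `sum_in_range` function are hypothetical names for this example.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """Hypothetical column-oriented data block with pre-computed aggregates."""
    timestamps: list  # sorted ascending
    values: list

    def __post_init__(self):
        # Stands in for this block's entry in the SMA file.
        self.sma = {
            "min": min(self.values),
            "max": max(self.values),
            "sum": sum(self.values),
            "count": len(self.values),
        }


def sum_in_range(blocks, start, end):
    """Return (sum of values with start <= ts <= end, number of blocks decoded)."""
    total = 0
    decoded = 0
    for block in blocks:
        first, last = block.timestamps[0], block.timestamps[-1]
        if last < start or first > end:
            continue  # block is entirely outside the query range
        if start <= first and last <= end:
            total += block.sma["sum"]  # fully covered: no row data is read
        else:
            decoded += 1  # partially covered: scan the rows
            total += sum(
                v for t, v in zip(block.timestamps, block.values)
                if start <= t <= end
            )
    return total, decoded


# Usage: three blocks of four rows each, where each value equals its timestamp.
blocks = [Block(list(range(s, s + 4)), list(range(s, s + 4))) for s in (0, 4, 8)]
result = sum_in_range(blocks, 2, 11)
```

Here only the first block needs to be decoded, because the other two fall entirely within the queried time range.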
Rollup SMA
- Three data storage levels are supported, each with its own aggregation period and storage duration.
- Level 1 stores one or more original data points, and levels 2 and 3 store the aggregated data.
- Rollup supports avg, sum, min, max, last, and first functions.
- This is suitable for DevOps and other scenarios that focus on data trends. In essence, it reduces storage overhead, speeds up queries, and discards raw data.
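A small sketch may help picture the three levels. The periods and retention windows in `LEVELS` are illustrative values, not TDengine defaults, and the rule in `pick_level` (serve a query from the finest level that still retains data of that age) is an assumption made for this example.

```python
from collections import defaultdict

DAY = 86_400  # seconds

# (aggregation period in seconds, retention in seconds); period 0 means raw data.
# Illustrative values only.
LEVELS = [(0, 7 * DAY), (300, 21 * DAY), (900, 500 * DAY)]


def rollup(points, period, func):
    """Aggregate (timestamp, value) points into fixed-size time buckets."""
    buckets = defaultdict(list)
    for ts, value in points:
        buckets[ts - ts % period].append(value)  # align to bucket start
    return [(start, func(vals)) for start, vals in sorted(buckets.items())]


def pick_level(age_seconds):
    """Return the finest level (1-based) whose retention covers this data age."""
    for level, (_period, retention) in enumerate(LEVELS, start=1):
        if age_seconds <= retention:
            return level
    return None  # older than every retention window: the data is gone


# Usage: roll raw points up into 5-minute buckets for level 2.
raw = [(0, 1), (60, 3), (299, 5), (300, 10), (601, 20)]
level2_max = rollup(raw, 300, max)
```

In this sketch, data older than the level 1 retention window is only available in aggregated form, which is the trade-off between storage cost and detail that the list above describes.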
Time range-wise SMA
- Suitable for scenarios in which interval queries are used frequently.
- Uses the same logic as ordinary stream computation, so you can set a watermark to handle delayed data. The query results are delayed accordingly.
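The effect of a watermark on windowed aggregation can be sketched as follows. A window is finalized only once the newest timestamp seen so far, minus the watermark, has reached the end of that window, so rows that arrive slightly late are still counted. The `IntervalSum` class is hypothetical, and dropping rows that arrive after their window has closed is a simplification chosen for this example.

```python
class IntervalSum:
    """Hypothetical per-interval sum that tolerates late rows via a watermark."""

    def __init__(self, interval, watermark):
        self.interval = interval
        self.watermark = watermark
        self.open_windows = {}  # window start -> running sum
        self.results = {}       # finalized windows: window start -> sum
        self.max_ts = None      # newest event timestamp seen so far

    def insert(self, ts, value):
        start = ts - ts % self.interval
        # Reject rows whose window has already passed the close threshold.
        if (self.max_ts is not None
                and start + self.interval <= self.max_ts - self.watermark):
            return False
        self.open_windows[start] = self.open_windows.get(start, 0) + value
        self.max_ts = ts if self.max_ts is None else max(self.max_ts, ts)
        # Finalize every window that ends at or before (max_ts - watermark).
        threshold = self.max_ts - self.watermark
        for s in sorted(self.open_windows):
            if s + self.interval <= threshold:
                self.results[s] = self.open_windows.pop(s)
        return True


# Usage: 10-unit windows with a watermark of 5 time units.
agg = IntervalSum(interval=10, watermark=5)
accepted = [agg.insert(ts, v) for ts, v in
            [(1, 1), (9, 2), (12, 4), (8, 8), (15, 16), (7, 32)]]
```

The row at timestamp 8 arrives after timestamp 12 but is still counted, while the row at timestamp 7 arrives after its window was finalized and is dropped. The result for the first window only becomes available once a row with timestamp 15 has arrived, which is the delay mentioned above.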
Tag Indexing
TDengine 3.0 features optimized tag indexing and a new TDB module that provides LRU storage of tag data. TDB queries are suitable for devices with infrequent tag changes or structured tags, such as text and numeric types.
For JSON tags, which are characterized by a large number of tags but small data content, TDengine uses the FST indexing method. Index reconstruction has always been a complex part of FST, so TDengine reconstructs index data asynchronously to ensure that table creation and modification are not overly time-consuming.
For data caching, the basic LRU library in TDengine 3.0 can put the most recent data in memory, so that you can query recently used data even with a small amount of memory.
Elastic Compute Scaling
As shown in the figure above, when the client makes a detailed data query, the original data must be read continuously from disk or memory. TDengine 3.0 performs detailed data queries on the vnode so that network and disk overhead are minimized.
However, some aggregate queries are carried out on the qnode. An example is determining how long each vehicle is running each day in a connected cars scenario. You can configure different numbers of qnodes for different scenarios.
Reduced Third-Party Dependency
The vnode in TDengine 3.0 allows different consumers to consume data at the same time. With structured data, you can subscribe to only the part of the data set that you are interested in. As a result, the total amount of data transferred with TDengine data subscription is much less than with Kafka, which likely needs to pull all the data from the server and then filter it in the client.
In addition, you can perform some common calculations on the snode and store the results in a separate vnode. This stream processing engine greatly reduces the dependence on Flink and Spark.
Conclusion
The architectural improvements of TDengine 3.0 have enabled elasticity and resilience in addition to horizontal scalability. With these improvements, TDengine is truly a simplified solution for time-series data processing.
Originally published at https://tdengine.com/tdengine-3-0-architecture/