The Storage Model and Data Partitioning/Sharding in TDengine

TDengine is a time-series database designed and optimized for IoT. You can get it on GitHub. This blog will introduce the Storage Model and Data Partitioning/Sharding mechanism for you.

Storage Model

The data stored by TDengine include collected time-series data, metadata related to libraries and tables, tag data, etc. These data are specifically divided into three parts:

  • Time-series data: stored in vnode and composed of data, head and last files. The amount of data is large and query amount depends on the application scenario. Out-of-order writing is allowed, but delete operation is not supported for the time being, and update operation is only allowed when update parameter is set to 1. By adopting the model with one table for each collection point, the data of a given time period is continuously stored, and the writing against one single table is a simple add operation. Multiple records can be read at one time, thus ensuring the insert and query operation of a single collection point with best performance.

Compared with the typical NoSQL storage model, TDengine stores tag data and time-series data completely separately, which has two major advantages:

  • Greatly reduce the redundancy of tag data storage: general NoSQL database or time-series database adopts K-V storage, in which Key includes timestamp, device ID and various tags. Each record carries these duplicates, so wasting storage space. Moreover, if the application needs to add, modify or delete tags on historical data, it has to traverse the data and rewrite again, which is extremely expensive to operate.

Data Sharding

For large-scale data management, to achieve scale-out, it is generally necessary to adopt the a Partitioning strategy as Sharding. TDengine implements data sharding via vnode, and time-series data partitioning via one data file for each time range.

VNode (Virtual Data Node) is responsible for providing writing, query and calculation functions for collected time-series data. To facilitate load balancing, data recovery and support heterogeneous environments, TDengine splits a data node into multiple vnodes according to its computing and storage resources. The management of these vnodes is done automatically by TDengine and completely transparent to the application.

For a single data collection point, regardless of the amount of data, a vnode (or vnode group, if the number of replicas is greater than 1) has enough computing resource and storage resource to process (if a 16-byte record is generated per second, the original data generated in one year will be less than 0.5 G), so TDengine stores all the data of a table (a data collection point) in one vnode instead of distributing the data to two or more dnodes. Moreover, a vnode can store data from multiple data collection points (tables), and the upper limit of the tables’ quantity for a vnode is one million. By design, all tables in a vnode belong to the same DB. On a data node, unless specially configured, the number of vnodes owned by a DB will not exceed the number of system cores.

When creating a DB, the system does not allocate resources immediately. However, when creating a table, the system will check if there is an allocated vnode with free tablespace. If so, the table will be created in the vacant vnode immediately. If not, the system will create a new vnode on a dnode from the cluster according to the current workload, and then a table. If there are multiple replicas of a DB, the system does not create only one vnode, but a vgroup (virtual data node group). The system has no limit on the number of vnodes, which is just limited by the computing and storage resources of physical nodes.

The meda data of each table (including schema, tags, etc.) is also stored in vnode instead of centralized storage in mnode. In fact, this means sharding of meta data, which is convenient for efficient and parallel tag filtering operations.

Data Partitioning

In addition to vnode sharding, TDengine partitions the time-series data by time range. Each data file contains only one time range of time-series data, and the length of the time range is determined by DB’s configuration parameter “days”. This method of partitioning by time rang is also convenient to efficiently implement the data retention strategy. As long as the data file exceeds the specified number of days (system configuration parameter ‘keep’), it will be automatically deleted. Moreover, different time ranges can be stored in different paths and storage media, so as to facilitate the cold/hot management of big data and realize tiered-storage.

In general, TDengine splits big data by vnode and time as two dimensions, which is convenient for parallel and efficient management with scale-out.

Load Balancing

Each dnode regularly reports its status (including hard disk space, memory size, CPU, network, number of virtual nodes, etc.) to the mnode (virtual management node) for declaring the status of the entire cluster. Based on the overall state, when an mnode finds an overloaded dnode, it will migrate one or more vnodes to other dnodes. In the process, external services keep running and the data insertion, query and calculation operations are not affected.

If the mnode has not received the dnode status for a period of time, the dnode will be judged as offline. When offline lasts a certain period of time (the duration is determined by the configuration parameter ‘offlineThreshold’), the dnode will be forcibly removed from the cluster by mnode. If the number of replicas of vnodes on this dnode is greater than one, the system will automatically create new replicas on other dnodes to ensure the replica number. If there are other mnodes on this dnode and the number of mnodes replicas is greater than one, the system will automatically create new mnodes on other dnodes to ensure t the replica number.

When new data nodes are added to the cluster, with new computing and storage are added, the system will automatically start the load balancing process.

The load balancing process does not require any manual intervention without application restarted. It will automatically connect new nodes with completely transparence. Note: load balancing is controlled by parameter “balance”, which determines to turn on/off automatic load balancing.

TAOS Data's flagship product, TDengine, is an open-source AIoT big data platform GitHub: https://github.com/taosdata/TDengine