TDengine is a time-series database designed and optimized for IoT. You can get it on GitHub. This blog will introduce the Storage Model and Data Partitioning/Sharding mechanism for you.
The data stored by TDengine include collected time-series data, metadata related to libraries and tables, tag data, etc. These data are specifically divided into three parts:
- Time-series data: stored in vnode and composed of data, head and last files. The amount of data is large and query amount depends on the application scenario. Out-of-order writing is allowed, but delete operation is not supported for the time being, and update operation is only allowed when update parameter is set to 1. By adopting the model with one table for each collection point, the data of a given time period is continuously stored, and the writing against one single table is a simple add operation. Multiple records can be read at one time, thus ensuring the insert and query operation of a single collection point with best performance.
- Tag data: meta files stored in vnode support four standard operations of add, delete, modify and check. The amount of data is not large. If there are N tables, there are N records, so all can be stored in memory. If there are many tag filtering operations, queries will be very frequent and TDengine supports multi-core and multi-threaded concurrent queries. As long as the computing resources are sufficient, even in face of millions of tables, the filtering results will return in milliseconds.
- Metadata: stored in mnode, including system node, user, DB, Table Schema and other information. Four standard operations of add, delete, modify and query are supported. The amount of these data are not large and can be stored in memory, moreover the query amount is not large because of the client cache. Therefore, TDengine uses centralized storage management, however, there will be no performance bottleneck.
Compared with the typical NoSQL storage model, TDengine stores tag data and time-series data completely separately, which has two major advantages:
- Greatly reduce the redundancy of tag data storage: general NoSQL database or time-series database adopts K-V storage, in which Key includes timestamp, device ID and various tags. Each record carries these duplicates, so wasting storage space. Moreover, if the application needs to add, modify or delete tags on historical data, it has to traverse the data and rewrite again, which is extremely expensive to operate.
- Realize extremely efficient aggregation query between multiple tables: when doing aggregation query between multiple tables, it firstly finds out the tag filtered tables, and then find out the corresponding data blocks of these tables to greatly reduce the data sets to be scanned, thus greatly improving the query efficiency. Moreover, tag data is managed and maintained in a full-memory structure, and tag data queries in tens of millions can return in milliseconds.
For large-scale data management, to achieve scale-out, it is generally necessary to adopt the a Partitioning strategy as Sharding. TDengine implements data sharding via vnode, and time-series data partitioning via one data file for each time range.
VNode (Virtual Data Node) is responsible for providing writing, query and calculation functions for collected time-series data. To facilitate load balancing, data recovery and support heterogeneous environments, TDengine splits a data node into multiple vnodes according to its computing and storage resources. The management of these vnodes is done automatically by TDengine and completely transparent to the application.
For a single data collection point, regardless of the amount of data, a vnode (or vnode group, if the number of replicas is greater than 1) has enough computing resource and storage resource to process (if a 16-byte record is generated per second, the original data generated in one year will be less than 0.5 G), so TDengine stores all the data of a table (a data collection point) in one vnode instead of distributing the data to two or more dnodes. Moreover, a vnode can store data from multiple data collection points (tables), and the upper limit of the tables’ quantity for a vnode is one million. By design, all tables in a vnode belong to the same DB. On a data node, unless specially configured, the number of vnodes owned by a DB will not exceed the number of system cores.
When creating a DB, the system does not allocate resources immediately. However, when creating a table, the system will check if there is an allocated vnode with free tablespace. If so, the table will be created in the vacant vnode immediately. If not, the system will create a new vnode on a dnode from the cluster according to the current workload, and then a table. If there are multiple replicas of a DB, the system does not create only one vnode, but a vgroup (virtual data node group). The system has no limit on the number of vnodes, which is just limited by the computing and storage resources of physical nodes.
The meda data of each table (including schema, tags, etc.) is also stored in vnode instead of centralized storage in mnode. In fact, this means sharding of meta data, which is convenient for efficient and parallel tag filtering operations.
In addition to vnode sharding, TDengine partitions the time-series data by time range. Each data file contains only one time range of time-series data, and the length of the time range is determined by DB’s configuration parameter “days”. This method of partitioning by time rang is also convenient to efficiently implement the data retention strategy. As long as the data file exceeds the specified number of days (system configuration parameter ‘keep’), it will be automatically deleted. Moreover, different time ranges can be stored in different paths and storage media, so as to facilitate the cold/hot management of big data and realize tiered-storage.
In general, TDengine splits big data by vnode and time as two dimensions, which is convenient for parallel and efficient management with scale-out.
Each dnode regularly reports its status (including hard disk space, memory size, CPU, network, number of virtual nodes, etc.) to the mnode (virtual management node) for declaring the status of the entire cluster. Based on the overall state, when an mnode finds an overloaded dnode, it will migrate one or more vnodes to other dnodes. In the process, external services keep running and the data insertion, query and calculation operations are not affected.
If the mnode has not received the dnode status for a period of time, the dnode will be judged as offline. When offline lasts a certain period of time (the duration is determined by the configuration parameter ‘offlineThreshold’), the dnode will be forcibly removed from the cluster by mnode. If the number of replicas of vnodes on this dnode is greater than one, the system will automatically create new replicas on other dnodes to ensure the replica number. If there are other mnodes on this dnode and the number of mnodes replicas is greater than one, the system will automatically create new mnodes on other dnodes to ensure t the replica number.
When new data nodes are added to the cluster, with new computing and storage are added, the system will automatically start the load balancing process.
The load balancing process does not require any manual intervention without application restarted. It will automatically connect new nodes with completely transparence. Note: load balancing is controlled by parameter “balance”, which determines to turn on/off automatic load balancing.