01 Characteristics of Time-Series Data

Let’s look at some time-series data and try to summarize its characteristics.

  1. Some time-series data stay constant for a long time, with anomalous fluctuations at certain points
  2. Some time-series data follow a trend over a period of time
  3. Some time-series data fluctuate around a constant value within a certain range and at different frequencies
  4. ……

02 The Data Model of TDengine

TDengine’s data model mainly has the following characteristics:

  1. One table for one data collection point
  2. Data in a table is stored contiguously in a file in the form of blocks
  3. The size of a data block is configurable
  4. Data blocks are indexed with the Block Range INdex (BRIN) technique
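
To make the block and BRIN ideas above more concrete, here is a minimal Python sketch. It is not TDengine's on-disk format: the names `Block`, `build_blocks`, `query_range` and the `rows_per_block` parameter are hypothetical, and a single-column schema is assumed. It only illustrates that, when each block records its minimum and maximum timestamp, a time-range query can skip blocks without reading their rows.

```python
from dataclasses import dataclass
from typing import List, Tuple

Row = Tuple[int, float]  # (timestamp, value); hypothetical single-column schema


@dataclass
class Block:
    min_ts: int      # smallest timestamp in the block (the block-range entry)
    max_ts: int      # largest timestamp in the block (the block-range entry)
    rows: List[Row]


def build_blocks(rows: List[Row], rows_per_block: int) -> List[Block]:
    """Split the time-ordered rows of one table into fixed-size blocks."""
    blocks = []
    for i in range(0, len(rows), rows_per_block):
        chunk = rows[i:i + rows_per_block]
        blocks.append(Block(chunk[0][0], chunk[-1][0], chunk))
    return blocks


def query_range(blocks: List[Block], start: int, end: int) -> Tuple[List[Row], int]:
    """Return rows with start <= ts <= end, plus the number of blocks read."""
    result, blocks_read = [], 0
    for block in blocks:
        if block.max_ts < start or block.min_ts > end:
            continue  # skipped using only the block's min/max range
        blocks_read += 1
        result.extend(r for r in block.rows if start <= r[0] <= end)
    return result, blocks_read


rows = [(ts, float(ts)) for ts in range(100)]
blocks = build_blocks(rows, rows_per_block=10)
hits, blocks_read = query_range(blocks, 25, 34)
print(len(blocks), blocks_read, len(hits))
```

In this toy run only the blocks whose range overlaps the query interval are opened; the rest are excluded by comparing two integers per block.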

03 TSDB Storage Engine

TSDB is TDengine’s storage engine, developed to process time-series data efficiently by taking advantage of the characteristics of that data. In addition to storage interfaces, TSDB also provides query interfaces. TSDB stores the META data and the time-series data for the tables in a vnode, and time-series data can be stored in both row and columnar format (row storage is available after TDengine 2.0). In RAM, time-series data is indexed through a SkipList, whereas on disk it is indexed through Block Range INdex (BRIN).
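
As an illustration of why a SkipList suits an in-memory time-series index, the following is a generic textbook skip list keyed by timestamp, written as a minimal Python sketch. It is not TDengine's implementation; the class names, `MAX_LEVEL`, the promotion probability of 0.5, and the `range_query` method are all assumptions chosen for the example. Rows may arrive out of order, yet a range scan returns them sorted by timestamp.

```python
import random


class SkipNode:
    def __init__(self, key, value, level):
        self.key = key
        self.value = value
        self.forward = [None] * level  # next node at each level


class SkipList:
    MAX_LEVEL = 8  # assumed cap, enough for a small demo

    def __init__(self, seed=0):
        self.head = SkipNode(None, None, self.MAX_LEVEL)
        self.level = 1
        self.rng = random.Random(seed)

    def _random_level(self):
        level = 1
        while level < self.MAX_LEVEL and self.rng.random() < 0.5:
            level += 1
        return level

    def insert(self, key, value):
        # update[i] is the last node before `key` at level i
        update = [self.head] * self.MAX_LEVEL
        node = self.head
        for i in reversed(range(self.level)):
            while node.forward[i] is not None and node.forward[i].key < key:
                node = node.forward[i]
            update[i] = node
        nxt = node.forward[0]
        if nxt is not None and nxt.key == key:
            nxt.value = value  # same timestamp: overwrite
            return
        level = self._random_level()
        self.level = max(self.level, level)
        new = SkipNode(key, value, level)
        for i in range(level):
            new.forward[i] = update[i].forward[i]
            update[i].forward[i] = new

    def range_query(self, start, end):
        """Return (key, value) pairs with start <= key <= end, in key order."""
        node = self.head
        for i in reversed(range(self.level)):
            while node.forward[i] is not None and node.forward[i].key < start:
                node = node.forward[i]
        node = node.forward[0]
        out = []
        while node is not None and node.key <= end:
            out.append((node.key, node.value))
            node = node.forward[0]
        return out


mem = SkipList()
for ts in [5, 1, 9, 3, 7]:  # out-of-order arrival
    mem.insert(ts, ts * 10.0)
print(mem.range_query(3, 7))
```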

META Data

TSDB stores the META data for the tables in a vnode. META includes the SCHEMA of tables and super tables, the TAG values of child tables, the TAG SCHEMA, and the relationship between child tables and their super tables. Creation, update, deletion, and other operations on META data are performed in RAM first, and then serialized and written to disk.

META Data Persistent Storage

When META data is written into RAM, a serialized record is created at the same time and appended to a buffer in RAM. When the data in RAM reaches a certain amount, a commit is triggered. During the commit, the updated serialized META data is appended to the META file on disk. Each table’s latest state, as well as every table update and deletion, is serialized into a record and appended to the META file.
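
The append-only scheme described above can be sketched roughly as follows. This is a simplified Python model, not TDengine's actual record format: the `MetaStore` class, the JSON encoding, the record-count threshold, and the use of a Python list to stand in for the on-disk META file are all assumptions made for illustration. The point is that the file is never rewritten in place, and the latest state of each table can be rebuilt by replaying the records in order.

```python
import json


class MetaStore:
    def __init__(self, commit_threshold):
        self.tables = {}      # current META state in RAM
        self.buffer = []      # serialized records not yet committed
        self.meta_file = []   # stands in for the append-only META file on disk
        self.commit_threshold = commit_threshold  # assumed: counted in records

    def _log(self, action, name, schema=None):
        record = json.dumps({"action": action, "table": name, "schema": schema})
        self.buffer.append(record)
        if len(self.buffer) >= self.commit_threshold:
            self.commit()

    def put_table(self, name, schema):
        """Create or update a table: apply in RAM, then log a record."""
        self.tables[name] = schema
        self._log("put", name, schema)

    def drop_table(self, name):
        self.tables.pop(name, None)
        self._log("drop", name)

    def commit(self):
        """Append buffered records to the META file; nothing is overwritten."""
        self.meta_file.extend(self.buffer)
        self.buffer.clear()


def recover(meta_file):
    """Rebuild the latest state of every table by replaying the records."""
    tables = {}
    for line in meta_file:
        rec = json.loads(line)
        if rec["action"] == "drop":
            tables.pop(rec["table"], None)
        else:
            tables[rec["table"]] = rec["schema"]
    return tables


store = MetaStore(commit_threshold=2)
store.put_table("d1", {"current": "float"})
store.put_table("d2", {"current": "float"})
store.put_table("d1", {"current": "float", "voltage": "int"})
store.drop_table("d2")
print(recover(store.meta_file))
```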

Time-Series Data

TSDB is also responsible for storing the time-series data (collected data) of the tables in a vnode. Time-series data is first written into TSDB’s pre-allocated RAM buffer; when the data in the buffer reaches a certain amount, a commit is triggered and the data is persisted to disk.

Time-Series Data Persistent Storage

When the data in TSDB’s memory reaches a certain amount, a commit is triggered. During the commit, time-series data is converted from row storage format to column storage format. In addition, the BRIN index is maintained, and the LAST file and SUB-BLOCK mechanisms are introduced to handle file fragmentation. The row storage format is shown in the accompanying figure.
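
The row-to-column conversion performed at commit time can be illustrated with a short Python sketch. The function name `rows_to_columns` and the column names `ts`, `current`, and `voltage` are hypothetical, and real blocks are binary rather than Python dictionaries; the sketch only shows the change of layout.

```python
def rows_to_columns(rows):
    """Turn a list of row dicts into a dict of per-column value lists."""
    if not rows:
        return {}
    columns = {name: [] for name in rows[0]}
    for row in rows:
        for name, value in row.items():
            columns[name].append(value)
    return columns


# Rows as they might sit in the write buffer (hypothetical schema).
rows = [
    {"ts": 1000, "current": 10.2, "voltage": 220},
    {"ts": 2000, "current": 10.3, "voltage": 220},
    {"ts": 3000, "current": 10.2, "voltage": 221},
]
columns = rows_to_columns(rows)
print(columns["voltage"])
```

After the conversion, all values of one column sit next to each other, so values of the same type and similar magnitude are adjacent, which is what makes column-wise compression effective.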

TSDB Workflow

When TSDB is started, a BUFFER POOL is pre-allocated as the write buffer (16 MB × 6 = 96 MB by default). The size and number of buffer blocks are configurable, and the number of blocks can be modified later. META data and time-series data request write space from a buffer block, and the write engine requests buffer blocks from the BUFFER POOL. A commit is triggered when fully written buffer blocks account for 1/3 of all buffer blocks. During the commit, the data in the buffer blocks is written into the META file and the data files, and the buffer blocks are returned to the BUFFER POOL once the data has been persisted, forming a cycle. A query merges the data in MEM, IMEM, and the data files, as shown in the accompanying figure.
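
The buffer-block cycle described above can be modeled with a small Python sketch. This is a simplification under stated assumptions: the `BufferPool` class and its fields are hypothetical, the commit runs synchronously inside `write` (a real engine would more likely persist in the background), and a block counts as fully written once the next write no longer fits. The defaults of 6 blocks of 16 MB and the 1/3 trigger come from the text.

```python
class BufferPool:
    def __init__(self, num_blocks=6, block_size=16 * 1024 * 1024):
        self.num_blocks = num_blocks
        self.block_size = block_size
        self.free_blocks = num_blocks  # blocks currently in the pool
        self.full_blocks = 0           # fully written, waiting for commit
        self.current_used = None       # bytes used in the block being written
        self.commits = 0

    def _acquire(self):
        if self.free_blocks == 0:
            raise RuntimeError("buffer pool exhausted")
        self.free_blocks -= 1
        self.current_used = 0

    def _commit(self):
        # Persisting is omitted; the full blocks simply go back to the pool.
        self.free_blocks += self.full_blocks
        self.full_blocks = 0
        self.commits += 1

    def write(self, nbytes):
        if nbytes > self.block_size:
            raise ValueError("write larger than one buffer block")
        needs_new_block = (
            self.current_used is None
            or self.current_used + nbytes > self.block_size
        )
        if needs_new_block:
            if self.current_used is not None:
                self.full_blocks += 1  # seal the current block
                # Trigger: full blocks reach 1/3 of all buffer blocks.
                if self.full_blocks * 3 >= self.num_blocks:
                    self._commit()
            self._acquire()
        self.current_used += nbytes


# Tiny block size so the cycle is visible after a handful of writes.
pool = BufferPool(num_blocks=6, block_size=4)
for _ in range(9):
    pool.write(1)
print(pool.commits, pool.free_blocks, pool.full_blocks)
```

With 6 blocks, the second sealed block reaches the 1/3 threshold, so both sealed blocks are committed and returned to the pool while writing continues in a fresh block.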

Advantages of TSDB’s Design

  1. High efficiency for time-range queries on a single table
  2. Memory is fully utilized and can buffer more data
  3. The columnar file format achieves a higher data compression ratio
  4. Avoids the extra file merges that LSM requires
  5. Tag data is stored separately from the time-series data

Summary

This article primarily explained TDengine’s storage engine. However, fine-tuned compression algorithms and the design of the query model also contribute to TDengine’s high performance and low storage footprint. Follow us for upcoming articles that explain why TDengine achieves such high performance.

TDengine

Open-source, cloud-native time-series database optimized for IoT. See more at https://tdengine.com or https://github.com/taosdata/TDengine