TDengine at KYE Group: Reducing the Number of Servers from 21 to 3
Introduction: KYE Group Co., Ltd. was founded in 2007. It holds honorary titles such as “National AAAAA-level Logistics Enterprise”, “National High-tech Enterprise”, “Top 30 Excellent Brands in China’s Logistics Industry”, “Famous Brand in China’s E-commerce Logistics Industry”, and “Guangdong Province Integrity Logistics Enterprise”. KYE Group appeared in both the “2018 Q3 Hurun Greater China Unicorn Index” and the “2019 First Quarter Hurun Greater China Unicorn Index” released by the Hurun Research Institute, with a valuation of about 20 billion Chinese yuan. Together with Cainiao Network, JD Logistics, Dada-JD, and other enterprises, it was selected as a unicorn enterprise in China’s logistics service industry.
For a logistics company, efficiently recording and processing vehicle trajectory information is critical to overall delivery efficiency.
A few years ago, we established the vehicle trajectory positioning storage engine project. Tens of thousands of vehicles purchased by KYE Group report information to the GPS-AGENT gateway through on-board positioning devices. The service parses each message and sends it to the Apache Kafka message middleware. Historical location data is then written into Apache HBase, and the latest vehicle location is written into Redis, where it is available to business services for real-time monitoring and analysis of vehicles.
The original business structure is shown in the following figure:
In the actual operation of the original system, we encountered many pain points. For example, because the data is stored in HBase, system performance drops significantly when we query data over a long time span.
Specifically, the pain points can be summarized as follows:
So we started to think about how to improve the system to solve these pain points.
Before starting the new technology selection, we reviewed our business scenarios, which can be summarized by the following picture.
Let’s take a look at it in turn:
(1) The data is neither updated nor deleted: trajectory information is reported with the timestamp of the actual vehicle reading, and there is no need to update or delete it. It only needs to be retained for a fixed period.
(2) No traditional database transaction processing is needed: because the data is never updated, there is no need for transactions to guarantee update safety as in a traditional database.
(3) The traffic is stable: the number of vehicles and the reporting frequency within a given period are predictable.
(4) Queries and analysis are based on time ranges and spatial areas, depending on business needs.
(5) In addition to storage and query operations, various statistics and real-time calculations need to be performed according to actual business needs.
(6) The data volume is huge: more than 50 million records are collected per day, and this will keep growing as the business scales.
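To put point (6) in perspective, the daily volume can be converted into an average write rate. The short calculation below uses only the 50 million records per day lower bound given above; the assumption that traffic is spread evenly over the day is ours, and real peaks will be higher.

```python
# Average write rate implied by "more than 50 million records per day".
# Assumption (ours): traffic is spread evenly over 24 hours.
RECORDS_PER_DAY = 50_000_000
SECONDS_PER_DAY = 24 * 60 * 60

avg_rows_per_second = RECORDS_PER_DAY / SECONDS_PER_DAY
print(f"average write rate: {avg_rows_per_second:.0f} rows/s")
```

Even as a lower bound, this is a sustained rate of several hundred rows per second, which is why append-only write throughput and storage cost matter for the selection.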
The analysis above shows that vehicle trajectories are typical time series data, so it is more efficient to process them with a purpose-built time series database. During the research phase, we compared several representative time series database products.
The comprehensive comparison results are as follows:
- The InfluxDB cluster edition is paid, and the hardware cost is relatively high;
- CTSDB (Tencent Cloud Time Series Database) has high memory usage and relatively high cost;
- OpenTSDB is still built on HBase, so introducing it would not simplify the architecture;
- TDengine’s cluster function is open source, it has typical distributed database characteristics, and its compression ratio is very high.
After these comparisons, we concluded that TDengine’s features fit our business scenarios well.
So we carried out preliminary research and drills based on TDengine, covering the following aspects:
We tested TDengine’s functions and performance from various angles. The functions fully meet our needs, and the performance and compression ratio exceeded our expectations.
After completing the basic function and performance tests, we ran scenario tests and drills tied to our business, which mainly covered the following aspects:
- Cluster scale-out and scale-in while data is being written
- Whether cacheLast works as expected in our application
- Statistical aggregation for selected business scenarios using interval and interp
- Overwrite scenarios with the update parameter
- Common business query statements, comparing the data returned for the same query range
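The interval scenario in the list above amounts to aggregating rows into fixed time windows. The following is a conceptual sketch in plain Python of what such a windowed aggregation computes. It is not TDengine code, and the `window_avg` helper and the sample rows are hypothetical.

```python
from collections import defaultdict

def window_avg(rows, window_ms):
    """Average the values of (timestamp_ms, value) rows per fixed-size time window."""
    buckets = defaultdict(list)
    for ts_ms, value in rows:
        window_start = ts_ms - ts_ms % window_ms  # align to the window boundary
        buckets[window_start].append(value)
    return {start: sum(vals) / len(vals) for start, vals in sorted(buckets.items())}

# Hypothetical speed samples: (timestamp in ms, speed in km/h)
samples = [(0, 40.0), (30_000, 50.0), (60_000, 60.0), (90_000, 80.0)]
per_minute = window_avg(samples, window_ms=60_000)
print(per_minute)
```

A time series database performs this kind of bucketing inside the storage engine, so the application does not have to pull raw rows and aggregate them itself.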
3. In-Depth Exploration
Before actually implementing TDengine, we also studied the architecture and design of the system in depth. Here we briefly share the core concepts of TDengine.
(1) TDengine Structure
If this is your first time using TDengine, you can take a look at the following picture. A dnode is a physical node that actually stores the data. Inside the dnode box, the small boxes such as V2 and V7 are called vnodes, which are virtual nodes. The m0 and m1 boxes are the metadata management nodes that store cluster information and table information. Those who are familiar with distributed middleware will recognize that TDengine has very typical distributed database characteristics.
(2) Super Table
TDengine has a concept called the super table. In the KYE Group business scenario, for example, each vehicle becomes a sub-table, and all sub-tables inherit from a parent table called a super table. The super table defines the schema of its sub-tables but does not store the actual physical data. We can run statistical analysis and queries by querying only the super table, instead of summarizing each sub-table.
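As a rough illustration of the super table idea, the sketch below builds the kind of SQL statements involved: one `CREATE STABLE` for the shared schema and one `CREATE TABLE ... USING ... TAGS` per vehicle. The table name `vehicle_track`, its columns, the `plate` tag, and the helper functions are hypothetical and are not our production schema. The statements are only generated as strings here and are not executed against a server.

```python
def create_stable_sql(stable, columns, tags):
    """Build a CREATE STABLE statement from (name, type) pairs."""
    cols = ", ".join(f"{name} {typ}" for name, typ in columns)
    tgs = ", ".join(f"{name} {typ}" for name, typ in tags)
    return f"CREATE STABLE IF NOT EXISTS {stable} ({cols}) TAGS ({tgs})"

def create_subtable_sql(table, stable, tag_values):
    """Build a CREATE TABLE statement for one sub-table of a super table."""
    vals = ", ".join(repr(v) for v in tag_values)
    return f"CREATE TABLE IF NOT EXISTS {table} USING {stable} TAGS ({vals})"

# Hypothetical schema: one row per reported position, one sub-table per vehicle.
stable_sql = create_stable_sql(
    "vehicle_track",
    columns=[("ts", "TIMESTAMP"), ("longitude", "DOUBLE"),
             ("latitude", "DOUBLE"), ("speed", "FLOAT")],
    tags=[("plate", "BINARY(16)")],
)
subtable_sql = create_subtable_sql("v_001", "vehicle_track", ["A12345"])

# A query against the super table covers all vehicles at once.
count_sql = "SELECT COUNT(*) FROM vehicle_track"

print(stable_sql)
print(subtable_sql)
print(count_sql)
```

The point of the design is visible in the last statement: aggregation is written once against the super table, regardless of how many sub-tables exist.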
(3) High Compression Features
TDengine adopts a two-stage compression strategy. The first stage applies a type-specific algorithm to each data type, using delta-of-delta encoding, the simple8b method, zig-zag encoding, LZ4, and other algorithms. The second stage then applies general-purpose LZ4 compression on top of the first-stage output, provided that the comp parameter is set to 2 when the database is created.
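To give a feel for why first-stage encodings work so well on regularly sampled data, here is a minimal sketch of delta-of-delta and zig-zag encoding in plain Python. It illustrates the general techniques only and is not TDengine’s actual on-disk format; the function names and sample timestamps are hypothetical, and the general-purpose second stage is omitted.

```python
def zigzag_encode(n: int) -> int:
    """Map a signed 64-bit integer to an unsigned one: 0, -1, 1, -2 -> 0, 1, 2, 3."""
    return (n << 1) ^ (n >> 63)

def zigzag_decode(z: int) -> int:
    """Inverse of zigzag_encode."""
    return (z >> 1) ^ -(z & 1)

def delta_of_delta(values):
    """Return [first value, first delta, then differences between consecutive deltas]."""
    if len(values) < 2:
        return list(values)
    prev_delta = values[1] - values[0]
    out = [values[0], prev_delta]
    for a, b in zip(values[1:], values[2:]):
        delta = b - a
        out.append(delta - prev_delta)
        prev_delta = delta
    return out

def undo_delta_of_delta(encoded):
    """Rebuild the original values from delta_of_delta output."""
    if len(encoded) < 2:
        return list(encoded)
    delta = encoded[1]
    values = [encoded[0], encoded[0] + delta]
    for dod in encoded[2:]:
        delta += dod
        values.append(values[-1] + delta)
    return values

# Hypothetical timestamps reported roughly once per second (in ms).
timestamps = [1000, 2000, 3000, 4000, 5001]
encoded = [zigzag_encode(x) for x in delta_of_delta(timestamps)]
decoded = undo_delta_of_delta([zigzag_decode(z) for z in encoded])
print(encoded)
print(decoded == timestamps)
```

After the first two entries, steady reporting intervals turn into runs of zeros and very small numbers, which is exactly the kind of input that bit-packing and general-purpose compressors handle well.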
(4) The Architecture After Introducing TDengine
After thorough testing and verification, we introduced TDengine into our system. The new system architecture is shown in the following figure:
As the architecture diagram above shows, vehicle data still passes through the GPS-AGENT gateway, which parses the messages and sends them to Apache Kafka. Our application then opens one more Kafka consumer group that consumes the same messages in parallel, so that the data at both ends stays consistent.
The business system no longer reads the latest vehicle location through Redis, which simplifies the architecture. Queries read only from TDengine, and HBase will be taken offline after a certain period of time.
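The reason an extra consumer group works for this kind of parallel run is that each group keeps its own read position and therefore sees every message. The toy model below shows that behavior in plain Python. It does not use a Kafka client, and the `TopicLog` class and the group names are hypothetical.

```python
class TopicLog:
    """A toy append-only log with an independent read offset per consumer group."""

    def __init__(self):
        self.messages = []
        self.offsets = {}  # group name -> index of the next unread message

    def publish(self, message):
        self.messages.append(message)

    def poll(self, group):
        """Return all messages this group has not read yet and advance its offset."""
        start = self.offsets.get(group, 0)
        batch = self.messages[start:]
        self.offsets[group] = len(self.messages)
        return batch

log = TopicLog()
for seq in range(3):
    log.publish({"vehicle": "v_001", "seq": seq})

hbase_rows = log.poll("hbase-writer")        # existing write path
tdengine_rows = log.poll("tdengine-writer")  # newly added consumer group

print(len(hbase_rows), len(tdengine_rows))
print(hbase_rows == tdengine_rows)
```

Because the new group does not disturb the offsets of the existing one, the old path keeps running unchanged while the new path is validated against it.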
(5) Optimization Effect
After the introduction of TDengine, the results across all indicators are impressive.
(a) Compression Ratio
As shown in the figure, a table with 50,000 rows, each more than 600 bytes, occupies only 1665 KB on disk after compression, and the compression ratio reaches 1%. Next, let’s look at a sub-table with millions of rows.
It actually occupies 7839 KB on disk. Our compression results are much better than TDengine’s various official tests, which is probably related to the relatively high repetition in our business data.
(b) Daily Increment
Our business currently writes more than 50 million records per day, and in TDengine the growth in disk usage stays at roughly 1.4 GB per unit.
(c) Overall Comparison of Indicators
The figure below compares various indicators before and after the production rollout.
The figure below is a comparison of data increments.
It can be seen from the comparison that TDengine has indeed greatly reduced our various costs.
(6) Questions & Suggestions
A relatively new system will inevitably run into some problems in use. We worked with the TDengine R&D team to locate and solve them.
For example, the following is a problem we encountered while using JDBC. We also submitted a PR to the official repository to fix it. This is the charm of open source: everyone can participate.
There are also two areas where we hope TDengine can be further optimized:
- In versions below 2.3.0.x, the monitoring function is relatively simple, and we hope later versions can provide stronger and more detailed monitoring. We noticed that the newly released version introduced a monitoring tool called TDinsight, and we will try it soon.
- The interval function currently does not support grouping by ordinary columns such as business columns, and we hope this will be supported in the future.
Finally, while trying out and implementing TDengine, we received strong support from many colleagues at TAOS Data, and we would like to express our gratitude here.