Mastering Python UDFs in TDengine: From Beginner to Pro | TDengine

TDengine
7 min readJan 2, 2025

With the release of TDengine 3.0.4.0, we unveiled an exciting new feature: the ability to create User-Defined Functions (UDFs) using Python. This innovative capability delivers unparalleled flexibility for database operations while making customization more accessible-even for those new to programming. With Python UDFs, users can easily tailor and manage their databases, seamlessly integrating these custom functions into SQL queries as if they were built-in. In this article, we’ll explore how to leverage this feature effectively and help you take the first steps toward creating your own database customizations.

Getting Started with UDFs

Creating a UDF

Here’s how to create a Python UDF function in TDengine:

       CREATE  [OR REPLACE] [AGGREGATE] FUNCTION function_name
as library_path OUTPUTTYPE output_type [BUFSIZE buffer_size] [LANGUAGE 'C‘ | 'Python']

Options Explained:

  1. CREATE [OR REPLACE]: Use CREATE for new functions and add OR REPLACE to update existing ones.
  2. AGGREGATE: Optional. Indicates an aggregate function. Without it, the function defaults to a scalar function.
  3. function_name: The name of the UDF, which can be up to 64 bytes long. Names exceeding this limit will be truncated.
  4. OUTPUTTYPE: Specifies the output data type of the UDF. Supported types include:

5. BUFSIZE: Sets the memory buffer size for aggregate functions (up to 256 KB). This buffer is allocated for the lifetime of the aggregation process, making it useful as a global variable.

6. LANGUAGE: Specifies the programming language. Currently, TDengine supports Python and C.

To remove a UDF, use the following command:

Deleting a UDF

To remove a UDF, use the following command:

DROP FUNCTION function_name;

Viewing UDFs

To view all created UDFs:

SHOW FUNCTIONS;

For detailed information:

SELECT * FROM information_schema.ins_functions\G;

Setting Up the Environment

Before diving into Python UDF development, ensure your environment is ready:

  1. CMake: Minimum version 3.0.2.
  2. GCC: Required for compiling the shared library (.so file). Minimum version: 7.5.
  3. Python: Version 3.7 or higher.

Install the UDF plugin with the following command:

python3 -m pip install taospyudf 
ldconfig

Your development environment is now ready!

Writing UDFs

UDFs in TDengine fall into two categories: scalar and aggregate functions. Before diving into the details of the functions, let’s first take a look at the process of how custom functions are invoked:

How UDFs Work

TDengine processes data in blocks for efficiency. When a UDF is invoked, the input data is a block, and you can access any cell in the block using the data(row, col) method. This approach minimizes the overhead of calls between the C framework and Python, boosting performance.

  • Scalar Functions: Must return the same number of rows as the input.
  • Aggregate Functions: Aggregate data and return a single row.

Creating a Scalar Function

Let’s create a Python UDF equivalent of the CONCAT string concatenation function:

  1. Write the Function

The function must include the following methods:

  • init: Called once during the initialization of the UDF module. Use this to perform any necessary setup tasks.
  • destroy: Called once when the UDF module exits. Use this for cleanup tasks.
  • : The main function that processes each incoming data block. Use the shape() method to retrieve the number of rows and columns in the data block.
  • nrows: Returns the number of rows in the data block.
  • : Returns the number of columns, which corresponds to the number of parameters passed to the concat() function.

Return Values:

  • A scalar function must return a list. If the return value is not a list, an error will occur.
  • The number of elements in the returned list must match the number of rows (nrows) in the data block. Mismatched row counts will result in an error.

2. Create the Function

create function py_concat as 'https://eujqw4hwudm.exactdn.com/home/py_concat.py' outputtype varchar(256) language 'Python';

3. Execute the function Use the UDF like any built-in function:

select sf_concat(factory_name,room_name), concat(factory_name,room_name) from devices;

Creating an Aggregate Function

Aggregate functions perform computations on data and output a single aggregated result. Let’s create a Python UDF for counting rows:

  1. Write the Function

Implementation Principles:

  • During the start initialization callback, the value 0 is stored in the buffer (buf) as the initial value for counting.
  • In the reduce function, as data blocks are processed, any non-None values are added to the cumulative count. The result is stored back into the buffer (buf), which is passed as a parameter during subsequent calls to reduce. This allows the buffer to be reused across iterations.
  • Finally, in the finish function, the buffer (buf) is passed as a parameter. The value stored in the buffer is retrieved and returned as the final result, representing the total count.

Return Value:

  • The return type of the function must match the OUTPUTTYPE specified when the UDF function was created. If the return type is incorrect, an error will be triggered.
  • Returning a None object is allowed, but it must be handled appropriately.

2. Create the UDF

create aggregate function af_count as ''https://eujqw4hwudm.exactdn.com/home/af_count.py'' outputtype bigint bufsize 4096 language 'Python';

3. Execute the UDF

select af_count(col1) from devices;

Data Type Mappings

Python interacts with C through specific data type mappings. Key points include:

  1. Binary, NCHAR, and VARCHAR map to Python’s bytes object. Use appropriate decoding:

a. For binary: bytes.decode('utf-8')

b. For nchar: bytes.decode('utf_32_le')

2. When returning a str object:

a. For binary: str.encode('utf-8')

b. For nchar: str.encode('utf_32_le')

Development Tips

Updating Function Code

The .py file path specified during UDF creation is only used at that moment. The file’s content is stored in the mnode, making the function accessible across the cluster. After creation, the .py file is no longer needed.

To update the UDF code, you must update mnode with the new content. Use the OR REPLACE option to overwrite the existing function and apply the updated code for future calls.

CREATE [OR REPLACE] FUNCTION function_name AS library_path OUTPUTTYPE output_type [LANGUAGE 'C' | 'Python'];

Logging

Python UDFs in TDengine do not support direct debugging of Python code but do support logging. You can use commonly used libraries like logging within UDF functions. It is recommended to output logs to a file for review. Note that messages printed using the print function will not be visible, so avoid using it for output. For example:

Exception Handling

Raise exceptions to terminate queries if needed. Exceptions are logged in taosudfpy.log.

Checking UDF Framework Logs

If the UDF framework returns an error, check the log files in the TDengine log directory for details:

  1. taospyudf.log Logs Python UDF activity, including function loading, execution errors, and exceptions. Use this log when debugging Python UDFs.
  2. udfdlog.0 Logs framework-level issues for all UDF languages (C, Python, etc.). Check this log for broader framework errors, though it's rarely needed.

Common Errors and Solutions

Errors 10 and 12 are the most common issues when developing Python UDFs. For detailed causes, refer to the taospyudf.log file.

Examples and Resources

TDengine’s open-source repository provides several Python UDF test examples for reference:

Steps to Run Test Cases:

  1. Ensure Python3 is correctly installed.
  2. Clone the TDengine OSS code.
  3. Navigate to the TDengine/tests/system-test/ directory.
  4. Run the test case using:
python3 test.py -f others/udfpy_main.py

Notes:

1. Common Issues with UDF Scalar Functions

  • Row Count Mismatch: The number of rows returned by a scalar function must match the number of input rows.

a. If the logic is complex, it’s easy to miss rows during processing. Be cautious to avoid such errors.

2. OUTPUTTYPE Mismatch

  • Output Type Consistency: The return type of the UDF must match the OUTPUTTYPE specified during its creation.

a. Mismatched types will result in an error.

  • Scalar Functions: The return value of the process function must be a list, and all elements in the list must match the data type specified by OUTPUTTYPE.
  • Aggregate Functions: The final return value in the finish function must match the OUTPUTTYPE as a Python data type.

3. Error: Unable to Load libtaospyudf.so

  • Check Installation: Ensure the taospyudf plugin was installed properly.

a. The library is typically installed in /usr/local/lib/libtaospyudf.so or Python’s plugin directory.

b. Use the command below to locate the file:

find / -name 'libtaospyudf.so'
  • Missing File: If the file is missing, the plugin installation may have failed. Reinstall the plugin to resolve the issue.
  • File Exists but Still Fails:

a. Use ldd to check the library’s dependencies:

ldd /usr/local/lib/libtaospyudf.so
  • Missing Python dependencies are a common issue. For example, an improperly installed Python environment can cause errors.
  • Solution: Reinstall Python (e.g., Python 3.9) correctly to resolve the dependency issue.

Conclusion

The Python UDF feature in TDengine is an exciting step forward, offering unmatched customization for your database. This is just the beginning—we’re continuously improving and expanding its capabilities. We invite you to explore this feature, unleash your creativity, and share your feedback to make it even better!

Originally published at https://tdengine.com on January 2, 2025.

Sign up to discover human stories that deepen your understanding of the world.

TDengine
TDengine

Written by TDengine

A High-Performance, Scalable Time-Series Database for Industrial IoT. See more at https://tdengine.com

No responses yet

Write a response