Interfacing with dynamic libraries without extensions

Thanks to ctypes (a module in the standard library) or cffi (an external package), you can integrate almost any compiled dynamic/shared library in Python, no matter what language it was written in. You can do that in pure Python without any compilation step, which makes it an interesting alternative to writing extensions in C.

This does not mean that you don't need to know anything about C. Both solutions require a reasonable understanding of C and of how dynamic libraries work in general. On the other hand, they remove the burden of dealing with Python reference counting and greatly reduce the risk of making painful mistakes. Also, interfacing with C code through ctypes or cffi is more portable than writing and compiling a C extension module.

ctypes

ctypes is the most popular module for calling functions from dynamic or shared libraries without writing custom C extensions. The reason is obvious: it is part of the standard library, so it is always available and does not require any external dependencies. It is a foreign function interface (FFI) library and provides an API for creating C-compatible datatypes.

Loading libraries

There are four types of dynamic library loaders available in ctypes and two conventions to use them. The classes that represent dynamic and shared libraries are ctypes.CDLL, ctypes.PyDLL, ctypes.OleDLL, and ctypes.WinDLL. The last two are only available on Windows, so we won't discuss them here. The differences between CDLL and PyDLL are as follows:

  • ctypes.CDLL: This class represents loaded shared libraries. The functions in these libraries use the standard calling convention and are assumed to return int. The GIL is released during the call.
  • ctypes.PyDLL: This class works like CDLL, but the GIL is not released during the call. After execution, the Python error flag is checked and an exception is raised if it is set. It is only useful when directly calling functions from the Python/C API.

To load a library, you can either instantiate one of the preceding classes with proper arguments or call the LoadLibrary() method of the prefabricated loader object associated with a specific class:

  • ctypes.cdll.LoadLibrary() for ctypes.CDLL
  • ctypes.pydll.LoadLibrary() for ctypes.PyDLL
  • ctypes.windll.LoadLibrary() for ctypes.WinDLL
  • ctypes.oledll.LoadLibrary() for ctypes.OleDLL
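
The two conventions can be compared side by side. The following is a minimal sketch, assuming a POSIX system (Linux or Mac OS X) where the C library can be located by name. If the lookup fails and the name is None, the loader opens the running process itself, which already has the C library linked in:

```python
import ctypes
from ctypes.util import find_library

# Assumption: a POSIX system. find_library() may return None when the
# lookup fails; the loader then opens the current process instead.
libc_name = find_library('c')

# Convention 1: instantiate the library class directly.
libc_from_class = ctypes.CDLL(libc_name)
# Convention 2: call LoadLibrary() on the matching loader object.
libc_from_loader = ctypes.cdll.LoadLibrary(libc_name)

# Both conventions produce instances of the same class, and the
# functions of the library are reachable through either of them.
same_type = type(libc_from_class) is type(libc_from_loader)
abs_results = (libc_from_class.abs(-7), libc_from_loader.abs(-7))
print(same_type, abs_results)
```

Which convention to use is mostly a matter of taste. Instantiating the class directly is often more convenient when extra constructor arguments are needed.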

The main challenge when loading shared libraries is how to find them in a portable way. Different systems use different suffixes for shared libraries (.dll on Windows, .dylib on Mac OS X, .so on Linux) and search for them in different places. The main offender in this area is Windows, which does not have a predefined naming scheme for libraries. Because of that, we won't discuss the details of loading libraries with ctypes on this system and will concentrate mainly on Linux and Mac OS X, which deal with this problem in a consistent and similar way. If you are interested in the Windows platform anyway, refer to the official ctypes documentation, which has plenty of information about supporting that system (https://docs.python.org/3.5/library/ctypes.html).

Both library loading conventions (the LoadLibrary() function and specific library-type classes) require you to use the full library name. This means all the predefined library prefixes and suffixes need to be included. For example, to load the C standard library on Linux, you need to write the following:

>>> import ctypes
>>> ctypes.cdll.LoadLibrary('libc.so.6')
<CDLL 'libc.so.6', handle 7f0603e5f000 at 7f0603d4cbd0>

On Mac OS X, this would be:

>>> import ctypes
>>> ctypes.cdll.LoadLibrary('libc.dylib')

Fortunately, the ctypes.util submodule provides a find_library() function that allows you to load a library using its name without any prefixes or suffixes. It works on any system that has a predefined scheme for naming shared libraries:

>>> import ctypes
>>> from ctypes.util import find_library
>>> ctypes.cdll.LoadLibrary(find_library('c'))
<CDLL '/usr/lib/libc.dylib', handle 7fff69b97c98 at 0x101b73ac8>
>>> ctypes.cdll.LoadLibrary(find_library('bz2'))
<CDLL '/usr/lib/libbz2.dylib', handle 10042d170 at 0x101b6ee80>
>>> ctypes.cdll.LoadLibrary(find_library('AGL'))
<CDLL '/System/Library/Frameworks/AGL.framework/AGL', handle 101811610 at 0x101b73a58>
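
Note that find_library() returns None when it cannot locate a library, so it is worth checking the result before passing it on. The following is a small sketch of such a check. The locate() helper and the library name no_such_library_xyz are made up for illustration:

```python
from ctypes.util import find_library


def locate(name):
    """Return the name or path of a shared library, or raise OSError.

    Hypothetical helper: it only wraps find_library() with an
    explicit check for the None result.
    """
    found = find_library(name)
    if found is None:
        raise OSError("shared library %r not found" % name)
    return found


# 'no_such_library_xyz' is a made-up name used to trigger the failure.
try:
    locate('no_such_library_xyz')
except OSError as error:
    message = str(error)
    print(message)
```

Failing early with a clear message is usually easier to debug than an error raised later by the loader.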

Calling C functions using ctypes

When the library is successfully loaded, the common pattern is to store it as a module-level variable with the same name as the library. The functions can be accessed as object attributes, so calling them is like calling a Python function from any other imported module:

>>> import ctypes
>>> from ctypes.util import find_library
>>> 
>>> libc = ctypes.cdll.LoadLibrary(find_library('c'))
>>> 
>>> libc.printf(b"Hello world!\n")
Hello world!
13
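
Because CDLL assumes that every function returns int, a function with a different return type needs to be declared before it is called. The following sketch does this for atof() and strlen() from the C library, using the argtypes and restype attributes that ctypes exposes on function objects. It assumes a POSIX system. The c_char_p, c_double, and c_size_t wrappers are taken from the table of types later in this section:

```python
import ctypes
from ctypes.util import find_library

# Assumption: a POSIX system. If find_library() returns None, CDLL
# opens the current process, which also exposes the C library.
libc = ctypes.CDLL(find_library('c'))

# C prototype: double atof(const char *str);
libc.atof.argtypes = [ctypes.c_char_p]
libc.atof.restype = ctypes.c_double

# C prototype: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

parsed = libc.atof(b"3.5")
length = libc.strlen(b"Hello world!")
print(parsed, length)
```

Without the restype declaration, the result of atof() would be interpreted as an int and would be meaningless.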

Unfortunately, all the built-in Python types except integers, strings, and bytes are incompatible with C datatypes and thus must be wrapped in the corresponding classes provided by the ctypes module. Here is the full list of compatible datatypes that comes from the ctypes documentation:

ctypes type     C type                                    Python type
--------------  ----------------------------------------  ------------------------
c_bool          _Bool                                     bool
c_char          char                                      1-character bytes object
c_wchar         wchar_t                                   1-character string
c_byte          char                                      int
c_ubyte         unsigned char                             int
c_short         short                                     int
c_ushort        unsigned short                            int
c_int           int                                       int
c_uint          unsigned int                              int
c_long          long                                      int
c_ulong         unsigned long                             int
c_longlong      __int64 or long long                      int
c_ulonglong     unsigned __int64 or unsigned long long    int
c_size_t        size_t                                    int
c_ssize_t       ssize_t or Py_ssize_t                     int
c_float         float                                     float
c_double        double                                    float
c_longdouble    long double                               float
c_char_p        char * (NUL terminated)                   bytes object or None
c_wchar_p       wchar_t * (NUL terminated)                string or None
c_void_p        void *                                    int or None
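
The wrapper classes from the table are instantiated like ordinary Python classes, and the wrapped value can be read back through the value attribute. This short sketch also shows that the integer wrappers do no overflow checking:

```python
import ctypes

# Wrap Python values in ctypes classes and read them back via .value.
answer = ctypes.c_int(42)
ratio = ctypes.c_double(1.5)
name = ctypes.c_char_p(b"abc")

# Integer wrappers do no overflow checking: the value wraps around
# to fit the width of the underlying C type.
wrapped = ctypes.c_ubyte(256)

print(answer.value, ratio.value, name.value, wrapped.value)
```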

As you can see, the preceding table does not contain dedicated types that would reflect any of the Python collections as C arrays. The recommended way to create types for C arrays is to simply use the multiplication operator with the desired basic ctypes type:

>>> import ctypes
>>> IntArray5 = ctypes.c_int * 5
>>> c_int_array = IntArray5(1, 2, 3, 4, 5)
>>> FloatArray2 = ctypes.c_float * 2
>>> c_float_array = FloatArray2(0, 3.14)
>>> c_float_array[1]
3.140000104904175
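
In practice, the array type is often built from the length of an existing Python list. The following sketch shows that pattern together with a few operations that ctypes arrays support. The variable names are only illustrative:

```python
import ctypes

values = [3, 1, 2]

# Build the array type from the list length, then unpack the values
# into the constructor of the new type.
IntArray = ctypes.c_int * len(values)
c_values = IntArray(*values)

# ctypes arrays support len(), indexing, iteration, and item assignment.
c_values[0] = 10
as_list = list(c_values)

# The size of the whole array is the element size times the length.
total_bytes = ctypes.sizeof(c_values)
expected_bytes = len(values) * ctypes.sizeof(ctypes.c_int)
print(as_list, total_bytes)
```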

Passing Python functions as C callbacks

Delegating part of a function's work to custom callbacks provided by the user is a very popular design pattern. The best-known function from the C standard library that accepts such callbacks is qsort(), which provides a generic implementation of the Quicksort algorithm. It is rather unlikely that you would want to use this algorithm instead of the default Python Timsort, which is better suited for sorting Python collections. Still, qsort() is a canonical example, found in many programming books, of an efficient sorting algorithm and of a C API that uses the callback mechanism. This is why we will use it as an example of passing a Python function as a C callback.

An ordinary Python function type is not compatible with the callback function type required by the qsort() specification. Here is the signature of qsort() from the BSD man page, which also contains the type of the accepted callback (the compar argument):

void qsort(void *base, size_t nel, size_t width,
           int (*compar)(const void *, const void *));

So in order to execute qsort() from libc, you need to pass:

  • base: This is the array that needs to be sorted as a void* pointer.
  • nel: This is the number of elements as size_t.
  • width: This is the size of the single element in the array as size_t.
  • compar: This is the pointer to the function that is supposed to return int and accepts two void* pointers. It points to the function that compares the size of two elements being sorted.

We already know from the Calling C functions using ctypes section how to construct the C array from other ctypes types using the multiplication operator. nel should be size_t, and it maps to Python int, so it does not require any additional wrapping and can be passed as len(iterable). The width value can be obtained using the ctypes.sizeof() function once we know the type of our base array. The last thing we need to know is how to create the pointer to the Python function compatible with the compar argument.

The ctypes module contains a CFUNCTYPE() factory function that allows us to wrap Python functions and represent them as C callable function pointers. The first argument is the C return type that the wrapped function should return. It is followed by the variable list of C types that the function accepts as its arguments. The function type compatible with the compar argument of qsort() will be:

CMPFUNC = ctypes.CFUNCTYPE(
    # return type
    ctypes.c_int,
    # first argument type
    ctypes.POINTER(ctypes.c_int),
    # second argument type
    ctypes.POINTER(ctypes.c_int),
)

Note

CFUNCTYPE() uses the cdecl calling convention, so it is compatible only with the CDLL and PyDLL shared libraries. The dynamic libraries on Windows that are loaded with WinDLL or OleDLL use the stdcall calling convention. This means that another factory must be used to wrap Python functions as C callable function pointers. In ctypes, it is WINFUNCTYPE().
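
A function wrapped with such a type can also be called directly from Python, which is a convenient way to check a callback before handing it to C code. The following is a minimal sketch. The compare_ints() function is a made-up example, and the pointers are created with ctypes.pointer():

```python
import ctypes

CMPFUNC = ctypes.CFUNCTYPE(
    # return type
    ctypes.c_int,
    # first argument type
    ctypes.POINTER(ctypes.c_int),
    # second argument type
    ctypes.POINTER(ctypes.c_int),
)


def compare_ints(a, b):
    # arguments are pointers, so we dereference them with [0]
    return a[0] - b[0]


# Wrap the Python function as a C callable function pointer.
c_compare = CMPFUNC(compare_ints)

one = ctypes.c_int(1)
two = ctypes.c_int(2)
result = c_compare(ctypes.pointer(one), ctypes.pointer(two))
print(result)
```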

To wrap everything up, let's assume that we want to sort a randomly shuffled list of integer numbers with a qsort() function from the standard C library. Here is the example script that shows how to do that using everything that we have learned about ctypes so far:

from random import shuffle

import ctypes
from ctypes.util import find_library

libc = ctypes.cdll.LoadLibrary(find_library('c'))

CMPFUNC = ctypes.CFUNCTYPE(
    # return type
    ctypes.c_int,
    # first argument type
    ctypes.POINTER(ctypes.c_int),
    # second argument type
    ctypes.POINTER(ctypes.c_int),
)


def ctypes_int_compare(a, b):
    # arguments are pointers so we access using [0] index
    print(" %s cmp %s" % (a[0], b[0]))

    # according to qsort specification this should return:
    # * less than zero if a < b
    # * zero if a == b
    # * more than zero if a > b
    return a[0] - b[0]


def main():
    numbers = list(range(5))
    shuffle(numbers)
    print("shuffled: ", numbers)

    # create new type representing array with length
    # same as the length of numbers list
    NumbersArray = ctypes.c_int * len(numbers)
    # create new C array using a new type
    c_array = NumbersArray(*numbers)

    libc.qsort(
        # pointer to the sorted array
        c_array,
        # length of the array
        len(c_array),
        # size of single array element
        ctypes.sizeof(ctypes.c_int),
        # callback (pointer to the C comparison function)
        CMPFUNC(ctypes_int_compare)
    )
    print("sorted:   ", list(c_array))


if __name__ == "__main__":
    main()

The comparison function provided as a callback has an additional print statement, so we can see how it is executed during the sorting process:

$ python ctypes_qsort.py 
shuffled:  [4, 3, 0, 1, 2]
 4 cmp 3
 4 cmp 0
 3 cmp 0
 4 cmp 1
 3 cmp 1
 0 cmp 1
 4 cmp 2
 3 cmp 2
 1 cmp 2
sorted:    [0, 1, 2, 3, 4]
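
The script above relies on the default argument conversions of ctypes. A slightly more defensive variant, sketched below under the assumption of a POSIX system, declares the signature of qsort() through argtypes and restype. It also uses a comparison that cannot overflow, because a[0] - b[0] could fall outside the range of a C int for extreme values. The sort_ints() helper is hypothetical and not part of the original example:

```python
import ctypes
from ctypes.util import find_library

# Assumption: a POSIX system. If find_library() returns None, CDLL
# opens the current process, which also exposes the C library.
libc = ctypes.CDLL(find_library('c'))

CMPFUNC = ctypes.CFUNCTYPE(
    ctypes.c_int,
    ctypes.POINTER(ctypes.c_int),
    ctypes.POINTER(ctypes.c_int),
)

# Declare the signature so that ctypes converts each argument to the
# right C type; qsort() returns void, hence restype is None.
libc.qsort.argtypes = [
    ctypes.c_void_p, ctypes.c_size_t, ctypes.c_size_t, CMPFUNC,
]
libc.qsort.restype = None


def sort_ints(numbers):
    """Return a sorted copy of a list of ints using qsort() from libc."""
    c_array = (ctypes.c_int * len(numbers))(*numbers)
    # (a > b) - (a < b) yields -1, 0, or 1 and cannot overflow a C int.
    # The callback object is kept in a local variable so that it stays
    # alive for the whole duration of the qsort() call.
    compare = CMPFUNC(lambda a, b: (a[0] > b[0]) - (a[0] < b[0]))
    libc.qsort(c_array, len(c_array), ctypes.sizeof(ctypes.c_int), compare)
    return list(c_array)


result = sort_ints([4, 3, 0, 1, 2])
print(result)
```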

CFFI

CFFI is a Foreign Function Interface for Python and an interesting alternative to ctypes. It is not part of the standard library but is easily available as the cffi package on PyPI. It differs from ctypes in that it puts more emphasis on reusing plain C declarations instead of providing an extensive Python API in a single module. It is considerably more complex and can also automatically compile some parts of your integration layer into extensions using a C compiler. So it can be used as a hybrid solution that fills the gap between C extensions and ctypes.

Because it is a very large project, it is impossible to introduce it properly in a few paragraphs. On the other hand, it would be a shame not to say something more about it. We have already discussed one example of integrating the qsort() function from the standard library using ctypes, so the best way to show the main differences between these two solutions is to re-implement the same example with cffi. I hope that one block of code is worth more than a few paragraphs of text:

from random import shuffle

from cffi import FFI

ffi = FFI()

ffi.cdef("""
void qsort(void *base, size_t nel, size_t width,
           int (*compar)(const void *, const void *));
""")
C = ffi.dlopen(None)


@ffi.callback("int(void*, void*)")
def cffi_int_compare(a, b):
    # Callback signature requires exact matching of types.
    # This involves less magic than in ctypes
    # but also makes you more specific and requires
    # explicit casting
    int_a = ffi.cast('int*', a)[0]
    int_b = ffi.cast('int*', b)[0]
    print(" %s cmp %s" % (int_a, int_b))

    # according to qsort specification this should return:
    # * less than zero if a < b
    # * zero if a == b
    # * more than zero if a > b
    return int_a - int_b


def main():
    numbers = list(range(5))
    shuffle(numbers)
    print("shuffled: ", numbers)

    c_array = ffi.new("int[]", numbers)

    C.qsort(
        # pointer to the sorted array
        c_array,
        # length of the array
        len(c_array),
        # size of single array element
        ffi.sizeof('int'),
        # callback (pointer to the C comparison function)
        cffi_int_compare,
    )
    print("sorted:   ", list(c_array))

if __name__ == "__main__":
    main()