turbodbc
Turbodbc is a Python module to access relational databases via the Open Database Connectivity (ODBC) interface. For maximum performance, turbodbc offers built-in NumPy and Apache Arrow support and internally relies on batched data transfer rather than the single-record communication used by other popular ODBC modules.
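As a small illustration of the Arrow support mentioned above, the following sketch fetches a complete result set as an Apache Arrow table. The helper name `fetch_as_arrow`, the DSN and the query are placeholders chosen for this example, and `fetchallarrow()` is only usable when turbodbc was compiled with pyarrow support.

```python
def fetch_as_arrow(dsn, query):
    """Run `query` against the ODBC data source `dsn` and return the result as an Arrow table."""
    # Imported lazily so this helper can be defined even where turbodbc is not installed.
    from turbodbc import connect

    connection = connect(dsn=dsn)
    try:
        cursor = connection.cursor()
        cursor.execute(query)
        # Requires a turbodbc build with Arrow support; the result set is
        # returned column-wise instead of row by row.
        return cursor.fetchallarrow()
    finally:
        connection.close()
```

A call could look like `fetch_as_arrow("my_dsn", "SELECT 42")`, where `my_dsn` stands for a data source configured in your ODBC setup.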
Building turbodbc with pyarrow support has some caveats: the build detects at build time whether pyarrow is installed, and the C++ compilation needs pybind11 and several Debian development packages.
With Docker multi-stage builds we can build turbodbc with pyarrow support natively without carrying the development packages into the final image.
The first step is the base image, which contains all Debian packages needed to run turbodbc later on:
# syntax=docker/dockerfile:1
FROM debian:bullseye as base
# Create user, must not be ROOT and UID should be greater than 1000
RUN useradd --uid 1100 app --create-home
RUN apt-get update
RUN --mount=type=cache,target=/var/cache/apt apt-get install --yes python3 python3-venv git
RUN --mount=type=cache,target=/var/cache/apt apt-get install --yes libodbc1 odbcinst odbcinst1debian2 binutils-x86-64-linux-gnu
RUN python3 -m venv /opt/venv
ENV PATH="/opt/venv/bin:${PATH}"
WORKDIR /app/
ENV PYTHONPATH=/app/
In the second stage we install the build requirements that are only needed to compile turbodbc with Arrow support. There are two important notes:
Firstly, pyarrow has to be installed before turbodbc is built, because the turbodbc build process automatically detects whether pyarrow is available.
Secondly, for the detection to work you need to pass --no-build-isolation to the turbodbc install and make sure the Arrow libraries are linked correctly.
FROM base as builder
RUN --mount=type=cache,target=/var/cache/apt apt-get -yq install \
    build-essential \
    gdb \
    lcov \
    libbz2-dev \
    libffi-dev \
    libgdbm-dev \
    liblzma-dev \
    libboost-dev \
    libncurses5-dev \
    libreadline6-dev \
    libsqlite3-dev \
    libssl-dev \
    lzma \
    lzma-dev \
    python3-dev \
    tk-dev \
    unixodbc-dev \
    uuid-dev \
    xvfb \
    zlib1g-dev
RUN pip3 install -U pip==22.0.4 setuptools==45.2.0 wheel==0.37.1
RUN pip3 install -U pybind11==2.10.1 numpy==1.23.5 pandas==1.5.2 six==1.16.0 pyarrow==5.0.0
RUN python3 -c "import pyarrow; pyarrow.create_library_symlinks()" \
    && CPPFLAGS="-D_GLIBCXX_USE_CXX11_ABI=0" pip3 install --no-build-isolation turbodbc==4.5.5
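Because the Arrow detection depends on what is already present in the environment when turbodbc is compiled, it can help to check the prerequisites before the final `pip3 install` runs. The following is a minimal sketch of such a check; the helper name `missing_modules` is hypothetical, and the module list simply mirrors the packages installed in the builder stage.

```python
import importlib.util

# Modules expected in the venv before turbodbc is built with
# --no-build-isolation (taken from the pip3 install step of the builder stage).
BUILD_TIME_MODULES = ("pybind11", "numpy", "pyarrow")


def missing_modules(names=BUILD_TIME_MODULES):
    """Return the entries of `names` that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

If `missing_modules()` returns a non-empty list inside the builder stage, turbodbc would most likely be compiled without Arrow support or fail to compile at all, so aborting the build at that point is the safer choice.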
In the third stage we start again from the base image and copy over only the venv that contains the compiled turbodbc packages:
FROM base as runner
COPY --from=builder /opt/venv /opt/venv
COPY requirements.txt /app/requirements.txt
RUN --mount=type=cache,target=/root/.cache pip install --requirement /app/requirements.txt
# Set the User we created above
USER 1100
CMD []
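To confirm that the venv copied into the final image really contains an Arrow-enabled turbodbc, a small smoke test can be run inside the container. This sketch assumes that turbodbc ships its Arrow bindings as a separate extension module named `turbodbc_arrow_support`, which is only built when pyarrow was detected; the function name is chosen for this example.

```python
import importlib.util


def turbodbc_has_arrow_support():
    """Return True if both turbodbc and its Arrow extension module can be found."""
    # Assumption: the Arrow bindings live in a top-level extension module
    # called `turbodbc_arrow_support` that exists only in Arrow-enabled builds.
    required = ("turbodbc", "turbodbc_arrow_support")
    return all(importlib.util.find_spec(name) is not None for name in required)
```

The check only looks the modules up and does not open a database connection, so it can run without any ODBC data source being configured.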