Windows Apache Arrow Development Environment with RTools 4.0

arrow
dev-env
Author

Will Jones

Published

April 2, 2022

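This guide shows how to set up Visual Studio Code to build and debug the Arrow R package. By the end of this guide, you’ll be able to:

  • build the Arrow C++ library and the R package from VS Code,
  • run the R unit tests, and
  • attach GDB to an R session to debug the C++ code.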

The Arrow R package has its own documentation for building and testing the package, which you should also read if you use this guide. It’s less opinionated than this guide (it doesn’t specify which editor to use), but it provides troubleshooting tips for common build errors as well as linting guidance.

If you’d like to see a video demo of the dev environment, there is one in Part II.

Part I: Setting up the environment

Install Dependencies

First install the following:

Clone the Repo

Open Git Bash (from Git for Windows) and run:

git clone https://github.com/apache/arrow.git

You can add your own fork as a remote later with:

git remote rename origin upstream
git remote add origin git@github.com:<username>/arrow.git

Create Environment Script

Open VS Code and create a new batch script file (with a .bat extension) containing the following:

:: Prepend rtools bin on path so dependency libraries can be found
set PATH=C:\rtools40\mingw64\bin;C:\rtools40\mingw64\lib;C:\rtools40\usr\bin;%PATH%

:: Prepend ARROW_HOME\bin on path so libarrow can be found
set PATH=%USERPROFILE%\arrow\cpp\build\user-r-debug-windows\dist\bin;%PATH%

:: Open VS Code
cd %USERPROFILE%\arrow
call code .

pause

This script sets the environment variables the build system needs and then launches VS Code. The first PATH line puts the RTools toolchain (gcc, ld) on PATH so that VS Code can find it; the second adds the libarrow install directory (ARROW_HOME\bin) so that libarrow can be found.

Save the file and move it to the desktop for easy access. Whenever you launch VS Code to debug the Arrow R package, use this script instead of opening VS Code directly.
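Once VS Code is running from this script, one way to confirm that the toolchain is visible is to check PATH from a Git Bash terminal inside that VS Code window. This is a minimal sketch: `require_tools` is a hypothetical helper, and checking for gcc and ld is an assumption based on the toolchain note above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report any of the given commands that cannot be
# found on PATH. Returns 0 if all are found, 1 otherwise.
require_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "not on PATH: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example (assumed tool list): the compiler and linker from C:\rtools40\mingw64\bin.
# require_tools gcc ld && command -v gcc
```

If the check passes, `command -v gcc` should print a path under `/c/rtools40/mingw64/bin`, meaning the batch script’s PATH changes reached VS Code.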

Configuring VS Code

Use the launch script to open VS Code.

First create a new file at .vscode/settings.json and paste in:

{
    "cmake.sourceDirectory": "${workspaceFolder}/cpp",
    "cmake.buildDirectory": "${workspaceFolder}/cpp/build/",
    "cmake.configureOnOpen": false,
    "terminal.integrated.profiles.windows": {
        "RTools 4.0 64-bit Bash": {
            "path": "C:\\rtools40\\usr\\bin\\bash.exe",
            "args": [
                "--login",
                "-i"
            ],
            "env": {
                "MSYSTEM": "MINGW64",
                "CHERE_INVOKING": "1"
            }
        }
    },
    "task.autoDetect": "off"
}

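The settings.json file tells the CMake Tools extension where the C++ source folder (cpp/) is, stops it from configuring automatically when the folder opens, disables automatic task detection, and adds a terminal profile for the RTools Bash.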

Next create a new file at cpp/CMakeUserPresets.json and paste in:

{
  "version": 3,
  "cmakeMinimumRequired": {
    "major": 3,
    "minor": 21,
    "patch": 0
  },
  "configurePresets": [
    {
      "name": "user-base",
      "hidden": true,
      "binaryDir": "${sourceDir}/build/${presetName}",
      "cacheVariables": {
        "ARROW_INSTALL_NAME_RPATH": "OFF",
        "CMAKE_INSTALL_PREFIX": "${sourceDir}/build/${presetName}/dist"
      }
    },
    {
      "name": "user-r-debug-windows",
      "inherits": ["user-base", "ninja-release", "features-filesystems"],
      "cacheVariables": {
        "ARROW_DEPENDENCY_SOURCE": "AUTO",
        "ARROW_DEPENDENCY_USE_SHARED": "OFF",
        "ARROW_EXTRA_ERROR_CONTEXT": "ON",
        "CMAKE_BUILD_TYPE": "Debug",
        "ARROW_GCS": "OFF",
        "ARROW_MIMALLOC": "OFF",
        "ARROW_BUILD_SHARED": "ON",
        "ARROW_S3": "OFF"
      }
    }
  ]
}

These presets define the build variables used when building the libarrow C++ code. Modify these settings to enable or disable optional components, such as S3 or mimalloc.
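The preset name also determines where libarrow is installed, so it has to agree with the `ARROW_HOME` values in tasks.json and the `dist\bin` entry in the batch script. As a sketch, the configure step can also be run from a terminal. The helper names and the `ARROW_REPO` variable are hypothetical, and the default checkout location is an assumption you should adjust.

```shell
#!/usr/bin/env bash
# Assumption: ARROW_REPO points at your Arrow checkout. In RTools Bash the
# Windows folder %USERPROFILE%\arrow is usually /c/Users/<you>/arrow.
ARROW_REPO="${ARROW_REPO:-$HOME/arrow}"

# Print the install prefix a user preset resolves to. CMakeUserPresets.json sets
# CMAKE_INSTALL_PREFIX to ${sourceDir}/build/${presetName}/dist, and
# ${sourceDir} is the cpp/ folder that holds the presets file.
preset_install_prefix() {
  printf '%s/cpp/build/%s/dist\n' "$ARROW_REPO" "$1"
}

# Configure libarrow with a named preset.
# Version 3 presets files need CMake 3.21 or newer.
configure_preset() {
  (cd "$ARROW_REPO/cpp" && cmake --preset "$1")
}
```

For example, `configure_preset user-r-debug-windows` should create `cpp/build/user-r-debug-windows`, and `preset_install_prefix user-r-debug-windows` prints the folder that ARROW_HOME should point to.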

Next create a new file at .vscode/tasks.json and paste in:

{
    "version": "2.0.0",
    "tasks": [
        {
            "type": "process",
            "label": "Build R package (Debug)",
            "command": "R",
            "args": [
                "CMD",
                "INSTALL",
                ".",
                "--preclean"
            ],
            "options": {
                "cwd": "${workspaceFolder}/r",
                "env": {
                    "ARROW_HOME": "${workspaceFolder}/cpp/build/user-r-debug/dist"
                }
            },
            "group": "build",
            // Windows needs to use RTools toolchain
            "windows": {
                "args": [
                    "CMD",
                    "INSTALL",
                    ".",
                    "--no-multiarch",
                    "--preclean",
                    // "--debug"
                ],
                "options": {
                    "env": {
                        "ARROW_HOME": "${workspaceFolder}/cpp/build/user-r-debug-windows/dist",
                    }
                }
            }
        },
        {
            "type": "process",
            "label": "Install C++ (R)",
            "command": "cmake",
            "args": [
                "--build",
                ".",
                "--target",
                "install",
            ],
            "options": {
                "cwd": "${workspaceFolder}/cpp/build/user-r-debug/",
                "env": {
                    "ARROW_HOME": "${workspaceFolder}/cpp/build/user-r-debug/dist"
                }
            },
            "group": "build",
            "windows": {
                "options": {
                    "cwd": "${workspaceFolder}/cpp/build/user-r-debug-windows/",
                    "env": {
                        "ARROW_HOME": "${workspaceFolder}/cpp/build/user-r-debug-windows/dist"
                    }
                }
            }
        }
    ]
}

These define build tasks in VS Code for building the C++ libraries and the R package.
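Outside VS Code, the Windows variants of these two tasks boil down to the commands below. This is an untested sketch for the RTools Bash. `run`, `install_cpp`, `build_r_package`, `ARROW_REPO`, `PRESET` and `DRY_RUN` are hypothetical names, and it assumes `cmake` and `R` are on PATH. If R’s build does not accept the MSYS-style path in ARROW_HOME, a Windows-style path may be needed instead.

```shell
#!/usr/bin/env bash
# Assumptions: ARROW_REPO is your Arrow checkout and PRESET matches the
# configured CMake preset.
ARROW_REPO="${ARROW_REPO:-$HOME/arrow}"
PRESET="${PRESET:-user-r-debug-windows}"

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Mirrors the "Install C++ (R)" task: build libarrow and install it into dist/.
install_cpp() {
  run cmake --build "$ARROW_REPO/cpp/build/$PRESET" --target install
}

# Mirrors the Windows variant of "Build R package (Debug)".
build_r_package() {
  (
    cd "$ARROW_REPO/r" || exit 1
    export ARROW_HOME="$ARROW_REPO/cpp/build/$PRESET/dist"
    run R CMD INSTALL . --no-multiarch --preclean
  )
}
```

With `DRY_RUN=1` the functions only print what they would run. Run `install_cpp` before `build_r_package`, since the R build uses the libarrow installed under ARROW_HOME.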

Next, create a new file at .vscode/launch.json and paste in:

{
  "version": "0.2.0",
  "configurations": [
    {
      "name": "GDB Attach to R",
      "type": "cppdbg",
      "request": "attach",
      "program": "C:\\Users\\voltron\\Documents\\R\\R-4.1.2\\bin\\x64\\R.exe",
      "processId": "${command:pickProcess}",
      "MIMode": "gdb",
      "miDebuggerPath": "C:\\rtools40\\mingw64\\bin\\gdb.exe",
      "externalConsole": false,
      "setupCommands": [
        {
          "description": "Load Arrow pretty printers",
          "text": "source ${workspaceFolder}\\cpp\\gdb_arrow.py",
          "ignoreFailures": false
        },
        {
          "description": "Enable pretty-printing for gdb",
          "text": "-enable-pretty-printing",
          "ignoreFailures": true
        },
      ]
    }
  ]
}

In .vscode/launch.json, replace the value of the program key with the path to R.exe in your own R installation. Note that you need to use double backslashes (\\) because the path is a JSON string.
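If you are unsure where R is installed, you can ask R itself and escape the result for JSON in one step. This is a sketch with hypothetical helper names. It assumes `Rscript` is on PATH. On 64-bit Windows, `R.home("bin")` should point at the `bin\x64` folder used in the example above.

```shell
#!/usr/bin/env bash
# Double every backslash so a Windows path can be pasted into a JSON string.
json_escape_path() {
  printf '%s\n' "$1" | sed 's/\\/\\\\/g'
}

# Print the R.exe of the R found on PATH, escaped for the "program" key.
# Assumes Rscript is on PATH; normalizePath uses backslashes on Windows.
r_exe_for_launch_json() {
  json_escape_path "$(Rscript -e 'cat(normalizePath(file.path(R.home("bin"), "R.exe")))')"
}
```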

Finally, open the extensions tab (on the left) and install the following extensions:

  • Test Explorer UI
  • C/C++
  • CMake
  • CMake Tools
  • R
  • R Test Explorer

Use CTRL+SHIFT+P, type “Reload Window”, and hit enter to refresh.

Set Up the RTools Environment

In the VS Code terminal pane, you now have the option of an “RTools 4.0 64-bit Bash” terminal. Open a new instance of it.

Install Arrow dependencies with:

pacman --sync --refresh --noconfirm \
  ${MINGW_PACKAGE_PREFIX}-{ccache,cmake,ninja,openssl,boost,brotli,lz4,protobuf,snappy,thrift,zlib,zstd,aws-sdk-cpp,re2,libutf8proc}
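The `{...}` in that command is Bash brace expansion: each name gets the `${MINGW_PACKAGE_PREFIX}` prefix, which is `mingw-w64-x86_64` in the 64-bit MINGW shell. The hypothetical helper below spells out the same list, which can be handy later for checking which dependencies are installed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one pacman package name per Arrow dependency,
# using the same brace expansion as the install command.
# Falls back to the 64-bit MinGW prefix if MINGW_PACKAGE_PREFIX is unset.
arrow_dep_packages() {
  local prefix="${MINGW_PACKAGE_PREFIX:-mingw-w64-x86_64}"
  printf '%s\n' "$prefix"-{ccache,cmake,ninja,openssl,boost,brotli,lz4,protobuf,snappy,thrift,zlib,zstd,aws-sdk-cpp,re2,libutf8proc}
}
```

`pacman -Q $(arrow_dep_packages)` then prints the installed version of each package and an error for any that is missing.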

Build GDB with Python Support

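At the time of writing, the GDB shipped with RTools isn’t built with Python support, which the Arrow pretty printers (cpp/gdb_arrow.py) need. As a workaround, we build a patched GDB package from a fork.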

Open Git Bash again. Run:

git clone https://github.com/wjones127/rtools-packages.git
cd rtools-packages
git checkout gdb-python

Next go back to RTools bash in VS Code and run:

cd ~/rtools-packages/mingw-w64-gdb/
makepkg-mingw --cleanbuild --syncdeps --force --noconfirm
pacman -U *.pkg.tar.xz
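To confirm that the rebuilt GDB actually has Python support (which the `gdb_arrow.py` pretty printers loaded in launch.json rely on), you can ask it to run a one-line Python command. This is a minimal sketch, and `gdb_has_python` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Succeeds only if the given gdb (default: gdb on PATH) can run Python.
# -nx skips any .gdbinit, -batch exits after the command, and tr strips
# the carriage return a native Windows gdb may print.
gdb_has_python() {
  local gdb="${1:-gdb}" out
  out="$("$gdb" -nx -batch -ex 'python print("ok")' 2>/dev/null | tr -d '\r')"
  [ "$out" = "ok" ]
}

# Usage in RTools Bash after installing the rebuilt package:
# gdb_has_python /mingw64/bin/gdb && echo "GDB has Python support"
```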

Part II: Using the dev environment

Configure with CMake

To choose the R debug CMake preset: CTRL+SHIFT+P > CMake: Select Configure Preset > user-r-debug-windows

Reconfigure CMake Cache

Do this whenever you have changed something about your preset.

CTRL+SHIFT+P > “reconfigure” (CMake: Delete Cache and Reconfigure)

Build C++

To build C++, use CTRL+SHIFT+B > Install C++ (R).

Build R Package

To build the R package, use CTRL+SHIFT+B > Build R package (Debug).

Running Unit tests

C++ unit tests aren’t supported in this environment, since RTools lacks the GoogleTest C++ libraries. The R unit tests are supported, though.

You can run them by opening an R terminal in the r/ directory and entering:

devtools::test()
devtools::test(filter="Array") # specific test suite
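The same calls also work non-interactively through `Rscript`, which can be convenient from the RTools Bash. This is a sketch with hypothetical helpers. It assumes `Rscript` and the devtools package are installed, and that `ARROW_REPO` points at your checkout.

```shell
#!/usr/bin/env bash
ARROW_REPO="${ARROW_REPO:-$HOME/arrow}"

# Build the R expression: all tests, or only test files matching a filter.
r_test_expr() {
  if [ -n "$1" ]; then
    printf 'devtools::test(filter = "%s")\n' "$1"
  else
    printf 'devtools::test()\n'
  fi
}

# Run the R package tests from the r/ directory without an interactive session.
run_r_tests() {
  (cd "$ARROW_REPO/r" && Rscript -e "$(r_test_expr "$1")")
}
```

For example, `run_r_tests Array` runs the same filtered suite as the second line above.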

There is also a UI provided by the Test Explorer extension. Open the flask icon on the left-hand side of VS Code and you should see a section for “R”. From there, you can explore and run all tests, individual files, or even individual tests. I find it faster to run the whole suite in the terminal, but I usually use the UI to run individual tests.

Using GDB with R

In a new terminal, change to the r/ directory and start R:

cd r
R

Then get the process ID of your R session:

Sys.getpid()

Press F5 to start the debugger and then type in the process ID. After a few seconds, GDB will be attached and you will see the running threads in the lower left corner.

By default, VS Code will be set to break if there is an error. However, if you know a specific line you want to stop at, you can add a breakpoint there by finding the line of C++ code and clicking to the left of that line (or press F9 with your cursor on the line).

Note that, due to a known issue, you will need to press CTRL+C in the R terminal for the breakpoints to attach, and then press F5 to continue. A breakpoint marker appears red if the breakpoint was found, and as an empty grey circle if not. If you haven’t loaded the arrow package yet, it is normal for breakpoints not to be found.

You can then either run code in your R terminal or run the unit tests with the usual commands.

Conclusion

That’s how I use VS Code to debug issues in the Arrow R package on Windows. When I first started working with this environment, I struggled a lot to get it to work; I hope this post saves my fellow Arrow contributors a lot of time.