Guidance on developing DuckDB extensions in Rust #54

Open
t-kalinowski opened this issue Jul 13, 2024 · 7 comments

Comments

@t-kalinowski

Hello DuckDB Team!

I am exploring the possibility of writing an extension for DuckDB and am particularly interested in developing primarily in Rust. I anticipate I'll mostly be using the C API crate libduckdb_sys, but I am unsure if there are existing examples or templates in Rust that I could refer to.

Since I haven't come across any community extensions written in Rust, I wanted to inquire whether you are aware of any, or if there are any plans to support such developments. Any guidance on how to get started, as well as any relevant documentation, would be immensely helpful.

Thank you for your assistance!

@carlopi
Collaborator

carlopi commented Jul 13, 2024

At the moment there are a few Rust-based community extensions:

And a Rust-based core DuckDB extension: https://github.com/duckdb/duckdb_delta

They use different approaches (@rustyconover's, @ywelsch's, and @samansmink's, which wraps delta-rs). I would recommend having a look to see which fits your constraints best, and possibly cloning one and starting to play with it.

Also pinging the authors, since they might have something to add.
We should improve the docs on how to get started; that would also be handy to have.

There are also helpful conversations about this on DuckDB's Discord channels for extensions and Rust.

@samansmink
Collaborator

@t-kalinowski There's also ongoing work on a new extension API based on the C API. This will allow writing pure Rust extensions. For now, the extensions linked by @carlopi demonstrate the way to go.

@t-kalinowski
Author

Thanks for the links!

Please correct me if I've misunderstood, but it appears the linked extensions still contain a significant amount of C++ code. This code seems to require an in-depth understanding of the undocumented, and potentially internal, aspects of the DuckDB C++ API.

Seeing the PR for extensions that only use the C API is exciting! Do you think we could start a "community-extension-rust" template repo soon, where the example "quack" function is written in Rust, using primarily the DuckDB C API via duckdb::ffi?

What do you think?

@samansmink
Collaborator

Please correct me if I've misunderstood

That is completely correct.

Do you think we could start a "community-extension-rust" template repo soon

While I can't give any promises on soon, I can say that this is certainly pretty high on the priority list. It's one of the main goals of introducing the C API for extensions.

@0xcaff

0xcaff commented Jul 23, 2024

I'm developing an external (non-C++) DuckDB plugin, which you can find at https://github.com/0xcaff/duckdb_protobuf. It appears I might be creating the first all-Rust plugin, as I've encountered several issues along the way. The duckdb-rs bindings lack local initialization for vtables, and the build tooling requires a custom metadata writer (I've created my own at https://github.com/0xcaff/duckdb_protobuf/blob/master/packages/duckdb_metadata/src/lib.rs). Additionally, the bundled feature doesn't correctly pin versions, and integration with the community repository is problematic.

I'd appreciate a way to publish to the community repository without having to adopt DuckDB's build tooling. There are numerous ways to build a DuckDB extension using the C ABI (for example, see the ongoing work on the Zig plugin SDK). It would be beneficial to have a method for out-of-tree builds.

This is particularly important because data formats evolve slowly, and the current process of bridging Rust to C++ to DuckDB involves many steps before deriving value from the integration. Simplifying this process could make it easier for folks to adopt DuckDB.

@samansmink
Collaborator

Hey @0xcaff!

You make an interesting point. We decided to go with our current approach, where the build tooling is fixed. This has some clear advantages:

  • It nudges early extension developers to use (and contribute to) the same standardized tooling, improving the tooling for everyone along the way.
  • It does not increase the maintenance workload on the DuckDB Labs developers by too much: we are using the same tooling to distribute the core extensions.
  • It avoids a situation where the third-party extension ecosystem becomes very heterogeneous, with everyone reinventing the wheel on how to build DuckDB extensions, making it confusing for new extension developers to get started.
  • The current process guarantees the software that is PR-ed is open-source and is cryptographically tied to a specific version of that open-source code.

Our plan is to add the aforementioned C API, which allows more flexibility in build tooling. The main advantage of going that route is that it will allow for an easy, standardised way to write DuckDB extensions in whatever language supports calling C code easily. This way we keep the maintenance sane across extension build tooling in the various languages.

My main question is, would the aforementioned C API and corresponding (to be developed) build tooling solve your problems with the current setup?

@0xcaff

0xcaff commented Jul 23, 2024

Thanks for sharing some of the why behind this design; it makes a lot more sense now. When you said C API, I thought you were speaking about the existing C API (https://duckdb.org/docs/api/c/api.html). I see where the build tooling complexity comes from; I was not aware of the intricacies of dynamic linking across platforms. It seems the new C API will basically move the linking of external functions into userland, making it much easier to build and link (no more need for the dynamic linker). I think this solves my use case. Can't wait to take it for a spin once it's ready!
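
To make the "linking in userland" idea concrete, here is a minimal sketch in Rust. It is not DuckDB's actual extension API: `HostApi`, `Registry`, `ScalarFn`, `demo_extension_init`, and `double_it` are all hypothetical names invented for illustration. The assumption is only that the host hands the extension a struct of function pointers at load time, so the extension calls back through that table instead of relying on the dynamic linker to resolve host symbols.

```rust
use std::collections::HashMap;
use std::ffi::{c_char, c_void, CStr};

/// Hypothetical signature of a scalar function provided by an extension.
pub type ScalarFn = extern "C" fn(i64) -> i64;

/// Hypothetical function table the host passes to the extension at load time.
/// The extension never links against host symbols; it only calls through this.
#[repr(C)]
pub struct HostApi {
    /// Opaque pointer to host state; the extension just passes it back.
    pub ctx: *mut c_void,
    /// Host callback that registers a named scalar function.
    pub register_scalar:
        extern "C" fn(ctx: *mut c_void, name: *const c_char, f: ScalarFn) -> bool,
}

// ---- Extension side -------------------------------------------------------

/// The function this example extension contributes.
extern "C" fn double_it(x: i64) -> i64 {
    x.wrapping_mul(2)
}

/// Entry point (hypothetical name). In a real shared library this symbol
/// would be exported so the host can look it up by name after loading.
pub extern "C" fn demo_extension_init(api: *const HostApi) -> bool {
    if api.is_null() {
        return false;
    }
    // SAFETY: checked for null above; the host keeps the table alive
    // for the duration of this call.
    let api = unsafe { &*api };
    let name = b"double_it\0".as_ptr() as *const c_char;
    (api.register_scalar)(api.ctx, name, double_it)
}

// ---- Host side (stand-in for the database) --------------------------------

/// Hypothetical host-side registry of functions contributed by extensions.
#[derive(Default)]
pub struct Registry {
    pub functions: HashMap<String, ScalarFn>,
}

extern "C" fn host_register_scalar(
    ctx: *mut c_void,
    name: *const c_char,
    f: ScalarFn,
) -> bool {
    if ctx.is_null() || name.is_null() {
        return false;
    }
    // SAFETY: `ctx` was created from a `&mut Registry` in `load_demo_extension`,
    // and `name` points to a NUL-terminated string supplied by the extension.
    let registry = unsafe { &mut *(ctx as *mut Registry) };
    let name = unsafe { CStr::from_ptr(name) };
    match name.to_str() {
        Ok(s) => {
            registry.functions.insert(s.to_owned(), f);
            true
        }
        Err(_) => false,
    }
}

/// Builds the function table, "loads" the extension, and returns the registry.
pub fn load_demo_extension() -> Registry {
    let mut registry = Registry::default();
    let api = HostApi {
        ctx: &mut registry as *mut Registry as *mut c_void,
        register_scalar: host_register_scalar,
    };
    let ok = demo_extension_init(&api);
    assert!(ok, "extension failed to initialize");
    registry
}
```

After `load_demo_extension()` returns, the host can look up `"double_it"` in `registry.functions` and call it like any other function pointer. The only symbol the host has to locate in the extension is the entry point; everything else flows through the table, which is why this style of API is easy to target from any language that can call C.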
