Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

null pointers not safety-checked across ABI boundary #22005

Open
nektro opened this issue Nov 17, 2024 · 5 comments
Open

null pointers not safety-checked across ABI boundary #22005

nektro opened this issue Nov 17, 2024 · 5 comments
Labels
proposal This issue suggests modifications. If it also has the "accepted" label then it is planned.
Milestone

Comments

@nektro
Copy link
Contributor

nektro commented Nov 17, 2024

Zig Version

0.13.0

Steps to Reproduce and Observed Behavior

struct Foo {
  int a;
  int b;
};

struct Foo *bar(void) {
    return 0;
}
const Foo = extern struct {
    a: c_int,
    b: c_int,
};

extern fn bar() *Foo;

const std = @import("std");
pub fn main() void {
    const f = bar();
    std.log.warn("{d}", .{f.a});
}
❯ zig run ./test.zig ./test.c 
Segmentation fault at address 0x0
Panicked during a panic. Aborting.
[1]    96085 abort      zig run ./test.zig ./test.c

Expected Behavior

❯ zig run ./test.zig ./test.c
thread 53596148 panic: attempt to use null value
/Users/meghandenny/src/bun/test.zig:10:20: 0x1001f4c2f in main (test)
    const f = bar();
                 ^
/opt/homebrew/Cellar/zig/0.13.0/lib/zig/std/start.zig:514:22: 0x1001f498b in main (test)
            root.main();
                     ^
???:?:?: 0x19af54273 in ??? (???)
???:?:?: 0x0 in ??? (???)
[1]    99333 abort      zig run ./test.zig ./test.c
@nektro nektro added the bug Observed behavior contradicts documented or intended behavior label Nov 17, 2024
@Vexu Vexu added proposal This issue suggests modifications. If it also has the "accepted" label then it is planned. and removed bug Observed behavior contradicts documented or intended behavior labels Nov 17, 2024
@Vexu Vexu added this to the unplanned milestone Nov 17, 2024
@rohlem
Copy link
Contributor

rohlem commented Nov 18, 2024

As I see it the source of the issue in the given reproduction is the fact that the Zig interface of the C function is declared as *Foo, when the C implementation returns null.
If it is instead declared as extern fn bar() ?*Foo;, the compiler should tell you to somehow unwrap this optional.

What is the intended scope of this "bug" (now labeled proposal)?

  • I assume "safety-checked" here alludes to only checking this if runtime safety is enabled, i.e. debug builds? (Maybe also in @setRuntimeSafety(true) scopes?)
  • Is the compiler meant to somehow poison/mark the return types of extern fn to be "potentially-null"?
    Would this use the ? optional type construction of status-quo, or be a separate ("hidden") marker? (EDIT: Probably the latter to not change types depending on runtime-safety?)
    Is there a reason you don't declare the return type as an optional pointer in your use case?
  • Is every call to an extern fn supposed to be checked for invalid values? (Slightly related: Proposal: Introduce safety-checked UB for reading invalid bit patterns via @bitCast, pointers, and untagged unions #6784)
    Should this only cover functions implemented in C (maybe all functions using non-internal calling convention?), or also apply to Zig module / library boundaries? Are runtime function pointers included?

@lysergicdream
Copy link

lysergicdream commented Nov 18, 2024

I think Zig is behaving correctly and as expected in this case. By default, any function that returns a pointer in C maps semantically to ?* in Zig. It would be correct to use a bare pointer if, for example, the function in C is decorated with __attribute__((nonnull)), or the programmer is completely sure that the function will never return a null value. I don't think Zig needs any changes regarding this issue.

@nektro
Copy link
Contributor Author

nektro commented Nov 18, 2024

As I see it the source of the issue in the given reproduction is the fact that the Zig interface of the C function is declared as *Foo, when the C implementation returns null. If it is instead declared as extern fn bar() ?*Foo;, the compiler should tell you to somehow unwrap this optional.

Is there a reason you don't declare the return type as an optional pointer in your use case?

the bug was that the extern definition was wrong. i opened this issue to propose Zig should verify that assumption (that f is indeed a valid *Foo) in builds with safety checks enabled at the point of the boundary crossing where the value becomes under Zig's control before it continues execution.

indeed the way i produced the "expected message" was by changing the definition to ?*Foo and adding .? and then re-running the program. so i believe it would really help debugging if Zig had done this for me.

given that, its also very easy to take the stance that "i should've written it right the first time" but part of Zig's mantra is enabling maintainable code and its very easy for me to forsee a case where perhaps when the extern definition was first written it was correct but then the C implementation changed without knowing that it was also changing the contract with the respective Zig code, thus causing a user to run into this exact scenario as well.

@190n
Copy link
Contributor

190n commented Nov 19, 2024

I brought up similar safety checks in #21495 and I still support them. The nuclear option would be to completely forbid export parameters or extern return values (places where a pointer passes from C to Zig) from being non-optional non-allowzero pointers, but I'm not sure if that would be productive.

As I brought up in that issue, there are already lots of places Zig tries to prevent you from creating a *T that is null (such as the safety check on @ptrFromInt and the inability to coerce **T to *?*T), so I don't think adding a check here is unprecedented.

@Jarred-Sumner
Copy link
Contributor

As I see it the source of the issue in the given reproduction is the fact that the Zig interface of the C function is declared as *Foo, when the C implementation returns null. If it is instead declared as extern fn bar() ?*Foo;, the compiler should tell you to somehow unwrap this optional.

Clang’s -fsanitize=null behaves as @nektro suggests Zig should - when something is declared to not be nullable in C/C++ but it is actually null, then it reports an error at the line where this happened. Today, Zig’s tooling is less safe than C/C++’s tooling in this case.

Debug safety checks exist to verify assertions are correct and help us catch bugs earlier. Instead of a segfault at a memory address wherever the pointer was used most recently, this could (and imo, should) be a panic at the call site of the extern fn.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
proposal This issue suggests modifications. If it also has the "accepted" label then it is planned.
Projects
None yet
Development

No branches or pull requests

6 participants