find -size <number-without-suffix> not POSIX #499

Open
stephane-chazelas opened this issue Apr 21, 2024 · 7 comments

stephane-chazelas commented Apr 21, 2024

Per POSIX

find . -size n

is meant to return the files whose size, rounded up to an integer number of 512-byte units, is n.

For instance, find . -size 1 is meant to report the files whose size ranges from 1 to 512 bytes (the ones that would typically occupy one sector of disk space in the olden days).

But toybox (and busybox, which shares the same non-conformance) only reports files whose size is exactly 512 bytes.

There are similar problems for find . -size +n and find . -size -n.
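To make the expected rounding concrete, here is a minimal sketch of the comparison as described above. The helper names `posix_blocks` and `size_matches` are made up for illustration; this is not toybox or busybox code, only a model of the rule using plain shell arithmetic.

```shell
#!/bin/sh
# Hypothetical helper: number of 512-byte units needed for a file of $1 bytes,
# rounding up (0 bytes -> 0 units, 1..512 bytes -> 1 unit, 513 bytes -> 2 units).
posix_blocks() {
  echo $(( ($1 + 511) / 512 ))
}

# Hypothetical helper: would a file of $1 bytes satisfy "-size $2",
# where $2 is n, +n or -n without any suffix?
size_matches() {
  blocks=$(posix_blocks "$1")
  case $2 in
    +*) [ "$blocks" -gt "${2#+}" ] ;;   # more than n units
    -*) [ "$blocks" -lt "${2#-}" ] ;;   # fewer than n units
    *)  [ "$blocks" -eq "$2" ] ;;       # exactly n units
  esac
}

# Show which predicates each sample size satisfies.
for size in 0 1 512 513 1024; do
  for pred in 1 +1 -1; do
    if size_matches "$size" "$pred"; then
      echo "size=$size matches -size $pred"
    fi
  done
done
```

Under this model, sizes 1 and 512 both match -size 1, only the empty file matches -size -1, and 513 bytes already counts as more than one unit.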

Like for the test utility (#498), there's also the separate problem that find . -size 010c finds the files of size 8 instead of 10.
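The 8-versus-10 difference is what one gets when a leading 0 is taken as an octal prefix. The sketch below only illustrates that effect with shell arithmetic, which follows the same C convention; it says nothing about how toybox actually parses the number, and `to_decimal` is a hypothetical helper.

```shell
#!/bin/sh
# Shell arithmetic treats a leading 0 as an octal prefix, so 010 is eight.
echo "octal reading of 010:   $((010))"

# Hypothetical helper: strip leading zeros so the digits are read as decimal,
# keeping a single 0 when the input is all zeros.
to_decimal() {
  digits=$1
  while [ "${#digits}" -gt 1 ] && [ "${digits#0}" != "$digits" ]; do
    digits=${digits#0}
  done
  echo "$digits"
}

echo "decimal reading of 010: $(to_decimal 010)"
```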

Note that the behaviour with suffixes other than c is fine: it is outside the scope of POSIX and is aligned with most other implementations that support some or all of those suffixes (except GNU find).
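For the suffix case, a rough sketch of the two families of behaviour, based on my reading of the GNU findutils and FreeBSD manuals (GNU rounds the size up to a whole number of units before comparing; FreeBSD compares the byte size against n scaled by the unit). The function names are hypothetical and the unit is passed in bytes.

```shell
#!/bin/sh
# Hypothetical helper: size in bytes ($1) expressed in units of $2 bytes, rounded up.
ceil_units() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

# GNU-style, as assumed above: compare the rounded-up unit count with n.
# Arguments: size_in_bytes n unit_in_bytes
gnu_style_matches() {
  [ "$(ceil_units "$1" "$3")" -eq "$2" ]
}

# Byte-scaled style, as assumed above: compare the exact size with n * unit.
# Arguments: size_in_bytes n unit_in_bytes
byte_scaled_matches() {
  [ "$1" -eq $(( $2 * $3 )) ]
}

# Which sizes would satisfy "-size 1k" (unit = 1024 bytes) under each model?
for size in 1000 1024; do
  if gnu_style_matches "$size" 1 1024; then
    echo "GNU-style:   $size bytes matches -size 1k"
  fi
  if byte_scaled_matches "$size" 1 1024; then
    echo "byte-scaled: $size bytes matches -size 1k"
  fi
done
```

With these assumptions a 1000-byte file matches -size 1k only under the GNU-style rounding, while a 1024-byte file matches under both.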

See https://unix.stackexchange.com/questions/774817/what-are-the-file-size-options-for-find-size-command/774840#774840 for more of the gory details including comparison with other implementations.

@stephane-chazelas
Author

There's a similar problem with the -Xtime [-+]<n> predicates: there, <n> is not treated as an integer number of days but more like the -Xtime [-+]<n>d of FreeBSD.

For instance, -mtime 0 is meant to report files last modified in the last 24 hours, while toybox find only reports the ones last modified exactly now.
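A minimal sketch of the day arithmetic POSIX specifies for these predicates: the age in seconds is divided by 86400, the remainder is discarded, and the result is compared with n. The helpers `age_days` and `mtime_matches` are hypothetical names, and the age is passed in directly rather than read from a file.

```shell
#!/bin/sh
# Hypothetical helper: whole days elapsed for an age of $1 seconds,
# with the remainder discarded.
age_days() {
  echo $(( $1 / 86400 ))
}

# Hypothetical helper: would a file last modified $1 seconds ago satisfy
# "-mtime $2", where $2 is n, +n or -n?
mtime_matches() {
  days=$(age_days "$1")
  case $2 in
    +*) [ "$days" -gt "${2#+}" ] ;;   # more than n whole days
    -*) [ "$days" -lt "${2#-}" ] ;;   # fewer than n whole days
    *)  [ "$days" -eq "$2" ] ;;       # exactly n whole days
  esac
}

# Ages of 0 s, 1 h, just under a day, exactly a day, and two days.
for age in 0 3600 86399 86400 172800; do
  for pred in 0 1 +1 -1; do
    if mtime_matches "$age" "$pred"; then
      echo "age=${age}s matches -mtime $pred"
    fi
  done
done
```

Under this rule every age from 0 to 86399 seconds matches -mtime 0, not just an age of exactly 0.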

@landley
Owner

landley commented Apr 23, 2024

Busybox having behaved this way since 2007 and nobody noticed seem that strong an argument. Do you have a use case that broke because of this?

512 seems irrelevant (minimum block size of ext2 was 1024 back in the 1990s, even fat32 defaults to at least 4k these days). If we gave m units presumably it should round to the megabyte?

@landley
Owner

landley commented Apr 24, 2024

Sharp edge here is that -size has any supplied units override the default (including c=bytes), but -time and -min don't (1kd days is 1000 days).

@landley
Owner

landley commented Apr 25, 2024

Debian's find -size also implicitly selects -type f.

@stephane-chazelas
Author

> Debian's find -size also implicitly selects -type f.

Why would it do that?

$ find . -size 5542c -prune -ls
      258      0 drwxr-xr-x   1 chazelas chazelas     5542 Apr 25 19:29 .
$ find /etc/mtab -size 19c -prune -ls
   187446      4 lrwxrwxrwx   1 root     root           19 Jun 27  2021 /etc/mtab -> ../proc/self/mounts
$ find --version
find (GNU findutils) 4.9.0
Copyright (C) 2022 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Eric B. Decker, James Youngman, and Kevin Dalley.
Features enabled: D_TYPE O_NOFOLLOW(enabled) LEAF_OPTIMISATION FTS(FTS_CWDFD) CBO(level=2)
$ lsb_release -a
No LSB modules are available.
Distributor ID: Debian
Description:    Debian GNU/Linux trixie/sid
Release:        n/a
Codename:       trixie

@stephane-chazelas
Author

> Busybox having behaved this way since 2007 and nobody noticed seem that strong an argument. Do you have a use case that broke because of this?
>
> 512 seems irrelevant (minimum block size of ext2 was 1024 back in the 1990s, even fat32 defaults to at least 4k these days). If we gave m units presumably it should round to the megabyte?

It's quite well known that busybox is not standards-compliant and that one needs to adapt one's scripts when porting to busybox.

The common denominator for block device block sizes is still 512 bytes.

But that's hardly relevant (and not the point of this question).

Find's -size <number-without-suffix> is a well-known, almost 50-year-old API which checks the size based on the number of 512-byte units. If you want your tool to use a different unit, don't call it find, or use a separate API that doesn't break backward compatibility, like the find -size 12k of FreeBSD or GNU (incompatible between themselves), or introduce a new one and convince other implementations to adopt it so it can be suggested as a standard to POSIX and used portably in a few decades.

@terefang

terefang commented Jun 2, 2024

> But that's hardly relevant (and not the point of this question).
>
> Find's -size <number-without-suffix> is a well-known, almost 50-year-old API which checks the size based on the number of 512-byte units. If you want your tool to use a different unit, don't call it find, or use a separate API that doesn't break backward compatibility, like the find -size 12k of FreeBSD or GNU (incompatible between themselves), or introduce a new one and convince other implementations to adopt it so it can be suggested as a standard to POSIX and used portably in a few decades.

@stephane-chazelas while you have a point that there is a deviation from the POSIX standard here, toybox only claims to be reasonably standards-compliant and is possibly more inclined to follow busybox compatibility here.

Also, the POSIX standard is imprecise, vague or outright lacking in many places, having been the playground of many corporate interests in the past decades.

If you have an interesting solution to that problem, you can always submit a patch or pull request for review.
