Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use C++ for unit tests #646

Open
wants to merge 22 commits into
base: master
Choose a base branch
from
Open

Conversation

aws-nslick
Copy link
Contributor

feat(test): parse as c++ source

All included headers are fully safe to be parsed as C++, so these unit
tests can now use C++. Rename them to `.cc' files and use a few c++
keywords to enforce that the compiler is actually treating them as such.
Actual C++ cleanups to follow later, hopefully alongside a unit test
framework like gtest or catch2.

Signed-off-by: Nicholas Sielicki [email protected]

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@aws-nslick aws-nslick changed the title Cpp unittests use C++ for unit tests Oct 4, 2024
@aws-nslick aws-nslick force-pushed the cpp-unittests branch 2 times, most recently from 1869778 to 5aaf8a6 Compare October 4, 2024 07:07
CI has not been running unit tests for neuron, even though it's
possible. This doesn't make functional tests work on neuron yet, and it
remains automatically disabled. Separate out the enablement of the
functional tests from the unit tests. Make --disable-tests not build the
unit tests or the functional tests.

Signed-off-by: Nicholas Sielicki <[email protected]>
858f448 stopped building libfabric from git, but didn't change the name,
so all of these builds have been named "libfabric@git" when they're
using the prebuilt efa installer. Remove the token.

Signed-off-by: Nicholas Sielicki <[email protected]>
Remove the existing unit-tests action -- pay the install cost time only
once, and just run the tests after we run the build.

Prefer to not pass -j to make -- this can create jumbled output that is
harder to debug. These are running in parallel, it's okay if each
individual TU takes longer.

Signed-off-by: Nicholas Sielicki <[email protected]>
The only reason you are ever looking at these logs is because the job
failed, in which case, it's a lot easier to debug if you can just see
precisely what the compiler called. Enable that here.

Signed-off-by: Nicholas Sielicki <[email protected]>
This was a mess previously, and a lot of jobs were mislabeled as a
result.

Signed-off-by: Nicholas Sielicki <[email protected]>
Fixing the compiler being misapplied in CI has exposed another gcc13
-fanalyzer report. This is likely a false positive; it seems to gets
confused about how the error status is being stored within the allocated
req before being freed and is unable to reason about the relationship
between ret and s_comm. Just add another check.

Signed-off-by: Nicholas Sielicki <[email protected]>
Similar to this commits parent, fixing the compiler being misapplied in
CI has exposed another gcc13 -fanalyzer warning. The compiler wants to
be able to trust that if the rank is not 0, it must be 1 and vice-versa.

Unnecessarily checking for:

  if (rank == 0) {} else if (rank == 2) {}

instead of just

  if (rank == 0) {} else {}

means that the compiler considers cases where neither branch is taken.
Assert at init that both world size is two and further that the rank
assigned to each job is within {0,1}. Then, also make the else-ifs into
elses to silence the warning.

Signed-off-by: Nicholas Sielicki <[email protected]>
Add a macro that uses AX_CHECK_COMPILE_FLAG to test for the existence of
a potential flag, and appends into the picky flags variable if and only
if the compiler has the flag.

Signed-off-by: Nicholas Sielicki <[email protected]>
GCC manual says: -Wjump-misses-init is included in -Wc++-compat. It can
be disabled with the -Wno-jump-misses-init option.

This is seemingly only true in newer GCC, or is just bugged outright,
because we had -Wc++-compat enabled and did not see these warnings.

Add a macro for appending "picky" compiler flags and use it for existing
options, then ensure that -Wjump-misses-init is passed.

Signed-off-by: Nicholas Sielicki <[email protected]>
a global `provider_filter` was made redundant with 159bfed and this is
now passed to every user directly, but the provider_filter string that
now exists in nccl_net_ofi_create_plugin was shadowing the global one
that was exported everywhere in nccl_ofi.h. Delete the global one.

rdma had two cases of declaring a new variable, shadowing the old
variable, pointing to the same thing. Delete the second useless
declaration. sendrecv had a case where it was shadowing a "req" variable
in an inner scope, but pointing to a different req. Just rename the
variable in the inner scope to be more clear.

mr.c needed to remove a useless include, which removes the shadow on
system_page_size. This unfortunately exposed an issue in the header for
mr.h; config.h was included late, it was transiently relying on an
inclusion for the definition of offsetof and, because we do not build
with C11, stdbool. The conditional inclusion of fi_domain.h was not
correct in all cases. So add <stddef.h> there, always include
fi_domain.h, and fix the config.h misordering.

Rename some things in tests to avoid shadowing.

With these fixed, enable -Wshadow whenever -Werror is enabled.

Signed-off-by: Nicholas Sielicki <[email protected]>
Enable -Wformat=2, which is stricter than Wall's -Wformat:
> -Wformat=2
>> Enable -Wformat plus additional format checks. Currently equivalent to
>> -Wformat -Wformat-nonliteral -Wformat-security -Wformat-y2k.

Fix a warning that this exposed: this literal in topo.c can be made
completely const and should be.

Signed-off-by: Nicholas Sielicki <[email protected]>
Add missing case statements and enable -Wimplicit-fallthrough.

Signed-off-by: Nicholas Sielicki <[email protected]>
Without this, gcc complains:
> cast from 'char *' to 'struct nccl_ofi_freelist_elem_t *' increases
> required alignment from 1 to 8

Fix problematic areas in container_of and freelist, then enable
-Wcast-align whenever -Werror is enabled.

Related-commit: 4e64913
Signed-off-by: Nicholas Sielicki <[email protected]>
Many TU-local functions were missing a `static` keyword. Add it.

Signed-off-by: Nicholas Sielicki <[email protected]>
We get this for free, likely fixed in aws#627 or with some of the
-fanalyzer fixes. Enable -Wnull-dereference whenever -Werror is enabled.

Signed-off-by: Nicholas Sielicki <[email protected]>
When constructing include directories, use -isystem instead of -I. This
accomplishes largely the same thing, but informs the compiler that we
don't control these headers, which is useful for filtering warnings.

Signed-off-by: Nicholas Sielicki <[email protected]>
C99 does not define bool as a keyword (unlike C11 and beyond), which
means that files are free to use `bool` as something other than a type
name. When <stdbool.h> is not included through some other mechanism,
-Wc++-compat treats it as such and complains that it's a reserved
keyword in C++.

freelist.h uses bool, but does not bring in stdbool.h; add <stdbool.h>
Don't put 3p header-only dependencies (meaning, nccl's ext-net, nccl's
ext-tuner, uthash) under the same directory as our headers, put them
under a separate top-level 3rd-party/ directory. Ensure that a license
file is present in each directory.

On the build side, use AC_SUBST to make $device_interface accessible
outside of the top-level configure.ac, then use this to generate any
accelerator-specific include paths. It is now not possible to include
the wrong copy of the nccl interface files, because when compiling under
that accelerator it is simply not in the include path.

Move the tuner to its own Makefile.am for the src/tuner subdirectory;
don't try to accomplish this in the single src/Makefile.am.

When moving over the includes to the new expected format, prefer
<header.h> to "header.h" so that it's clear that these are system
headers, which also has the benefit of telling the compiler that
warnings in those files are different than any warnings in our actual
source files.

Signed-off-by: Nicholas Sielicki <[email protected]>
This was missed in review. It doesn't work this way. In C++20, the
following is legal:

> enum foo { bar, baz };
> using enum foo;

But it does not support the glob syntax done here, and this is also
missing the `enum` specifier required, and it also doesn't work with
C++17. It's unnecessary in this file, anyway, we can just use the bare
names here. Delete it so that this file can be parsed with `-x c++`
without causing build failures.
g++7 fails to initialize C-style tagged unions with the following error
message:

> sorry, unimplemented: non-trivial designated initializers not
> supported

Note that gcc7 handles this fine when the file is parsed as C, as well
as g++8 and beyond, and the problem only exists when operating in C++
mode under g++7. AL2 is still using this ancient toolchain, so add a
hack in the case that it is used. This is pretty horrible, and should be
cleaned up sooner rather than later.
On builds where lttng is enabled, the inclusion of nccl_ofi_tracepoint.h
pulls in <lttng/ust-utils.h>, which defines templates when __cplusplus
is defined. Those templates cannot have C linkage.

This file does not really need nccl_ofi_tracepoint.h; it only needs
config.h to test for whether NVTX tracing is enabled and nvtx3 headers
for conditionally defining the nvtx domain handles.

It's also simultaneously true that the large majority of these headers
are consumed only by rdma.c, so another option is to just move the guard
down to where it's actually required.

Signed-off-by: Nicholas Sielicki <[email protected]>
All included headers are fully safe to be parsed as C++, so these unit
tests can now use C++. Rename them to `.cc' files and use a few c++
keywords to enforce that the compiler is actually treating them as such.
Actual C++ cleanups to follow later, hopefully alongside a unit test
framework like gtest or catch2.

Signed-off-by: Nicholas Sielicki <[email protected]>
@bwbarrett
Copy link
Contributor

I had comments on some of the early patches in this PR from other PRs, so dont' feel comfortable approving the PR. That said, the last commit (the uniq one across your PRs) looks good to me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants