-
Notifications
You must be signed in to change notification settings - Fork 55
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
use C++ for unit tests #646
base: master
Are you sure you want to change the base?
Commits on Oct 5, 2024
-
fix(build): treat functional+unit tests separately
CI has not been running unit tests for neuron, even though it's possible. This doesn't make functional tests work on neuron yet, and it remains automatically disabled. Separate out the enablement of the functional tests from the unit tests. Make --disable-tests not build the unit tests or the functional tests. Signed-off-by: Nicholas Sielicki <[email protected]>
Configuration menu - View commit details
-
Copy full SHA for 2331505 - Browse repository at this point
Copy the full SHA 2331505View commit details -
fix(ci/gha): remove "libfabric" from job names
858f448 stopped building libfabric from git, but didn't change the name, so all of these builds have been named "libfabric@git" when they're using the prebuilt efa installer. Remove the token. Signed-off-by: Nicholas Sielicki <[email protected]>
Configuration menu - View commit details
-
Copy full SHA for 6552704 - Browse repository at this point
Copy the full SHA 6552704View commit details -
feat(ci/gha): enable unit tests for neuron builds
Remove the existing unit-tests action -- pay the install cost time only once, and just run the tests after we run the build. Prefer to not pass -j to make -- this can create jumbled output that is harder to debug. These are running in parallel, it's okay if each individual TU takes longer. Signed-off-by: Nicholas Sielicki <[email protected]>
Configuration menu - View commit details
-
Copy full SHA for d51bba6 - Browse repository at this point
Copy the full SHA d51bba6View commit details -
feat(ci/gha): add V=1 to all make invocations
The only reason you are ever looking at these logs is because the job failed, in which case, it's a lot easier to debug if you can just see precisely what the compiler called. Enable that here. Signed-off-by: Nicholas Sielicki <[email protected]>
Configuration menu - View commit details
-
Copy full SHA for dfd2669 - Browse repository at this point
Copy the full SHA dfd2669View commit details -
feat(ci/gha): use /etc/alternatives for cc/cxx
This was a mess previously, and a lot of jobs were mislabeled as a result. Signed-off-by: Nicholas Sielicki <[email protected]>
Configuration menu - View commit details
-
Copy full SHA for 64e4346 - Browse repository at this point
Copy the full SHA 64e4346View commit details -
fix(rdma): assert on create_send_comm result
Fixing the compiler being misapplied in CI has exposed another gcc13 -fanalyzer report. This is likely a false positive; it seems to gets confused about how the error status is being stored within the allocated req before being freed and is unable to reason about the relationship between ret and s_comm. Just add another check. Signed-off-by: Nicholas Sielicki <[email protected]>
Configuration menu - View commit details
-
Copy full SHA for 818afbb - Browse repository at this point
Copy the full SHA 818afbbView commit details -
fix(test) handle more -fanalyzer warnings
Similar to this commits parent, fixing the compiler being misapplied in CI has exposed another gcc13 -fanalyzer warning. The compiler wants to be able to trust that if the rank is not 0, it must be 1 and vice-versa. Unnecessarily checking for: if (rank == 0) {} else if (rank == 2) {} instead of just if (rank == 0) {} else {} means that the compiler considers cases where neither branch is taken. Assert at init that both world size is two and further that the rank assigned to each job is within {0,1}. Then, also make the else-ifs into elses to silence the warning. Signed-off-by: Nicholas Sielicki <[email protected]>
Configuration menu - View commit details
-
Copy full SHA for 8c6480c - Browse repository at this point
Copy the full SHA 8c6480cView commit details -
feat(build): add a macro for managing cflags
Add a macro that uses AX_CHECK_COMPILE_FLAG to test for the existence of a potential flag, and appends into the picky flags variable if and only if the compiler has the flag. Signed-off-by: Nicholas Sielicki <[email protected]>
Configuration menu - View commit details
-
Copy full SHA for 98840e1 - Browse repository at this point
Copy the full SHA 98840e1View commit details -
feat(build): enable -Wjump-misses-init
GCC manual says: -Wjump-misses-init is included in -Wc++-compat. It can be disabled with the -Wno-jump-misses-init option. This is seemingly only true in newer GCC, or is just bugged outright, because we had -Wc++-compat enabled and did not see these warnings. Add a macro for appending "picky" compiler flags and use it for existing options, then ensure that -Wjump-misses-init is passed. Signed-off-by: Nicholas Sielicki <[email protected]>
Configuration menu - View commit details
-
Copy full SHA for 13c025d - Browse repository at this point
Copy the full SHA 13c025dView commit details -
feat(build): enable -Wshadow, remove all shadowing
a global `provider_filter` was made redundant with 159bfed and this is now passed to every user directly, but the provider_filter string that now exists in nccl_net_ofi_create_plugin was shadowing the global one that was exported everywhere in nccl_ofi.h. Delete the global one. rdma had two cases of declaring a new variable, shadowing the old variable, pointing to the same thing. Delete the second useless declaration. sendrecv had a case where it was shadowing a "req" variable in an inner scope, but pointing to a different req. Just rename the variable in the inner scope to be more clear. mr.c needed to remove a useless include, which removes the shadow on system_page_size. This unfortunately exposed an issue in the header for mr.h; config.h was included late, it was transiently relying on an inclusion for the definition of offsetof and, because we do not build with C11, stdbool. The conditional inclusion of fi_domain.h was not correct in all cases. So add <stddef.h> there, always include fi_domain.h, and fix the config.h misordering. Rename some things in tests to avoid shadowing. With these fixed, enable -Wshadow whenever -Werror is enabled. Signed-off-by: Nicholas Sielicki <[email protected]>
Configuration menu - View commit details
-
Copy full SHA for e2007aa - Browse repository at this point
Copy the full SHA e2007aaView commit details -
feat(build): enable -Wformat=2
Enable -Wformat=2, which is stricter than Wall's -Wformat: > -Wformat=2 >> Enable -Wformat plus additional format checks. Currently equivalent to >> -Wformat -Wformat-nonliteral -Wformat-security -Wformat-y2k. Fix a warning that this exposed: this literal in topo.c can be made completely const and should be. Signed-off-by: Nicholas Sielicki <[email protected]>
Configuration menu - View commit details
-
Copy full SHA for 1ec4605 - Browse repository at this point
Copy the full SHA 1ec4605View commit details -
feat(build): enable -Wimplicit-fallthrough
Add missing case statements and enable -Wimplicit-fallthrough. Signed-off-by: Nicholas Sielicki <[email protected]>
Configuration menu - View commit details
-
Copy full SHA for 55da767 - Browse repository at this point
Copy the full SHA 55da767View commit details -
feat(build): enable -Wcast-align
Without this, gcc complains: > cast from 'char *' to 'struct nccl_ofi_freelist_elem_t *' increases > required alignment from 1 to 8 Fix problematic areas in container_of and freelist, then enable -Wcast-align whenever -Werror is enabled. Related-commit: 4e64913 Signed-off-by: Nicholas Sielicki <[email protected]>
Configuration menu - View commit details
-
Copy full SHA for e86adb8 - Browse repository at this point
Copy the full SHA e86adb8View commit details -
feat(build): enable -Wmissing-declarations
Many TU-local functions were missing a `static` keyword. Add it. Signed-off-by: Nicholas Sielicki <[email protected]>
Configuration menu - View commit details
-
Copy full SHA for 186b679 - Browse repository at this point
Copy the full SHA 186b679View commit details -
feat(build): enable -Wnull-dereference
We get this for free, likely fixed in aws#627 or with some of the -fanalyzer fixes. Enable -Wnull-dereference whenever -Werror is enabled. Signed-off-by: Nicholas Sielicki <[email protected]>
Configuration menu - View commit details
-
Copy full SHA for 6301635 - Browse repository at this point
Copy the full SHA 6301635View commit details -
feat(build): use -isystem instead of -I
When constructing include directories, use -isystem instead of -I. This accomplishes largely the same thing, but informs the compiler that we don't control these headers, which is useful for filtering warnings. Signed-off-by: Nicholas Sielicki <[email protected]>
Configuration menu - View commit details
-
Copy full SHA for df17f32 - Browse repository at this point
Copy the full SHA df17f32View commit details -
fix(freelist): add missing stdbool.h include
C99 does not define bool as a keyword (unlike C11 and beyond), which means that files are free to use `bool` as something other than a type name. When <stdbool.h> is not included through some other mechanism, -Wc++-compat treats it as such and complains that it's a reserved keyword in C++. freelist.h uses bool, but does not bring in stdbool.h; add <stdbool.h>
Configuration menu - View commit details
-
Copy full SHA for 3ac7f56 - Browse repository at this point
Copy the full SHA 3ac7f56View commit details -
feat(build): handle 3p headers better
Don't put 3p header-only dependencies (meaning, nccl's ext-net, nccl's ext-tuner, uthash) under the same directory as our headers, put them under a separate top-level 3rd-party/ directory. Ensure that a license file is present in each directory. On the build side, use AC_SUBST to make $device_interface accessible outside of the top-level configure.ac, then use this to generate any accelerator-specific include paths. It is now not possible to include the wrong copy of the nccl interface files, because when compiling under that accelerator it is simply not in the include path. Move the tuner to its own Makefile.am for the src/tuner subdirectory; don't try to accomplish this in the single src/Makefile.am. When moving over the includes to the new expected format, prefer <header.h> to "header.h" so that it's clear that these are system headers, which also has the benefit of telling the compiler that warnings in those files are different than any warnings in our actual source files. Signed-off-by: Nicholas Sielicki <[email protected]>
Configuration menu - View commit details
-
Copy full SHA for 70418f4 - Browse repository at this point
Copy the full SHA 70418f4View commit details -
fix(cuda): delete broken using directives
This was missed in review. It doesn't work this way. In C++20, the following is legal: > enum foo { bar, baz }; > using enum foo; But it does not support the glob syntax done here, and this is also missing the `enum` specifier required, and it also doesn't work with C++17. It's unnecessary in this file, anyway, we can just use the bare names here. Delete it so that this file can be parsed with `-x c++` without causing build failures.
Configuration menu - View commit details
-
Copy full SHA for dc427a3 - Browse repository at this point
Copy the full SHA dc427a3View commit details -
fix(mr_ckey): add C++ compatibility with GCC7
g++7 fails to initialize C-style tagged unions with the following error message: > sorry, unimplemented: non-trivial designated initializers not > supported Note that gcc7 handles this fine when the file is parsed as C, as well as g++8 and beyond, and the problem only exists when operating in C++ mode under g++7. AL2 is still using this ancient toolchain, so add a hack in the case that it is used. This is pretty horrible, and should be cleaned up sooner rather than later.
Configuration menu - View commit details
-
Copy full SHA for 36e6948 - Browse repository at this point
Copy the full SHA 36e6948View commit details -
feat(rdma): constrain C linkage to init
On builds where lttng is enabled, the inclusion of nccl_ofi_tracepoint.h pulls in <lttng/ust-utils.h>, which defines templates when __cplusplus is defined. Those templates cannot have C linkage. This file does not really need nccl_ofi_tracepoint.h; it only needs config.h to test for whether NVTX tracing is enabled and nvtx3 headers for conditionally defining the nvtx domain handles. It's also simultaneously true that the large majority of these headers are consumed only by rdma.c, so another option is to just move the guard down to where it's actually required. Signed-off-by: Nicholas Sielicki <[email protected]>
Configuration menu - View commit details
-
Copy full SHA for db13eef - Browse repository at this point
Copy the full SHA db13eefView commit details -
feat(test): parse as c++ source
All included headers are fully safe to be parsed as C++, so these unit tests can now use C++. Rename them to `.cc' files and use a few c++ keywords to enforce that the compiler is actually treating them as such. Actual C++ cleanups to follow later, hopefully alongside a unit test framework like gtest or catch2. Signed-off-by: Nicholas Sielicki <[email protected]>
Configuration menu - View commit details
-
Copy full SHA for 4ac7d2b - Browse repository at this point
Copy the full SHA 4ac7d2bView commit details