build_recorder(3) kills parallelism #151

Open
fvalasiad opened this issue Jan 24, 2023 · 0 comments

@fvalasiad
Collaborator

I felt that #131 was quite chaotic, so I am going to group everything I've found so far in a separate, more specific issue about the problem described in #99.

The problem

I am going to use some pseudo-code to describe tracer_main(3) and explain what I think is the problem.

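while(running) {
    pid = wait();           // block until some tracee stops at a syscall
    handle_syscall(pid);    // possibly expensive, e.g. hashing a file
    continue(pid);          // only now may that tracee run again
}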

That's build_recorder so far. Each time a traced process runs into a syscall, it stops and tracer_main(3) is notified (its wait() returns). The process then stays stopped until tracer_main(3) instructs it to continue.

Before that happens though, handle_syscall(3) is called.

handle_syscall(3) can be an expensive operation (for example, computing the hash of a file), which delays the continuation of the process being handled. This is inevitable, as we cannot afford to have the handled process running while we inspect its data (file descriptors, virtual memory, etc.).

One awful side effect, though, is that the currently handled process isn't the only one waiting. It's highly likely that a bunch of other child processes have already reached their next syscall and are stopped, waiting for tracer_main(3) to handle them and instruct them to continue.

Especially considering that most syscalls are ignored (we only care about the ones handled in the corresponding switch statement), an arbitrary number of processes are sleeping instead of running concurrently, for no reason at all.

Essentially we are facing a responsiveness issue.
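To put rough numbers on this, here is a toy model of the loop above. It is not build_recorder code: the function names and the millisecond costs are made up for illustration, and it assumes all tracees are already stopped at time 0 and get handled in array order.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy model (hypothetical, not build_recorder code): n tracees are all
 * stopped at a syscall at time 0 and the single loop handles them in
 * array order. cost[i] is the assumed handling time for tracee i in ms;
 * resume_at[i] receives the time at which tracee i is resumed.
 */
static void serial_resume_times(const long *cost, long *resume_at, size_t n)
{
    long now = 0;

    for (size_t i = 0; i < n; i++) {
        assert(cost[i] >= 0);
        now += cost[i];         /* the loop is busy handling tracee i */
        resume_at[i] = now;     /* only then is tracee i resumed */
    }
}

/* Time tracee i spent stopped beyond its own handling cost. */
static long extra_delay(const long *cost, const long *resume_at, size_t i)
{
    return resume_at[i] - cost[i];
}
```

With costs {200, 0, 0, 0} (one file hash followed by three ignored syscalls), every tracee is resumed at 200 ms. The three tracees whose syscalls we don't even care about each sit idle for 200 ms they never needed.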

What can be done

The solution

The guidelines of how to create a GUI application can help us:

Avoid running expensive operations in the GUI thread.

The inevitable issues...

Multiple tracer_main(3)s

Ideally we would like each child process to have its own tracer_main loop. What's stopping us?

ptrace(2)!
For ptrace(2), a tracer and a tracee are both threads, not processes, as described in the second paragraph of the ptrace(2) manual:

A tracee first needs to be attached to the tracer. Attachment and
subsequent commands are per thread: in a multithreaded process, every
thread can be individually attached to a (potentially different)
tracer, or left not attached and thus not debugged. Therefore,
"tracee" always means "(one) thread", never "a (possibly multithreaded)
process".

As a result, the different threads we spawn to trace new processes all need to be attached separately.
I didn't find any ptrace(2) command that helps with this, so I doubt ptrace(2) was designed to be used that way.

One tracer_main(3)

What are the alternatives? The GUI way! Instead of multiple tracer_main(3) loops, we have only one, and that one hands expensive operations to threads in a thread pool. What's stopping us here?

Again... ptrace(2).

Only the "gui" thread can run ptrace(2) commands (continue, peek, etc.)! Yet handle_syscall(3) calls do so quite often, which means they cannot simply be thrown into a thread pool to execute.

This solution is more promising, though, because nothing stops us from implementing a run_in_gui_thread(3) method.
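A rough sketch of what run_in_gui_thread(3) could look like, assuming pthreads. Nothing here exists in build_recorder: request_queue, drain_requests, fake_peek and worker_main are hypothetical names, and fake_peek only stands in for a real ptrace(2) call. A worker posts a function and blocks until the tracer thread has run it.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* One posted request; it lives on the posting worker's stack. */
struct request {
    long (*fn)(void *);         /* work that must run on the tracer thread */
    void *arg;
    long result;
    int done;
    struct request *next;
};

struct request_queue {
    pthread_mutex_t lock;
    pthread_cond_t done_cv;     /* broadcast whenever a request completes */
    struct request *head, *tail;
};

static void queue_init(struct request_queue *q)
{
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->done_cv, NULL);
    q->head = q->tail = NULL;
}

/* Called from a worker thread: post fn(arg) and block until it has run. */
static long run_in_gui_thread(struct request_queue *q,
                              long (*fn)(void *), void *arg)
{
    struct request req = { fn, arg, 0, 0, NULL };

    assert(fn != NULL);
    pthread_mutex_lock(&q->lock);
    if (q->tail)
        q->tail->next = &req;
    else
        q->head = &req;
    q->tail = &req;
    while (!req.done)
        pthread_cond_wait(&q->done_cv, &q->lock);
    pthread_mutex_unlock(&q->lock);
    return req.result;
}

/* Called from the tracer loop: run all pending requests, return the count. */
static int drain_requests(struct request_queue *q)
{
    int ran = 0;

    pthread_mutex_lock(&q->lock);
    while (q->head) {
        struct request *req = q->head;

        q->head = req->next;
        if (!q->head)
            q->tail = NULL;
        pthread_mutex_unlock(&q->lock);

        long result = req->fn(req->arg);    /* e.g. a ptrace(2) peek */

        pthread_mutex_lock(&q->lock);
        req->result = result;
        req->done = 1;                      /* req is not touched after this */
        pthread_cond_broadcast(&q->done_cv);
        ran++;
    }
    pthread_mutex_unlock(&q->lock);
    return ran;
}

/* Example worker: asks the tracer thread to read a value on its behalf. */
struct demo { struct request_queue *q; long value; long got; };

static long fake_peek(void *arg)            /* stand-in for a ptrace(2) call */
{
    return *(long *)arg;
}

static void *worker_main(void *arg)
{
    struct demo *d = arg;

    d->got = run_in_gui_thread(d->q, fake_peek, &d->value);
    return NULL;
}
```

The tracer loop would call drain_requests() between its wait and continue steps. What this sketch leaves open is how to wake a tracer thread that is blocked in wait() when a worker posts a request, and the tracee in question has to stay stopped for as long as workers still need to inspect it.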

Conclusion

This is a rather complicated issue that forces us to change a lot of things. If we make such changes, we also have to start caring about thread safety for our shared state (finfo and pinfo).

@zvr What do you propose?
