Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: mcontrol/mcontrol6 on fetch #1459

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

YenHaoChen
Copy link
Collaborator

@YenHaoChen YenHaoChen commented Sep 13, 2023

The fetch address and data are in refill_icache() instead of fetch_slow_path(). This commit moves the mcontrol/mcontrol6 checkings from fetch_slow_path() to refill_icache() for correct fetch values (2128daa).

The last commit (1779c35) is for performance improvement. I need help verifying the effectiveness.

Copy link
Collaborator

@timsifive timsifive left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but I'm not really familiar with this part of the code.

@scottj97
Copy link
Contributor

@YenHaoChen can you explain the failing scenario that 2128daa fixes?

I can see one of my testcases was previously taking a fetch breakpoint, and no longer is after that change. I will study it to see which behavior is correct.

@scottj97
Copy link
Contributor

scottj97 commented Sep 13, 2023

The case where I see behavior change:

tdata1[2]=0x200000000010010c (mcontrol match=2(ge) u=1 execute=1)
tdata1[2]=0x0000000080002795
PC that triggered is 80002794, which contains a 32-bit opcode

After this PR, this PC executes without taking the breakpoint trap.

I don't think the spec says one way or the other about this case, but it is a functional change that needs to be documented in the PR. I guess it's only comparing the first byte of the fetch, instead of each byte?

The fetching instruction data are in refill_icache() instead of
fetch_slow_path(). This commit moves the mcontrol/mcontrol6 checking
from fetch_slow_path() to refill_icache() for the correct data value.

For example, consider an instruction 0xe9830313 (addi t1, t1, -360). The
previous implementation fires mcontrol data triggers with tdata2=0xe983
and tdata2=0x0313 but does not fire one with tdata2=0xe9830313, which is
incorrect. This commit fixes the issue.
The fetching PC is in refill_icache() instead of fetch_slow_path().
This commit moves the address mcontrol/mcontrol6 checking from
fetch_slow_path() to refill_icache() for the correct fetching PC.

For example, consider a 4-byte instruction at PC=0x4000. The Debug
specification recommends to compare PC={0x4000,0x4001,0x4002,0x4003};
otherwise, the Debug specification requires to compare PC=0x4000. The
previous implementation compares PC={0x4000,0x4002}, which is undesired.
@YenHaoChen
Copy link
Collaborator Author

YenHaoChen commented Sep 14, 2023

@scottj97
Split 2128daa into two commits with additional commit messages for better readability.

The first commit 0578364 fixes the instruction data comparison. For instance, an instruction data 0xe9830313 cannot match a data trigger with tdata2=0xe9830313 but instead ones with tdata2={0xe983,0x0313} without this commit.
The second commit 146e41e fixes the instruction address (PC) comparison. For instance, a 4-byte instruction at PC 0x80002794 also matches an address trigger with tdata2=0x80002796 without this commit. (The spec requires to match tdata2=0x80002794 and recommends to match tdata2={0x80002794,80002795,80002796,80002797}. Matching tdata2={0x80002794,80002796} only does not make sense from a user perspective.)

image

Copy link
Contributor

@scottj97 scottj97 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have good test coverage of data triggers, so I'm not surprised that I didn't see that difference.

The last commit I am unsure about and @aswaterman should review.

riscv/mmu.h Outdated Show resolved Hide resolved
auto access_info = generate_access_info(addr, FETCH, {false, false, false});
check_triggers(triggers::OPERATION_EXECUTE, addr, access_info.effective_virt);
if (unlikely(tlb_insn_tag[vpn % TLB_ENTRIES] != vpn))
check_triggers(triggers::OPERATION_EXECUTE, addr, access_info.effective_virt);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this will spuriously invoke check_triggers on ITLB misses. Note that fetches from MMIO devices accesses always result in ITLB misses, so this means check_triggers will be invoked on every MMIO fetch. Maybe this is functionally correct and is only a perf issue, but it seems undesirable.

What's wrong with my original suggestion of moving this check back into fetch_slow_path? By construction, we should not need to check the triggers on the fetch fast path...

Copy link
Collaborator Author

@YenHaoChen YenHaoChen Sep 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current implementation splits instruction fetching into multiple 2-byte operations, which call fetch_slow_path separately. This results in that a 4-byte instruction 0xe9830313 matches data triggers with tdata2={0xe983, 0x0313} instead of one with tdata2=0xe9830313 incorrectly. Additionally, a 4-byte instruction with PC at 0x4000 matches address triggers with tdata2={0x4000, 0x4002}, which is unexpected from a user perspective.

The behavior of the data trigger in the current implementation does not conform to the specification. It is weird to me that the address trigger matches PCs of 2-byte granularity. Clarify whether the specification allows the address trigger to match an arbitrary subset of the accessing addresses: riscv/riscv-debug-spec#877.

Update 2023.09.20:
The riscv/riscv-debug-spec#877 clarifies that the specification allows the address trigger to match an arbitrary subset of the accessing addresses. Thus, the specification allows the original behavior of the address trigger. Possibly, we can improve performance by keeping address trigger checking in fetch_slow_path. However, the data trigger truly does not work in fetch_slow_path. Moving address triggers out from fetch_slow_path also provides a more reasonable behavior from a user perspective.
@aswaterman How do you feel about this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants