RaptorCS Blackbird support #3341

Merged
merged 1 commit into from
Dec 17, 2019
Conversation

stewartsmith
Contributor

The Hostboot config is basically Witherspoon's, but with LPC port 80h turned on.
The defconfig is from Witherspoon, with the Blackbird XML setup from Raptor's tree.

NOTE: without linux v5.4.3 (or at least anything more recent than v5.3.7) my Intel SSD doesn't show up in Petitboot.

Also note that the blackbird-xml project doesn't yet exist in the open-power org, so you need to grab it from my repo (or Raptor's). Consider this a request to branch mine in :)

Signed-off-by: Stewart Smith [email protected]

@stewartsmith
Contributor Author

I've sent the skiboot patch upstream: https://patchwork.ozlabs.org/patch/1208967/

@sharkcz
Contributor

sharkcz commented Dec 15, 2019

Hmm, I guess I should make the same for Talos :-)

@madscientist159

Did the IPL observer stuff ever make its way into upstream? Especially w/ Talos II it's pretty critical. ;)

@stewartsmith
Contributor Author

stewartsmith commented Dec 16, 2019 via email

@sharkcz
Contributor

sharkcz commented Dec 16, 2019

I started https://wiki.raptorcs.com/wiki/Firmware_Upstreaming (it got a few changes from @merklort for the hostboot part) some time ago, probably based on the Talos GA branch (2018-04-19), not on the v2 firmware (2019-04-16).

@madscientist159

madscientist159 commented Dec 16, 2019

I know that was one of the main blockers to upstreaming -- we don't want to have the support issues associated with OCC reads during IPL (or completely disable fan controls, for that matter). The former in particular was quite expensive, as the symptoms led to spurious mainboard / CPU RMAs in some cases and escalation to higher-level support in many others. The alternative, no fan controls, leads to attempted customer-initiated RMAs, as no one expects a desktop to be that loud (and it also ends up being a bit of a PR black eye for POWER itself).

While I'd really like to see upstream support, we need to have the IPL observer support merged into the subcomponents first. Any chance that can happen?

@sharkcz
Contributor

sharkcz commented Dec 16, 2019

Can't we start with the machine-specific "overlay" in openpower/patches/talos-patches until they are properly upstreamed? How many of the non-upstream patches are shared between Blackbird and Talos?

@stewartsmith
Contributor Author

stewartsmith commented Dec 16, 2019 via email

Contributor

@dcrowell77 dcrowell77 left a comment

Looks fine from my perspective

@dcrowell77
Contributor

There were some internal network issues recently, so I'm guessing that is what caused the CI failure.

@dcrowell77
Contributor

retest this please

@madscientist159

madscientist159 commented Dec 16, 2019

> Can't we start with the machine-specific "overlay" in openpower/patches/talos-patches until they are properly upstreamed? How many of the non-upstream patches are shared between Blackbird and Talos?

Yes, a local set of patches combined with an RCS overlay directory would work. Most of the modifications are common between Talos II / Blackbird / Condor; please feel free to pull whatever you need from our repos, as we have retained the original Apache / GPL / BSD licenses for the modified components (newly written components are normally GPL v3 from our side; licenses are in the file headers).

I just don't want a functionally broken upstream version. As to how we get there, I'm quite flexible. :) That being said, we do rely extensively on the RFC LPC communication (both for boot status and for other tasks); the BMC firmware assumes it's available and makes assumptions about what kind of information is provided, so I don't want to modify the "API" of sorts that it's using.

The Talos II beta firmware is far closer to the Blackbird firmware; the intent was to merge all three of the RCS hardware platforms onto a mostly common codebase. Don't bother with the older codebase associated with the currently shipping Talos II production firmware; it was largely rewritten for Blackbird and the Talos II beta FW.

@stewartsmith
Contributor Author

v2 of the skiboot patch, which talks IPL Observer: https://patchwork.ozlabs.org/patch/1210958/

@oohal
Contributor

oohal commented Dec 17, 2019

retest this please

@oohal oohal merged commit 592eff7 into open-power:master Dec 17, 2019
@madscientist159

madscientist159 commented Dec 17, 2019

@stewartsmith Thanks for that!

Getting the IPL observer stuff into the SBE and hostboot would also be good -- in particular, the SBE patches (in our tree) drive the progress bar during the first part of the Blackbird boot, and have proven very useful over time to determine bad CPUs / sockets from hostboot issues at a glance.

@stewartsmith
Contributor Author

stewartsmith commented Dec 17, 2019 via email

@madscientist159

I'd be willing to test any patches to upstream, since I do have access to the recovery system...

@madscientist159

Just dug this back up:
open-power/sbe#15

@sharkcz
Contributor

sharkcz commented Dec 17, 2019

I have updated https://wiki.raptorcs.com/wiki/Firmware_Upstreaming with SBE, HCODE, OCC and Hostboot. It seems there are just a few changes in the Raptor branches for these components.

@madscientist159

Most center around the IPL observer. It's a bit frustrating to have the simple PRs sit for > 8 months, TBH.

@dcrowell77
Contributor

Is the SBE PR the only one outstanding? I understand the frustration, I'll try to shake a tree.

@sharkcz
Contributor

sharkcz commented Dec 17, 2019

If I read the wiki page above correctly, we need https://git.raptorcs.com/git/talos-hostboot/commit/?id=d90e6c513094231f622a427030f3dbca1eeb5ed5 for the hostboot part. @madscientist159 will know whether there has been a PR already.

@dcrowell77
Contributor

Of course the link was added after I read it. It looks like 3 new commits. Make some PRs and I'll at least get them into the pipeline (with no promises on getting a lot of attention).

@sharkcz
Contributor

sharkcz commented Dec 17, 2019

op-build WIP with Talos support = https://github.com/sharkcz/op-build/tree/talos
skiboot WIP with Talos using LPC observer = https://github.com/sharkcz/skiboot/tree/talos-lpc

Be aware: this is not even compile-tested.

@madscientist159

FWIW, 0xfefe is the IPL observer code for "Linux online and userspace launched". Skiboot should NOT be sending it; it confuses two parts of the process and tells the BMC that IPL has successfully finished when in fact it's only partially done.

@sharkcz
Contributor

sharkcz commented Dec 20, 2019

@madscientist159, I understand your position. The open question is whether there is a way to do the signalling that is both correct and upstreamable. How do the other platforms handle this kind of communication between the host and BMC?

@madscientist159

madscientist159 commented Dec 20, 2019

@sharkcz I've seen a bunch of different methods over the years including writes to scratch registers in the southbridge. What exactly is wrong with a small userspace tool poking the last status code out the LPC port? If it's just that the kernel doesn't have a standardized LPC API, maybe all we need is a simple powernvlpc module?

The idea is that 0xfefe is only poked out once petitboot is either about to start (we've already tested that our initramfs is working, the shell works, etc.) or once petitboot is actually starting. If we reach the end of the skiboot status codes and don't reach 0xfefe in under 15 seconds, something's wrong and a firmware rebuild / reflash is in order.
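
For readers following along, the timeout rule described above could be sketched roughly as follows. This is only an illustration, not the BMC firmware's actual logic: the function name `classify_ipl` and the event format are hypothetical, the 0x80 skiboot prefix is taken from an "IIRC" remark elsewhere in this thread, and the sketch starts the 15-second clock at the most recent skiboot code because the thread does not say which code marks the end of skiboot.

```python
from typing import Iterable, Tuple

LINUX_ONLINE = 0xFEFE   # from the thread: Linux online, userspace launched
SKIBOOT_PREFIX = 0x80   # assumption: high byte of skiboot status codes
TIMEOUT_S = 15.0        # from the thread: 15 seconds to reach 0xfefe


def classify_ipl(events: Iterable[Tuple[float, int]], now: float) -> str:
    """Classify boot progress from (timestamp_seconds, status_code) events.

    Hypothetical helper: events are assumed to arrive in order, and the
    timeout is measured from the most recent skiboot-prefixed code.
    """
    last_skiboot = None
    for timestamp, code in events:
        if code == LINUX_ONLINE:
            return "complete"
        if (code >> 8) == SKIBOOT_PREFIX:
            last_skiboot = timestamp
    if last_skiboot is None:
        return "pre-skiboot"
    if now - last_skiboot > TIMEOUT_S:
        return "stalled"   # per the thread: rebuild / reflash is in order
    return "waiting"
```

For example, `classify_ipl([(0.0, 0x8001)], now=16.0)` reports a stall, while the same event list followed by `(3.0, 0xFEFE)` reports a completed IPL.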

@sharkcz
Contributor

sharkcz commented Dec 20, 2019

@madscientist159, nothing is wrong; it's for my education and overview :-) Also, I'm thinking about how it could be integrated into upstream.

@madscientist159

@sharkcz we actually need an LPC access module for other reasons, something that can allow root to read / write bytes, nothing fancy. I can see why it might be desirable to disable the debug interface, but LPC access is a useful and reasonable thing to have.

I wonder if we can do a simple device node that takes basic commands like IO_LPC_SET_TARGET_ADDRESS, IO_LPC_WRITE_BYTE, IO_LPC_READ_BYTE?
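
To make the proposed command set concrete, here is a small in-memory model of how such a device node might behave. It is a sketch of the semantics only, not a kernel module: the class `FakeLpcDevice`, the enum values, and the "unwritten ports read back 0xFF" behaviour are all assumptions made for illustration; only the three command names come from the comment above.

```python
from enum import Enum


class LpcCommand(Enum):
    # Command names from the thread; the numeric values are hypothetical.
    IO_LPC_SET_TARGET_ADDRESS = 1
    IO_LPC_WRITE_BYTE = 2
    IO_LPC_READ_BYTE = 3


class FakeLpcDevice:
    """In-memory stand-in for a hypothetical /dev/lpc style node."""

    def __init__(self) -> None:
        self._space = {}   # address -> last byte written
        self._target = 0   # address used by the next read or write

    def command(self, cmd: LpcCommand, arg: int = 0) -> int:
        if cmd is LpcCommand.IO_LPC_SET_TARGET_ADDRESS:
            if not 0 <= arg <= 0xFFFF:   # LPC I/O addresses are 16 bits wide
                raise ValueError("address out of range")
            self._target = arg
            return 0
        if cmd is LpcCommand.IO_LPC_WRITE_BYTE:
            if not 0 <= arg <= 0xFF:
                raise ValueError("not a byte")
            self._space[self._target] = arg
            return 0
        if cmd is LpcCommand.IO_LPC_READ_BYTE:
            # Assumption: ports never written read back as 0xFF.
            return self._space.get(self._target, 0xFF)
        raise ValueError("unknown command")


dev = FakeLpcDevice()
dev.command(LpcCommand.IO_LPC_SET_TARGET_ADDRESS, 0x80)
dev.command(LpcCommand.IO_LPC_WRITE_BYTE, 0xFE)
print(hex(dev.command(LpcCommand.IO_LPC_READ_BYTE)))
```

Keeping the target address as separate state mirrors the three-command shape suggested above; a real interface could just as well pass the address with each read or write.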

@stewartsmith
Contributor Author

stewartsmith commented Dec 20, 2019 via email

@stewartsmith
Contributor Author

stewartsmith commented Dec 20, 2019 via email

@madscientist159

madscientist159 commented Dec 20, 2019

Would there be an objection to an LPC access module in secure mode, if we were to write one?

I would be open to engaging fan control on skiboot exit, but that's a BMC change. Could we send 0x80ff at skiboot exit (IIRC 0x80 is the skiboot prefix code)?

@stewartsmith
Contributor Author

stewartsmith commented Dec 20, 2019 via email

@madscientist159

madscientist159 commented Dec 20, 2019

Yeah, I was looking at arbitrary access; see my post a few comments above about a possible IO_ access mechanism around /dev/lpc or similar. Sounds like we'd need a secure mode filter -- e.g. if in secure mode, restrict the destination address to 0x80-0x82?
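
As a rough illustration of that filter, assuming the 0x80-0x82 range is inclusive and that non-secure mode allows any 16-bit LPC I/O address (the function name `lpc_write_allowed` is hypothetical, and nothing here reflects an existing kernel or skiboot interface):

```python
# Inclusive 0x80-0x82 window suggested in the thread for secure mode.
SECURE_MODE_PORTS = range(0x80, 0x83)


def lpc_write_allowed(address: int, secure_mode: bool) -> bool:
    """Return True if a write to `address` should be let through."""
    if not 0 <= address <= 0xFFFF:
        return False   # outside the 16-bit LPC I/O address space
    if secure_mode:
        return address in SECURE_MODE_PORTS
    return True        # assumption: unrestricted outside secure mode


print(lpc_write_allowed(0x81, secure_mode=True))
print(lpc_write_allowed(0x2E, secure_mode=True))
```

An allow-list keeps the status ports usable in secure mode while refusing everything else by default, which seems closer to the intent here than listing forbidden addresses.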

5 participants