Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Exclude courses which have the download button disabled from mirror drives #2282

Open
wants to merge 14 commits into
base: master
Choose a base branch
from

Conversation

umar8hassan
Copy link
Contributor

@umar8hassan umar8hassan commented Aug 28, 2024

What are the relevant tickets?

closes https://github.com/mitodl/hq/issues/4473

Description (What does it do?)

This excludes the sites in offline mass build which has hide_download button set for them.

How can this be tested?

  1. Switch main branch in ocw-studio
  2. Create a site locally at localhost:8043/sites or use existing one
  3. In metadata form for site, set hide_download to True
  4. Trigger a mass_build: docker-compose exec web ./manage.py mass_publish <live/draft> --filter <site-name> and wait for the build to complete successfully.
  5. Open minio console at localhost:9001 and check course data in ocw-content-offline-<live/draft>-local/courses/<site-name> for your site.
  6. Verify that data has been updated recently.
  7. Switch to umar/4473-exclude-courses-which-have-the-download-button-disabled-from-mirror-drives in ocw-studio
  8. Repeat steps 2 - 6 and data should not be updated for offline version sites.

@ibrahimjaved12 ibrahimjaved12 self-requested a review August 28, 2024 13:26
@ibrahimjaved12 ibrahimjaved12 self-assigned this Aug 28, 2024
Copy link
Contributor

@ibrahimjaved12 ibrahimjaved12 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please update the testing steps pertaining to local testing with mass build.

Copy link
Contributor

@ibrahimjaved12 ibrahimjaved12 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested the changes using upsert_mass_build_pipeline --offline and I can confirm that the courses are excluded from offline mass build that have download button disabled.

I also confirmed that they are excluded only from offline mass build, not the online one.
Looks good to me.

@pdpinch
Copy link
Member

pdpinch commented Sep 3, 2024

Where can I find a description about how this works? We have a goal of keeping studio indifferent to the project's data schema, and I feel like we're mixing things up here.

For example, if we had a project without a "show download" field, will this api still work?

@umar8hassan
Copy link
Contributor Author

Where can I find a description about how this works? We have a goal of keeping studio indifferent to the project's data schema, and I feel like we're mixing things up here.

For example, if we had a project without a "show download" field, will this api still work?

Yes, it should be alright. It'd return an empty list instead of raising an error if the hide_download field does not exist in metadata property.

@gumaerc
Copy link
Contributor

gumaerc commented Sep 10, 2024

Where can I find a description about how this works? We have a goal of keeping studio indifferent to the project's data schema, and I feel like we're mixing things up here.

For example, if we had a project without a "show download" field, will this api still work?

@pdpinch I did suggest this solution, but yea now that you mention it this does go against our philosophy of not adding any application code to ocw-studio that makes decisions based on metadata properties. However, I don't really see a good way to avoid it in this case. The "hide download" option is a course level setting defined in the metadata, and the mass build is a deployment detail. For the online site, we simply don't render the download button in the theme if hide_download is true. With the offline build, there isn't really any code we could add to the Hugo code in the theme to not render the entire site if the flag is set. When you run hugo, a site is going to be built regardless of any settings so a decision has to be made somewhere outside of that to not run hugo to generate the offline site at all. I think in this case the best we can do is document the hide_download setting in the ocw-studio readme and detail exactly what it does within the context of the mass build.

@pdpinch
Copy link
Member

pdpinch commented Sep 10, 2024 via email

@gumaerc
Copy link
Contributor

gumaerc commented Sep 10, 2024

Could we modify the code in the mass build script instead of the code in Studio? For example, exiting the build early if the download condition isn't met? Or, another option, could be build something else in this case, instead of the site? Maybe a page with a link?

The mass build pipeline is generated by ocw-studio. We could jump through the hoops required to parse out and read this setting, then choose not to execute the build within the pipeline, but I don't see how that would be worth it. At that point, we're still "crossing content configuration and deployment details" because the pipeline would be looking for hide_download.

If we do nothing with the ocw-studio code at all and tried to handle this in ocw-hugo-themes by somehow getting Hugo to render a page with a link in the offline theme, I don't think that would get us where we want either. You could potentially override the homepage and render a different template if hide_download is true, but you would still end up with the rest of the files that Hugo would generate and the site would still show up in the index. The existing solution here prevents the former. Since the site is never included in the pipeline, it's never built in the first place and the files won't be present in the mirror drive.

@umar8hassan The only thing I might add to this from a code review perspective is that we will also need to do this in update_websites_in_root_website. When ocw-studio publishes the "root website" (ocw-www) it publishes content along with that of type website that defines every website known to ocw-studio that is not the root website - https://github.mit.edu/mitocwcontent/ocw-www/tree/main/content/websites. This content is used to build the Fuse offline search index, and the site not being present in the mass build pipeline will not keep it out of here. We may want to just switch this function over to get the websites that should be included by calling get_publishable_sites. It also currently only updates or creates these records; it isn't set up to remove them.

@umar8hassan umar8hassan force-pushed the umar/4473-exclude-courses-which-have-the-download-button-disabled-from-mirror-drives branch from 86665de to 63b241a Compare September 20, 2024 05:50
@gumaerc gumaerc self-assigned this Sep 26, 2024
Copy link
Contributor

@gumaerc gumaerc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code for the mass build pipeline is working as expected. I marked a course with hide_download, regenerated the pipeline and inspected its contents. I looked for the site I marked for exclusion and it was indeed excluded from the offline mass build. This will prevent the offline site from being built and deployed with the offline mass build. I still have a few concerns though. As I mentioned in my above comment, we need a way to keep hide_download sites out of the offline search index. I see you have done that here in the way I suggested, but now that I've thought about it some more there are some issues with this approach. The update_websites_in_root_website function is only called when a new site is published for the first time or if a site is being unpublished. We will need to do the following after this PR is deployed if we go with this approach:

  • Open a Django shell in the deployed environment and run update_websites_in_root_website manually
  • Run the following management commands:
    • ./manage.py reset_sync_states --filter ocw-www --skip-sync
    • ./manage.py sync_website_to_backend --filter ocw-www --delete
  • This will force all the in-progress content on ocw-www to be published, so we will need to check with OCW

Alternatively, there is another approach we could take instead that is much more simple. In ocw-hugo-themes, take a look at the following code from www-offline/layouts/search/section.html:

      {{ range $websites }}
      {{ $urlPath := strings.TrimPrefix "/" .Params.url_path }}
      {{ $url := printf "%s/%s/index.html" $pathToRoot $urlPath }}
      {{ $delimiter := ", " }}
      {
        "title": "{{- .Title -}}",
        "primary_course_number": "{{- .Params.primary_course_number -}}",
        "url": "{{- $url -}}",
        {{ if .Params.level }}
        "level": "{{- delimit .Params.level $delimiter -}}"
        {{ else }}
        "level": ""
        {{ end }}
      },
      {{ end }}

This code is responsible for iterating content/websites in ocw-www and creating the offline FuseJS search index. Here, you could skip over a site if .Params.hide_download is true. I still think it's worth leaving in the code that excludes sites from having a WebsiteContent of type website created, but implementing the above change would mean that updating existing content might be easier as we don't have to delete the sites from the content/websites folder in ocw-www.

Now, the reason I have requested changes on this PR is because there is another situation we have to handle in regards to offline sites being deployed that has been overlooked: the single site pipelines. When a single site publish is triggered, the online site build job is followed by the offline job:
image

We could make a code change to site_pipeline.py to exclude the offline build job if hide_download is true, but I'm not sure this would be the best approach. In order for this to take effect, a given pipeline needs to be backpopulated into Concourse. If a content author sets a site to hide download, the pipeline definition in Concourse is not updated as part of that.

With all this taken into consideration, I'm thinking about how @pdpinch suggested that we instead prevent the whole build from happening within the pipeline somehow. One way to do this would be to add another get step that the offline job depends on similar to offline_build_gate, a keyval type resource. Its job would be to check hide_download and not pass to the offline build job if it's true.

@pdpinch
Copy link
Member

pdpinch commented Oct 1, 2024

@umar8hassan @gumaerc what's next with this? Is it worth merging as is, and returning to "prevent the whole build from happening within the pipeline somehow" at a later date -- or should we stop this now?

@gumaerc
Copy link
Contributor

gumaerc commented Oct 1, 2024

@pdpinch I believe we should deal with this now. If this is merged as-is, offline versions of sites marked with hide_download will still be deployed when individual publish actions are taken, because of the reasons I described above. I believe we should add the "gate" resource between the online and offline build like I talked about , which would prevent that issue. Then, a follow-up PR should be made in ocw-hugo-themes to prevent these sites from showing up in the offline search index in the event they are already deployed.

@umar8hassan umar8hassan force-pushed the umar/4473-exclude-courses-which-have-the-download-button-disabled-from-mirror-drives branch from 63b241a to 8799e5f Compare October 4, 2024 12:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants