Add external_urls filter #1495

Open
wants to merge 3 commits into
base: main
14 changes: 10 additions & 4 deletions assets/javascripts/lib/page.coffee
@@ -179,15 +179,21 @@ onclick = (event) ->
   link = link.parentNode while link and link.tagName isnt 'A'

   if link and not link.target and isSameOrigin(link.href)
-    event.preventDefault()
-    path = link.pathname + link.search + link.hash
-    path = path.replace /^\/\/+/, '/' # IE11 bug
-    page.show(path)
+
+    if link.className.match('_list-item') or not isSameOriginDifferentDoc(link)
+      event.preventDefault()
+      path = link.pathname + link.search + link.hash
+      path = path.replace /^\/\/+/, '/' # IE11 bug
+      page.show(path)
+
   return

 isSameOrigin = (url) ->
   url.indexOf("#{location.protocol}//#{location.hostname}") is 0

+isSameOriginDifferentDoc = (url) ->
+  url.pathname.split('/')[1] != location.pathname.split('/')[1]
+
 updateCanonicalLink = ->
   @canonicalLink ||= document.head.querySelector('link[rel="canonical"]')
   @canonicalLink.setAttribute('href', "https://#{location.host}#{location.pathname}")
1 change: 1 addition & 0 deletions docs/filter-reference.md
@@ -84,6 +84,7 @@ The `call` method must return either `doc` or `html`, depending on the type of f
* [`AttributionFilter`](https://github.com/freeCodeCamp/devdocs/blob/master/lib/docs/filters/core/attribution.rb) — appends the license info and link to the original document
* [`TitleFilter`](https://github.com/freeCodeCamp/devdocs/blob/master/lib/docs/filters/core/title.rb) — prepends the document with a title (disabled by default)
* [`EntriesFilter`](https://github.com/freeCodeCamp/devdocs/blob/master/lib/docs/filters/core/entries.rb) — abstract filter for extracting the page's metadata
+* [`ExternalUrlsFilter`](https://github.com/freeCodeCamp/devdocs/blob/master/lib/docs/filters/core/external_urls.rb) — replaces external URLs with relative URLs to existing DevDocs documentation
Contributor

s/existant/existing


## Custom filters

5 changes: 5 additions & 0 deletions docs/scraper-reference.md
@@ -115,6 +115,7 @@ Additionally:

* [`TitleFilter`](https://github.com/freeCodeCamp/devdocs/blob/master/lib/docs/filters/core/title.rb) is a core HTML filter, disabled by default, which prepends the document with a title (`<h1>`).
* [`EntriesFilter`](https://github.com/freeCodeCamp/devdocs/blob/master/lib/docs/filters/core/entries.rb) is an abstract HTML filter that each scraper must implement and that is responsible for extracting the page's metadata.
+* [`ExternalUrlsFilter`](https://github.com/freeCodeCamp/devdocs/blob/master/lib/docs/filters/core/external_urls.rb) is an HTML filter that replaces external URLs found in `<a>` tags with URLs pointing to existing DevDocs documentation.

### Filter options

@@ -185,6 +186,10 @@ More information about how filters work is available on the [Filter Reference](.

_Note: this filter is disabled by default._

+* [`ExternalUrlsFilter`](https://github.com/freeCodeCamp/devdocs/blob/master/lib/docs/filters/core/external_urls.rb)
+
+  - `:external_urls` [Hash or Proc] If a Hash, maps external hosts to DevDocs documentation slugs: the `href` of every `<a>` tag whose host matches a key is replaced with a URL under the corresponding slug. If a Proc, it is called with each URL (a String) and should return a relative URL pointing to existing DevDocs documentation. See [`backbone.rb`](https://github.com/freeCodeCamp/devdocs/blob/master/lib/docs/scrapers/backbone.rb) for an example.
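
As a rough illustration of the two forms of `:external_urls` described above, the sketch below defines one Hash and one Proc that map Underscore links to a DevDocs slug. The constant names are hypothetical and exist only for this example; in a scraper the value would be assigned to `options[:external_urls]`. Because the filter assigns the Proc's return value to every `href` it visits, the Proc here returns unrelated URLs unchanged.

```ruby
# Hash form: keys are matched against the link's host,
# values are DevDocs documentation slugs.
EXTERNAL_URLS_HASH = { 'underscorejs.org' => 'underscore' }.freeze

# Proc form: receives the href as a String and returns the replacement href.
# (Hypothetical example; the prefix below is an assumption for illustration.)
UNDERSCORE_PREFIX = 'https://underscorejs.org/'

EXTERNAL_URLS_PROC = lambda do |url|
  if url.start_with?(UNDERSCORE_PREFIX)
    # Rewrite to a path under the DevDocs slug, keeping the rest of the URL.
    url.sub(UNDERSCORE_PREFIX, '/underscore/')
  else
    url # leave unrelated links unchanged
  end
end
```

The Hash form is shorter when a whole host maps onto one documentation set; the Proc form is probably only worth it when the path or fragment needs to be transformed as well.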

## Keeping scrapers up-to-date

To keep scrapers up to date, override the `get_latest_version(opts)` method. If `self.release` is defined, it should return the latest version of the documentation. If `self.release` is not defined, it should return the Epoch time when the documentation was last modified. If the documentation will never change, simply return `1.0.0`. The result of this method is periodically reported in a "Documentation versions report" issue, which helps maintainers keep track of outdated documentation.
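
The two kinds of return value described above can be sketched with plain Ruby. Both helpers are hypothetical and not part of DevDocs: the first assumes the version label is wrapped in one character on each side (for example parentheses), and the second assumes the last-modified time is available as an HTTP date string.

```ruby
require 'time'

# Hypothetical helper: strips the first and last character of a version
# label such as "(1.4.0)", leaving the bare version string.
def version_from_label(label)
  label[1...-1]
end

# Hypothetical helper: converts an HTTP date (for instance a Last-Modified
# header value) to Epoch seconds.
def epoch_from_http_date(value)
  Time.httpdate(value).to_i
end
```

A scraper with `self.release` defined would return something like the first value from `get_latest_version(opts)`; one without it would return something like the second.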
2 changes: 1 addition & 1 deletion lib/docs/core/scraper.rb
@@ -41,7 +41,7 @@ def stub(path, &block)
     self.html_filters = FilterStack.new
     self.text_filters = FilterStack.new

-    html_filters.push 'apply_base_url', 'container', 'clean_html', 'normalize_urls', 'internal_urls', 'normalize_paths', 'parse_cf_email'
+    html_filters.push 'apply_base_url', 'container', 'clean_html', 'normalize_urls', 'internal_urls', 'normalize_paths', 'parse_cf_email', 'external_urls'
     text_filters.push 'images' # ensure the images filter runs after all html filters
     text_filters.push 'inner_html', 'clean_text', 'attribution'

36 changes: 36 additions & 0 deletions lib/docs/filters/core/external_urls.rb
@@ -0,0 +1,36 @@
+# frozen_string_literal: true
+
+module Docs
+  class ExternalUrlsFilter < Filter
+    def call
+      return doc unless context[:external_urls]
+
+      css('a').each do |node|
+        next unless (anchor_url = node['href'])
+
+        # Skip links already converted to relative internal links.
+        next if anchor_url.include?('..')
+
+        if context[:external_urls].is_a?(Proc)
+          node['href'] = context[:external_urls].call(anchor_url)
+          next
+        end
+
+        begin
+          url = URI(anchor_url)
+        rescue URI::InvalidURIError
+          next # leave unparseable hrefs untouched
+        end
+
+        context[:external_urls].each do |host, name|
+          next unless url.host.to_s.match?(host)
+          # Append the fragment only when the URL has one (avoids a trailing '#').
+          fragment = url.fragment ? "##{url.fragment}" : ''
+          node['href'] = "/#{name}#{url.path}#{fragment}"
+        end
+      end
+
+      doc
+    end
+  end
+end
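
The Hash branch of the filter can be exercised outside the scraper pipeline with a small standalone function. This is a minimal sketch, not the filter itself: `rewrite_external_url` is a hypothetical name, it works on a single href string instead of Nokogiri nodes, and it compares hosts exactly rather than with `match?`, which is an assumption about the intended behavior.

```ruby
require 'uri'

# Hypothetical standalone helper: returns the rewritten href, or the original
# href when no mapping applies.
def rewrite_external_url(href, mapping)
  # Links containing '..' are treated as already-converted internal links.
  return href if href.include?('..')

  begin
    url = URI(href)
  rescue URI::InvalidURIError
    return href # leave unparseable hrefs untouched
  end

  mapping.each do |host, slug|
    next unless url.host == host # exact host comparison (assumption)

    # Append the fragment only when the URL has one.
    fragment = url.fragment ? "##{url.fragment}" : ''
    return "/#{slug}#{url.path}#{fragment}"
  end

  href
end
```

For example, with `{ 'underscorejs.org' => 'underscore' }` as the mapping, `https://underscorejs.org/#each` becomes `/underscore/#each`, while links to other hosts are returned unchanged.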
4 changes: 4 additions & 0 deletions lib/docs/scrapers/backbone.rb
@@ -21,6 +21,10 @@ class Backbone < UrlScraper
       Licensed under the MIT License.
     HTML

+    options[:external_urls] = {
+      'underscorejs.org' => 'underscore'
+    }
+
     def get_latest_version(opts)
       doc = fetch_doc('https://backbonejs.org/', opts)
       doc.at_css('.version').content[1...-1]