Alternatively, you might want to use the `Moses <http://www.statmt.org/moses/>`_ tokenizer port in `SacreMoses <https://github.com/alvations/sacremoses>`_ (split from `NLTK <http://nltk.org/>`_). You have to install SacreMoses::
P/S: Though sacremoses and some NLTK tokenizers are written in the same style, especially the Penn Treebank tokenizer part, sacremoses wasn't extracted from NLTK; there are only so many ways to write regexes in Python 😄
📚 Documentation
Summary
This repo's README.rst points to the https://github.com/alvations/sacremoses repo for the SacreMoses project (text/README.rst, line 72 in 1d4ce73).
Similarly, in torchtext/data/utils.py (text/torchtext/data/utils.py, line 129 in 1d4ce73).
However, the authoritative home of this project appears to be https://github.com/hplt-project/sacremoses, so the repo links should be updated accordingly.
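For illustration only, the proposed link update could be sketched as a plain string replacement. The helper name `update_links` and the sample line are hypothetical and not part of torchtext; this assumes the old URL appears verbatim in the affected files.

```python
# Minimal sketch: swap the old repo URL for the proposed one in a piece of text.
# `update_links` is a hypothetical helper, not an existing torchtext function.
OLD_URL = "https://github.com/alvations/sacremoses"
NEW_URL = "https://github.com/hplt-project/sacremoses"


def update_links(text: str) -> tuple[str, int]:
    """Return the text with OLD_URL replaced by NEW_URL, plus the number of replacements."""
    count = text.count(OLD_URL)
    return text.replace(OLD_URL, NEW_URL), count


# Sample line modeled on the README wording (shortened for the example).
sample = "tokenizer port in `SacreMoses <https://github.com/alvations/sacremoses>`_"
updated, replaced = update_links(sample)
print(replaced, updated)
```

The same replacement would apply to both README.rst and torchtext/data/utils.py, assuming the maintainers confirm the new URL.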
cc: @alvations (author of the above repo) to confirm or correct if this is a misunderstanding on my part (apologies in advance if that's the case).
Rationale and background research
https://github.com/alvations/sacremoses may have been the correct repo at the time SacreMoses was initially extracted from the NLTK project (see issue #306 and PR #361). Today, however, https://github.com/alvations/sacremoses is a fork of https://github.com/hplt-project/sacremoses, and it appears to be simply behind the other, authoritative project by a number of commits, without any unique commits of its own.
We can also see that https://pypi.org/project/sacremoses/ has the "homepage" link pointing to https://github.com/hplt-project/sacremoses, further supporting that this is the authoritative source of the project.