Support Numo Gem for performing SVD #198

Merged 1 commit on Jun 9, 2022

Commits on Jun 4, 2022

  1. Support Numo Gem for performing SVD

    **Background:**
    The slow step of LSI is computing the SVD (singular value decomposition)
    of a matrix. Even with a relatively small collection of documents (say,
    about 20 blog posts), the native Ruby implementation is too slow to be
    usable (taking hours to complete).
    
    To work around this problem, classifier-reborn lets you optionally use
    the `gsl` gem, which wraps the [GNU Scientific
    Library](https://www.gnu.org/software/gsl/), for matrix calculations.
    Computations with this gem run orders of magnitude faster than the
    Ruby-only matrix implementation, fast enough that using LSI with Jekyll
    finishes in seconds rather than hours.
    
    Unfortunately, [rb-gsl](https://github.com/SciRuby/rb-gsl) is
    unmaintained -- there's a commit on main that makes it compatible with
    Ruby 3, but nobody has released the gem so the only way to use rb-gsl
    with Ruby 3 right now is to specify the git hash in your Gemfile. See
    SciRuby/rb-gsl#67. This will be increasingly
    problematic because Ruby 2.7 is now in [security
    maintenance](https://www.ruby-lang.org/en/news/2022/04/12/ruby-2-7-6-released/)
    and will reach end of life in less than a year.
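
For illustration, pinning rb-gsl to a git commit in a Gemfile might look like the fragment below. This is a sketch that assumes Bundler's standard `git:` and `ref:` options; `<commit-sha>` is a placeholder, since the commit hash is not given here.

```ruby
# Gemfile fragment (illustrative only)
# <commit-sha> is a placeholder for the unreleased Ruby 3 compatible commit.
gem 'gsl', git: 'https://github.com/SciRuby/rb-gsl', ref: '<commit-sha>'
```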
    
    Notably, `rb-gsl` depends on the
    [narray](https://github.com/masa16/narray#new-version-is-under-development---rubynumonarray)
    gem. `narray` is deprecated, and the readme suggests using
    `Numo::NArray` instead.
    
    **Changes:**
    In this PR, my goal is to provide an alternative matrix implementation
    that can perform singular value decomposition quickly and works with
    Ruby 3. Doing so will make classifier-reborn compatible with Ruby 3
    without depending on the unmaintained/unreleased gsl gem. There aren't
    many gems that provide fast matrix support for Ruby, but
    [Numo](https://github.com/ruby-numo) seems to be more actively
    maintained than rb-gsl, and Numo has a working Ruby 3 implementation
    that can perform a singular value decomposition, which is exactly what
    we need. This requires
    [numo-narray](https://github.com/ruby-numo/numo-narray) and
    [numo-linalg](https://github.com/ruby-numo/numo-linalg).
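
As a rough illustration of the Numo call involved, the sketch below decomposes a small matrix with `Numo::Linalg.svd` and checks that the factors multiply back to the input. The helper name `svd_reconstruction_error` and the `NUMO_AVAILABLE` constant are hypothetical, and this is not the code added by this PR. It assumes `numo-narray` and `numo-linalg` are installed with a LAPACK backend, and returns `nil` otherwise.

```ruby
# Sketch only: the helper name is hypothetical, not classifier-reborn's API.
begin
  require 'numo/narray'
  require 'numo/linalg'
  NUMO_AVAILABLE = true
rescue LoadError, StandardError
  # Assumption: a missing gem raises LoadError, while a missing native
  # backend (BLAS/LAPACK) may surface as a different error.
  NUMO_AVAILABLE = false
end

# Decomposes `rows` (an array of numeric rows) as U * diag(s) * Vt and
# returns the largest absolute difference between the input and that
# product. Returns nil when Numo could not be loaded.
def svd_reconstruction_error(rows)
  return nil unless NUMO_AVAILABLE

  a = Numo::DFloat.cast(rows)
  # svd returns the singular values, U, and the transpose of V.
  s, u, vt = Numo::Linalg.svd(a)
  k = s.size
  # Scale the first k columns of U by the singular values, then apply Vt.
  rebuilt = (u[true, 0...k] * s).dot(vt[0...k, true])
  (a - rebuilt).abs.max
end
```

Calling `svd_reconstruction_error([[1, 0, 1], [0, 1, 1], [1, 1, 0], [0, 0, 1]])` should return a value close to zero when Numo is available.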
    
    My goal is to allow users to (optionally) use classifier-reborn with
    Numo/LAPACK the same way they'd use it with GSL. That is, the user
    should install the `numo-narray` and `numo-linalg` gems (with their
    required C libraries), and classifier-reborn will detect and use these
    if they are found.
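
The detect-and-fall-back behavior described above could be sketched roughly as follows. The method name `detect_matrix_backend`, the returned symbols, and the order of preference (Numo, then GSL, then plain Ruby) are assumptions for illustration, not the implementation in this PR.

```ruby
# Hypothetical helper: tries each optional backend in turn and returns a
# symbol naming the first one that loads.
def detect_matrix_backend
  begin
    require 'numo/narray'
    require 'numo/linalg'
    return :numo
  rescue LoadError, StandardError
    # Numo gems, or the native libraries they need, are unavailable.
  end

  begin
    require 'gsl'
    return :gsl
  rescue LoadError, StandardError
    # The gsl gem is unavailable.
  end

  # Fall back to Ruby's own Matrix class (slow, but has no native deps).
  require 'matrix'
  :ruby
end

puts "Using the #{detect_matrix_backend} matrix backend"
```

Because the probing happens at load time, a user who installs neither optional gem sees no change in behavior other than speed.
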
    mkasberg committed Jun 4, 2022
    Commit aa3ea2c