This repository has been archived by the owner on Apr 5, 2024. It is now read-only.

External Data Sources #36

Open
0xcaff opened this issue May 27, 2018 · 6 comments

0xcaff commented May 27, 2018

Currently, the only data used to index files comes from the song's tags and the files around the song. This doesn't seem to be enough (#18, 8e9ae36, #31).

Our plan so far has been to assume the music is well tagged, but even my own collection isn't. For forte to work, we need good metadata about the audio and rich album artwork. For these reasons, we should consider pulling data from external sources (like MusicBrainz and AcoustID).

The downside is that importing would be slower. Currently it takes about 15m to import 3k items stored on a NAS. With AcoustID, importing the same 3k items would take about 16m just to request the data from the AcoustID server, which would probably become the main bottleneck. The old way could be kept behind a flag.
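The 16m figure can be reproduced with some quick arithmetic. This is only a sketch: the rate of 3 requests per second is an assumption (it is not stated in this thread), and so is the idea that each item needs exactly one lookup.

```python
# Rough estimate of the time spent waiting on AcoustID lookups.
# ASSUMPTIONS (not from this thread): lookups are throttled to about
# 3 requests per second, and each item needs exactly one lookup.
ITEMS = 3000
REQUESTS_PER_SECOND = 3  # assumed rate limit


def lookup_minutes(items: int, rate: float) -> float:
    """Minutes needed to issue `items` requests at `rate` requests/second."""
    return items / rate / 60


print(f"{lookup_minutes(ITEMS, REQUESTS_PER_SECOND):.1f} minutes")
```

Under those assumptions the lookups alone would roughly double the current import time.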

The quality and reliability of MusicBrainz, AcoustID and the Cover Art Archive are really good compared to what they were before. I think it is worth the extra import time to have good data. It also fits the requirement that input files will not be mutated, while still providing a good experience.

0xcaff commented May 29, 2018

On the other hand, tagging music is hard and beets does it really well. Maybe we could work on interoperability with beets instead: some way to import your hand-tagged beets collection into forte.

0xcaff commented Jun 10, 2018

Here's an example of the data exposed by the beets export plugin:

{
    "acoustid_fingerprint": "",
    "acoustid_id": "",
    "added": "2018-05-29 19:04:54",
    "album": "G Spot.",
    "album_id": "3",
    "albumartist": "Speedy J",
    "albumartist_credit": "Speedy J",
    "albumartist_sort": "Speedy J",
    "albumdisambig": "",
    "albumstatus": "Official",
    "albumtotal": "10",
    "albumtype": "album",
    "arranger": "",
    "artist": "Speedy J",
    "artist_credit": "Speedy J",
    "artist_sort": "Speedy J",
    "artpath": "None",
    "asin": "B000007UGD",
    "bitdepth": "0",
    "bitrate": "251kbps",
    "bpm": "0",
    "catalognum": "WARPCD27",
    "channels": "2",
    "comments": "",
    "comp": "False",
    "composer": "Jochem Paap",
    "composer_sort": "Paap, Jochem",
    "country": "GB",
    "data_source": "MusicBrainz",
    "day": "27",
    "disc": "01",
    "disctitle": "",
    "disctotal": "01",
    "encoder": "",
    "filesize": "0",
    "format": "MP3",
    "genre": "Electronic",
    "grouping": "",
    "id": "66",
    "initial_key": "",
    "label": "Warp",
    "language": "eng",
    "length": "5:06",
    "lyricist": "",
    "lyrics": "",
    "mb_albumartistid": "734fa82c-864e-468b-bee4-944cb4b1952b",
    "mb_albumid": "015aa9b3-0e76-4121-865a-1b599bc20f8c",
    "mb_artistid": "734fa82c-864e-468b-bee4-944cb4b1952b",
    "mb_releasegroupid": "88d68733-20c1-3518-9d0d-dfab72a8498a",
    "mb_releasetrackid": "a892b027-666b-345a-9f23-60ac91468c86",
    "mb_trackid": "5a28222a-c75b-4572-a4cb-6f73b776ee65",
    "media": "CD",
    "month": "03",
    "mtime": "1969-12-31 19:00:00",
    "original_day": "27",
    "original_month": "03",
    "original_year": "1995",
    "r128_album_gain": "000000",
    "r128_track_gain": "000000",
    "rg_album_gain": "0.0",
    "rg_album_peak": "0.0",
    "rg_track_gain": "0.0",
    "rg_track_peak": "0.0",
    "samplerate": "44kHz",
    "script": "Latn",
    "singleton": "False",
    "title": "Grogono",
    "track": "10",
    "track_alt": "10",
    "tracktotal": "10",
    "year": "1995"
}

It seems to be missing the path. This was generated by running:

beet export --include-keys='*' --library
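Every value in the export is a string, so an importer would have to convert the fields it cares about. Here is a minimal sketch of that step. It assumes the export is a JSON array of objects shaped like the one above; `normalize` and `parse_length` are hypothetical helper names, not part of beets or forte.

```python
import json

# A trimmed copy of the record above. ASSUMPTION: the export is a JSON
# array of such objects, and every value arrives as a string.
RAW = """[{"album": "G Spot.", "title": "Grogono", "track": "10",
           "year": "1995", "comp": "False", "length": "5:06"}]"""


def parse_length(text: str) -> int:
    """Convert "M:SS" (or "H:MM:SS") into whole seconds."""
    seconds = 0
    for part in text.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def normalize(record: dict) -> dict:
    """Pick a few fields and give them proper types (hypothetical helper)."""
    return {
        "album": record["album"],
        "title": record["title"],
        "track": int(record["track"]),
        "year": int(record["year"]),
        "compilation": record["comp"] == "True",
        "length_seconds": parse_length(record["length"]),
    }


tracks = [normalize(r) for r in json.loads(RAW)]
print(tracks[0])
```

Without the path there is still no way to link a record like this back to a file on disk.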

The schema of beets can be found here:

https://github.com/beetbox/beets/blob/3373b090bdae9bbc9ffb3653beb8553498e3c845/beets/library.py#L421-L493
https://github.com/beetbox/beets/blob/3373b090bdae9bbc9ffb3653beb8553498e3c845/beets/library.py#L893-L934

Both item and album data seem to be available in the library, but the export plugin only exposes items. The database is stored in the beets config folder, in the library.db file. Here is its schema:

CREATE TABLE item_attributes (
  id INTEGER PRIMARY KEY, 
  entity_id INTEGER, 
  key TEXT, 
  value TEXT, 
  UNIQUE(entity_id, key) ON CONFLICT REPLACE
);
CREATE INDEX item_attributes_by_entity ON item_attributes (entity_id);
CREATE TABLE albums (
  id INTEGER PRIMARY KEY, artpath BLOB, 
  added REAL, albumartist TEXT, albumartist_sort TEXT, 
  albumartist_credit TEXT, album TEXT, 
  genre TEXT, year INTEGER, month INTEGER, 
  day INTEGER, disctotal INTEGER, comp INTEGER, 
  mb_albumid TEXT, mb_albumartistid TEXT, 
  albumtype TEXT, label TEXT, mb_releasegroupid TEXT, 
  asin TEXT, catalognum TEXT, script TEXT, 
  language TEXT, country TEXT, albumstatus TEXT, 
  albumdisambig TEXT, rg_album_gain REAL, 
  rg_album_peak REAL, r128_album_gain INTEGER, 
  original_year INTEGER, original_month INTEGER, 
  original_day INTEGER
);
CREATE TABLE album_attributes (
  id INTEGER PRIMARY KEY, 
  entity_id INTEGER, 
  key TEXT, 
  value TEXT, 
  UNIQUE(entity_id, key) ON CONFLICT REPLACE
);
CREATE INDEX album_attributes_by_entity ON album_attributes (entity_id);

This information was gathered with beets 1.4.7.

There are a couple of things we could do to integrate with beets.

  1. Open the database and start importing. This would work, but it relies directly on implementation details. Then again, beets isn't really changing, and the export plugin also exposes implementation details by just enumerating all columns.

  2. Write a plugin to expose album and item data. Item data is exposed by the export plugin, but it is missing the path. Album data isn't exposed. This is leaky, but less leaky than just hooking into the database directly.
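To get a feel for option 1, here is a minimal sketch of reading albums straight out of the database. It runs against an in-memory stand-in that uses a handful of the columns from the albums table above, not against a real library.db. `read_albums` is a hypothetical helper.

```python
import sqlite3


def read_albums(conn: sqlite3.Connection) -> list:
    """Return every album as a dict, using a few columns of the schema above."""
    conn.row_factory = sqlite3.Row  # rows become addressable by column name
    rows = conn.execute(
        "SELECT id, album, albumartist, year, mb_albumid "
        "FROM albums ORDER BY id"
    )
    return [dict(row) for row in rows]


# Stand-in for library.db: only a subset of the real columns is created here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE albums (id INTEGER PRIMARY KEY, album TEXT, "
    "albumartist TEXT, year INTEGER, mb_albumid TEXT)"
)
conn.execute(
    "INSERT INTO albums VALUES (?, ?, ?, ?, ?)",
    (3, "G Spot.", "Speedy J", 1995, "015aa9b3-0e76-4121-865a-1b599bc20f8c"),
)

albums = read_albums(conn)
print(albums)
```

Against a real library it would probably be safer to open the file read-only, for example with `sqlite3.connect("file:library.db?mode=ro", uri=True)`, so forte can never modify the beets database.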

0xcaff added a commit that referenced this issue Jun 10, 2018
Still need to decide whether the best input to this program is a file.
See #36.
0xcaff commented Jul 15, 2018

It looks like the path is removed from one of the representations. https://github.com/beetbox/beets/blob/db782a2404fa8a6827c10a6536b4a960d19af135/beetsplug/info.py#L69

The internal item is still returned at the point where the data is generated. I think we should go with option 2 because it is a less leaky abstraction. This would require creating a new plugin and injecting it at runtime into a beets instance.

0xcaff commented Aug 28, 2018

I've reached out to the beets community to see how they would like to go about this. https://discourse.beets.io/t/better-interoperability/460

0xcaff commented Sep 19, 2018

The beets community hasn't been very responsive to our idea. If I were to guess why the path isn't returned, it is probably because paths are byte arrays and hard to represent in JSON. If we want to take advantage of beets, we should probably just make our own beets plugin.
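If byte paths really are the obstacle, a plugin of our own could work around it. One possible approach (a sketch only, not something beets does as far as this thread shows) is to decode with the surrogateescape error handler, which makes arbitrary bytes round-trip through a JSON string. `encode_path` and `decode_path` are hypothetical names.

```python
import json


def encode_path(path: bytes) -> str:
    """Turn a raw byte path into text; undecodable bytes become lone surrogates."""
    return path.decode("utf-8", "surrogateescape")


def decode_path(text: str) -> bytes:
    """Inverse of encode_path: recover the exact original bytes."""
    return text.encode("utf-8", "surrogateescape")


# A path containing a byte (0xF6) that is not valid UTF-8.
raw = b"/music/Bj\xf6rk/01.mp3"

# json.dumps escapes non-ASCII characters by default, so the lone
# surrogate is written out as a \uXXXX escape.
payload = json.dumps({"path": encode_path(raw)})
restored = decode_path(json.loads(payload)["path"])

print(payload)
print(restored == raw)
```

Base64 would also work and is easier to consume from other languages, at the cost of paths no longer being readable in the output.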

0xcaff commented Dec 1, 2018

Integrating with beets will take about as much code as just taking the important ideas from beets and putting them into forte. It seems like beets does the following:

  • Queries a bunch of metadata providers (MusicBrainz + friends) for each album
  • Takes the best of the results
  • If there isn't a strong match, asks the user for help disambiguating

I don't think we need this now. Maybe later.
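For whenever "later" comes, the three steps listed above could be sketched roughly like this. Everything here is hypothetical: the `Candidate` type, the provider signature, the string-similarity score and the 0.9 threshold are placeholders for illustration, not how beets actually ranks matches.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Callable, Iterable, List, Optional


@dataclass
class Candidate:
    source: str
    artist: str
    album: str


# A provider takes (artist, album) and yields candidates (hypothetical shape).
Provider = Callable[[str, str], Iterable[Candidate]]


def score(c: Candidate, artist: str, album: str) -> float:
    """Average string similarity of artist and album, between 0 and 1."""
    a = SequenceMatcher(None, c.artist.lower(), artist.lower()).ratio()
    b = SequenceMatcher(None, c.album.lower(), album.lower()).ratio()
    return (a + b) / 2


def match_album(
    artist: str,
    album: str,
    providers: List[Provider],
    ask_user: Callable[[List[Candidate]], Optional[Candidate]],
    threshold: float = 0.9,  # placeholder for "strong match"
) -> Optional[Candidate]:
    # 1. Query every provider.
    candidates = [c for p in providers for c in p(artist, album)]
    if not candidates:
        return None
    # 2. Take the best result.
    ranked = sorted(candidates, key=lambda c: score(c, artist, album), reverse=True)
    if score(ranked[0], artist, album) >= threshold:
        return ranked[0]
    # 3. No strong match: let the user disambiguate.
    return ask_user(ranked)


def fake_provider(artist: str, album: str) -> List[Candidate]:
    """Stand-in for a real metadata source; returns canned data."""
    return [
        Candidate("fake", "Speedy J", "Public Energy No. 1"),
        Candidate("fake", "Speedy J", "G Spot"),
    ]


chosen = match_album("Speedy J", "G Spot.", [fake_provider], ask_user=lambda r: r[0])
print(chosen)
```

The real work is in the providers and in a better scoring function, which is the part beets already does well.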
