change: rework GPU features #810

qryxip · 2024-07-29T17:01:29Z

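Description

Implements option 2 from #804 (comment).

The Cargo features cuda and directml are removed and consolidated into the following two:

load-onnxruntime: all execution providers (EPs) are available
link-onnxruntime: no EP other than CPU is available

In addition, releases will no longer bundle ONNX Runtime; instead, the downloader gains the ability to download it from onnxruntime-builder. As a result, VOICEVOX CORE no longer has a "CUDA build" or a "DirectML build".

This PR also does #783. When acceleration_mode is Auto or Gpu, the GPUs are "tested"; with Auto, if every test fails, it falls back to the CPU.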

[INFO]  acceleration_mode=<AccelerationMode.AUTO: 'AUTO'>
[INFO]  GPUをテストします:
[INFO]    * CUDA (device_id=0): OK
[INFO]    * DirectML (device_id=0): Not supported by the current loaded ONNX Runtime
[INFO]  CUDA (device_id=0)を利用します
[DEBUG] synthesizer.is_gpu_mode=True

[INFO]  acceleration_mode=<AccelerationMode.AUTO: 'AUTO'>
[INFO]  GPUをテストします:
[INFO]    * CUDA (device_id=0): /home/runner/work/onnxruntime-builder/onnxruntime-builder/onnxruntime/core/session/provider_bridge_ort.cc:1209 onnxruntime::Provider& onnxruntime::ProviderLibrary::Get() [ONNXRuntimeError] : 1 : FAIL : Failed to load library libvoicevox_onnxruntime_providers_cuda.so with error: libcudnn.so.8: cannot open shared object file: No such file or directory
[INFO]    * DirectML (device_id=0): Not supported by the current loaded ONNX Runtime
[INFO]  CPUを利用します
[DEBUG] synthesizer.is_gpu_mode=False

[INFO]  acceleration_mode=<AccelerationMode.GPU: 'GPU'>
[INFO]  GPUをテストします:
[INFO]    * CUDA (device_id=0): /home/runner/work/onnxruntime-builder/onnxruntime-builder/onnxruntime/core/session/provider_bridge_ort.cc:1209 onnxruntime::Provider& onnxruntime::ProviderLibrary::Get() [ONNXRuntimeError] : 1 : FAIL : Failed to load library libvoicevox_onnxruntime_providers_cuda.so with error: libcudnn.so.8: cannot open shared object file: No such file or directory
[INFO]    * DirectML (device_id=0): Not supported by the current loaded ONNX Runtime
Traceback (most recent call last):
  File "/home/ryo/src/github.com/VOICEVOX/voicevox_core/main/example/python/./run.py", line 117, in <module>
    main()
  File "/home/ryo/src/github.com/VOICEVOX/voicevox_core/main/example/python/./run.py", line 35, in main
    synthesizer = Synthesizer(
                  ^^^^^^^^^^^^
voicevox_core.GpuSupportError: GPU機能をサポートすることができません:
* CUDA (device_id=0): /home/runner/work/onnxruntime-builder/onnxruntime-builder/onnxruntime/core/session/provider_bridge_ort.cc:1209 onnxruntime::Provider& onnxruntime::ProviderLibrary::Get() [ONNXRuntimeError] : 1 : FAIL : Failed to load library libvoicevox_onnxruntime_providers_cuda.so with error: libcudnn.so.8: cannot open shared object file: No such file or directory
* DirectML (device_id=0): Not supported by the current loaded ONNX Runtime
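
The three runs above can be summarized as a small decision procedure. The following is only a minimal sketch of that behavior as described in this PR: every type and function name here (AccelerationMode, Gpu, Device, GpuSupportError, select_device) is a hypothetical stand-in, and the real voicevox_core implementation may be structured differently.

```rust
// Hypothetical stand-ins for illustration; not the actual voicevox_core types.
#[derive(Clone, Copy, Debug, PartialEq)]
enum AccelerationMode {
    Auto,
    Cpu,
    Gpu,
}

#[derive(Clone, Copy, Debug, PartialEq)]
enum Gpu {
    Cuda,
    DirectMl,
}

#[derive(Debug, PartialEq)]
enum Device {
    Cpu,
    Gpu(Gpu),
}

/// Failure reasons, one entry per GPU that was tested and rejected.
#[derive(Debug, PartialEq)]
struct GpuSupportError(Vec<(Gpu, String)>);

fn select_device(
    mode: AccelerationMode,
    candidates: &[Gpu],
    test_gpu: impl Fn(Gpu) -> Result<(), String>,
) -> Result<Device, GpuSupportError> {
    if mode == AccelerationMode::Cpu {
        return Ok(Device::Cpu);
    }
    // Test every candidate first, since the logs list a result for each one.
    let results: Vec<(Gpu, Result<(), String>)> = candidates
        .iter()
        .map(|&gpu| (gpu, test_gpu(gpu)))
        .collect();
    // Use the first candidate whose test succeeded.
    if let Some((gpu, _)) = results.iter().find(|(_, result)| result.is_ok()) {
        return Ok(Device::Gpu(*gpu));
    }
    let failures = results
        .into_iter()
        .filter_map(|(gpu, result)| result.err().map(|reason| (gpu, reason)))
        .collect();
    match mode {
        // Auto silently falls back to the CPU when every test failed.
        AccelerationMode::Auto => Ok(Device::Cpu),
        // Gpu reports all collected failure reasons instead.
        _ => Err(GpuSupportError(failures)),
    }
}
```

Passing the test as a closure mirrors the `|gpu| onnxruntime.test_gpu(gpu)` argument visible in the synthesizer.rs diff and lets the selection logic be exercised without a real GPU.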

Other

Hiroshiba

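For now I only looked at the downloader part.
I think parsing the release notes is a perfectly fine approach!
It was less complicated than I expected, though the lock change may be large.

That said, there was just one spot where I thought a different approach would be better HTML-wise, so I left a comment.
It's not a strong opinion at all. Leaving it as is seems fine too.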

crates/downloader/src/main.rs

qryxip · 2024-08-02T19:09:12Z

317af0f (#810): Resolved one TODO comment.
6a29b20 (#810): Made the output shown when "testing" GPUs under acceleration_mode={Auto,Gpu} Japanese.

crates/downloader/src/main.rs

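* add: place a machine-readable `<table>` in the release notes
* Consolidate `<table>` construction into `build-spec-table` <#43 (comment)>
* Write the information in `data-*` attributes <VOICEVOX/voicevox_core#810 (comment)>
* fixup! Write the information in `data-*` attributes
* Standardize on "ｱｰｷﾃｸﾁｬ" <#43 (comment)>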

Hiroshiba

I left comments on a few points that caught my attention 🙇

Hiroshiba · 2024-08-03T18:03:57Z

crates/voicevox_core/src/error.rs

+    #[error("GPU機能をサポートすることができません:\n{_0}")]
+    GpuSupport(DeviceAvailabilities),


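So the error text gets injected from outside, so to speak.
I don't know whether this is a common pattern, but since there's a type it can be traced, so it looks good.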
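
To illustrate the pattern being discussed, here is a minimal sketch of an error variant whose message is completed by the `Display` output of its payload. It is written by hand with the standard library rather than with the `#[error(...)]` derive shown in the diff. The enum name `SketchError`, the tuple layout of `DeviceAvailabilities`, and the English message text are all assumptions made for this example; only the variant name and the payload type name come from the diff.

```rust
use std::error::Error;
use std::fmt;

// Hypothetical simplification of a per-device test report.
#[derive(Debug)]
struct DeviceAvailabilities(Vec<(String, Result<(), String>)>);

impl fmt::Display for DeviceAvailabilities {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // One bullet per device, matching the shape of the log output.
        let lines: Vec<String> = self
            .0
            .iter()
            .map(|(device, result)| match result {
                Ok(()) => format!("* {device}: OK"),
                Err(reason) => format!("* {device}: {reason}"),
            })
            .collect();
        f.write_str(&lines.join("\n"))
    }
}

// Hypothetical enum name; the diff only shows the `GpuSupport` variant.
#[derive(Debug)]
enum SketchError {
    GpuSupport(DeviceAvailabilities),
}

impl fmt::Display for SketchError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            // Hand-written counterpart of a `"...:\n{_0}"` format string.
            // The real message is in Japanese; this text is illustrative.
            Self::GpuSupport(report) => {
                write!(f, "GPU support is unavailable:\n{report}")
            }
        }
    }
}

impl Error for SketchError {}
```

Because the report is a typed value rather than a preformatted string, callers can still inspect individual device results before the error is rendered.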

crates/voicevox_core/src/infer.rs

crates/voicevox_core/src/infer/runtimes/onnxruntime.rs

crates/voicevox_core/src/lib.rs

Hiroshiba · 2024-08-03T18:28:45Z

crates/voicevox_core/src/synthesizer.rs

+                    GpuSpec::defaults(),
+                    crate::blocking::Onnxruntime::DISPLAY_NAME,
+                    onnxruntime.supported_devices()?,
+                    |gpu| onnxruntime.test_gpu(gpu),


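Does this step take a fair amount of time, I wonder?
It feels like it could take quite a while.

Well, even if it does take a while, what I care about is the engine's startup time, so if we can work around it by, say, deferring creation of the synthesizer, it's probably fine.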
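
One way to picture "deferring creation of the synthesizer" is lazy initialization on first use. The sketch below uses `std::sync::OnceLock` from the standard library; `Engine`, its `builds` counter, and this toy `Synthesizer` are hypothetical stand-ins and do not reflect how the VOICEVOX engine is actually organized.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

// Hypothetical stand-in for a synthesizer that is expensive to construct.
struct Synthesizer {
    device: &'static str,
}

impl Synthesizer {
    fn new() -> Self {
        // A real constructor would test GPUs here; this one does nothing slow.
        Synthesizer { device: "CPU" }
    }
}

// Hypothetical engine that starts without building the synthesizer.
struct Engine {
    synthesizer: OnceLock<Synthesizer>,
    builds: AtomicUsize, // Counts constructions, for demonstration only.
}

impl Engine {
    fn start() -> Self {
        Engine {
            synthesizer: OnceLock::new(),
            builds: AtomicUsize::new(0),
        }
    }

    /// Builds the synthesizer on first access and reuses it afterwards.
    fn synthesizer(&self) -> &Synthesizer {
        self.synthesizer.get_or_init(|| {
            self.builds.fetch_add(1, Ordering::SeqCst);
            Synthesizer::new()
        })
    }
}
```

With this shape, startup pays nothing for GPU testing; the cost moves to the first request that needs the synthesizer.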

I assumed it would be instantaneous, but hmm, I'm not sure. I haven't measured yet, but it might take a few hundred ms.

I'll try measuring it with DirectML later.

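Thanks for measuring 🙇

My intuition is that it wouldn't be surprising for the first launch to take a fair amount of time.
Device initialization may be involved too, so a noticeable wait wouldn't be strange.

Well, if it's around 300 ms with DirectML, that may be no problem at all!
CUDA might take longer, but in terms of user count that's probably a much smaller group.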

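On Linux with CUDA it took roughly 200 ms to 350 ms on success. Failing because the specified device_id exceeds the number of NVIDIA GPUs takes about the same time, while removing CUDA itself from the system makes it fail in 0.5 ms.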
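
Measurements like these can be taken with a small wall-clock wrapper. The following is a minimal sketch using `std::time::Instant`; `timed` and `fake_test_gpu` are hypothetical helpers, and the 20 ms sleep merely simulates initialization cost rather than reproducing the figures above.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Runs `probe` once and returns its result with the elapsed wall-clock time.
fn timed<T>(probe: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let value = probe();
    (value, start.elapsed())
}

// Hypothetical probe: pretends only device 0 exists. It sleeps before
// answering either way, so success and failure cost about the same.
fn fake_test_gpu(device_id: u32) -> Result<(), String> {
    thread::sleep(Duration::from_millis(20));
    if device_id == 0 {
        Ok(())
    } else {
        Err(format!("no such device: {device_id}"))
    }
}
```

Calling `timed(|| fake_test_gpu(0))` and `timed(|| fake_test_gpu(9))` yields one successful and one failed probe, each paired with its own duration.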

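I do plan to try it later on two Windows machines as well (one with a GeForce, the other with a Radeon). I can't rule out that it takes seconds on Windows… though Windows Update comes first.

It does seem to be a good deal slower than "instantaneous", so we'll need to put a bit more thought into the async runtime and the display. Also, SessionOptions needs to be cached inside Synthesizer… or so I thought, but that requires adding Send on the ort side. A bit of a hassle.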

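Ooh, I see, thank you!!!
Right, so it can take time even on failure…

SessionOptions needs to be cached inside Synthesizer

I don't fully understand this, but I thought caching the parameters needed to create SessionOptions could also be an option.
Being able to cache SessionOptions itself may be quicker, but I'm just throwing out the idea 🙇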

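I don't fully understand this, but I thought caching the parameters needed to create SessionOptions could also be an option.

That's probably what we have now. Currently the only thing that varies is IntraOpNumThreads, so as I recall we only hold cpu_num_threads, which corresponds to it, and whether an EP is present.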
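
As a rough picture of "caching the parameters rather than the options object", consider the sketch below. It is not the voicevox_core code: `SessionParams`, `FakeSessionOptions`, and `build_options` are hypothetical names, `FakeSessionOptions` is a plain struct rather than an ONNX Runtime type, and representing the EP choice as `Option<Gpu>` is an assumption.

```rust
// Hypothetical names throughout; the real fields in voicevox_core may differ.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Gpu {
    Cuda,
    DirectMl,
}

/// The small, cheap-to-copy inputs kept instead of a built options object.
#[derive(Clone, Copy, Debug, PartialEq)]
struct SessionParams {
    cpu_num_threads: u16,
    gpu: Option<Gpu>,
}

/// Stand-in for a runtime's session options (not an ONNX Runtime type).
#[derive(Debug, PartialEq)]
struct FakeSessionOptions {
    intra_op_num_threads: u16,
    execution_providers: Vec<&'static str>,
}

impl SessionParams {
    /// Rebuilds fresh options from the cached parameters for each session.
    fn build_options(&self) -> FakeSessionOptions {
        let execution_providers = match self.gpu {
            Some(Gpu::Cuda) => vec!["CUDA"],
            Some(Gpu::DirectMl) => vec!["DirectML"],
            None => vec![],
        };
        FakeSessionOptions {
            intra_op_num_threads: self.cpu_num_threads,
            execution_providers,
        }
    }
}
```

Since the parameters are `Copy`, they can be shared freely across threads or instances, which sidesteps the `Send` concern raised for the options object itself.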

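The reason I said I wanted to cache SessionOptions itself is that EPs are registered on a SessionOptions, so I thought reusing it would improve the performance of creating each decode session and would be a better design as well.

…or so I thought, but apparently ONNX Runtime absorbs the performance cost nicely? I don't know how it behaves on Windows or macOS, but since I'm not sure whether SessionOptions may be reused in the first place, keeping things as they are seems fine.
(It still seems to take around 0.15 ms, but that's within an acceptable range.)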

[crates/voicevox_core/src/infer/runtimes/onnxruntime.rs:55:9] gpu = Cuda
[crates/voicevox_core/src/infer/runtimes/onnxruntime.rs:66:9] end - start = 312.693262ms
[crates/voicevox_core/src/infer/runtimes/onnxruntime.rs:55:9] gpu = Cuda
[crates/voicevox_core/src/infer/runtimes/onnxruntime.rs:66:9] end - start = 153.807µs

That aside, if it takes from 300 ms up to (at worst) several seconds, it might be better to make Synthesizer::new async.

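Ah, I see!!
If it's fast without doing anything, there's a case for leaving it as is.
Besides, if we did reuse it, I can imagine wanting to pass it across Synthesizers (to other instances as well), which would make things even more complicated, so it would be nice if we can get by without reusing it 😇

it might be better to make Synthesizer::new async.

Maybe so, but if that looks like it will take time to do, I feel postponing it wouldn't be much of a problem!
What matters for VOICEVOX is the engine's startup speed, but I think this part probably takes under a second even at worst, and there seem to be other, bigger bottlenecks, so parallelizing here wouldn't change much.
As a library too, I don't think there are many situations where startup speed matters that much.

That said, being able to parallelize it might indeed be better.
(Though I'm about 100 times more eager for the release…!)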

Hiroshiba

LGTM！！！

Hiroshiba · 2024-08-04T15:16:21Z

crates/voicevox_core/src/lib.rs

+//! - **`load-onnxruntime`**: ONNX Runtimeを`dlopen`/`LoadLibraryExW`で
+//!     開きます。[CUDA]と[DirectML]が利用できます。


Strictly speaking this may be slightly misleading, since downloading this alone isn't enough and a provider is also required, but well…

Hiroshiba · 2024-08-04T15:17:59Z

So the next step is to aim for building onnxruntime 1.18.1, I take it!

#810 (comment)

qryxip · 2024-08-07T15:23:16Z

@takejohn FYI!

The Cargo features cuda and directml are gone.
The wording of the SupportedDevices documentation has changed.

change: rework GPU features

790499d

This was referenced Jul 29, 2024

add: place a machine-readable <table> in the release notes VOICEVOX/onnxruntime-builder#43

Merged

add: add onnxruntime-win-x64-gpu-cuda VOICEVOX/onnxruntime-builder#44

Merged

Hiroshiba reviewed Jul 29, 2024

crates/downloader/src/main.rs (resolved)

crates/downloader/src/main.rs (outdated, resolved)

Add a comment about body

55878c5

qryxip added a commit to qryxip/onnxruntime-builder that referenced this pull request Aug 1, 2024

Write the information in data-* attributes

0c25909

<VOICEVOX/voicevox_core#810 (comment)>

qryxip added 4 commits August 2, 2024 23:40

Consolidate <table> construction into build-spec-table

a798171

fixup! Consolidate <table> construction into build-spec-table

d4fca5e

Test the exhaustiveness of GpuSpec::defaults

317af0f

Make the DeviceAvailabilities display Japanese

6a29b20

qryxip commented Aug 2, 2024

crates/downloader/src/main.rs (outdated, resolved)

Point the default at VOICEVOX/onnxruntime-builder

6f569a3

qryxip marked this pull request as ready for review August 3, 2024 07:20

qryxip requested a review from Hiroshiba August 3, 2024 07:20

Merge branch 'main' into change/rework-gpu-features

d066aa2

Hiroshiba reviewed Aug 3, 2024

qryxip added 2 commits August 4, 2024 19:31

Use the if cfg!(…) form

96ad3cb

Mention that CUDA and DirectML are available with load-onnxruntime

669625a

qryxip requested a review from Hiroshiba August 4, 2024 10:45

Hiroshiba approved these changes Aug 4, 2024

Hiroshiba merged commit bf8cdb8 into VOICEVOX:main Aug 4, 2024
37 checks passed

qryxip linked an issue Sep 5, 2024 that may be closed by this pull request

Allow the downloader to download just onnxruntime.dll and the like? #698

Closed
