Unify the VoiceModel implementations #746

Closed
3 tasks done
qryxip opened this issue Feb 12, 2024 · 5 comments · Fixed by #830

@qryxip
Member

qryxip commented Feb 12, 2024

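Description

Currently, blocking::VoiceModel is implemented with the zip crate and tokio::VoiceModel with the async_zip crate, as two separate implementations. Make blocking::VoiceModel use async_zip as well, so that the implementation is unified.

Pros

Eliminates the scattered, duplicated implementations

Cons

Implementation approach

Rely on the fact that async_zip works without tokio. Since Rust 1.75 allows -> impl Trait in trait definitions, use that to build a suitable abstraction so that blocking::VoiceModel is driven by async_zip::base and tokio::VoiceModel by async_zip::tokio (as of async_zip v0.0.16).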
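
A minimal sketch of the idea, under stated assumptions: `EntrySource`, `InMemoryArchive`, `load_manifest`, and the hand-written `block_on` are hypothetical names made up for illustration, not the actual voicevox_core or async_zip API. The in-memory map stands in for an archive reader so the example needs only the standard library. It shows one async implementation written against a trait that uses `-> impl Future`, plus a blocking facade that drives it without tokio.

```rust
use std::collections::HashMap;
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Hypothetical abstraction (not the real voicevox_core trait): a source that
/// yields the bytes of a named archive entry asynchronously.
/// `-> impl Future` in a trait definition requires Rust 1.75 or later.
trait EntrySource {
    fn read_entry(&self, name: &str) -> impl Future<Output = Option<Vec<u8>>>;
}

/// Stand-in for an archive reader; a real one would wrap an async_zip reader.
struct InMemoryArchive(HashMap<String, Vec<u8>>);

impl EntrySource for InMemoryArchive {
    fn read_entry(&self, name: &str) -> impl Future<Output = Option<Vec<u8>>> {
        let found = self.0.get(name).cloned();
        async move { found }
    }
}

/// The single shared implementation, written once against the trait.
async fn load_manifest(src: &impl EntrySource) -> Option<String> {
    let bytes = src.read_entry("manifest.json").await?;
    String::from_utf8(bytes).ok()
}

/// Waker that unparks the thread blocked in `block_on`.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Minimal executor: polls the future on the current thread until it is ready.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(out) => return out,
            Poll::Pending => thread::park(),
        }
    }
}

/// Blocking facade: the same logic, driven without any async runtime.
fn load_manifest_blocking(src: &impl EntrySource) -> Option<String> {
    block_on(load_manifest(src))
}

/// Builds a one-entry archive for demonstration.
fn sample_archive() -> InMemoryArchive {
    let mut entries = HashMap::new();
    entries.insert("manifest.json".to_owned(), b"{}".to_vec());
    InMemoryArchive(entries)
}
```

With this shape, a tokio-facing API would simply `.await` `load_manifest` with a tokio-backed source, while the blocking API calls `load_manifest_blocking`. In practice a crate such as futures-lite or pollster (both used in the benchmarks in this thread) could replace the hand-written `block_on`.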

VOICEVOX version

N/A

OS type / distribution / version

  • Windows
  • macOS
  • Linux

Other

@qryxip
Member Author

qryxip commented Mar 17, 2024

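To find out whether unifying on async_zip is really acceptable performance-wise, I benchmarked the mem variant for now (read the whole ZIP file into memory first, then decompress). It does look slightly slower than zip, but I think the difference is negligible in practice.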

[Two images: benchmark results]

Code (Cargo.toml, then the benchmark source)
[package]
name = "bench-async-zip"
edition = "2021"
publish = false

[[bench]]
name = "bench"
harness = false

[dependencies]
async_zip = { version = "0.0.16", features = ["deflate"] }
criterion = { version = "0.5.1", features = ["html_reports"] }
futures-lite = "2.2.0"
zip = "0.6.6"

use std::{
    io::{self, Cursor},
    time::Duration,
};

use criterion::{criterion_group, criterion_main, Criterion};

criterion_main!(benches);

criterion_group! {
    name = benches;
    config = Criterion::default()
        .warm_up_time(Duration::from_secs(10))
        .measurement_time(Duration::from_secs(20));
    targets = bench
}

fn bench(criterion: &mut Criterion) {
    criterion
        .bench_function("zip-v0.6.6", |bencher| bencher.iter(bench_zip))
        .bench_function("async_zip-v0.0.16", |bencher| bencher.iter(bench_async_zip));
}

fn bench_zip() -> impl Sized {
    use zip::ZipArchive;

    let mut zip = ZipArchive::new(Cursor::new(SAMPLE_VVM.to_owned())).unwrap();
    let mut entry = zip.by_name("decode.onnx").unwrap();
    let mut buf = Vec::with_capacity(entry.size() as _);
    io::copy(&mut entry, &mut buf).unwrap();
    buf
}

fn bench_async_zip() -> impl Sized {
    futures_lite::future::block_on(async {
        let zip = async_zip::base::read::mem::ZipFileReader::new(SAMPLE_VVM.to_owned())
            .await
            .unwrap();
        let (idx, _) = zip
            .file()
            .entries()
            .iter()
            .enumerate()
            .find(|(_, e)| e.filename().as_str().unwrap() == "decode.onnx")
            .unwrap();
        let mut rdr = zip.reader_with_entry(idx).await.unwrap();
        let mut buf = Vec::with_capacity(rdr.entry().uncompressed_size() as _);
        rdr.read_to_end_checked(&mut buf).await.unwrap();
        buf
    })
}

static SAMPLE_VVM: &[u8] = include_bytes!("../sample.vvm");

@Hiroshiba
Member

> It does look slightly slower than zip, but I think the difference is negligible in practice.

I agree!!

@qryxip
Member Author

qryxip commented Sep 7, 2024

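I ran an investigation that also covers #828, so I am leaving notes here for myself. I will skip the details of the investigation.

Before the investigation I was leaning toward dropping async_zip in favor of zip, but async_zip looks fine. In fact, it turned out to be the better choice for speed as well.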

Investigation (Cargo.toml, then the benchmark source)
[package]
name = "bench-async-zip"
edition = "2021"
publish = false

[[bench]]
name = "bench"
harness = false

[dependencies]
async-fs = "2.1.2"
async_zip = { version = "=0.0.17", features = ["deflate", "tokio-fs"] }
#async_zip = { git = "https://github.com/Majored/rs-async-zip.git", rev = "2d841d600c6d509cb4ecc611001ec339876ca6c9", features = ["deflate", "tokio-fs"] }
criterion = { version = "0.5.1", features = ["html_reports"] }
futures-lite = "2.3.0"
futures-util = { version = "0.3.30", features = ["io"] }
pollster = { version = "0.3.0", features = ["macro"] }
tokio = { version = "1.40.0", features = ["macros", "rt-multi-thread"] }
zip = "2.2.0"

use std::{
    io::{self, Cursor},
    sync::Arc,
};

use criterion::{criterion_group, criterion_main, Criterion};
use futures_lite::AsyncReadExt as _;
use tokio::task::JoinSet;

criterion_main!(benches);

criterion_group! {
    name = benches;
    //config = Criterion::default()
    //    .warm_up_time(Duration::from_secs(10))
    //    .measurement_time(Duration::from_secs(20));
    config = Criterion::default();
    targets = bench
}

fn bench(criterion: &mut Criterion) {
    // Read the file once before measuring (presumably to warm the OS file cache).
    std::fs::read("./sample.vvm").unwrap();
    criterion
        .bench_function("zip-v2.2.0-mem-jsons", |bencher| {
            bencher.iter(bench_zip_mem_jsons)
        })
        .bench_function("zip-v2.2.0-mem-onnxs", |bencher| {
            bencher.iter(bench_zip_mem_onnxs)
        })
        .bench_function("zip-v2.2.0-seek-jsons", |bencher| {
            bencher.iter(bench_zip_seek_jsons)
        })
        .bench_function("zip-v2.2.0-seek-onnxs", |bencher| {
            bencher.iter(bench_zip_seek_onnxs)
        })
        .bench_function("async_zip-v0.0.17-mem-jsons", |bencher| {
            bencher.iter(bench_async_zip_mem_jsons)
        })
        .bench_function("async_zip-v0.0.17-mem-onnxs", |bencher| {
            bencher.iter(bench_async_zip_mem_rest_file)
        })
        .bench_function("async_zip-v0.0.17-without-tokio-seek-jsons", |bencher| {
            bencher.iter(bench_async_zip_without_tokio_seek_jsons)
        })
        .bench_function("async_zip-v0.0.17-without-tokio-seek-onnxs", |bencher| {
            bencher.iter(bench_async_zip_without_tokio_seek_onnxs)
        })
        .bench_function("async_zip-v0.0.17-with-tokio-seek-jsons", |bencher| {
            bencher.iter(bench_async_zip_with_tokio_seek_jsons)
        });
}

fn bench_zip_mem_jsons() -> impl Sized {
    bench_zip_mem(&["manifest.json", "metas.json"])
}

fn bench_zip_mem_onnxs() -> impl Sized {
    bench_zip_mem(&[
        "predict_duration.onnx",
        "predict_intonation.onnx",
        "decode.onnx",
    ])
}

#[tokio::main(flavor = "multi_thread", worker_threads = 10)]
async fn bench_zip_mem(filenames: &[&'static str]) -> impl Sized {
    let zip = Arc::new(std::fs::read("./sample.vvm").unwrap());
    let mut tasks = JoinSet::new();
    for &filename in filenames {
        let zip = zip.clone();
        tasks.spawn_blocking(move || {
            let mut zip = zip::ZipArchive::new(Cursor::new(&**zip)).unwrap();
            let mut entry = zip.by_name(filename).unwrap();
            let mut buf = Vec::with_capacity(entry.size() as _);
            io::copy(&mut entry, &mut buf).unwrap();
            buf
        });
    }
    tasks.join_all().await
}

fn bench_zip_seek_jsons() -> impl Sized {
    bench_zip_seek(&["manifest.json", "metas.json"])
}

fn bench_zip_seek_onnxs() -> impl Sized {
    bench_zip_seek(&[
        "predict_duration.onnx",
        "predict_intonation.onnx",
        "decode.onnx",
    ])
}

fn bench_zip_seek(filenames: &[&str]) -> impl Sized {
    let zip = std::fs::File::open("./sample.vvm").unwrap();
    let mut zip = zip::ZipArchive::new(zip).unwrap();
    filenames
        .iter()
        .map(|filename| {
            let mut entry = zip.by_name(filename).unwrap();
            let mut buf = Vec::with_capacity(entry.size() as _);
            io::copy(&mut entry, &mut buf).unwrap();
            buf
        })
        .collect::<Vec<_>>()
}

fn bench_async_zip_mem_jsons() -> impl Sized {
    bench_async_zip_mem(&["manifest.json", "metas.json"])
}

fn bench_async_zip_mem_rest_file() -> impl Sized {
    bench_async_zip_mem(&[
        "predict_duration.onnx",
        "predict_intonation.onnx",
        "decode.onnx",
    ])
}

#[tokio::main(flavor = "multi_thread", worker_threads = 10)]
async fn bench_async_zip_mem(filenames: &[&'static str]) -> impl Sized {
    let zip = tokio::fs::read("./sample.vvm").await.unwrap();
    let zip = async_zip::base::read::mem::ZipFileReader::new(zip)
        .await
        .unwrap();
    let zip = Arc::new(zip);
    let mut tasks = JoinSet::new();
    for &filename in filenames {
        let zip = zip.clone();
        tasks.spawn(async move {
            let (idx, _) = zip
                .file()
                .entries()
                .iter()
                .enumerate()
                .find(|(_, e)| e.filename().as_str().unwrap() == filename)
                .unwrap();
            let mut rdr = zip.reader_with_entry(idx).await.unwrap();
            let mut buf = Vec::with_capacity(rdr.entry().uncompressed_size() as _);
            rdr.read_to_end_checked(&mut buf).await.unwrap();
            buf
        });
    }
    tasks.join_all().await
}

fn bench_async_zip_without_tokio_seek_jsons() -> impl Sized {
    bench_async_zip_without_tokio_seek(&["manifest.json", "metas.json"])
}

fn bench_async_zip_without_tokio_seek_onnxs() -> impl Sized {
    bench_async_zip_without_tokio_seek(&[
        "predict_duration.onnx",
        "predict_intonation.onnx",
        "decode.onnx",
    ])
}

#[pollster::main]
async fn bench_async_zip_without_tokio_seek(filenames: &[&str]) -> impl Sized {
    let zip = futures_util::io::BufReader::new(async_fs::File::open("./sample.vvm").await.unwrap());
    let mut zip = async_zip::base::read::seek::ZipFileReader::new(zip)
        .await
        .unwrap();
    let mut ret = vec![];
    for filename in filenames {
        let (idx, _) = zip
            .file()
            .entries()
            .iter()
            .enumerate()
            .find(|(_, e)| e.filename().as_str().ok() == Some(filename))
            .unwrap();
        let mut entry = zip.reader_without_entry(idx).await.unwrap();
        let mut buf = vec![];
        entry.read_to_end(&mut buf).await.unwrap();
        ret.push(buf);
    }
    ret
}

#[tokio::main(flavor = "current_thread")]
async fn bench_async_zip_with_tokio_seek_jsons() -> impl Sized {
    let zip = async_zip::tokio::read::fs::ZipFileReader::new("./sample.vvm")
        .await
        .unwrap();
    let zip = Arc::new(zip);
    let mut tasks = JoinSet::new();
    for filename in ["manifest.json", "metas.json"] {
        let zip = zip.clone();
        tasks.spawn(async move {
            let (idx, _) = zip
                .file()
                .entries()
                .iter()
                .enumerate()
                .find(|(_, e)| e.filename().as_str().ok() == Some(filename))
                .unwrap();
            let mut entry = zip.reader_without_entry(idx).await.unwrap();
            let mut buf = vec![];
            entry.read_to_end(&mut buf).await.unwrap();
            buf
        });
    }
    tasks.join_all().await
}

[Image: benchmark results]

@qryxip
Member Author

qryxip commented Sep 8, 2024

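Using async-fs made me realize that the parts other than VoiceModel (Synthesizer and so on) could probably also drop tokio, or rather become "runtime-less". Synchronous code would then be able to simply block on the async API with whatever executor it likes.

Currently (as of #830), everything other than VoiceModel builds its async API by wrapping the sync API, but it may be worth inverting that, as was done for VoiceModel. It would make things such as suspension points in the async API easier to design, and introducing #687 would likely go more smoothly too.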
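
As a rough illustration of why an async-first design makes suspension points easier to reason about: a future only makes progress while it is polled, and dropping it between polls stops the remaining work. The sketch below uses only the standard library. `synthesize`, `YieldNow`, and `run_then_cancel` are hypothetical names invented for this example and are not part of voicevox_core.

```rust
use std::future::Future;
use std::pin::{pin, Pin};
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Future that returns `Pending` once, creating an explicit suspension point.
struct YieldNow(bool);

impl Future for YieldNow {
    type Output = ();

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            return Poll::Ready(());
        }
        self.0 = true;
        cx.waker().wake_by_ref();
        Poll::Pending
    }
}

/// Hypothetical long-running job: processes `chunks` units of work and yields
/// after each one, so a caller may stop it in between.
async fn synthesize(chunks: usize, done: Arc<AtomicUsize>) {
    for _ in 0..chunks {
        done.fetch_add(1, Ordering::SeqCst); // stand-in for real work
        YieldNow(false).await;
    }
}

/// Waker that does nothing; sufficient when polling by hand.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Polls the job at most `polls` times, then drops it, which cancels the rest.
/// Returns how many chunks were processed before the job was dropped.
fn run_then_cancel(chunks: usize, polls: usize) -> usize {
    let done = Arc::new(AtomicUsize::new(0));
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    {
        let mut job = pin!(synthesize(chunks, done.clone()));
        for _ in 0..polls {
            if job.as_mut().poll(&mut cx).is_ready() {
                break;
            }
        }
    } // `job` is dropped here; no further chunks run
    done.load(Ordering::SeqCst)
}
```

Each `.await` on `YieldNow` is a point where the caller regains control, so the placement of those points decides how promptly a cancellation takes effect. A sync API wrapped into an async one has no such points inside the wrapped call.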

@Hiroshiba
Member

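It would be great if introducing cancellation became smoother!!

I am a little concerned about how much more complex the code would get.
If it really just means making the async side the primary implementation and having the sync version call .block_on(), the code would not get any harder to follow, so that sounds good!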
