Enhance README for Improved User Experience and Clarity #53

Merged (1 commit, Oct 4, 2024)
75 changes: 44 additions & 31 deletions README.md
<a href="https://www.python.org/"><img alt="Python" src="https://img.shields.io/badge/-Python 3.8+-blue?style=for-the-badge&logo=python&logoColor=white"></a>
<a href="https://black.readthedocs.io/en/stable/"><img alt="Code style: black" src="https://img.shields.io/badge/code%20style-black-black.svg?style=for-the-badge&labelColor=gray"></a>

This repo contains the inference code for **AudioSeal**, a method for localized speech watermarking with state-of-the-art robustness and detector speed.

To learn more, check out our [paper](https://arxiv.org/abs/2401.17264).

# :rocket: Quick Links:

[[`arXiv`](https://arxiv.org/abs/2401.17264)]
[[🤗`Hugging Face`](https://huggingface.co/facebook/audioseal)]
[[`Colab Notebook`](https://colab.research.google.com/github/facebookresearch/audioseal/blob/master/examples/colab.ipynb)]
[[`Webpage`](https://pierrefdz.github.io/publications/audioseal/)]
[[`Blog`](https://about.fb.com/news/2024/06/releasing-new-ai-research-models-to-accelerate-innovation-at-scale/)]
[[`Press`](https://www.technologyreview.com/2024/06/18/1094009/meta-has-created-a-way-to-watermark-ai-generated-speech/)]

![fig](https://github.com/facebookresearch/audioseal/assets/1453243/5d8cd96f-47b5-4c34-a3fa-7af386ed59f2)

# :sparkles: Key Updates:

- 2024-06-17: Training code is now available. Check the [instructions](./docs/TRAINING.md)!
- 2024-05-31: Our paper was accepted at ICML'24 :)
- 2024-04-02: We have updated our license to a full MIT license (including the license for the model weights)! You can now use AudioSeal in commercial applications too!
- 2024-02-29: AudioSeal 0.1.2 is out, with more bug fixes for resampled audio and updated notebooks.

# :book: Abstract

**AudioSeal** introduces a breakthrough in **proactive, localized watermarking** for speech. It jointly trains two components: a **generator** that embeds an imperceptible watermark into audio and a **detector** that identifies watermark fragments in long or edited audio files.

**Key features:**

- **Localized watermarking** at the sample level (1/16,000 of a second).
- Minimal impact on audio quality.
- **Robust** against common audio edits such as compression, re-encoding, and noise addition.
- **Fast, single-pass detection**, up to **two orders of magnitude faster** than existing models, making it ideal for large-scale and real-time applications.
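To make the sample-level resolution concrete, here is the plain arithmetic behind that claim (this is just a back-of-the-envelope check, not anything from the AudioSeal API):

```python
# At a 16 kHz sample rate, each sample spans 1/16000 of a second
sample_rate = 16_000
resolution_us = 1 / sample_rate * 1e6  # microseconds per sample
print(resolution_us)  # 62.5
```

So a positive detection can be localized to within roughly 62.5 microseconds of audio.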


# :gear: Installation

### Requirements:
- Python >= 3.8
- PyTorch >= 1.13.0
- [omegaconf](https://omegaconf.readthedocs.io/)
- [julius](https://pypi.org/project/julius/)
- [numpy](https://pypi.org/project/numpy/)

### Install from PyPI:
```
pip install audioseal
```
```

### Install from source (editable mode):

```
pip install -e .
```

You can find all the model checkpoints on the [Hugging Face Hub](https://huggingface.co/facebook/audioseal). We provide the checkpoints for the following models:

- [AudioSeal Generator](src/audioseal/cards/audioseal_wm_16bits.yaml):
Takes an audio signal (as a waveform) and outputs a watermark of the same size as the input, which can be added to the input to watermark it. Optionally, it can also take a secret 16-bit message to embed in the watermark.
- [AudioSeal Detector](src/audioseal/cards/audioseal_detector_16bits.yaml):
Takes an audio signal (as a waveform) and outputs the probability that the input contains a watermark at each sample (every 1/16k second). Optionally, it may also output the secret message encoded in the watermark.

Note that the message is optional and has no influence on the detection output. It may be used, for instance, to identify a model version (up to $2^{16} = 65536$ possible values).
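As a quick sanity check of that message space (simple arithmetic, independent of the models):

```python
# A 16-bit payload can encode 2**16 distinct values
n_messages = 2 ** 16
print(n_messages)  # 65536
```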

# :abacus: Usage

Here’s a quick example of how you can use AudioSeal’s API to embed and detect watermarks:

```python

from audioseal import AudioSeal

# model name corresponds to the model card in src/audioseal/cards
model = AudioSeal.load_generator("audioseal_wm_16bits")

# a torch tensor of shape (batch, channels, samples) and a sample rate
# It is important to process the audio to the same sample rate as the model
# expectes. In our case, we support 16khz audio
# expects. In our case, we support 16khz audio
wav, sr = ..., 16000

watermark = model.get_watermark(wav, sr)
# add the watermark to the input to obtain the watermarked audio
watermarked_audio = wav + watermark

detector = AudioSeal.load_detector("audioseal_detector_16bits")

# result has shape (batch, 2, samples); channel 1 holds the per-sample
# probability that the audio is watermarked. message is the decoded
# 16-bit payload
result, message = detector(watermarked_audio, sr)

print(result[:, 1, :])
print(message)
```
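The per-sample probabilities can be collapsed into a single clip-level decision. The sketch below is illustrative, not part of the AudioSeal API: `clip_level_decision`, the mean aggregation, and the 0.5 threshold are all assumptions, applied to a `(batch, 2, samples)` result tensor like the one printed above.

```python
import torch

def clip_level_decision(result: torch.Tensor, threshold: float = 0.5) -> torch.Tensor:
    """Aggregate per-sample watermark probabilities into one decision per clip.

    Assumes `result` has shape (batch, 2, samples), where channel 1 holds
    the probability that each sample is watermarked.
    """
    per_sample_prob = result[:, 1, :]         # (batch, samples)
    mean_prob = per_sample_prob.mean(dim=-1)  # (batch,)
    return mean_prob > threshold

# Toy input: clip 0 looks mostly watermarked, clip 1 does not
fake_result = torch.zeros(2, 2, 4)
fake_result[0, 1, :] = torch.tensor([0.9, 0.8, 0.95, 0.7])
print(clip_level_decision(fake_result))  # tensor([ True, False])
```

Depending on your application, you may prefer a stricter aggregation (e.g. the fraction of samples above a per-sample threshold) over a plain mean.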

# :rocket: Train your own watermarking model

Interested in training your own watermarking model? Check out our [training documentation](./docs/TRAINING.md) to get started.

# :wave: Want to contribute?

We welcome pull requests with improvements or suggestions.
If you wish to report an issue or propose an enhancement but are unsure how to implement it, feel free to create a GitHub issue.

# :bug: Troubleshooting

- If you encounter the error `ValueError: not enough values to unpack (expected 3, got 2)`, this is because we expect a batch of audio tensors as inputs. Add one
dummy batch dimension to your input (e.g. `wav.unsqueeze(0)`, see [example notebook for getting started](examples/Getting_started.ipynb)).
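As a minimal illustration of that fix, with a random tensor standing in for a real clip (shapes only; this is not AudioSeal-specific code):

```python
import torch

# A mono clip loaded as (channels, samples) -- missing the batch dimension
wav = torch.randn(1, 16000)

wav = wav.unsqueeze(0)  # now (batch, channels, samples) = (1, 1, 16000)
print(wav.shape)  # torch.Size([1, 1, 16000])
```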
- If you use torchaudio to handle your audio and encounter the error `Couldn't find appropriate backend to handle uri ...`, this is because newer versions of torchaudio do not handle the default backend well. Either downgrade torchaudio to `2.1.0` or earlier, or install `soundfile` as your audio backend.

# :page_with_curl: License

- The code in this repository is licensed under the MIT license as detailed in the [LICENSE file](LICENSE). This license permits reuse, modification, and distribution of the software, as long as the original license is included.

# :star2: Maintainers:
- [Tuan Tran](https://github.com/antoine-tran)
- [Hady Elsahar](https://github.com/hadyelsahar)
- [Pierre Fernandez](https://github.com/pierrefdz)
- [Robin San Roman](https://github.com/robinsrm)

# :scroll: Citation

If you find this repository useful, please consider giving it a star :star: and citing our work:

```
@article{sanroman2024proactive,
  title={Proactive Detection of Voice Cloning with Localized Watermarking},
  author={San Roman, Robin and Fernandez, Pierre and D{\'e}fossez, Alexandre and Furon, Teddy and Tran, Tuan and Elsahar, Hady},
  journal={ICML},
  year={2024}
}
```