You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
By documentation of torchaudio.load(), the expected behaviour for handling 24-bit WAV files is the following:
Since torch does not support int24 dtype, 24-bit signed PCM are converted to int32 tensors.
When calling torchaudio.load() in Windows, using PySoundFile, on a 24-bit WAV file, the rows 222 - 228:
with soundfile.SoundFile(filepath, "r") as file_:
if file_.format != "WAV" or normalize:
dtype = "float32"
elif file_.subtype not in _SUBTYPE2DTYPE:
raise ValueError(f"Unsupported subtype: {file_.subtype}")
else:
dtype = _SUBTYPE2DTYPE[file_.subtype]
and _SUBTYPE2DTYPE is: _SUBTYPE2DTYPE = { "PCM_S8": "int8", "PCM_U8": "uint8", "PCM_16": "int16", "PCM_32": "int32", "FLOAT": "float32", "DOUBLE": "float64", }
are raising the following error: ValueError: Unsupported subtype: PCM_24
Adding a simple "PCM_24:" "int32" to _SUBTYPE2DTYPE solves the problem.
Versions
PyTorch version: 2.3.1+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS: Microsoft Windows 10 Pro
GCC version: (MinGW.org GCC-6.3.0-1) 6.3.0
Clang version: Could not collect
CMake version: Could not collect
Libc version: N/A
Python version: 3.12.0 (tags/v3.12.0:0fb18b0, Oct 2 2023, 13:03:39) [MSC v.1935 64 bit (AMD64)] (64-bit runtime)
Python platform: Windows-10-10.0.19045-SP0
Is CUDA available: True
CUDA runtime version: 12.1.105
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA GeForce GTX 1050
Nvidia driver version: 555.99
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
I think this is by design, since 24 is not a power of 2.
I encountered the same error on linux, and it was solved by using the normalize=True parameter in the call of torchaudio.load() (this converts the audio file to Float32 format, its default value is True but I manually set it to False before).
🐛 Describe the bug
By documentation of
torchaudio.load()
, the expected behaviour for handling 24-bit WAV files is the following:When calling
torchaudio.load()
in Windows, using PySoundFile, on a 24-bit WAV file, the rows 222 - 228:and
_SUBTYPE2DTYPE
is:_SUBTYPE2DTYPE = { "PCM_S8": "int8", "PCM_U8": "uint8", "PCM_16": "int16", "PCM_32": "int32", "FLOAT": "float32", "DOUBLE": "float64", }
are raising the following error:
ValueError: Unsupported subtype: PCM_24
Adding a simple
"PCM_24:" "int32"
to_SUBTYPE2DTYPE
solves the problem.Versions
PyTorch version: 2.3.1+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS: Microsoft Windows 10 Pro
GCC version: (MinGW.org GCC-6.3.0-1) 6.3.0
Clang version: Could not collect
CMake version: Could not collect
Libc version: N/A
Python version: 3.12.0 (tags/v3.12.0:0fb18b0, Oct 2 2023, 13:03:39) [MSC v.1935 64 bit (AMD64)] (64-bit runtime)
Python platform: Windows-10-10.0.19045-SP0
Is CUDA available: True
CUDA runtime version: 12.1.105
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA GeForce GTX 1050
Nvidia driver version: 555.99
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Architecture=9
CurrentClockSpeed=2801
DeviceID=CPU0
Family=198
L2CacheSize=1024
L2CacheSpeed=
Manufacturer=GenuineIntel
MaxClockSpeed=2801
Name=Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz
ProcessorType=3
Revision=
Versions of relevant libraries:
[pip3] numpy==1.26.4
[pip3] torch==2.3.1+cu121
[pip3] torchaudio==2.3.1+cu121
[pip3] torchvision==0.18.1+cu121
[conda] Could not collect
The text was updated successfully, but these errors were encountered: