When loading the Common Voice dataset on Windows, the file train.tsv is read using the cp1252 encoding, which fails with a UnicodeDecodeError.
training_speech_dataset = torchaudio.datasets.COMMONVOICE(root=base_dataset_cache_directory)
---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
Cell In[49], line 1
----> 1 training_speech_dataset = torchaudio.datasets.COMMONVOICE(root=base_dataset_cache_directory)

File ~\Documents\GitHub\clarification\venv-pc\Lib\site-packages\torchaudio\datasets\commonvoice.py:55, in COMMONVOICE.__init__(self, root, tsv)
     53 walker = csv.reader(tsv_, delimiter="\t")
     54 self._header = next(walker)
---> 55 self._walker = list(walker)

File ~\AppData\Local\Programs\Python\Python311\Lib\encodings\cp1252.py:23, in IncrementalDecoder.decode(self, input, final)
     22 def decode(self, input, final=False):
---> 23     return codecs.charmap_decode(input,self.errors,decoding_table)[0]

UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 3155: character maps to <undefined>
Python 3.11
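The traceback suggests the TSV was opened without an explicit encoding, so Python fell back to the Windows locale default (cp1252), in which byte 0x9d is undefined. Below is a minimal sketch of reading such a file with `encoding="utf-8"` instead. It is not torchaudio's code: `read_tsv` and the sample row are hypothetical, chosen only to reproduce the same kind of failure (the UTF-8 encoding of the closing curly quote ends in byte 0x9d).

```python
import csv
import os
import tempfile


def read_tsv(path, encoding="utf-8"):
    """Hypothetical helper: read a tab-separated file with an explicit encoding."""
    with open(path, "r", encoding=encoding, newline="") as tsv_:
        walker = csv.reader(tsv_, delimiter="\t")
        header = next(walker)
        rows = list(walker)
    return header, rows


# Hypothetical sample row. U+201D is encoded in UTF-8 as E2 80 9D,
# and 0x9D has no mapping in cp1252.
sample = "client_id\tpath\tsentence\nabc\tclip.mp3\tShe said \u201chello\u201d\n"

tmp_dir = tempfile.mkdtemp()
tsv_path = os.path.join(tmp_dir, "train.tsv")
with open(tsv_path, "w", encoding="utf-8", newline="") as f:
    f.write(sample)

# Reading as UTF-8 works.
header, rows = read_tsv(tsv_path)

# Reading the same bytes as cp1252 raises the error from the report.
try:
    read_tsv(tsv_path, encoding="cp1252")
    cp1252_failed = False
except UnicodeDecodeError:
    cp1252_failed = True
```

As a workaround that does not require touching the library, enabling Python's UTF-8 mode (setting the environment variable `PYTHONUTF8=1` or running `python -X utf8`) changes the default text encoding to UTF-8 and may avoid the error. This has not been verified against this particular setup.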
You can try downloading it from Hugging Face:
https://huggingface.co/datasets/mozilla-foundation/common_voice_17_0