Rethinkdb-import impossible to use on windows #217

Open
ShadowJonathan opened this issue Jun 29, 2020 · 26 comments
Labels
bug Something isn't working not qualified The issue is not checked yet by the owners

Comments

@ShadowJonathan

Describe the bug
On Windows 10, with Python 3.8, rethinkdb-import simply refuses to function due to obscure python multiprocessing errors.

To Reproduce
Steps to reproduce the behavior:

  1. pip install rethinkdb from python 3.8 (on windows)
  2. rethinkdb import [options]

Expected behavior
Normal operation: the import starts and data is loaded.

System info

  • OS: Windows 10 (build 19041.329)
  • RethinkDB Version: 2.4.0~0buster (docker container)
  • RethinkDB Python adapter Version: 2.4.7

Additional context

PS D:\k8smig\docker\mongodb\_local> rethinkdb-import --file .\tumblr.posts.json --table tumblr.posts -c vanguard --force
Traceback (most recent call last):
  File "c:\python\3.8\lib\runpy.py", line 193, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "c:\python\3.8\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Python\3.8\Scripts\rethinkdb-import.exe\__main__.py", line 7, in <module>
  File "c:\python\3.8\lib\site-packages\rethinkdb\_import.py", line 1716, in main
    import_tables(options, sources)
  File "c:\python\3.8\lib\site-packages\rethinkdb\_import.py", line 1359, in import_tables
    progress_bar.start()
  File "c:\python\3.8\lib\multiprocessing\process.py", line 121, in start
    self._popen = self._Popen(self)
  File "c:\python\3.8\lib\multiprocessing\context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "c:\python\3.8\lib\multiprocessing\context.py", line 326, in _Popen
    return Popen(process_obj)
  File "c:\python\3.8\lib\multiprocessing\popen_spawn_win32.py", line 93, in __init__
    reduction.dump(process_obj, to_child)
  File "c:\python\3.8\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
TypeError: cannot pickle '_thread._local' object
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "c:\python\3.8\lib\multiprocessing\spawn.py", line 107, in spawn_main
    new_handle = reduction.duplicate(pipe_handle,
  File "c:\python\3.8\lib\multiprocessing\reduction.py", line 79, in duplicate
    return _winapi.DuplicateHandle(
PermissionError: [WinError 5] Access is denied
@ShadowJonathan ShadowJonathan added bug Something isn't working not qualified The issue is not checked yet by the owners labels Jun 29, 2020
@dpineiden
Contributor

Can you check whether a simple script using multiprocessing works fine? I guess on Windows the multiprocessing start method matters here (Windows uses 'spawn'); check the module documentation.
Your error comes from passing an object that is not picklable (not serializable) to another process or thread.
So, try this:

Start the process first, then create the object and the connection inside it, not before.
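
A minimal sketch of that suggestion, assuming nothing about how the RethinkDB tools are actually structured: the names `settings`, `worker`, and `is_picklable` are hypothetical and the host and port values are placeholders. It shows that a `threading.local` object cannot be pickled, while plain settings can, so the child receives only the settings and builds its own per-process state after it has started.

```python
import multiprocessing
import pickle
import threading


def is_picklable(obj):
    """Return True if obj can be serialized with pickle."""
    try:
        pickle.dumps(obj)
        return True
    except (TypeError, AttributeError, pickle.PicklingError):
        return False


# A thread-local object, like the one named in the traceback.
local_state = threading.local()

# Plain connection settings (hypothetical values) that can be sent to a child.
settings = {"host": "localhost", "port": 28015}


def worker(settings):
    # Create per-process state inside the child, after it has started,
    # instead of passing it in from the parent.
    state = threading.local()
    state.settings = settings
    return state.settings["port"]


if __name__ == "__main__":
    print(is_picklable(local_state))  # the thread-local object
    print(is_picklable(settings))     # the plain dict

    # Only the picklable settings cross the process boundary.
    # The return value of worker is discarded by Process.
    ctx = multiprocessing.get_context("spawn")
    process = ctx.Process(target=worker, args=(settings,))
    process.start()
    process.join()
```

The `if __name__ == "__main__":` guard matters with the 'spawn' start method, because the child re-imports the main module.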

@gabor-boros
Member

Ping @ShadowJonathan

@ShadowJonathan
Author

This isn't the actual Python library driver; these are CLI commands, which honestly I expect to work regardless of platform.

Yes, I'd wager that manually hacking the boot process of such a system would fix it, but this issue is specifically about fixing the source code so that it works on Windows as well, not about monkey patching it.

@htbrown

htbrown commented Oct 9, 2020

Any updates on this? Having the same problem on macOS. RethinkDB v2.4.0

If not, anyone know any other way to import JSON files?

@gabor-boros
Member

@htbrown to me it is weird that this only happens on Windows, and the stack trace shows that the error originates from the built-in multiprocessing lib.

Since I cannot test this on Windows, I'd ask you for some more details:

What happens if you execute the import script manually and not through the rethinkdb wrapper? (By that I mean find the Python file and call that, not the "rethinkdb import" command.)

@htbrown

htbrown commented Oct 13, 2020

To clarify - I was having the same issues with macOS. I haven't tried it on my Windows box.

If you'd like me to, I can later. Give me a while.

@ShadowJonathan
Author

to me it is weird that this only happens on Windows, and the stack trace shows that the error originates from the built-in multiprocessing lib.

@gabor-boros if you look closely at the top stack trace, this is caused by a _thread._local object: the pickling engine (used for "sending" values across to the other process) comes across it and throws an exception. This is caused by things set in place both for that pickling engine and for whatever values are handed over to other threads.

(IIRC, though I am not sure about this: when I originally investigated it, I remember some kind of multiprocessing jank in there that caused this and that would only work nicely under Linux/Unix, not under Windows. I could be misremembering.)

@htbrown

htbrown commented Oct 22, 2020

If anyone needs it, I've made my own RethinkDB importer in Node. Got fed up with faffing around with the built in Python one. https://github.com/htbrown/rethinkdb-import

No documentation yet so if you need help submit an issue.

@qualitymanifest

qualitymanifest commented Nov 5, 2020

I am also having this issue on a mac, while trying to do an export. Rethinkdb 2.4.0 with Python 3.8.2, database running in a Docker container

Update: So I had this issue while following the official docs using venv and Python 3. Just now tried it outside of venv, using Python 2, and it worked.

@htbrown

htbrown commented Nov 6, 2020

Right so there's even something wrong with exporting. Hmm.

@daprieto1

I'm also having the same problem in mac os
Python 3.9.0
rethinkdb 2.4.0

@GeoffreyPlitt

GeoffreyPlitt commented Feb 8, 2021

I'm getting these errors as well. I'm on Mac Mini M1. I'm not a python expert, so I can't tell if rethinkdb is using python2 or python3, but when I type "python" it seems to use python2. Is that good? Bad? I have both on my system.

giro@geoffs-mac-mini:~/rethinkdb-import$ python --version
Python 2.7.16
giro@geoffs-mac-mini:~/rethinkdb-import$ python3 --version
Python 3.9.1
giro@geoffs-mac-mini:~/rethinkdb-import$ rethinkdb --version
rethinkdb 2.4.1 (CLANG 12.0.0 (clang-1200.0.32.28))

Is there a non-python simple binary I can use for import/export instead?

@GeoffreyPlitt

Or a way to disable multiprocessing?

@ShadowJonathan
Author

I'm not a python expert, so I can't tell if rethinkdb is using python2 or python3, but when I type "python" it seems to use python2. Is that good? Bad? I have both on my system.

I suggest removing or avoiding using python 2, it's been deprecated for a while.

@htbrown

htbrown commented Feb 8, 2021

I tried to remove Python 2 on my Mac before and seem to remember finding it incredibly difficult for a while and then just giving up because it was more effort than it was worth. Do they still package it all in with the Python 3 installer?

@ShadowJonathan
Author

Remove it only if possible. Sometimes Python 2 is deeply embedded for system stuff (and people don't care enough to update it to 3), but try to find ways to make Python 3 the default for your usages.

@htbrown

htbrown commented Feb 8, 2021

Remove it only if possible. Sometimes Python 2 is deeply embedded for system stuff (and people don't care enough to update it to 3), but try to find ways to make Python 3 the default for your usages.

Yeah I think portions of macOS use it. I'm just trying to avoid it as much as possible.

@jtwebb

jtwebb commented Feb 13, 2021

On Mac 11.2.1 the dump command is doing the same thing.

rethinkdb dump -c my-host-name

cannot pickle '_thread._local' object
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/rethinkdb/_dump.py", line 200, in main
    _export.run(options)
  File "/usr/local/lib/python3.9/site-packages/rethinkdb/_export.py", line 641, in run
    run_clients(options, working_dir, db_table_set)
  File "/usr/local/lib/python3.9/site-packages/rethinkdb/_export.py", line 526, in run_clients
    new_process.start()
  File "/usr/local/Cellar/python@3.9/3.9.1_8/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/usr/local/Cellar/python@3.9/3.9.1_8/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/usr/local/Cellar/python@3.9/3.9.1_8/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/context.py", line 284, in _Popen
    return Popen(process_obj)
  File "/usr/local/Cellar/python@3.9/3.9.1_8/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/usr/local/Cellar/python@3.9/3.9.1_8/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/usr/local/Cellar/python@3.9/3.9.1_8/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/usr/local/Cellar/python@3.9/3.9.1_8/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
TypeError: cannot pickle '_thread._local' object
Error: export failed, cannot pickle '_thread._local' object

@GeoffreyPlitt

I have created this tool as a workaround (using NodeJS):
https://github.com/GeoffreyPlitt/rethinkdb-import

@jtwebb

jtwebb commented Feb 13, 2021

If it's any help, redis-py appears to have had a similar issue. It looks like the object handed over by the parent process might not be serializable. Another similar issue was fixed by serving up a different connection.
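
One way to read "serving up a different connection" is sketched below. This is an assumption for illustration, not the redis-py or RethinkDB fix: `Connection` is a hypothetical stand-in class, and the host and port are placeholders. The object leaves its thread-local attribute out of its pickled state and builds a fresh one when it is restored in the new process.

```python
import pickle
import threading


class Connection:
    """Hypothetical stand-in for a client object that keeps per-thread state."""

    def __init__(self, host, port):
        self.host = host
        self.port = port
        self._local = threading.local()  # this attribute cannot be pickled

    def __getstate__(self):
        # Copy the instance dict and leave out the thread-local attribute.
        state = self.__dict__.copy()
        del state["_local"]
        return state

    def __setstate__(self, state):
        # Restore the plain attributes and create a fresh thread-local object.
        self.__dict__.update(state)
        self._local = threading.local()


if __name__ == "__main__":
    conn = Connection("localhost", 28015)
    # The round trip works because _local is excluded from the pickled state.
    clone = pickle.loads(pickle.dumps(conn))
    print(clone.host, clone.port)
```

Without the two special methods, pickling an instance would fail on the `_local` attribute with the same kind of `TypeError` shown in the tracebacks in this thread.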

@Dav2015

Dav2015 commented May 14, 2021

If anyone needs it, I've made my own RethinkDB importer in Node. Got fed up with faffing around with the built in Python one. https://github.com/htbrown/rethinkdb-import

No documentation yet so if you need help submit an issue.

You saved my day. Thanks

@red-scorp

PermissionError: [WinError 5] Access is denied

@ShadowJonathan Did you close Kaspersky Antivirus when you started your Python script?
Not just suppressed for n minutes, but closed completely. That helped in my case... a bit...

@ShadowJonathan
Author

I didn't have any antivirus on when I tried this script, but that's not the point.

@lkovesdi

lkovesdi commented Oct 21, 2021

Any update on this? Same issue on dump (Mac, RethinkDB 2.4, Python 3.9).

@lkovesdi

OK, so downgrading to Python 3.7.9 works, if anyone still needs it!
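
A possible explanation for why the downgrade helps on a Mac, not confirmed by the maintainers in this thread: Python 3.8 changed the default multiprocessing start method on macOS from 'fork' to 'spawn', and 'spawn' pickles the process object before sending it to the child. The sketch below only reports what the current interpreter uses; the helper names `start_method_report` and `pickles_process_object` are made up for this example.

```python
import multiprocessing
import sys


def start_method_report():
    """Return the interpreter version and its multiprocessing start methods."""
    return {
        "python": "%d.%d" % sys.version_info[:2],
        "platform": sys.platform,
        "default": multiprocessing.get_start_method(),
        "available": multiprocessing.get_all_start_methods(),
    }


def pickles_process_object(method):
    # "spawn" and "forkserver" pickle the Process object to hand it to the
    # child; "fork" copies the parent's memory instead.
    return method in ("spawn", "forkserver")


if __name__ == "__main__":
    report = start_method_report()
    print(report)
    print("pickles process object:", pickles_process_object(report["default"]))
```

On Windows 'spawn' is the only available start method, so this would not explain a difference between Python versions there.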

@jwr

jwr commented Dec 15, 2021

@lkovesdi Thanks for posting this. I wasted some time on this problem, then found this suggestion, and indeed downgrading my python (via homebrew) let me work around the issue. At least temporarily, because there is no escaping newer Python versions.
