Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sql #22

Open
wants to merge 11 commits into
base: master
Choose a base branch
from
Open

Sql #22

wants to merge 11 commits into from

Conversation

Saqibm128
Copy link
Contributor

adds SQL support to project as per #16 and explicitly declare code sections in README.md and dependencies in requirements.txt

@turambar
Copy link
Collaborator

Hey @Saqibm128 what's the status of this? Are you waiting on us to merge?

@turambar turambar self-assigned this Jan 11, 2018
@turambar turambar mentioned this pull request Jan 11, 2018
@Saqibm128
Copy link
Contributor Author

Yes, I changed some of the methods to read data tables to take an optional parameter called use_db, and added a new argument to the arg parser to use-db

@turambar turambar self-requested a review January 16, 2018 23:39
@turambar
Copy link
Collaborator

Looking at this PR this week. Going to try to include in the release.

@KirkHadley
Copy link

@Saqibm128 How much faster is this?
@turambar What's the state of things in terms of getting this merged in? More than happy to lend a hand here.

@Saqibm128
Copy link
Contributor Author

Faster? Do you mean execution speed? If the SQL server is on the same computer, then it goes about as fast. If it is on a separate computer and the network host of the SQL isn't the localhost, then I guess it would go much slower due to network lag.

@Saqibm128
Copy link
Contributor Author

I haven't actually looked at this project in a while, but if execution speed for setting up the directory structure is becoming a huge bottleneck, I might want to look into using parallel threads/processes.

@KirkHadley
Copy link

Thanks for getting back so quick. And yeah I didn't really look at the code before asking and naively assumed you'd taken the joins logic and rewritten that in SQL. My guess would be that, ceteris paribus, a series of group by + join queries would be faster than the current method.

But speaking of parallelism, I actually have cobbled together a rather hackish parallel extract_subjects and extract_episodes, would cleaning it up and writing a PR be worthwhile or have folks mostly moved on from actively maintaining things here?

@Saqibm128
Copy link
Contributor Author

Saqibm128 commented Aug 26, 2018

To be honest, I am not really involved in this project, added the SQL stuff well after they had started, and haven't been paying attention to this for awhile since, so I'm not sure at all.

Using joins actually makes a lot of sense, I don't think I was thinking at all about it, surprisingly. If you have a hacky parallel script, go ahead and share! I don't know if it would be accepted but others may still find any code of any quality useful.

@Saqibm128
Copy link
Contributor Author

Oh also just remembered they have a gitter chat room. https://gitter.im/YerevaNN/mimic3-benchmarks

@turambar
Copy link
Collaborator

@Harhro94 Now that the 1.0 release is done, should one of us should review this and then make the merge?

@Saqibm128
Copy link
Contributor Author

@turambar is there still value in this? should I try to fix merge conflicts, or should I close the pr?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants