Improvements on dependencies calculations #172

Remi-Gau · 2021-02-23T05:57:20Z

Metadata treatment

Split get_metadata into two functions:

metalist = get_meta_list(filename, pattern) returns list of files satisfying pattern with inheritance principle
get_metadata(metafile) construct meta structure from json files in passed meta-files list (from above function)

Each data file in dataset receives new field metafile that contains list of files from get_meta_list.

This will facilitate access to metafiles directly. Individual values can be extracted by user if needed using get_metadata.

Dependencies

Each file structure contains dependencies sub-structure with guaranteed fields:

explicit -- list of data files containing "IntendedFor" referencing current file.
data -- list of files with same name but different extension. This will combine files that are split in header and data (like in Brainvision), also takes care of bval/bvec files
group -- list of files that have same name except extension and suffix. This will group file that logically need each other, like functional mri and events tabular file. It also takes care of fmap magnitude1/2 and phasediff

Additional custom subfields can be added for given modalities.

This allow to treat dwi and fmap with generic parse_using_schema. perf likely also, but I can't certify it.

Issues

performance. I feel that retrieval of dependencies can be optimised
Tests -- was unable to install moxunit
Tests on other datasets

codecov · 2021-02-27T22:50:41Z

Codecov Report

Merging #172 (d332496) into beliy-derivate (ac3bb7f) will not change coverage.
The diff coverage is 0.00%.

❗ Current head d332496 differs from pull request most recent head 6bfd46f. Consider uploading reports for the commit 6bfd46f to get more accurate results

@@              Coverage Diff               @@
##           beliy-derivate    #172   +/-   ##
==============================================
  Coverage            0.00%   0.00%           
==============================================
  Files                  25      29    +4     
  Lines                 920     951   +31     
==============================================
- Misses                920     951   +31

Flag	Coverage Δ
unittests	`0.00% <0.00%> (ø)`

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files	Coverage Δ
+bids/+internal/append_to_layout.m	`0.00% <0.00%> (ø)`
+bids/+internal/ends_with.m	`0.00% <0.00%> (ø)`
+bids/+internal/get_meta_list.m	`0.00% <0.00%> (ø)`
+bids/+internal/get_metadata.m	`0.00% <0.00%> (ø)`
+bids/+internal/starts_with.m	`0.00% <0.00%> (ø)`
+bids/layout.m	`0.00% <0.00%> (ø)`
+bids/query.m	`0.00% <0.00%> (ø)`
+bids/derivate.m	`0.00% <0.00%> (ø)`
... and 1 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ac3bb7f...6bfd46f. Read the comment docs.

Remi-Gau · 2021-02-28T11:45:59Z

This should:

For qMRI some files in the derivatives folder also have an intendedFor and they would be missed by the current code.

I suggest we finish this PR as for now and then tackle #146 that concerns the indexing of nested derivative folders before we come back to finish fixing #157 completely.

Remi-Gau · 2021-02-28T12:07:19Z

* [ ]  performance. I feel that retrieval of dependencies can be optimised

I suspect that having to load and read the metadata of each file takes a lot of time.

I suspect that this can indeed be optimized but I would worry about that later.

Remi-Gau · 2021-02-28T12:16:29Z

@nbeliy

I pretty much took your code as is, fixed the test that were not passing and did a bit of refactoring and commenting.

I also:

did a soft reset to ditch the derivatives function from this PR so we can tackle separately in your other PR
cherry picked a commit that you had on your copy derivatives branch to update the query function

The code is pretty much the same though: thanks for providing all this.

Let me know what you think.

Note: I feel that the current way of indexing all the dependencies creates a lot of "redundancies" in the layout (example: the phase image of fieldmap will list all the associated imagesin dependencies.group like magnitude and the other way around).

But I think that we can also optimize this later, if needed as this way of indexing things makes fewer assumptions about what file is / should be the "main" one for a given group.

nbeliy · 2021-03-01T08:26:31Z

For qMRI some files in the derivatives folder also have an intendedFor and they would be missed by the current code.

It will still work if IntendedFor in derivatives reference file in itself, not in the original/main dataset. If it's not, we can imagine a search for files in main dataset at condition that main dataset is referenced in dataset_description.json. But then, it will be difficult to implement, considering that main dataset may be defined as DOI or URL.

I suspect that having to load and read the metadata of each file takes a lot of time.

Conserving full metadata for each file can explode memory usage. From inside we need only IntendedFor, and for user a query with 'target', {'meta1', 'meta2'} can be imagined to retrieve several parameters with one go.

Note: I feel that the current way of indexing all the dependencies creates a lot of "redundancies" in the layout (example: the phase image of fieldmap will list all the associated imagesin dependencies.group like magnitude and the other way around).

Sometimes the main file is not well defined -- in phasediff and 2 magnitude, which one is main? In order to simplify we can exclude .tsv files from looking for group dependencies.

+bids/+internal/ends_with.m

+bids/layout.m

+bids/query.m

nbeliy

Looks good for mee, more abstract and generic.

As for optimisation, the list of candidates for metadata (considering only json) is same for each call. Probably it can be saved somewhere.

If (to be confirmed) data files are scanned in alphabetical order, lokking for group dependencies can be limited in file's neighbours.

Also, we probably need to move get_metadata from internal to util. This function can be usefull outside the layout and query

Remi-Gau · 2021-03-01T10:57:31Z

Will try to adress some of your questions. Before merging.

Totally agree with getting get_metadata out of internal, was suggested before and I have been dragging my feet to do it. 🤷‍♂️

Was thinking something along the same line for optimization: will open a separate issue to keep track of that.

This was referenced Feb 23, 2021

Improvements on dependencies calculations #163

Closed

Issues with bids.internal.get_metadata for derivatives #175

Closed

Remi-Gau added 2 commits February 27, 2021 21:05

refactor get_metadata

8fc3759

reorganize dependencies

c5655eb

Remi-Gau force-pushed the beliy-dependencies branch from e68a18c to c5655eb Compare February 27, 2021 20:07

Beliy Nikita and others added 3 commits February 27, 2021 21:25

adapted query for metadata

ea8f072

fix test and lint

04c86e1

remove dead code

97bb20d

Remi-Gau added 2 commits February 28, 2021 09:37

minor fixes on get_metadata

91f5064

rename functions to use snake_case

9d37614

Remi-Gau added 3 commits February 28, 2021 12:46

refactor dependency indexing

ea6bb9b

get rid of parse_perf

2ca4496

remove commented out dead code

ffda8dc

Remi-Gau requested a review from nbeliy February 28, 2021 11:52

add comments

db812da

Remi-Gau requested a review from gllmflndn February 28, 2021 12:07

nbeliy reviewed Mar 1, 2021

View reviewed changes

+bids/+internal/ends_with.m Show resolved Hide resolved

nbeliy reviewed Mar 1, 2021

View reviewed changes

+bids/layout.m Show resolved Hide resolved

nbeliy reviewed Mar 1, 2021

View reviewed changes

+bids/layout.m Outdated Show resolved Hide resolved

nbeliy reviewed Mar 1, 2021

View reviewed changes

+bids/query.m Outdated Show resolved Hide resolved

nbeliy approved these changes Mar 1, 2021

View reviewed changes

This was linked to issues Apr 16, 2021

generalize usage of intended_for field where applicable and dependency sub-structure #143

Closed

Dealing with intendedFor and "file dependencies" #130

Closed

remove extra IF block

53f6156

Remi-Gau added 2 commits April 16, 2021 16:34

ASL jpg file handled like other dependency files

d332496

update ASL query tests to make sure context file dependency is indexed

6bfd46f

Remi-Gau mentioned this pull request Apr 16, 2021

Let ends_with accept a series of pattern to check #199

Open

Remi-Gau merged commit 61fd910 into beliy-derivate Apr 16, 2021

Remi-Gau deleted the beliy-dependencies branch April 16, 2021 15:43

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improvements on dependencies calculations #172

Improvements on dependencies calculations #172

Remi-Gau commented Feb 23, 2021 •

edited

Loading

codecov bot commented Feb 27, 2021 •

edited

Loading

Remi-Gau commented Feb 28, 2021

Remi-Gau commented Feb 28, 2021

Remi-Gau commented Feb 28, 2021

nbeliy commented Mar 1, 2021

nbeliy left a comment

Remi-Gau commented Mar 1, 2021

Improvements on dependencies calculations #172

Improvements on dependencies calculations #172

Conversation

Remi-Gau commented Feb 23, 2021 • edited Loading

Metadata treatment

Dependencies

Issues

codecov bot commented Feb 27, 2021 • edited Loading

Codecov Report

Remi-Gau commented Feb 28, 2021

Remi-Gau commented Feb 28, 2021

Remi-Gau commented Feb 28, 2021

nbeliy commented Mar 1, 2021

nbeliy left a comment

Choose a reason for hiding this comment

Remi-Gau commented Mar 1, 2021

Remi-Gau commented Feb 23, 2021 •

edited

Loading

codecov bot commented Feb 27, 2021 •

edited

Loading