Optimum directory structure for projects with multiple datatypes (e.g. ephys, behaviour, cameras) #12

Closed
JoeZiminski opened this issue Nov 7, 2022 · 2 comments

Comments

@JoeZiminski

JoeZiminski commented Nov 7, 2022

Hi Everyone,

I am working with the neuroinformatics unit at the Sainsbury Wellcome Centre (London) to build standardised ephys analysis pipelines. We are very interested in standardized project structuring and really appreciate the work you are doing on this BEP.

We are currently thinking about the best way to handle multiple data types within a project folder organisation, very similar to your discussion in #4. For example, for a project including a single ephys session we might have:

project
  ephys
    sub-001
      ses-001
        sub-001_ses-001_task-x_ephys.nwb
  

Often, researchers will also have many behavioural sessions, including training sessions (with no ephys) and test sessions (with simultaneous ephys).

As such, we may create a second data-type folder and fill it with training sessions:

.
└── project/
    ├── ephys/
    │   └── sub-001/
    │       └── ses-001/
    │           └── sub-001_ses-001_task-x_ephys.nwb
    └── behav/
        └── sub-001/
            ├── ses-001_train/
            │   ├── camera/
            │   │   └── video.mp4
            │   └── responses/
            │       └── responses.csv
            ├── ses-002_train/
            │   └── ...
            └── ses-003_train/
                └── ...

However, it is not immediately clear where best to put the behaviour for the ephys session. It is cleanest to place it in the behav folder (e.g. ses-004_train) and then include metadata linking it to the appropriate ephys session (and vice versa). Potential problems with this are: a) it creates additional overhead for researchers to input metadata, often when they are busy setting up the experimental session; b) it requires additional overhead for researchers to link their data together during analysis (e.g. behav session 4 belongs with ephys session 1).
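To make the linking overhead concrete, here is a minimal sketch of what metadata-based linking could look like. The sidecar filename `session_link.json`, the key `linked_ephys_session`, and both function names are hypothetical and are not taken from BIDS or BEP032; they only illustrate the extra step a researcher would need at acquisition and again at analysis.

```python
import json
from pathlib import Path
from typing import Optional

# Hypothetical sidecar filename, chosen for illustration only.
SIDECAR_NAME = "session_link.json"


def write_session_link(behav_session_dir: Path, ephys_session: str) -> Path:
    """Record which ephys session a behavioural session belongs to."""
    behav_session_dir.mkdir(parents=True, exist_ok=True)
    sidecar = behav_session_dir / SIDECAR_NAME
    sidecar.write_text(json.dumps({"linked_ephys_session": ephys_session}, indent=2))
    return sidecar


def read_session_link(behav_session_dir: Path) -> Optional[str]:
    """Return the linked ephys session name, or None if no sidecar exists."""
    sidecar = behav_session_dir / SIDECAR_NAME
    if not sidecar.exists():
        return None
    return json.loads(sidecar.read_text())["linked_ephys_session"]
```

For example, `write_session_link(Path("project/behav/sub-001/ses-004_train"), "ses-001")` would record that behav session 4 belongs with ephys session 1, and `read_session_link` would recover that link during analysis.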

Alternatively, the behaviour folder could be placed in the ephys ses-001 folder (behav/...). This has the benefit of linking the data by location, is quite intuitive, and avoids linking disparate session names (i.e. behaviour for ephys session 1 is always in the ephys session 1 folder). The downside is that it becomes unclear what a behav session means (i.e. it might be necessary to write an empty behav/sub-001/ses-003_test/ folder in behav that links to the ephys folder, to avoid duplicate session naming).

Finally, it is nice to store all data types under the subject / session directory, e.g.:

.
└── project/
    └── sub-001/
        ├── ses-001/
        │   ├── ephys
        │   └── behav
        ├── ses-002/
        │   └── behav
        ├── ses-003/
        │   └── behav
        └── ses-004/
            ├── ephys
            └── behav

This is probably the nicest and most intuitive overall structure and is as described in BIDS for neuroimaging. However, it mixes the data types, so it is a bit of an issue for researchers who do not have much coding experience, which is more common outside of neuroimaging (e.g. it is not possible to drag and drop all ephys sessions at once). It can also be difficult to find the session you are looking for when you have many behavioural training sessions interspersed with a few ephys test sessions (although this could be ameliorated by session naming, e.g. ses-001_train, ses-002_test, etc.). Even so, mixing the data types can become confusing if you have many sessions that include various data types.
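For researchers who are comfortable with a few lines of scripting, the "find all ephys sessions" problem in the subject / session layout can be handled with a glob. This is a minimal sketch that assumes the `sub-*/ses-*/<datatype>` folder naming shown in the tree above; the function name `sessions_with_datatype` is hypothetical and is not part of pybids or any other tool.

```python
from pathlib import Path
from typing import List


def sessions_with_datatype(project: Path, datatype: str) -> List[Path]:
    """Return the session folders that contain a folder named `datatype`.

    Assumes the layout project/sub-*/ses-*/<datatype>/.
    """
    return sorted(
        datatype_dir.parent
        for datatype_dir in project.glob(f"sub-*/ses-*/{datatype}")
        if datatype_dir.is_dir()
    )
```

With the example tree above, `sessions_with_datatype(Path("project"), "ephys")` would return the `ses-001` and `ses-004` folders of `sub-001`, while passing `"behav"` would return all four sessions.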

I was wondering if these issues have come up for you and what you think the best approach is in this case.

@robertoostenveld

Hi Joe,

Your latter structure indeed aligns best with the existing BIDS standard for non-ephys measurements, and also aligns with the BEP032 extension proposal for ephys.

Multiple data organizations are in principle technically equivalent; if you take a large enough group of people it is likely that they will have different preferences for organizing their data and argue for one over the other as in https://xkcd.com/927/. It is important to convey the (short and long term) value of adopting a standard that is shared with others internally and externally (and I think BIDS is a good one), to support people with local documentation, and to help them with tooling. One important aspect with tooling is that it is not only about software and libraries (like pybids), but also how you organize your data, whether "drag and drop all ephys sessions" is a valid operation or would result in wasteful data duplication, how the local network drives work, access permissions, which parts are read-only and which read-write, etc.

I hope this short reflection helps.
Robert

@JoeZiminski
Author

Hi Robert, thanks a lot for the response; it has been very useful and helped shape our approach. We will proceed with the latter structure that aligns with BIDS. Feedback from some of our researchers also indicates that they are already using a similar organisation or would have no trouble switching to it.

Thanks for the insights; these are good to keep in mind, as is https://xkcd.com/927/. 😄
