Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NeuralynxIO: partial support for NVT files #1480

Open
wants to merge 26 commits into
base: master
Choose a base branch
from

Conversation

LEGion-42
Copy link

@LEGion-42 LEGion-42 commented May 26, 2024

Type of Change

  • New feature
  • Bug fix

Description

Added partial support for Neuralynx NVT files (#1463). for NeuralynxIO and NeuralynxRawIO, the x and y pixel coordinates and animal head angle from the nvt file are treated as dimensionless analog signals bundled into a signal stream separate from the NCS stream. NlxHeader can now read NVT files as well. For detailed output please refer to the Testing section below.

A buggy regex in NlxHeader was also fixed. Previously it can only read a single value after the header keys, now it can read multiple.

Limitations

  • Only supports loading a single NVT file (or a directory with a single NVT file in it) at a time.
  • Only loads the dnextracted_x, dnextracted_y and dnextracted_angle data fields from NVT files. Other fields that could be potentially useful (dwPoints and dntargets) are not yet supported.
  • The NVT is assumed to be in the same segment (sharing a common clock (time basis)) as the NCS files in the same directory.

As a temporary solution to the potential problems these limitations might cause, I added an optional ignore_nvt argument to the NeuralynxIO and NeuralynxRawIO classes. When enabled, NeuralynxIO and NeuralynxRawIO will behave like how they were before my changes. NlxHeader works fine without any problems during testing.

Testing

Code was tested in accordance with the official contribution guideline. Additionally, it was tested on my own lab's data. Listed below are some examples.

reader = NeuralynxIO(dir)
print(reader)
nb_block: 1
nb_segment:  [1]
signal_streams: [Stream (rate,#packet,t0): (1600.0, 7537, 16166107179) (chans: 13), Stream (rate,#frame,t0): (29.97, 72286, 16165954061) (chans: 3)]
signal_channels: [CSC2_2, CSC3_2, CSC4_2, CSC5_2 ... CSC1_2 , VT1 , VT1 , VT1]
spike_channels: [chTT1_1#0#0, chTT1_1#1#0, chTT1_1#2#0, chTT1_1#3#0 ... chTT4_1#15#8 , chTT4_1#15#9 , chTT4_1#15#10 , chTT4_1#15#13]
event_channels: [Events event_id=4 ttl=0]

Each of the VT1 signal channel represents a data field (x, y pixel position and head angle).

print(reader.read_block().segments[0].analogsignals[1])
[[428. 349.  45.]
 [431. 345.  28.]
 [428. 349.  46.]
 ...
 [436. 334.  87.]
 [435. 334.  88.]
 [436. 334.  87.]] dimensionless

First column is x posiiton, second y position and thrid head angle in degrees.

print(reader.read_block().segments[0].analogsignals[0].array_annotations["Resolution"][0])
[720 480]

Video resolution from the NVT file header is passed into array_annotations for easy access. However because list type in array_annotations is buggy, it is passed as a literal string ('[720 480]') and not a list. Though once it is accessed it can be easily converted into a list using ast.literal_eval() in the ast package.

Additional Information

I should be able to make semi-regular contributions to this project. I don't have a good solution in mind to include the dwPoints and dntargets data fields as of now, but I should be able to fix the problem with loading multiple NVT files and implement proper segment assignment in the near future.

I opened this pull request dispite the above mentioned limitations because I think in most cases (at least in my lab and other labs I know of) there are only one NVT file in a given directory, and the files often share the same time basis (since they are likely from the same experiment session). So this current version should be helpful enough in most scenarios. With the ignore_nvt argument, users will also be able to revert back the code's behavior. So even if the new code causes problems, they could be easily avoided.

LEGion-42 and others added 20 commits April 29, 2024 00:42
…iple values

feat: add header support for Neuralynx nvt files
Only loading dnextracted_x, dnextracted_y and dnextracted_angle as 3 dimensionless signal channels.

Only tested on directories with single nvt file.
…iple values

feat: add header support for Neuralynx nvt files
Only loading dnextracted_x, dnextracted_y and dnextracted_angle as 3 dimensionless signal channels.

Only tested on directories with single nvt file.
@LEGion-42 LEGion-42 changed the title Add support for Neuralynx NVT files NeuralynxIO: support for NVT files May 26, 2024
@LEGion-42 LEGion-42 changed the title NeuralynxIO: support for NVT files NeuralynxIO: partial support for NVT files May 26, 2024
@zm711
Copy link
Contributor

zm711 commented May 27, 2024

Thanks @LEGion-42, we will review soon. :)

EDIT: See also that tests are currently failing.

@LEGion-42
Copy link
Author

pytest somehow skipped all the necessary neuralynxrawio tests for me so I didn't catch these before submitting the PR. Sorry for the oversight. :(

The 4 AssertionError's are my code working as intended and does not cause problems -- the test simply did not consider the fact that I'm passing behavior data as signals. The header size of 4 is just the original 1 from an ncs file + 3 new ones from the nvt file. I have now updated the test with proper values.

The KeyError on the other hand raises some questions. neuralynx/Cheetah_V5.6.3 is the only directory with an nvt file in the testing dataset, and according to the timestamps, the time range of the nvt file is entirely contained in the first segment. However, while reading the data, NeuralynxRawIO is still trying to read nvt signal channels in the second segment despite it not being there. It seems like we are assuming that every signal is present in every segment? Or am I missing something?

I implemented a temporary fix by duplicating the nvt data to every segment to gether with a warning message telling users that nvt file should be loaded independently if there are multiple segments. But I think a proper fix should be allowing different segments have different channels?

@zm711
Copy link
Contributor

zm711 commented May 28, 2024

That is likely what we would do. I guess I don't know NVT/Neuralynx well enough. Segments for neuralynx can be due to gaps in the signal. So my questions would be 1) are there not gaps in the nvt that we need to account for? 2) can you have behavioral input of some sort for one segment realistically and not for others? You wouldn't happen to have some easy to access docs for the neuralynx format would you?

@LEGion-42
Copy link
Author

LEGion-42 commented May 31, 2024

@PeterNSteinmetz
I'll have a talk with the PI tomorrow and see if I can share some of our data. However as I mentioned before our data may be too clean to be meaningful for testing. There is no segmentation or noticeable gap in timestamps in any of our data.

As for the sampling rate, since nvt files are video tracking data, the fps is typically around 30.

@PeterNSteinmetz
Copy link
Contributor

@LEGion-42 . Thanks that sounds good. The test files are really just to test that headers are being read properly and records interpreted correctly. Normally we trim them down to a smaller number of records and use a small number of them. If there are no unusual features like gaps in the files, that is fine. The files can be anonymized if needed to remove any information, such as time/date, which might be identifiable to an individual person.

I agree the video tracking data should be in the same segment as per this design. Given the separate sampling frequency it should be in a separate stream.

@LEGion-42
Copy link
Author

@PeterNSteinmetz I got permission to share our nvt data. However our nvt files have size ranging from 30 to 100mb, which are all outside of the 10mb file size requirement on gin and I have no idea how to trim them down. Any suggestions?

@PeterNSteinmetz
Copy link
Contributor

PeterNSteinmetz commented Jun 7, 2024

@LEGion-42 Great. We don’t need an entire recording set, just perhaps 3-6 files. Since the header parsing is of primary interest, please trim to just header plus 10-20 records each.

If you are on MacOS or another Unix system you can use the ‘truncate’ command to shorten them. I usually add a ‘_trunc’ suffix to the file name before the extension to indicate this has been done.

You can compute the target size as 16384 bytes + number of records x record size. The record sizes for the various file types are available in the document at https://neuralynx.com/_software/NeuralynxDataFileFormats.pdf

if you have the ephy-testing data checked out, please create an issue at the gin site. Then add your files to an appropriately named sub folder of Neuralynx.

@alejoe91 alejoe91 added this to the 0.14.0 milestone Jun 7, 2024
@PeterNSteinmetz
Copy link
Contributor

@zm711 Zach, what do we need to do to get some of the pull requests for corresponding test data merged over there?

@zm711
Copy link
Contributor

zm711 commented Jun 12, 2024

Hey Peter. Are you talking about from the beginning? We have instructions on our repo how to make a PR on gin. If you're talking about merging test data just @ one of us and we can review it and merge it. We like to check to make sure it's not too big and that everything looks to be in order before we merge it. :)

@PeterNSteinmetz
Copy link
Contributor

Hi Zach, well there is this one https://gin.g-node.org/NeuralEnsemble/ephy_testing_data/pulls/116 . That provides test data from some code that can be used to deal with some of the header issues in this pull request.

@PeterNSteinmetz
Copy link
Contributor

@LEGion-42 If you would like some further assistance with getting the test files ready, just let me know.

@LEGion-42
Copy link
Author

@PeterNSteinmetz Sorry for no being responsive, I was quite busy the past week. I'll try to upload the nvt data today.

@LEGion-42
Copy link
Author

@zm711 @PeterNSteinmetz Here are the the nvt files. gin wouldnt let me pass the their captcha for some reason (maybe im a bot?). So can you guys upload them? Sorry for the trouble.
nvts.zip

@PeterNSteinmetz
Copy link
Contributor

PeterNSteinmetz commented Jun 18, 2024 via email

@PeterNSteinmetz
Copy link
Contributor

@LEGion-42 These downloaded fine for me. I will try and structure the folder, write some tests, and start a pull request in the next few days.
@zm711 We are still waiting on approval of the prior pull request for the ephy testing data https://gin.g-node.org/NeuralEnsemble/ephy_testing_data/pulls/116 and will then need examination of the new pull request for this. This has been a very slow point historically.

@zm711
Copy link
Contributor

zm711 commented Jul 1, 2024

@PeterNSteinmetz, it is merged. I would ping us here on github in your PR (probably a couple times) so we review and merge it. Thanks for all your help on these PRs!

@zm711
Copy link
Contributor

zm711 commented Jul 1, 2024

Actually @PeterNSteinmetz, that was a broken file. I had to overwrite and erase your PR (gin can be annoying). You submitted a stub on your branch rather than a file, but since I had merged it I had to delete it.

@PeterNSteinmetz
Copy link
Contributor

Thanks Zach. @zm711 I am going to move discussion of that dataset over to its pull request at https//gin.g-node.org/NeuralEnsemble/ephy_testing_data/pulls/116 .

@zm711
Copy link
Contributor

zm711 commented Jul 24, 2024

@LEGion-42,

We are still working on getting your test file onto g-node/gin. So that is in the works. There was another PR for neuralynx where we were working on simplifying the parameters moving forward. That has now been merged which has caused conflicts in your PR. Would you like to fix those conflicts while we working on adding the test files.

@LEGion-42
Copy link
Author

@zm711 I'll take a look on the weekend. Thanks for letting me know.

@PeterNSteinmetz
Copy link
Contributor

@LEGion-42 I am about to prepare the push of the shortened .nvt files you provided. But I am wondering if a directory structure where all 5 .nvt files are in one directory would be good? This would allow testing of providing a directory to the constructor, which should then produce 5 channels, versus individual file names which should produce one.

Would such a structure provide any difficulties for your code or tests?

@LEGion-42
Copy link
Author

@PeterNSteinmetz currently the code only supports 1 nvt files per directory. Multiple nvt files won't crash the code but will cause problems.

@PeterNSteinmetz
Copy link
Contributor

@LEGion-42 OK, thanks, good to know. I will move them to the separate directories then before committing them.

@PeterNSteinmetz
Copy link
Contributor

@LEGion-42 How would you like the Readme.txt file to read? Normally we state where the files are from and who created them.

@PeterNSteinmetz
Copy link
Contributor

@LEGion-42 These files have been added to a pull request in the test data https://gin.g-node.org/NeuralEnsemble/ephy_testing_data/pulls/141. Once incorporated there, it should be possible to add some tests to the code to support this addition to the functionality. Thanks for your work on this.

@zm711
Copy link
Contributor

zm711 commented Sep 23, 2024

@LEGion-42,

just one quick confirmation: these files are either a dummy recording or simulated data right? We just don't want private files on the public testing repo (for example in the case of human recording/medical data). Could you confirm how this data was obtained and then we will merge the testing data!

@LEGion-42
Copy link
Author

@zm711 Data was collected from mice and was explicitly allowed for release by my supervisor so don't worry!

@zm711
Copy link
Contributor

zm711 commented Sep 23, 2024

Thanks so much for confirming. The testing data has now been merged into our repo! So test away :)

@PeterNSteinmetz
Copy link
Contributor

@LEGion-42 I was just working a bit on the header parsing and I noticed something about about the headers in these .nvt files.

They do not contain a first line '######## Neuralynx Data File Header' . This is the only case we have seen where this is missing. That brings up some questions:

Are these produced by a Neuralynx recording system? If so, which one and versions of the software. The header does say CheetahRev 5.6.0. But this just seems quite different. It makes me wonder if there is some configuration difference there on that system what was not intended.

Implementing this will require us to either drop the requirement for that first line as a check generally or perhaps only omit it if the FileType is Video.

@LEGion-42
Copy link
Author

LEGion-42 commented Sep 24, 2024

@PeterNSteinmetz

I just checked the nvt files (both on my end and the pull request) and the headers including the first lines and the CheetahRev's are all there? Can you please recheck?

https://gin.g-node.org/NeuralEnsemble/ephy_testing_data/pulls/141/files here you should see that they all have properly formatted headers

@PeterNSteinmetz
Copy link
Contributor

@LEGion-42 You know, you are right. I don't know what I got confused. Sorry for the false alarm.

Glad we have the correct files for testing in the ephy_testing_data now.

@LEGion-42
Copy link
Author

@PeterNSteinmetz

:D don't worry about it. Glad that the confusion was cleared.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants