Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add CsvEpochSource and EpochEncoder enhancements #13

Merged
merged 35 commits into from
May 16, 2019

Conversation

jpgill86
Copy link
Collaborator

CsvEpochSource was originally provided as a simple example of how to create a subclass of WritableEpochSource. However, it seems like it will be useful to many users of the epoch encoder, so I think it would be appropriate to migrate it into the main code.

Since Excel assumes commas are the delimiter when the file extension is CSV, the use of tabs as the delimiter in the epoch encoder example results in output files that are read incorrectly by Excel.  This could be resolved by either changing the file extension to TSV or by using commas with the extension CSV. This change does the latter.
CsvEpochSource was originally implemented for the purpose of demonstrating how to create a subclass of WritableEpochSource. However, if `'name': 'animal_state'` is removed, it's fully general and could be very useful to anyone interested in saving epochs using CSVs. Consequently, it's a great candidate for adoption into the main code.

This change migrates the implementation of CsvEpochSource to the main code so that users do not need to implement this subclass themselves. They are still invited to create alternative implementations of subclasses of WritableEpochSource that use different file types. Dependence of ephyviewer on pandas is avoided by requiring pandas only when CsvEpochSource is instantiated.
Accessible via inherited method get_channel_name().
@jpgill86
Copy link
Collaborator Author

I also propose making WritableEpochSource into an abstract class using abc. The method save should be marked as an abstract method that must be implemented by subclasses. A new method load could also be declared as an abstract method. I'm happy to take this on if this sounds reasonable to you, @samuelgarcia.

This change allows epoch=None when creating a WritableEpochSource object. In this case, the new load() method is called to build an empty dictionary containing the appropriate keys and data types.

Like the save() method, load() can be overridden in subclasses of WritableEpochSource to load epoch data from arbitrary sources. CsvEpochSource implements an example of this.
@jpgill86
Copy link
Collaborator Author

jpgill86 commented Aug 13, 2018

I think I came up with an alternative idea that is fully backwards compatible.

Changing WritableEpochSource into an abstract class using the abc module would disallow the class to be instantiated. This would require old code that uses it to be changed, such as the existing tests in test_epochencoder.py.

Instead, I made the constructor parameter epoch optional. If None is passed, a new load() method is called to create an empty dictionary for the epoch. If the load() method is overridden in subclasses derived from WritableEpochSource, data can be loaded from arbitrary sources. This is how CsvEpochSource.load() gets called.

(1) `WritableEpochSource._clean_and_set` now discards epochs with duration shorter than 1 microsecond. (2) `CsvEpochSource.save` now rounds time and duration to the nearest microsecond before writing to file. (3) `EpochEncoder.refresh_table` now rounds start, stop, and duration to the nearest microsecond before displaying in the data table.

These changes (1) prevent the creation of spurious epochs resulting from floating point arithmetic. They also improve display of floating point numbers in both (2) saved files and (3) the EpochEncoder's data table.
Algorithms in `InMemoryEpochSource.get_chunk_by_time`, `WritableEpochSource.add_epoch`, `WritableEpochSource.delete_in_between`, and `WritableEpochSource.merge_neighbors` implicitly assumed that epochs never overlap. This commit rewrites them to allow for the possibility of overlapping epochs.

Added sorting to `WritableEpochSource._clean_and_set`, since these modifications were simplest when not trying to maintain temporal ordering during operations (i.e., use of `np.append` instead of `insert_item`).

Removed redundant epoch deletion code from `add_epoch`. Added calls to `delete_in_between` before each `add_epoch` call in EpochEncoder so the behavior remains the same.

`WritableEpochSource.fill_blank` was not modified and still implicitly assumes that epochs do not overlap.
Removed assertion in `WritableEpochSource.__init__` that epochs do not overlap. Removed code in `CsvEpochSource.load` for working around inadvertent overlap caused by floating point arithmetic problems.

Added `remove_old_when_inserting_new` boolean parameter to EpochEncoder. When True (default), existing epochs are deleted when new epochs are created using shortcut keys or the region selector (this was the old behavior). When False, existing epochs are not deleted, resulting in overlapping epochs.
What the EpochEncoder does about existing epochs when a new one is created (deletes them or not) is determined by the `remove_old_when_inserting_new` parameter. With this commit, the behavior can be temporarily switched by holding the Shift key when pressing a shortcut key. If `remove_old_when_inserting_new` is True, pressing a shortcut key without the modifier key will delete epochs that overlap with the new epoch. Holding Shift will prevent this deletion and allow overlapping. The logic is inverted if `remove_old_when_inserting_new` is False (i.e., Shift can be used to force deletion).
@jpgill86
Copy link
Collaborator Author

jpgill86 commented Aug 30, 2018

Beginning with e257e14, I've started addressing #16 by developing on top of the changes already proposed in this pull request.

So far, these changes allow epochs to be created in the EpochEncoder that overlap, and functions like Merge Neighbors work regardless of whether epochs overlap. One exception is Fill Blanks. It has not been updated yet, and for the moment it's not clear what the behavior should be. See discussion in #16.

@jpgill86 jpgill86 changed the title Add CsvEpochSource to main code Add CsvEpochSource and EpochEncoder enhancements Aug 30, 2018
@jpgill86 jpgill86 mentioned this pull request Aug 30, 2018
19 tasks
Added new "Epoch insertion mode" button group to EpochEncoder GUI, containing radio buttons for "Mutually exclusive" and "Overlapping". Renamed global option `remove_old_when_inserting_new` to `exclusive_mode`.
Plain text labels for epochs provided in the EpochEncoder's data table are replaced with drop-down menus that allow the user to change the label of an existing epoch.
After using a drop-down menu to change an existing epoch's label and then pressing a shortcut key to insert a new one, the time would jump back to the first epoch, rather than forward one step. This was caused by `on_seek_table` triggering when the table was cleared by `refresh_table`. Interacting with a drop-down menu in the data table before this apparently caused row selection to trigger for the first row in an unexpected way, even if the drop-down menu interacted with was not in the first row.
Added a new feature that makes it easier to find epochs in the EpochEncoder's data table. When an epoch's rectangle is clicked in the plot, the corresponding row in the data table is automatically selected. More specifically, the label drop-down menu is selected, which allows the up and down arrow keys to be used to change the label quickly. Time is also moved to the start of the epoch.
Epochs would fail to delete if they started within the selected region and ended exactly at its right boundary.
In addition to selecting the corresponding row in the data table and changing the time, clicking a rectangle in the EpochEncoder plot will now update the region selection to match the epoch for easier duplication or deletion.
Changed region match feature to double-click instead of single-click since changing region might not always be desired.
self._next_id = 0
for chan in self.all:
chan['id'] = np.arange(self._next_id, self._next_id + len(chan['time']))
self._next_id += len(chan['time'])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I already ask for it but. I don't get the 'id' goal.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ids are needed so that when get_chunk* returns a subset of the epochs, the on_rect_clicked and on_rect_doubleclicked can still uniquely identify the epoch. This allows the appropriate table row to be selected, and allows look-up of the start and end of that rectangle for self.region.setRegion.

See all this post.


assert self.all[0]['time'].dtype.kind=='f'
assert self.all[0]['duration'].dtype.kind=='f'
assert np.all((self.times[:-1]+self.durations[:-1])<=self.times[1:])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this could be kept and test or not with an optional karg like assert_epoch_not_overlap_at_load=True

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, we could add that.

@property
def id_to_ind(self):
return dict((id,ind) for ind,id in enumerate(self.ep_ids))

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK for all this setter with properties but if your final goal is multi channel EpochEncoder this will be useless no.
because each channel will in self.all[0]['label'], self.all[1]['label'], self.all[N]['label'] and so setter will be functions at the end.
Am I missing somethign ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. The multi-channel idea I described here did not come to me until these properties were already in place. If we implement true multi-channel, these properties will need to change or be removed. I haven't started working on multi-channel since I wanted to stop making changes before you had a chance to review the existing changes.

keep3 = (ep_times<=t_start) & (ep_times+ep_durations>t_stop) # epochs that span the range
keep = keep1 | keep2 | keep3

return ep_times[keep], ep_durations[keep], ep_labels[keep], ep_ids[keep]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use the parent function ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because get_chunk* in WritableEpochSource needed ids, but ids were not needed in InMemoryEpochSource. In fact, I originally put ids in InMemoryEpochSource and just called the parent function in WritableEpochSource as you suggest, but this broke some things, as I described in 6d5987d's commit message

elif ep_times[i]<t1 and (t1<ep_stops[i]<t2):
#~ print 'b'
# if epoch starts and ends inside range, delete it
if ep_times[i]>=t1 and ep_stops[i]<=t2:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ep_stops[i]<=t2 became ep_stops[i]<t2.

I don't remember why it was important for me.
The general python approach is than the left limit is included (>=) and the right limit is excluded (<).

pandas apporach with .loc[t1:t2] with float index this your one. (right limit included).

Did you choose that intionally ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it was intentional and necessary to fix a bug that existed before I began this PR. Epochs would fail to delete with the region deletion tool if they started within the selected region and ended exactly at its right boundary. None of the if clauses below this would catch that case. This case was probably not noticed before rounding of times to the nearest microsecond was added simply due to small numerical inaccuracy in floating point numbers.

@samuelgarcia
Copy link
Collaborator

Something else. I don't understand exactly the behavior of "split_epoch".
You need to select in the table and then split, isit ?

Coul we have a split on cursor and the epoch would be detected magically ?




class CsvEpochSource(WritableEpochSource):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK for me.

I will add my ExcelEpochSource here after this PR because it is basically the same except:
df = pd.read_csv(self.filename, index_col=None) >>> df = pd.read_excel(self.filename, index_col=None).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds great!

@jpgill86
Copy link
Collaborator Author

Thank you for taking the time to review this!

briefly, what is the main purpose for this ids in epoch ?

When I added the feature to click a rectangle to select that epoch in the table, I needed a way to match rectangles to table rows. Since get_chunk* returns only a subset of the epochs and not the full list, the EpochEncoder doesn't know which epoch a rectangle belongs to. It just gets a short list of epochs that it should plot in the visible time period. These could be near the beginning of the epoch list, or near the end. The EpochEncoder doesn't know! So when it comes time to add signal connections to the rectangles so that, when clicked, the appropriate table row is selected, I needed a unique and accurate identifier for that epoch.

is the "split" when epoch overlap is done carfully ?

I tried very hard to do split carefully! I hope I succeeded.

@jpgill86
Copy link
Collaborator Author

Something else. I don't understand exactly the behavior of "split_epoch".
You need to select in the table and then split, isit ?
Coul we have a split on cursor and the epoch would be detected magically ?

Splitting by clicking with the mouse cursor (perhaps right-clicking, or clicking with a key modifier) is a great idea! Why didn't I think of that? (And it wouldn't use magic, it would need to use ids, haha!) But I think the existing functionality should be retained for high-precision splitting (splitting at exact moments in time).

Yes, in it's current form, you need to select in the table first. I chose to do it this way because I wanted to allow for splitting specific epochs that overlap with others that the user may not want to split. If the time marker (vline) is positioned inside two overlapping epochs and the user wants to split just one of them, automatic detection of the user's intention is not possible. Although splitting all epochs that intersect the time marker might be a desirable behavior, I did not implement it this way since I wanted more control. In it's current form, splitting multiple epochs at the same location is easy to do if the user (1) clicks on a rectangle under the cursor to select the matching table row, (2) clicks the "Split" button, and (3) repeats for each epoch under the time marker that he or she wants to split.

@jpgill86
Copy link
Collaborator Author

I think I've replied to all of your comments and questions so far. Let me know if things are still confusing or if I missed any. I understand that this is a lot to review!

@jpgill86
Copy link
Collaborator Author

Based on your comments, I added a few items to the list of goals in #16.

@samuelgarcia
Copy link
Collaborator

OK.
Thank you for answering every. It is now more clear.
We or not far from a merge if you want.

2 other ideas:

I am wondering something. It is just a proposal.
Maybe it could be good to have this exclusive_mode not at params level in the EpochEncoder.
Because the user can change it whenever he wants.
It maybe would be easier to have it at EpochEncoder.init or even at source level.
So this params could not be changed on the fly.
In that case we could be easier to select the good method in get_chunk (searsorted or mask approach).
In we can force or he test for overlapping.

For the split mode. cursor did not meant with the mouse cursor but like you did but without selecting the rectangle. Maybe it could be usefull to add a key EpochEncoder.params like force_selection_before_split=True (what you did) force_selection_before_split=False cut everything at this time. Or course we would have to chosse a better keyword (easy for an english speaker).

@jpgill86
Copy link
Collaborator Author

Thanks @samuelgarcia for the write access privileges!

Sorry I've been AWOL for the last several days, I had to focus on other projects.

I plan to take a close look at all your recent work on ephyviewer tomorrow.

@samuelgarcia
Copy link
Collaborator

samuelgarcia commented Sep 25, 2018

Welcome in the small council.

@jpgill86
Copy link
Collaborator Author

I am wondering something. It is just a proposal.
Maybe it could be good to have this exclusive_mode not at params level in the EpochEncoder.
Because the user can change it whenever he wants.
It maybe would be easier to have it at EpochEncoder.init or even at source level.
So this params could not be changed on the fly.
In that case we could be easier to select the good method in get_chunk (searsorted or mask approach).
In we can force or he test for overlapping.

I understand the motivation to use the more efficient search algorithm, but I think there may be such a thing as unnecessary over-optimization, haha. I tried pushing the masking method to its limits using this script:

import numpy as np
import ephyviewer

app = ephyviewer.mkQApp()
win = ephyviewer.MainViewer()

n = 1000000
epoch_source = ephyviewer.InMemoryEpochSource([{
    'time':     np.arange(n),
    'duration': np.array([0.9] * n),
    'label':    np.array(['']*n),
    'name':     ''
}])

epoch_viewer = ephyviewer.EpochViewer(source = epoch_source, name = 'epochs')
win.add_view(epoch_viewer)

win.set_xsize(30)
win.show()
app.exec_()

It wasn't until there were on the order of 1,000,000 epochs loaded in memory that the masking method became noticeably slower during random seeking (dragging the time slider back and forth) compared to the old searchsorted method on my computer. Of course, on slower systems or when many other data are loaded into ephyviewer, speed reduction due to the slower masking method might become an issue, but it seems unlikely given how large that number is.

I'd like to eventually implement the "true multi-channel" described here. This would allow us to strictly enforce the no-overlaps rule within individual channels, and it would permit the use of the better searchsorted method.

Nevertheless, if you still have concerns about performance with this PR in its current form, I think your suggestions are good for addressing them, and I could take a stab at making the changes. Let me know.

For the split mode. cursor did not meant with the mouse cursor but like you did but without selecting the rectangle. Maybe it could be usefull to add a key EpochEncoder.params like force_selection_before_split=True (what you did) force_selection_before_split=False cut everything at this time. Or course we would have to chosse a better keyword (easy for an english speaker).

Adding the ability to choose how splitting works seems reasonable. How about this as an alternative approach: I could rename the "Split" button to "Split selected epoch", and I could add another button called "Split all epochs under cursor". Would that be acceptable?

@samuelgarcia
Copy link
Collaborator

OK.Let's forget the performencee for the moment.
Could you keep commented in the code for memory the "searchsorted" thing.

I still think that exclusive_mode shoudl be in _init and not in params because in params this can be changed on th fly and will make complicated the behavior.

OK for the split porposal.

This is great, we will have a world class tools. Thank you very much.

@jpgill86
Copy link
Collaborator Author

Perhaps this proposal in our other thread would address your concerns about exclusive_mode and the searchsorted algorithm?

@jpgill86
Copy link
Collaborator Author

jpgill86 commented Apr 8, 2019

Someday I will need to revisit all this and clean it up... but for now, I'm pushing a couple fixes from my fork!

@samuelgarcia
Copy link
Collaborator

Hi Jeffrey.
happy to see again here.
I also been very busy since this summer.
I don't remember why I did not merge this at that time.
I propose to have a look again at theses 2 PR and merge this soon.
So if you need improvement you will be able to add small piece directly into teh master.
Best

@jpgill86
Copy link
Collaborator Author

jpgill86 commented Apr 9, 2019

Hi Sam, good to hear from you! Yes, I've noticed that the Neo projects have been very busy lately with overhauls to annotations and fixes to the Blackrock IO. My inbox has been stuffed, haha!

As I recall, you suggested some alternative implementations for the features proposed here, especially relating to the new optional feature of overlapping epochs and the internal representation of multiple epoch channels. I said in #16 that I liked your suggestions and had a desire to implement them, but it would take a lot of rewriting, and I thought I might just start a new pull request. As it turned out, I didn't have much time to work on this (surprise!), so I haven't begun that rewrite.

As it is now, I use the new features in this PR regularly and find them very useful. If you think that merging soon and making incremental improvements later makes sense, that's fine with me. On the other hand, perhaps if you look closely again you will remember what you didn't like!

@jpgill86
Copy link
Collaborator Author

jpgill86 commented Apr 9, 2019

In case those 2 pull requests weren't enough, I gave you 4 more. Don't worry, they are much smaller. 😜

You gave me write-access to the repo back in September. I can merge the new pull requests myself if you think they look sane.

@samuelgarcia
Copy link
Collaborator

I guess this ultimate and very important commit is hiden message for merging ?
I don't remember totallyy the story nor I have time to read this PR again but I trust you.
So can I merge ?

@jpgill86
Copy link
Collaborator Author

Haha, I wasn't trying to steal your attention, I'm just polishing some things because I'm beginning to train some students to use the software.

I think you could merge now, and some of the things that we discussed last fall could be implemented later. I will have to read through these threads again to recall the details, but I think you had some good suggestions I wanted to try. However, I think they could be implemented as changes later, on top of these changes, rather than requiring a total rewrite before merging anything.

@samuelgarcia
Copy link
Collaborator

héhé.
thank you very much for this.
lets go.

@samuelgarcia samuelgarcia merged commit a3a07b8 into NeuralEnsemble:master May 16, 2019
@jpgill86 jpgill86 deleted the epoch-encoder branch May 16, 2019 20:01
@jpgill86 jpgill86 added this to the 1.1.0 milestone Jul 14, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants