define a standardized, serializable representation for modality-agnostic "montage descriptions" #114
Comments
Agree... it is not reasonable to build a "library" of montages. Sometimes custom montages are required (a common example is when a lead switch is known to have occurred, in which case the montage would need to substitute the leads as appropriate). Perhaps a montage description/definition in which a set of reference channels is defined (system ref, avg ref, A1, A2, etc.) and then channel defs as input#-ref#. For example, define the references to be ref (the recording reference), avgref (the average of all EEG channels), A2 (right ear), and [Fp2 + F7 + F3] + 0.5 * [T3 + C3 + F8] as localLaplacianAverage_fp1. Then I want a montage that has Fp1-ref, Fp1-avgref, Fp1-A2, and Fp1-localLaplacianAverage_fp1.
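To make the shape of such a description concrete, here is a hypothetical sketch as a Python/JSON-style literal. The schema, key names, and operator spellings are all invented for illustration; this is not a proposal for the actual format.

```python
# Hypothetical schema (all key names invented for illustration): a montage
# description as a set of named references plus derived channel definitions.
montage_description = {
    "references": {
        "ref": "recording_reference",            # the system/recording reference
        "avgref": {"mean": "all_eeg_channels"},  # average reference
        "a2": "right_ear",
        # [Fp2 + F7 + F3] + 0.5 * [T3 + C3 + F8], as in the comment above:
        "local_laplacian_average_fp1": {
            "+": [
                {"sum": ["fp2", "f7", "f3"]},
                {"*": [0.5, {"sum": ["t3", "c3", "f8"]}]},
            ]
        },
    },
    "channels": [  # each derived channel is an input minus a named reference
        {"-": ["fp1", "ref"]},
        {"-": ["fp1", "avgref"]},
        {"-": ["fp1", "a2"]},
        {"-": ["fp1", "local_laplacian_average_fp1"]},
    ],
}
```

Because it is plain JSON-shaped data, a description like this could be stored, diffed, and shipped between actors without any code.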
Got a start on this while on the plane from SFO to JFK yesterday!

Intro

First, let's define a "montage" in Onda-land to be some specification of a transformation of the form:
We want Onda montages to be primarily concerned with operations over collections of channels, including realiasing, pattern matching, grouping/stacking/splitting, and numerical broadcast/reduction operations. They may be notably unconcerned with:
If we canonicalize a (de)serializable DSL representation for Onda montages, then Onda.jl (and others) can implement resolver functions like:

# executes the transformation specified by `montage` given the input data
execute(montage,
        input_multichannel_sample_data_1,
        input_multichannel_sample_data_2,
        ...)

This would basically require implementing an interpreter for the montage DSL. If we implemented an abstract interpreter (or made sure our interpreter could be run in an abstract interpretation mode), we could use the exact same DSL to do "data-less" transformation/matching, e.g.:

# returns the derived output channel names
evaluate(montage,
         input_channel_names_1,
         input_channel_names_2,
         ...)

(Note that in this approach, like with regex, an empty output channel set would mean that the input didn't "match" the montage.)

It's worth calling out that Julia is a pretty nice language for implementing these kinds of interpreters - you basically just parse content into AST nodes represented via Julia types, then you can use multiple dispatch to your heart's content to actually drive the interpretation.

Simple JSON-based DSL

Here's an extremely rough sketch of a DSL for representing montages staged atop JSON, with AST nodes represented by single-element objects:
Examples

Return all channels of the input minus the average of channels a, b, and c:

[
{"$channels":{"get":["$1",{"any":["a","b","c"]}]}},
{"$avg":{"mean":"$channels"}},
[{"-":["$1","$avg"]}]
]

Return the dynamic range of the first signal minus the dynamic range of the second, stacked on the magnitude of the difference of the two signals' means:

[
{"$range1":{"-":[{"max":"$1"},{"min":"$1"}]}},
{"$range2":{"-":[{"max":"$2"},{"min":"$2"}]}},
{"$diff1":{"-":["$range1","$range2"]}},
{"$diff2":{"abs":{"-":[{"mean":"$1"},{"mean":"$2"}]}}},
{"return": [{"stack":["$diff1","$diff2"]}]}
]

Restructure three signals into two signals:

[
{"$x": [
{"alias":["x1", {"get":["$1", "a"]}]},
{"alias":["x2", {"get":["$2", "a"]}]},
{"alias":["x3", {"get":["$3", "a"]}]}
]},
{"$y": [
{"alias":["y1", {"get":["$1", "b"]}]},
{"alias":["y2", {"get":["$2", "b"]}]},
{"alias":["y3", {"get":["$3", "b"]}]}
]},
{"return": ["$x", "$y"]}
]

Same DSL, Julia-Flavored Syntax

We can stage the same language upon (a very limited subset of) Julia syntax instead of JSON.

Pros:

Cons:

Below are the same examples from the JSON DSL, written in Julia-flavored syntax instead (uses IN[i] to refer to the i-th input).

Return all channels of the input minus the average of channels a, b, and c:

channels = IN[1][any("a", "b", "c")];
avg = mean(channels);
return [IN[1] - avg];

Return the dynamic range of the first signal minus the dynamic range of the second, stacked on the magnitude of the difference of the two signals' means:

range1 = max(IN[1]) - min(IN[1]);
range2 = max(IN[2]) - min(IN[2]);
diff1 = range1 - range2;
diff2 = abs(mean(IN[1]) - mean(IN[2]));
return [stack(diff1, diff2)];

Restructure three signals into two signals:

x = ["x1" => IN[1]["a"],
"x2" => IN[2]["a"],
"x3" => IN[3]["a"]];
y = ["y1" => IN[1]["b"],
"y2" => IN[2]["b"],
"y3" => IN[3]["b"]];
return [x, y];

Channel Name Propagation/Generation

Our DSL enables authors to realias channel names whenever they want, but what if they don't? What are the names of the output channels? To answer this question, we should define semantics for tracing/accumulating/propagating intermediate channel names at each operation of our DSL. Take our earlier example:

[
{"$channels":{"get":["$1",{"any":["a","b","c"]}]}},
{"$avg":{"mean":"$channels"}},
[{"-":["$1","$avg"]}]
]

What should the output channel names be, given the input channels? IMO it'd make sense for the semantics to follow the bindings used by the author, in which case we might have to add some notion to drive when we want bindings to inline into channel names, or maybe those semantics can be hardcoded into our statement/broadcast/reduction operations. You could imagine defining these semantics so that the answer to the above is obvious.
This also implies that we might need to change Onda's validation criteria.

Another thing to consider is cross-referencing across signals, like:

[
[{"-":["$1","$2"]}]
]

From the Onda specification:

This would require some thought to get right / avoid too much magic.
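As a rough illustration of the interpreter + abstract-interpretation idea from the comment above, here is a hypothetical Python sketch that evaluates the first example "data-lessly", with lists of channel names as values. All of the dispatch and name-generation semantics here are invented for illustration, not part of any spec.

```python
# Hypothetical sketch: a tiny interpreter for single-element-object AST nodes,
# run in "data-less" mode where values are lists of channel names.

def interpret(node, env):
    """Evaluate one AST node against an environment of `$name` bindings."""
    if isinstance(node, str):
        return env[node] if node.startswith("$") else node
    (op, args), = node.items()             # single-element object = one node
    if op.startswith("$"):                 # binding statement: {"$name": expr}
        env[op] = interpret(args, env)
        return env[op]
    if not isinstance(args, list):
        args = [args]
    args = [interpret(a, env) for a in args]
    if op == "any":                        # pattern: a set of allowed names
        return args
    if op == "get":                        # select matching channels of a signal
        signal, pattern = args
        if not isinstance(pattern, list):
            pattern = [pattern]
        return [ch for ch in signal if ch in pattern]
    if op == "mean":                       # reduction: many channels -> one name
        return ["mean(" + ",".join(args[0]) + ")"]
    if op == "-":                          # broadcast lhs channels over rhs
        lhs, rhs = args
        return [f"{l}-{r}" for l in lhs for r in rhs]
    raise ValueError(f"unknown op: {op}")

def evaluate(montage, *inputs):
    """Return the derived output channel names for the given input names."""
    env = {f"${i}": list(names) for i, names in enumerate(inputs, start=1)}
    result = []
    for statement in montage:
        if isinstance(statement, list):    # return-style statement
            result = [n for expr in statement for n in interpret(expr, env)]
        else:
            interpret(statement, env)
    return result

# The first example montage: all channels minus the average of a, b, and c.
montage = [
    {"$channels": {"get": ["$1", {"any": ["a", "b", "c"]}]}},
    {"$avg": {"mean": "$channels"}},
    [{"-": ["$1", "$avg"]}],
]
```

With input channels a, b, c, d, `evaluate(montage, ["a", "b", "c", "d"])` produces one derived name per input channel; an empty result would signal a non-match, as with regex.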
I should point out that one of my prerequisite action items for the above design brainstorm was to do a bit of review and see how other systems handle this (e.g. for EEG remontaging/specification - I know lots of other systems have some form of custom montage specification). However, my plane's WiFi was busted so I couldn't search for anything, so I dove ahead 😁 Would still be good to do a bit of review here, though.
My take on montages is SUPER simple. Wrote it out here - hopefully you can read my writing. I can define a transform matrix for all the montages I use... though you have possibly suggested use cases where I could not (though I'm not sure about that). Happy to meet... about to set up a discussion with Dave and Mike...
Dope! Thanks, that makes sense.

I think the "essence" of the DSL-based approach is to enable doing these transforms in a "basis-agnostic" way from a linear algebra perspective, since in Onda the basis is data-driven (by sets of channels), not fixed by convention - hence the functional form vs. the matrix form. It might be convenient to include a matrix representation as well, though.

I guess you could also establish some convention like "I don't know what the basis is, but I do always sort the channel names lexicographically," in which case you could represent canonical transforms as slices of the same canonical (infinite-dimensional but countable) matrix.

That makes me realize we could also separate this out into two things. That actually might be cleaner, but maybe less flexible? Worth thinking about.
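For concreteness, the matrix form under a "sort channel names lexicographically" convention could look like this hypothetical Python sketch (pure Python, no dependencies; the function name and example montage are invented):

```python
# Hypothetical sketch: once channel order is fixed by a convention (here:
# lexicographically sorted names), a montage is just a matrix applied to the
# samples, with rows of the matrix defining the output channels.

def apply_montage_matrix(matrix, channel_names, samples):
    """Apply `matrix` (one row per output channel) to `samples`, whose rows
    are first re-ordered to match lexicographically sorted `channel_names`."""
    order = sorted(range(len(channel_names)), key=lambda i: channel_names[i])
    rows = [samples[i] for i in order]   # rows now follow the canonical basis
    return [
        [sum(w * x for w, x in zip(weights, col)) for col in zip(*rows)]
        for weights in matrix
    ]

# Example: average-reference two channels c3 and c4, i.e. each output is
# channel - mean(all channels), which is the matrix [[0.5, -0.5], [-0.5, 0.5]]
# in the sorted (c3, c4) basis.
names = ["c4", "c3"]                     # arbitrary input order
data = [[4.0, 8.0],                      # c4 samples
        [2.0, 4.0]]                      # c3 samples
out = apply_montage_matrix([[0.5, -0.5], [-0.5, 0.5]], names, data)
```

The sort step is what makes the matrix reusable across recordings whose channels arrive in different orders, at the cost of not handling missing/extra channels the way the functional form's pattern matching can.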
The matrix form seems easy to implement efficiently, which would be a great pro. But it doesn't seem to cover the entire scope of what we want from our transformation system (e.g. human readability, filtering).

For the JSON vs. Julia debate, I want to bring up two more general concerns:

Performance

It wasn't easy to optimize streams montaging performance. Small slip-ups like allocating one temporary array were already noticeable in the latency (especially for multiple users), so to be usable in streams we mostly need the performance of handwritten code. So, we should make sure that the specification won't make us invest months into implementing a difficult map/reduce transformation system.

Security

I think we do need to accept pretty arbitrary specifications from the web, although we should be able to restrict that to authorized users, which should decrease the attack surface quite a bit. But when choosing the parser and the implementation, we should definitely keep an eye out for security concerns.
Yeah, that's a good idea. Worth calling out that "optimal" will mean different things to different users/use cases (e.g. latency vs. throughput). I think once we settle on / publish the spec, we should include a very basic montage resolver in Onda and maybe publish a more-dependency-heavy-but-more-performant resolver (backed by e.g. Dagger) in a separate package
I'm not sure about the relevance of this 🤔 Do you mean a situation where an attacker (or, always more likely, a well-intentioned caller making some mistake 😁) accidentally sent some huge transformation description to some resolver function/service that would overload its capacity? I'd imagine that's pretty easily mitigated by the resolver just having some basic statement/complexity limit or something (keep in mind this thread is public / in OSS land, so there might not be as much context for internal Beacon projects 🙂).
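The "basic statement/complexity limit" mitigation mentioned above could be as simple as counting AST nodes before interpreting. A hypothetical Python sketch (function names and the limit are invented for illustration):

```python
# Hypothetical sketch: reject an untrusted montage description before
# interpretation if its parsed JSON-shaped AST exceeds a node budget.

def node_count(node):
    """Count nodes in a JSON-shaped montage AST (dicts, lists, and leaves)."""
    if isinstance(node, dict):
        return 1 + sum(node_count(v) for v in node.values())
    if isinstance(node, list):
        return 1 + sum(node_count(v) for v in node)
    return 1

def check_complexity(montage, max_nodes=1000):
    """Raise if `montage` is larger than the resolver is willing to execute."""
    n = node_count(montage)
    if n > max_nodes:
        raise ValueError(f"montage too complex: {n} nodes > limit {max_nodes}")
    return montage
```

A resolver service would run this check right after parsing, before any interpretation work; recursion depth and output-channel counts could be budgeted the same way.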
Yes, me neither! DSLs and parsing always ring some alarm bells for me, so I just wanted to quickly bring it up in case there are some non-obvious implications :)
We've talked about this for years internally at Beacon, but have never actually gotten around to implementing it.
At Beacon, we have libraries, applications, and distributed services that all perform certain kinds of re-referencing/montaging operations. It's thus pretty common for one actor to want to describe a montage to be fulfilled/resolved/computed by another actor, and for actors to reuse each other's montage descriptions for consistency purposes. Examples:
For the sake of interoperability between such actors (especially across language environments and service/package API boundaries), it'd be useful/appropriate for Onda to define a generic specification for a serializable representation of a "montage description". It should probably be defined in JSON (which is encodable in Arrow) to enable portability/applicability across many language environments.
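Because the representation would be plain JSON, passing a montage description between actors reduces to an ordinary serialization round-trip; a minimal sketch (the montage shown is an invented example):

```python
import json

# One actor authors a montage description as JSON-shaped data...
montage = [{"$avg": {"mean": "$1"}}, [{"-": ["$1", "$avg"]}]]

# ...serializes it to send over a service/package API boundary...
payload = json.dumps(montage)

# ...and the receiving actor recovers an identical description to resolve.
received = json.loads(payload)
```

The same string could be stored in an Arrow column alongside the data it describes.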
It would need to support common arithmetic broadcast operations as well as reduction operations. We should clearly define what "resolvers" need to support. We probably want to also ensure the standard representation is extensible with user-defined operations to support modality-specific operations (e.g., filtering).
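One hedged sketch of the extensibility point: a resolver could keep a registry of operations, so user-defined, modality-specific ops (like filtering) register alongside the core arithmetic/reduction ones. All names here are invented for illustration.

```python
# Hypothetical sketch: an operation registry that resolvers dispatch on, with
# user-defined operations registered the same way as the built-in ones.

OPERATIONS = {}

def operation(name):
    """Decorator registering an implementation for a DSL operation name."""
    def register(fn):
        OPERATIONS[name] = fn
        return fn
    return register

@operation("-")
def subtract(a, b):
    # broadcast arithmetic over per-channel sample lists
    return [x - y for x, y in zip(a, b)]

@operation("mean")
def mean(channels):
    # reduction: average across channels, per sample
    return [sum(samples) / len(samples) for samples in zip(*channels)]

# A package could register a modality-specific op the same way, e.g. a
# (placeholder) highpass filter for EEG:
@operation("highpass")
def highpass(channel, cutoff):
    ...  # modality-specific implementation would go here

def apply_op(name, *args):
    """Look up and apply a registered operation."""
    return OPERATIONS[name](*args)
```

Unknown operation names would then fail loudly at resolution time, which also gives a natural place to negotiate which extensions a given resolver supports.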