add section on Event Labeling Queries #70

bmcase · 2023-05-08T15:41:57Z

There has been considerable interest in supporting event level outputs in IPA (IPA issue 60, PATCG issue 41). This was discussed at the May 2023 PATCG meeting where the consensus was we could support this so long as we can enforce a per user bound on the information released.

Here we outline a first draft of how an Event Labeling Query that labels events with noisy labels can be done in a way that lets us maintain a per matchkey DP bound. We also consider how these new queries can be compatible with an IPA system that flexibly supports either aggregation queries or event labeling queries.

w/ @benjaminsavage

csharrison · 2023-05-08T23:49:46Z

A few quick notes:

For dealing with users with multiple contributions I think it will probably be better to cap the sensitivity than to scale the noise dynamically.
Supporting binary labels is good but ideally we would support enum labels / value bucket labels (Add a mechanism to support breakdowns by trigger data #55, Consider a randomized-response-like privacy mechanism #60)

cc @badih

bmcase · 2023-05-09T18:23:23Z

@csharrison

For dealing with users with multiple contributions I think it will probably be better to cap the sensitivity than to scale the noise dynamically.

Yeah, that would be a nice option. I actually started out trying to do it that way, but ran into some issues. Once you pass the cap you have to still return something for that source row. For rows that exceeded the cap you can't give back a "not labeled" indicator or that can leak information. I think you could try to return just a random label, maybe by setting to 0 and then flipping with the same flip probability as other rows, but it wasn't immediately clear to me that this also wouldn't increase the DP bound on how much information was revealed about that user. It also wasn't clear to me if random labels on some events would be worst for utility than scaling all the noise.

I think we can probably make it work, but like I said this was just a quick first draft to get started.

csharrison · 2023-05-09T21:30:32Z

I think you could try to return just a random label, maybe by setting to 0 and then flipping with the same flip probability as other rows, but it wasn't immediately clear to me that this also wouldn't increase the DP bound on how much information was revealed about that user.

I think this should be fine for a DP bound and the argument will look really similar to sensitivity capping + Laplace with per-event queries. Consider two databases D and D' that are adjacent. After rate-limiting to sensitivity $s$, a users contribution will look like $(x_1, x_2, ... x_n)$ and $(x_1', x_2', ..., x_n')$ for n events, where the value at any $x_i$ is the label. After capping, these vectors will differ on at most $s$ columns¹. If you consider an output of $(y_1, ..., y_n)$, then we have

$$\frac{Pr[y_1,...,y_n | x'_1,...x'_n]}{Pr[y_1,...,y_n | x_1,...x_n]} = \prod_{i=1}^{n}{\frac{Pr[y_i | x'_i]}{Pr[y_i | x_i]}} = \prod_{i \in differing\_rows}{\frac{Pr[y_i | x'_i]}{Pr[y_i | x_i]}}$$

That is, we can analyze each of the $s$ differing columns independently, and if you can show that the mechanism on a single column is bounded by $e^\epsilon$ it implies the whole mechanism is $s \epsilon$-differentially private.

This is with an add-remove notion of DP that will just strip conversions for a converting user or add fake conversions for a non-converting user. With a "replacement" notion of DP that merely modifies the conversion patterns for a user the rows will differ on $2s$ values and everything goes through with a factor of 2 in the bound. ↩

martinthomson · 2023-05-09T22:35:10Z

GitHub fail:

Shame that I'm not smart enough to parse that out myself.

bmcase · 2023-05-17T17:52:56Z

Updated the doc to include the version that caps the number of events labeled per matchkey.

Benjamin Case added 3 commits May 8, 2023 11:26

Event Labeling Queries, first draft

e881f3a

Event Labeling Queries, first draft

64b3902

update Table of Contents

a91d2c2

bmcase mentioned this pull request May 8, 2023

IPA Biweekly Meeting Agenda Request - Event Labeling Queries with per matchkey DP bound #71

Closed

bmcase force-pushed the main branch from 121bc6b to a91d2c2 Compare May 9, 2023 21:27

capping version for Event Labeling Queries

612cf5c

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

add section on Event Labeling Queries #70

add section on Event Labeling Queries #70

bmcase commented May 8, 2023

csharrison commented May 8, 2023

bmcase commented May 9, 2023 •

edited

Loading

csharrison commented May 9, 2023

martinthomson commented May 9, 2023

bmcase commented May 17, 2023 •

edited

Loading

add section on Event Labeling Queries #70

Are you sure you want to change the base?

add section on Event Labeling Queries #70

Conversation

bmcase commented May 8, 2023

csharrison commented May 8, 2023

bmcase commented May 9, 2023 • edited Loading

csharrison commented May 9, 2023

Footnotes

martinthomson commented May 9, 2023

bmcase commented May 17, 2023 • edited Loading

bmcase commented May 9, 2023 •

edited

Loading

bmcase commented May 17, 2023 •

edited

Loading