This repository has been archived by the owner on Mar 16, 2023. It is now read-only.

Only train on pages on which document.interestCohort is called #33

Open
dmarti opened this issue Nov 25, 2020 · 5 comments

Comments

@dmarti
Contributor

dmarti commented Nov 25, 2020

In order to limit inadvertent sensitive group tagging, only train the FLoC classifier on the URL or content of pages on which document.interestCohort has been called. If the owner of a page wants to make it available for training but ignore the cohort, they can ignore the return value of the function.
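As a rough illustration of the rule proposed above, the sketch below filters a browsing history down to the pages that called the API. The `PageVisit` type, the `calledInterestCohort` flag, and the `eligibleForTraining` helper are hypothetical names invented for this example; they are not part of any FLoC specification or browser implementation.

```typescript
// Hypothetical record of a page visit as a browser might track it.
interface PageVisit {
  url: string;
  // Assumed flag: true if any script on the page called document.interestCohort.
  calledInterestCohort: boolean;
}

// Keep only the visits whose pages called the API; under the proposal,
// only these would be used as input to the cohort classifier.
function eligibleForTraining(history: PageVisit[]): PageVisit[] {
  return history.filter((visit) => visit.calledInterestCohort);
}

// Example history with made-up URLs.
const exampleHistory: PageVisit[] = [
  { url: "https://news.example/article", calledInterestCohort: true },
  { url: "https://game.example/play", calledInterestCohort: false },
  { url: "https://shop.example/cart", calledInterestCohort: true },
];

const trainingUrls: string[] = eligibleForTraining(exampleHistory).map(
  (visit) => visit.url
);
```

In this sketch the game site never calls the API, so its visits would not contribute to the classifier, whether or not the other two sites act on the cohort they receive.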

The web currently has more than 1.2 billion sites (including parked domains). It is impractical for even a large browser developer to test which usage patterns on which sites inadvertently reveal sensitive information about a user.

For example, web history on an ordinary-looking web-based game could result in providing inputs to the machine learning algorithm that train it to recognize a set of users with a specific disability that affects their gameplay, and expose that set of users as a cohort to any site they visit -- without revealing to the users affected that their cohort reveals this sensitive information to all those sites.

Patterns of usage of a set of general-interest world history or culture sites could result in training the algorithm to recognize people with specific political or religious concerns, again without revealing to the people affected that their cohort is flagging them as a likely member of a protected or at-risk group.

Many other patterns of emergent sensitive group tagging would likely become evident only after FLoC has been deployed to real-world users with real web histories.

Source: https://news.netcraft.com/archives/category/web-server-survey/

Related issue: Publisher opt-out? (Issue #13 covers an explicit opt-out that would remain in effect even if a script on the page later calls document.interestCohort)

@joshuakoran

The post above raises a good point about the web author (publisher) expectations as to the control they ought to have over the operations of their web business.

The post also raises a good point about people's expectations. To expand on this second issue, people have real concerns around the attributes associated with their web client being used to harm them (e.g., embarrass people with sensitive health conditions or to deny them health insurance -- "a set of users with a specific disability that affects their gameplay, and expose that set of users as a cohort to any site they visit").

As we look to improve the web for people, it would be good to emphasize how proposals are protecting people from harms such as these, since they are unrelated to the existence of cross-origin IDs and instead associated with how data (attributes) are used to match content to people.

As has been discussed at TPAC and IWA BG, many of these issues are policy matters rather than technical ones and hence we can do a better job of delineating how proposals are improving the web for which stakeholders and the (unintended) impact to other stakeholders that may result.

@jkarlin
Collaborator

jkarlin commented Nov 25, 2020

Thanks for opening the issue, @dmarti. We're actively considering whether FLoC should be opt-in or opt-out, and this is a likely opt-in scenario. I think we'd also need a top-level opt-out option, so that a page on which some third party used the interestCohort API wouldn't unknowingly or unwillingly be opted in. The question then becomes: what happens to the API call if the page is opted out? My preference is to return an empty string in that case.

@dmarti
Contributor Author

dmarti commented Nov 25, 2020

Yes, it makes sense that if the page is opted out and any script on the page calls interestCohort, it should get an empty string.

Sounds like an example of reciprocity -- if you want to use FLoC, you pay in to FLoC by allowing training. This would give some needed discretion to sites to use FLoC responsibly, so they could check with their own requirements (and update privacy policies if necessary) before turning it on.
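To make the combined behavior concrete, here is a minimal sketch of how the opt-in-by-calling rule and a top-level opt-out might interact. `PageState`, `interestCohortFor`, and `usedForTraining` are hypothetical names for illustration only, and modeling the cohort as a plain string is an assumption; the shape of the real API may differ.

```typescript
// Hypothetical per-page state tracked by the browser.
interface PageState {
  // Assumed top-level opt-out chosen by the page owner.
  optedOut: boolean;
  // Set once any script on the page calls the API.
  calledInterestCohort: boolean;
}

// Model of an interestCohort call: an opted-out page gets an empty string
// and is not marked as having called the API.
function interestCohortFor(page: PageState, userCohort: string): string {
  if (page.optedOut) {
    return "";
  }
  page.calledInterestCohort = true;
  return userCohort;
}

// Reciprocity: a page contributes to training only if it is not opted out
// and the API was actually called on it.
function usedForTraining(page: PageState): boolean {
  return !page.optedOut && page.calledInterestCohort;
}

const optedOutPage: PageState = { optedOut: true, calledInterestCohort: false };
const callingPage: PageState = { optedOut: false, calledInterestCohort: false };
const silentPage: PageState = { optedOut: false, calledInterestCohort: false };

const optedOutResult = interestCohortFor(optedOutPage, "cohort-123");
const callingResult = interestCohortFor(callingPage, "cohort-123");
```

Here the opted-out page receives an empty string and stays out of training even if a third-party script calls the API, the calling page receives the cohort and pays in, and the page that never calls is left alone.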

@ph00lt0

ph00lt0 commented Mar 7, 2021

The opt-in decision is not only the publisher's to make. It should be the end user's decision whether to allow evil tracking technologies. FLoC is in essence not anonymous; therefore it should not automatically opt users in by any means. Ignoring that would result in an undemocratic system.

@dmarti
Contributor Author

dmarti commented Mar 18, 2021

Issue #61 points out that a browser extension will be able to obtain the user's cohort by injecting a script that calls interestCohort. There are legitimate reasons for an extension to be able to get the cohort (see #17). However, an extension might inject such a script into a page that does not already call interestCohort and thereby opt the page into FLoC training.
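The concern can be sketched as follows. A naive rule that treats any call as an opt-in cannot tell a page's own script from an extension-injected one. The `Caller` type, the `CallLog` record, and both rule functions are hypothetical, and the second rule is only one possible way a browser could separate the two cases, not something proposed in this thread.

```typescript
// Hypothetical label for who issued an interestCohort call.
type Caller = "page" | "extension";

// Hypothetical per-page log of API calls.
interface CallLog {
  url: string;
  callers: Caller[];
}

// Naive rule: any call at all opts the page into training.
function naiveOptIn(log: CallLog): boolean {
  return log.callers.length > 0;
}

// Assumed alternative: only calls made by the page's own scripts count.
function pageOnlyOptIn(log: CallLog): boolean {
  return log.callers.some((caller) => caller === "page");
}

// A page that never calls the API itself, but has an extension script injected.
const injectedOnly: CallLog = {
  url: "https://game.example/play",
  callers: ["extension"],
};

const naiveResult = naiveOptIn(injectedOnly);
const pageOnlyResult = pageOnlyOptIn(injectedOnly);
```

Under the naive rule the extension's call opts the page in without the owner's involvement; under the assumed alternative the extension could still read the cohort without changing the page's training status.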


4 participants