Django debates user tracking

Posted Dec 1, 2016 9:19 UTC (Thu) by misc (subscriber, #73730)
Parent article: Django debates user tracking

That's interesting, because I was discussing this with Ralph Bean (of Fedora fame) on the bus to Devconf.cz back in February 2016, and we ended up exploring an approach to do the counting based on DNS, a public/private key scheme and some kind of public log. I.e., clients use public-key crypto to encrypt the payload, convert it to something that fits in a DNS name, and make a request to a central shared DNS domain, which publishes its log.

In fact, we first wanted to do that over a Tor hidden service (now called onion services), but this would trigger too many IDS alerts, and would likely be blocked in some countries.

We need a specific domain and a DNS server. Each thing we want to count would trigger a blob (JSON-encoded or whatever) to be sent over DNS (in the host part of the name), encrypted with a public/private key scheme.
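A minimal sketch of the "blob in the host part" step, assuming the payload has already been encrypted elsewhere. The function names and the zone count.example.org are made up for illustration, and base32 is just one plausible encoding; the only hard facts used are the DNS limits of 63 characters per label and 253 characters per name in text form.

```python
import base64

MAX_LABEL = 63   # longest single DNS label
MAX_NAME = 253   # longest full name in dotted text form


def blob_to_hostname(ciphertext: bytes, zone: str) -> str:
    """Encode an already-encrypted blob as a hostname under `zone` (hypothetical helper)."""
    if not ciphertext:
        raise ValueError("empty payload")
    # Base32 only uses letters and the digits 2-7, so it survives
    # case-insensitive DNS handling; padding is dropped and restored later.
    text = base64.b32encode(ciphertext).decode("ascii").rstrip("=").lower()
    labels = [text[i:i + MAX_LABEL] for i in range(0, len(text), MAX_LABEL)]
    name = ".".join(labels + [zone])
    if len(name) > MAX_NAME:
        raise ValueError("payload too large for a single DNS name")
    return name


def hostname_to_blob(name: str, zone: str) -> bytes:
    """Reverse of blob_to_hostname, as the log reader would run it."""
    suffix = "." + zone
    if not name.endswith(suffix):
        raise ValueError("name is not under the counting zone")
    text = name[:-len(suffix)].replace(".", "").upper()
    text += "=" * (-len(text) % 8)   # restore the base32 padding
    return base64.b32decode(text)
```

With base32 overhead and the zone suffix, this leaves room for roughly 140 bytes of ciphertext per query, which is one concrete form of the "DNS limit" problem: the public-key scheme has to produce small ciphertexts, or the payload has to be split.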

The tracking party never sees the IP, because the request is relayed by the DNS resolver at the ISP level (or Google DNS, or any other public resolver), which hides it. The log of DNS requests is meant to be public, hence the encryption. Given the limited value of the data over time, we think this should be safe enough. And because we envisioned the system being shared (long term) among projects that wish to count users, seeing something make a request to that DNS domain wouldn't reveal anything to anyone, since it could come from homebrew or django or whoever uses it. The idea is that someone wishing to count would take all the strings, attempt to decrypt each with their key and see whether it yields something useful (i.e., proper JSON if the payload is JSON, etc.), and discard all the rest as "not for me".
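The "try my key on everything, keep what parses" step could look roughly like this. It is only a sketch: the `decrypt` callable stands in for whatever public-key scheme a project picks, `entries_for_me` is a made-up name, and the payload is assumed to be a JSON object.

```python
import json
from typing import Callable, Iterable, Iterator


def entries_for_me(blobs: Iterable[bytes],
                   decrypt: Callable[[bytes], bytes]) -> Iterator[dict]:
    """Yield the payloads from the shared log that decrypt and parse with our key."""
    for blob in blobs:
        try:
            # `decrypt` is project-specific (an assumption here); it may raise
            # or return garbage for entries meant for someone else.
            payload = json.loads(decrypt(blob))
        except Exception:
            continue  # "not for me"
        if isinstance(payload, dict):
            yield payload
```

With an authenticated encryption scheme, the decrypt call itself would already fail on other projects' entries, and the JSON check would only be a second line of defence.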

And the central DNS server would just purge data on a regular basis. This doesn't prevent someone from making a copy, of course, but it does prevent opportunistic attacks in case of key compromise.

But we didn't publish the notes of our discussion. We did discuss it with Remy Decausemaker right after the bus ride, but Ralph changed roles inside the company and Remy left RH, so this didn't go anywhere.

There are a few shortcomings in the proposal, such as "handling the infra for that", the crypto and security "details" (such as getting them right), and not hitting DNS limits. And there is the usual downside of counting downloads, as I tried to explain in the past in http://community.redhat.com/blog/2015/09/lies-damned-lies...

And all the things that 2 engineers in a post-FOSDEM week at the back of a Czech bus for 3h couldn't think of, of course.

But I truly think that something can be done, and so far, this proposal would prevent identifying users by tracking their IP, using various tricks inspired by Tor. The infra could be operated by the LF or any other org. And there are a bunch of DNS resolvers that do not track people ( https://diyisp.org/dokuwiki/doku.php?id=technical:dnsreso... , https://servers.opennicproject.org/ ), some run by non-profits in different jurisdictions to avoid the issue of "governments asking for broad access".

Then the main remaining issue is UUID handling, which is central to getting a proper count and to telling CI jobs apart from long-term usage, etc.

I also suspect there are things to do with homomorphic encryption, but I am not knowledgeable enough to tell what :)

(and the proposal only focuses on "getting info into a database"; the whole "prepare a proper UI" part is left as an exercise for the user)