
RedHat do great work

Posted Oct 12, 2006 21:43 UTC (Thu) by skvidal (subscriber, #3094)
In reply to: RedHat do great work by roelofs
Parent article: How many Fedora users are there?

<i>But it sounds like they've decided to use the update service's web logs, which they already have. It may not be as privacy-cloaking (without an additional hashing step, anyway--which may be required in order to comply with European privacy regulations)</i>

I'm confused here. Are you saying that European privacy regulations do not allow you to read your own Apache logs? That's all we're talking about here. We retrieve the logs of the hits against some URLs and process them to see how many unique IPs hit each URL.

If the European privacy regulations disallow viewing access logs, then I cannot imagine how they're functional at all.

-sv


RedHat do great work

Posted Oct 12, 2006 22:09 UTC (Thu) by roelofs (guest, #2599)

<i>I'm confused here. Are you saying that European privacy regulations do not allow you to read your own Apache logs? That's all we're talking about here. We retrieve the logs of the hits against some URLs and process them to see how many unique IPs hit each URL.</i>

Don't read too much into my comment--I'm not European and don't have to deal with European laws myself, so I'm absolutely not an authoritative source of info on this. ;-)

But my understanding is that European privacy law can seem surprisingly strict in some respects to someone used to US law, and if I'm remembering correctly, it touches on things like long-term data retention and security safeguards.

Also, as we've seen in the recent AOL PR disaster, even "anonymized" logs can be seriously problematic, depending on exactly what gets logged. I'm certainly not claiming auto-update web logs are in the same category as search logs, but an IP number coupled with the names of installed packages might be getting close to some threshold--particularly if it can be correlated with other forms of logging (like wiki updates or bugzilla reports).
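For anyone wondering what the "additional hashing step" quoted at the top of the thread might look like, here is a minimal Python sketch. It is an illustration only, not something Fedora is known to do: the function name `pseudonymize_ip` and the way the key is generated are hypothetical. It uses a keyed hash (HMAC-SHA256) because a plain unkeyed hash of an IPv4 address can be reversed by simply hashing every possible address.

```python
import hashlib
import hmac
import secrets


def pseudonymize_ip(ip, key):
    """Map an IP address to a stable token using HMAC-SHA256.

    The same (ip, key) pair always yields the same token, so unique
    counts still work, but the raw address is not stored.
    """
    return hmac.new(key, ip.encode("ascii"), hashlib.sha256).hexdigest()


if __name__ == "__main__":
    # Hypothetical key handling: a random key generated per reporting run.
    key = secrets.token_bytes(32)
    addresses = ["192.0.2.1", "192.0.2.1", "192.0.2.2"]
    tokens = {pseudonymize_ip(ip, key) for ip in addresses}
    print(len(tokens), "distinct clients")
```

Counting distinct tokens gives the same number as counting distinct addresses, as long as one key is used for the whole run. Whether this would satisfy any particular regulation is a separate question that the sketch does not answer.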

In short, you're probably OK, but don't take my word for it, either way. Assuming you have any European "customers," you (or someone officially associated with the Fedora project) probably should have at least a vague idea of what the relevant laws require, just as a precaution.

Greg

RedHat do great work

Posted Oct 12, 2006 23:53 UTC (Thu) by skvidal (subscriber, #3094)

Just so we're clear on what information we're using:

Yum's default configuration file in FC6 requests a file from a CGI on mirrors.fedoraproject.org. This file is the list of mirrors for the repository.

The only information passed to the CGI is the arch you're using and the version of Fedora you're using. That information is passed so it can hand you back a list of valid mirrors closest to your IP range, based on the geographic location of your IP as reported by GeoIP.

All we're doing is running an awk script across the Apache access.log that tells us how many IPs connected for each distro/arch.

There's no other information passed at all. No package names, no package lists, no machine info other than the architecture it's running as.
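For readers who want to picture that kind of count, here is a minimal sketch in Python rather than awk. It is not the script actually being run. The log layout (Apache common log format), the `/mirrorlist` path, and the `repo` and `arch` query parameter names are all assumptions made for illustration.

```python
import re
from collections import defaultdict
from urllib.parse import parse_qs, urlsplit

# Assumed Apache common/combined log format: client IP first,
# then the request line in double quotes.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "(?P<method>\S+) (?P<target>\S+)[^"]*"'
)


def unique_ips_per_repo_arch(lines, path="/mirrorlist"):
    """Return {(repo, arch): number of distinct client IPs} for hits on `path`."""
    seen = defaultdict(set)
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that do not look like access-log entries
        url = urlsplit(match.group("target"))
        if url.path != path:
            continue  # only hits against the mirror list are counted
        query = parse_qs(url.query)
        repo = query.get("repo", ["unknown"])[0]
        arch = query.get("arch", ["unknown"])[0]
        seen[(repo, arch)].add(match.group("ip"))
    return {key: len(ips) for key, ips in seen.items()}


if __name__ == "__main__":
    # Hypothetical log file name; adjust to wherever the access log lives.
    with open("access.log") as log:
        counts = unique_ips_per_repo_arch(log)
    for (repo, arch), count in sorted(counts.items()):
        print(repo, arch, count)
```

Each address is counted once per repo/arch pair no matter how often it connects, and nothing beyond the address and the two query parameters is read from the log.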

RedHat do great work

Posted Oct 14, 2006 21:15 UTC (Sat) by liljencrantz (subscriber, #28458)

Will this tracking be extended in the future to include information on what packages are updated? As a Fedora Extras maintainer, I'd be very interested in knowing roughly how many people actually use my packages.

RedHat do great work

Posted Oct 15, 2006 15:16 UTC (Sun) by skvidal (subscriber, #3094)

There's no tracking doing any of that.

The only thing we're getting information on is when a user contacts the mirror list CGI.

There's no package information contained therein.

-sv

RedHat do great work

Posted Oct 15, 2006 22:10 UTC (Sun) by liljencrantz (subscriber, #28458)

Thank you for clarifying. Is there any chance that this will change in the future?

RedHat do great work

Posted Oct 15, 2006 22:27 UTC (Sun) by skvidal (subscriber, #3094)

Change how? The way things are structured, with packages distributed out to mirrors, makes this very hard to do.

It's not impossible, but it'd be kinda silly to create a massive SPOF like that.

-sv

RedHat do great work

Posted Oct 15, 2006 22:45 UTC (Sun) by liljencrantz (subscriber, #28458)

Sorry, I was unclear. I didn't mean to suggest that the mirror distribution method should change. What I meant to ask was whether the lack of information about which packages are downloaded could change in the future, since I, as an Extras maintainer, am curious about the number of users my package has.

This would probably imply that mirrors would need to send back some form of server logs to RH.

RedHat do great work

Posted Oct 16, 2006 4:07 UTC (Mon) by skvidal (subscriber, #3094)

Speaking as a mirror for 6+ years, I can assure you that getting the mirrors to feed back log information would be like pulling teeth.

Painful and bloody.

-sv

Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds