
Re: Concerns to software freedom when packaging deep-learning based appications.

From:  Ximin Luo <infinity0-AT-debian.org>
To:  Lumin <cdluminate-AT-gmail.com>, debian-devel-AT-lists.debian.org
Subject:  Re: Concerns to software freedom when packaging deep-learning based appications.
Date:  Sun, 15 Jul 2018 16:21:00 +0000
Message-ID:  <05d86304-2f32-d9b7-2edd-a1ec2ce5f3e3@debian.org>

Lumin:
> [..]
> 
> My core concern is:
> 
>   Even if upstream releases their pretrained model under GPL license,
>   the freedom to modify, research, reproduce the neural networks,
>   especially "very deep" neural networks is de facto controled by
>   PROPRIETARIES.
> 
> [..]
I think in general when one raises concerns, one should do some research to make them relevant to
the actual situation. Your mail cited my leela-zero package but contained no information specific
to it. I'll fill in the gaps:

The Debian package leela-zero contains no "pretrained model", or "weights file" in its
terminology.

Leela Zero contains a program (autogtp) that generates and uploads raw game data, based on some
weights you give it. Data generated by many volunteers is collected together and is available here
[1] released into the public domain [2]. There's more than 1TB of it.

The code to "compile"/"train" this raw data into a weights file is in training/* and contains code
for several different frameworks including tensorflow (Debian package in-progress at [3]). The nice
thing is that you can give it raw data, it will generate weights *periodically* so you can pause
the training and give the weights to someone else, who can perhaps supply more data and run a
different training algorithm on a different deep-learning platform. AFAICT that's how the
currently-recommended weights [4] have been trained, by the efforts of many different people for
the past year or so. For context, it is 110MB uncompressed and it's just a very big matrix of
floating-point numbers in ASCII format.
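
To make "a very big matrix of floating-point numbers in ASCII format" concrete, here is a minimal
Python sketch of reading such a file. It assumes only that the file is plain text with
whitespace-separated decimal numbers, one row per line; the function names are hypothetical and
this is not the loader that Leela Zero itself uses.

```python
from typing import List, Tuple


def parse_ascii_weights(text: str) -> List[List[float]]:
    """Parse whitespace-separated floats, one row per non-empty line.

    Assumption: the layout is plain ASCII rows of numbers. Blank lines
    are skipped. This is an illustration, not upstream's loader.
    """
    rows: List[List[float]] = []
    for line in text.splitlines():
        fields = line.split()
        if fields:
            rows.append([float(field) for field in fields])
    return rows


def summarize(rows: List[List[float]]) -> Tuple[int, int]:
    """Return (number of rows, total number of values)."""
    return len(rows), sum(len(row) for row in rows)


if __name__ == "__main__":
    sample = "1\n0.5 -0.25 1e-3\n\n2.0 3.0\n"
    print(summarize(parse_ascii_weights(sample)))
```

The point is only that the weights are inert data that any tool can read; nothing in the file is
executable on its own.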

So the source code for everything is in fact FOSS; the only problem is that the
compilation/"training" process can't easily be run by individuals or small non-profit orgs. For the
purposes of DFSG packaging everything's fine: we don't distribute any weights as part of Debian,
and upstream does not distribute them as part of the FOSS software either. This is not ideal but is
the best we can do for now.

There are various ways to improve this situation, and none of them involve philosophical discussions
on a mailing list about what is or isn't code or execution or proprietary whatever. The concepts
are really very clear, we know what the "source code" equivalent is (raw data plus training
algorithms), all of it is openly-licensed in the case of Leela Zero, and the problem is simply that
we can't perform the execution transparently as part of FOSS infrastructure.

Two approaches are: (1) get the hardware or (2) invent a method to verify the results, probably
using magical crypto to generate a proof-of-execution that can be easily verified and then have
deep-learning platforms do this. I'm pretty sure there's some early-stage research on (2) as part
of the recent hype around blockchains but I don't know what the specific progress is; someone can
dig into this more.

Also in practice, for the case of Leela Zero, verification of the deep learning results is
"half-done" by having different weights/models play games against each other. In other words people
care about performance in practice, and don't worry about the fact that the deep learning platform
may embed secret specific corner cases into the compiled model for whatever reason. I think it's
useful to cover these corner cases, for transparency and security, but a lot of work has to go into
it beyond simply complaining about it on mailing lists.
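
As a rough illustration of that "half-done" verification, the following Python sketch decides
whether a candidate weights file did well enough in a match against the current one. The function
names and the 0.55 threshold are assumptions made up for this example, not something taken from
the Leela Zero code.

```python
def win_rate(wins: int, losses: int) -> float:
    """Fraction of decided games won by the candidate network."""
    games = wins + losses
    if games == 0:
        raise ValueError("no games played")
    return wins / games


def passes_gate(wins: int, losses: int, threshold: float = 0.55) -> bool:
    """True if the candidate's win rate reaches the threshold.

    The default threshold is a hypothetical value for illustration.
    """
    return win_rate(wins, losses) >= threshold


if __name__ == "__main__":
    # Candidate won 240 of 400 games against the current network.
    print(passes_gate(240, 160))
```

A check like this only measures playing strength. It says nothing about whether hidden corner
cases were embedded during training, which is why the verification is only half-done.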

X

[1] https://leela.online-go.com/training/
[2] https://github.com/gcp/leela-zero/issues/167
[3] https://salsa.debian.org/science-team/tensorflow
[4] http://zero.sjeng.org/best-network

-- 
GPG: ed25519/56034877E1F87C35
GPG: rsa4096/1318EFAC5FBBDBCE
https://github.com/infinity0/pubkeys.git




