Scanning for secrets

Posted Apr 8, 2021 9:23 UTC (Thu) by k3ninho (subscriber, #50375)
Parent article: Scanning for secrets

>It is better, of course, if secrets never actually make it into a repository at all. Second best would be to catch them on the committer's system before they have pushed their changes to the central repository; it may be somewhat painful to do, but the offending commit(s) can be completely removed from the history at that point via a rebase operation.

Just as you can require evidence that tests were run on the code being committed, you can agree a process* that requires evidence that the pre-commit linting/cleanup script, including a scan for key-like items, was run before the pull request can be approved and the code merged. That evidence would have to be a chained hash over what was checked. For tests: this binary artefact (a), in these configurations (b), ran this test suite (c), and we use the developer's gpg key (d) to sign the result set (a+b+c+d), giving $hash1. For the pre-commit check: head-of-tree (i) plus the diff of these changes (j) was accredited by this edition (k) of the pre-commit script with result (l), signed by the developer's gpg key (d, let's reuse it), giving $hash2. The goal is that someone looking at the diffs can re-run these scripts and arrive at the same hashes -- plus the local run before pushing upstream will warn about possible key-like data.
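A minimal sketch of the $hash2 idea, under stated assumptions: the two regular expressions, the function names `scan_diff` and `attestation_hash`, and the choice of SHA-256 are illustrative and not from the comment above, and the gpg signing step is left out entirely. It only shows how a reviewer could recompute the same digest from the same four inputs.

```python
import hashlib
import re

# Illustrative patterns only (an assumption); real scanners use far larger rule sets.
KEY_PATTERNS = [
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    re.compile(r"\bAKIA[0-9A-Z]{16}\b"),  # shape of an AWS access key ID
]

def scan_diff(diff_text):
    """Return the added lines of a unified diff that look like key material."""
    hits = []
    for line in diff_text.splitlines():
        # Added lines start with "+"; "+++" marks a file header.
        # (An added line whose own text starts with "++" is skipped too;
        # acceptable for a sketch, not for a real scanner.)
        if line.startswith("+") and not line.startswith("+++"):
            if any(p.search(line) for p in KEY_PATTERNS):
                hits.append(line[1:])
    return hits

def attestation_hash(head, diff_text, script_version, result):
    """Hash (i) head-of-tree, (j) the diff, (k) script edition, (l) result."""
    h = hashlib.sha256()
    for field in (head, diff_text, script_version, result):
        data = field.encode("utf-8")
        # Length-prefix each field so that field boundaries are unambiguous.
        h.update(len(data).to_bytes(8, "big"))
        h.update(data)
    return h.hexdigest()
```

The developer would then sign the resulting digest; anyone re-running the same script edition on the same head and diff gets the same value, and a mismatch means one of the four inputs differs.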

*: when I hear other developers say 'should', I wince because 'should is considered harmful'. Likewise, 'you can agree a process' carries the caveat that we *can and should* agree a process, were it not for everyone's special considerations.

K3n.

Scanning for secrets

Posted Apr 8, 2021 10:26 UTC (Thu) by Otus (subscriber, #67685)

That workflow is almost the opposite of most CI setups where committing and pushing/pull requesting is precisely what triggers the build/test/lint/etc run and leaves proof that those are ok. I certainly wouldn't want to run some of the heavier build&test processes on my laptop if I can offload to a build server instead.

This seems like a separate problem.

Scanning for secrets

Posted Apr 8, 2021 19:12 UTC (Thu) by mathstuf (subscriber, #69389)

Besides, the fact that some rando developer's whacky `LD_LIBRARY_PATH`, put into `.profile` years ago to fix a problem they had, makes the PR work is no reason to trust their test suite run. I want to see what the known environment thinks of your code, not what your custom setup thought of the code.

Scanning for secrets

Posted Apr 9, 2021 0:52 UTC (Fri) by pabs (subscriber, #43278)

I think I'd prefer to see test suites pass on a diverse set of systems rather than just the few that the CI runs on, especially since CI services usually support very few architectures and operating systems.

Scanning for secrets

Posted Apr 9, 2021 11:58 UTC (Fri) by mathstuf (subscriber, #69389)

That's true, but getting that developer to remember that they had set `LD_LIBRARY_PATH`, which is causing your test suite to fail all over the place, is a frustrating debugging experience. Nothing makes sense until you realize that they haven't been using the libraries at runtime that everyone thought were in use.

A diverse set of environments is useful, but it has to be a *known* set of environments. I'm not coding up the logic needed to guard my project against silly `LD_PRELOAD` environments, rogue `PYTHONPATH`, or other such things. It's an exercise in futility for very little gain. CI provides that known environment. The dockcross project can do it for other Linux arches, but I'm not too far from "can you reproduce it in a Docker container? no? do that first please" in response to spooky linker-related problems.

Scanning for secrets

Posted Apr 10, 2021 1:17 UTC (Sat) by pabs (subscriber, #43278)

I don't think every project should need to guard against environment variables like CFLAGS, LD_PRELOAD or PYTHONPATH, since those are intended to change the build process.

At some point, "it works in CI" is basically the same as "it works on my computer". A better approach in case of build environment related problems is to record the two build environments, compare them and bisect the differences to find out which change causes the problem.
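As a rough sketch of "record, compare and bisect", the following treats each build environment as a plain dictionary of environment variables. The function names and the `fails` callback are hypothetical, and the bisection assumes that a single differing variable is responsible, which real environments will not always satisfy.

```python
def env_diff(good, bad):
    """Return {name: (good_value, bad_value)} for variables that differ.

    None means the variable is unset in that environment.
    """
    names = set(good) | set(bad)
    return {n: (good.get(n), bad.get(n))
            for n in names if good.get(n) != bad.get(n)}

def apply_changes(good, bad, names):
    """Copy `good`, then take the `bad` setting for each name in `names`."""
    env = dict(good)
    for n in names:
        if n in bad:
            env[n] = bad[n]
        else:
            env.pop(n, None)
    return env

def bisect_env(good, bad, fails):
    """Find one differing variable that makes `fails(env)` true.

    Assumes fails(good) is False, fails(bad) is True, and that exactly one
    differing variable causes the failure.
    """
    candidates = sorted(env_diff(good, bad))
    while len(candidates) > 1:
        mid = len(candidates) // 2
        half = candidates[:mid]
        if fails(apply_changes(good, bad, half)):
            candidates = half
        else:
            candidates = candidates[mid:]
    return candidates[0] if candidates else None
```

In practice `fails` would run the build or test suite under the given environment, and the recorded state would also need to cover installed packages and library versions, not just variables.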

Some build environment related examples:

The Debian "buildd from hell", which compared packages built in a clean chroot with those built in a chroot containing as many -dev packages as possible to install at the same time. The mail below contains Message-IDs for related discussions, which you can put into the Debian lists msgid-search form to find the archives.

https://lists.debian.org/msgid-search/351842f7a4da3cff7ee...
https://lists.debian.org/msgid-search/

The Reproducible Builds folks deliberately vary build environments in various ways in order to detect parts of the build system that introduce non-determinism. Some of that may (in the past, now or in future) include LD_PRELOAD of various things, including faketime and/or cowbuilder, which has a copy-on-write preload. The buildinfo.rst link below contains some of the philosophy that led to this approach. Of course, the set of variations could be expanded, and the set of tested build environments will never achieve the level of variation that random users trying to reproduce builds could achieve.
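The idea of varying the environment on purpose can be illustrated with a toy sketch. This is not how reprotest or the Reproducible Builds test infrastructure is implemented; the function names are hypothetical, `STAND_IN_BUILD` is a made-up stand-in for a real build command, and only the command's stdout is compared rather than real build artefacts.

```python
import hashlib
import os
import subprocess
import sys

# Hypothetical stand-in for a build step whose output leaks the TZ variable.
STAND_IN_BUILD = [sys.executable, "-c",
                  "import os; print(os.environ.get('TZ'))"]

def output_digest(cmd, extra_env):
    """Run `cmd` with `extra_env` layered over the current environment
    and return the SHA-256 of its stdout."""
    env = dict(os.environ)
    env.update(extra_env)
    out = subprocess.run(cmd, env=env, check=True,
                         stdout=subprocess.PIPE).stdout
    return hashlib.sha256(out).hexdigest()

def varies_with_env(cmd, variations):
    """Return True if the output differs across the environment variations."""
    digests = {output_digest(cmd, v) for v in variations}
    return len(digests) > 1
```

For example, `varies_with_env(STAND_IN_BUILD, [{"TZ": "UTC"}, {"TZ": "Asia/Tokyo"}])` flags the stand-in as environment-dependent, because its output embeds the variable being varied.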

https://reproducible-builds.org/
https://tests.reproducible-builds.org/debian/index_variat...
https://salsa.debian.org/reproducible-builds/reprotest
https://reproducible-builds.org/docs/perimeter/
https://reproducible-builds.org/docs/recording/
https://salsa.debian.org/reproducible-builds/specs/buildi...

The Bootstrappable Builds project is aiming to get to a full Linux distro from < 1000 bytes of audited machine code plus all the necessary source code. Their approach is slightly different: instead of recording the build environment, they aim to *create* the build environment from scratch. They will still encounter build environment differences, due to hardware differences and non-determinism (but they plan to eventually push the bootstrap process deeper into the hardware layer). They also desire build environment diversity, though: they want to be able to do this for any arch and from any arch, and on a variety of hardware of the same arch.

https://bootstrappable.org/
https://github.com/fosslinux/live-bootstrap/blob/master/p...
https://github.com/oriansj/talk-notes/blob/master/live-bo...
https://github.com/oriansj/talk-notes/blob/master/live-bo...

Scanning for secrets

Posted Apr 12, 2021 13:00 UTC (Mon) by mathstuf (subscriber, #69389)

> A better approach in case of build environment related problems is to record the two build environments, compare them and bisect the differences to find out which change causes the problem.

I agree with all of that. However, we're lacking a suitable "diff" tool to first get the diff we need to bisect. Unfortunately, not everyone is aware of the effects that their "quick fixes" actually have, and so when you ask for differences, the important details don't even come up until you've already changed the code a few times, rebuilt, and then finally asked for `LD_DEBUG=libs` output, which shows that none of the changes even mattered.