This page describes a new project which attempts to create an unpacked copy of the source code of each package contained in the Debian Archive. This would then allow source code scanning, to detect simple patterns of code known to be frequently insecure.
The process is pretty straightforward:
Mirroring the archive will take approximately 29GB of space. To store the unpacked source you'll need considerably more. Currently I have 100GB of free storage and I've managed to fill that completely without unpacking /everything/.
If you're interested in replicating this work, I'd suggest you don't start unless you have a fast connection and 200GB of local storage to work with.
To unpack the archive I use a simple invocation like:
find /mnt/mirror/source/pool/main/ -name '*.dsc' | \
xargs --max-args 1 /home/skx/bin/unpack
This will unpack each package source to /mnt/mirror/unpacked. The unpacking uses a simple helper tool called dls which is designed to show the files referenced within a .dsc or .changes file. That can be used as follows:
skx@vain:~$ dls lighttpd_1.4.13-4etch2.dsc
lighttpd_1.4.13.orig.tar.gz lighttpd_1.4.13-4etch2.diff.gz
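For readers without dls to hand, the same information can be pulled out of a .dsc with a few lines of awk. This is only a minimal sketch, not the real dls source: the function name list_dsc_files is made up for illustration, and it assumes the usual control-file layout in which the "Files:" field is followed by indented lines whose last column is the file name.

```shell
# Hypothetical stand-in for dls (not the real tool): print the file names
# listed in the "Files:" field of a .dsc or .changes file, one per line.
list_dsc_files() {
    awk '
        /^Files:/ { in_files = 1; next }          # the Files: field starts here
        in_files && /^[ \t]/ { print $NF; next }  # indented entry: last column is the name
        { in_files = 0 }                          # any other line ends the field
    ' "$1"
}
```

Calling list_dsc_files lighttpd_1.4.13-4etch2.dsc should then print the tarball and diff names, one per line.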
Now that the archive is unpacked we can start scanning. The two scans I've done so far are:
rgrep getenv /mnt/mirror/unpacked | grep strcpy
and
rgrep getenv /mnt/mirror/unpacked | grep sprintf
These two searches are both designed to find unbounded string copies of environment variables. The patterns are not perfect: they only match when both calls appear on the same line, so they miss code like this:
foo = ( getenv("FOO" ) );
if ( NULL != foo )
{
    sprintf( str, "%s/.blah", foo );
}
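One way to catch the split form above is to remember which variables were assigned from getenv() and then flag any later strcpy or sprintf line that mentions them. The following is a rough sketch under that assumption, not something the scans described here actually use; the function name scan_getenv_flow is hypothetical, it works per file with no understanding of scope, and it will produce false positives.

```shell
# Hypothetical helper: flag strcpy/sprintf lines that mention a variable
# previously assigned from getenv() in the same file.
scan_getenv_flow() {
    awk '
        FNR == 1 { split("", tainted) }   # new file: forget earlier variables
        # Remember names assigned directly from getenv(), e.g. foo = getenv("FOO");
        match($0, /[A-Za-z_][A-Za-z0-9_]*[ \t]*=[ \t(]*getenv[ \t]*\(/) {
            name = substr($0, RSTART, RLENGTH)
            sub(/[ \t]*=.*/, "", name)    # keep only the variable name
            tainted[name] = 1
        }
        /strcpy|sprintf/ {
            for (name in tainted)
                # match the name as a whole word, so foo does not hit foobar
                if ($0 ~ ("(^|[^A-Za-z0-9_])" name "([^A-Za-z0-9_]|$)")) {
                    print FILENAME ":" FNR ": " $0
                    break
                }
        }
    ' "$@"
}
```

It takes one or more C files as arguments and prints file, line number and the offending line for each hit, which still needs checking by hand.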
For each of the two scans performed so far I've saved the results, and have about 1000 matches to examine by hand. A slow process.
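To make the hand review a little easier, both same-line scans can be run in one pass with file names and line numbers recorded. This is just a sketch assuming GNU grep (for -r and --include); the function name scan_same_line and the restriction to *.c and *.h files are illustrative choices, not part of the scans described above.

```shell
# Hypothetical helper: list lines in C sources that contain getenv together
# with strcpy or sprintf, prefixed with file name and line number.
scan_same_line() {
    grep -rn --include='*.c' --include='*.h' 'getenv' "$1" | grep -E 'strcpy|sprintf'
}
```

Something like scan_same_line /mnt/mirror/unpacked > results.txt would then save the matches for later examination (results.txt being an arbitrary name).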
I've been filing bugs with results after checking them, and tagging them with "sourcescan". Unfortunately these user tags don't seem to be working properly:
Security advisories which have resulted from this work include:
I'm sure more will follow.