This page describes a new project which attempts to create an unpacked copy of the source code of each package contained in the Debian Archive. This would then allow source code scanning, to detect simple patterns of code known to be frequently insecure.

Overview

The process is pretty straightforward: mirror the archive, unpack each source package, then scan the unpacked tree.

Mirroring the archive takes approximately 29GB of space. Storing the unpacked source takes considerably more: I currently have 100GB of free storage, and I've managed to fill that completely without unpacking /everything/.

If you're interested in replicating this work, I'd suggest you don't start unless you have a fast connection and 200GB of local storage to work with.

Archive Unpacking

To unpack the archive I use a simple invocation like:

 find /mnt/mirror/source/pool/main/ -name '*.dsc' | \
    xargs --max-args 1 /home/skx/bin/unpack

This will unpack each package source to /mnt/mirror/unpacked. The unpacking uses a simple helper tool called dls which is designed to show the files referenced within a .dsc or .changes file. That can be used as follows:

skx@vain:~$ dls lighttpd_1.4.13-4etch2.dsc
lighttpd_1.4.13.orig.tar.gz
lighttpd_1.4.13-4etch2.diff.gz
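
The internals of dls and of the unpack script are not shown on this page, so the following is only a rough sketch of what such a pair of helpers could look like. The function names list_dsc_files and unpack_one are hypothetical, and the check that every referenced file is present before unpacking is an assumption about why a dls-style listing is useful. The extraction itself is handed to dpkg-source -x, which takes the .dsc and an output directory.

```shell
#!/bin/sh
# Hypothetical stand-in for dls: print the file names listed in the
# "Files:" section of a .dsc. Each entry is an indented line whose
# last field is the file name.
list_dsc_files() {
    awk '
        /^Files:/        { in_files = 1; next }
        in_files && /^ / { print $NF; next }
                         { in_files = 0 }
    ' "$1"
}

# Hypothetical unpack step (an assumption, not the real script): make
# sure every referenced file sits next to the .dsc, then extract the
# package into its own directory below the destination root.
unpack_one() {
    dsc=$1
    dest_root=${2:-/mnt/mirror/unpacked}
    src_dir=$(dirname "$dsc")

    for f in $(list_dsc_files "$dsc"); do
        if [ ! -e "$src_dir/$f" ]; then
            echo "missing $f for $dsc" >&2
            return 1
        fi
    done

    mkdir -p "$dest_root"
    dpkg-source -x "$dsc" "$dest_root/$(basename "$dsc" .dsc)"
}
```

With these definitions loaded, unpack_one could be called once per .dsc from a find loop like the one above. Skipping packages with missing files keeps a partially mirrored pool from aborting the whole run.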

Archive Scanning

Now that the archive is unpacked we can start scanning. The two scans I've done so far are:

rgrep getenv /mnt/mirror/unpacked | grep strcpy

and

rgrep getenv /mnt/mirror/unpacked | grep sprintf

These two searches are both designed to find unbounded string copies of environment variables. Note that these are not perfect patterns: they only match lines that contain both calls, so they don't take account of code like this:

foo = getenv( "FOO" );
if ( NULL != foo )
{
   sprintf( str, "%s/.blah", foo );
}
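
One possible way to catch code where the lookup and the copy sit on different lines is a two-stage search: first list the files that mention getenv anywhere, then keep only those that also call strcpy or sprintf. This is a sketch rather than one of the scans run so far. The function name scan_two_stage is hypothetical, and it assumes GNU grep and GNU xargs for the -Z, -0 and -r options.

```shell
#!/bin/sh
# List files that mention getenv anywhere and also call strcpy or
# sprintf anywhere, even when the two appear on different lines.
scan_two_stage() {
    root=${1:-/mnt/mirror/unpacked}
    # Stage 1: NUL-separated names of files containing getenv.
    # Stage 2: of those, print the ones that call strcpy( or sprintf(.
    grep -rlZ 'getenv' "$root" |
        xargs -0 -r grep -lE '(strcpy|sprintf)[[:space:]]*\('
}
```

The output is a list of file names rather than matching lines, so it is coarser than the original pipelines and will still need checking by hand.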

Results

For each of the two scans performed so far I've saved the results, and have about 1000 matches to examine by hand. A slow process.

I've been filing bugs with results after checking them, and tagging them with "sourcescan". Unfortunately these user tags don't seem to be working properly:

Security Advisories

Security advisories which have resulted from this work include:

I'm sure more will follow.