Thursday, March 08, 2007

Lower limit of free software packages in the tree?

Ever wonder what portion of the portage tree is free software, and
what portion is proprietary? Here's an estimate.

Total number of packages:

feynman grant> paludis --list-packages --repository gentoo | \
> sed -n -e 's/^\* \(.*\)/\1/p' | wc -l


Total number of packages w/ LICENSE containing [gpl|as-is|bsd]:

feynman grant> paludis --list-packages --repository gentoo | \
> sed -n -e 's/^\* \(.*\)/\1/p' | while read a ; \
> do paludis -qM ${a}::gentoo | grep LICENSE: | \
> cut -d: -f2 | grep -i --quiet '[gpl|as-is|bsd]' \
> && echo ${a} ; done | wc -l


So, if I did things right, we're looking at roughly 96.7% of the
tree. Thanks to ciaranm for the lengthy one-liner, although any
mistakes are definitely my own.

1 comment:

  1. imported comments

    Posted by g2boojum on Thu Mar 8 15:25:54 2007
    Thanks to beandog, I now know that I could have just looked at

    Posted by Diego Flameeyes Pettenò on Thu Mar 8 15:29:54 2007
    Besides the debatable need for cut, the grep should have been put outside the while loop, which would have required only one run, and wc -l could have been replaced by the -c switch of grep itself.

    For anyone interested, I find this command more readable, even though it counts something different (ebuilds rather than packages), since a package can have different licenses per version:

    pquery --one-attr=license --raw --all --repo=portdir | egrep -i '(gpl|as-is|bsd|mit)' -c

    returns 18703 of 22864 ebuilds, which is about 82%.
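
For reference, the percentage quoted above can be rechecked with a one-line awk call. This is only a sketch of the arithmetic; the two counts are the ones reported in the comment and were not re-measured.

```shell
# Counts as reported in the comment above (not re-measured).
matching=18703
total=22864

# Share of ebuilds whose license matched, to one decimal place.
percent=$(awk -v m="$matching" -v t="$total" 'BEGIN { printf "%.1f", m / t * 100 }')
echo "$percent%"
```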

    And yes, I added MIT to the license list, and it still doesn't give a good picture of the quantity of free software; these three or four licenses cover only part of the Free Software licenses, so I wouldn't really claim these stats are truthful, myself.
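
The restructuring suggested at the start of this comment (grep outside the loop, grep -c instead of wc -l) might look roughly like the sketch below. The show_license function and the package names are hypothetical stand-ins for the per-package paludis query, so the snippet runs without a Gentoo tree; it assumes one LICENSE line per package.

```shell
# Hypothetical stand-in for: paludis -qM <pkg>::gentoo | grep LICENSE:
show_license() {
    case "$1" in
        app-a/foo) echo 'LICENSE: GPL-2' ;;
        app-b/bar) echo 'LICENSE: MIT' ;;
        app-c/baz) echo 'LICENSE: BSD' ;;
    esac
}

# Hypothetical package list.
packages='app-a/foo app-b/bar app-c/baz'

# grep inside the loop: one grep process per package, then wc -l.
inside=$(for p in $packages; do
    show_license "$p" | grep -Eiq '(gpl|as-is|bsd)' && echo "$p"
done | wc -l)

# grep outside the loop: a single grep -c over the combined output.
outside=$(for p in $packages; do
    show_license "$p"
done | grep -Eic '(gpl|as-is|bsd)')

echo "inside: $((inside)), outside: $outside"
```

Both forms give the same count as long as each package contributes exactly one LICENSE line; the second simply starts far fewer processes.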

    Posted by Andrew Saunders on Thu Mar 8 18:34:37 2007
    Given that stuff like win32codecs falls under the "as-is" umbrella, is it really reasonable to count everything under "as-is" as Free Software?

    Posted by Diego Flameeyes Pettenò on Fri Mar 9 08:56:41 2007
    It would, if as-is were used consistently. The classic as-is would be:

    Permission to use, copy, modify, and distribute this software and its
    documentation for any purpose and without fee is hereby granted, provided
    that the above copyright notice appears in all copies and that both the
    copyright notice and this permission notice appear in supporting
    documentation, and that the same name not be used in advertising or
    publicity pertaining to distribution of the software without specific,
    written prior permission. We make no representations about the
    suitability of this software for any purpose. It is provided "as is"
    without express or implied warranty.

    but it is often used for "All rights reserved" software like win32codecs. Yes, that's an error, as much as it is to use BSD for MIT-licensed stuff (so-called BSD-2) and at the same time for BSD-3 and BSD-4. It's also a mistake to use GPL-2 for both "GNU GPL 2 or later" and "GNU GPL 2 only".

    Posted by brian harring on Fri Mar 9 11:43:12 2007
    pquery --raw --all --one-attr license -n --repo=portdir | grep -i '[bsd|gpl|as-is]' wc

    is a bit closer to the original version he tried.

    Posted by brian harring on Fri Mar 9 11:46:33 2007
    heh... yay for lack of sleep
    pquery --raw --all --one-attr license -n --repo=portdir | egrep -i '(bsd|gpl|as-is)' | wc

    is a fair bit more accurate ;)

    closer to 78% via that also, since the version grant used does a char check instead of proper text match...
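
The "char check" versus "proper text match" difference can be seen on a handful of made-up license strings. A minimal sketch: the list below is hypothetical, and the pattern is shortened to gpl|bsd because the s-i inside the original brackets would additionally be read as a character range, which some grep versions may reject.

```shell
# Hypothetical license strings, one per line (not taken from the real tree).
licenses='GPL-2
BSD
MIT
Apache-2.0
ZLIB'

# Bracket expression: matches any line containing ONE of the characters
# g, p, l, |, b, s, d (case-insensitively), so Apache-2.0 and ZLIB count too.
bracket_count=$(printf '%s\n' "$licenses" | grep -ic '[gpl|bsd]')

# Alternation: matches only lines containing the string gpl or bsd.
alternation_count=$(printf '%s\n' "$licenses" | grep -Eic '(gpl|bsd)')

echo "bracket: $bracket_count, alternation: $alternation_count"
```

Here the bracket form counts four of the five lines, while the alternation counts only the two that really name GPL or BSD, which is the same kind of inflation behind the gap between the two percentages.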

    Posted by Diego Flameeyes Pettenò on Fri Mar 9 14:09:39 2007
    Still, the wc should have been wc -l (think of || licenses), and using egrep's -c switch is faster and saves one pipe.
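
To illustrate the point about || licenses: a bare wc also reports a word count, and one dual-licensed line contributes several words, whereas wc -l and grep -c count one per matching line. A small sketch on hypothetical license strings:

```shell
# Hypothetical license strings, one ebuild per line (illustrative only).
licenses='|| ( GPL-2 BSD )
MIT
GPL-2
as-is'

pattern='(bsd|gpl|as-is)'

# Word count of the matching lines: "|| ( GPL-2 BSD )" alone is five words.
word_count=$(printf '%s\n' "$licenses" | grep -Ei "$pattern" | wc -w)

# wc -l counts one per matching line.
line_count=$(printf '%s\n' "$licenses" | grep -Ei "$pattern" | wc -l)

# grep -c gives the same line count without the extra pipe.
grep_count=$(printf '%s\n' "$licenses" | grep -Eic "$pattern")

echo "words: $((word_count)), lines: $((line_count)), grep -c: $grep_count"
```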


