Ever wonder what portion of the portage tree is free software, and
what portion is proprietary? Here's an estimate.
Total number of packages:
feynman grant> paludis --list-packages --repository gentoo | \
> sed -n -e 's/^\* \(.*\)/\1/p' | wc -l
11540
Total number of packages w/ LICENSE containing [gpl|as-is|bsd]:
feynman grant> paludis --list-packages --repository gentoo | \
> sed -n -e 's/^\* \(.*\)/\1/p' | while read a ; \
> do paludis -qM ${a}::gentoo | grep LICENSE: | \
> cut -d: -f2 | grep -i --quiet '[gpl|as-is|bsd]' \
> && echo ${a} ; done | wc -l
11164
So, if I did things right, we're looking at roughly 96.7% of the
tree. Thanks to ciaranm for the lengthy one-liner, although any
mistakes are definitely my own.
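As a quick sanity check on the arithmetic, the percentage can be recomputed from the two counts above. This is only a sketch; the variable names are made up for illustration.

```shell
total=11540      # packages in the gentoo repository (first count above)
matched=11164    # packages whose LICENSE matched the pattern (second count above)

# awk handles the floating-point division that plain shell arithmetic lacks
pct=$(awk -v m="$matched" -v t="$total" 'BEGIN { printf "%.1f", 100 * m / t }')
echo "${pct}%"
```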
imported comments
Posted by g2boojum on Thu Mar 8 15:25:54 2007
Thanks to beandog, I now know that I could have just looked at http://spaceparanoids.org/gentoo/gpnl/stats.php?q=license
Posted by Diego Flameeyes Pettenò on Thu Mar 8 15:29:54 2007
Besides the debatable need for using cut, grep should have been put outside the while loop, which would have required only one run, and wc -l could have been replaced by the -c switch of grep itself.
For anyone interested, I find this command more readable, even if it counts something different, as packages can have different licenses per version:
pquery --one-attr=license --raw --all --repo=portdir | egrep -i '(gpl|as-is|bsd|mit)' -c
returns 18703 out of 22864 ebuilds, which is about 82%
And yes, I added MIT to the pattern, and it still doesn't give a good result for the quantity of free software; these three or four licenses cover only a part of the Free Software licenses, so I wouldn't really claim these stats are truthful, myself.
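The restructuring suggested above (filter once after the loop, and let grep do the counting) might look roughly like the sketch below. The package names and the `license_of` function are hypothetical stand-ins for the real per-package metadata lookup, and `grep -E` is used in place of `egrep`.

```shell
# Made-up package names, one per line
packages='cat-a/pkg-one
cat-a/pkg-two
cat-b/pkg-three'

# Hypothetical stand-in for the per-package LICENSE lookup; prints one line per package
license_of() {
    case "$1" in
        cat-a/pkg-one)   echo 'GPL-2' ;;
        cat-a/pkg-two)   echo 'Proprietary' ;;
        cat-b/pkg-three) echo '|| ( BSD GPL-2 )' ;;
    esac
}

# The loop only emits license strings; a single grep after the loop filters and counts them
count=$(printf '%s\n' "$packages" | while read -r pkg; do
    license_of "$pkg"
done | grep -E -i -c '(gpl|as-is|bsd)')
echo "$count"
```

Note that this counts matching license lines, which equals the number of matching packages only if every package emits exactly one line.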
Posted by Andrew Saunders on Thu Mar 8 18:34:37 2007
Given that stuff like win32codecs falls under the "as-is" umbrella, is it really reasonable to count everything under "as-is" as Free Software?
Posted by Diego Flameeyes Pettenò on Fri Mar 9 08:56:41 2007
It would, if as-is were used consistently. The classic as-is would be:
Permission to use, copy, modify, and distribute this software and its
documentation for any purpose and without fee is hereby granted, provided
that the above copyright notice appears in all copies and that both the
copyright notice and this permission notice appear in supporting
documentation, and that the same name not be used in advertising or
publicity pertaining to distribution of the software without specific,
written prior permission. We make no representations about the
suitability of this software for any purpose. It is provided "as is"
without express or implied warranty.
but it is often used for "All rights reserved" software like win32codecs. Yes, that is an error, just as it is an error to use BSD for MIT-licensed stuff (so-called BSD-2) and at the same time for BSD-3 and BSD-4. It is also a mistake to use GPL-2 for both "GNU GPL 2 or later" and "GNU GPL 2 only".
Posted by brian harring on Fri Mar 9 11:43:12 2007
@diego
pquery --raw --all --one-attr license -n --repo=portdir | grep -i '[bsd|gpl|as-is]' wc
is a bit closer to the original version he tried.
Posted by brian harring on Fri Mar 9 11:46:33 2007
heh... yay for lack of sleep
pquery --raw --all --one-attr license -n --repo=portdir | egrep -i '(bsd|gpl|as-is)' | wc
is a fair bit more accurate ;)
That also comes out closer to 78%, since the version grant used does a per-character check (a bracket expression) instead of a proper text match...
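The difference between the two kinds of pattern can be seen on a few made-up license strings. This is only a sketch: the sample values are invented, the pattern is shortened to gpl and bsd (inside brackets, the `-` in `as-is` would be read as a range), and `grep -E` is used in place of `egrep`.

```shell
# Invented LICENSE values, one per line; only the last one names a GPL or BSD license
samples='Proprietary
all-rights-reserved
NVIDIA
GPL-2'

# Bracket expression: matches any line containing one of the characters g p l | b s d
loose=$(printf '%s\n' "$samples" | grep -i -c '[gpl|bsd]')

# Alternation: matches only lines containing the text "gpl" or "bsd"
strict=$(printf '%s\n' "$samples" | grep -E -i -c '(gpl|bsd)')

echo "bracket expression: $loose, alternation: $strict"
```

With these samples the bracket expression matches every line, while the alternation matches only the GPL-2 entry.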
Posted by Diego Flameeyes Pettenò on Fri Mar 9 14:09:39 2007
Still the wc should have been wc -l (think of || licenses), and using -c to egrep is faster and saves one pipe.
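The three ways of counting can be compared on a small invented sample. This is a sketch only; the license strings are made up, and `grep -E` stands in for `egrep`.

```shell
# Invented license strings, including one "|| ( ... )" entry
licenses='GPL-2
|| ( GPL-2 BSD )
Proprietary'

# Plain wc prints lines, words and bytes; the "||" entry contributes several words
all_counts=$(printf '%s\n' "$licenses" | grep -E -i '(bsd|gpl|as-is)' | wc)

# wc -l counts only the matching lines
line_count=$(printf '%s\n' "$licenses" | grep -E -i '(bsd|gpl|as-is)' | wc -l)

# grep -c yields the same line count without the extra pipe
grep_count=$(printf '%s\n' "$licenses" | grep -E -i -c '(bsd|gpl|as-is)')

echo "wc: $all_counts"
echo "wc -l: $line_count"
echo "grep -c: $grep_count"
```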