PMXBOT Log file Viewer

#aboutcode logs for Wednesday the 14th of March, 2018

[08:07:22] <travis-ci> nexB/aboutcode-toolkit#845 (load_json_issue - 33f95ff : Chin Yeung Li): The build passed.
[08:07:23] <travis-ci> Change view : https://github.com/nexB/aboutcode-toolkit/compare/load_json_issue
[08:07:23] <travis-ci> Build details : https://travis-ci.org/nexB/aboutcode-toolkit/builds/353215298
[08:12:52] <travis-ci> nexB/aboutcode-toolkit#846 (load_json_issue - f3a3a05 : Chin Yeung Li): The build was broken.
[08:12:52] <travis-ci> Change view : https://github.com/nexB/aboutcode-toolkit/compare/33f95ffd0708...f3a3a05b1841
[08:12:53] <travis-ci> Build details : https://travis-ci.org/nexB/aboutcode-toolkit/builds/353216847
[14:46:18] <jose_ifm> hi
[14:49:53] <jose_ifm> I am a new user of ScanCode and have an issue using it. Is this the right way to ask questions about it?
[15:52:23] <pombreda> jose_ifm, hi :)
[15:52:32] <jose_ifm> hi
[15:53:14] <pombreda> you can ask it here, or on a ticket at https://github.com/nexB/scancode-toolkit/issues or there is also a more active Gitter chat channel at https://gitter.im/aboutcode-org/discuss
[15:53:23] <pombreda> jose_ifm, all ways work!
[15:53:46] <pombreda> though this channel is made quite busy from CI and commit notifications FWIW
[15:54:04] <jose_ifm> ok, will try any of them
[15:54:23] <pombreda> jose_ifm, go ahead with your question any way you like
[15:54:32] <jose_ifm> also here? :O
[15:54:42] <pombreda> sure
[15:54:56] <pombreda> any way works
[15:55:33] <jose_ifm> well, I am running scancode on a Linux machine, trying to extract license information from a pretty big folder containing (portions of) the Linux kernel
[15:56:06] <jose_ifm> After some hours running, it finishes the scan and starts to save results
[15:56:16] <jose_ifm> but it fails with the following message:
[15:56:21] <jose_ifm> ./scancode: line 114: 10444 Killed $SCANCODE_ROOT_DIR/bin/scancode "$@"
[15:56:50] <jose_ifm> Scanning result:
[15:56:52] <jose_ifm> Scan statistics: 172881 files scanned in 17752s. Scan options: licenses with 1 process(es). Scanning speed: 9.74 files per sec. Scanning time: 17750s. Indexing time: 1s.
[15:57:20] <jose_ifm> Extracting the copyrights from the same folder worked perfectly
[15:57:33] <jose_ifm> (and saved the results)
[15:58:07] <jose_ifm> Any idea?
[16:04:08] <pombreda> jose_ifm, which version of ScanCode do you run?
[16:04:24] <pombreda> and which command line options did you use?
[16:04:33] <jose_ifm> 2.2.1
[16:05:06] <jose_ifm> --only-findings -l -format
[16:05:24] <jose_ifm> Format is CSV, then source dir and filename
[16:05:31] <pombreda> ok, how much ram and cpu do you have on this box?
[16:05:53] <jose_ifm> 4 GB RAM, 1 CPU. It is running in a virtual machine
[16:07:11] <jose_ifm> Is that an issue?
[16:09:18] <pombreda> ok. that's on the low side, but there is a bug in 2.2.1 that you are likely hitting
[16:11:09] <jose_ifm> which version should I use?
[16:12:11] <jose_ifm> 2.9.b1?
[16:13:00] <pombreda> jose_ifm, one sec, checking something
[16:15:06] <pombreda> jose_ifm, yes, please use 2.9.b1 instead.
[16:15:19] <pombreda> Now 17752s is a lot of time
[16:15:21] <jose_ifm> ok
[16:15:38] <pombreda> for instance it takes 20 minutes to scan a linux kernel on my laptop ;)
[16:15:52] <jose_ifm> I know, it is the stupid VM...
[16:15:55] <pombreda> I have 16GB and quad core though
[16:16:08] <jose_ifm> will try to install it and run it in another machine
[16:16:39] <jose_ifm> A different question: I try to generate a CSV output with '|' instead of ',' as field separator
[16:16:45] <pombreda> but using multiprocessing (the --processes/-n option) speeds it up a lot
[16:17:38] <jose_ifm> nice hint about the processes - will use it for sure!
[16:18:09] <pombreda> why use a different separator? (and BTW the options for output/format have changed in 2.9b1, and you can create plugins for various formats and create multiple formats too)
[16:18:33] <pombreda> now your issue above is that your process is getting killed by the kernel (quite likely for using too much RAM)
[16:19:44] <pombreda> with a large scan in 2.2.1, the scan results were all held in RAM at the end, hence the process kill IMHO
[16:20:01] <pombreda> the latest develop is/should be less memory hungry
[16:20:07] <jose_ifm> reason for the separator: I need to filter the output file, and using ',' as separator also splits the copyright/license information into fields. With '|' as separator, it is ok
[16:20:10] <pombreda> e.g. 2.9b1
[16:20:45] <pombreda> jose_ifm, ok, but there is more than just commas to CSVs
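The point about CSVs being more than commas can be illustrated with a minimal sketch, assuming Python 3 and a hypothetical row (the field values below are made up, not ScanCode output). The standard csv module quotes any field that contains the delimiter, so a comma inside a copyright statement only breaks consumers that split on ',' by hand:

```python
import csv
import io

# Hypothetical row: the copyright statement itself contains a comma.
row = ["termcap", "Copyright (c) Foo, Inc.", "13610", "13612"]

# Write the row with the default "excel" dialect.
buffer = io.StringIO()
csv.writer(buffer).writerow(row)
line = buffer.getvalue()

# A naive split on "," cuts the quoted field in two...
naive_fields = line.strip().split(",")

# ...while csv.reader honors the quotes and recovers the original row.
parsed_fields = next(csv.reader(io.StringIO(line)))
```

Here `naive_fields` ends up with one field too many, while `parsed_fields` matches the row that was written.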
[16:21:22] <pombreda> if you need to do filtering, the 2.9b1 (and develop branch) have a much better way to do this than fiddling with the output directly
[16:21:55] <pombreda> jose_ifm, for instance this https://github.com/nexB/scancode-toolkit/blob/develop/src/scancode/plugin_only_findings.py
[16:22:39] <jose_ifm> do you know which docu I should read additionally? I already tried --only-findings, not very successfully
[16:22:49] <pombreda> that's the --only-findings code
[16:23:13] <pombreda> there is not much docu yet. It needs to be written :P
[16:23:18] <jose_ifm> :D
[16:23:29] <jose_ifm> Sourcecode docu - best docu
[16:23:40] <jose_ifm> like jenkins
[16:23:46] <pombreda> but in any case a filter plugin derived from the one above should be easier to code IMHO
[16:24:06] <pombreda> well, we need more doc :P code as doc is not great for sure
[16:24:31] <pombreda> jose_ifm, what kind of filtering do you need?
[16:24:42] <jose_ifm> Right now, I created a format option that uses '|' as separator and then I post-process the output with Lua
[16:25:38] <jose_ifm> Because I got outputs like this: termcap|||13610|13612||F. Girard
[16:25:48] <jose_ifm> (| is the separator)
[16:26:04] <pombreda> ack. but what are your criteria to filter things out?
[16:27:50] <jose_ifm> The output should go into an embedded device without a lot of space... I want to eliminate duplicates
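One possible way to eliminate duplicates from '|'-separated output is sketched below, assuming Python 3. The helper name `unique_rows`, the choice of key column, and the second sample line are all hypothetical; the log does not say which columns define a duplicate, so the key columns are left as a parameter:

```python
import csv

def unique_rows(lines, key_columns, delimiter="|"):
    """Keep the first row seen for each distinct value of the key columns."""
    seen = set()
    result = []
    for row in csv.reader(lines, delimiter=delimiter):
        key = tuple(row[i] for i in key_columns)
        if key not in seen:
            seen.add(key)
            result.append(row)
    return result

sample = [
    "termcap|||13610|13612||F. Girard",
    "termcap.bak|||20|22||F. Girard",  # made-up line with the same last field
    "CHANGELOG||wd Branch HEAD|9531|9534||",
]

# Assumption: the last column (index 6) is the value to deduplicate on.
deduped = unique_rows(sample, key_columns=[6])
```

With this sample, the second line is dropped because its last field repeats the first line's value.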
[16:29:15] <jose_ifm> Second issue: even if filtering findings, something like this appears:
[16:29:36] <jose_ifm> CHANGELOG||wd Branch HEAD|9531|9534||
[16:29:53] <jose_ifm> No copyright information at all
[16:30:07] <pombreda> ok
[16:30:19] <pombreda> so there are really two ways to go about it IMHO
[16:31:00] <pombreda> 1. create a proper filtering plugin, though if your goal is to keep only things that have findings, --only-findings should be working for you
[16:31:42] <pombreda> 2. continue to use your Lua post-processing script, and possibly make it easier for you by writing a new output plugin that uses | separators
[16:32:15] <pombreda> with 2. the current CSV plugin uses the default CSV options in Python
[16:32:15] <pombreda> https://github.com/python/cpython/blob/2.7/Lib/csv.py#L57
[16:32:24] <pombreda> aka the "excel" CSV dialect
[16:32:33] <pombreda> https://github.com/nexB/scancode-toolkit/blob/develop/src/formattedcode/output_csv.py#L81
[16:32:40] <jose_ifm> ok
[16:32:46] <pombreda> e.g. no specific options are provided to the CSV writer
[16:32:56] <jose_ifm> I guess, I will try both
[16:33:15] <pombreda> you could either duplicate the whole CSV plugin OR we could add an option to specify an alternate separator
[16:33:41] <pombreda> but frankly it is likely to be easier and more efficient to do 1. i.e. a filter plugin
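What an alternate-separator option would boil down to can be sketched with the standard csv module, assuming Python 3. This is not the ScanCode plugin code; `rows_to_csv` and the column names are hypothetical. The only real API relied on is the `delimiter` argument of `csv.writer`:

```python
import csv
import io

def rows_to_csv(rows, delimiter=","):
    """Serialize rows to CSV text using the given field delimiter."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()

# Hypothetical header and row, for illustration only.
rows = [
    ["Resource", "copyright", "start_line", "end_line"],
    ["termcap", "Copyright (c) Foo, Inc.", "13610", "13612"],
]

pipe_text = rows_to_csv(rows, delimiter="|")
```

With '|' as the delimiter, the comma inside the copyright field no longer needs quoting; a field containing '|' itself would be quoted instead.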
[16:33:50] <pombreda> jose_ifm, do you dabble a bit in Python?
[16:33:59] <pombreda> or that's not your cup of tea?
[16:34:12] <jose_ifm> I am an expert of "learning by doing" in python
[16:34:21] <jose_ifm> already written some scripts
[16:34:26] <jose_ifm> :-|
[16:34:55] <jose_ifm> "ask for forgiveness instead of permission"
[16:35:09] <pombreda> +1
[16:35:29] <jose_ifm> BTW, I have to go
[16:35:33] <pombreda> sure
[16:35:50] <jose_ifm> If you want me to report how it went, you can send me an email to jose.camacho@ifm.com
[16:36:10] <pombreda> jose_ifm, the best is likely to use Gitter, as it logs things, which makes it easy if you are not online
[16:36:34] <jose_ifm> ok
[16:36:40] <jose_ifm> bye and many thanks!
[16:36:51] <pombreda> so we can chat even though we may not be online at the same time
[16:37:01] <pombreda> jose_ifm, bye! and thanks for using scancode