pmxbot IRC Log Viewer

[08:07:22] <travis-ci> nexB/aboutcode-toolkit#845 (load_json_issue - 33f95ff : Chin Yeung Li): The build passed.

[08:07:23] <travis-ci> Change view : https://github.com/nexB/aboutcode-toolkit/compare/load_json_issue

[08:07:23] <travis-ci> Build details : https://travis-ci.org/nexB/aboutcode-toolkit/builds/353215298

[08:12:52] <travis-ci> nexB/aboutcode-toolkit#846 (load_json_issue - f3a3a05 : Chin Yeung Li): The build was broken.

[08:12:52] <travis-ci> Change view : https://github.com/nexB/aboutcode-toolkit/compare/33f95ffd0708...f3a3a05b1841

[08:12:53] <travis-ci> Build details : https://travis-ci.org/nexB/aboutcode-toolkit/builds/353216847

[14:46:18] <jose_ifm> hi

[14:49:53] <jose_ifm> I am a new user of ScanCode and have an issue using it. Is this the right way to ask questions about it?

[15:52:23] <pombreda> jose_ifm, hi :)

[15:52:32] <jose_ifm> hi

[15:53:14] <pombreda> you can ask it here, or on a ticket at https://github.com/nexB/scancode-toolkit/issues or there is also a more active Gitter chat channel at https://gitter.im/aboutcode-org/discuss

[15:53:23] <pombreda> jose_ifm, all ways work!

[15:53:46] <pombreda> though this channel is made quite busy from CI and commit notifications FWIW

[15:54:04] <jose_ifm> ok, will try any of them

[15:54:23] <pombreda> jose_ifm, go ahead with your question any way you like

[15:54:32] <jose_ifm> also here? :O

[15:54:42] <pombreda> sure

[15:54:56] <pombreda> any way works

[15:55:33] <jose_ifm> well, I am running scancode on a linux machine, trying to extract license information from a pretty big folder containing the linux kernel (portions of it)

[15:56:06] <jose_ifm> After some hours running, it finishes the scan and starts to save results

[15:56:16] <jose_ifm> but it fails with the following message:

[15:56:21] <jose_ifm> ./scancode: line 114: 10444 Killed $SCANCODE_ROOT_DIR/bin/scancode "$@"

[15:56:50] <jose_ifm> Scanning result:

[15:56:52] <jose_ifm> Scan statistics: 172881 files scanned in 17752s. Scan options: licenses with 1 process(es). Scanning speed: 9.74 files per sec. Scanning time: 17750s. Indexing time: 1s.

[15:57:20] <jose_ifm> Extracting the copyrights from the same folder worked perfectly

[15:57:33] <jose_ifm> (and saved the results)

[15:58:07] <jose_ifm> Any idea?

[16:04:08] <pombreda> jose_ifm, which version of ScanCode do you run?

[16:04:24] <pombreda> and which command line options did you use?

[16:04:33] <jose_ifm> 2.2.1

[16:05:06] <jose_ifm> --only-findings -l -format

[16:05:24] <jose_ifm> Format is CSV, then source dir and filename

[16:05:31] <pombreda> ok, how much ram and cpu do you have on this box?

[16:05:53] <jose_ifm> 4Gb Ram, 1 CPU. It is running in a Virtual Machine

[16:07:11] <jose_ifm> Is that an issue?

[16:09:18] <pombreda> ok. that's on the low side, but there is a bug in 2.2.1 that you are likely hitting

[16:11:09] <jose_ifm> which version should I use?

[16:12:11] <jose_ifm> 2.9.b1?

[16:13:00] <pombreda> jose_ifm, one sec, checking something

[16:15:06] <pombreda> jose_ifm, yes, please use 2.9.b1 instead.

[16:15:19] <pombreda> Now 17752s is a lot of time

[16:15:21] <jose_ifm> ok

[16:15:38] <pombreda> for instance it takes 20 minutes to scan a linux kernel on my laptop ;)

[16:15:52] <jose_ifm> I know, it is the stupid VM...

[16:15:55] <pombreda> I have 16GB and quad core though

[16:16:08] <jose_ifm> will try to install it and run it in another machine

[16:16:39] <jose_ifm> A different question: I try to generate a CSV output with '|' instead of ',' as field separator

[16:16:45] <pombreda> but using multiprocessing (the --processes/-n option) speeds it up a lot

[16:17:38] <jose_ifm> nice hint about the processes - will use it for sure!

[16:18:09] <pombreda> why use a different separator? (and BTW the options for output/format have changed in 2.9b1, and you can create plugins for various formats and create multiple formats too)

[16:18:33] <pombreda> now you issue above is that you are getting prockilled by the kernel (e.g. using too much ram quite likely)

[16:18:54] <pombreda> *your issue

[16:19:44] <pombreda> with a large scan with 2.2.1, the scan results were all in RAM at the end, hence the prockill IMHO

[16:20:01] <pombreda> the latest develop is/should be less memory hungry

[16:20:07] <jose_ifm> reason for the separator: I need to filter the output file, and using the ',' as separator also separates the copyright/license information into fields. With '|' as separator, it is ok

[16:20:10] <pombreda> e.g. 2.9b1

[16:20:45] <pombreda> jose_ifm, ok, but there is more than just commas to CSVs
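
The point about CSVs being more than commas can be illustrated with Python's standard `csv` module. This is only a sketch, assuming Python 3; the sample row and its copyright text are made up for illustration and are not real ScanCode output. A field that contains commas is wrapped in quotes on write, and a CSV-aware reader recovers it as a single field.

```python
import csv
import io


def roundtrip(row):
    """Write one row with the default "excel" dialect, then parse it back."""
    buffer = io.StringIO()
    csv.writer(buffer).writerow(row)
    text = buffer.getvalue()
    parsed = next(csv.reader(io.StringIO(text)))
    return text, parsed


# Hypothetical row: the last field contains commas of its own.
row = ["termcap", "13610", "13612", "Copyright (c) 1999, 2000, F. Girard"]
text, parsed = roundtrip(row)

# The writer quoted the last field, so the reader returns four fields again.
print(text.strip())
print(parsed)
```

Splitting the same line on every comma by hand would break the copyright field apart, which is probably the effect described in the chat.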

[16:21:22] <pombreda> if you need to do filtering, the 2.9b1 (and develop branch) have a much better way to do this than fiddling with the output directly

[16:21:55] <pombreda> jose_ifm, for instance this https://github.com/nexB/scancode-toolkit/blob/develop/src/scancode/plugin_only_findings.py

[16:22:39] <jose_ifm> do you know which docu I should read additionally? I already tried --only-findings, and not very successfully

[16:22:49] <pombreda> that's the --only-findings code

[16:23:13] <pombreda> there is not much docu yet . It needs to be written :P

[16:23:18] <jose_ifm> :D

[16:23:29] <jose_ifm> Sourcecode docu - best docu

[16:23:40] <jose_ifm> like jenkins

[16:23:46] <pombreda> but in any case a filter plugin derived from the one above should be easier to code IMHO

[16:24:06] <pombreda> well, we need more doc :P code as doc is not great for sure

[16:24:31] <pombreda> jose_ifm, what kind of filtering do you need?

[16:24:42] <jose_ifm> Right now, I created a formatted option that uses that '|' as separator and then I post-process the output with LUA

[16:25:38] <jose_ifm> Because I got outputs like this: termcap|||13610|13612||F. Girard

[16:25:48] <jose_ifm> (| is the separator)

[16:26:04] <pombreda> ack. but what are your criteria to filter things out?

[16:27:50] <jose_ifm> The output should go into an embedded device without a lot of space... I want to eliminate duplicates
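
Eliminating duplicates from a '|'-separated file could be sketched in Python as below. This is not ScanCode code: the helper `unique_rows`, the choice of column index 6 as the key, and the second and third sample rows are all assumptions made for illustration. Only the first sample row comes from the chat.

```python
import csv
import io


def unique_rows(lines, key_columns, delimiter="|"):
    """Yield the first row seen for each distinct combination of key columns."""
    seen = set()
    for row in csv.reader(lines, delimiter=delimiter):
        key = tuple(row[i] for i in key_columns)
        if key not in seen:
            seen.add(key)
            yield row


# The first line is from the chat; the other two are hypothetical.
sample = io.StringIO(
    "termcap|||13610|13612||F. Girard\n"
    "termcap.c|||20|22||F. Girard\n"
    "README|||1|3||Someone Else\n"
)

# Assumption: column 6 holds the value that must be unique.
deduped = list(unique_rows(sample, key_columns=[6]))
for row in deduped:
    print("|".join(row))
```

Because the function is a generator fed by an iterable of lines, it only keeps the set of keys in memory, not the whole file.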

[16:29:15] <jose_ifm> Second issue: even if filtering findings, something like this appears:

[16:29:36] <jose_ifm> CHANGELOG||wd Branch HEAD|9531|9534||

[16:29:53] <jose_ifm> No copyright information at all

[16:30:07] <pombreda> ok

[16:30:19] <pombreda> so there are really two ways to go at it IMHO

[16:31:00] <pombreda> 1. create a proper filtering plugin, though if your goal is to keep only things with findings, --only-findings should be working for you

[16:31:42] <pombreda> 2. continue to use your LUA post-processing script, and possibly make it easier for you by writing a new output plugin that uses | separators

[16:32:15] <pombreda> with 2. the current CSV plugin uses the default CSV options in Python

[16:32:15] <pombreda> https://github.com/python/cpython/blob/2.7/Lib/csv.py#L57

[16:32:24] <pombreda> aka the "excel" CSV dialect

[16:32:33] <pombreda> https://github.com/nexB/scancode-toolkit/blob/develop/src/formattedcode/output_csv.py#L81

[16:32:40] <jose_ifm> ok

[16:32:46] <pombreda> e.g. no specific options are provided to the CSV writer

[16:32:56] <jose_ifm> I guess, I will try both

[16:33:15] <pombreda> you could either duplicate the whole CSV plugin OR we could add an option to specify an alternate separator

[16:33:41] <pombreda> but frankly it is likely to be easier and more efficient to do 1. e.g. a filter plugin
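
An alternate separator with Python's `csv` writer might look like the sketch below. It assumes Python 3 and does not reproduce the ScanCode CSV plugin: the `write_rows` helper and the column names are hypothetical. Only `csv.writer` and its `delimiter` parameter are standard library API.

```python
import csv
import io


def write_rows(rows, out, delimiter=","):
    """Write rows with the default "excel" dialect, overriding only the delimiter."""
    writer = csv.writer(out, delimiter=delimiter)
    writer.writerows(rows)


# Hypothetical header and row, for illustration only.
rows = [
    ["Resource", "start_line", "end_line", "holders"],
    ["termcap", "13610", "13612", "F. Girard"],
]

out = io.StringIO()
write_rows(rows, out, delimiter="|")
print(out.getvalue())
```

Leaving `delimiter` at its default gives the usual comma-separated output, so a single optional argument would be enough to support both cases.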

[16:33:50] <pombreda> jose_ifm, do you dabble a bit in Python?

[16:33:59] <pombreda> or that's not your cup of tea?

[16:34:12] <jose_ifm> I am an expert of "learning by doing" in python

[16:34:21] <jose_ifm> already written some scripts

[16:34:26] <jose_ifm> :-|

[16:34:55] <jose_ifm> "ask for forgiveness instead of permission"

[16:35:09] <pombreda> +1

[16:35:29] <jose_ifm> BTW, I have to go

[16:35:33] <pombreda> sure

[16:35:50] <jose_ifm> If you want me to report how it did, you can send me an email to jose.camacho@ifm.com

[16:36:10] <pombreda> jose_ifm, the best is likely to use Gitter as it logs things which makes it easy if not online

[16:36:34] <jose_ifm> ok

[16:36:40] <jose_ifm> bye and many thanks!

[16:36:51] <pombreda> so we can chat even though we may not be online at the same time

[16:37:01] <pombreda> jose_ifm, bye! and thanks for using scancode

#aboutcode logs for Wednesday the 14th of March, 2018