PMXBOT Log file Viewer

#mongodb logs for Saturday the 20th of April, 2013

[00:44:55] <sinclair-linux> hey everyone
[00:45:02] <sinclair-linux> i have a quick question
[00:45:20] <sinclair-linux> where is the most appropriate place to install a mongo database on linux?
[00:45:28] <sinclair-linux> as in what directory is the most common?
[00:45:35] <sinclair-linux> or, does it not matter?
[00:47:10] <rossdm> you can put it wherever you want, default installs usually go to /data/db
[00:48:22] <sinclair-linux> rossdm: as in, the root?
[00:48:43] <sinclair-linux> rossdm: for example, i have a site running /var/www
[00:49:09] <sinclair-linux> so, i would have the /data directory next to the /var directory in the root right?
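
The data directory is just a mongod setting rather than a fixed location; a minimal sketch with example paths only (the built-in default is /data/db, while the Debian/Ubuntu packages normally pick a path under /var for you):

    # with no options, mongod expects /data/db to exist and be writable
    mongod --dbpath /data/db

    # or set it in /etc/mongodb.conf (the old ini-style config file):
    dbpath=/var/lib/mongodb
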
[14:21:32] <lacrymology> how do I do a nested-structure search? Like if my documents look like [{ foo: 'foo', bar: { x: 1, y: 2} }, {foo: 'baz', bar: { x:2, y: 2}}, ...], how do I look for the object where foo.bar.x == 1?
[14:22:31] <Baribal> coll.find({foo.bar.x: 1})
[14:22:45] <Baribal> Er, wait.
[14:22:59] <Baribal> Er, wait.
[14:23:28] <lacrymology> Baribal: sorry, it's even worse. bar is a list, and what I want is bar, not the whole document (but I can handle that afterwards, of course)
[14:23:33] <Baribal> coll.find({bar.x: 1}), as bar is not a nested doc in foo.
[14:24:47] <lacrymology> Baribal: like { models: [{ x:1, y:2},{x:2, y:2}, {x:3, y:3}]}
[14:25:23] <kali> try find({"bar.x": 1}, { "bar.$" : 1}), if you use 2.4 or later
[14:25:26] <Baribal> Ah, okay... I'm not sure...
[14:25:45] <lacrymology> I'll check those, thank you
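
A minimal sketch of the query kali suggests, assuming a collection named coll and documents shaped like lacrymology's second example:

    // dot notation reaches into the embedded array; the quotes around the dotted key are required
    // the positional projection "bar.$" returns only the first matching array element
    db.coll.find(
        { "bar.x": 1 },
        { "bar.$": 1 }
    )
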
[16:00:03] <fredix> hi
[16:00:26] <fredix> It seems that mongo::fromjson doesn't work anymore
[16:13:25] <fredix> mongo::fromjson failed with an int
[16:27:48] <fredix> ouch, I found the issue: mongo::fromjson needs a white space before the comma if the value is an int -> { "A": 1 , "B": 2 , "C": 3 }
[16:52:49] <kali> wow, that's a robust parser
[17:43:24] <fabio> hello, I'm trying a seq scan on an 11 million row table
[17:43:32] <fabio> in postgresql vs mongodb
[17:43:47] <fabio> I thought mongo would win
[17:43:53] <fabio> but it doesn't
[17:44:06] <fabio> how can I scan 11M rows faster
[17:44:09] <fabio> by sharding?
[17:44:36] <kali> fabio: mongodb is designed mostly for low latency small queries, not for high bandwidth scanning
[17:45:44] <fabio> kali, I'm trying to mapreduce an aggregation over these 11M rows; in postgresql I have a pl/pgsql function which takes 4 sec
[17:46:01] <fabio> I thought I could reduce that number with mongo
[17:46:18] <fabio> is it not a good idea?
[17:46:18] <kali> honestly, i don't think you will
[17:46:22] <kali> no.
[17:46:51] <fabio> then why are people migrating to mongodb for metrics and large amounts of data
[17:47:13] <kali> large amount of data is not a problem, scanning is
[17:47:33] <fabio> I mean it only works for low latency tiny queries?
[17:47:44] <kali> that's what mongodb is best at
[17:47:52] <fabio> with no joins and not much functionality?
[17:48:22] <kali> no joins, yeah, definitely.
[17:48:55] <fabio> why does mongodb sell the wonderful mapreduce in its docs
[17:49:12] <fabio> if it is better done with functions in a relational database
[17:49:13] <richthegeek> I'm writing a system which runs as a daemon and which should init a process when a row is inserted/updated in a collection. The collection will have at most (and usually far fewer) 1000 rows. Is it more performant to poll the collection for new rows every N seconds, or to cap the collection and use a tailable cursor?
[17:49:13] <fabio> ?
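
richthegeek's polling-vs-tailable question goes unanswered in this log; as a rough sketch of the polling side, written in mongo shell syntax purely to illustrate (the collection name is made up, and a real daemon would do this through its own driver):

    // remember the highest _id seen and ask only for newer documents on each pass
    var lastId = ObjectId("000000000000000000000000");
    while (true) {
        db.jobs.find({ _id: { $gt: lastId } }).sort({ _id: 1 }).forEach(function (doc) {
            lastId = doc._id;
            // kick off the worker process for this document here
        });
        sleep(5000);   // shell helper: poll every 5 seconds
    }
    // note: this only catches inserts; catching updates needs an indexed "modified" timestamp.
    // the alternative is a capped collection read with a tailable cursor, which blocks on the
    // server instead of re-issuing the query, at the cost of capping the collection.
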
[17:49:48] <richthegeek> fabio: if the core of your app is joining (relating) data, then a relational database is what you need
[17:49:51] <kali> fabio: i don't know
[17:50:04] <kali> fabio: but i know that for high latency processing, nothing beats hadoop
[17:50:21] <kali> fabio: and certainly not mongodb, which is optimised for exactly the opposite
[17:50:26] <richthegeek> fabio: however, for speed and lack of schema (which mostly helps speed, but has other uses too), a document store like Mongo is best
[17:51:23] <fabio> I still don't understand in which situation mongodb wins
[17:51:44] <richthegeek> the situation in which you need speed over relational capabilities
[17:51:45] <fabio> I thought it won at big data
[17:51:51] <fabio> I need speed
[17:52:09] <fabio> but not in aggregating data
[17:52:21] <fabio> it only works for selecting one document
[17:52:33] <fabio> ?
[17:52:52] <fabio> who can design a database for that case?
[17:53:30] <richthegeek> I'm not sure what you mean by "but no in aggregate data, only works in select one document", can you rephrase?
[17:53:41] <kali> fabio: high load of interactive queries
[17:53:52] <fabio> I need to aggregate data from 11M rows
[17:53:58] <kali> fabio: that is, about any read/write web site
[17:54:14] <fabio> I thought of sharding to improve performance
[17:54:20] <fabio> and do the mapreduce
[17:54:24] <fabio> faster than postgresql
[17:54:34] <fabio> but you say it is not a good idea
[17:55:03] <fabio> on one server, the seq scan is twice as slow as in postgresql
[17:57:12] <kali> fabio: anyway, you'd better consider the aggregation framework instead of map/reduce
[17:57:43] <kali> fabio: and yes, sharding should help
[17:59:18] <fabio> yea kali just read here http://stackoverflow.com/questions/12678631/map-reduce-performance-in-mongodb-2-2
[17:59:59] <fabio> but I think it's a waste of time because a seq scan is twice as slow as in postgresql
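
A minimal sketch of the aggregation framework kali is pointing at, with invented collection and field names rather than fabio's actual schema:

    db.rows.aggregate([
        { $match: { year: 2013 } },                // filter first so an index can be used
        { $group: { _id: "$category",              // group key
                    total: { $sum: "$value" },     // per-group sum
                    count: { $sum: 1 } } },        // per-group row count
        { $sort: { total: -1 } }
    ])
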
[18:02:57] <kali> you plan to run these scans / aggregations interactively?
[18:06:28] <fabio> yes
[18:06:39] <fabio> a user can now do a select from a function
[18:06:48] <fabio> wait 4 seconds and have the results
[18:07:17] <fabio> I want to use mongodb and its aggregation framework, thinking it would improve that time
[18:07:52] <kali> mmmm
[18:08:38] <kali> if you shard a lot, maybe you'll get something better... it will cost you in hardware
[18:09:00] <kali> can't you pre-aggregate some of the data beforehand ?
[18:09:45] <fabio> nope
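
For completeness, kali's pre-aggregation idea usually means maintaining running totals at write time with an upsert and $inc, so the interactive query reads a few small counter documents instead of scanning 11M rows (names here are invented):

    db.daily_totals.update(
        { _id: { day: "2013-04-20", category: "foo" } },
        { $inc: { count: 1, total: 42 } },
        { upsert: true }
    )
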
[18:09:56] <fabio> what is sharding a lot
[18:10:03] <fabio> more than 4 servers?
[18:11:00] <kali> well, in the theoretical case, cutting the dataset in two parts might cut the time in half
[18:11:41] <kali> so you would need two shards to match psql perf, four to divide the time in two, etc
[18:14:57] <fabio> then that would be nice
[18:15:23] <fabio> that's what I thought, by sharding I would reduce the time
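
Putting numbers on kali's estimate, and assuming the scan parallelises linearly across shards with no routing or merge overhead:

    single mongod scan  ≈ 2 × 4 s = 8 s   (twice as slow as the 4 s pl/pgsql function)
    2 shards            ≈ 8 s / 2 = 4 s   (parity with postgresql)
    4 shards            ≈ 8 s / 4 = 2 s
    8 shards            ≈ 8 s / 8 = 1 s
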
[21:16:28] <preinhei_> I'm having some trouble finding my data. Which is a pity.
[21:16:46] <preinhei_> I've got four mongo servers participating in a replica set
[21:17:14] <preinhei_> I'm doing work near one of the slaves, and not all the data I'm writing is available later
[21:17:35] <preinhei_> http://pastie.org/private/0famdfkurqfxpfqsy8kza (using the PHP driver)
[21:19:07] <preinhei_> I guess a key note would be that I'm writing three things under results.Chicago: http (the one documented there), trace (which shows up in results) and dig
[21:21:18] <preinhei_> I've got a copy of the application running on the same server as the primary, no problems there
[21:22:04] <preinhei_> setting write concern (either to 2, or majority) seems to have made things worse, not better.
[21:23:07] <kali> preinhei_: you're aware you need your php client to be able to talk to the primary?
[21:23:21] <preinhei_> yes
[21:23:44] <kali> preinhei_: do you get some kind of error with some write concern?
[21:23:56] <preinhei_> the other data elements (like dig and trace) are being written by the same script, on the same server
[21:24:45] <preinhei_> I've not found any error, but when I'm not modifying write concern I'd say 2/3rds+ of the data is available
[21:24:54] <preinhei_> now that I've set it, none of it shows up
[21:27:22] <preinhei_> wrong button, sorry
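
A minimal sketch of the two knobs under discussion, in mongo shell syntax of that era (collection and field names are placeholders, not the ones in preinhei_'s paste):

    // write on the primary, then block until a majority of the replica set has the write
    db.results.insert({ city: "Chicago", http: { status: 200 } })
    db.runCommand({ getLastError: 1, w: "majority", wtimeout: 5000 })

    // reading near a secondary has to be allowed explicitly, and can still lag the primary
    // if the write was not confirmed with a write concern like the one above
    rs.slaveOk()
    db.results.find({ city: "Chicago" })
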
[22:09:18] <Almindor> anyone here using mongodb on ubuntu via upstart along with some dependent service (e.g. node or some other server using mongo)?
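
Almindor's question also goes unanswered here; the usual upstart pattern is to key the dependent job off the mongodb job's events. A sketch, with a made-up job name and paths:

    # /etc/init/myapp.conf
    description "node app that needs mongod running first"
    start on started mongodb
    stop on stopping mongodb
    respawn
    exec /usr/bin/node /opt/myapp/server.js
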