PMXBOT Log file Viewer


#mongodb logs for Tuesday the 28th of April, 2015

[00:23:53] <GothAlice> T_T Today was the first day I've ever incinerated security tokens within minutes of receiving the package. All because of this one conditional: if (!pw1.isValidated() && pw1_modes[PW1_MODE_NO81]) ISOException.throwIt(…)
[00:25:34] <GothAlice> My kingdom for a set of parentheses.
[01:10:22] <bros> How can I query for only the length of an array in a subdocument, not the actual contents?
[01:11:21] <GothAlice> bros: http://docs.mongodb.org/manual/reference/operator/query/size/
[01:12:32] <bros> GothAlice, I think that would query against the size.
[01:12:47] <bros> I want to query for the size.
[01:12:53] <GothAlice> …
[01:13:14] <GothAlice> Two methods: map/reduce, or an aggregation pipeline involving $unwind and $group.
[01:13:14] <bros> In SQL, it would be SELECT COUNT(*) FROM table
[01:13:25] <GothAlice> Well, no, it wouldn't really be that in SQL. SQL has no comparison.
[01:13:33] <bros> Are you serious?...
[01:13:38] <bros> How intensive of commands are these?
[01:13:48] <GothAlice> MongoDB does have a count operation… it counts records. Not nested values.
[01:13:52] <bros> My system is already getting locked up at 100% usage and 3s queries
[01:13:59] <GothAlice> $group with a $sum: 1 would count the nested values.
[01:14:09] <GothAlice> (After an $unwind.)
[01:14:18] <GothAlice> Reference: https://jira.mongodb.org/browse/SERVER-4899
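The two stages GothAlice names can be combined roughly like this (a sketch only; the `orders` collection and `scans` array field are made-up names, and server-side this would run as `db.orders.aggregate(pipeline)`):

```javascript
// Counting embedded-array elements: $unwind, then $group with { $sum: 1 }.
const pipeline = [
  { $unwind: '$scans' },                       // one document per array element
  { $group: { _id: '$_id', n: { $sum: 1 } } }, // count them back up per order
];
```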
[01:14:21] <bros> Is this going to be faster than just for-looping the data myself?
[01:14:41] <GothAlice> Huh.
[01:17:43] <GothAlice> Wow, the documentation needs a swift kick in the pants.
[01:18:22] <GothAlice> bros: http://showterm.io/5e69a645f147ada1c43c3
[01:19:00] <bros> Thank you GothAlice.
[01:19:19] <bros> What does unwind do?
[01:23:09] <GothAlice> Unwind emits one document per array element to the next pipeline stage, duplicating the other fields in the document.
[01:23:28] <GothAlice> I.e. {foo: 1, bar: [2, 3]} -> $unwind on bar -> {foo: 1, bar: 2}, {foo: 1, bar: 3}
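GothAlice's example can be checked in plain JavaScript; this is only a simulation of what the server-side stage does:

```javascript
// $unwind on 'bar': each element becomes its own document, with the
// remaining fields duplicated.
const doc = { foo: 1, bar: [2, 3] };
const unwound = doc.bar.map(el => ({ foo: doc.foo, bar: el }));
// unwound is [{ foo: 1, bar: 2 }, { foo: 1, bar: 3 }]
```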
[01:26:27] <bros> What do I do if I want to query certain fields of the documents I also want to aggregate?
[01:27:55] <GothAlice> bros: http://docs.mongodb.org/manual/core/aggregation-introduction/ < the documentation is quite in-depth, and there are numerous tutorials available on Google.
[01:28:14] <GothAlice> (In the case of querying fields, that's a $match pipeline stage.)
[01:32:23] <bros> I think the logic I want to achieve is too complicated to fit into a match/group/project sort of aggregation.
[01:32:24] <bros> for all orders matching X stores within the time span of Y and Z,
[01:32:24] <bros> loop through all orders, according to which user the order, its scans, misscans, and time elapsed belong to
[01:32:24] <bros> if the order belongs to a batch, divide the time elapsed by the number of orders in the batch
[01:38:03] <GothAlice> Step 1: get it done using plain application logic. Step 2: benchmark it and find out where it's slow as a dog. Step 3: optimize the slowest bits. If attempting to refactor into a single monolithic map/reduce or aggregate, you have a "known good" process to compare against.
[01:40:31] <bros> I got it done in plain application logic. It falls apart in the fact that it takes 3 seconds over 200 records.
[02:11:13] <JonGorrono> Anyone know a way to disable auto-creation of dbs in the mongo shell? .... erytime I mistype a use <db name>... erytime
[02:14:51] <joannac> JonGorrono: "use abc" does not create a database for me
[02:29:33] <JonGorrono> I guess not. I thought it was there since it gets dropped:
[02:29:36] <JonGorrono> > show databases
[02:29:36] <JonGorrono> admin (empty)
[02:29:36] <JonGorrono> cmsapi 3.952GB
[02:29:36] <JonGorrono> courseinfo 1.953GB
[02:29:36] <JonGorrono> deltas_cmsapi 0.078GB
[02:29:36] <JonGorrono> lms_sis_api 0.078GB
[02:29:37] <JonGorrono> local 0.078GB
[02:29:37] <JonGorrono> > db.dropDatabase()
[02:29:38] <JonGorrono> { "dropped" : "whatevs", "ok" : 1 }
[02:29:41] <JonGorrono> ...sory
[02:29:54] <JonGorrono> tmi
[02:30:11] <JonGorrono> fatfinger mofo
[03:54:24] <dunkel2> hello
[03:57:27] <dunkel2> is it possible to sort by a string field with a -number suffix? like mystring-1 mystring-2 mystring-10
[05:10:07] <dreamdust> People say skip() is not performant because mongo has to walk the whole collection… but isn't that true of any find query? What I mean is, if I have a find() query with some constraint based on an indexed field, is adding skip() to it any *less* performant?
[05:12:20] <dreamdust> IE find({ some: 'indexedField'}) and find({ some: 'indexedField'}).skip(0) should have the same performance, correct?
[06:02:27] <morenoh149> dreamdust: I'd imagine
[06:06:26] <morenoh149> dreamdust: you should test it but I'd imagine first it does the filtering then the skipping. Wouldn't make much sense if it worked otherwise, unless mongodb intends for you to not use it that way
[06:06:41] <morenoh149> this is pretty in depth http://docs.mongodb.org/manual/reference/method/cursor.skip/
[10:03:37] <Johnade> hi everyone great to see a channel about MongoDB :)
[10:07:25] <Johnade> i'm using nodejs with express, i'm trying to render an image (jpg) from db to my browser
[10:07:29] <Johnade> but i'm failing
[10:23:03] <Johnade> someone is using mongodb with nodejs right here ?
[11:29:48] <newbsduser> splitting data and using 4 mongodb instances on the same machine VS using all data with a single mongodb instance on the same machine
[11:29:51] <newbsduser> which one is correct way?
[11:45:29] <m4th> Hi there
[11:48:56] <Johnade> hi :)
[11:49:06] <Johnade> there is a lot of people and not so much of them talking
[12:05:07] <deathanchor> most of us are happy with our mongo usage :)
[12:08:34] <Johnade> you're the proof of late messages lol
[12:08:49] <Johnade> but yes it's cool to be happy with mongo
[12:16:42] <m4th> does anyone know if there is a clean way of doing a tailf-like on a collection that is not capped (i.e tailable cursors are not an option) ?
[12:17:04] <deathanchor> yeah
[12:17:13] <m4th> is there an API like a replication one, that could inform a client that the collection has been updated
[12:17:17] <deathanchor> .sort({ $natural : -1 }).limit(20)
[12:17:51] <m4th> deathanchor: I won't lose entries with that? If there are a lot of insertions between my queries?
[12:17:53] <deathanchor> try $natural, not sure if it guarantees latest entries
[12:19:22] <deathanchor> hmm.. looks like if you want guarantees, you need to add a date/time stamp and index by that so you can get a quick lookup
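deathanchor's fallback, sketched in mongo-shell syntax (the `events` collection, `ts` field, and `lastSeen` variable are all made-up names; this requires adding a timestamp to each inserted document):

```javascript
// Index the timestamp, then poll for anything newer than the last doc seen.
db.events.createIndex({ ts: 1 })
db.events.find({ ts: { $gt: lastSeen } }).sort({ ts: 1 })
```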
[12:19:40] <deathanchor> someone else might know a better way
[12:20:07] <m4th> deathanchor: thanks anyway for the suggestion
[12:20:08] <StephenLynx> better way for what?
[12:20:18] <m4th> StephenLynx: tailing a non capped collection
[12:20:37] <StephenLynx> what do you mean by that?
[12:20:43] <m4th> in order to be notified for new data inserted
[12:20:50] <StephenLynx> hm
[12:21:00] <m4th> and do some work with that outside of mongo
[12:21:08] <cheeser> if you use ObjectId, you can use the time component for ordering.
[12:21:34] <m4th> I don't have a timestamp on every lines
[12:22:03] <m4th> (line/object inserted)
[12:22:09] <deathanchor> yeah, but I don't think you can get cursor that will just keep polling for new entries.
[12:22:12] <StephenLynx> why do you need it to be non capped?
[12:22:40] <StephenLynx> because I just read you can make a tail for capped collections, and capped collections will make room for new documents when needed.
[12:22:50] <m4th> StephenLynx: I have disk space, I want the collection to store as much as data as possible, and I can lvresize if ever I need it
[12:23:05] <m4th> StephenLynx: I read that already
[12:23:20] <StephenLynx> sec, gonna search what is lvresize
[12:23:29] <m4th> StephenLynx: oh don't...
[12:24:12] <StephenLynx> hm, it seems it isn't something directly related to mongo, but to filesystems.
[12:24:39] <StephenLynx> ok, so you want to be able to use this to resize the volume which holds your database?
[12:24:40] <m4th> StephenLynx: just take my words, I don't work with capped collection
[12:25:20] <m4th> StephenLynx: forget the fs part, it's off topic...
[12:25:39] <StephenLynx> well, it seems its the only reason you need a non-capped collection.
[12:25:48] <m4th> nevermind.
[12:27:35] <m4th> cheeser: you spoke of the time component that's used to build the ObjectId I guess (didn't spot that at first sight)
[12:34:50] <cheeser> http://docs.mongodb.org/manual/reference/object-id/
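Per the linked page, the first 4 bytes of an ObjectId are a big-endian Unix timestamp in seconds, so creation time can be recovered from the 24-char hex string without a driver (the example id below is fabricated):

```javascript
// Recover the creation time embedded in an ObjectId's hex representation.
function objectIdTime(hex) {
  return new Date(parseInt(hex.slice(0, 8), 16) * 1000); // first 4 bytes = seconds
}
// objectIdTime('553f8bc0aaaaaaaaaaaaaaaa') → 2015-04-28T13:31:44.000Z
```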
[12:35:13] <m4th> cheeser: thx, yep already read that (hence my last msg)
[12:35:34] <cheeser> ah, ok. just wanted to be sure. ;)
[12:43:41] <deathanchor> but that's only doc creation date really
[12:44:35] <Johnade> People know how to store images with nodeJS?
[12:47:26] <StephenLynx> what
[12:48:09] <Johnade> i'm trying to store picture in mongoDB with nodeJS
[12:49:32] <StephenLynx> 1: I suggest upgrading to io.js. 2: I have done it with gridfs, will link in a moment
[12:50:01] <Johnade> why io.js more than node.js ? :) Thanks for your answers StephenLynx
[12:50:15] <StephenLynx> because it is everything node.js but better.
[12:50:20] <StephenLynx> faster, more stable
[12:50:22] <jokke> hey
[12:50:22] <Johnade> ahah really ?
[12:50:38] <jokke> how do i remove the nth element of an array field?
[12:51:04] <StephenLynx> https://gitlab.com/mrseth/lynxchan/blob/master/src/be/operations.js#L237
[12:51:49] <StephenLynx> https://gitlab.com/mrseth/lynxchan/blob/master/src/be/db.js
[12:51:53] <StephenLynx> to be more specific
[12:52:10] <jokke> i saw that $pull needs a query to match against
[12:52:34] <jokke> but i'd like something like $pull: 'foo.4'
[12:52:39] <Johnade> StephenLynx: thanks mate, i'm looking at it
[12:53:19] <StephenLynx> https://gitlab.com/mrseth/lynxchan/blob/master/src/be/operations.js#L42
[12:53:20] <StephenLynx> this
[12:53:27] <StephenLynx> is actually how I do the writing.
[12:53:44] <StephenLynx> I would write the image buffer directly too.
[12:53:48] <StephenLynx> could*
[12:53:56] <jokke> anyone?
[12:54:05] <StephenLynx> but because I needed it on disk for resizing with imagemagick, I changed to writing from disk to gridfs.
[12:55:00] <jokke> or should i do something like $pull: { 'foo.$' => { position: 4 } } ?
[12:55:02] <Johnade> ok for the saving, but how to render it with node or io? When i tried to render, i have the buffer only [32,32,34,154 etc...]
[12:55:47] <jokke> ah $position i mean
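There is no single update operator that removes "element N"; the common workaround in this era is two updates, `$unset` the position (which leaves a null hole) and then `$pull` the null. The field name `foo` follows jokke's example; simulated in plain JS:

```javascript
// Server-side this is two updates:
//   db.coll.update(query, { $unset: { 'foo.4': 1 } })  // null out index 4
//   db.coll.update(query, { $pull: { foo: null } })    // drop the null hole
const doc = { foo: ['a', 'b', 'c', 'd', 'e', 'f'] };
doc.foo[4] = null;                           // what $unset does to an array slot
doc.foo = doc.foo.filter(el => el !== null); // what $pull { foo: null } does
// doc.foo is now ['a', 'b', 'c', 'd', 'f']
```

Caveat: the `$pull` step removes every null in the array, so any pre-existing nulls would be lost too.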
[12:56:32] <StephenLynx> sec, let me get it
[12:57:55] <StephenLynx> https://gitlab.com/mrseth/lynxchan/blob/master/src/boot.js#L279 Johnade
[12:58:51] <Johnade> the gs.open function show the image on browser ?
[12:59:04] <StephenLynx> no
[12:59:10] <StephenLynx> it opens the file.
[12:59:13] <StephenLynx> then you stream it.
[12:59:16] <StephenLynx> or w/e
[12:59:32] <StephenLynx> read the docs and learn its basics, is not complicated.
[12:59:50] <StephenLynx> but it won't automagically put it on a browser either.
[13:00:09] <Johnade> yes thanks, but i dont get the "stream" part
[13:01:55] <StephenLynx> https://github.com/substack/stream-handbook
[13:02:18] <StephenLynx> instead of reading, storing to RAM then writing to the response, you just stream it directly.
[13:02:31] <Johnade> ohhhhh okay
[13:02:33] <Johnade> great
[13:02:36] <Johnade> thanks you StephenLynx
[13:02:39] <StephenLynx> np
[13:02:49] <Johnade> i just saw that io.js has ecmascript 6 enabled
[13:04:11] <Johnade> StephenLynx: i'm making a craiglist kind of website, do you think that mongoDB is a good choice for this
[13:04:11] <Johnade> ?
[13:04:24] <StephenLynx> yes.
[13:04:34] <StephenLynx> because first it revolves around the software.
[13:04:43] <StephenLynx> second it doesn't have too many relations.
[13:05:00] <StephenLynx> you will have to deal with some limitations due to relations, though.
[13:05:04] <StephenLynx> but nothing too severe, IMO.
[13:05:21] <Johnade> i hope so
[13:05:38] <Johnade> which framework are you using on client side ? or advice me ?
[13:06:07] <StephenLynx> node.
[13:06:09] <StephenLynx> none*
[13:06:16] <StephenLynx> web frameworks are useless bloat.
[13:06:42] <StephenLynx> both fe and be.
[13:06:53] <pamp> Which is the best linux distribution for MongoDB
[13:06:58] <pamp> ubuntu centos?
[13:07:09] <StephenLynx> I don't think there would be a specific one for running mongo.
[13:07:21] <StephenLynx> but certainly a server-focused distro would be better.
[13:07:30] <StephenLynx> centos, RHEL
[13:07:46] <Johnade> StephenLynx: mhh so pure javascript html5 css?
[13:07:53] <Johnade> i would like to avoid php in fact ..
[13:07:56] <StephenLynx> yeah, that is what I would do.
[13:08:17] <StephenLynx> serve static files for the front-end and pure data on the back-end.
[13:08:30] <StephenLynx> you could generate html on the back-end if you want to support non-js users.
[13:08:38] <StephenLynx> without php.
[13:09:32] <Johnade> using Jade view engine for the html ?
[13:09:34] <Johnade> for example
[13:10:22] <StephenLynx> yes, for example. but I would have a look at it and try to do it by myself.
[13:10:57] <StephenLynx> it doesn't seem to be too complex to need a dependency.
[13:11:26] <Johnade> yes i used it before it's cool. but i will have to work with a lot of json, but it will be simple, no sign-up at all, just posting and reading lists of ads
[13:11:44] <Johnade> but i would like an auto refresh when there are new ads posted
[13:12:34] <StephenLynx> hm
[13:12:42] <StephenLynx> I am not very experienced with web fe.
[13:12:45] <StephenLynx> so I dunno.
[13:12:51] <Johnade> okay StephenLynx thanks :)
[13:12:54] <StephenLynx> I would just ignore non-js users.
[13:13:10] <Johnade> i'm scared to have incompatibility with io.js
[13:13:14] <Johnade> and express and stuff
[13:19:35] <StephenLynx> express is crap anyway.
[13:20:31] <Johnade> oh, ahha tell me, what do you use with io.js
[13:20:36] <Johnade> interesting
[13:25:36] <mrmccrac> anyone know about a known bug where listCollections just hangs for minutes?
[13:25:44] <mrmccrac> but other queries execute quickly
[13:28:45] <StephenLynx> Johnade: told you, I use nothing.
[13:29:11] <StephenLynx> that boot.js file is my starting point.
[13:29:19] <StephenLynx> for the application.
[13:29:20] <Johnade> StephenLynx: okay, but i dont have your knowledge :(
[13:29:36] <StephenLynx> from all problems in software development, that one is the easiest one to solve.
[13:29:53] <Johnade> true
[13:31:43] <mrmccrac> "show collections" took 5 minutes to return.
[13:31:51] <StephenLynx> daium
[13:31:57] <StephenLynx> how many collections you got?
[13:32:12] <mrmccrac> uh like 6 i think?
[13:32:21] <StephenLynx> thats fucked up, then.
[13:32:26] <mrmccrac> yup
[13:32:35] <mrmccrac> like i said i can execute find queries very quickly while this is happening
[13:33:21] <StephenLynx> even weirder.
[13:33:49] <mrmccrac> running 3.0.1 w/ wiredtiger
[13:37:00] <StephenLynx> afaik, wiredtiger is not the default storage engine and several people wouldn't recommend it for production.
[13:37:24] <StephenLynx> even though 10gen uses it for production.
[13:48:37] <bros> How can I aggregate over multiple subdocuments in one document?
[13:49:19] <bros> I have a collection called orders with subdocuments called: log, scans, misscans, and shipments. I want to extract the size of each of these subdocuments over a set of data that's about ~300 entries long without overloading my servers.
[13:50:13] <StephenLynx> $group operator, bros.
[13:50:41] <StephenLynx> use a fixed _id for the outputted document.
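What bros is after, simulated in plain JS (field names from his description). Worth noting that since MongoDB 2.6 the aggregation `$size` expression can compute this server-side in a single `$project` stage, e.g. `{ $project: { log: { $size: '$log' } } }`, with no `$unwind` at all:

```javascript
// One size per embedded array, per order document.
const order = { log: [1, 2], scans: [3], misscans: [], shipments: [4, 5, 6] };
const sizes = {};
for (const field of ['log', 'scans', 'misscans', 'shipments']) {
  sizes[field] = order[field].length;
}
// sizes → { log: 2, scans: 1, misscans: 0, shipments: 3 }
```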
[13:55:06] <jigax> hi everyone. was wondering if someone could kindly help me with this gist https://gist.github.com/fpena06/206d67acf4a5b4d3cd20
[13:57:46] <griffindy> does anyone here have experience running mongo with many many collections in one db? i'm talking about hundreds of thousands
[13:58:03] <StephenLynx> GothAlice how come I never saw OVH on your list? They have great prices from what I noticed both for small stuff https://www.ovh.com/us/vps/vps-classic.xml and dedicated are pretty great too.
[13:58:17] <StephenLynx> griffindy you have dynamic collection creation?
[13:58:37] <griffindy> that would happen, yes
[13:58:38] <GothAlice> StephenLynx: My list wasn't measuring VM prices.
[13:58:53] <griffindy> i haven't implemented anything yet, trying to gather information first
[13:58:54] <StephenLynx> wasnt it measuring cost/benefit?
[13:58:59] <GothAlice> Nope.
[13:59:04] <StephenLynx> what was it measuring?
[13:59:07] <GothAlice> Bulk storage cost in TB/month.
[13:59:15] <GothAlice> Literally just dumb file storage. ;)
[13:59:17] <StephenLynx> so, cost / benefit?
[13:59:18] <greyTEO> jigax, what help are you looking for? What is the problem?
[13:59:28] <GothAlice> StephenLynx: A VM bulk storage does not make.
[13:59:52] <StephenLynx> griffindy having hundreds of thousands is only possible if you have dynamic collection creation, which is a HUGE no no.
[14:00:15] <jigax> trying to run an async.waterfall to query some data then update it. but when i save it i dont see the update.
[14:00:46] <griffindy> StephenLynx out of curiosity, what's wrong with dynamic collection creation (not trolling, I just haven't found anything online)
[14:01:00] <StephenLynx> its impossible to maintain.
[14:01:06] <GothAlice> … not if well engineered.
[14:01:16] <GothAlice> Rolling pre-aggregated collections is a useful thing.
[14:01:29] <jigax> greyTEO: trying to run an async.waterfall to query some data then update it. but when i save it i dont see the update.
[14:01:36] <griffindy> maintain because mongo cannot keep up with creation? or there's some sort of garbage collection that's expensive?
[14:01:39] <StephenLynx> a collection fro pre-aggregated data, you mean?
[14:02:24] <GothAlice> Well, dividing up the pre-aggregated data. I.e. by client.
[14:02:55] <griffindy> StephenLynx what makes it impossible to maintain?
[14:03:39] <StephenLynx> because it is much harder to query, because it would require joins to get the whole data.
[14:03:49] <greyTEO> jigax, it is hard to say what could be wrong. are you sure customer lines is not empty?
[14:04:03] <griffindy> what if the data in different collections is not related?
[14:04:15] <StephenLynx> they all follow a single purpose.
[14:04:29] <StephenLynx> but you are splitting them like they don't.
[14:04:35] <StephenLynx> its counter intuitive.
[14:04:41] <StephenLynx> and impossible to document.
[14:04:41] <jigax> greyTEO: no they are not. i actually did a console log after each update to see the lines and update was made
[14:09:03] <griffindy> StephenLynx one of my colleagues was under the impression it would offer better performance for indices fitting in memory
[14:09:52] <griffindy> instead of one giant index, parts of which could get paged out of memory
[14:10:04] <griffindy> the docs also say "Generally, having a large number of collections has no significant performance penalty and results in very good performance. Distinct collections are very important for high-throughput batch processing."
[14:13:23] <mrmccrac> maybe but if you're having to constantly query from all these multiple collections wont it be the same net result?
[14:13:32] <mrmccrac> as one giant collection
[14:15:18] <Johnade> i'm using mongoose, and i can do query :p
[14:15:20] <Johnade> simple for me
[14:15:50] <griffindy> i think my colleague's thoughts are if one collection is much hotter than the others, it would be easier to hold just that one index in memory, rather than a subset of the entire index, if that makes sense
[14:16:13] <griffindy> I'm also not 100% certain how mongo behaves with its indices when they can't fit in memory
[14:16:35] <GothAlice> StephenLynx: No joins needed if you aren't silly about it (i.e. client segregation means one _doesn't_ want data pollution) and it's automatically handled by the app; a new client signs up, it populates the indexes in a new collection.
[14:16:54] <GothAlice> A little "automation" goes a long way. ;)
[14:17:59] <griffindy> although at the end of the day, there seems like a very hard cap of ~3m namespaces
[14:19:39] <GothAlice> griffindy: 3.338 million, aye.
[14:19:58] <GothAlice> Though because most collections come with an _id index by default, that doesn't equate to 3.338 million collections.
[14:22:19] <griffindy> my understanding is that a collection counts as a namespace, and each index on that collection is also a namespace
[14:23:02] <GothAlice> Aye.
[14:32:17] <bros> How can I select the first element of a subdocument in an aggregation?
[14:37:01] <jmeister> hey
[14:37:34] <jmeister> is there a way to query the date/time from a mongodb server? Just to use it as a sort of a centralized time server
[14:38:15] <mrmccrac> mongos> Date()
[14:38:15] <mrmccrac> Tue Apr 28 2015 14:37:11 GMT+0000 (UTC)
[14:38:57] <mrmccrac> not sure why you would use mongo for this though
[14:39:31] <jmeister> @mrmccrac we're running several instances on AWS and they all have ~2-5 seconds difference between them
[14:39:58] <mrmccrac> if you want to sync their clocks you might want to use NTP ?
[14:40:17] <jmeister> we considered that, but we're looking for a quick patch for the time being
[14:40:36] <jmeister> thanks for the quick answer btw
[14:41:49] <mrmccrac> $ mongo --eval "Date()"
[14:41:49] <mrmccrac> MongoDB shell version: 3.0.1
[14:41:49] <mrmccrac> connecting to: test
[14:41:49] <mrmccrac> Tue Apr 28 2015 14:40:37 GMT+0000 (UTC)
[14:42:41] <jigax> guys i've honestly been googling all night to try and figure this one out and can't find much. I have an async waterfall which queries a line price and is supposed to update the line price. I can see the change in the first async call when doing a console.log, however when saving the document in the second call the document is being saved without the price that was previously updated.
[14:44:21] <jmeister> mrmccrac thanks a lot :)
[14:50:06] <newbsduser> where can i find cortisol's compiled binary versions?
[14:51:15] <snowcode> is there a way to pick only certain properties during an aggregate call?
[14:51:27] <deathanchor> snowcode: $project
[14:52:17] <jr3> is there a convention to follow on schemas? like I prefer to fully qualify an id so: new CarSchema({ carId: Number }) vs new CarSchema({ id: Number})
[14:52:46] <snowcode> great thank you deathanchor
[14:54:20] <snowcode> excellent ^_^
[15:06:58] <jigax> can someone please help me figure out why this gist isn't working as expected https://gist.github.com/fpena06/206d67acf4a5b4d3cd20 thanks
[15:10:17] <snowcode> Is it possible to use an expression inside the $group aggregate operation? Let me explain. I've a group operator which counts a set of fields by day (number of events per day): { $group: { _id: { $dayOfYear: "$time"}, eventscount: { $sum: 1 } }} Now I would like to get more detailed info: number of events in this day with a property x > value and number of events in this day with a property y > value2
[15:10:24] <snowcode> is this possible via query
[15:10:32] <snowcode> or should I do it via code?
[15:18:16] <snowcode> I've tried with ...prop : { $sum : {magnitude : {$gte : 2} }}
[15:18:20] <snowcode> but it say
[15:18:30] <snowcode> "Expression $gte takes exactly 2 arguments. 1 were passed in" .... ?? two args?
[15:27:27] <snowcode> yo solved
[15:27:32] <snowcode> :d
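The error message snowcode hit comes from passing a query-style `{ $gte: 2 }` where an aggregation expression is expected: in a pipeline `$gte` takes two operands, `{ $gte: [a, b] }`. The usual conditional-count pattern wraps it in `$cond` inside `$sum` (field names from snowcode's pipeline; this is presumably close to what "solved" it):

```javascript
// Count all events per day, plus only those with magnitude >= 2.
const stage = {
  $group: {
    _id: { $dayOfYear: '$time' },
    eventscount: { $sum: 1 },
    strongcount: { $sum: { $cond: [{ $gte: ['$magnitude', 2] }, 1, 0] } },
  },
};
```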
[15:45:45] <djoanes> Hey, how can i create a converter in spring to save a particular field in a document as a java serialized object
[15:47:06] <StephenLynx> that is a spring question.
[15:51:27] <djoanes> fair enough
[16:07:06] <bros> Any criticisms on the aggregation/use? https://gist.github.com/brandonros/97f5ebba0be2667e5345
[16:08:38] <StephenLynx> a regular match and project, from what I see.
[16:08:49] <StephenLynx> "log.0.time"
[16:08:51] <StephenLynx> 0?
[16:08:59] <StephenLynx> is log an array?
[16:28:38] <tpayne> how can i see the number of mongo connections I have running?
[16:29:08] <GothAlice> tpayne: http://docs.mongodb.org/manual/reference/command/connPoolStats/
[16:29:24] <tpayne> nice!
[16:29:27] <GothAlice> :)
[16:30:06] <GothAlice> There are some notes regarding requirements to use that function, though.
[16:31:03] <GothAlice> For use outside of a sharded cluster, see: http://docs.mongodb.org/manual/reference/command/serverStatus/#globallock-activeclients
[16:31:11] <GothAlice> (It's just less fine-grained about what it returns, there.)
[16:31:19] <GothAlice> tpayne: ^
[16:31:46] <tpayne> ok
[16:32:25] <tpayne> GothAlice: sorry but what do i actually type?
[16:32:53] <tpayne> typing just connPoolStats is undefined
[16:33:00] <GothAlice> db.runCommand( { serverStatus: 1 } )
[16:33:10] <GothAlice> db.runCommand( { connPoolStats: 1 } )
[16:33:11] <GothAlice> Etc.
[16:33:15] <tpayne> ahh thanks!
[16:33:48] <tpayne> damn, i'm not authorized
[16:33:55] <GothAlice> That could be a problem. ^_^
[16:34:21] <tpayne> do i need to be admin or something?
[16:34:32] <tpayne> i've auth'd with my credentials that allows me to do CRUD
[16:34:37] <tpayne> but i guess it's not everything
[16:34:58] <GothAlice> Hmm; the docs aren't explicit on which perm is needed. Let me dig around my production cluster.
[16:35:44] <tpayne> thanks
[16:36:09] <GothAlice> For connPoolStats you'll need clusterMonitor, clusterManager, clusterAdmin, or root. These are the ones I checked for that command.
[16:37:03] <tpayne> isn't there a default user that has 100% access to everything?
[16:37:10] <GothAlice> clusterMonitor and clusterAdmin also give access to serverStatus.
[16:37:14] <GothAlice> Nope.
[16:37:30] <GothAlice> The first user you add should be an "admin" (i.e. "root" role), but it isn't actually enforced, I don't think.
[16:37:47] <fontanon> Hi everybody, I'm migrating from a single-node mongodb to a mongodb-cluster. To migrate data, I added temporarily the single-node as a shard, so the sharded collections started to drain data to the cluster, but ... what about the not-sharded collections? Once I remove the old-single-node shard would the not-sharded collection be copied to the rest of the shards in my cluster?
[16:38:37] <tpayne> i'm going to view all users
[16:38:39] <tpayne> see what i got
[16:38:44] <tpayne> but i'm going to do this on my dev box ha
[16:38:54] <GothAlice> ^_^
[16:39:18] <GothAlice> tpayne: http://docs.mongodb.org/manual/reference/built-in-roles/ should be of use.
[16:44:38] <tpayne> shit i can't even figure out how to view all the users i've created
[16:54:41] <fontanon> Hi everybody, given a mongodb cluster, how can I force to move not-sharded collections from one shard to another ?
[16:59:38] <fxmulder> Tue Apr 28 09:38:14.221 [rsSync] Socket flush send() errno:9 Bad file descriptor
[16:59:59] <fxmulder> this is now the second time this has happened and my week-long replica cloning is starting over for the second time now
[17:01:00] <fxmulder> I'm going to have to rsync this, if there is any part of mongodb that needs to be looked at it is replica cloning, this is horrible
[17:04:12] <daidoji> hello, can I add space to the mongo data directory dynamically?
[17:04:38] <StephenLynx> I don't think it has any hard caps by default.
[17:05:23] <cheeser> daidoji: come again?
[17:06:32] <daidoji> cheeser: sorry, my dbpath is maxed out of space on that drive
[17:06:56] <daidoji> is there a way to add extra space to dbpath without shutting down mongod?
[17:08:04] <cheeser> if that directory is on an LVM, maybe.
[17:08:56] <daidoji> cheeser: its on aws
[17:09:28] <cheeser> replSet?
[17:09:41] <daidoji> cheeser: what's that?
[17:09:47] <cheeser> replica set
[17:10:00] <daidoji> this one http://docs.mongodb.org/manual/tutorial/deploy-replica-set/?
[17:10:03] <cheeser> or sharded would work better.
[17:10:03] <daidoji> oh okay, I'll read about this
[17:10:18] <cheeser> well, at this point you're probably stuck bouncing mongod
[17:10:20] <daidoji> cheeser: hmm, does that mean I have to run multiple instances instead of just one?
[17:10:25] <daidoji> cheeser: ahhh really?
[17:10:35] <cheeser> well, the OS is going to need to restart...
[17:11:45] <daidoji> cheeser: really? I thought I would just shutdown mongod, mount a snapshot (with expanded space) to the dbpath, start mongod
[17:12:01] <daidoji> would I have to do that if I were doing switching to replSet or Sharded?
[17:13:34] <cheeser> that might be possible on AWS. i dunno. but it still involves bouncing mongod
[17:13:57] <cheeser> if you were sharded you could add a new shard then drain and remove this one.
[17:14:01] <daidoji> cheeser: awww too bad, thats what I thought too.
[17:14:06] <daidoji> cheeser: roger
[17:14:59] <daidoji> thanks
[17:15:52] <cheeser> np
[17:15:54] <cheeser> good luck
[18:10:23] <ah-> hi
[18:10:35] <ah-> is _id somehow special in the mongo-c-driver?
[18:10:54] <ah-> i do a _id $in [something] query, and it doesn't return anything
[18:11:08] <ah-> whereas if i have a field foo_id with exactly the same contents it works
[18:11:15] <ah-> which confuses me quite a bit
[18:12:00] <cheeser> i don't see why it would be
[18:14:17] <ah-> my _id is a binary field, not an objectid, could that change things?
[18:14:41] <StephenLynx> I think you messed up something.
[18:15:17] <ah-> like what?
[18:15:47] <StephenLynx> well, how can you have a binary value for your document id? then you would just hold two documents in the collection?
[18:16:00] <StephenLynx> why did you change that in the first place?
[18:17:02] <ah-> it's a binary string, not just a bool
[18:17:31] <ah-> and it's very convenient, since my primary key/identifier for that collection is that binary string, which doesn't fit into an objectid
[18:17:34] <cheeser> "binary string?"
[18:17:44] <cheeser> a byte[]?
[18:17:46] <ah-> yes
[18:20:07] <ah-> it's just a sha hash, and i'm pretty sure it worked in previous experiments with python
[18:21:59] <StephenLynx> I find _id to be such a pain in the ass that I use another field to make my unique indexes.
[18:22:11] <StephenLynx> it is so much out of the norm
[18:22:32] <StephenLynx> even when you are projecting it has special rules.
[18:22:45] <StephenLynx> IMO it wasn't very well designed, this whole _id system.
[19:43:30] <christo_m> design question: i have users, and many queues belong to a user, and many queue items belong to a queue. sholud these all be nested subdocuments?
[19:44:03] <cheeser> i wouldn't
[19:44:20] <christo_m> cheeser: what do you recommend here
[19:44:36] <cheeser> are queues shared between users?
[19:44:45] <christo_m> cheeser: no
[19:44:53] <christo_m> but ill be aggregating peoples queues based on who's followed etc
[19:44:57] <cheeser> how many queues? does the number change much?
[19:45:19] <christo_m> currently there are two types of queues.. so the number per user shouldn't change (right now)
[19:45:41] <cheeser> well, queue items should definitely be in their own collection
[19:45:53] <cheeser> you might get away with embedded queues under users.
[19:46:07] <christo_m> how do you tie them together?
[19:46:23] <christo_m> do you keep user_id in queue, and queue_id in queue items?
[19:48:04] <christo_m> cheeser: from my understanding you should embed your child objects within parent objects as much as possible
[19:48:15] <GothAlice> It really depends.
[19:48:31] <GothAlice> http://www.javaworld.com/article/2088406/enterprise-java/how-to-screw-up-your-mongodb-schema-design.html goes into a few of the factors involved in choosing when to embed.
[19:50:21] <GothAlice> From one app I wrote, MongoDB-powered forums, embedding replies to a thread in a thread? Great idea. Embedding replies in threads in forums? Terrible idea.
[19:50:41] <christo_m> GothAlice: is that because replies will be changing more?
[19:50:44] <christo_m> i mean being added frequently etc
[19:50:57] <christo_m> cheeser: is that why you said queue items should be their own collection?
[19:51:29] <GothAlice> It's a combination of factors: when looking at one thread, the user could care less about the others, so you'd need to project them out. It's also nigh-on impossible to update beyond a single nesting without losing your sanity.
[19:52:16] <christo_m> GothAlice: say i do split these collections up
[19:52:17] <cheeser> christo_m: more or less, yes.
[19:52:24] <christo_m> what's the method for joining them together afterwards? is that what map/reduce is about
[19:52:30] <cheeser> like GothAlice said, it's a bit of a nuanced decision
[19:52:33] <GothAlice> And it'd also restrict the amount of content available per forum, not per thread. (I.e. 16MB threads = ~3.3 million words per thread. 16MB forums? 3.3 million words in _all threads in that forum_.) There are ways around this, i.e. having "continuation" documents, but that adds a lot of complexity to the app.
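The "continuation documents" workaround GothAlice mentions can be sketched as a bucketing scheme: replies go into fixed-size bucket documents so no single document grows toward the 16MB cap. A minimal sketch in plain Python (the bucket shape, `seq` field, and 100-reply size are assumptions for illustration):

```python
# Bucketed ("continuation") documents: instead of one ever-growing embedded
# array, replies are split across fixed-size documents per thread.
BUCKET_SIZE = 100

def append_reply(buckets, thread_id, reply):
    """Append a reply, starting a new bucket document when the last is full."""
    if not buckets or len(buckets[-1]["replies"]) >= BUCKET_SIZE:
        buckets.append({"thread_id": thread_id,
                        "seq": len(buckets),
                        "replies": []})
    buckets[-1]["replies"].append(reply)
    return buckets

buckets = []
for n in range(250):
    append_reply(buckets, "t1", {"n": n})
print(len(buckets))  # 250 replies land in three bucket documents
```

The app-side cost GothAlice alludes to is visible here: reads now have to fetch and stitch together every bucket for a thread.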
[19:54:19] <christo_m> gonna read this: http://blog.mongodb.org/post/87200945828/6-rules-of-thumb-for-mongodb-schema-design-part-1
[19:54:29] <christo_m> seems to explain that the 1 to many problem needs to be thought about a bit more in depth
[19:54:34] <christo_m> namely how "many"
[20:08:30] <dreamdust> Is the performance of find({ some: 'indexedField'}) and find({ some: 'indexedField' }).sort(0) equivalent ?
[20:09:39] <dreamdust> bleh I mean skip(0)
[20:09:40] <dreamdust> not sort
[20:11:38] <GothAlice> I would hope .skip(0) would be elided away, client-side.
[20:12:12] <GothAlice> I.e. a new queryset instance would start with a skip of zero, and changing it to zero should have no impact.
[20:17:40] <dreamdust> okay that's what I assumed
[21:41:43] <drags> anything recommended for visualizing shard layout besides shard-viz? browsed through mongodb-tools.com and been searching, but not coming up with much
[21:58:46] <jaybehash> hi
[22:13:32] <greyTEO> GothAlice, you mentioned you replaced redis(and/or)memcache with mongo. Is it as simple as creating a collection for calculated data? Same as memcache?
[22:13:47] <greyTEO> What approach did you take to purge it?
[22:13:53] <greyTEO> or update it?
[22:14:25] <GothAlice> greyTEO: Depends. If you want to replicate a redis queue, use a capped collection. If you want to replicate a memcache auto-expiring key-value store, use TTL index (with zero delay) on a date field you set to the expiry time.
[22:15:30] <greyTEO> I am wanting to avoid introducing a new system that I have to manage. I think I can do it in mongo.
[22:17:52] <GothAlice> greyTEO: As a note, MongoDB TTL indexes are culled once a minute, and if a pass doesn't complete in time, some expired data will linger until a later pass.
[22:18:31] <GothAlice> greyTEO: Ref. my own cache implementation's "getter" which needs to handle the TTL edge case: https://github.com/marrow/cache/blob/develop/marrow/cache/model.py#L115-L133
[22:19:31] <GothAlice> Wait; that's not the right method.
[22:19:44] <GothAlice> Never mind, it is!
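Because TTL culling only runs about once a minute, a cache reader has to treat expired-but-not-yet-deleted documents as misses, which is the edge case GothAlice's getter handles. A minimal sketch of that logic in plain Python, with the collection simulated as a list of dicts (the `CacheMiss` name matches the log; the field names are assumptions):

```python
from datetime import datetime, timedelta

class CacheMiss(Exception):
    pass

def cache_get(collection, key, now=None):
    """Return a cached value, treating expired-but-not-yet-culled documents
    as misses -- the TTL monitor only sweeps about once a minute, so a
    document may still be present after its expiry time has passed."""
    now = now or datetime.utcnow()
    doc = next((d for d in collection if d["_id"] == key), None)
    if doc is None or doc["expires"] <= now:
        raise CacheMiss(key)
    return doc["value"]

now = datetime(2015, 4, 28)
docs = [
    {"_id": "fresh", "value": 42, "expires": now + timedelta(days=2)},
    {"_id": "stale", "value": 13, "expires": now - timedelta(seconds=5)},
]
print(cache_get(docs, "fresh", now))  # 42; "stale" would raise CacheMiss
```

Against a real collection the TTL index itself would be `db.cache.createIndex({"expires": 1}, {"expireAfterSeconds": 0})`, with the getter filter doing the same expiry check server-side.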
[22:20:10] <greyTEO> I don't necessarily care that it doesn't expire. I will have to purge the queue on events.
[22:20:16] <coucou77> hello!
[22:20:39] <greyTEO> so my expire time will be huge, say 2 days
[22:20:48] <greyTEO> (huge for cache)
[22:21:12] <coucou77> I sent a request in stackoverflow regarding mongodb, if anyone has any clue, thanks in advance!
[22:21:12] <coucou77> http://stackoverflow.com/questions/29929338/mongodb-performing-a-group-on-mapreduce-output-inline
[22:23:26] <greyTEO> is CacheMiss an exception?
[22:23:56] <jaybehash> #macbidouille
[22:24:41] <jaybehash> oops sorry
[22:24:42] <greyTEO> Any reason not to update the cache on a cache miss? I don't know the inner workings of your system, though.
[22:26:02] <GothAlice> greyTEO: Yes, it's an exception.
[22:26:19] <GothAlice> greyTEO: It's an early exit from https://github.com/marrow/cache/blob/develop/marrow/cache/model.py#L75-L79
[22:26:45] <greyTEO> bingo. Got it.
[22:27:12] <GothAlice> ^_^ This lib is 100% tested across Python 2.6+ and Pypy2+ and is used in production at the moment. It's MIT. Feel free to steal any ideas that seem good. ;)
[22:36:54] <greyTEO> I will definitely reference it. Solid extension. I'll have to figure out how to navigate through python...
[22:46:20] <bros> GothAlice, Thanks for helping me bring my server usage from 100% to 5% and response times from 3s to 30ms with aggregation.
[22:46:38] <cheeser> nice.
[22:51:02] <GothAlice> bros: Wow!
[22:51:11] <GothAlice> Great job optimizing that.
[22:51:30] <bros> GothAlice, https://gist.github.com/brandonros/e64e94f3026b13f185cb
[22:51:54] <bros> How do the aggregations look? Ok?
[22:52:18] <GothAlice> bros: The .0. stuff is… potentially concerning. (I take it you are intentionally only checking the first array element in that first $match, line 70?)
[22:52:24] <bros> There's no way to check the last element of a subdocument, correct?
[22:52:35] <GothAlice> bros: There sorta is, using aggregation and some $unwind tricks.
[22:52:38] <bros> like, log.last.time
[22:53:08] <GothAlice> $unwind on the array, $last the values you care about in a $project stage, bam, you'll (eventually) get the last value of each document's array.
[22:53:28] <GothAlice> Rather, $last in a $group.
[22:53:37] <GothAlice> Too many types of projection goin' on here. ^_^
[22:56:19] <bros> GothAlice, what can I do instead of the projection?
[22:56:38] <GothAlice> Wait, no, that was a comment on me getting confused, not your code.
[22:57:22] <bros> I would have never known about $project if it wasn't for you. It felt like all of the tutorials/documentation tried to lead me toward $group.
[23:04:42] <GothAlice> For certain edge cases involving lists, an $unwind/$group pair is effectively the only solution. I.e. your "get the last array element" issue.