
#mongodb logs for Tuesday the 21st of July, 2015

[00:42:27] <StephenLynx> GothAlice, I've got a race condition and I wonder if this would work: if I use a findAndModify to increment a field in a document, will that cause the document to be locked for writes while it increments one of its fields, eliminating the race condition?
[06:13:29] <Razerglass> hello can someone help me with the syntax to delete an item from an array with mongoose?
[06:38:53] <svm_invictvs> Is it possible to query using $ref?
[06:54:50] <amitprakash> svm_invictvs, dbref?
[06:56:31] <svm_invictvs> yeah
[06:58:18] <amitprakash> svm_invictvs, you can use dbref._id iirc
[06:59:17] <amitprakash> db.collection.find({'reffield.$id': 'blah'})
[06:59:46] <amitprakash> Unfortunately, this will ignore indexes on reffield
[07:00:28] <amitprakash> a better way would be to db.collection.find({'reffield' : { "$ref" : 'othercollection', "$id" : 'blah', "$db" : 'othercollectiondb' }})
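
A minimal mongo-shell sketch of the two query forms amitprakash describes; the collection and field names are taken straight from his examples and are purely illustrative:

    // Matching only on the $id component works, but ignores any index on "reffield":
    db.collection.find({ "reffield.$id": "blah" })
    // Matching the full DBRef value can use an index on "reffield" (key order matters):
    db.collection.find({ reffield: { "$ref": "othercollection", "$id": "blah", "$db": "othercollectiondb" } })
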
[07:17:31] <svm_invictvs> cheeser: http://pastebin.ca/3069723
[07:17:36] <mbuf> in cloud.mongodb.com, how does one determine the mongodb_mms_group_id and mongodb_mms_api_key?
[07:17:38] <svm_invictvs> cheeser: That reproduces my problem from earlier
[07:17:56] <svm_invictvs> cheeser: But, it looks like in the Morphia documentation it's not clear if @Id private String theObjectId; is okay
[07:18:10] <svm_invictvs> And that's where it gets screwed up
[07:39:30] <taspat> hi, I'm a noob and would like to ask you this quick question: can I have a collection and insert objects in order by one of the objects' properties, and then query the collection with something like "take(10, collection)" so MongoDB returns the first 10 ordered items with little querying cost?
[08:14:52] <joannac> taspat: yes. find(...).sort(...).limit(10)
[08:16:11] <taspat> joannac: ty. But it is very expensive right? Because every time you need to sort the collection. Can you maintain the order when inserting/updating? And then do take(10, col)?
[08:16:27] <taspat> I hope I'm explaining myself..
[08:16:51] <joannac> create an index then
[08:17:17] <taspat> i google index mongodb?
[08:17:21] <joannac> yes
[08:17:42] <taspat> thanks
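
A small sketch of joannac's suggestion; the collection name "items" and the field "score" are illustrative:

    // With an index on the sort field, the first 10 documents come straight off the index:
    db.items.ensureIndex({ score: -1 })
    db.items.find().sort({ score: -1 }).limit(10)   // only 10 documents are examined
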
[09:08:48] <puppeh> Anybody know how I can close the connections to the server in the ruby driver v2.1?
[09:11:07] <Zelest> No idea, but what happens if you set the variable/handler to nil?
[09:11:13] <Zelest> Is it still connected?
[09:15:02] <puppeh> dunno but that's not an option
[09:16:42] <Zelest> Then I'm out of ideas, sorry.
[10:09:47] <d-snp> puppeh: just call close?
[10:11:20] <d-snp> Mongo::Client#close
[11:03:00] <Siamaster> Hi, is it possible to make a read operation use the same lock write operations does?
[11:03:31] <Siamaster> I don't want to read anything before all the current write operations are done
[11:19:30] <Siamaster> Is it bad practice to have a list of objectids as the id for an element?
[11:29:08] <Siamaster> I have an entity called Chat
[11:29:38] <Siamaster> right now it has objectId id, objectid user1Id, objectId user2Id as fields
[11:30:12] <Siamaster> I'm thinking about making it objectId[] _id
[11:40:17] <Siamaster> ok, can't use array for id
[11:56:20] <Siamaster> but what if I stringify the objectIds and separate them by comma?
[11:56:24] <Siamaster> that would obviously work
[11:56:25] <Siamaster> but
[11:56:33] <Siamaster> is that really bad practice?
[11:57:03] <GothAlice> Siamaster: If you hexlify an ObjectId it goes from 12 bytes to 24 + null + 4 byte length.
[11:57:08] <GothAlice> It should be obvious that's sub-optimal.
[11:57:54] <Siamaster> but right now my entity chat has 3 ObjectIds ChatId, User1Id and User2Id
[11:58:04] <GothAlice> You also require additional processing (conversion back) before you can use the values within the ObjectIds, like the creation time.
[11:58:25] <GothAlice> So you don't have a list, you have a concrete 3-tuple.
[11:58:44] <GothAlice> {_id: {session: Objectid(…), initiator: ObjectId(…), target: ObjectId(…)}, message: "…"}
[11:59:01] <Siamaster> but this is the best way to assure uniqueness?
[11:59:10] <GothAlice> It's guaranteed to be unique.
[11:59:10] <Siamaster> how else can I achieve that?
[11:59:20] <GothAlice> Create additional indexes on other fields, marking the index as unique.
[11:59:24] <GothAlice> But _id does this for you.
[11:59:45] <Siamaster> I mean that I have only 1 entity that is for user1 and user2
[11:59:53] <GothAlice> {_id: Objectid(…session ID…), initiator: ObjectId(…), target: ObjectId(…), message: "…"}
[12:00:27] <StephenLynx> GothAlice, did you read what I asked yesterday? you think that could work?
[12:00:27] <GothAlice> It's the same thing as embedding it in _id, except you need to create a new index on {initiator: 1, target: 1}, {unique: true}
[12:00:45] <Siamaster> what if they both issue the first message at the same time?
[12:01:09] <StephenLynx> you get an error on the second with code 11000 if I am not mistaken
[12:01:20] <GothAlice> Siamaster: All operations in MongoDB are applied in a linear order.
[12:01:24] <Siamaster> no? lets say you and me chat
[12:01:34] <GothAlice> I.e. if two requests come in "simultaneously"… they're not actually simultaneous. One will "win" and go first.
[12:01:37] <Siamaster> and we issue the chat at same time
[12:01:49] <Siamaster> there will be two entities where on one I'm the initiator and the other you are
[12:01:55] <GothAlice> Down to the millisecond, with millisecond accurate clock synchronization between machines?
[12:02:12] <Derick> and don't forget the object ID also has the PID in it
[12:02:12] <StephenLynx> "I.e. if two requests come in "simultaneously"… they're not actually simultaneous. One will "win" and go first." hm, now that probably fixes my problem.
[12:02:44] <GothAlice> (And even then, MongoDB still linearizes inserts, and yeah, ObjectIds are fundamentally designed to not be nearly as naive as "auto-increment IDs", which are fraught with problems. Just ask Twitter. ;)
[12:03:14] <Siamaster> yes but imagine this, if(chatRepository.getChatFor(user1,user2) == null) { chatRepository.createChatFor(user1,user2);}
[12:03:29] <GothAlice> That's an unfortunate approach, potentially introducing race conditions.
[12:03:33] <StephenLynx> GothAlice so if I use findAndModify to increment a field of a document and use this field to get auto-incremented ids, it makes impossible for me to get the same ID twice?
[12:03:41] <GothAlice> For which, StephenLynx, findAndModify with upsert: True is your friend.
[12:04:25] <GothAlice> However, findAndModify is broken.
[12:04:29] <StephenLynx> I won't need the upsert, the document that I will use to generate the id will be created before hand.
[12:04:33] <StephenLynx> eh?
[12:04:34] <StephenLynx> why
[12:04:36] <GothAlice> Update + query to retrieve is… clearer.
[12:04:44] <StephenLynx> yeah, but heres the deal
[12:04:49] <StephenLynx> I have these two collections
[12:05:00] <GothAlice> http://docs.mongodb.org/manual/reference/command/findAndModify/ < it introduces more special cases and weirdness than it solves.
[12:05:14] <StephenLynx> one has a field called threadId
[12:05:17] <StephenLynx> and the other postId
[12:05:36] <StephenLynx> there can't be a threadId equal to another postId in the same forum.
[12:05:40] <GothAlice> (That's 8 "screenfulls" of special casing, on my monitor, right there. To heck with that.)
[12:05:49] <Siamaster> so what do you guys suggest I do? I can't secure myself against the race condition?
[12:06:00] <StephenLynx> so I use the forum document to get the id of the new post or thread.
[12:06:51] <GothAlice> Siamaster: Write your queries to assume that the record does not exist, and that whatever query is run will create the record as needed. (That's the general approach.)
[12:07:57] <Siamaster> i.e chatRepository.getOrCreateChatFor(user1,user2) ?
[12:08:02] <StephenLynx> and where are the special cases?
[12:08:05] <GothAlice> User A sends a message to User B. db.chat.update({initiator: ObjectId(User A), target: ObjectId(User B)}, {$push: {messages: {_id: ObjectId(*new*), sender: ObjectId(User A), message: "Hi."}}}, upsert=True)
[12:08:08] <Siamaster> but how would a query like that look like in mongo? :S
[12:08:13] <GothAlice> Like that.
[12:08:27] <Siamaster> thanks
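
A runnable mongo-shell sketch combining GothAlice's two suggestions (a unique compound index plus an upsert); the collection name "chat" comes from her example, the rest is illustrative:

    // One chat document per (initiator, target) pair, enforced by a unique compound index:
    db.chat.ensureIndex({ initiator: 1, target: 1 }, { unique: true })

    var userA = ObjectId(), userB = ObjectId();   // stand-ins for real user ids
    // The first message creates the chat (upsert); later messages simply append to it:
    db.chat.update(
        { initiator: userA, target: userB },
        { $push: { messages: { _id: ObjectId(), sender: userA, message: "Hi." } } },
        { upsert: true }
    )
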
[12:08:52] <GothAlice> StephenLynx: http://docs.mongodb.org/manual/reference/command/findAndModify/#behavior < all of these
[12:08:58] <GothAlice> Oh, it can insert multiple times accidentally.
[12:09:20] <GothAlice> http://docs.mongodb.org/manual/reference/command/findAndModify/#comparisons-with-the-update-method < behaves differently from update in a number of ways.
[12:09:40] <GothAlice> Notably: "By default, findAndModify method returns an object that contains the pre-modified version of the document…" which is… a strange default.
[12:09:41] <Siamaster> that's great, thanks GothAlice
[12:10:17] <GothAlice> So, I actually see no point in findAndModify over update() followed by a find().
[12:10:21] <Siamaster> but, I would still have two entities right?
[12:10:32] <StephenLynx> reading it.
[12:10:40] <Siamaster> whereas both users are initiators in each
[12:10:44] <StephenLynx> I don't think I will hit these special cases though.
[12:10:54] <GothAlice> StephenLynx: Or, if you do want the pre-modified version, a find() followed by an update(). ;)
[12:11:05] <StephenLynx> there is an option
[12:11:11] <StephenLynx> that returns the post-modified.
[12:11:17] <StephenLynx> no need for a find.
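
For reference, a sketch of the counter pattern StephenLynx is describing, using findAndModify with new: true; the collection "forums" and the fields "boardUri"/"lastId" are hypothetical names:

    // Atomically bump the per-forum counter and read back the post-modified document:
    var forum = db.forums.findAndModify({
        query: { boardUri: "/b/" },
        update: { $inc: { lastId: 1 } },
        new: true
    });
    var nextPostId = forum.lastId;   // each call hands out a fresh value for this forum document
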
[12:11:49] <GothAlice> Alice's Law #40: There should be one–and preferably only one–obvious way to do it. findAndModify adds a second, and makes the decision highly non-obvious due to the excessively large number of documented differences.
[12:12:08] <Siamaster> I got it!! i could make user1 always be the user who was created first of the two
[12:12:09] <StephenLynx> yeah, yeah, but the other options are worse.
[12:12:23] <Siamaster> is it a heavy operation to get time from objectid?
[12:12:25] <GothAlice> Siamaster: Sort the IDs, the "initiator" is the first one. Very easy.
[12:12:35] <Siamaster> yes
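
A tiny sketch of the "sort the IDs" idea, assuming the two user ids are already in hand as userAId and userBId:

    // Order the pair deterministically so (A, B) and (B, A) resolve to the same chat:
    var pair = [userAId.str, userBId.str].sort();   // .str is the ObjectId's hex string
    var initiator = ObjectId(pair[0]), target = ObjectId(pair[1]);
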
[12:12:35] <GothAlice> StephenLynx: Worse? In what way?
[12:12:51] <StephenLynx> Option A: put threads and posts in the same collection
[12:13:07] <StephenLynx> then differentiate threads from posts in some sort of way
[12:13:18] <GothAlice> StephenLynx: I don't follow how we get from findAndModify to a concrete collection example.
[12:13:38] <StephenLynx> because I need this for this example
[12:13:50] <StephenLynx> I am not defending in general findAndModify
[12:14:15] <GothAlice> Siamaster: The time is a UNIX timestamp stored in the binary data that is the ObjectId. In C, it's a pointer dereference to access it, thus not really possible to make it easier/faster.
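
In the mongo shell, pulling that timestamp back out is a one-liner:

    var id = ObjectId();
    id.getTimestamp()   // ISODate built from the 4-byte UNIX timestamp at the front of the ObjectId
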
[12:15:38] <StephenLynx> when I said "other options" it was ambiguous, I was referring to the other options to my problem with this race condition.
[12:16:47] <GothAlice> StephenLynx: Race conditions are solved by update-if-not-different mechanisms used as locks, typically. findAndModify won't really have an impact over simply an update with a query looking for the previous value, updating it to a new value.
[12:17:26] <GothAlice> db.foo.update({_id: ObjectId(…), value: "some expected value"}, {$set: {value: "some new value"}}) then checking nModified or nUpdated or whatever it's called in the returned value. ;)
[12:17:53] <StephenLynx> it will, because the forum document that holds the next id of both threads and posts will never yield the same value twice.
[12:18:00] <GothAlice> In this way you can implement versioning (update this record, incrementing the version too, but only if it's currently the version we expect), etc.
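
A short sketch of the versioning idea, checking nModified on the returned WriteResult; someId and the field names are illustrative:

    var res = db.foo.update(
        { _id: someId, version: 3 },                                 // only matches the expected version
        { $set: { value: "some new value" }, $inc: { version: 1 } }
    );
    if (res.nModified === 0) {
        // someone else updated the document first: retry or report the conflict
    }
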
[12:18:15] <GothAlice> Sample document?
[12:18:18] <StephenLynx> no
[12:18:28] <StephenLynx> want to see my model?
[12:18:33] <GothAlice> I'm almost afraid to.
[12:18:37] <StephenLynx> lol
[12:18:45] <GothAlice> "holds the next id of both threads and posts" -> fundamental flaw, that sounds like.
[12:18:50] <GothAlice> You're implementing auto increment IDs?
[12:18:51] <StephenLynx> chans.
[12:18:53] <StephenLynx> yes.
[12:18:57] <GothAlice> Game over, man.
[12:18:59] <GothAlice> Game over.
[12:19:05] <StephenLynx> it is a chan.
[12:19:16] <GothAlice> I do not comprehend why that mandates auto-increment IDs.
[12:19:22] <StephenLynx> do you use a chan?
[12:19:24] <GothAlice> Upload IDs are unix timestamps on most chans.
[12:19:28] <GothAlice> (Not auto increment.)
[12:19:40] <StephenLynx> I am not replacing the _id
[12:19:49] <StephenLynx> this id is for user display
[12:19:51] <StephenLynx> and quoting.
[12:20:04] <StephenLynx> people will have to handle this value and read it.
[12:20:18] <StephenLynx> like "check post >>45"
[12:20:26] <StephenLynx> and then >>45 is a link to the post.
[12:20:29] <GothAlice> …
[12:20:29] <mbuf> I am trying to install the MMS agent with code from https://github.com/Stouts/Stouts.mongodb, but the installation and configuration tasks are always skipped
[12:20:34] <GothAlice> Why would you ever need to store that number?
[12:20:37] <mbuf> in Ansible
[12:20:43] <StephenLynx> again
[12:20:45] <StephenLynx> it is a chan.
[12:20:51] <StephenLynx> that is the expected user experience.
[12:20:57] <GothAlice> "Get post #45" -> db.posts.find(…).skip(45).limit(1)
[12:21:12] <StephenLynx> there are multiple boards
[12:21:19] <StephenLynx> each one with their own posts.
[12:21:39] <StephenLynx> board /b/ has its own post 45, board /v/ has its own post 45
[12:21:47] <GothAlice> "Jump to post #45 in the thread I'm looking at" -> as you render the page, "enumerate" the results as you render them, emit the enumeration as the HTML "id", you get it for free.
[12:21:47] <StephenLynx> board /n/ has its own post 45 that is a thread.
[12:22:05] <StephenLynx> and then
[12:22:09] <StephenLynx> old posts are deleted
[12:22:14] <StephenLynx> so I can't rely on their order.
[12:22:22] <StephenLynx> because the post 450 might be the first one for that board.
[12:22:23] <GothAlice> Ah, but that's the thing. I've never, ever seen auto-increment post IDs. Again, they're UNIX timestamps.
[12:22:59] <StephenLynx> again, I am not replacing _id, I am just using a second post identification.
[12:23:32] <StephenLynx> and I can't use something hard to read because that would make posts that quote other posts unintelligible.
[12:25:18] <StephenLynx> I have been using chans for about ten years now and I am not afraid to do things differently, but I couldn't think of anything better for this. It's not that trivial if you take into account the expected user experience.
[12:25:27] <GothAlice> StephenLynx: https://github.com/twitter/snowflake/tree/snowflake-2010
[12:25:51] <StephenLynx> too much overhead.
[12:26:08] <StephenLynx> and the code necessary for me to have these ids with find and modify is really small.
[12:26:08] <GothAlice> A separate service is the only scalable option for auto increment ID generation.
[12:26:18] <GothAlice> But you introduce lock-stepping.
[12:26:24] <GothAlice> So your chan won't scale.
[12:26:30] <StephenLynx> ok, I have a question
[12:26:36] <GothAlice> I.e. "auto increment" and "mongodb" are mutually exclusive.
[12:26:42] <StephenLynx> will this document with find and modify
[12:26:52] <Siamaster> GothAlice but how do I ensure that two fields are unique for each other. i.e {_id:obj, a:1, b:3} is okay if there already is an {_id:obj, a:1,b:2} but another {_id:obj, a:1,b:2} would fail?
[12:26:55] <StephenLynx> check against the other server of the cluster?
[12:27:04] <StephenLynx> when incrementing the field?
[12:27:07] <GothAlice> Only the master.
[12:27:11] <GothAlice> Primary, rather.
[12:27:15] <GothAlice> It's the only one that matters.
[12:27:23] <GothAlice> (And only one you can write to.)
[12:27:34] <StephenLynx> so this document will never yield the same value twice for that field?
[12:28:03] <StephenLynx> because these ids are not global, the scope of them is limited to this forum document.
[12:28:34] <GothAlice> Siamaster: http://docs.mongodb.org/manual/core/index-unique/
[12:28:43] <StephenLynx> it's not a unique index either.
[12:28:59] <StephenLynx> because multiple forums might have the same lastId.
[12:29:15] <StephenLynx> /b/ might have it at 45, and so might /v/.
[12:29:22] <GothAlice> Siamaster: As a note, one should never really have the same _id across two documents, ever. (Not possible, it's automatically unique.) So your own unique index would only be on "a" and "b" from your example, i.e. the user IDs.
[12:29:52] <GothAlice> StephenLynx: I think you're getting confused as to who I'm addressing with my answers. :P
[12:29:58] <StephenLynx> ah
[12:29:59] <StephenLynx> yeah
[12:30:00] <StephenLynx> :v
[12:30:20] <Siamaster> oh but , what i meant by obj was "random object id"
[12:30:28] <GothAlice> Siamaster: It's not random.
[12:30:48] <Siamaster> unique object id
[12:30:48] <GothAlice> But I understand.
[12:31:04] <GothAlice> Siamaster: See the link I provided.
[12:31:11] <Siamaster> I saw, it doesn't answer my question
[12:31:34] <Siamaster> i want two fields to be unique in relation to each other
[12:32:17] <Siamaster> if I force a unique constraint on user1, I won't be able to have more chats for user1
[12:32:22] <GothAlice> Siamaster: http://docs.mongodb.org/manual/core/index-compound/ < the documentation is quite good, it's generally a good idea to read it. You want a compound unique index.
[12:32:34] <Siamaster> oh! thanks
[12:32:55] <GothAlice> Unique is separate from being compound, thus separately documented.
[12:33:10] <Siamaster> I see. thanks
[12:38:44] <Siamaster> but you cannot have compound and unique indexes?
[12:39:28] <StephenLynx> yes.
[12:39:30] <GothAlice> /me considers buying stock in Picard Brand Face Balm. ;P
[12:39:34] <GothAlice> You sure can. ;)
[12:39:49] <GothAlice> The index "options" can, generally, be mixed freely with the other basic index types.
[12:39:58] <Siamaster> db.test.ensureIndex({a:1, b:1}, {unique:true})
[12:40:04] <Siamaster> it will still make a unique
[12:40:29] <GothAlice> (Compound, multikey, geospatial, text, and hashed are the "types"; "time to live", unique, and sparse are "properties" of indexes in general.)
[12:40:36] <Siamaster> i can't have {a:1,b:2} and {a:2,b:3}
[12:40:55] <GothAlice> Those two are unique from each other.
[12:41:03] <GothAlice> {a:1,b:2} and {a:1,b:3} is also unique.
[12:41:05] <GothAlice> Etc.
[12:41:42] <GothAlice> Ah, no, I see what you're referencing now. Siamaster: Defining indexes is the same as defining sort order.
[12:41:59] <GothAlice> {a: 1, b: 1} means "create an index on a and b, each sorted ascending".
[12:42:08] <GothAlice> The other option is -1, for descending.
[12:42:34] <GothAlice> Ref: http://docs.mongodb.org/manual/core/indexes-introduction/#compound-index
[12:43:27] <Siamaster> Omg I've got a raisin brain
[12:44:00] <Siamaster> i should make a unique constraint on a field called participants which itself is an object
[12:44:15] <GothAlice> Mmm… that'll probably only confuse things more.
[12:44:24] <Siamaster> but it works?
[12:44:31] <Siamaster> it's the only way
[12:44:31] <GothAlice> Sometimes.
[12:44:37] <GothAlice> It's really not.
[12:45:06] <GothAlice> Here's a question that might illuminate one problem with the "just index an embedded object" approach: {a: 1, b: 2} and {b: 2, a: 1}
[12:45:08] <GothAlice> Are those the same?
[12:45:20] <Siamaster> but I sort a and b
[12:45:33] <Siamaster> a being user1 and b being user2
[12:45:41] <GothAlice> The problem is that many languages do not preserve dictionary / associative array order.
[12:46:14] <GothAlice> {user1: 1, user2: 2} and {user2: 2, user1: 1} then
[12:46:17] <Siamaster> but you can't have a unique, compound index such as db.test.ensureIndex({a:1, b:1}, {unique:true})
[12:46:26] <GothAlice> MongoDB does not consider those to be the same, even though the _values_ are identical.
[12:47:03] <Siamaster> because then db.test.insert({a:1,b:2}); db.test.insert({a1,b:3}); will fail
[12:47:40] <Siamaster> so the way would be to db.test.ensureIndex({uniqueObj:1}, {unique:true})
[12:48:09] <Siamaster> and then do db.test.insert({uniqueObj:{a:1,b:2}}); db.test.insert({uniqueObj:{a1,b:3}});
[12:51:11] <GothAlice> Siamaster: https://gist.github.com/amcgregor/827b3932997e083650fb
[12:51:51] <GothAlice> {a1,b:3} is gibberish.
[12:51:57] <GothAlice> (Not parseable, invalid non-code.)
[12:52:19] <GothAlice> However, {a:1,b:2} and {a:1,b:3} are each unique, thus no, they don't fail. See my link.
[12:53:32] <Siamaster> omg now it works
[12:53:34] <GothAlice> When in doubt, open up a mongo shell to the test database and just start hacking on it.
[12:53:54] <Siamaster> you're right, I don't know what I did wrong last time
[12:54:02] <Siamaster> i did!
[12:54:29] <Siamaster> made a mistake in the I guess, sorry :P
[12:54:37] <GothAlice> Notice I first dropped whatever might have been named "foo" before proceeding. You may have had previous indexes from previous experiments left behind.
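
A scratch-shell sketch of the behaviour being demonstrated; the collection name "foo" echoes GothAlice's comment about dropping it first:

    db.foo.drop()                                        // start from a clean slate
    db.foo.ensureIndex({ a: 1, b: 1 }, { unique: true })
    db.foo.insert({ a: 1, b: 2 })                        // ok
    db.foo.insert({ a: 1, b: 3 })                        // ok: the (a, b) pair differs
    db.foo.insert({ a: 1, b: 2 })                        // fails with duplicate key error 11000
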
[13:52:10] <pokEarl> Hello friends, Not sure if this is an appropriate place to ask this, but I am using Jackson with MongoJack to serialize/deserialize documents from Mongo in Java. There is an embedded document in my collection though that sometimes has values, and sometimes is just NaN which breaks things, any suggestions on how to get around this?
[14:12:29] <MBravo> Hi, is there anyone with a good understanding of how cloneCollection works? Question being, if there is a duplicate _id in the collection being copied, what will happen? Scanning the docs and googling didn't help.
[14:30:53] <GothAlice> MBravo: You actually have duplicate _id values?
[14:31:04] <GothAlice> (Which should be impossible, given the default unique index on that field.)
[15:13:06] <MBravo> GothAlice: I'm copying very similar (and large) collections between different mongo servers so I have to take the possibility of duplicates into account
[15:16:38] <GothAlice> MBravo: In that particular instance, I'm not sure what would happen, but it would be indicative of a corrupted dataset requiring --repair. Not one that you can successfully copy live.
[15:31:16] <MBravo> GothAlice: I think we have a misunderstanding somewhere :) let me illustrate
[15:32:19] <MBravo> Suppose I have a server1, running a mongod instance, which has a db somedb and collection mycoll
[15:32:39] <MBravo> this collection has steady influx of data
[15:33:30] <MBravo> every, say, 24 hours I need to copy all data from somedb.mycoll to server2, into otherdb.mycoll, and then delete the data on server1
[15:33:44] <MBravo> server2 thus serves as an accumulation point
[15:34:04] <MBravo> the easiest way to do this seems to execute db.cloneCollection on server2
[15:35:31] <MBravo> due to rather high turnaround of data I'd like to consider the possibility that there *might* be a duplicate _id in the data coming from server1 (transient) to server2 (accumulation point -> lots of data)
[15:35:53] <MBravo> and I can't find any description of how cloneCollection would behave if there's a duplicate
[15:41:12] <cheeser> how would you get a duplicate _id?
[15:41:19] <cheeser> are you using ObjectId?
[15:47:32] <GothAlice> Well, even if you're not using ObjectId, a primary wouldn't commit a duplicate record with the default unique index on that field.
[15:48:15] <GothAlice> (Thus a duplicate key on _id being indicative of some type of catastrophic failure that should utterly halt "syncing out" until corrected, a la --repair.)
[15:48:36] <GothAlice> Under normal operation there wouldn't even be transient duplication. It just wouldn't happen.
[15:59:38] <GothAlice> MBravo: Have you already tried to do what you say you are wanting to do?
[15:59:41] <GothAlice> https://gist.github.com/amcgregor/c7bb5e29d0c3357edfd4
[16:00:07] <GothAlice> In terms of syncing data to a central source, this is pretty much not the way to do it.
[16:00:55] <GothAlice> Notably, the command also always recreates indexes. For large collections, the copy operation will lock the collection and trounce server performance while the indexing is going on.
[16:05:50] <MBravo> GothAlice: tried yes, got duplicates no
[16:07:50] <MBravo> checking out your gist - interesting... let me go take another look
[16:12:58] <MBravo> while I'm retesting, a) I have mongo 2.6 not 3.0 b) the behaviour in your gist seems to directly contradict documentation
[16:14:20] <MBravo> which says "if ...collection exists in the database, then MongoDB appends documents from the remote collection to the destination collection."
[16:15:17] <Pinkamena_D> With text search, can I know the field which was matched? Say I set up a text index which matches on three fields, can I get back from the query which field was matched, or simply the string which was matched, in order to output it to search results?
[16:19:48] <GothAlice> MBravo: When in doubt, source code. https://github.com/mongodb/mongo/blob/v2.6.3/src/mongo/db/cloner.cpp
[16:20:16] <GothAlice> Note that this module appears to have been rewritten and moved in 3.0.
[16:22:14] <MBravo> GothAlice: indeed. thanks for your input, I now have some food for thought.
[16:22:26] <MBravo> perhaps I should go along the lines of http://serverfault.com/questions/585445/copy-mongodb-collection-between-two-servers-and-different-mongo-versions
[16:22:40] <MBravo> first comment under first answer
[16:23:08] <GothAlice> That's insane.
[16:23:34] <GothAlice> mongodump | mongorestore using a query to only fetch records created or modified after the last sync… much simpler.
[16:24:08] <GothAlice> Also, after the first time that's ever done, the indexes won't need recreation, so no great impact from that when performing subsequent updates.
[16:26:07] <GothAlice> No need for ssh, ruby, bash loops, regular expression transformation, nor a highly inefficient storage format (JSON).
[16:29:45] <lmatteis> hello. i'm a bit confused how aggregates are supposed to be used. are they supposed to be used in a live manner? wouldn't they take a long time to compute if the aggregate is complex?
[16:30:00] <lmatteis> or do i need to store the results of an aggregate query somewhere? say in a separate collection
[16:30:13] <juliofreitas> Hi! I'm creating a system where users can define their own week. They have tasks to do; I might create my week starting on Tuesday and have to finish my tasks by the next Tuesday. Every day the system can give me more tasks. What is the best way to structure my database and make the queries (tasks by day, tasks by week)? Can someone help me or point me to a tutorial?
[16:30:39] <GothAlice> lmatteis: http://s.webcore.io/image/142o1W3U2y0x < almost everything you see here is the result of an aggregate query. The top click-through chart is an aggregate that processes ~17K records. The page generates in a few hundred milliseconds.
[16:31:18] <lmatteis> GothAlice: so i guess it caches it?
[16:31:21] <GothAlice> To keep speeds fast, I ensure my queries are O(n) complexity with a fixed n for any given time range being calculated upon.
[16:31:32] <GothAlice> Ref: http://www.devsmash.com/blog/mongodb-ad-hoc-analytics-aggregation-framework
[16:31:43] <GothAlice> (Google terms to search for: mongodb pre aggregation)
[16:33:12] <lmatteis> GothAlice: i've inherited this code base which keeps logs of tons of stuff in a collection "logs" and then creates separate collections, similar to "views"
[16:33:41] <lmatteis> but i don't think they use the aggregate stuff, they just query "logs" and update the other collections
[16:33:46] <lmatteis> using cron jobs
[16:34:13] <GothAlice> I do something similar, but mine is entirely, 100%, to-the-second live.
[16:34:40] <GothAlice> I have a "hits" collection that all incoming click data gets dumped into. The pre-aggregate collections (your "views") are "upserted" at the same time.
[16:35:43] <lmatteis> so it does make sense? or would a simple .aggregate call from the client be enough?
[16:35:51] <lmatteis> rather than creating intermediary collections
[16:36:05] <GothAlice> I.e. the "per-hour" pre-aggregate collection's records have their date's minute/second/microsecond set to zero ("rounding down" to the hour) and something like: db.hourly.update({date: Date(…)}, {$inc: {total: 1, …}}, upsert=True)
[16:36:17] <GothAlice> If the hourly record didn't exist, it'll be created automatically. (Upsert.)
[16:36:32] <GothAlice> It's very, very important to consider exactly how you'll be querying that data.
[16:37:00] <GothAlice> I store my data in time-based slices (throwing away finer time-based granularity, such as the actual microsecond the click happened, which is useless data) because I'm querying in time-based ranges.
[16:37:48] <GothAlice> This means that I know exactly how much data needs to be processed to answer any given range of time. One week at hour granularity? 168 records must be processed, worst-case, to answer that query.
[16:38:37] <GothAlice> If I were to instead aggregate query the original "hits" collection (that's microsecond accurate), my worst-case is dictated by how active our users are.
[16:38:53] <GothAlice> (That is flatly unacceptable as it means you can't easily predict performance.)
[16:40:41] <lmatteis> i'm confused
[16:40:53] <GothAlice> Read the article I linked. :)
[16:41:06] <GothAlice> It covers the subject quite thoroughly.
[16:41:44] <GothAlice> Basically: a document is added to "hits" for each click that is made. That means on a slow day there'll be no records added, but on a busy day there may be millions of records added.
[16:42:47] <GothAlice> "Pre-aggregating" into segments defined by how we want to query the data (hourly / daily time granularity) changes O(n) from measuring clicks, to measuring hours (or days). A potentially near-infinitely smaller set of data to process to answer queries, depending on how busy it gets.
[16:43:20] <lmatteis> but wouldn't running an aggregate directly on "hits" be enough?
[16:43:35] <lmatteis> like how do you know you have to create separate collections?
[16:43:38] <GothAlice> A million hits in a day, querying that day's activity to display an hour-by-hour chart, would process a million records, counting them up. Chopped up hourly, to produce the same chart requires only examining 24 records _no matter how many clicks there are, ever_.
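
A compact sketch of the pre-aggregation pattern GothAlice describes; the collections "hits"/"hourly", the field names, and the startOfDay/endOfDay variables are illustrative:

    // On every click: store the raw hit and bump the hourly bucket in the same request.
    var now = new Date();
    db.hits.insert({ when: now });
    var hour = new Date(now);
    hour.setMinutes(0, 0, 0);                            // "round down" to the hour
    db.hourly.update({ date: hour }, { $inc: { total: 1 } }, { upsert: true });
    // A day's hour-by-hour chart then touches at most 24 documents, however busy the day was:
    db.hourly.find({ date: { $gte: startOfDay, $lt: endOfDay } }).sort({ date: 1 })
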
[16:45:36] <lmatteis> GothAlice: in couchdb you just have a map-reduce view, and couch keeps it updated
[16:45:46] <GothAlice> In MongoDB you use upserts to do the same.
[16:46:09] <lmatteis> on separate collections then?
[16:46:13] <GothAlice> Yuup.
[16:46:15] <GothAlice> Read the article.
[16:46:21] <GothAlice> http://www.devsmash.com/blog/mongodb-ad-hoc-analytics-aggregation-framework
[16:47:53] <GothAlice> I can't really explain it any better than "processing millions or billions or trillions or any other arbitrary unbounded number of records" vs. "processing a fixed, small number of records".
[16:48:17] <lmatteis> yeah indeed thats why views exist
[16:48:27] <GothAlice> In MongoDB, that's called pre-aggregation.
[16:48:32] <lmatteis> but like, you earlier said that that complicated image was done using aggregates?
[16:48:45] <GothAlice> Yup.
[16:48:47] <lmatteis> or pre-aggregates?
[16:48:55] <GothAlice> Aggregate queries on pre-aggregated collections.
[16:49:00] <lmatteis> ah
[16:49:01] <GothAlice> I'm aggregating an aggregate to produce that dashboard.
[16:49:21] <lmatteis> ok
[16:50:15] <GothAlice> There are some normal queries in there (i.e. "find out the latest order ID" for use in the aggregates), of course. But the ~17K records processed to display that will never increase.
[16:50:24] <GothAlice> No matter how active our users become.
[16:50:59] <GothAlice> (It's so high because we're performing competition-by-industry and you-vs-everyone comparisons.)
[16:51:45] <lmatteis> hrm ok
[16:53:25] <lmatteis> GothAlice: but i mean, what's the difference between "aggregate" or simply querying the thing then like with .view()
[16:53:36] <GothAlice> There is no "view".
[16:53:42] <lmatteis> i thought that aggregate would create a "view" somewhere... like couchdb
[16:53:47] <lmatteis> sorry .find()
[16:53:57] <GothAlice> Find doesn't store the result anywhere but RAM.
[16:54:03] <lmatteis> ah
[16:54:09] <lmatteis> and where does aggregate store them?
[16:54:34] <GothAlice> By default: RAM at the moment you make the query, cleaned up after completion of iteration, closing of the cursor, or query timeout.
[16:54:50] <GothAlice> Aggregates can (optionally) write the results to a collection, but that's not very useful. It's a one-time snapshot of the results as of the moment you made the query.
[16:54:53] <GothAlice> (Not a live view.)
[16:55:01] <GothAlice> Thus: pre-aggregation using upserts.
[16:55:22] <GothAlice> (Not very useful for live analytics, anyway.)
[16:55:25] <GothAlice> (Useful for other things. ;)
[16:55:32] <lmatteis> i need live-analytics
[16:55:35] <lmatteis> similar to your use case
[16:55:44] <GothAlice> Have you read the article I linked?
[16:56:52] <lmatteis> a little
[17:01:20] <pokEarl> I'm having trouble with serialization of NaN values with Jackson/in Java, anyone here who is wise about such things? :(
[17:05:51] <StephenLynx> what is the value stored?
[17:05:58] <StephenLynx> on mongo?
[17:07:56] <GothAlice> NaN is a bit in the floating point representation, AFAIK. Similar to ±Inf.
[17:08:54] <GothAlice> Ref: https://en.wikipedia.org/wiki/IEEE_floating_point#Formats
[17:09:58] <pokEarl> Yeah, it's normally a nested document though (say like Document1{id:1, Document2 { value1: 1, value2: 2}}). So I have a Pojo for document 1 and a pojo for document 2, which is used for serialization/deserialization. But then sometimes document2 is not a nested document, it's just document2: NaN, and then stuff breaks
[17:14:51] <GothAlice> cheeser's the resident Java guru.
[17:15:51] <svm_invictvs> cheeser: heh, figured it out
[17:16:52] <svm_invictvs> cheeser: Posted a bug and test case reproducing it consistently. I'm not even sure it's a bug, honestly. Basically the @Id annotated String fucks it up.
[17:17:31] <svm_invictvs> tldr, object is stored with ObjectId _id (not a String), and the rest of the code assumes it's a string so it all shits the bed there.
[17:17:56] <GothAlice> ObjectIds are not strings, so yeah, clearly there's a bug to fix there.
[17:18:05] <jacksnipe> problem with inserting a user into a collection with a 2dsphere index: location: { type: "Point", coordinates: [ 9.392014269192378, -163.4010148661396 ] } } longitude/latitude is out of bounds, lng: 9.39201 lat: -163.401
[17:18:21] <jacksnipe> shit nvm figured it out I'm retarded
[17:18:25] <jacksnipe> like super retarded
[17:18:25] <GothAlice> :P
[17:18:27] <svm_invictvs> GothAlice: Well, the only reason I don't call it a bug is that the Morphia docs explicitly say not to use @Id annotated Strings and expect them to be generated properly.
[17:18:42] <svm_invictvs> GothAlice: So if it's a "bug" it could be a documentation bug
[17:19:17] <GothAlice> Well, notably, "the rest of the code assumes it's a string" is the wrong part.
[17:19:44] <svm_invictvs> Kind of
[17:19:51] <svm_invictvs> I dug pretty deep into Morphia's source
[17:20:04] <svm_invictvs> There's not really a good way to handle it, none as far as I had seen.
[17:22:55] <svm_invictvs> GothAlice: If I get a wild hare up my ass about it, I'll submit a pull request :P
[17:43:51] <Doyle> Hello. Is there a good way to move all data from several separate replica sets onto one large replica set or sharded replica set? Export/Import?
[17:59:20] <_sillymarkets> Hey, can anyone help me with the Aggregation Pipeline? I'm using Moped for Ruby to connect to the db. When I use shell, I can get my aggregation result. But I only get an empty array when I try to use my ruby script to pump the same info
[18:07:59] <Doyle> Hey. Is it possible to combine several existing replica sets with unique data on them into a sharded cluster?
[18:43:35] <jigax> hello everyone, I currently have a node app which we use to upload some images. I'm storing the image info in mongo and uploading the actual file to s3. I was wondering if this is the ideal way or should I be storing these directly in mongo as base64? thanks
[18:46:56] <_sillymarkets> hey, is anyone having trouble using Aggregation Framework with Ruby mongo gem? I have 2.0.6 installed but i keep getting undefined method "aggregate" for Mongo::Collection
[18:50:37] <_sillymarkets> db version is 2.6.3
[18:53:56] <_sillymarkets> someone please help me :(
[19:12:14] <Bradipo> Hello. Having some slow query issues here, even though there are indexes.
[19:13:42] <kali> Bradipo: you've tried explain() ?
[19:13:45] <Bradipo> Specifically, what is logged is something like this:
[19:13:49] <Bradipo> conn19] getmore db.results cid:9150063901757207802 getMore: { query: { issue.issueId: ObjectId('51f68d6de4b0b95738c289b5'), recorded: { $lt: new Date(1437494735000) } }, orderby: { recorded: -1 } } bytes:104063 nreturned:1007 9776ms
[19:13:59] <Bradipo> kali: No, I have not.
[19:14:18] <kali> Bradipo: try it, and gist me a getIndexes() somewhere
[19:14:32] <Bradipo> Ok.
[19:14:58] <Bradipo> Is paste site alright? Or must it be gist?
[19:15:17] <kali> anything is fine. just not on the # :)
[19:15:23] <Bradipo> Sure.
[19:15:54] <Bradipo> I seem to have two indexes with the same key...
[19:16:13] <kali> oO
[19:19:03] <Bradipo> http://pastebin.com/VcyggXYC
[19:19:13] <Bradipo> That has the indexes and one of the logged slow getmore queries.
[19:20:14] <kali> can you add the explain ?
[19:20:22] <Bradipo> I don't seem to have an explain().
[19:21:09] <kali> just get the query running in the mongo shell, then add explain() after the find(...)
[19:21:15] <Bradipo> Oh.
[19:21:23] <Bradipo> Well, the query in the shell is actually fast.
[19:21:47] <Bradipo> Getting explain now...
[19:21:50] <kali> did you make it with the order clause too ?
[19:22:20] <Bradipo> Yes.
[19:23:23] <kali> mmmm... are you paginating a big result list for this query maybe ?
[19:24:02] <Bradipo> Explain is still running...
[19:25:02] <Bradipo> http://pastebin.com/G2FWuHJh
[19:25:11] <Bradipo> This has the query I'm making from the shell.
[19:25:27] <Bradipo> But when I put .explain() on it hasn't yet returned. :-)
[19:26:00] <kali> this is weird
[19:27:39] <Bradipo> Yeah.
[19:27:44] <Bradipo> I'm confused because we have indexes.
[19:28:26] <kali> you haven't answered... are you paginating a big list of these results? with skip and limit?
[19:28:32] <Bradipo> No limit.
[19:28:42] <Bradipo> I tried to add limit because I only wanted 10, but it didn't have any impact.
[19:28:43] <Bradipo> No skip.
[19:28:59] <kali> so you're taking the first 10 results, and that's it
[19:29:07] <Bradipo> I only want the first 10 results.
[19:29:15] <Bradipo> For the moment though, assume no limit is in place.
[19:29:48] <Bradipo> Using morphia to build the query, so I'm not sure exactly what query is being sent.
[19:29:56] <Bradipo> But I think it's the one above (that's logged in mongodb.log).
[19:30:17] <kali> you can use the internal profiler to check if that's true
[19:30:29] <kali> and see the plan the optimizer chose
[19:30:30] <Bradipo> Wow, I cannot believe this explain() hasn't returned yet.
[19:30:56] <kali> i think you may want to hint() the right index. see if that's helping
[19:31:26] <kali> and there might be something i'm not aware of with multiple valued index (issue being an array) and using index for ordering results
[19:31:43] <Bradipo> Initially there was no index on recorded.
[19:31:53] <Bradipo> Only the multi key indexes you see in the paste.
[19:32:05] <Bradipo> So I added recorded because I thought maybe the orderby was choking without it.
[19:32:17] <kali> the {"issues.issueId": 1, recorded: -1} should be the right one
[19:32:25] <Bradipo> Yeah, and there is already that index.
[19:32:41] <kali> try hint()ing the query maybe
[19:33:27] <Bradipo> For explain, or you mean the actual query in the application?
[19:34:08] <Bradipo> Oh, I see.
[19:34:35] <Bradipo> I'm going to Ctrl-C this explain...
[19:34:59] <Bradipo> So hint it to use the multi-key index?
[19:35:12] <kali> yeah, try it
[19:36:11] <Bradipo> Hmm, I gave it a bad hint. :-)
[19:36:46] <Bradipo> I added .hint( { "issue.issueId": 1, recorded: 1 } )
[19:36:51] <Bradipo> But apparently that's not what it wants.
[19:37:10] <kali> recorded: -1
[19:37:14] <Bradipo> Oh yeah.
[19:37:19] <kali> you have to match the index definition
[19:37:23] <Bradipo> Right.
[19:37:40] <Bradipo> Ok, query running now.
[19:38:08] <Bradipo> Hmm, taking a long time...
[19:39:10] <Bradipo> Yeah, this query is not coming back either (at least not in any reasonable amount of time).
[19:39:24] <kali> :/
[19:39:31] <kali> check out the profiler
[19:39:52] <Bradipo> So, just to be sure, I added: .hint( { "issue.issueId": 1, recorded: -1 } ).explain()
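
Spelled out, the query being timed looks roughly like this (values copied from the slow getmore logged at 19:13:49):

    db.results.find({
        "issue.issueId": ObjectId("51f68d6de4b0b95738c289b5"),
        recorded: { $lt: new Date(1437494735000) }
    }).sort({ recorded: -1 })
      .hint({ "issue.issueId": 1, recorded: -1 })
      .explain()
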
[19:40:05] <kali> look there: http://docs.mongodb.org/manual/reference/method/db.setProfilingLevel/#db.setProfilingLevel
[19:40:12] <kali> it will log the query with the execution plan
[19:42:25] <Bradipo> Ok, I've turned on profiling for queries longer than 500ms.
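
For reference, the shell commands behind that step (the 500 ms threshold comes from the conversation):

    db.setProfilingLevel(1, 500)   // level 1 = profile only operations slower than 500 ms
    db.getProfilingStatus()        // should report { "was" : 1, "slowms" : 500 }
    db.system.profile.find().sort({ ts: -1 }).limit(5).pretty()   // most recent profiled operations
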
[19:43:40] <Bradipo> Will those show up in mongodb.log?
[19:43:57] <kali> yes, and in a magic collection
[19:44:04] <kali> with all the gory details
[19:44:34] <Bradipo> system.profile?
[19:44:45] <kali> yes
[19:45:10] <Bradipo> db.profile.find() returns nothing.
[19:45:31] <kali> you need to run db.system.profile.find() in your actual database, i think
[19:45:47] <Bradipo> Oh.
[19:45:55] <Bradipo> So use database then db.system.profile.find()?
[19:45:58] <Bradipo> Yeah, that's magic...
[19:46:14] <Bradipo> Still empty.
[19:46:31] <kali> you're back in the "right" database ?
[19:46:34] <Bradipo> Yes.
[19:47:08] <Bradipo> Does a query actually have to complete?
[19:47:19] <kali> mmmm maybe
[19:47:36] <kali> db.currentOp() may show the running query in the mean time
[19:47:40] <Bradipo> Ok, yeah, it logged one.
[19:47:52] <Bradipo> query: { query: { issue.issueId: ObjectId('54d99c29e4b0ab741d365573'), recorded: { $lt: new Date(1437507882000) } }, orderby: { recorded: -1 } } nreturned:101 3492ms
[19:48:37] <Bradipo> Seems to be no more information there than we already knew. :-)
[19:49:05] <Bradipo> db.currentOp() does have some data.
[19:49:15] <kali> that's the log output, but the system.profile collection should show a lot more
[19:49:28] <Bradipo> Yeah, it's not showing anything.
[19:49:39] <Bradipo> "secs_running" : 528,
[19:49:46] <Bradipo> That's my explain probably.
[19:50:02] <Bradipo> Yeah, it has the hint and the explain.
[19:50:27] <Bradipo> Heh, there are about 4 of them running.
[19:51:02] <kali> killOp may help you )
[19:51:16] <Bradipo> Wait...
[19:51:26] <Bradipo> db.getProfilingLevel() returns 0...
[19:52:09] <Bradipo> No it doesn't, that was a different db.
[19:52:30] <Bradipo> Ok, now system.profile has data.
[19:53:32] <kali> aha :)
[19:54:20] <Bradipo> http://pastebin.com/aiy4GJjh
[19:54:30] <Bradipo> I'm afraid it still doesn't give very much more details.
[19:55:09] <kali> aw
[19:55:59] <Bradipo> Searching for getmore mongodb slow returns a few results in google, one of them pointing to fixing an aggregate query.
[19:56:40] <Bradipo> But I couldn't find anything specific.
[19:57:11] <Bradipo> Most of the queries run decently, but there are some that take way too long.
[20:04:24] <Bradipo> Is it possible that there is an index but there just isn't enough memory for the index so it's reading from disk?
[20:04:31] <Bradipo> How are indexes stored in mongodb?
[20:05:33] <kali> it is a possibility
[20:06:00] <kali> indexes are btrees and there performance degrade significantly when they no longer fit in ram
[20:06:12] <kali> s/there/their/
[20:06:15] <Bradipo> I wonder if this is the case...
[20:06:24] <Bradipo> The database is quite large...
[20:07:00] <kali> collection.stats() wil give you hints
[20:07:03] <Bradipo> According to show dbs it is: 191.8603515625GB
[20:07:21] <Bradipo> Didn't know about .stats()
[20:11:47] <Bradipo> Is it possible for an index to only be partially indexed?
[20:12:15] <Bradipo> Oh, interesting.
[20:12:27] <Bradipo> totalIndexSize is 3,752,884,800
[20:12:37] <Bradipo> Definitely more than is in memory.
[20:17:10] <Bradipo> Why would db.x.stats() return not found?
[20:18:13] <Bradipo> Oh...
[20:18:16] <Bradipo> Because there is nothing in it.
[20:24:15] <Bradipo> So this doesn't even return: db.results.find( { $getmore: { "issue.issueId": ObjectId('5363531be4b08b0137618a98'), recorded: { $lt: new Date(1437508601000) } }, $orderby: { recorded: -1 } })
[20:43:07] <Doyle> Is there an official position on hosting multiple mongods (replica set members) on a single host? Considering the possibility of moving several smaller replica sets spanning a lot of hosts to a few large super hosts.
[20:43:31] <Doyle> 3 hosts each hosting mongod processes for several replica sets.
[20:43:35] <Doyle> thoughts?
[20:55:32] <johnflux> I want to do a find() and know whether there are any matches
[20:55:41] <johnflux> I don't care how many, just whether there are any
[20:55:47] <johnflux> how should I do this?
[20:56:12] <StephenLynx> find x +1
[20:56:17] <StephenLynx> use first x.
[20:56:39] <StephenLynx> if there is more than x, you know there are more.
[20:59:28] <johnflux> hmm I didn't understand a word of that :-(
[20:59:44] <johnflux> db.mycollection.find().... then what?
[21:01:12] <johnflux> oh, I could do limit(1)
[21:01:15] <johnflux> and then just check the length
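
A small sketch of the existence check johnflux settles on; the filter is illustrative:

    // limit(1) stops the scan at the first match; hasNext() says whether there was one.
    var anyMatch = db.mycollection.find({ status: "open" }).limit(1).hasNext();
    // alternatively, count at most one document (count(true) applies the limit):
    var anyMatch2 = db.mycollection.find({ status: "open" }).limit(1).count(true) > 0;
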
[21:02:15] <blizzow> Should I monitor the index size on my collections via the mongos, the mongod, or both?
[22:56:19] <Bradipo> kali: Thanks for your time.
[22:56:51] <Bradipo> kali: I think the problem is that the index is just so large that it has to scan the disk just to read the key.
[23:01:58] <dbclk> question guys
[23:02:12] <dbclk> if I have geographical disperse network
[23:02:48] <dbclk> with a mongo cluster on Site A (primary site) and Site B (Priority 0)
[23:02:58] <dbclk> if Site A goes down
[23:03:07] <dbclk> and Site B has only 1 mongod node
[23:03:33] <dbclk> if my site is sending requests to that mongod node (priority 0), could it still accept read requests?
[23:03:37] <dbclk> i understand it can never do writes
[23:04:08] <dbclk> but, just want to know if there would be a problem if one node is the only one up in a mongo cluster and that node so happens to be a Priority 0 node