[00:42:27] <StephenLynx> GothAlice, I have a race condition and I wonder if this would work: if I use a findAndModify to increment a field in a document, will that lock the document for writes while it increments one of its fields, eliminating the race condition?
[06:13:29] <Razerglass> hello, can someone help me with the syntax to delete an item from an array with mongoose?
[06:38:53] <svm_invictvs> Is it possible to query using $ref
[06:59:46] <amitprakash> Unfortunately, this will ignore indexes on reffield
[07:00:28] <amitprakash> a better way would be db.collection.find({ 'reffield': { "$ref": 'othercollection', "$id": 'blah', "$db": 'othercollectiondb' } })
[07:17:36] <mbuf> in cloud.mongodb.com, how does one determine the mongodb_mms_group_id and mongodb_mms_api_key?
[07:17:38] <svm_invictvs> cheeser: That reproduces my problem from earlier
[07:17:56] <svm_invictvs> cheeser: But it looks like the Morphia documentation isn't clear on whether @Id private String theObjectId; is okay
[07:18:10] <svm_invictvs> And that's where it gets screwed up
[07:39:30] <taspat> hi, I'm a noob and would like to ask a quick question: can I have a collection and insert objects ordered by one of the objects' properties, and then query the collection with something like "take(10, collection)" so MongoDB returns the first 10 ordered items at little querying cost?
[08:16:11] <taspat> joannac: ty. But that is very expensive, right? Because you need to sort the collection every time. Can you maintain the order when inserting/updating and then do take(10, col)?
[08:16:27] <taspat> I hope I'm explaining myself..
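What taspat is describing maps to an indexed sort plus a limit; a minimal sketch, assuming a hypothetical "items" collection ordered by a "rank" field:

```javascript
// Hypothetical collection and field names; only the pattern matters.
db.items.ensureIndex({ rank: 1 });              // keep the sort key indexed
db.items.find().sort({ rank: 1 }).limit(10);    // first 10 in order, served from the index, no full-collection sort
```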
[11:56:33] <Siamaster> is that really bad practice?
[11:57:03] <GothAlice> Siamaster: If you hexlify an ObjectId it goes from 12 bytes to 24 + null + 4 byte length.
[11:57:08] <GothAlice> That's obviously sub-optimal.
[11:57:54] <Siamaster> but right now my Chat entity has 3 ObjectIds: ChatId, User1Id and User2Id
[11:58:04] <GothAlice> You also require additional processing (conversion back) before you can use the values within the ObjectIds, like the creation time.
[11:58:25] <GothAlice> So you don't have a list, you have a concrete 3-tuple.
[12:00:27] <StephenLynx> GothAlice, did you read what I asked yesterday? you think that could work?
[12:00:27] <GothAlice> It's the same thing as embedding it in _id, except you need to create a new index on {initiator: 1, target: 1}, {unique: true}
[12:00:45] <Siamaster> what if they both issue the first message at the same time?
[12:01:09] <StephenLynx> you get an error on the second with code 11000 if I am not mistaken
[12:01:20] <GothAlice> Siamaster: All operations in MongoDB are applied in a linear order.
[12:01:24] <Siamaster> no? lets say you and me chat
[12:01:34] <GothAlice> I.e. if two requests come in "simultaneously"… they're not actually simultaneous. One will "win" and go first.
[12:01:37] <Siamaster> and we issue the chat at same time
[12:01:49] <Siamaster> there will be two entities: in one I'm the initiator, and in the other you are
[12:01:55] <GothAlice> Down to the millisecond, with millisecond accurate clock synchronization between machines?
[12:02:12] <Derick> and don't forget the object ID also has the PID in it
[12:02:12] <StephenLynx> "I.e. if two requests come in "simultaneously"… they're not actually simultaneous. One will "win" and go first." hm, now that probably fixes my problem.
[12:02:44] <GothAlice> (And even then, MongoDB still linearizes inserts, and yeah, ObjectIds are fundamentally designed to not be nearly as naive as "auto-increment IDs", which are fraught with problems. Just ask Twitter. ;)
[12:03:29] <GothAlice> That's an unfortunate approach, potentially introducing race conditions.
[12:03:33] <StephenLynx> GothAlice so if I use findAndModify to increment a field of a document and use this field to get auto-incremented ids, does that make it impossible for me to get the same ID twice?
[12:03:41] <GothAlice> For which, StephenLynx, findAndModify with upsert: True is your friend.
[12:04:25] <GothAlice> However, findAndModify is broken.
[12:04:29] <StephenLynx> I won't need the upsert, the document that I will use to generate the id will be created beforehand.
[12:04:49] <StephenLynx> I have these two collections
[12:05:00] <GothAlice> http://docs.mongodb.org/manual/reference/command/findAndModify/ < it introduces more special cases and weirdness than it solves.
[12:05:14] <StephenLynx> one has a field called threadId
[12:05:36] <StephenLynx> there can't be a threadId equal to another postId in the same forum.
[12:05:40] <GothAlice> (That's 8 "screenfulls" of special casing, on my monitor, right there. To heck with that.)
[12:05:49] <Siamaster> so what do you guys suggest I do? I can't secure myself against the race condition?
[12:06:00] <StephenLynx> so I use the forum document to get the id of the new post or thread.
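A sketch of the per-board counter StephenLynx describes, assuming a hypothetical "forums" collection with a "lastId" field; findAndModify applies the $inc atomically, so two concurrent callers cannot receive the same value:

```javascript
// Assumed names: "forums" collection, "boardUri" selector, "lastId" counter.
var board = db.forums.findAndModify({
    query:  { boardUri: "/b/" },
    update: { $inc: { lastId: 1 } },
    new:    true                     // return the post-increment document
});
var nextPostId = board.lastId;       // unique per board, even under concurrency
```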
[12:06:51] <GothAlice> Siamaster: Write your queries to assume that the record does not exist, and that whatever query is run will create the record as needed. (That's the general approach.)
[12:08:02] <StephenLynx> and where are the special cases?
[12:08:05] <GothAlice> User A sends a message to User B. db.chat.update({initiator: ObjectId(User A), target: ObjectId(User B)}, {$push: {messages: {_id: ObjectId(*new*), sender: ObjectId(User A), message: "Hi."}}}, upsert=True)
[12:08:08] <Siamaster> but how would a query like that look in mongo? :S
[12:08:52] <GothAlice> StephenLynx: http://docs.mongodb.org/manual/reference/command/findAndModify/#behavior < all of these
[12:08:58] <GothAlice> Oh, it can insert multiple times accidentally.
[12:09:20] <GothAlice> http://docs.mongodb.org/manual/reference/command/findAndModify/#comparisons-with-the-update-method < behaves differently from update in a number of ways.
[12:09:40] <GothAlice> Notably: "By default, findAndModify method returns an object that contains the pre-modified version of the document…" which is… a strange default.
[12:11:49] <GothAlice> Alice's Law #40: There should be one–and preferably only one–obvious way to do it. findAndModify adds a second, and makes the decision highly non-obvious due to the excessively large number of documented differences.
[12:12:08] <Siamaster> I got it!! I could make user1 always be the user that was created first of the two
[12:12:09] <StephenLynx> yeah, yeah, but the other options are worse.
[12:12:23] <Siamaster> is it a heavy operation to get the time from an ObjectId?
[12:12:25] <GothAlice> Siamaster: Sort the IDs, the "initiator" is the first one. Very easy.
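One way to apply that suggestion, as a sketch only (the variable and field names are assumptions): sort the two user ids so the pair is always stored in a single canonical order, then upsert against that pair.

```javascript
// userA and userB are assumed to be ObjectId values for the two participants.
var ids = [userA, userB].map(function (id) { return id.str; }).sort();

db.chat.update(
    { initiator: ObjectId(ids[0]), target: ObjectId(ids[1]) },   // canonical order, whoever messaged first
    { $push: { messages: { sender: userA, message: "Hi." } } },
    { upsert: true }
);
```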
[12:12:35] <GothAlice> StephenLynx: Worse? In what way?
[12:12:51] <StephenLynx> Option A: put threads and posts in the same collection
[12:13:07] <StephenLynx> then differentiate threads from posts in some sort of way
[12:13:18] <GothAlice> StephenLynx: I don't follow how we get from findAndModify to a concrete collection example.
[12:13:38] <StephenLynx> because I need this for this example
[12:13:50] <StephenLynx> I am not defending findAndModify in general
[12:14:15] <GothAlice> Siamaster: The time is a UNIX timestamp stored in the binary data that is the ObjectId. In C, it's a pointer dereference to access it, thus not really possible to make it easier/faster.
[12:15:38] <StephenLynx> when I said "other options" it was ambiguous, I was referring to the other options to my problem with this race condition.
[12:16:47] <GothAlice> StephenLynx: Race conditions are solved by update-if-not-different mechanisms used as locks, typically. findAndModify won't really have an impact over simply an update with a query looking for the previous value, updating it to a new value.
[12:17:26] <GothAlice> db.foo.update({_id: ObjectId(…), value: "some expected value"}, {$set: {value: "some new value"}}) then checking nModified or nUpdated or whatever it's called in the returned value. ;)
[12:17:53] <StephenLynx> it will, because the forum document that holds the next id of both threads and posts will never yield the same value twice.
[12:18:00] <GothAlice> In this way you can implement versioning (update this record, incrementing the version too, but only if it's currently the version we expect), etc.
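A minimal sketch of the update-if-not-different / versioning pattern being described; the collection, field names, and version number are assumptions:

```javascript
// Only update if the document still carries the version we read earlier.
var res = db.foo.update(
    { _id: docId, version: 3 },                                   // expected current version
    { $set: { value: "some new value" }, $inc: { version: 1 } }   // apply change and bump version
);

if (res.nModified === 0) {
    // Someone else updated the document first: re-read it and retry.
}
```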
[12:20:29] <mbuf> I am trying to install the MMS agent with the code from https://github.com/Stouts/Stouts.mongodb, but the installation and configuration steps are always skipped
[12:20:34] <GothAlice> Why would you ever need to store that number?
[12:20:51] <StephenLynx> that is the expected user experience.
[12:20:57] <GothAlice> "Get post #45" -> db.posts.find(…).skip(45).limit(1)
[12:21:12] <StephenLynx> there are multiple boards
[12:21:19] <StephenLynx> each one with their own posts.
[12:21:39] <StephenLynx> board /b/ has its own post 45, board /v/ has its own post 45
[12:21:47] <GothAlice> "Jump to post #45 in the thread I'm looking at" -> as you render the page, "enumerate" the results as you render them, emit the enumeration as the HTML "id", you get it for free.
[12:21:47] <StephenLynx> board /n/ has its own post 45 that is a thread.
[12:22:14] <StephenLynx> so I can't rely on their order.
[12:22:22] <StephenLynx> because the post 450 might be the first one for that board.
[12:22:23] <GothAlice> Ah, but that's the thing. I've never, ever seen auto-increment post IDs. Again, they're UNIX timestamps.
[12:22:59] <StephenLynx> again, I am not replacing _id, I am just using a second post identifier.
[12:23:32] <StephenLynx> and I can't use something hard to read because that would make posts that quote other posts unintelligible.
[12:25:18] <StephenLynx> I have been using chans for about ten years now and I am not afraid to do things differently, but I couldn't think of anything better for this. It's not that trivial if you take the expected user experience into account.
[12:26:36] <GothAlice> I.e. "auto increment" and "mongodb" are mutually exclusive.
[12:26:42] <StephenLynx> will this document with findAndModify
[12:26:52] <Siamaster> GothAlice but how do I ensure that two fields are unique in relation to each other? i.e. {_id:obj, a:1, b:3} is okay if there already is an {_id:obj, a:1, b:2}, but another {_id:obj, a:1, b:2} would fail?
[12:26:55] <StephenLynx> check against the other server of the cluster?
[12:27:04] <StephenLynx> when incrementing the field?
[12:28:43] <StephenLynx> it's not a unique index either.
[12:28:59] <StephenLynx> because multiple forums might have the same lastId.
[12:29:15] <StephenLynx> /b/ might have it at 45, and so might /v/.
[12:29:22] <GothAlice> Siamaster: As a note, one should never really have the same _id across two documents, ever. (Not possible, it's automatically unique.) So your own unique index would only be on "a" and "b" from your example, i.e. the user IDs.
[12:29:52] <GothAlice> StephenLynx: I think you're getting confused as to who I'm addressing with my answers. :P
[12:31:04] <GothAlice> Siamaster: See the link I provided.
[12:31:11] <Siamaster> I saw, it doesn't answer my question
[12:31:34] <Siamaster> i want two fields to be unique in relation to each other
[12:32:17] <Siamaster> if I force a unique constraint on user1, I won't be able to have more chats for user1
[12:32:22] <GothAlice> Siamaster: http://docs.mongodb.org/manual/core/index-compound/ < the documentation is quite good, it's generally a good idea to read it. You want a compound unique index.
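The compound unique index being pointed at, written out for the chat example (field names taken from the earlier {initiator: 1, target: 1} suggestion); uniqueness is enforced on the combination of the two fields, not on each field separately:

```javascript
db.chat.ensureIndex({ initiator: 1, target: 1 }, { unique: true });
// {initiator: A, target: B} can now exist only once;
// {initiator: A, target: C} is still allowed, since the pair differs.
```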
[12:40:04] <Siamaster> it will still make a unique
[12:40:29] <GothAlice> (Compound, multikey, geospatial, text, and hashed are the "types"; "time to live", unique, and sparse are "properties" of indexes in general.)
[12:40:36] <Siamaster> i can't have {a:1,b:2} and {a:2,b:3}
[12:40:55] <GothAlice> Those two are unique from each other.
[12:41:03] <GothAlice> {a:1,b:2} and {a:1,b:3} is also unique.
[12:45:06] <GothAlice> Here's a question that might illuminate one problem with the "just index an embedded object" approach: {a: 1, b: 2} and {b: 2, a: 1}
[12:54:29] <Siamaster> made a mistake in the I guess, sorry :P
[12:54:37] <GothAlice> Notice I first dropped whatever might have been named "foo" before proceeding. You may have had previous indexes from previous experiments left behind.
[13:52:10] <pokEarl> Hello friends, not sure if this is an appropriate place to ask, but I am using Jackson with MongoJack to serialize/deserialize documents from Mongo in Java. There is an embedded document in my collection, though, that sometimes has values and sometimes is just NaN, which breaks things; any suggestions on how to get around this?
[14:12:29] <MBravo> Hi, is there anyone with a good understanding of how cloneCollection works? Question being: if there is a duplicate _id in the collection being copied, what will happen? Scanning docs and googling didn't help.
[14:30:53] <GothAlice> MBravo: You actually have duplicate _id values?
[14:31:04] <GothAlice> (Which should be impossible, given the default unique index on that field.)
[15:13:06] <MBravo> GothAlice: I'm copying very similar (and large) collections between different mongo servers so I have to take the possibility of duplicates into account
[15:16:38] <GothAlice> MBravo: In that particular instance, I'm not sure what would happen, but it would be indicative of a corrupted dataset requiring --repair. Not one that you can successfully copy live.
[15:31:16] <MBravo> GothAlice: I think we have a misunderstanding somewhere :) let me illustrate
[15:32:19] <MBravo> Suppose I have a server1, running a mongod instance, which has a db somedb and collection mycoll
[15:32:39] <MBravo> this collection has steady influx of data
[15:33:30] <MBravo> every, say, 24 hours I need to copy all data from somedb.mycoll to server2, into otherdb.mycoll, and then delete the data on server1
[15:33:44] <MBravo> server2 thus serves as an accumulation point
[15:34:04] <MBravo> the easiest way to do this seems to be to execute db.cloneCollection on server2
[15:35:31] <MBravo> due to rather high turnaround of data I'd like to consider the possibility that there *might* be a duplicate _id in the data coming from server1 (transient) to server2 (accumulation point -> lots of data)
[15:35:53] <MBravo> and I can't find any description of how cloneCollection would behave if there's a duplicate
[15:41:12] <cheeser> how would you get a duplicate _id?
[15:47:32] <GothAlice> Well, even if you're not using ObjectId, a primary wouldn't commit a duplicate record with the default unique index on that field.
[15:48:15] <GothAlice> (Thus a duplicate key on _id being indicative of some type of catastrophic failure that should utterly halt "syncing out" until corrected, a la --repair.)
[15:48:36] <GothAlice> Under normal operation there wouldn't even be transient duplication. It just wouldn't happen.
[15:59:38] <GothAlice> MBravo: Have you already tried to do what you say you are wanting to do?
[16:00:07] <GothAlice> In terms of syncing data to a central source, this is pretty much not the way to do it.
[16:00:55] <GothAlice> Notably, the command also always recreates indexes. For large collections, the copy operation will lock the collection and trounce server performance while the indexing is going on.
[16:05:50] <MBravo> GothAlice: tried yes, got duplicates no
[16:07:50] <MBravo> checking out your gist - interesting... let me go take another look
[16:12:58] <MBravo> while I'm retesting, a) I have mongo 2.6 not 3.0 b) the behaviour in your gist seems to directly contradict documentation
[16:14:20] <MBravo> which says "if ...collection exists in the database, then MongoDB appends documents from the remote collection to the destination collection."
[16:15:17] <Pinkamena_D> With text search, can I know the field which was matched? Say I set up a text index which matches on three fields, can I get back from the query which field was matched, or simply the string which was matched, in order to output it to search results?
[16:19:48] <GothAlice> MBravo: When in doubt, source code. https://github.com/mongodb/mongo/blob/v2.6.3/src/mongo/db/cloner.cpp
[16:20:16] <GothAlice> Note that this module appears to have been rewritten and moved in 3.0.
[16:22:14] <MBravo> GothAlice: indeed. thanks for your input, I now have some food for thought.
[16:22:26] <MBravo> perhaps I should go along the lines of http://serverfault.com/questions/585445/copy-mongodb-collection-between-two-servers-and-different-mongo-versions
[16:22:40] <MBravo> first comment under first answer
[16:23:34] <GothAlice> mongodump | mongorestore using a query to only fetch records created or modified after the last sync… much simpler.
[16:24:08] <GothAlice> Also, after the first time that's ever done, the indexes won't need recreation, so no great impact from that when performing subsequent updates.
[16:26:07] <GothAlice> No need for ssh, ruby, bash loops, regular expression transformation, nor a highly inefficient storage format (JSON).
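A sketch of the dump-and-restore approach being described; the host names, namespaces, and the "created" field used to limit the dump to records newer than the last sync are all assumptions:

```sh
# Dump only the documents created since the last sync (field and date are placeholders).
mongodump --host server1 --db somedb --collection mycoll \
    --query '{ "created": { "$gte": { "$date": "2015-07-21T00:00:00Z" } } }' \
    --out /tmp/sync

# Restore that slice into the accumulation collection on server2.
mongorestore --host server2 --db otherdb --collection mycoll \
    /tmp/sync/somedb/mycoll.bson
```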
[16:29:45] <lmatteis> hello. i'm a bit confused about how aggregates are supposed to be used. are they supposed to be used in a live manner? wouldn't they take a long time to compute if the aggregate is complex?
[16:30:00] <lmatteis> or do i need to store the results of an aggregate query somewhere? say in a separate collection
[16:30:13] <juliofreitas> Hi! I'm creating a system where my users can define their own week. They will have tasks to do; for example, my week could start on Tuesday and my tasks would be due by the next Tuesday. Every day the system can give me more tasks. What is the best way to structure my database and write the queries (tasks by day, tasks by week)? Can someone help me or point me to a tutorial?
[16:30:39] <GothAlice> lmatteis: http://s.webcore.io/image/142o1W3U2y0x < almost everything you see here is the result of an aggregate query. The top click-through chart is an aggregate that processes ~17K records. The page generates in a few hundred milliseconds.
[16:31:18] <lmatteis> GothAlice: so i guess it caches it?
[16:31:21] <GothAlice> To keep speeds fast, I ensure my queries are O(n) complexity with a fixed n for any given time range being calculated upon.
[16:31:43] <GothAlice> (Google terms to search for: mongodb pre aggregation)
[16:33:12] <lmatteis> GothAlice: i've inherited this code base which keeps logs of tons of stuff in a collection "logs" and then creates separate collections, similar to "views"
[16:33:41] <lmatteis> but i don't think they use the aggregate stuff, they just query "logs" and update the other collections
[16:34:13] <GothAlice> I do something similar, but mine is entirely, 100%, to-the-second live.
[16:34:40] <GothAlice> I have a "hits" collection that all incoming click data gets dumped into. The pre-aggregate collections (your "views") are "upserted" at the same time.
[16:35:43] <lmatteis> so does it make sense? or would a simple .aggregate call from the client be enough?
[16:35:51] <lmatteis> rather than creating intermediary collections
[16:36:05] <GothAlice> I.e. records in the "per-hour" pre-aggregate collection have their date's minute/second/microsecond set to zero ("rounding down" to the hour) and something like: db.hourly.update({date: Date(…)}, {$inc: {total: 1, …}}, upsert=True)
[16:36:17] <GothAlice> If the hourly record didn't exist, it'll be created automatically. (Upsert.)
[16:36:32] <GothAlice> It's very, very important to consider exactly how you'll be querying that data.
[16:37:00] <GothAlice> I store my data in time-based slices (throwing away finer time-based granularity, such as the actual microsecond the click happened, which is useless data) because I'm querying in time-based ranges.
[16:37:48] <GothAlice> This means that I know exactly how much data needs to be processed to answer any given range of time. One week at hour granularity? 168 records must be processed, worst-case, to answer that query.
[16:38:37] <GothAlice> If I were to instead aggregate query the original "hits" collection (that's microsecond accurate), my worst-case is dictated by how active our users are.
[16:38:53] <GothAlice> (That is flatly unacceptable as it means you can't easily predict performance.)
[16:40:53] <GothAlice> Read the article I linked. :)
[16:41:06] <GothAlice> It covers the subject quite thoroughly.
[16:41:44] <GothAlice> Basically: a document is added to "hits" for each click that is made. That means on a slow day there'll be no records added, but on a busy day there may be millions of records added.
[16:42:47] <GothAlice> "Pre-aggregating" into segments defined by how we want to query the data (hourly / daily time granularity) changes O(n) from measuring clicks, to measuring hours (or days). A potentially near-infinitely smaller set of data to process to answer queries, depending on how busy it gets.
[16:43:20] <lmatteis> but wouldn't running an aggregate directly on "hits" be enough?
[16:43:35] <lmatteis> like how do you know you have to create separate collections?
[16:43:38] <GothAlice> A million hits in a day, querying that day's activity to display an hour-by-hour chart, would process a million records, counting them up. Chopped up hourly, to produce the same chart requires only examining 24 records _no matter how many clicks there are, ever_.
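A sketch of the pre-aggregation pattern being described, assuming hypothetical "hits" and "hourly" collections and a "linkId" variable: each click is stored raw, and the matching hourly bucket is upserted with its date rounded down to the hour.

```javascript
// Assumed names throughout; only the upsert-into-a-time-bucket pattern matters.
var now = new Date();
db.hits.insert({ ts: now, link: linkId });     // raw, per-click record

var hour = new Date(now);
hour.setUTCMinutes(0, 0, 0);                   // "round down" to the hour
db.hourly.update(
    { date: hour, link: linkId },
    { $inc: { total: 1 } },
    { upsert: true }                           // create the bucket if it doesn't exist yet
);
```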
[16:45:36] <lmatteis> GothAlice: in couchdb you just have a map-reduce view, and couch keeps it updated
[16:45:46] <GothAlice> In MongoDB you use upserts to do the same.
[16:46:09] <lmatteis> on separate collections then?
[16:47:53] <GothAlice> I can't really explain it any better than "processing millions or billions or trillions or any other arbitrary unbounded number of records" vs. "processing a fixed, small number of records".
[16:50:15] <GothAlice> There are some normal queries in there (i.e. "find out the latest order ID" for use in the aggregates), of course. But the ~17K records processed to display that will never increase.
[16:50:24] <GothAlice> No matter how active our users become.
[16:50:59] <GothAlice> (It's so high because we're performing competition-by-industry and you-vs-everyone comparisons.)
[16:54:09] <lmatteis> and where does aggregate store them?
[16:54:34] <GothAlice> By default: RAM at the moment you make the query, cleaned up after completion of iteration, closing of the cursor, or query timeout.
[16:54:50] <GothAlice> Aggregates can (optionally) write the results to a collection, but that's not very useful. It's a one-time snapshot of the results as of the moment you made the query.
[17:09:58] <pokEarl> Yeah, it's normally a nested document though (say, Document1 { id: 1, Document2: { value1: 1, value2: 2 } }). So I have a Pojo for Document1 and a Pojo for Document2, which are used for serialization/deserialization. But then sometimes Document2 is not a nested document, it's just Document2: NaN, and then stuff breaks
[17:14:51] <GothAlice> cheeser's the resident Java guru.
[17:15:51] <svm_invictvs> cheeser: heh, figured it out
[17:16:52] <svm_invictvs> cheeser: Posted a bug and a test case reproducing it consistently. I'm not even sure it's a bug, honestly. Basically the @Id annotated String fucks it up.
[17:17:31] <svm_invictvs> tldr, object is stored with ObjectId _id (not a String), and the rest of the code assumes it's a string so it all shits the bed there.
[17:17:56] <GothAlice> ObjectIds are not strings, so yeah, clearly there's a bug to fix there.
[17:18:05] <jacksnipe> problem with inserting a user into a collection with a 2dsphere index: location: { type: "Point", coordinates: [ 9.392014269192378, -163.4010148661396 ] } } longitude/latitude is out of bounds, lng: 9.39201 lat: -163.401
[17:18:21] <jacksnipe> shit nvm figured it out I'm retarded
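For the record, the usual cause of that error: GeoJSON coordinates are [longitude, latitude], and latitude must be within -90..90, so the two values above look swapped. A sketch with the values reversed (collection name is an assumption):

```javascript
db.users.ensureIndex({ location: "2dsphere" });
db.users.insert({
    // longitude first, then latitude
    location: { type: "Point", coordinates: [ -163.4010148661396, 9.392014269192378 ] }
});
```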
[17:18:27] <svm_invictvs> GothAlice: Well, the only reason I don't call it a bug is that the Morphia docs explicitly say not to use @Id annotated Strings and expect them to be generated properly.
[17:18:42] <svm_invictvs> GothAlice: So if it's a "bug" it could be a documentation bug
[17:19:17] <GothAlice> Well, notably, "the rest of the code assumes it's a string" is the wrong part.
[17:19:51] <svm_invictvs> I dug pretty deep into Morphia's source
[17:20:04] <svm_invictvs> There's not really a good way to handle it, none as far as I've seen.
[17:22:55] <svm_invictvs> GothAlice: If I get a wild hare up my ass about it, I'll submit a pull request :P
[17:43:51] <Doyle> Hello. Is there a good way to move all data from several separate replica sets onto one large replica set or sharded replica set? Export/Import?
[17:59:20] <_sillymarkets> Hey, can anyone help me with the Aggregation Pipeline? I'm using Moped for Ruby to connect to the db. When I use shell, I can get my aggregation result. But I only get an empty array when I try to use my ruby script to pump the same info
[18:07:59] <Doyle> Hey. Is it possible to combine several existing replica sets with unique data on them into a sharded cluster?
[18:43:35] <jigax> hello everyone, I currently have a node app which we use to upload some images. I'm storing the image info in mongo and uploading the actual file to s3. I was wondering if this is the ideal way or should I be storing these directly in mongo as base64? thanks
[18:46:56] <_sillymarkets> hey, is anyone having trouble using Aggregation Framework with Ruby mongo gem? I have 2.0.6 installed but i keep getting undefined method "aggregate" for Mongo::Collection
[20:18:16] <Bradipo> Because there is nothing in it.
[20:24:15] <Bradipo> So this doesn't even return: db.results.find( { $getmore: { "issue.issueId": ObjectId('5363531be4b08b0137618a98'), recorded: { $lt: new Date(1437508601000) } }, $orderby: { recorded: -1 } })
[20:43:07] <Doyle> Is there an official position on hosting multiple mongod's (replica set members) on a single host? Considering the possibility of moving several smaller replica sets spanning a lot of hosts to a few large super hosts.
[20:43:31] <Doyle> 3 hosts each hosting mongod processes for several replica sets.
[23:03:07] <dbclk> and Site B has only 1 mongod node
[23:03:33] <dbclk> if my site is sending requests to that mongod node (priority 0), could it still accept read requests?
[23:03:37] <dbclk> i understand it can never do writes
[23:04:08] <dbclk> but, just want to know if there would be a problem if one node is the only one up in a mongo cluster and that node so happens to be a Priority 0 node
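For what it's worth: a priority 0 member can never be elected primary, but it can still serve reads if the client opts into secondary reads. A shell sketch (the collection and query are assumptions):

```javascript
// Allow this connection to read from secondaries, including priority 0 members.
db.getMongo().setReadPref("secondaryPreferred");
db.mycoll.find({ status: "active" });   // may be served by the priority 0 node
```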