[04:32:50] <Waheedi> I have a question regarding replication of writes to secondaries: is there a way to limit the secondaries to a specific amount of writes in a given period of time? I believe right now whenever the primary gets writes it replicates them instantly; I want to make that a bit slower so I don't lose any read speed
[04:35:30] <Waheedi> i know I can do that for the primary by modifying my client/app to write to the primary slowly, but I do actually want to write instantly on the primary, just not on the secondaries..
[04:39:36] <joannac> Replication is for HA, not for read scalability. So, no.
[06:21:18] <eddie888> how do i delete documents from query?
[06:26:31] <Boomtime> if using the shell, call .remove instead of .find
[06:26:42] <Boomtime> this is true with most languages/drivers too actually
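For instance, the same filter you would pass to .find() can usually be handed to .remove() unchanged; a minimal shell sketch, with a placeholder collection and filter:

    // Preview what the filter matches before deleting anything.
    db.logs.find({ level: "debug", ts: { $lt: ISODate("2016-01-01") } })

    // Delete the same documents (collection and field names are placeholders).
    db.logs.remove({ level: "debug", ts: { $lt: ISODate("2016-01-01") } })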
[07:35:30] <dbackeus> I want to create a unique compound index on two fields, organization_id and slug, but only index documents where slug isn't null. Can I use a partialFilterExpression for this, and if so, what would the expression look like?
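One plausible form, assuming a 3.2+ server and that "isn't null" means "has a slug value" (the collection name is a placeholder):

    // Unique compound index that only covers documents with a slug.
    db.collection.createIndex(
        { organization_id: 1, slug: 1 },
        {
            unique: true,
            // Note: $exists: true still indexes documents whose slug is explicitly null;
            // { slug: { $type: "string" } } is a stricter filter if that matters.
            partialFilterExpression: { slug: { $exists: true } }
        }
    )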
[08:32:34] <rom1504> so I've been doing something like `StreamSupport.stream(collection.find().spliterator(), false).parallel()` in java
[08:32:47] <rom1504> and I'm getting a Exception in thread "main" com.mongodb.MongoCursorNotFoundException: Query failed with error code -5 and error message 'Cursor 69590691813 not found on server <server>' on server <server>
[09:16:28] <rom1504> well I added .noCursorTimeout(true)
[09:16:32] <rom1504> hopefully that was the problem
[10:53:51] <livcd_> where can i find an info about how to set a mongodb replicas across different DCs (some best practices) ?
[10:58:45] <Derick> livcd_: it's better to ask specific questions here... cause we'd just be able to say "google it"? (at least, that's what I would have to do)
[10:59:26] <livcd_> Derick: just wanted to refresh on what has been said in courses
[14:28:36] <fish_> yeah :) sorry, let me take a step back
[14:29:17] <fish_> so I have a bunch of app servers running mongos. now I introduced new instances that serve many more requests than the old ones and I see that requests to mongodb are much slower on those
[14:29:31] <fish_> even though the systems aren't bound by any resource bottleneck as far as I see
[14:30:07] <fish_> so I was thinking: maybe mongodb somehow makes sure one client (IP) can't utilize more resources than others
[14:30:30] <fish_> but now I'm thinking more that it might be mongos which is simply overwhelmed by the higher number of operations on those new instances
[14:31:22] <fish_> cheeser: does this make more sense?
[14:32:28] <fish_> of course mongos isn't running into obvious limits either
[14:34:28] <cheeser> those new instances might also still be syncing, too.
[14:35:12] <fish_> cheeser: new instances = new app instances, so they only run mongos
[14:35:27] <cheeser> oh, well, that doesn't make any sense
[14:36:23] <fish_> cheeser: see.. this is why I ask weird questions :)
[14:37:06] <fish_> in general, mongos doesn't need much 'tuning' I guess? I never spent much thought on mongos performance until now
[14:37:31] <fish_> maybe it simply has a specific number of open connections and pools new ones or something
[14:37:34] <cheeser> i haven't personally heard of any such issues
[15:22:59] <vagelis> Hello, i would like to ask what I'm supposed to do to handle non-ASCII characters. I mean, should I somehow encode the entire document (keys/values) to UTF-8 before saving to mongodb?
[15:23:44] <vagelis> It might be a silly question but I'm not an expert on encodings etc :/
[15:54:08] <mapc> rom1504: yes, thank you. I know that.
[15:54:12] <macwinner> for upgrading to 3.2.1, i'm thinking of changing my plans a little. Rather than upgrade to WiredTiger at the same time as the upgrade to 3.2.1, I'm going to first upgrade to 3.2.1 and keep the mmap storage engine.. then after all nodes are upgraded, I'll stop the secondaries, delete the data directory, change mongod.conf to use WT, then restart and wait for the resync.. does this sound better than doing the switch to WT at the same time as the upgrade?
[15:54:44] <mapc> I thought maybe there is a very condensed, pure-syntax kind of reference, where I can look up whether I really got every possible special construct.
[16:49:11] <Pinkamena_D> it has happened to me before that a collection which was basically logs was filling up disk space and eventually took up the whole disk. In this case I was unable to delete old logs using collection.remove(); it still just said it was out of disk space. Is there any way to recover from this situation without having to back up everything to another disk?
[16:49:40] <Pinkamena_D> The error, when inserting OR removing, was 'can not take a write lock when out of disk space'.
[17:22:19] <pchoo> in aggregation pipelines, can I take the value of a field and use it as a field key, assigning another value to it? e.g. I've got several objects like {ref: 'abc', status: 'new', time: 123}, and I would like {_id: 'new', abc: 123}
[17:24:34] <GothAlice> pchoo: When in doubt, try it out. Turns out, no: http://s.webcore.io/2x1y2i0f0E3A
[17:25:12] <pchoo> GothAlice: thanks, My hunch was no, I was hoping someone here could prove me wrong :)
[17:25:32] <GothAlice> It's… generally a bad idea overall to have variable key names.
[17:26:06] <GothAlice> I.e. that approach will eventually kick your dog when it comes time to do things like add indexes. Which you can't do against variably named fields.
[17:26:08] <pchoo> yeah, I was trying to create a hash I could use as a lookup source for building a graph object
[17:26:40] <pchoo> I'm tasked with building a timeline chart out of our support ticket data
[17:28:00] <GothAlice> At work we do event tracking, with discrete timed events such as "ticket opened", or "redirect link clicked", with associated data pre-aggregated for O(1) performance reporting.
[17:28:11] <GothAlice> https://gist.github.com/amcgregor/1ca13e5a74b2ac318017#file-eek-py-L1-L27 is one example of this (including sample pre-aggregate)
[17:29:14] <GothAlice> http://s.webcore.io/142o1W3U2y0x < looks like this
[17:29:30] <GothAlice> (With the linked example code handling the top left click-through comparison chart.)
[17:30:43] <GothAlice> However, a bare timeline a la Facebook feed or whatnot… that's a series of typed events.
[17:30:55] <GothAlice> I.e. a fairly straight-forward query, not even needing an aggregate in most instances.
[17:31:08] <pchoo> unfortunately not quite the same graph I need
[17:31:37] <GothAlice> Just trying to supply ideas that might trigger a eureka for you. :)
[17:32:04] <pchoo> we are using Sirportly, and have "counters", so each one has a state, with a start and end and minutes. The graph is per ticket: a sum of the minutes for each state in a stacked bar chart
[17:59:40] <godzirra> Can I ask a mongoose question in here? Not sure where else to ask.
[18:04:38] <Derick> godzirra: best just to ask questions
[18:05:41] <godzirra> Derick: Thanks... I know that rule ;) I was just making sure that I was okay to ask a question that isn't 100% about mongo per se.
[18:06:10] <godzirra> Anyways.. my question. I have a newly created object that has a schema with references to other objects. I know how to call .populate() when finding an object, but how do I populate a reference when I already have the object?
[18:06:37] <silasbarta> So, question about performance: I have a collection indexed on a field that is a string with a URL. When I order by URL (which has many duplicates), it is *really* slow to get the next one. It’s slightly faster for indexes on timestamps, and *much* faster for indexes on regular IDs. Is there a reason for this? Is there a way around it? Like, mapping to an ObjectId that preserves url order?
[18:19:14] <reuf> hello - i have drivers(id, name, address) and i have vehicles(type,producer,drivers{driver1, driver2,...}) , one vehicle can be driven by multiple drivers - how would i model this in mongoose?
[18:19:32] <GothAlice> silasbarta: Lexicographical order… is typically going to be wrong for URLs, especially any that involve numeric elements. Additionally, without joins, MongoDB discourages the use of references except in specific situations, as they'll require inefficient secondary lookups. When examining the performance of any query, get an .explain() of it; it might help to gist/pastebin your query and .explain() results.
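A rough example of what such a gist might contain, with placeholder collection/field names and the shell's executionStats verbosity:

    // The slow query plus its plan and runtime statistics.
    db.pages.find({}).sort({ url: 1 }).skip(500).limit(1).explain("executionStats")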
[18:21:03] <GothAlice> reuf: How will you be presenting that data to users? I.e. when looking at a car, will you need the names of the drivers, too?
[18:21:28] <GothAlice> Or are you looking for a simple fast "is X user a driver for Y car"?
[18:21:30] <reuf> GothAlice, yes, i would present him list of drivers and he will be able to pick which drivers will drive which vehicle
[18:21:46] <reuf> so i would need to keep track which drivers were asigned to which vehicle
[18:22:29] <GothAlice> Cool. So, ignoring mongoose for a moment, I'd do something like: vehicle = {kind: "Sedan", …, drivers: [ {_id: ObjectId(…), name: "Alice", assigned: ISODate()}, … ]}
[18:22:30] <silasbarta> @GothAlice K, will take a look. Only necessary to make sure that equal urls get grouped together, not actually necessary to do them in alphabetical order.
[18:23:53] <GothAlice> reuf: This represents a list of embedded (nested) documents, each "driver" embedded document comprising the ID of the driver (for lookup and management), their name (saving you a lookup when showing the list), and the date/time they were assigned as a driver to that car. (Possibly useful, but mostly a demonstration of what can be done.)
[18:25:07] <GothAlice> reuf: Now, the important operations become: $push to add a driver, $pull to remove a driver, and you can use $elemMatch when querying to get back data only about certain drivers (i.e. identify just the time driver X was assigned to car Y), etc., etc.
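A minimal shell sketch of those operations against the document shape above (vehicleId, driverId, and the collection name are placeholders):

    // Add a driver to a vehicle's embedded list.
    db.vehicles.updateOne(
        { _id: vehicleId },
        { $push: { drivers: { _id: driverId, name: "Alice", assigned: new Date() } } }
    )

    // Remove that driver again.
    db.vehicles.updateOne(
        { _id: vehicleId },
        { $pull: { drivers: { _id: driverId } } }
    )

    // Project back only the matching embedded driver document.
    db.vehicles.find(
        { _id: vehicleId },
        { drivers: { $elemMatch: { _id: driverId } } }
    )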
[18:25:34] <reuf> GothAlice, it makes sense, thanks
[18:25:37] <GothAlice> reuf: Now, on the mongoose front, pro tip: just use the JS driver. Mongoose will, at some point, result in self-harm. (It does things that break expectations.)
[18:26:05] <GothAlice> For example, Mongoose has a tendency to store ObjectIds as strings. They're not strings.
[18:27:51] <godzirra> GothAlice: Really? I've never run into that with mongoose. (Disclaimer: Not a super fan, but we use it at work so...)
[18:28:22] <GothAlice> godzirra: From my experience assisting people using DAOs, Mongoose is the #1 cause of problems amongst those available.
[18:29:03] <GothAlice> (Besides, MongoDB 3.2 has document validation… so… having a separate schema layer seems entirely unnecessary now.)
[18:29:30] <godzirra> Mongoose definitely has its share of problems. Just never ran into the one you mentioned.
[18:29:35] <silasbarta> oh and thanks for the pointers @GothAlice, didn’t mean to be ungrateful there.
[18:29:35] <godzirra> Circular dependency hell is my biggest mongoose issue.
[18:30:20] <GothAlice> Heh, suddenly the eggshells have come out. ^_^ silasbarta: No worries! It's just to assist with a query, one needs to see the query. And .explain() can tell you a lot about what MongoDB thinks it's doing. :)
[18:30:48] <GothAlice> godzirra: My favourite ever: a collection literally named "[object Object]".
[18:31:07] <GothAlice> How to fix that makes for a great pop quiz question. ;P
[18:31:16] <silasbarta> And why can’t you do a .explain() on a .count()?
[18:31:42] <GothAlice> silasbarta: Because count isn't explainable. It won't necessarily follow standard query processing.
[18:32:01] <GothAlice> I.e. if you try to get a .count() and it figures out it can use a single index (i.e. _id), it'll count index buckets. Not records.
[18:41:43] <godzirra> Hm. apparently I'm incorrectly populating an array of references. Well, incorrectly trying to. Since it's not working...
[18:41:46] <GothAlice> I think I started disabling it on recompile around 2.6 or so.
[18:43:17] <GothAlice> godzirra: Do the referenced objects span across multiple discrete collections or databases?
[18:43:34] <GothAlice> If not, an optimal approach to storage would simply be an array of IDs.
[18:44:04] <GothAlice> (Or, if you want to cache some of their values to save on lookups, an array of embedded documents which include an object ID field as previously shown.)
[18:44:05] <godzirra> GothAlice: They do not. It currently is just a list of ObjectIds.
[18:44:23] <GothAlice> Cool. So… uh… what's the issue? Document and query examples?
[18:47:55] <godzirra> The issue is it's not populating.
[18:48:19] <godzirra> I've tried populate('questions') and an example I found on stackoverflow where I try populating like: populate('questions.question')
[18:52:39] <StephenLynx> mongoose is the PHP of mongo
[18:52:54] <StephenLynx> a little bit worse,because at least PHP get stuff done
[18:52:57] <godzirra> Yes. We get it. No one likes mongoose.
[18:52:59] <GothAlice> Well, those aren't ObjectIds. While that might explain why it wouldn't work in ordinary software, in Mongoose, all bets are off.
[18:53:42] <godzirra> GothAlice: They're ObjectIds in the database and mongoose will change them back to ObjectIds on execution of the query. You can't really show an object id in javascript, I'm guessing.
[18:53:59] <StephenLynx> if I were to put on the same scale, mongoose would be the equivalent of not liking ebola
[18:54:00] <godzirra> StephenLynx: Great. Still not helpful and still not my choice. :p
[18:54:36] <GothAlice> ObjectId('hex') — an ObjectId actually represents several fields at once, and being able to access those fields is somewhat important. I.e. _id, if an ObjectId, eliminates the need to track creation time. Because it's already there. ;)
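For example, in the shell the embedded creation time can be read straight off the id (a small sketch using the standard shell helpers):

    var id = ObjectId()
    id.getTimestamp()   // ISODate of the moment the id was generated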
[18:55:23] <godzirra> So how would I replace the ids with their objects in plain mongo?
[18:55:40] <GothAlice> With a second query. (Because that's what it is.)
[18:56:53] <GothAlice> In p-code: for record in db.collection.find(…): record['questions'] = list(db.questions.find({_id: {$in: record['questions']}}))
[18:57:05] <GothAlice> That won't preserve the order, though.
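One way to restore the original order after that second query, sketched in shell-style JavaScript (collection and field names are placeholders):

    // Fetch the referenced documents in one round trip...
    var questions = db.questions.find({ _id: { $in: record.questions } }).toArray()

    // ...then put them back in the order of the original reference array.
    var byId = {}
    questions.forEach(function (q) { byId[q._id.str] = q })
    record.questions = record.questions.map(function (id) { return byId[id.str] })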
[18:57:59] <godzirra> Thanks. I'll see what I can do to change it.
[18:58:45] <GothAlice> As another refactoring idea, instead of storing the ObjectIds and treating MongoDB as if it were a relational database (which it very much is not), store embedded documents that include the reference and whatever data you're _actually_ interested in. For example:
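A hypothetical shape for such a document, with placeholder field names; the point is that each embedded reference also carries the data you actually display, e.g. a label:

    {
        _id: ObjectId("..."),
        title: "Some ticket",
        questions: [
            { _id: ObjectId("..."), label: "First question" },
            { _id: ObjectId("..."), label: "Second question" }
        ]
    }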
[18:59:50] <GothAlice> Depends how often the "cached" data (label in this example) changes, though.
[19:00:42] <GothAlice> Then you can entirely bypass the need for a second query and the processing needed to preserve order if desired. Amongst other goodies.
[19:01:19] <silasbarta> GothAlice: Here’s the past of the query with the explain: http://pastebin.com/4gtKH1b4
[19:01:34] <silasbarta> (some info redacted for privacy)
[19:01:48] <GothAlice> silasbarta: .skip() introduces a logarithmic performance degradation the higher the number is, due to index bucket scanning.
[19:02:09] <GothAlice> (I.e. it needs to traverse a tree of buckets, and the more it needs to traverse, the longer it takes.)
[19:02:28] <GothAlice> Generally the point of using $gt queries is to avoid the need to .skip() like that.
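A sketch of that range-based approach in the shell (collection and field names are placeholders; with many duplicate values you'd also need a tiebreaker such as _id in both the sort and the range condition):

    // First page.
    var page = db.pages.find({}).sort({ url: 1 }).limit(50).toArray()
    var last = page[page.length - 1]

    // Next page: everything past the last value already returned, no .skip() needed.
    db.pages.find({ url: { $gt: last.url } }).sort({ url: 1 }).limit(50)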
[19:02:56] <silasbarta> well that’s my other question, I don’t know how, exactly, mongoengine translates that into a query, but it has the same slowdown, so I assume this is the right translation of .objects().orderby()[500]
[19:03:00] <GothAlice> Or linear time if the query couldn't use an index.
[19:03:16] <silasbarta> But that’s the thing, this *isn’t* slow if the index is on an ObjectId field
[19:03:21] <GothAlice> [500] on there = .skip(500).limit(1)
[19:03:33] <silasbarta> k, so that’s the right translation then
[19:03:59] <GothAlice> Indeed, ObjectIds are compared very rapidly. (Effectively a byte comparison of 12 bytes.) $gt on a larger string means it needs to do far more expensive string compares.
[19:04:42] <silasbarta> so then my earlier suggestion could work — add another ObjectId field obtained from e.g. hashing the url field?
[19:05:21] <GothAlice> ObjectIds have meaning. They're a rich structure.
[19:05:30] <godzirra> GothAlice: Probably a better idea.
[19:06:54] <silasbarta> But earlier you were saying the relative speed is due to ObjectIds being shorter, so any transformation that shortens the field would accomplish the same speedup, right?
[19:07:34] <GothAlice> Yes, but the approach you described would be bad beyond imagining, especially if at some point you enable BSON validation and your data disappears. ;P
[19:08:33] <GothAlice> As a note, hashed indexes are a thing, but they're hashes. Sorting on them would be non-sensical and pseudo-random. https://docs.mongodb.org/manual/tutorial/create-a-hashed-index/
[19:09:06] <GothAlice> However, speed-wise, that skip(500) is doing more to hurt you than searching/sorting on a string.
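For reference, a hashed index is declared like this (collection and field names are placeholders); it helps equality lookups on url, but sorting on it is meaningless:

    db.pages.createIndex({ url: "hashed" })

    // Equality matches can use the hashed index.
    db.pages.find({ url: "https://example.com/some/page" })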
[19:09:10] <silasbarta> right, but as before, all that matters for this use case is that I preserve equality — docs with the same url field have the same hashed field
[19:09:36] <silasbarta> order doesn’t matter, but grouping by the same url does matter
[19:09:50] <GothAlice> So, give it a go. See what the difference is. :)
[19:10:17] <silasbarta> Thanks for the pointer to hashed indexes, totally didn’t know about that :-)
[19:11:10] <silasbarta> But I still don’t understand what you mean about this being bad beyond imagining and how BSON validation would drop the data?
[19:12:23] <GothAlice> I didn't mention your imagining, but the approach of abusing ObjectIds with invalid contents would be "bad beyond imagining".
[19:13:18] <GothAlice> If you enable BSON validation, invalid data such as a random-contents (hashed) ObjectId would either sorta-work (giving insane values for its internal keys such as creation time, node ID, process ID, and incremental counter), explode terribly, or just eat the value.
[19:13:59] <GothAlice> Instead, just store binary data as binary data, and don't try to wrap it in another type. (Attempting to wrap it in another type benefits nothing, hurts everything.)
[19:17:07] <silasbarta> yeah — if mongo attributes meaning to ObjectIds and acts on it when I just care about the contents, then that would be a bad thing. But yeah, hashed indexes look to be exactly the thing for this use case! :-) (-:
[19:18:15] <GothAlice> I'd be curious to know the difference in the resulting query performance for this use case, silasbarta.
[19:18:25] <GothAlice> (Any optimization without measurement is by definition premature. ;)
[19:22:32] <silasbarta> so, premature question then ;-) If I have an index on the url, and a hashed index on the url, and I do this query again, which orders by the url … mongo will still think it has to order by the real url and be slow. But I actually want it (against all reason) to order by the hashed url.
[19:23:33] <silasbarta> what I’m really trying to do is group them into bags such that docs with the same url aren’t split across bags
[19:23:45] <GothAlice> silasbarta: When in doubt, try it out, and slap an .explain() on it. :)
[20:30:55] <GothAlice> As a note, I have more than 30 terabytes of data in GridFS, served highly efficiently directly by Nginx. S3 just adds "yet another thing to manage, configure, and monitor" to the equation, which is something I heavily try to avoid given MongoDB's natural capabilities.
[20:31:14] <GothAlice> And now if yoofoo ever pops back in I can copy/paste that.
[20:33:18] <GothAlice> magicantler: Never have indexes "automatically" created. That way lies madness and losing access to your data for indeterminate amounts of time.
[20:35:34] <GothAlice> The transparent proxy in use slurps data into GridFS, too. (My devices go through the proxy, and the proxy records everything they ever access.)
[20:35:40] <GothAlice> But that's a custom Python thing.
[20:42:44] <GothAlice> magicantler: When is a three minute deploy a five hour deploy? When you add an index and forget that your ODM/DAO automatically constructs indexes on startup. A mistake one only makes once. ;P
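The alternative is to build indexes as an explicit, deliberate deploy step; a minimal shell sketch with placeholder names, using the 3.x-era background option so the build doesn't block the primary:

    db.tickets.createIndex({ organization_id: 1, created: -1 }, { background: true })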
[20:56:33] <funhouse> Hey guys, just installed on windows, having an issue authenticating. I can connect but can't auth, using user: admin pass: pass
[21:42:01] <Freman> good morning ladies, gents and gerbils of all ages...
[21:42:27] <Freman> can anyone think of a way to deliberately slow down mongo? I want to test some timeout behaviour :D
[21:45:51] <GothAlice> Freman: MongoDB features a testing mode which may accomplish what you wish. The typical approach, though, is to spin up as many processes randomly inserting, updating, and fetching, as needed to reach the load levels desired.
[21:47:25] <Freman> Thanks GothAlice, I'll investigate this testing mode first - on account of not wanting to thrash my poor laptop :D
[21:50:02] <Freman> I wonder if I can take a nap while these tests run - you know, the whole "it's compiling" excuse lol
[22:11:05] <GothAlice> Freman: Oh, and if you're on Mac, you can use the Network Link Conditioner system preference pane to set up things like "I'm now on a 3G connection with 1.5% packet loss" testing.
[22:12:06] <GothAlice> Or "everything is fast, except DNS takes three seconds", etc., etc.
[22:12:06] <Freman> db.log_entries.find({preview: /actionGet/, "$where" : "sleep(100)"}) - I'm not sure what unit of time it's using but 100 seconds have passed...
[22:12:22] <Freman> yeh I am on osx but everything is local
[22:12:59] <GothAlice> Testing hardware? No, that preference pane modifies your mac's network connection to simulate whatever bandwidth, latency, or packet loss you want.
[22:13:08] <GothAlice> https://docs.mongodb.org/manual/reference/command/sleep/#dbcmd.sleep < sleep is a command, not a query component
[22:13:20] <Freman> which only works if you're testing over a network, not to localhost
[22:13:36] <Freman> yeh but sleep also works in $where :D
[22:14:18] <GothAlice> But that doesn't do anything to "load" the server, vs. the real sleep command which actually sets global locks.
[22:14:47] <Freman> that's ok, I'm testing the connection timeout behaviour of the client
[22:32:12] <alasi> Hey, mongodb question. I'm doing collection.find().sort({date: -1}).limit(1).each((err, item) => {}) but it doesn't limit it to just one entry. It finds a null entry first, messing up my promise resolves! What's going on here?
[22:43:27] <m3t4lukas> alasi: this query will in current versions of mongod most likely always return the first document you inserted into the collection. What do you mean by null entry? It at least has to have an ObjectId :P
[22:43:45] <alasi> It's fine though, I can compare against null
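If this is the Node.js driver, its cursor .each() traditionally signals exhaustion by invoking the callback one last time with a null item, so the null isn't an extra document. A small sketch that sidesteps the sentinel, assuming the standard driver API (handleError is a placeholder):

    // Fetch just the newest document without a per-item callback.
    collection.find().sort({ date: -1 }).limit(1).toArray(function (err, docs) {
        if (err) return handleError(err);
        var newest = docs[0];   // undefined if the collection is empty
        // ... use newest ...
    });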
[22:55:05] <Freman> p.backChannel.Clone().DB("admin").C("$cmd.sys.inprog").Find(bson.M{}).One(&meh) works
[23:10:23] <johnzorn> I'm trying to use the $inc operator in a findAndModify on a mmapv1 db and I'm receiving "not valid for storage engine" I can't find anything that says $inc should not work...