[01:42:09] <mrkirby153> Okay, so I'm having an issue with the java MongoDB library
[01:43:31] <mrkirby153> Every time I do something on a collection (e.g. Collection#findOneAndReplace()) it appears to be spawning a new connection to the DB and not closing it
[01:45:18] <cheeser> there's a connection pool inside MongoClient
[01:45:43] <cheeser> you don't want a connection opened and closed every time.
[01:46:59] <kurushiyama> mrkirby153: It is not unusual to have several hundred to even thousands of connections open to a MongoDB instance. Compared to other DBMS, connections are pretty cheap.
[01:48:04] <mrkirby153> Wanted to give MongoDB a spin
[01:48:39] <cheeser> multiple connections to the db is perfectly valid even for MySQL. You should just use a connection pool with the driver. in mongodb's case, we provide the CP as part of the client/driver
[01:48:53] <kurushiyama> mrkirby153: Well, one pretty simple piece of advice. MongoDB is __very__ well thought out. So if you encounter a problem, don't write off MongoDB, but seek help.
[01:49:10] <mrkirby153> Yeah, I may become a permanent resident of this channel XD
[01:49:28] <mrkirby153> I tried mongodb but then moved away from it because it was "Not MySQL"
[01:49:28] <Boomtime> @mrkirby153: in case it wasn't clear; keep the MongoClient object persistent - don't go creating/destroying these
[01:51:58] <cheeser> mrkirby153: you can configure it via whatever logging framework you're using. http://mongodb.github.io/mongo-java-driver/3.2/driver/reference/management/logging/
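A minimal sketch of what cheeser and Boomtime describe, using the 3.x Java driver: one long-lived MongoClient whose internal pool is reused for every operation. The class, database, collection names and pool size below are illustrative, not taken from the log.

    import com.mongodb.MongoClient;
    import com.mongodb.MongoClientOptions;
    import com.mongodb.ServerAddress;
    import com.mongodb.client.MongoCollection;
    import org.bson.Document;
    import static com.mongodb.client.model.Filters.eq;

    public class MongoHolder {
        // Create the client once and keep it for the lifetime of the application;
        // it maintains the connection pool internally.
        private static final MongoClient CLIENT = new MongoClient(
                new ServerAddress("localhost", 27017),
                MongoClientOptions.builder().connectionsPerHost(100).build());

        public static Document replaceById(Object id, Document replacement) {
            MongoCollection<Document> coll = CLIENT.getDatabase("app").getCollection("things");
            // Borrows a pooled connection; no new connection is opened per call.
            return coll.findOneAndReplace(eq("_id", id), replacement);
        }
    }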
[03:00:07] <lotus> Hey guys, support for mongoose in here? I'm trying to get at Model.findById in a pre-hook and it's saying undefined...
[04:01:07] <kalx> Hi all. Is there any way to insert a record into mongo with a field containing the current db server date/time? I saw there is a $currentDate operator but it seems this is for updates only.
[07:55:29] <Zelest> brain is not working.. how can I find a document and, if it doesn't exist, create it?
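One common answer to Zelest's find-or-create question is an atomic upsert via findOneAndUpdate, which returns the existing document or inserts and returns a new one. A sketch with the Java driver; the filter and field names are made up, and `collection` is assumed to be a MongoCollection<Document>.

    import com.mongodb.client.model.FindOneAndUpdateOptions;
    import com.mongodb.client.model.ReturnDocument;
    import org.bson.Document;
    import static com.mongodb.client.model.Filters.eq;
    import static com.mongodb.client.model.Updates.setOnInsert;

    // Returns the matching document, creating it first if it does not exist yet.
    Document doc = collection.findOneAndUpdate(
            eq("username", "zelest"),
            setOnInsert("createdAt", new java.util.Date()),
            new FindOneAndUpdateOptions()
                    .upsert(true)
                    .returnDocument(ReturnDocument.AFTER));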
[08:01:19] <ravenx> I'm kinda confused as to what upsert does in mongoimport
[08:42:07] <Folkol> Hello. I just stumbled upon the fact that I can get optimistic locking with MongoDB, which is great. But I also read that Mongo will retry on WriteConflict... This is not good enough, is it possible to disable the automatic retry? (Or supply custom merge-code in case of conflict?) I have tried looking this up in the doc, but failed.
[09:03:28] <tpayne> Is it possible to count on an aggregate?
[09:08:40] <lipiec> Hi! Is there some mailing list to which I could subscribe to be notified about new version releases?
[09:11:00] <Folkol> tpayne: I am not sure that I understand what you are asking for, but check out "accumulators".
[09:11:29] <tpayne> Folkol: just wondering if I can get a count instead of returning the results of an aggregate
[09:11:45] <tpayne> right now I have to count the documents in code, and I'd like to optimize that
[09:12:57] <Folkol> You can group the results of aggregate. There is an example here: https://docs.mongodb.org/manual/aggregation/
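For tpayne's case, the count can also be produced inside the pipeline itself by grouping all matches into a single bucket and summing, so only one small document comes back instead of the full result set. A hedged Java driver sketch; the match criteria and names are placeholders.

    import java.util.Arrays;
    import com.mongodb.client.model.Accumulators;
    import com.mongodb.client.model.Aggregates;
    import org.bson.Document;
    import static com.mongodb.client.model.Filters.eq;

    // Pipeline: { $match: ... }, { $group: { _id: null, count: { $sum: 1 } } }
    Document result = collection.aggregate(Arrays.asList(
            Aggregates.match(eq("status", "active")),
            Aggregates.group(null, Accumulators.sum("count", 1))
    )).first();
    long count = (result == null) ? 0L : ((Number) result.get("count")).longValue();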
[09:39:15] <Ceunincksken> Hello all, I'm trying to setup a sharding environment, but in a proof of concept, adding a new shard simply slows down the insert performance. How is that possible?
[09:40:06] <ravenx> could be a poorly picked shard key
[09:40:27] <Ceunincksken> I've picked a hashed value of the '_id' key, so seems good to me.
[09:41:21] <Ceunincksken> I'm using 5 shards and the sharded collection takes roughly 2,000 documents per shard. This is correct since in total I'm importing 10,000 documents. So according to me, the shard key is correctly chosen, or should I pick something else?
[09:41:54] <ravenx> as in your chosen key** is good
[09:42:46] <Ceunincksken> So, what could be the problem then of my performance drop on inserts?
[09:43:25] <ravenx> journalling, writeconcerns or index creation maybe
[09:43:29] <ravenx> normally i create the index after i insert.
[09:44:09] <Ceunincksken> Sorry but I don't quite understand. I have an empty collection in which I import critical data, so journalling must be set to true to ensure that the data is being imported.
[09:44:39] <Ceunincksken> And I can't create the index after I insert because sharding does need an index. Any suggestions?
[09:54:36] <Ceunincksken> It's really driving me crazy. I thought that insert speed should go up when using sharding on Mongo, but instead it goes down :-(
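For reference, the hashed shard key Ceunincksken describes would be declared roughly like this via the Java driver against a mongos (the database and collection names here are hypothetical, and sharding has to be enabled on the database first). This only shows the key declaration, not a fix for the performance problem discussed above.

    import com.mongodb.client.MongoDatabase;
    import org.bson.Document;

    MongoDatabase admin = mongoClient.getDatabase("admin");
    // Equivalent of sh.enableSharding("poc") and sh.shardCollection("poc.docs", { _id: "hashed" })
    admin.runCommand(new Document("enableSharding", "poc"));
    admin.runCommand(new Document("shardCollection", "poc.docs")
            .append("key", new Document("_id", "hashed")));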
[13:04:53] <Ceunincksken> I have set up 10 virtual servers on a single host. Each of these servers has 512 MB RAM, 1 CPU (1 core) and 4 GB of disk space (proof of concept).
[13:05:05] <kurushiyama> Tomasso: Sorry, I don't know poo about Ruby, except that I refused to learn it. Just plain deduction from my side ;)
[13:05:10] <Ceunincksken> I've created 3 configuration servers, 1 router, and 6 shards.
[13:05:23] <kurushiyama> Ceunincksken: So far, so good.
[13:05:38] <Ceunincksken> I have a collection which I'm sharding on a hashed value of the '_id' field.
[13:05:50] <kurushiyama> Ceunincksken: So far, so suboptimal.
[13:06:04] <Ceunincksken> The sharding key seems to be good. When I insert 60,000 documents, each shard roughly contains 10,000 documents.
[13:06:44] <Ceunincksken> And now the problem: if I remove, let's say, 4 shards (so working with 2 shards), I don't see a performance drop in Mongo. On the other hand, when I'm adding 5 more shards, bringing the total to 11, I don't see any performance gain.
[13:07:04] <Ceunincksken> In fact, using a single server (non-sharded) performs better than using a sharded environment (with 6 servers). How can that be the case?
[13:07:10] <kurushiyama> Ceunincksken: How should there be a gain?
[13:07:23] <kurushiyama> All shards share the same underlying storage
[13:08:10] <kurushiyama> Clearly making it the limiting factor of your installation.
[13:08:25] <Ceunincksken> But it's not a single disk.
[13:08:35] <kurushiyama> Ceunincksken: But the same controller.
[13:08:40] <Ceunincksken> The host has hundreds of TB available.
[13:08:46] <Ceunincksken> Probably the same controller yeah.
[13:08:54] <kurushiyama> Ceunincksken: we are talking about IOPS, not space
[13:09:14] <Ceunincksken> But is my assumption correct that I cannot get a performance gain in a virtual environment when the storage is on the same controller?
[13:09:19] <kurushiyama> And from what I hear, your cluster is not the only setup running on this host
[13:09:42] <Ceunincksken> That's correct. We do have a single host server on which various applications are running.
[13:09:48] <kurushiyama> Ceunincksken: You can not get a performance gain in any virtual environment unless you really know what you are doing.
[13:11:06] <Ceunincksken> One additional question to test if my setup is right. Would I see a performance gain when I'm installing Mongo on my local machine and adding 1 extra disk? 1 shard would then be on the same disk as my OS while the other shard would be on a separate disk.
[13:11:39] <kurushiyama> Ceunincksken: As a rule of thumb: have one mongod on one machine.
[13:12:12] <kurushiyama> Ceunincksken: RAM is an important factor. WT by default takes 50% of the available RAM just for caching.
[13:13:00] <Ceunincksken> How about installing 2 virtual servers on my local machine where each server gets its own physical hard disk? Would that give me a performance gain? (Again, this is just for testing purposes.)
[13:13:01] <kurushiyama> You don't want your instances battling for resources.
[13:13:24] <kurushiyama> Ceunincksken: In general, I suggest testing against a replset
[13:13:58] <Ceunincksken> Why test against a replica set to see if sharding gives a performance gain?
[13:14:37] <kurushiyama> Ceunincksken: That's a bit like testing whether MongoDB can store documents
[13:15:01] <kurushiyama> Ceunincksken: Yes, you do get a performance gain, if you actually provide more resources rather than sharing the existing ones.
[13:15:46] <Ceunincksken> Ok, but in a virtual environment, like our customer's, each mongod instance should have its own storage controller. Is that assumption correct?
[13:16:21] <kurushiyama> Ceunincksken: If IOPs are your limiting factor – yes.
[13:16:41] <Ceunincksken> Ok, but how can I find out if IOPS are the limiting factor?
[13:17:01] <kurushiyama> Or cloud manager, as it is called nowadays
[13:17:35] <kurushiyama> Ceunincksken: Or any other monitoring tool.
[13:18:01] <kurushiyama> Ceunincksken: you need to create a baseline, finding out the maximum IOPS your "disks" can sustain
[13:18:21] <Ceunincksken> Sorry but I'm quite new to this and I'm a bit confused.
[13:18:47] <kurushiyama> Ceunincksken: Now if your monitoring tool reports that you are constantly hitting this limit, you know your IOPs are the problem.
[13:19:21] <Ceunincksken> Would you mind helping me a bit with this?
[13:19:23] <kurushiyama> Ceunincksken: Ok, it is clear what to use for HW monitoring?
[13:22:00] <Ceunincksken> I know that this might not be the correct place, but it's quite important and I've been searching for 3 weeks to figure out why I'm seeing this behaviour.
[13:23:10] <kurushiyama> Ceunincksken: This is getting OT, let's take it to a private chat
[14:40:48] <Zelest> someone got bored of irccloud :o
[15:11:48] <kashike> is there a more in-depth upgrading guide for java driver 2 to 3? inserting/updating/querying is what's blocking me, specifically
[15:31:21] <Jester01> Hey folks, mongodb takes minutes to start up, burning cpu. Is this normal and can something be done about it? (v3.0.2, linux, mmapv1)
[15:34:51] <cheeser> kashike: ok, good. just making sure you'd seen that. what's missing for you?
[15:45:56] <kashike> cheeser: one moment, I'll re-open the project and branch and find the troubles I was having again
[15:50:39] <kashike> cheeser: hm, interesting - perhaps I was using an older 3.x release, but methods that were missing seem to be back
[15:50:51] <kashike> these were missing, along with others, before: http://api.mongodb.org/java/2.13/com/mongodb/DBCollection.html#save-com.mongodb.DBObject- http://api.mongodb.org/java/2.13/com/mongodb/DBCollection.html#update-com.mongodb.DBObject-com.mongodb.DBObject-
[15:57:25] <cheeser> kashike: it's possible we might've axed too much accidentally and had to course correct :)
[15:58:27] <kashike> looks like that might be the case - super glad to see these methods back, I ran into so many headaches with the bson Document stuff
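For anyone following kashike's upgrade path, the rough 3.x MongoCollection equivalents of the 2.x DBCollection save/update calls linked above look something like this. This is a sketch, not the official migration mapping; the filters and documents are placeholders.

    import com.mongodb.client.model.UpdateOptions;
    import org.bson.Document;
    import static com.mongodb.client.model.Filters.eq;

    // 2.x collection.save(obj): replace the document by _id, inserting it if missing.
    collection.replaceOne(eq("_id", doc.get("_id")), doc, new UpdateOptions().upsert(true));

    // 2.x collection.update(query, update): use updateOne/updateMany with update operators.
    collection.updateOne(eq("name", "example"),
            new Document("$set", new Document("status", "ok")));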
[16:06:50] <cheeser> oh! to track versions of each value in different languages?
[16:07:18] <GothAlice> Correct, basically to pivot into a per-language pool of sub-documents efficiently.
[16:08:26] <GothAlice> Most ODMs I've encountered handle translated strings _very_ badly. I.e. turning the field into a mapping of {lang: "String"} or into arrays of per-language sub-documents, which… eliminates any possibility of efficiently pulling out a document in a specific language.
[16:10:29] <GothAlice> StephenLynx: Don't worry, though, mine isn't meant to be a layer between your app and pymongo, it's meant to augment your use of pymongo. ;)
[16:11:04] <StephenLynx> meh, I don't bother too much with what happens outside my neighborhood.
[16:13:24] <StephenLynx> I don't particularly like the idea of query building.
[16:13:33] <StephenLynx> that is, generic query building.
[16:13:58] <StephenLynx> if it's a specific query for a specific operation, then you might as well just take in the values that replace certain parameters of the query
[16:14:02] <GothAlice> Once I teach the Op and Ops classes how to evaluate themselves, I'll have a complete in-Python version of the MongoDB query format, allowing things like validation before sending to MongoDB, execution of aggregate pipelines in-Python, etc.
[16:14:04] <StephenLynx> like I started doing with sql based dbs.
[16:14:21] <StephenLynx> I also started distancing myself from OOP
[16:14:27] <StephenLynx> especially for web back-ends.
[16:14:38] <GothAlice> OOP is a structural choice, not much more.
[16:15:09] <GothAlice> In my case I need Python-side validation of Mongo queries (validation schema) to validate MongoDB update operations themselves. It's weird and meta.
[16:15:37] <GothAlice> (I'm attempting to export .find() and .update() to the web in a safe, schema-defined way.)
[16:47:33] <exonintrendo> I'm fairly new to mongo, I've used it once or twice but not necessarily to create the 'structure' for a new application. I'm working on building a program that does basic monitoring on scripts that have run. I have the idea of a Group, Item (items belong to a group), and an Alert (alert is specific to each item, alert or email user if the item's value passes a threshold). I'm not sure if these should all be their own documents or if they should all
[16:47:33] <exonintrendo> just be nested within each other in one Group document.
[16:48:34] <exonintrendo> Hopefully that makes sense.
[16:49:41] <exonintrendo> I'll read, but my initial thought (which may also be wrong) was to have a Group contain an array of Item IDs, and Items contain an array of Alert IDs. Would that be correct?
[16:50:10] <StephenLynx> that. embedding is a valid tool, but abusing it is worse than dealing with multiple collections.
[16:50:39] <exonintrendo> And is it fair to think of a 'collection' as a 'table' in a SQL context?
[16:51:32] <exonintrendo> I'll read up on this post, thanks for the quick info!
[16:51:41] <StephenLynx> it is your main data aggregation layer, after the db, like in sql.
[16:52:30] <StephenLynx> but thats pretty much the only similarity, afaik.
[16:53:38] <StephenLynx> data is not divided into rows that each occupy the same space and follow the same schema, you don't reference documents with foreign keys, and they can store complex data structures
[16:54:27] <kurushiyama> exonintrendo: After that, we need a little more info. What is an Item? Which fields would define an Item? Are there more than a couple of dozen items per Group? etc.
[16:55:51] <exonintrendo> kurushiyama: as I said, it's a basic monitoring service I'm writing for myself. Let's say you have a group of 'Sites' and you want a script to ping each item in the group. Let's say I have example.com as an item, and it has an alert of 300 seconds (that is: if the item's time doesn't get updated in the database after 5 minutes, it will send an alert).
[16:56:51] <exonintrendo> So a group is mainly a title, description, and a list of items. An Item is a title plus a type (the monitored value, which could be an age or a numerical value), and it has a set of alerts that trigger on that value. So maybe in the case of the ping, I email myself after 5 minutes, but a second alert triggers after 10 minutes that sends me a text message.
[16:59:03] <kurushiyama> In that case, I'd probably use a Group collection, an item collection and a Ping collection. Then finding an alert is pretty easy using aggregations.
[16:59:32] <kurushiyama> exonintrendo: Or even simple queries.
[17:01:22] <kurushiyama> db.pings.find({group:"fooGroup", date:{$gte:nowMinusThreshold}}). If the result set is empty, you have an alert. Note that this is just one way of doing it.
[17:03:53] <kurushiyama> exonintrendo: As per the query: of course you can use item instead of group, or a combination, or run aggregations and save all found alerts to an "alerts" collection...
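A small Java sketch of the alert check kurushiyama outlines: look for pings from a group that are newer than the threshold and treat an empty result as an alert. The collection and field names are the hypothetical ones from the discussion, and `pings` is assumed to be a MongoCollection<Document>.

    import java.util.Date;
    import static com.mongodb.client.model.Filters.and;
    import static com.mongodb.client.model.Filters.eq;
    import static com.mongodb.client.model.Filters.gte;

    long thresholdMillis = 300_000L; // the 5-minute alert threshold
    Date nowMinusThreshold = new Date(System.currentTimeMillis() - thresholdMillis);
    long recentPings = pings.count(and(eq("group", "fooGroup"), gte("date", nowMinusThreshold)));
    if (recentPings == 0) {
        // No ping within the threshold: fire the alert (email, text message, ...)
    }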
[17:16:00] <kalx> Hi all. Is there any way to insert a record into mongo with a field containing the current db server date/time? I saw there is a $currentDate operator but it seems this is for updates only.
[17:16:19] <StephenLynx> new Date() or the equivalent in your language
[17:18:06] <kalx> So the date must be generated on the caller's side? No mechanism for Mongo generating it? (similar to mysql's NOW() as an example)
[17:18:28] <cheeser> dates are stored in UTC so i doubt you'll have a problem.
[17:24:09] <kalx> thx. It's more just to guarantee accuracy in case a server's time is not in sync with other servers (even seconds can matter).
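One workaround for kalx's clock-skew concern is to write the new record as an upserted update, so $currentDate is evaluated with the server's clock rather than the caller's. A hedged Java sketch; the field names and the `newId`/`payloadDoc` variables are made up.

    import com.mongodb.client.model.UpdateOptions;
    import static com.mongodb.client.model.Filters.eq;
    import static com.mongodb.client.model.Updates.combine;
    import static com.mongodb.client.model.Updates.currentDate;
    import static com.mongodb.client.model.Updates.setOnInsert;

    // "Insert" via upsert so the server, not the caller, stamps createdAt.
    collection.updateOne(
            eq("_id", newId),
            combine(currentDate("createdAt"), setOnInsert("payload", payloadDoc)),
            new UpdateOptions().upsert(true));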
[19:49:24] <matthias__> Hi, how can I efficiently get a set of about 25 random items from a collection with ~500,000 items? (with optional search parameters)
[19:53:51] <DrPeeper> you want to find 25 random documents from a collection?
[19:55:49] <StephenLynx> what is the use case for that?
[19:55:55] <DrPeeper> i'd like to index that, but instead should I have a second collection?
[19:56:31] <StephenLynx> I think your query is slow because skips are slow.
[19:56:34] <DrPeeper> StephenLynx: random sampling is generally used for research when the sample is representative of the population
[19:57:01] <DrPeeper> but in the case of a database, you can do quantitative research on the entire population, probably faster than taking the initial sample...
[19:57:11] <matthias__> i have a large database of image-urls and i want to pick 25 randomly
[20:02:11] <DrPeeper> yeah, could have just put a sequential key in 0..n
[20:02:20] <starfly> you could have one app thread compile a list of random IDs and stick them in a collection, then another thread (the one you want to perform) perform the finds with a covering index
[20:03:55] <DrPeeper> yeah, the compute would be (n log (n))(log (n))
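Since MongoDB 3.2 there is a $sample aggregation stage that covers matthias__'s case directly: an optional $match for the search parameters, then $sample to pull ~25 random matches server-side, with no client-side skipping. A sketch with the Java driver; the filter and field names are placeholders.

    import java.util.Arrays;
    import com.mongodb.client.model.Aggregates;
    import org.bson.Document;
    import static com.mongodb.client.model.Filters.eq;

    // Apply the optional search criteria first, then pick 25 random matching documents.
    for (Document doc : images.aggregate(Arrays.asList(
            Aggregates.match(eq("category", "landscape")),
            Aggregates.sample(25)))) {
        System.out.println(doc.getString("url"));
    }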
[20:11:40] <DrPeeper> so should I create that second collection, or should I index the element?
[20:12:15] <kurushiyama> I am not convinced skips are slow. I tried it on a 10M docs collection with the empty query, a skip of 5M and a limit of 1m. took just a few msecs
[20:13:07] <kurushiyama> DrPeeper: So it can be _one_ of 30 values, like an ENUM?
[20:14:17] <DrPeeper> kurushiyama: one of 30 three-char alpha .. e.g. RWT, CAE, BBQ
[20:14:29] <kurushiyama> DrPeeper: If it is a more or less fixed number of values, like an ENUM which rarely gets changed (if at all), I'd index said field.
[20:19:54] <kurushiyama> Well, the good thing about agencies and federal regulations is that even when they announce a change now, it is likely that we are going to be retired by the time they are implemented.
[20:22:33] <kurushiyama> DrPeeper: But to answer your question: Yes, use an enum (or the corresponding structure of the language you are using) for those values, simply write them to the collection, and have that field (compound-)indexed, if necessary.
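Indexing such a low-cardinality code field, as suggested, is a one-liner with the Java driver (the field name "code" is hypothetical):

    import com.mongodb.client.model.Indexes;

    // Single-field index on the three-char code; use Indexes.compoundIndex(...) if more fields are needed.
    collection.createIndex(Indexes.ascending("code"));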
[21:31:51] <micom> hello guys, it's really a beginner question. I use mongodb 3.2 and the default engine is WiredTiger. There is an in-memory engine, but it's in beta, and in my production code I cannot accept beta software. Is placing the database on a RAM disk a good idea with WiredTiger?
[21:32:07] <micom> i mean on a partition created in RAM
[21:33:46] <cheeser> the memory file store is not intended for production use at all
[21:34:17] <micom> yes, i mean to use WiredTiger but place the database files on a partition that is a Linux RAM partition
[22:12:54] <neybar> is there a way to tune mongo to have the id use a timestamp with microseconds? I'm doing api logging, and I'd rather not add an additional field since the id already has most of the info I want.
[22:34:43] <kurushiyama> I'll plan my world domination as a benevolent dictator based on that feature! Now I only have to find how to leverage it for taking over the world... ;)
[22:35:15] <cheeser> all you need is ... wait for it ... time!
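Regarding neybar's question: the timestamp embedded in an ObjectId only has one-second resolution, so microsecond precision would still need its own field. Reading the embedded time with the Java driver looks like this (the `doc` variable is illustrative):

    import java.util.Date;
    import org.bson.types.ObjectId;

    ObjectId id = doc.getObjectId("_id");
    Date createdAt = id.getDate();        // creation time, second precision only
    int epochSeconds = id.getTimestamp(); // the same value as raw Unix seconds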
[23:16:07] <silasx> So … I have one collection with an indexed (obviously) id field of 12 bytes. I have another one with an indexed binary field of 12 bytes. .skip() on the latter is much much slower despite it having the same .explain() strategy. What could I be doing wrong? There's no reason such iteration should be slower, right? Bytes are bytes. I've thought of just making the latter itself be an ObjectID field instead of binary, but was told that ObjectIds are thought
[23:16:08] <silasx> to have semantic information and thus are dangerous to use. So ...
[23:17:40] <StephenLynx> from what I know about skips
[23:27:56] <silasx> but that’s the thing, we had really slow skips for indexed attributes.
[23:28:23] <StephenLynx> were you sorting by the index?
[23:28:26] <silasx> Fields I mean. We thought it was because they were long strings (~80 char), but even when we turned them into 12-byte binary fields, they were just as slow
[23:29:03] <silasx> ObjectIds are 12 bytes too, but (as primary keys at least) miraculously fast
[23:33:55] <kurushiyama> StephenLynx: But the index is traversed, if present, which was basically my point. So it is not necessarily the documents which are traversed.
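A common way around slow large skips, whatever the key type, is range-based paging on the indexed field: remember the last value returned and continue with $gt instead of skipping from the start. A hedged Java sketch using _id; the variable names are illustrative.

    import com.mongodb.client.FindIterable;
    import org.bson.Document;
    import org.bson.types.ObjectId;
    import static com.mongodb.client.model.Filters.gt;
    import static com.mongodb.client.model.Sorts.ascending;

    ObjectId lastSeenId = previousPageLastId; // null on the first page
    FindIterable<Document> page = (lastSeenId == null)
            ? coll.find().sort(ascending("_id")).limit(25)
            : coll.find(gt("_id", lastSeenId)).sort(ascending("_id")).limit(25);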