PMXBOT Log file Viewer

#mongodb logs for Thursday the 27th of November, 2014

[00:31:46] <diamonds> hey guys I'm getting mongo set up again on my ubuntu machine, first time in a year or so
[00:31:51] <diamonds> Just wanna say http://docs.mongodb.org/manual/tutorial/install-mongodb-on-ubuntu/
[00:32:00] <diamonds> could someone PLEASE make this more complicated?
[00:32:13] <diamonds> there's only like 15 different commands to get mongo installed
[00:32:23] <diamonds> I bet we could get that up to 25 without too much trouble
[00:33:43] <GothAlice> diamonds: Huh, I always thought my one command was too little. I must needs model my Gentoo installation policy on this Ubuntu one; like those airline safety instructions that go longer in French than in English, clearly I must be missing out on something. :)
[00:34:26] <diamonds> sorry, turns out they just didn't put a short version in there
[00:34:39] <diamonds> GothAlice, what's the 1 line on gentoo btw?
[00:34:52] <joannac> diamonds: ? I see 5
[00:35:19] <diamonds> you begrudge me a little hyperbole?
[00:35:27] <diamonds> no I think the issue was my not reading closely enough
[00:35:33] <GothAlice> diamonds: emerge mongodb
[00:35:47] <GothAlice> (Press enter twice, wait a few minutes, bam, you have MongoDB.)
[00:35:48] <diamonds> GothAlice, well I can do that too if I don't need the newer stuff
[00:36:01] <GothAlice> diamonds: Need newer? emerge mongodb --autounmask-write :)
[00:36:01] <diamonds> I assumed it was "here's the simple install steps" but there's more there (installing a specific release, pinning the version, etc.)
[00:36:30] <GothAlice> diamonds: Can even check out the version from git: emerge =mongodb-9999 --autounmask-write
[00:37:17] <diamonds> ty everyone!
[00:37:22] <diamonds> GothAlice, that sounds sublime
[00:37:25] <diamonds> I'll have to look at gentoo
[00:37:36] <diamonds> but "autounmask-write" is a bit clunky....
[00:38:16] <GothAlice> diamonds: Packages marked as not being stable (i.e. "needs testing for production use") are "masked". --autounmask will tell you what needs to be unmasked, --autounmask-write will actually commit the needed changes for you.
[00:38:29] <GothAlice> diamonds: If you ever have any Gentoo questions, feel free to PM me any time. :)
[00:56:34] <Derek57> Is it wrong of me to want to force .hint() for a production query? The query optimizer would rather pick _id than a compound index with a 2dsphere and an array, and it might just be me, but I found that a bit odd.
[00:57:59] <GothAlice> Derek57: Does that index actually apply to the query?
[00:58:05] <GothAlice> (What's the .explain?)
[00:58:28] <Boomtime> it can be necessary to hint in circumstances where the empirical tests made by the planner could not reasonably resolve a conflict reliably
[00:58:34] <Boomtime> however, you should always investigate why
[00:58:56] <Boomtime> it is more likely there is something wrong with the query, or the index is inappropriate or unusable
[00:59:36] <Derek57> Yeah, the first two parameters of the compound are the first two of the query. However, there is more to the query, which I'm guessing might be it. Let me get the .explain, just waiting on a buddy to finish something up that he's trying.
[01:02:53] <Derek57> What just got me is that it chose _id over the compound, even though the compound was in allPlans. Either way, getting that .explain(). :)
[01:14:33] <Derek57> GothAlice: My mistake, there's 4 entries in my compound index, which are also the first 4 in the query. Then the query has four other keys it searches by, which I was under the impression would be fine, as I haven't decided if I want to include them (as this query doesn't happen as often as just the first four).
[01:14:33] <Derek57> Mind my extra indexes in allPlans, just doing some testing here and there. My compound index is called "testSearch".
[01:14:34] <Derek57> http://vpaste.net/wtuVI
[01:14:42] <Derek57> Boomtime: ^ :)
[01:16:50] <Boomtime> ah, geo.. can you provide the original query that produced this?
[01:18:27] <Derek57> I believe I can export it, two seconds.
[01:19:38] <GothAlice> Boomtime: I think I'm missing something; the indexBounds on _id of 1 and 1… what does that represent?
[01:23:17] <Boomtime> it is most likely a sign that the index was not bounded at all - it did not apply bounds, it was a complete scan
[01:24:00] <Boomtime> i have no idea why it would do that for _id though, this whole explain output makes no sense to me yet - i am hoping the query sheds some light
[01:25:46] <Derek57> Sorry for the delay
[01:25:48] <Derek57> http://vpaste.net/5tecP
[01:27:17] <Boomtime> uh-huh...
[01:28:06] <Boomtime> your query specifies parameters that include almost every document no matter what index it uses
[01:28:55] <Boomtime> at a mongo shell, please switch to this database and type: db.COLLECTION.getIndexes()
[01:29:14] <Boomtime> where COLLECTION is the collection name you are using
[01:32:15] <Boomtime> also, that is not your whole query
[01:32:32] <Boomtime> there is a sort (orderby) parameter that you did not include
[01:33:12] <Boomtime> it is ordered by _id - that is why that plan wins
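(For context, a sketch of the two competing plans in the 2.x shell; "places" stands in for Derek57's actual collection, "testSearch" is his compound index from above, and `query` is assumed to hold the filter document:)

    // The {_id: 1} sort makes the _id index provide the ordering for free,
    // so the planner prefers it even when another index fits the filter:
    db.places.find(query).sort({_id: 1}).explain();
    // Forcing the compound index means mongod must sort in memory
    // (scanAndOrder) afterwards -- measure which is actually faster:
    db.places.find(query).sort({_id: 1}).hint("testSearch").explain();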
[01:34:06] <Derek57> I'm going to reinspect all my code, you're most likely right that I've done something wrong here. I'll come back if there's no solution that I can find. Sorry for bugging ya. :)
[01:34:30] <Boomtime> something is wrong here though, the plans return different result sets.. this is so very strange
[02:05:13] <Derek57> Boomtime: So we tried solving the issue, didn't seem to work. We created a link to check out. It has two keys in the results that are of use to this question, explainResult and query. Both are just exports from our PHP script. Now, I'm considering something, we still have a single index for our geo index, but with the query that's being executed I was under the impression the compound would win. Either way, thanks for checking this out. :) http://tinyurl.com/phobesf
[02:06:05] <Derek57> Instead of _id winning in this example, it's the single index. No idea what was fixed to get rid of _id (I'm not directly the one coding this.)
[02:06:54] <Boomtime> at a mongo shell, please switch to this database and type: db.COLLECTION.getIndexes()
[02:07:07] <Derek57> Two seconds.
[02:08:04] <Derek57> Boomtime: http://vpaste.net/qNLDC
[02:11:22] <Derek57> I do again apologize for the mess, we're in the middle of a massive rewrite. Our previous dev wasn't very... fantastic.
[02:11:32] <Boomtime> yeah.. nothing matches up
[02:11:46] <Derek57> Doesn't match up?
[02:13:14] <Boomtime> the only explain output that avoids a scanAndOrder is the _id.. but I have yet to see an orderBy directive
[02:13:57] <GothAlice> Derek57: Which MongoDB client driver are you using, and are you using an abstraction layer on top of that? I recall recently having an issue with orderBy({_id: 1}) being added silently by an ODM last week.
[02:14:52] <Boomtime> however, the explain output is internally contradictory - i do not understand it - possibly/probably just a formatting thing
[02:15:50] <Derek57> 1.5.8 PHP Mongo driver. And if it was sorting by _id, wouldn't it have won, being the main index?
[02:16:28] <Derek57> And is there something I could possibly do to improve the explain? We're literally just JSON encoding it and then outputting it.
[02:20:48] <Derek57> Should I just screw it and use .hint? I'm getting to that point, at least until we do a reinstall and can start from scratch.
[02:22:29] <Boomtime> is it faster?
[02:22:42] <Boomtime> i mean, if you run the query, does it finish faster?
[02:27:11] <Derek57> Boomtime: Literally half the time.
[02:27:26] <Derek57> Much much better.
[02:28:04] <GothAlice> Derek57: You have measured, and found a valid optimization. Hinting in this case is, seemingly, a great idea.
[02:28:39] <GothAlice> Derek57: I'd re-evaluate after all those work-in-progress indexes get cleaned up, though.
[02:29:10] <Derek57> GothAlice: I completely agree, we'll definitely need to retest all this down the road.
[02:29:16] <GothAlice> Derek57: (Ye gads, you let someone create all those indexes in a production environment?)
[02:29:36] <Derek57> GothAlice: It's a very, very long story..
[02:29:54] <GothAlice> Hope they were at least initiated with background processing enabled…
[02:30:25] <Derek57> GothAlice: If I remember correctly, they were made before any data was inserted.. We all have our "noobish" times I guess.
[02:30:48] <GothAlice> *phew* Okay, so it isn't an egregious error. ;)
[02:31:37] <GothAlice> (Blocking all access in production during an unthinking index build is something one only does once. ;^)
[02:34:32] <Derek57> GothAlice: You got that right, a mistake I made a very long time ago and will never make again. XD
[02:37:27] <Derek57> I got a buddy asking this question as an example. Should we include _id in the compound if we choose to sort by it? And should it be first?
[02:38:56] <GothAlice> _id is usually highly discriminatory (i.e. if you're using it, you're usually searching for one or a small set of IDs.), however in your case I'd omit order. The natural order should be roughly insertion order, no, Boomtime?
[02:39:54] <Derek57> It should be insertion order I believe. Newest to oldest if I'm not mistaken.
[02:40:12] <GothAlice> … well, no, natural would be the reverse of that, oldest to latest.
[02:40:35] <Derek57> Ah my mistake, it's a bit late here.
[02:41:17] <GothAlice> If you reeeeally want to sort on _id, you could include it at the end of the compound set; just make sure to actually use all of the previous fields in the compound to make use of it. (And make sure you have the direction set correctly for your sort, or MongoDB won't use it for sorting.)
[02:41:49] <Derek57> And if by something like an int, then first, right?
[02:42:03] <GothAlice> I don't understand that question.
[02:42:56] <Derek57> If sorting by an integer instead of _id, you'd want that at the beginning of the compound, correct? (Thinking I'm going to have to close up shop, I'm having a hard time phrasing my questions. xD)
[02:44:02] <GothAlice> (XP Nearly 2200 here, and I was up until 0400 this morning…) Sorting would be the last operation performed… I would think placing it as late as possible in the compound would be advantageous, unless *also* filtering on that field. Won't matter if it's _id or some arbitrary other field.
[02:48:15] <GothAlice> Derek57: What really will matter, for use in sorting, is the direction of the query's sort vs. the direction of the index. A compound on {created: 1, username: -1} when querying {created: {$gt: today-week}} and sorting by {username: 1} won't use the index to optimize the sort.
[02:50:54] <GothAlice> (Totally fake example; not sure why one might end up with a descending sort on username ever, but…)
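(A sketch of that fake example in the shell; the collection name and date arithmetic are illustrative:)

    var weekAgo = new Date(Date.now() - 7 * 24 * 3600 * 1000);
    db.things.ensureIndex({created: 1, username: -1});
    // Sorting along the index's own direction lets MongoDB walk the index:
    db.things.find({created: {$gt: weekAgo}}).sort({created: 1});
    // A sort whose direction disagrees with the index -- the {username: 1}
    // case above -- forces an in-memory sort instead:
    db.things.find({created: {$gt: weekAgo}}).sort({username: 1});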
[02:51:03] <Derek57> GothAlice: Well bloody hell, 4am? Must have been one busy night. Alright, I'm going to attempt to put this full compound index together. Just to clarify so I don't make a mistake: in order to use a compound index, do we have to use every key within the index for our query? Or can some keys be omitted from the query?
[02:51:12] <Derek57> GothAlice: Haha that's perfectly fine, I understand. ^_^
[02:51:56] <GothAlice> Use of index subsets works on prefixes only. [foo, bar, baz] will fulfill queries against [foo, bar, baz], [foo, bar], and [foo].
[02:52:11] <GothAlice> I.e. if you want to skip "bar" in that, you lose the optimization of "baz" as well.
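(The prefix rule, sketched with throwaway names:)

    db.stuff.ensureIndex({foo: 1, bar: 1, baz: 1});
    db.stuff.find({foo: 1});                  // prefix [foo]: index used
    db.stuff.find({foo: 1, bar: 2});          // prefix [foo, bar]: index used
    db.stuff.find({foo: 1, bar: 2, baz: 3});  // full compound: index used
    db.stuff.find({foo: 1, baz: 3});          // only [foo] helps; baz is
                                              // matched by scanning documents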
[02:53:25] <GothAlice> It's a narrow line to walk between creating a distinct index for every query (maximum query optimization, maximum disk usage overhead on index storage) and trying to construct "one index to rule them all" as a massive compound. (minimum optimization, minimum overhead) ;)
[02:55:59] <GothAlice> Derek57: I average three indexes, all compound, across my collections at work.
[02:56:14] <Derek57> GothAlice: True, we just have a lot of query possibilities so I'll admit we are trying to come up with that one or two indexes that "rule them all". xD We were kinda thinking of querying all keys all the time, and if they technically don't need to be used we'd just query a default parameter that would always return documents (i.e. {$gte: 0}). Is that crazy to think of doing?
[02:56:54] <Derek57> GothAlice: How big? We have about 30 keys, 5-15 of which we wanna filter through using indexes, through a collection of 2.1mil documents. :P
[02:57:06] <GothAlice> Crazy? No. There are valid use cases for even the most lunatic of ideas. It does fail the obviousness test, though.
[02:57:33] <GothAlice> (And that type of optimization would require some heavy benchmarking to ensure it doesn't cripple your queries.)
[02:58:59] <GothAlice> Derek57: We only have a few hundred thousand records at work so far. (At home my dataset is a very different story. Billions of records totalling 25 TiB of data and more indexes than you can shake a stick at, since it's a metadata filesystem implementation. Most of those are highly sparse indexes, though.)
[03:02:36] <GothAlice> Same story at home: there's only a few primary (non-sparse non-metadata) indexes to handle hierarchy queries (compound), tag searches (multi-key), and full-text (which I rolled my own implementation of; this dataset dates back to MongoDB 1.4, and relational before migrating to Mongo.)
[03:05:35] <GothAlice> Interesting note: my filesystem driver implementation always hints metadata searches. My philosophy there was that since I know which metadata is being queried, why have MongoDB waste its time trying to figure out the optimal index to use each time?
[03:29:41] <Derek57> GothAlice: Alright, I seem to be on a good roll. ETA to build looks like... forever, but I guess I kinda expected that, some of our arrays are kinda big. I'll see how it looks in the morning. Again, can't thank you and Boomtime enough, really getting me going on making this thing work.
[03:47:50] <tim_t> Can someone give me some morphia help?
[03:48:52] <joannac> what kind of morphia help?
[03:49:00] <tim_t> http://pastie.org/private/kgkhypsau1xvxyhboz6ixg
[03:49:41] <tim_t> I need help trying to do as described in the pastie paste
[03:50:10] <tim_t> or if it is even possible to do that?
[03:51:08] <joannac> what does a document in that collection look like?
[03:54:05] <tim_t> I'll save one and paste it. one moment please
[04:15:19] <tim_t> @joannac : http://pastie.org/private/nzolpxpaq0gt5mwbvcqydw
[04:16:46] <joannac> oh i see
[04:17:30] <joannac> you can do it as one query (to get the list of reviewers match the account you want)
[04:17:54] <joannac> and then a query on users with $in : [all the user ids]
[04:17:57] <tim_t> yeah so if i have a bunch of these i want to be able to ask morphia "give me a list of users that have been assigned to this account" and have it do this without me having to iterate and copy
[04:18:19] <tim_t> okay
[04:18:32] <tim_t> i can query a query result?
[04:18:39] <tim_t> i did not know that
[04:22:29] <tim_t> is there an example you can point me to? All I am seeing in the documentation is doing one query but not another using $in
[04:41:19] <joannac> no, you can't query a query result
[04:41:57] <tim_t> right i see that
[04:42:03] <joannac> tim_t: the sample document you gave me, what collection is it from?
[04:42:52] <joannac> reviews?
[04:42:53] <tim_t> it's an entry in a collection called Reviewers
[04:43:13] <joannac> okay
[04:43:51] <tim_t> the closest thing i have found is this : http://stackoverflow.com/questions/7727539/multiple-search-in-mongodb-with-morphia
[04:43:57] <joannac> so you can do a search on that collection like db.reviewers.find({"account.$id" : whateverobjectid})
[04:44:30] <joannac> and then from those results, extract the user.$id, put them in an array, and then db.users.find({_id: {$in: yourarrayhere}})
[04:44:35] <joannac> that's 2 queries
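(joannac's two queries, spelled out end to end in the shell; a sketch assuming `accountId` holds the ObjectId in question and the DBRef layout from tim_t's pastie:)

    var userIds = [];
    db.reviewers.find({"account.$id": accountId}).forEach(function (r) {
        userIds.push(r.user.$id);  // collect the referenced user ids
    });
    db.users.find({_id: {$in: userIds}});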
[04:44:54] <tim_t> okay. makes sense. i'll try that
[04:51:43] <tim_t> Arg. well if I have to extract and put them in an array i might as well just stick with what i did before and save an extra step
[08:19:16] <crocket> Is it possible to insert a model in schema static method?
[13:01:28] <guest9999> hello. how would i search for a document match if i only have 'foobar' with this structure? http://pastebin.com/5a1PHbkC
[13:49:00] <scruz> hello
[13:51:34] <scruz> aggregation problem: i have a field, choice, that can take any value in the range 1-5. suppose i want to use this knowledge in a $group so that i'll get the count of records for each option (that is, 1, 2, 3, 4, 5). is there a way to do this?
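(scruz's question went unanswered in-channel; a minimal sketch of one common approach, assuming a collection named "responses":)

    db.responses.aggregate([
        {$group: {_id: "$choice", count: {$sum: 1}}},  // one bucket per option
        {$sort: {_id: 1}}
    ]);
    // Options that never occur simply won't appear in the output.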
[14:55:15] <davonrails> Hi everybody! Does anyone know how I can order search results made with a regex in mongo? I'd like to have the best match first… (Doing an app searching through a list of city names, and I'd like, if I search for « Los », to have « Los Angeles » first, before « East Los Angeles ».) Any idea?
[15:20:39] <mesler> Greetings all. I'm learning mongodb & mongoose, and have run into a strange issue between one query and another. If I do a find() on a collection I have, I get a result that contains all the properties I expect to see, including a "name" field. When I query one of these objects by _id, however, I do not see the individual properties of the result. Is there something obvious I'm doing wrong, or has anyone else had this
[15:20:39] <mesler> issue? I must be doing it wrong. :/
[15:21:50] <mesler> So in the query where I find all objects, each object has a name property, but when I do a find by _id, the single object returned has no name property.
[15:55:28] <mesler> Meh, user error. Please disregard.
[16:05:18] <nothingPT> hello
[16:48:19] <dizlv> Sup, I have some logs in mongo that look like {favorited_by: 'username', keywords: ['keyword1', 'keyword2']}. I want to aggregate them in such a way that I can see how many times a keyword was liked by a particular user
[16:48:53] <dizlv> basically, smth like this: {favorited_by: 'username', keywords: [{word: 'keyword', count: 23}]}
[16:49:16] <dizlv> any ideas what will query look like?
[16:49:20] <dizlv> I came up with this:
[16:49:30] <dizlv> db.a.aggregate([
[16:49:32] <dizlv> {$unwind: "$keywords"},
[16:49:35] <dizlv> {$group : {_id : {word : "$keywords", user : "$favorited_by"}, count : {$sum : 1}}}
[16:49:36] <dizlv> ])
[16:49:49] <dizlv> but it does not group by user, obviously
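(The pipeline above does group per user-and-keyword pair; what's missing is a second $group to re-nest those pairs under each user. A sketch building on dizlv's own pipeline:)

    db.a.aggregate([
        {$unwind: "$keywords"},
        {$group: {_id: {word: "$keywords", user: "$favorited_by"},
                  count: {$sum: 1}}},
        // re-nest the per-keyword counts under each user:
        {$group: {_id: "$_id.user",
                  keywords: {$push: {word: "$_id.word", count: "$count"}}}}
    ]);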
[18:28:23] <freeone3000> What's the best way to reduce page faults?
[18:28:59] <GothAlice> freeone3000: Make sure you always have enough RAM to memory map the entirety of the on-disk stripes.
[18:29:21] <GothAlice> If this can hold true, you should only page fault new pages in once.
[18:29:49] <GothAlice> (Page faults are part of the standard operation of memory mapped files, FYI, and unless extreme are generally not concerning.)
[18:30:32] <freeone3000> I'm hitting 600 when running an analytics set. Average is around 4.
[18:30:47] <freeone3000> Err, that's in page faults per second.
[18:32:02] <freeone3000> It's 35.48GB of data. Does this mean that 15GB of RAM is insufficient, and I should upgrade to... what? 30? 60GB?
[18:32:03] <GothAlice> :/ Hmm, that usually means it's having to fire up the disk to load more data… a lot… for that query. If you're using the $out stage or the "handle large aggregates on-disk" option (allowDiskUse), it'd need to page fault a lot in order to do its work.
[18:32:44] <GothAlice> freeone3000: The approach I take once the cost of scaling vertically (throwing more RAM at it) becomes prohibitive is spin up more shards. A sharded setup will split your data roughly in half between two hosts, in thirds between three, etc.
[18:32:57] <GothAlice> That'll let you use more, but smaller servers, and distribute the load around. :)
[18:33:10] <GothAlice> See; http://docs.mongodb.org/manual/sharding/
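(The shell side of that is compact; a sketch with an illustrative database, collection, and shard key:)

    sh.enableSharding("analytics");
    // the shard key choice governs how evenly data and load distribute:
    sh.shardCollection("analytics.events", {userId: 1, ts: 1});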
[18:33:34] <freeone3000> Yeah, I'm familiar. We opted not to do this because it's a replica set, and sharding added an unimaginable degree of complexity.
[18:34:17] <freeone3000> Went from 8 servers to over 20.
[18:34:19] <GothAlice> freeone3000: https://gist.github.com/amcgregor/c33da0d76350f7018875#file-cluster-sh-L139-L142 — only in theory.
[18:34:31] <GothAlice> You have 20 replica members?
[18:34:48] <GothAlice> Wait, no, misinterpreting that. How many DB hosts do you currently have in your RS?
[18:35:05] <freeone3000> GothAlice: We have 8 replica members. Sharding the replica-set would double each, plus introduce a mongos to each region, for 16 + 4 + 3 config.
[18:35:37] <GothAlice> 8 members in the same datacenter? :/
[18:35:48] <freeone3000> GothAlice: Two us-east, two us-west, two ap-southeast, two eu-west.
[18:35:54] <GothAlice> Ah, *phew*.
[18:35:55] <GothAlice> :D
[18:36:09] <GothAlice> Are all of them actively serving slave-OK queries?
[18:36:25] <freeone3000> GothAlice: Yeah.
[18:36:39] <freeone3000> We did it primarily to reduce latency. Failover was secondary.
[18:36:46] <rendar> freeone3000: AWS?
[18:36:49] <freeone3000> rendar: Yep.
[18:37:02] <freeone3000> Each of those regions also hosts a server cluster. It's a replica-set so that user data is portable.
[18:37:02] <rendar> why do everyone use AWS?
[18:37:09] <GothAlice> rendar: I don't. They ate my servers and data.
[18:37:14] <freeone3000> rendar: cause it's really hard to buy singaporean servers otherwise.
[18:37:20] <rendar> GothAlice: lol!
[18:37:24] <GothAlice> rendar: Their SLA specifies "use multiple zones, that'll prevent cascading failures" — they lied.
[18:37:33] <freeone3000> Ah, yes, cloudfront. -.-
[18:37:34] <rendar> eheh
[18:37:57] <rendar> freeone3000: so you're saying that there aren't any other options?
[18:38:23] <freeone3000> rendar: There are no other *good* options. Azure doesn't offer a whole bunch of automated failover and automatic load balancing/scaling.
[18:38:29] <GothAlice> And for two weeks EBS volume state was "stuck" throughout US-East across all sub-zones. Servers don't like it when block volumes become unresponsive…
[18:38:50] <rendar> freeone3000: i agree on that, personally i hate OpenStack, it's not even near what AWS does
[18:39:14] <rendar> btw some things in OpenStack are very good
[18:39:15] <freeone3000> rendar: Co-hosting has an incredibly large start-up cost for negligible pay-off - we don't have the ability to actually fly someone to Oregon, Singapore, and Ireland (or really, anywhere on those continents) to manage a server for us.
[18:39:19] <GothAlice> freeone3000: We rolled our own automation, it speaks OpenStack and AWS.
[18:39:37] <freeone3000> So, we use AWS.
[18:39:40] <rendar> freeone3000: i agree
[18:40:02] <freeone3000> (Except in frickin' mainland china because the gosh-dang chinese government... but that's neither here nor there.)
[18:42:27] <freeone3000> Anyway, the solution to this problem is... throw more ram at the server?
[18:42:31] <GothAlice> freeone3000: Storage costs in the "cloud", though, are insane. For my home dataset: http://cl.ly/image/0t1x2Q2L1u0E — it costs less (after initial costs are paid off in 4 months) to self-host bulk data. So much less that I could replace every drive in the array every month and still be cheaper than most of them. (The "cloud hosted" MongoDB provider would cost me half a million a month!)
[18:42:38] <GothAlice> freeone3000: If you can't shard, yes. RAM is needed there.
[18:45:26] <GothAlice> (Without affecting the others.)
[19:01:45] <Derek57> Hi all, weird one. For those who were around yesterday, I'm in the middle of doing a mass cleanup of my Mongo instance. After deleting 3/4 of the indexes I previously had and doing some massive system cleanup, I'm attempting to create a compound index in the background that has about 11 keys in it, through a database of 2.2 million documents.
[19:01:46] <Derek57> However, when I start the background index creation my instance goes to the gutter. One of my CPU cores spikes to 100%, queries no longer respond, my server refuses to shut down properly, and after about 5 minutes the mongo shell will stop working. On top of it all, the creation of the background index is super slow, only doing about 1000 documents in 5 minutes (that was before I gave up and took the server down forcibly).
[19:01:47] <Derek57> I've confirmed with currentOp() that I'm for sure creating a background index and not one in the foreground (I copied the operation export); there's no writing being done to the server and hardly any reads. Comparing to the single index sizes I had previously, I'm guessing the index will be about ~4-5GB big. I have 40GB of RAM, 48GB of swap, 4 virtual cores, and 20GB of SSD to play around with, so it's not like the system is capping either.
[19:01:47] <Derek57> Has anyone heard of something like this before? Or know of any documentation about it? Been Googling around all morning, but nothing close to what I'm experiencing.
[19:05:25] <GothAlice> Derek57: Yes, sorta.
[19:05:52] <GothAlice> Derek57: http://docs.mongodb.org/manual/tutorial/build-indexes-in-the-background/#considerations
[19:06:08] <GothAlice> And: http://docs.mongodb.org/manual/core/index-creation/#background-construction
[19:06:30] <GothAlice> "However, the mongo shell session or connection where you are creating the index will block until the index build is complete."
[19:06:31] <cheeser> an 11-key index seems ill-advised.
[19:07:16] <GothAlice> Derek57: "If MongoDB is building an index in the background, you cannot perform other administrative operations involving that collection…" Also notably: http://docs.mongodb.org/manual/core/index-creation/#performance
[19:08:23] <GothAlice> There is also an interesting approach using offline secondaries to build the indexes in foreground mode to speed it up.
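(For reference, a background build is just an option on the index call; a sketch with illustrative field names -- and per the docs above, the shell issuing it still blocks:)

    db.records.ensureIndex({a: 1, b: 1, loc: "2dsphere"}, {background: true});
    db.currentOp();  // on another connection, shows the build's progress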
[19:08:27] <cloudbender> noob question : collection.insertOne is undefined
[19:08:34] <GothAlice> cloudbender: collection.insert
[19:08:45] <cloudbender> docs say insert is deprecated
[19:08:51] <GothAlice> cloudbender: Which driver?
[19:08:55] <cloudbender> node
[19:09:04] <GothAlice> -_-;
[19:09:18] <cloudbender> ^^ does not ease my pain
[19:09:34] <Derek57> Cloudbender: Maybe give this a try, it's a wrapper around the Mongodriver that makes the syntax a little more familiar. https://github.com/mafintosh/mongojs
[19:09:49] <GothAlice> cloudbender: https://github.com/mongodb/node-mongodb-native#inserting-a-document — the examples use .insert…
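(For what it's worth, a minimal sketch with the 1.x node driver of that era; insertOne arrived with the newer CRUD API in the 2.x driver, so docs for one version against an install of the other may be the mismatch here:)

    var MongoClient = require('mongodb').MongoClient;

    MongoClient.connect('mongodb://localhost:27017/test', function (err, db) {
        if (err) throw err;
        db.collection('documents').insert({hello: 'world'}, function (err, result) {
            if (err) throw err;
            console.log('inserted:', result);
            db.close();
        });
    });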
[19:09:58] <Derek57> GothAlice: Thanks for the links, I gave those a read this morning, maybe didn't read deep enough in to it, let me take a look. :)
[19:10:43] <cloudbender> GothAlice thanx
[19:11:15] <cloudbender> insert is deprecated, and insertOne is undefined. very frustrating.
[19:15:22] <cloudbender> I think maybe the docs are a bit poochy, is all.
[19:20:24] <Derek57> GothAlice: So if I read this all correctly, even with background creation being used, important operations on the collection will still be blocked periodically as the index is built in increments?
[19:47:38] <GothAlice> Derek57: Aye. And it'll do its darndest to ensure any such blocks have minimal impact… which means it'll only run a chunk of index building when basically nothing else is happening. Making it *extremely* slow on busy servers.
[19:47:55] <GothAlice> Like, paint will dry faster.
[19:48:56] <Derek57> GothAlice: Hhahaha right right, that explains quite a bit. Seems like a new plan is in order, that secondaries bit got me thinking on a few things. :) Thanks again!
[19:49:24] <GothAlice> Derek57: Offline indexing via a secondary is a pretty smooth approach, if you can afford to bring a secondary offline to do it, of course.
[19:49:34] <GothAlice> And it never hurts to help. :)
[20:37:03] <arsaw> hello, i am new on mongodb, i think i start to understand it, but i would like your opinion about how to manage a specific issue.
[20:37:03] <arsaw> i have a Collection of users (with many different pieces of information changing very frequently)
[20:37:03] <arsaw> i have also a Collection of groups (with many other pieces of information changing very frequently)
[20:37:03] <arsaw> Users can be linked to a group.
[20:37:53] <arsaw> If i want to select all my groups and know the number of users linked to each, what is the best way to get this result, please?
[20:39:23] <arsaw> i really don't want to remove my Users collection and put it inside my groups, because i make too many requests on it.
[20:40:53] <arsaw> I was thinking about a solution like this :
[20:40:57] <arsaw> db.system.js.save(
[20:40:57] <arsaw> {
[20:40:57] <arsaw> _id: "getGroupsWithTotalUsers",
[20:40:57] <arsaw> value: function toto() {
[20:40:57] <arsaw> var companies = db.companies.find().toArray();
[20:40:58] <arsaw> for (var i = 0; i < companies.length; i++) {
[20:41:00] <arsaw> companies[i].totalUsers = db.users.count({company: companies[i]._id.valueOf()});
[20:41:02] <arsaw> }
[20:41:04] <arsaw> return companies;
[20:41:06] <arsaw> }
[20:41:09] <arsaw> }
[20:41:10] <arsaw> );
[20:42:26] <arsaw> if somebody can give me their opinion about it, they are welcome ;)
[20:46:09] <GothAlice> arsaw: First up, please don't paste chunks of code into IRC, it's generally considered terrible form. (Use a service like gist.github.com or pastie.org.)
[20:46:35] <GothAlice> (Mostly because people's anti-spam systems will cut you off and not everything will get seen, but it also bumps prior discussion up off the screen.)
[20:46:44] <arsaw> mwarf, sorry, first time on IRC :/
[20:46:59] <GothAlice> No worries, just a note for the future. :)
[20:48:01] <GothAlice> arsaw: Secondly, "stored procedures" aren't very MongoDB-ish. The majority of queries (excluding ones with $where, map/reduce, etc.) never need to execute JS to perform their tasks, which speeds things up quite a bit.
[20:49:07] <GothAlice> In terms of handling groups in my own code, I usually have a "short name" (slug, string identifier, or tag) associated with each. I.e. "staff.admin", "staff.user", "company.principal", etc. Those tags are recorded in a list on each User document. I.e. {_id: 'GothAlice', group: ['staff.admin']}
[20:49:10] <GothAlice> This makes querying very easy.
[20:50:25] <GothAlice> Find the number of users in any given group: db.users.find({group: 'staff.user'}).count()
[20:50:44] <arsaw> at least, my main objective is to display a table with all the group properties (score, lastupdate, performance, ...), many fields related only to the group, plus the total of users linked to it
[20:51:58] <arsaw> hmm ok, for this db.users.find({group: 'staff.user'}) i think i get it, but my main problem is how to link it to all groups to be able to display results in 1 table, do you see what i mean?
[20:52:23] <GothAlice> Find the total number of users in all groups, excluding groups with no members: db.users.aggregate([{$unwind: '$group'}, {$group: {_id: '$group', count: {$sum: 1}}}])
[20:52:48] <GothAlice> ($group in quotes is a field reference; apologies for how confusing that example was. ;)
[20:53:35] <GothAlice> So in your case I'd run that aggregate to get the total user counts for each group, then when looping over the result of db.groups.find() to generate the table, look up the pre-calculated count when needed.
[20:54:20] <GothAlice> (Note that I place the list of group tags in the user documents, instead of a list of user IDs in the group documents, because users will have fewer groups than groups will have users, if you follow.)
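(Putting those pieces together for arsaw's table, a sketch in the 2.6 shell; the `slug` and `score` fields on the group documents are assumptions:)

    // one aggregate gives every non-empty group's member count:
    var counts = {};
    db.users.aggregate([
        {$unwind: "$group"},
        {$group: {_id: "$group", count: {$sum: 1}}}
    ]).forEach(function (row) { counts[row._id] = row.count; });

    // the table loop then just looks counts up; empty groups default to 0:
    db.groups.find().forEach(function (g) {
        print(g.slug, g.score, counts[g.slug] || 0);
    });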
[20:54:22] <arsaw> ok
[20:55:36] <arsaw> yes "I place the list of group tags in the user documents, instead of a list of user IDs in the group documents, because users will have fewer groups than groups will have users" i had the same idea
[20:55:59] <arsaw> i think i have to understand aggregate, i dont know it for the moment, and you are right my solution should be here
[20:56:07] <GothAlice> Adding a user to a group (or removing them) becomes likewise very easy: db.users.update({_id: 'GothAlice'}, {$addToSet: {group: "company.principal"}})
[20:57:49] <arsaw> k
[20:58:09] <arsaw> thanks for your help
[20:58:41] <GothAlice> arsaw: Here are some resources for you: http://www.javaworld.com/article/2088406/enterprise-java/how-to-screw-up-your-mongodb-schema-design.html (overall relational to document approach, short) http://www.thefourtheye.in/2013/04/mongodb-aggregation-framework-basics.html (aggregation tutorial) and http://docs.mongodb.org/manual/reference/sql-aggregation-comparison/ (what it says in the URL)
[20:59:02] <GothAlice> arsaw: It never hurts to help. :)
[23:39:06] <Mmike> what happens when I call replSetGetStatus? After that, I can't join that node into the replica set any more.
[23:55:01] <joannac> Mmike: erm what?
[23:55:16] <joannac> rs.status() just shows you the current state of the replica set
[23:55:49] <joannac> what do you get when you try and add the node to the replica set?
[23:57:44] <Mmike> joannac, i have 3 boxes, installed mongodb on all of them, added replSet to the .conf file. Did rs.initiate() on the first one. Did rs.status() to make sure it's initiated and primary. Ran rs.status() on the second node, got 'not initialized'. Did rs.add() on the primary, with the second's IP. Got ok:1, but the secondary stays in 'unknown' state.
[23:58:02] <Mmike> I can connect from primary to secondary (verified with mongo --host second's-ip)
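(For reference, the sequence Mmike describes, as shell commands; a sketch where host names are placeholders and the replSet name must match in every member's .conf:)

    // on the first node only:
    rs.initiate();
    rs.status();                    // should report this node as PRIMARY
    rs.add("second-host:27017");    // repeat for the third member
    rs.status();                    // new members should move past UNKNOWN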
[23:58:08] <joannac> logs from the secondary?
[23:58:12] <Mmike> empty
[23:58:38] <joannac> like, literally empty?
[23:58:54] <GothAlice> Mmike: Uhm, what happens on the secondary if you don't run rs.status() until *after* rs.add()'ing it to the initial primary?
[23:59:14] <Mmike> no, i mean, there is stuff how mongod started and all of that.
[23:59:47] <joannac> pastebin the output of rs.status() from the primary?