pmxbot IRC Log Viewer

[00:00:32] <Synt4x`> reading up on it now, 1 moment

[00:05:19] <Synt4x`> so I would do something like db.mydb.aggregate( [ { $group:{ date:{'date':date} }, person{$addToSet:person} }] )? and I do this while iterating through my Person list?

[00:09:12] <Synt4x`> actually I found an alternative I think, .distinct('person',{'date':date}) should work

[00:11:04] <joannac> don't you have to run that per date?

[00:11:18] <Boomtime> you probably want $group: { _id: "$date", person: { $addToSet: "$person" } }

[00:13:21] <Synt4x`> @joannac yea I did dates = .distinct('date') , for c in dates: people = .distinct('person', {'date':c}) , and then will do a for x in people , .findOne('person':x) and have the data I want

[00:13:24] <Synt4x`> is that a bad way of doing it?

[00:14:09] <Boomtime> that will work, but so will the single operation I quoted for the aggregation pipeline

[00:14:46] <Synt4x`> ok, what does that return to me the 1 line aggregation?

[00:15:52] <Boomtime> it gives you a cursor, similar to a find, but each document in it looks like this: { _id: [date], person: [ 'name1', 'name2', etc ] }

[00:16:54] <Synt4x`> ah ok awesome, and if I wanted to add 'score' to that as well I would just do: $group: { _id: "$date", person: { $addToSet: "$person" }. {$addToSet: "$score"} }

[00:16:55] <Boomtime> the _id field is unique, it is the distinct list of dates that exist

[00:17:13] <Boomtime> the person array in each document is the distinct list of names that occurred on that date

[00:18:18] <Synt4x`> but if a person has multiple entries under one date, i.e. "Joe Smith", "2014/9/25", "50" , "Joe Smith", "2014/9/25", "50", and so on (same score, same person, multiple entries)

[00:18:20] <Boomtime> this is the first time you've mentioned 'score' so I don't know what it is, but what you've written is not valid syntax

[00:18:23] <Synt4x`> it will keep all of them in right?

[00:18:45] <Synt4x`> sorry score is a decimal number representing how they performed

[00:18:50] <Boomtime> $addToSet will keep distinct only, you said you wanted distinct

[00:19:00] <Synt4x`> I do

[00:19:02] <Boomtime> if you want them all then you can use $push

[00:19:04] <Synt4x`> I dont want the repeats

[00:24:23] <Boomtime> you probably want this to include score: $group: { _id: "$date", person: { $addToSet: { name: "$person", score: "$score" } } }

[00:24:57] <Boomtime> 'person' will be an array of documents { name: 'NameA', score: XX }

[00:25:10] <Boomtime> be aware that the set is formed on the combination of name and score

[00:25:36] <Boomtime> if you have the same name on the same date twice but with different scores then it will appear both times in the array for that date

[00:27:21] <Synt4x`> thats actually perfect and exactly what I'm looking for, thanks very much for this!

[00:30:11] <Boomtime> cheers

[00:43:01] <Synt4x`> ok sorry having one more issue, I set dataSet = db.mydb.aggregate() , and then when I do for c in dataSet and print c it shows "ok" , "result"

[00:43:39] <Synt4x`> I was expecting to see {'_id':date, 'person' [{name, score}, {name, score}] , etc.

[00:43:43] <Boomtime> what version of mongodb?

[00:43:58] <Synt4x`> fwiw when I do the .aggregate command in my mongo itself, it works perfectly

[00:44:01] <Boomtime> at a shell, type db.version()

[00:44:03] <Synt4x`> and gives me the full result

[00:44:13] <Synt4x`> 2.6.4

[00:44:30] <Boomtime> right, so you may need to do a .next() or hasNext() or something

[00:44:51] <Boomtime> the shell has some helpers

[00:45:26] <Boomtime> in a driver you will have to do in code what the shell does by guessing what you mean

[00:47:24] <Synt4x`> ah ok, so it seems it's housed in dataSet['result']

[00:55:49] <Synt4x`> still here Boomtime? 1 more question now that I have it working properly, what do I do if I want to add non-distinct elements to this,for instance, opponent_score (which will be repeating almost 100% vs all)

[00:58:25] <Boomtime> I can't really tell from your description what you want to achieve, but you would be better off learning the numerous other powerful operators available in the aggregation pipeline

[00:58:27] <Boomtime> http://docs.mongodb.org/manual/reference/operator/aggregation-pipeline/#pipeline-aggregation-stages

[00:59:38] <Boomtime> if you can't figure it out, then you should construct a set of example docs and desired output as a gist on github along with what you've tried

[01:00:30] <Boomtime> you'll get a better response here if you know exactly what you want and can show you've put some effort in already

[01:04:29] <Synt4x`> yea ok I just assumed you would know easily, it's tough switching from my original method to this on the fly and the documentation on $addtoSet doesn't really give many unique examples

[01:04:37] <Synt4x`> but i'll keep looking I guess, thanks anyways

[01:06:50] <Boomtime> i would need to understand what you want first, there are many possibilities, put some example documents in a gist along with the output you want

[01:07:46] <Boomtime> or pastebin, or [insert temporary text service here]

[01:11:15] <Synt4x`> Here might be an example set, http://pastebin.com/YP5KZwsL

[01:12:18] <Synt4x`> I want it sorted how we already have {Date, (Name, Score) --- the pairing of (name,score) is distinct}, but I also would like to have opponent score listed with each record, and that does not have to be distinct, in fact it will very often be the same repeating with each record, but has the potential to be different

[01:14:24] <Boomtime> i think you need to provide an example output.. i honestly can't tell what you are expecting

[01:14:29] <Synt4x`> does that make sense as I described it?

[01:14:30] <Synt4x`> ok one moment

[01:15:02] <Boomtime> your sample data also conveniently has no collisions

[01:15:24] <Synt4x`> what would be an example of a collision?

[01:15:26] <Boomtime> you need to understand what will happen in all cases, so ensure your sample data covers all possibilities

[01:15:44] <Boomtime> same name on the same date

[01:16:26] <Boomtime> also, same name same date but with different score

[01:16:44] <Boomtime> also, those permutations plus whatever complexities the addition of 'opponent score' introduces

[01:24:42] <Synt4x`> ok, so this would be a sample input: http://pastebin.com/FGe5HTFt , this would be an expected output: http://pastebin.com/mKUTXHKr

[01:25:26] <Synt4x`> You can see Robert's score of 83.25 does not repeat (it's inconsequential which of the opponent scores vs the 83.25 it chooses), but he has 2 scores on 2014/09/25 because he has 83.25 and 99.25 so they both are included

[01:29:51] <Boomtime> ok, lets focus on 'Robert' records since those are the most interesting

[01:29:53] <Synt4x`> basically of all of the ones that get chosen from the original date, (name, score) $group, I just want to add the opponent_score, whatever that was in that instance, to each of those (name, score)

[01:29:54] <Synt4x`> sure

[01:35:13] <Boomtime> ok, i think i understand

[01:35:55] <Boomtime> you're unique combination is date,name,score and then you simply want whatever happens to be the first opponent-score to be seen on each combination

[01:36:15] <Synt4x`> yes, exactly

[01:38:51] <Boomtime> ok, you probably want to do 2 pipeline stages - the first to de-duplicate opponent-scores, i.e group on the combination of date,name,score and select op-score using $first

[01:40:47] <Boomtime> you can then $group again on $_id.date (as it will have become) and now you build your array as before, including op-score knowing it won't cause any new duplicates

[01:44:04] <Synt4x`> sorry just thinking for a minute to make sure I understand how it would work

[01:45:52] <Synt4x`> will the method you describe make all records on 2014/09/25 for example show the same opponent score? like is it selecting one opp_score ($first) and applying it to all of them

[01:53:05] <Boomtime> no

[01:53:41] <Boomtime> it will make all records that have the same value for a given combination of date,name,score have the same op-score

[01:56:52] <Synt4x`> ok, so it should look like dataSet = db.mydb.aggregate({'$group': { '_id': "$date", 'person': { '$addToSet': { 'name': "$opponent", 'score': "$score" }, 'opponent_score':{ '$first':'opponent_score} } }})

[01:56:55] <Synt4x`> for the first group?

[01:57:28] <Synt4x`> trying to read from the $first page on mongodb docs, but again it only gives one example, so I think this should be correct but not 100%

[01:58:57] <Boomtime> i don't actually know what your given pipeline will produce - you should try it and see

[01:59:05] <Boomtime> i don't think it will be correct

[01:59:19] <Boomtime> my suggestion was to use 2 consecutive pipeline stages

[01:59:44] <Boomtime> i.e { $group: { ... }, $group: { ... } }

[02:00:24] <Boomtime> aggregation is a pipeline, you are using a grand total of 1 stage at the moment, but you can use as many as you like

[02:00:43] <culthero> How do you prevent a mongodb shard from running out of disk space? It has a TTL collection but the volume over a short time increased

[02:00:51] <Synt4x`> ok I think I understand the grouping concept, I guess what I don't fully grasp is where to include the $first to make it non-unique/required

[02:01:24] <Synt4x`> I am assuming if I put it in the original one, it will also be required to be unique, like eahc of those were

[02:02:02] <Boomtime> no

[02:02:24] <Boomtime> your first pipeline stage contains the $first

[02:02:51] <Boomtime> your second pipeline stage contains something like the 'original' we came up with

[02:03:07] <Boomtime> i think you need to go and play with this a lot more, use the shell

[02:03:09] <Synt4x`> ok, I will spend a few hours toying with it now, thanks for all of your help / explanations. Tough for me to learn a new concept from scratch, it seems like the documentation ont his wasn't very thorough

[02:03:13] <Synt4x`> I really appreciate it

[02:03:15] <Synt4x`> yea, absolutely

[02:04:25] <Boomtime> culthero: it sounds like you should be using a capped-collection not a TTL index

[02:04:38] <culthero> you can't use capped collections on replicasets though?

[02:04:46] <Boomtime> why not?

[02:04:46] <culthero> and all shards must be replicasets?

[02:04:51] <culthero> the documentation?

[02:05:26] <Boomtime> can you provide a link?

[02:05:48] <Boomtime> "You cannot shard a capped collection"

[02:05:58] <culthero> hm reading the documentation says that it's OK

[02:06:05] <culthero> i probably was reading a 6 month old block post or something

[02:06:09] <culthero> in the mean time

[02:06:13] <culthero> rip on 100gb database

[02:06:14] <Boomtime> but that isn't replica-sets, nor does it prevent you from using it on a sharded-cluster, just that the collection cannot be sharded

[02:06:31] <culthero> right, the capped collection

[02:06:34] <culthero> would need to be sharded

[02:06:41] <Boomtime> capped-collections have always been available on replica-sets, it's kind of essential

[02:06:41] <culthero> because of a fulltext index on it

[02:07:14] <Boomtime> then you have requirements which are at odds with each other

[02:07:17] <culthero> it is i/o bound, once it gets to something like 2-3m documents it throttles the disk

[02:07:36] <culthero> agreed, but not really sure how to address that scenario

[02:08:23] <Boomtime> that is a tricky one - you may have to do it manually

[02:08:36] <Boomtime> or at least, by a script that figures it out

[02:08:53] <culthero> In general I want to search 30-40m documents relatively instantaneously

[02:09:06] <Boomtime> also, you will need to pay very careful attention to the balancer action

[02:09:28] <Boomtime> 40 million documents is nothing

[02:15:21] <culthero> 40m documents = each document is roughly 5kb, inserted 60-80 per second..

[02:15:41] <culthero> one field is fulltext indexed

[02:15:58] <Boomtime> ok, so that will be a little punishing

[02:16:30] <Boomtime> and you do distributed full-text queries i take it?

[02:16:41] <culthero> yeah they are sharded on hash keys

[02:17:17] <Boomtime> oh well, if you have size requirements you will probably have to track them yourself

[02:17:33] <Boomtime> mms monitoring can give you alerts on shard dbstorage sizes

[02:17:39] <culthero> the idea being is I have to know how many times something occurred.

[02:18:21] <culthero> sure but right now for my setup, 3x config servers + 4 shards + mongos instance is pretty cheap

[02:18:26] <Boomtime> the ttl index is a good idea, but it doesn't respond to size - you need a feedback loop to shorten the ttl if the size starts growing

[02:18:43] <culthero> I can't alter the index on the fly smartly right?

[02:18:54] <culthero> or maybe I can?

[02:20:23] <Boomtime> http://docs.mongodb.org/manual/reference/command/collMod/#index

[02:20:57] <Boomtime> oh, that link was for you, culthero

[02:23:53] <culthero> cheers boomtime

[02:25:02] <culthero> so I will probably run a 3rd node app on my mongos instance that reads something like df -h from each of the shards, and if it drops below like 4gb to reduce the TTL to oldest record - 5 hours

[02:25:16] <culthero> for now I am dropping my TTL signifigantly

[02:25:47] <culthero> or I suppose I can check it every 10 minutes within the loop

[02:26:18] <Boomtime> be careful with changing TTL significantly - it can cause a sudden mass delete that cause performance impact to other ops

[03:27:03] <Synt4x`> sigh still not able to do this without a TON of iteration, and I know there is a better way now with the aggregation. Hackhands has had me waiting ~1 hour with no expert yet, so if anybody here is great with aggregation and wants to make some $ I will paypal you what I would have paid them ($1/minute)

[03:49:08] <Synt4x`> can't even pay someone to help ;(, hackhands guy couldn't help either

[03:49:14] <Synt4x`> I need an aggregate savior :-p

[05:01:08] <Synt4x`> I'm very VERY close to what I want, if anybody is here

[05:03:13] <Synt4x`> db.hu_db.aggregate({'$group': { '_id': "$date", 'person': { '$addToSet': { 'name': "$opponent", 'score': "$score", 'opponent_score':'$our_score' } } }})

[05:03:52] <Synt4x`> this gets me what I want except that if (name, score) is the same, but (opponent_score) is different, it will have 2 entries for (name, score), I want it to only have 1

[05:04:05] <Synt4x`> I need (name, score) to be unique, with opponent score just included as extra data (doesnt have to be unique)

[05:04:06] <joannac> then which one should win?

[05:04:49] <Synt4x`> it's a bit confusing, "opponent_score" is actually me playing, thats my score. "Score" is the score of my opponent who I played against, but this entry is for their result

[05:06:48] <Synt4x`> any idea on it @joannac?

[05:07:18] <joannac> you haven't answered my question

[05:08:01] <Synt4x`> what do you mean which one should win?

[05:08:24] <joannac> you have 2 entries where (name, score) is the same, but (opponent_score) is different

[05:08:31] <joannac> which one should be kept?

[05:08:44] <Synt4x`> doesn't matter, I'm completely indifferent

[05:09:06] <joannac> if you don't care, then why keep the score at all?

[05:10:17] <Synt4x`> because over time I think it should average out, basically I played 2 "teams" that day, he could have easily been matched vs either one of my teams, so which one it compares him to on the calculates is indifferent

[05:10:31] <Synt4x`> but I still want to do a comparison as far as his %difference from my team, and what percentile he falls in for the day

[05:15:41] <joannac> group on (date, name, score) first to only get uniques

[05:15:50] <joannac> and then do the group stage you have above

[05:17:24] <Synt4x`> so I've never done 2 stages, I just do: db.hu_db.aggregate( {'$group': { '_id': "$date", 'person': { '$addToSet': { 'name': "$opponent", 'score': "$score"} } }}, {'$group': { '_id': "$date", 'person': { '$addToSet': { 'name': "$opponent", 'score': "$score", 'opponent_score':'$our_score' } } }} )

[05:17:26] <Synt4x`> like that?

[05:18:24] <joannac> no

[05:18:44] <joannac> and if you try to run that, you'll quickly see why that doesn't work

[05:21:04] <joannac> You need to use $first http://docs.mongodb.org/manual/reference/operator/aggregation/first/

[05:21:10] <joannac> or $last, or something

[05:21:59] <joannac> depending on how much you want to skew your stats, you might want $min or $max

[05:22:12] <joannac> or $avg

[05:22:22] <Synt4x`> hrmm sigh this cycle again, I tried 2 hours earlier with Boomtime to do this, and have spent $50 on hackhands/codementor with people who didnt get it right :-p

[05:22:39] <Synt4x`> how would I actually use first in my group statement, I tried earlier and inside, outside, etc.

[05:22:42] <Synt4x`> all were giving errors

[05:22:47] <Synt4x`> I did it identical to the example

[05:22:51] <joannac> gist what you tried

[05:23:00] <joannac> and what errors your got

[05:23:36] <Synt4x`> this was my most recent: dataSet = db.hu_db.aggregate({'$group': { '_id': "$date", 'opponent': { '$addToSet': { 'name': "$opponent", 'score': "$score",'opp_score':{'$first':'$our_score'} } } }})

[05:23:45] <Synt4x`> pymongo.errors.OperationFailure: command SON([('aggregate', u'hu_db'), ('pipeline', [{'$group': {'_id': '$date', 'opponent': {'$addToSet': {'score': '$score', 'name': '$opponent', 'opp_score': {'$first': '$our_score'}}}}}])]) failed: exception: invalid operator '$first'

[05:24:08] <Synt4x`> oops gist is like pastebin huh? I thought you were asking "for the gist of it"

[05:24:16] <joannac> that's because you're doing it inside an $addToSet

[05:25:18] <Synt4x`> wowwwww im so dumb, I tried so many times to add it outside of the 'addToSet'{} brackets but inside of the opponent{} ones and it never worked

[05:27:23] <Synt4x`> well, waste of $50 and a bunch of time, but this seems to work beautifully!

[05:28:22] <joannac> happy to help :p

[05:29:02] <Synt4x`> so if there is no $sort at all before first, it will just grab whatever one the dictionary decides is first? (also unsorted since it's a dict I believe)

[05:35:26] <joannac> basically, yes

[05:37:34] <Synt4x`> ok that works for me :), thanks again

[06:07:36] <Synt4x`> eek, @joannac are you still here , I thought it worked perfectly but it's not like I intended it

[06:07:56] <joannac> how so?

[06:09:31] <Synt4x`> it only attaches 1 opp_score at the end of the entire list, so it's like {date, {name, score}, {name, score}..., opp_score}, and ideally I want it to have {date, {name, score, opp_score}, {name, score, opp_score}} , I'm just indifferent when there are 1 score (2x) with 2 diff opp_scores, then it can choose either one, doesnt matter

[06:09:38] <Synt4x`> but not for all of them to be tied to one, that would skew it too hard

[06:09:47] <joannac> right

[06:09:54] <joannac> i thought we already solved this problem

[06:09:58] <joannac> with 2 group stages

[06:10:57] <Synt4x`> hrmm I guess I misunderstood it then, let me take a look at some two-group stage examples to see if I can see how

[06:13:26] <Synt4x`> this page: http://docs.mongodb.org/manual/reference/operator/aggregation/group/ only has examples of them using 1 group at a time, do I do them both within the same aggregate command?

[06:13:54] <joannac> http://docs.mongodb.org/manual/tutorial/aggregation-zip-code-data-set/#return-average-city-population-by-state

[06:15:13] <Synt4x`> ok yea I read this, so I guess my question is this then, by the looks of the example, if we group on date, (name, score), it gets rid of all of the other data, so if we re-group on that group, we don't have opp_score anymore

[06:15:19] <Synt4x`> (that was the problem I was having conceptually)

[06:15:42] <joannac> um

[06:15:46] <joannac> look at the example

[06:16:12] <joannac> see how they're grouping on (city, state) and yet they're also operating on the population

[06:16:34] <joannac> in your example, it would be $first: opponent_score

[06:16:44] <joannac> or $last or whichever operator you decided on

[06:19:35] <Synt4x`> sorry if I'm being obtuse here, but the first $group in both examples are: $group{ {state, city}, population }, they both have all 3 items in their first group command

[06:20:13] <joannac> ?

[06:21:08] <Synt4x`> in the example, Return Largest and Smallest Cities by State , also in Return Average City Population by State

[06:21:19] <Synt4x`> db.zipcodes.aggregate( [

[06:21:19] <Synt4x`> { $group : { _id : { state : "$state", city : "$city" }, pop : { $sum : "$pop" } } },

[06:21:19] <Synt4x`> { $group : { _id : "$_id.state", avgCityPop : { $avg : "$pop" } } }

[06:21:19] <Synt4x`> ] )

[06:21:27] <joannac> yes

[06:21:32] <joannac> what about it?

[06:21:34] <Synt4x`> for instance, this first $group has {state, city}, {pop}

[06:21:36] <Synt4x`> all in it

[06:21:39] <joannac> yes

[06:22:21] <Synt4x`> <@joannac> group on (date, name, score) first to only get uniques

[06:22:21] <Synt4x`> <@joannac> and then do the group stage you have above

[06:22:24] <Synt4x`> thats what you said earlier

[06:22:35] <joannac> yes

[06:22:52] <Synt4x`> so if I group on (date, name, score) don't I lose opp_score?

[06:23:10] <joannac> I presumed you would realise you needed to grab it

[06:23:23] <Synt4x`> sorry ;(

[06:23:31] <joannac> but fine, group uniquely on (date, name, score), and grab a single instance of opp_score

[06:24:34] <Synt4x`> ok but that's what we're doing now, we have only 1 instance of opp_score for each date, i.e. {date, {name, score}, {name, score}, ..., opp_score}, but in reality there could be 50 different opp_scores that day, not just one

[06:24:41] <joannac> no

[06:24:48] <joannac> geez

[06:25:06] <Synt4x`> I'm really sorry I'm trying here, I don't know why this isn't sticking

[06:25:13] <joannac> for every unique (date, name, score) document, you should get a single opp_score

[06:26:29] <joannac> I think it's not sticking because you don't actually understand aggregation :)

[06:26:34] <joannac> it's okay, I've been there

[06:26:55] <Synt4x`> yea, I guess it is, I mean it just seems like layers of queries, I should be able to understand it I feel like, but yea, obviously I don't

[06:27:02] <Synt4x`> {'$group': { '_id': "$date", 'opponent': { '$addToSet': { 'name': "$opponent", 'score': "$score" } }, 'opp_score':{'$first':'$our_score'} }}

[06:27:08] <joannac> no addtoset

[06:27:10] <Synt4x`> that's what we have right now, with what you describe above, how will I change this

[06:27:15] <Synt4x`> ah ok so I take away addtoset

[06:27:20] <joannac> remember how I said 2 group stages?

[06:27:58] <joannac> the first group stage is, I only want unique (date, name, score) documents

[06:28:29] <joannac> so that's your _id (remember that $group stage groups on whatever you tell it the _id is)

[06:28:42] <joannac> now, you can't lose your opp_score

[06:28:56] <joannac> so what do you do when you have multiple? $first

[06:31:15] <joannac> so then it looks like: $group: {_id: {date: $date, name: $name, score: $score}, opp_score: {$first: $opp_score}}

[06:31:23] <Synt4x`> ok I got the first part as this

[06:31:33] <Synt4x`> '$group': { '_id': {'date':"$date", 'name': "$opponent", 'score': "$score" }, 'opp_score':{'$first':'$our_score'} }}

[06:31:38] <joannac> yes

[06:31:45] <joannac> now from there, do your second bit

[06:31:45] <Synt4x`> trying the second part now

[06:35:07] <Synt4x`> awesome! got it

[06:36:03] <Synt4x`> well rather hold on, I shouldnt celebrate too quickly

[06:36:03] <joannac> more importantly, do you know why it works?

[06:36:21] <Synt4x`> I have it so it's filtered out the games I don't want, I need to make sure opp_score is included correctly

[06:36:24] <Synt4x`> (this is my query currently)

[06:36:25] <Synt4x`> dataSet = db.hu_db.aggregate({'$group': { '_id': {'date':"$date", 'name': "$opponent", 'score': "$score" }, 'opp_score':{'$first':'$our_score'} }, '$group': { '_id': "$date", 'opponent': { '$addToSet': { 'name': "$opponent", 'score': "$score" } }, 'opp_score':{'$first':'$our_score'} }})

[06:37:02] <joannac> have you actually run that code?

[06:37:42] <Synt4x`> yes, it ran, gave me my results, but I just checked and again only 1 opp_score is at the end, so I'm assuming I made a mistake in my second query, maybe because I kept opp_score outside again the second time

[06:37:54] <[1]zwoop> pwd

[06:44:33] <Synt4x`> ok, so it's definitely my second query that is wrong, the first contains everything I would want it to with a structure of {id: {date, name, score}, opp_score}

[06:45:19] <Synt4x`> in my second I want to turn that into: {date, _id: {name, score}, oppscore:{oppscore}} **I think**

[06:47:42] <joannac> no you don't

[06:47:53] <joannac> for a start, weren't you using $addToSet?

[06:48:24] <Synt4x`> originally yes, but not for the first grouping you said I just used it for the second grouping

[06:48:39] <joannac> yes

[06:48:49] <joannac> so how come there isn't an array in your expected output?

[06:50:05] <Synt4x`> hrmm, right now my expected output is looking like this: {date, opp_score, [{name, score}, {name, score}...]}

[06:50:09] <Synt4x`> err sorry, my actual output

[06:50:11] <Synt4x`> not expected

[06:51:16] <Synt4x`> ah I think I see whats wrong

[06:54:28] <Synt4x`> ugh !@)_($ sigh, I thought I had it, but it's back to including both entries

[06:54:40] <Synt4x`> right now this is my query:

[06:54:41] <Synt4x`> dataSet = db.hu_db.aggregate({'$group': { '_id': {'date':"$date", 'name': "$opponent", 'score': "$score" }, 'opp_score':{'$first':'$our_score'} }, '$group': { '_id': "$date", 'opponent': { '$addToSet': { 'name': "$opponent", 'score': "$score", 'opp_score':'$our_score' } } }})

[06:55:28] <Synt4x`> I tried doing 'opp_score':'$opp_score' which is what I thought it would be, but it didn't put any opp_score anywhere in the output, so I switched it to '$our_score' and I think doing that makes it read from the original set defeating the purpose

[06:55:53] <Synt4x`> so I guess the question is, how do I access opp_score from the first $group query, in the 2nd group query

[06:59:28] <Synt4x`> hold on I may have gotten it (or I have a hypothesis to try at least)

[07:02:02] <Synt4x`> nevermind that failed ;(, I tried accessing like in the example, in the second group statement doing $_id.score, etc.

[07:02:13] <Synt4x`> dataSet = db.hu_db.aggregate({'$group': { '_id': {'date':"$date", 'name': "$opponent", 'score': "$score" }, 'opp_score':{'$first':'$our_score'} }, '$group': { '_id': "$date", 'opponent': { '$addToSet': { 'name': "$_id.opponent", 'score': "$_id.score", 'opp_score':'$_id.opp_score' } } }})

[07:02:15] <Synt4x`> that was the try

[07:05:33] <[1]zwoop> Im using a rest api as the only means to communicate with mongodb. The problem im haivng is that i have to specify the datamodel for every object i want to send/retrive from the database. How is this done more genral so that i can send retrive any object throgh the api?

[07:06:10] <Synt4x`> I still think I want this as expected output: {_id: date , opponent: [ {{name, score}, {opp_score}, {{name, score}, {opp_score}... ] } **or {name, score, opp_score}, {name, score, opp_score}... but I can't make either work

[07:07:11] <joannac> Synt4x`: do you understand what the first $group is doing

[07:07:37] <joannac> do you know what you are getting out of the first $group, what your documents look like?

[07:10:37] <Synt4x`> yes, the first one is doing this [ { _id: {date, score, name}, opp_score}. {_id: {date, score, name}, opp_score}...

[07:10:50] <joannac> okay

[07:11:14] <joannac> what do you want your second group stage to do, then?

[07:11:31] <joannac> what should the unique field be?

[07:13:20] <Synt4x`> ok in the first one it's already filtered out my multiple entries, so for instance there is an entry in the db, {name:gfx_644, score: 110.24, opp_score:106.60} and then another one {name:gfx_644, score: 110.24, opp_score:94.94} (2 entries, same score, diff. opp_score)

[07:13:30] <Synt4x`> when I do the first query, I'm down to only one of those, which is what I want

[07:14:22] <Synt4x`> so now I'd like it to be {_id: date, {name, score, opp_score}} where all {name, score, opp_score} are already unique I believe because of the first query, so just to separate them by date and that's it

[07:15:09] <joannac> you're playing very loose with definitions here

[07:15:39] <Synt4x`> in which way?

[07:16:07] <joannac> {_id: "2014/09/28", "name" : "Robert", "score" : "99.25", "opp_score" : 99.25}

[07:16:19] <joannac> you want documents that look like that?

[07:18:54] <Synt4x`> sorry your right, I was thinking it auto grouped them by date but it's because I as using addToSet, so I want them to look like this: {_id : date , opponent : [ {name, score, opp_score}, {name, score, opp_score}... ] }

[07:21:27] <joannac> okay

[07:21:40] <joannac> so tell me how you would do the _id: date part of that query

[07:21:50] <joannac> not query, group

[07:21:52] <Synt4x`> '$group': { '_id': "$date"

[07:21:56] <joannac> wrong

[07:22:23] <Synt4x`> hrmm this part I thought I was 100% sure on, what is wrong with that?

[07:22:48] <joannac> your documents from the first stage look like this

[07:22:58] <joannac> { _id: {date, score, name}, opp_score}

[07:23:11] <joannac> there's no field called "date"

[07:23:22] <Synt4x`> ahhhh, so I should do: '$group': { '_id': "$id_date" ?

[07:23:27] <joannac> yes

[07:23:39] <joannac> wait, no

[07:23:51] <Synt4x`> err $id_date

[07:23:56] <Synt4x`> ugh

[07:24:05] <joannac> still no

[07:24:17] <Synt4x`> I mean $_id.date

[07:24:27] <joannac> I suggest you play around in the shell while you figure out syntax

[07:24:43] <joannac> and put your group stage together piece by piece

[07:25:40] <Synt4x`> the thing is, it's easy to see how I access the $_id.X's , how do I access opp_score from the first grouping? that's what I don't see, the example says just $pop but that didn't seem to work as I expected

[07:25:57] <Synt4x`> $pop = $opp_score **

[07:26:09] <joannac> um, what

[07:26:23] <joannac> your opp score is actually a top level field

[07:26:34] <joannac> you can use it with $opp_score

[07:26:40] <joannac> in quotes, of course

[07:29:43] <Synt4x`> so I believe my second group should look like this then:

[07:29:44] <Synt4x`> '$group': { '_id': "$_id.date", 'opponent': { '$addToSet': { 'name': "$_id.name", 'score': "$_id.score", 'opp_score':'$opp_score' } } }

[07:30:57] <Synt4x`> however that gives me the following: {u'ok': 1.0, u'result': [{u'_id': None, u'opponent': [{}]}]}

[07:31:00] <Synt4x`> so clearly not correct

[07:32:36] <joannac> okay, so work out why it doesn't work

[07:35:22] <joannac> have you been testing along the way?

[07:35:34] <joannac> did you add a bit and then it suddenly stopped working?

[07:35:36] <Synt4x`> I took out everything from the second query but this: '$group': { '_id': "$_id.date"}

[07:35:40] <Synt4x`> and it still doesn't work

[07:36:00] <Synt4x`> I am sure the 1st query still works, so I know I am getting into the second query with our normal data set

[07:36:07] <joannac> Synt4x`: remove the second part altogether

[07:36:23] <joannac> is the output exactly what you expect?

[07:36:33] <Synt4x`> yes

[07:37:12] <joannac> but trying to group on "$_id.date" gives you nothing?

[07:37:29] <Synt4x`> in the form of [ {id: {date, score, name, opp_score}}, ... ]

[07:37:38] <joannac> nope

[07:37:49] <Synt4x`> thats what the 1st query returns to me

[07:37:55] <joannac> no it isn't

[07:37:58] <joannac> unless you changed it

[07:38:27] <Synt4x`> dataSet = db.hu_db.aggregate({'$group': { '_id': {'date':"$date", 'name': "$opponent", 'score': "$score" }, 'opp_score':{'$first':'$our_score'} }})

[07:38:32] <Synt4x`> that is the first query, I didn't change it

[07:38:43] <joannac> then your assertion is incorrect

[07:39:18] <Synt4x`> {u'ok': 1.0, u'result': [{u'_id': {u'date': u'2014/09/14', u'score': u'91.24', u'name': u'spacejam23'}, u'opp_score': u'92.96'}, {u'_id': {u'date': u'2014/09/14', u'score': u'118.58', u'name': u'matt201490'}, u'opp_score': u'92.96'}, ... ]}'

[07:39:29] <joannac> yup

[07:39:53] <joannac> opp_score is not inside the _id field

[07:39:55] <joannac> it's outside

[07:40:34] <joannac> now add your second $group stage

[07:40:49] <Synt4x`> '$group': { '_id': "$_id.date", 'opponent': { '$addToSet': { 'name': "$_id.name", 'score': "$_id.score", 'opp_score':'$opp_score' } } }

[07:40:58] <joannac> nope, too much

[07:41:04] <joannac> just add the _id part

[07:41:08] <Synt4x`> ok sorry

[07:41:16] <Synt4x`> '$group': { '_id': "$_id.date"

[07:41:26] <joannac> seriously, does no one teach how to debug anymore?

[07:41:40] <joannac> okay, so what's the full aggregation command?

[07:42:15] <Synt4x`> dataSet = db.hu_db.aggregate({'$group': { '_id': {'date':"$date", 'name': "$opponent", 'score': "$score" }, 'opp_score':{'$first':'$our_score'} }, '$group': { '_id': "$_id.date", 'opponent': { '$addToSet': { 'name': "$_id.name", 'score': "$_id.score", 'opp_score':'$opp_score' } } }})

[07:42:21] <joannac> ...

[07:42:37] <joannac> sorry, what's the full aggregation command with just the _id part of the second one?

[07:43:21] <Synt4x`> dataSet = db.hu_db.aggregate({'$group': { '_id': {'date':"$date", 'name': "$opponent", 'score': "$score" }, 'opp_score':{'$first':'$our_score'} }, '$group': { '_id': "$_id.date"}})

[07:43:43] <joannac> lol

[07:43:48] <joannac> i just realised what you did

[07:43:48] <Synt4x`> I think I found it

[07:43:49] <Synt4x`> sigh

[07:43:49] <Synt4x`> yup

[07:43:50] <Synt4x`> me too

[07:43:59] <joannac> is it by chance very late where you are?

[07:44:13] <joannac> you're having braces problems :p

[07:44:28] <Synt4x`> 12:43am and I'm on 5 hours of sleep, yes ;(. I spent 12 hours today trying to fix this one query, something that seemed soooo simple, and I could do iteratively

[07:44:34] <Synt4x`> and somehow has turned into this marathon of pain

[07:45:22] <joannac> anyway, I'm glad you figured it out

[07:45:34] <joannac> and despite all the pain i think you understand aggregation now

[07:45:45] <joannac> even though i had to prod you through it

[07:45:54] <joannac> :)

[07:46:02] <Synt4x`> well I thought I figured it out, I thought I didn't have a { } around the second group

[07:46:07] <Synt4x`> but it's giving me invalid syntax

[07:46:22] <Synt4x`> and yes, you did have to prod my quite a bit, I can't tell you how thankful I am for the help already

[07:46:48] <joannac> you still have braces problems

[07:47:07] <joannac> count them. match them up

[07:48:01] <joannac> as a hint, it should look like db.coll.aggregate({$group : {...}}, {$group: { .. }})

[07:50:44] <Synt4x`> ok wow, that was painful. Now it is hitting me with this though: TypeError: aggregate() takes exactly 2 arguments (3 given)

[07:52:14] <joannac> oh

[07:52:21] <joannac> you're doing this in python, aren't you

[07:52:26] <Synt4x`> yes :S

[07:52:26] <joannac> http://api.mongodb.org/python/current/examples/aggregation.html

[07:52:46] <joannac> you need to put it in an array

[07:52:56] <joannac> [{$group...}, {$group...}]

[07:53:37] <joannac> the mongo shell just takes multiple documents, but python needs it to be in an array

[07:54:22] <joannac> okay, i'm out

[07:54:23] <joannac> good luck

[07:55:20] <Synt4x`> seriously, can I send you some $ on paypal or something?

[07:55:24] <Synt4x`> this has really helped me out a ton

[08:06:03] <Synt4x`> Hooooooooooooooorayyyyyyyyyyyyyyyyyyyyy, got it. and MAN... what a challenging day, thanks so much to everybody here I NEED to sleep but I will pay it forward in any way possible

[09:50:43] <Forest> Hello. i am using node js with mongoDB, i am chunking array and trying to insert the JSON objects into mongodb,but not all of them are inserted,what might be the problem? I get no error message.

[09:53:42] <Forest> This is my code https://dpaste.de/NnZ2 Only 999 elemetns of cesty is isnerted and 2600 of vrcholy. Can anyone help me please?

[10:09:01] <thib> hello

[10:10:19] <shepik> hi, is this really "mogodb chat&support"?

[10:10:41] <thib> i use mongodb on a vagrant virtual machine, i've tried to update it to 2.6.5 but it seems to failed configurate mongodb-org-server, any idea why ?

[12:29:15] <ssarah> Hi guys! :)

[12:29:38] <ssarah> Can you tell me the easiest way to ensure external connections to a mongo machine use ssl? Or point me to the guide manual on it?

[12:49:21] <beebeeep> hello folks, does anybody knows, what means this record in config.locks: config(obj: 237636; size: 0/0/0 Gb)> db.locks.find({state: {$ne: 0}})

[12:49:24] <beebeeep> { "_id" : "balancer" }

[12:50:38] <beebeeep> (balancer was disabled for months)

[13:22:05] <Forest> Hello, i am using mongodb with node js and i have a problem that array of documents i want to insert is larger than 16 MB. Can you help me how can i split the array correctly? I tried to split the array into smaller arrays of 8000 elements,but that does not guarantee the size. Any help would be appreciated.

[15:17:35] <kakashiAL> I am trying to understand this code: https://paste.xinu.at/r4CfO/

[15:17:37] <kakashiAL> it say: give mySchema object a method that get 2 paramters, an id and a callback(cb)

[15:17:40] <kakashiAL> because its asynch it need a callback, but what does .exec.(cb) means?

[16:05:02] <hydrajump> hi if I don't specify fork=true in the mongd.conf does that mean that service start mongod will not run in the backgroun?

[17:41:32] <locojay> hi i ve removed about 100g from gridfs but when i do showdbs the db still hast a size not reflecting the delete docs

[18:38:44] <skaag> how do I enable the oplog on a replica set?

[18:39:09] <Derick> oplogs are automatically created when you start a node with --replSet=name

[18:39:14] <skaag> I ran rs.printReplicationInfo() and I see the last oplog event is from sometime in february 2014

[18:39:54] <skaag> can I put replSet=name in /etc/mongodb.conf and it will behave the same?

[18:41:38] <Derick> sure

[18:43:52] <skaag> have you ever used mongo's MMS?

[18:44:08] <Derick> i've not

[18:48:48] <skaag> now I get: Exception while invoking method 'login' MongoError: not master and slaveOk=false

[18:58:14] <sinother> hello somebody can recomend me our favorite gui for mongodb please?

[19:00:39] <hydrajump> hi if I disable mongod from starting on boot with sudo chkconfig mongod off

[19:00:59] <hydrajump> why when I start with sudo service mongod start will it not go to the background, e.g.

[19:01:01] <hydrajump> sudo service mongod start

[19:01:03] <hydrajump> Starting mongod: about to fork child process, waiting until server is ready for connections.

[19:01:05] <hydrajump> forked process: 2670

[19:01:07] <hydrajump> all output going to: /log/mongod.log

[19:01:20] <hydrajump> my mongod.conf has fork = true

[19:01:22] <hydrajump> pidfilepath = /var/run/mongodb/mongod.pid

[19:02:57] <hydrajump> ok it just took a little while nevermind

[19:22:41] <skaag> how do I enable slaveOk ?

[19:23:26] <Derick> you either connect to a primary node, or use rs.slaveOk( true );

[19:26:57] <hydrajump> Derick: I need a quick check because I don't know if I'm doing the right thing

[19:27:35] <hydrajump> Derick: I have 3 members in a replicaset. I created a new member and just added it to the replicaset on the primary using rs.add("xxx")

[19:30:31] <hydrajump> Derick: rs.status() returns the following for the new member

[19:30:33] <hydrajump> "_id" : 3,

[19:30:35] <hydrajump> "name" : "xxxx:27017",

[19:30:37] <hydrajump> "health" : 1,

[19:30:39] <hydrajump> "state" : 5,

[19:30:41] <hydrajump> "stateStr" : "STARTUP2",

[19:30:43] <hydrajump> "uptime" : 1703,

[19:30:45] <hydrajump> "optime" : Timestamp(0, 0),

[19:30:47] <Derick> hydrajump: please always use a pastebin, don't paste in channel

[19:30:47] <hydrajump> "optimeDate" : ISODate("1970-01-01T00:00:00Z"),

[19:31:59] <hydrajump> Derick: sorry

[19:38:57] <proteneer_osx> to allow parallel writes, does the stuff I"m writing to have its own db?

[19:39:34] <proteneer_osx> i want to avoid having to do sharding for now

[19:45:42] <proteneer_osx> most of the writes are entirely in place

[20:01:50] <jfk> Afternoon

[20:03:34] <jfk> anyone here somewhat familiar with the $redact aggregation operator?

[20:49:25] <mongoexplore> hello, does anyone use mongo-hadoop to create bson?

[20:50:03] <mongoexplore> I have already mongo ready json and want to convert to bson

[20:50:18] <mongoexplore> when I do it I get all data assign to _id field

[20:50:41] <mongoexplore> anyone had the same issue?

[20:52:01] <mongoexplore> anyone online?

[20:57:08] <HairAndBeardGuy_> yes, but sorry, i've never used mongo-hadoop.

[20:58:26] <joannac> ditto. i suggest you share your code so someone can look, though

[20:59:10] <HairAndBeardGuy_> if you're having problems doing it with mongo-hadoop, could you not use another library?

[21:11:20] <mongoexplore> I don't know any other alternative to create bson file on hadoop

[21:11:43] <mongoexplore> do you have suggestion for alternative?

[21:12:04] <mongoexplore> I am converting 500gb of data

[21:12:14] <mongoexplore> so single machine will not work

[21:21:18] <mongoexplore> anyone want last peace of pizza?

[21:27:58] <cheeser> mongoexplore: no one can answer your question if you don't ask it.

[21:30:30] <mongoexplore> cheeser, the question i asked was: I am using mongo-hadoop to convert existing json files into bson... but when I do it everything get assign as _id : {full json}

[21:30:42] <mongoexplore> instead of being document

[21:30:49] <cheeser> pastebin your code

[21:30:49] <mongoexplore> I am using hadoop streaming

[21:31:28] <cheeser> well, java's my thing but I can try to decipher your ruby/python

[21:32:01] <mongoexplore> http://pastie.org/9645534

[21:32:23] <mongoexplore> code is simple mapper and reducers are just simple cat

[21:33:47] <cheeser> what does your output look like?

[21:35:41] <Bumptious> is there an easy way to find a gap in series of integers, each integer from a separate document?

[21:36:12] <Bumptious> as you may expect this is for the purposes of order/position

[21:37:02] <mongoexplore> so input is {"_id":"12", "key":"field"} output bson will look like {"_id": "{\"_id":\"12\", \"key\":"\field\"} "}

[21:38:39] <joannac> Bumptious: why not just sort them?

[21:38:48] <cheeser> mongoexplore: i'm guessing it's because RecordWriters expect a <K,V> pair and you're only giving it one item so it assumed a null value.

[21:39:10] <mongoexplore> ic thanks let me try

[21:39:27] <cheeser> what exactly are you trying to achieve with this?

[21:39:50] <Bumptious> joannac: I will be sorting them, but sometimes one will go missing and I won't know which. it's a small number, 1 to 18

[21:40:19] <mongoexplore> I have single cluster mongo server, which hold around 50gb data, when I load json it takes around 8 hours

[21:40:37] <mongoexplore> when I do mongorestore it takes around 3

[21:40:43] <mongoexplore> so it saves me 5 hours

[21:41:42] <joannac> Bumptious: sort, output, and process it app-side?

[21:42:17] <Bumptious> I want to add a new document back to the list when one goes missing

[21:42:27] <Bumptious> i'm not certain that works

[21:42:50] <joannac> how is it going missing?

[21:42:51] <Bumptious> so say #7 is removed by some other process

[21:43:08] <Bumptious> another process removes it

[21:43:24] <joannac> get that process to send you a signal

[21:43:33] <Bumptious> i'd like to replace it. all it's doing actually would be giving "order:7" to another document

[21:43:37] <joannac> or mark it in the database

[21:46:01] <Bumptious> hmm yeah. i'm not sure why i feel compelled to make it work without that interaction

[21:48:41] <Bumptious> i think because i'm dealing with a game where there's potentially a lot of ansychronous actions going on

[21:50:50] <Bumptious> i'm just going to try it the way you suggest and see if problems arise :), thanks, joannac

[22:57:37] <mango_> Just working on a MongoDB puzzle, if there is a network partition, between D.C A (2 x secondarys) and D.C. B (1 x Primary) when the network partition has been removed, is there a re-election?

[23:00:53] <cheeser> i don't think so. that 3rd machine would just reestablish communication with the other 2 (one of which would be primary)

[23:01:45] <joannac> Well, the one in DC B will step down

[23:01:54] <joannac> and one of the machines in DC A will become primary

[23:02:07] <joannac> so after the partition is gone, the machine in DC A will stay primary

[23:05:27] <cheeser> right

[23:14:42] <diegoaguilar> Hello, Im a bit confused about how I should setup MongoDB security administration

[23:15:03] <diegoaguilar> I want to create users to perform auth, have a root administrator, that's it .. all privileges

[23:15:34] <diegoaguilar> and have a some users for read access to certain databases and some users for total control of SOME databases

[23:16:01] <mango_> @cheeser @joannac - thanks, that's correct, a re-election doesn't take place.

[23:16:05] <diegoaguilar> but at the start of docs it says that each user's role works for a unique database

[23:19:59] <mango_> diegoaguilar: did you try reading the documentation

[23:20:10] <mango_> it should start you off

[23:20:20] <mango_> if there's anything not clear, come back here again

[23:20:34] <diegoaguilar> I am reading it, I first tried reading the mongo website

[23:20:41] <diegoaguilar> but did some stuff wrong I guess

[23:20:53] <diegoaguilar> I downloaded the docs pdf

[23:23:06] <mongoexplorer> have you guys use https://github.com/mongodb/mongo-hadoop/blob/master/pig/README.md

[23:23:25] <mongoexplorer> mongo hadoop. I try to stream to output bson

[23:23:26] <mongoexplorer> didn't work

[23:23:34] <mongoexplorer> trying to pig

[23:23:42] <mongoexplorer> following example on the page

[23:24:03] <mongoexplorer> STORE raw_out INTO 'file:///tmp/whatever.bson' USING com.mongodb.hadoop.pig.BSONStorage;

[23:24:21] <mongoexplorer> gives me error output folder doesn't exist

[23:25:11] <cheeser> using that exact text?

[23:25:12] <mongoexplorer> anyone try it before?

[23:25:15] <lilVaratep> is mongodb a good choice for a social networking mobile app? (scaleable)

[23:25:19] <mongoexplorer> yup

[23:25:31] <mongoexplorer> just load text file which has 3 fields

[23:25:42] <mongoexplorer> and store

[23:25:44] <mongoexplorer> it 2 lines

[23:25:47] <cheeser> it's complaining the *store* ?

[23:26:17] <mongoexplorer> one sec I will put error in pastbeen

[23:26:28] <cheeser> and the pig script

[23:29:11] <mongoexplorer> http://pastie.org/9645823

[23:30:55] <cheeser> that's strange. it's not even running through the mongo-hadoop-pig code...

[23:31:03] <cheeser> did you register that jar in your script?

[23:31:35] <cheeser> doesn't look like it.

[23:31:55] <mongoexplorer> yup both core and mongo jar

[23:32:01] <mongoexplorer> pig

[23:32:19] <cheeser> but not the java driver?

[23:32:52] <mongoexplorer> REGISTER $lib_dir/mongo-hadoop-core-1.3.0.jar; REGISTER $lib_dir/mongo-hadoop-pig-1.3.0.jar;

[23:32:56] <mongoexplorer> this two

[23:33:03] <mongoexplorer> java driver?

[23:33:35] <cheeser> you'll need the mongo-java-driver jar for the bson support

[23:33:46] <mongoexplorer> let me try

[23:34:39] <mongoexplorer> it is included in pom.xml of my udf

[23:34:47] <cheeser> but not in the pig script

[23:35:01] <mongoexplorer> let me try

[23:35:10] <cheeser> transitive deps are not automatically included in a pig script.

[23:47:11] <mongoexplorer> cheeser: the same error

[23:48:11] <cheeser> same stack trace?

[23:48:36] <mongoexplorer> yup

[23:48:47] <cheeser> if there's nothing from mongo-hadoop in the stack trace, it's obviously not a mongo-hadoop problem.

[23:49:07] <mongoexplorer> when I do regual store

[23:49:10] <mongoexplorer> it works fine

[23:49:11] <cheeser> make sure the paths to your jars are correct and that you register them at the top of the script for each script you're running.

[23:49:17] <mongoexplorer> PigStore

[23:49:20] <cheeser> beyond that, i don't know what to tell you

[23:49:35] <mongoexplorer> k tnx

[23:58:57] <diegoaguilar> mango_, is it really needde to have a pem file to enable mongo auth?

Log file Viewer

Help | Karma | Search:

#mongodb logs for Monday the 13th of October, 2014