PMXBOT Log file Viewer


#mongodb logs for Thursday the 12th of May, 2016

[03:16:09] <Lonesoldier728> anyone here use mongoose? trying to figure out what a reference looks like in a doc - does it store just the object id or the whole doc
[03:16:11] <Lonesoldier728> http://stackoverflow.com/questions/37176561/how-to-use-ref-to-store-a-document-correctly-in-mongodb-mongoose
[03:17:52] <Boomtime> @Lonesoldier728: i don't use mongoose, but i expect you'll get some background info from this; https://docs.mongodb.com/manual/reference/database-references/
[03:19:02] <Lonesoldier728> ok looks like just an id
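
A minimal mongoose sketch of the point, with illustrative model names: the referencing document stores only the ObjectId, and populate() swaps it for the referenced document at query time.

    var mongoose = require('mongoose');

    var postSchema = new mongoose.Schema({
      title:  String,
      author: { type: mongoose.Schema.Types.ObjectId, ref: 'User' }  // stored as just an ObjectId
    });
    var Post = mongoose.model('Post', postSchema);

    // the raw document holds only the id; populate() fetches the referenced User
    Post.findOne({ title: 'hello' }).populate('author').exec(function (err, post) {
      if (err) throw err;
      console.log(post.author);  // full User document after populate
    });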
[04:50:10] <YokoBR> hi guys
[04:50:14] <YokoBR> and ladies
[04:51:02] <YokoBR> i'm getting "First key in $update argument is not an update operator" when i try to findOneAndUpdate
[06:25:29] <YokoBR> i'm getting "First key in $update argument is not an update operator" when i try to findOneAndUpdate
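
That error is the server complaining that the first key of the update document is a plain field name rather than an update operator; a minimal operator-only sketch (collection and field names are illustrative, not from YokoBR's code):

    db.users.findOneAndUpdate(
      { name: "yoko" },                // filter
      { $set: { status: "active" } }   // every top-level key must be an operator such as $set or $inc
    )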
[06:55:32] <febuiles> Is it possible to do $push and $pull in a single update?
[06:55:47] <febuiles> (operating on the same property)
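
For what it's worth, the server rejects $push and $pull on the same field in a single update (it reports a path conflict), so this usually ends up as two updates; a rough sketch with illustrative names:

    // not allowed in one update document: both operators target "tags"
    // db.items.update({ _id: 1 }, { $pull: { tags: "old" }, $push: { tags: "new" } })

    // run them as two sequential updates instead
    db.items.update({ _id: 1 }, { $pull: { tags: "old" } })
    db.items.update({ _id: 1 }, { $push: { tags: "new" } })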
[07:29:58] <ayee> The automation agent is grabbing my hostname and passing it around to other automation agents. Though my hostname isn't resolvable by dns. It's just some random string. Is there a way to have the automation pass around the ip address instead?
[07:35:10] <Boomtime> ayee: https://docs.cloud.mongodb.com/faq/#why-does-the-monitoring-agent-connect-with-hostnames-instead-of-ip-addresses
[07:35:23] <Boomtime> (it's true of all agents i believe)
[07:35:30] <Boomtime> and you can control it
[07:38:27] <ayee> Boomtime: thanks a lot, that helps
[07:38:48] <ayee> Boomtime: I also have stale agents that have been deleted in ec2, but I can't get rid of them in the webui. any tips?
[07:42:00] <ayee> also I can't find that preferred hostname setting. I'm using the mongodb ops manager, is this only available in the cloud manager perhaps?
[07:46:07] <ayee> hmm, I can't find 'group settings' anywhere. If I click on admin on the top right, I see 'Groups', but not 'Group Settings'
[07:46:31] <Boomtime> well, you shouldn't be using ops manager without a subscription, and since you definitely wouldn't ask here without it, you can raise a support ticket with your subscription
[07:48:28] <ayee> I just installed ops manager today, I'm on the 30 day trial
[07:48:46] <ayee> I'm trying to play/test it out, I haven't gotten to the stage where I can spin up a replica set yet
[07:49:25] <ayee> I'm actively selling the tool to management though, I think they'll let me pay for it. but I have to show a working demo first. heh
[07:50:22] <ayee> Is the trial version different than the subscribed version, meaning buttons are missing?
[07:50:56] <Boomtime> well, you know how to do what you want in cloud, and it's much easier to set up (since there is nothing to configure or provision) - why don't you just demo that? the UI is nearly identical
[07:52:40] <ayee> They'll want to see an on prem version, working with our prod images, in our private VPC, and working on aws and openstack.
[07:52:58] <ayee> We're not allowed to use cloud services, it has to be on prem.
[07:53:12] <ayee> I could just show them heroku if I could use a cloud service, that would be easier.
[08:10:10] <ayee> ahh, I was confused. I was going to admin -> groups. I should be going to settings -> group settings. Boomtime what value do I add, ends with or regex? and how do I specify an ip for all agents. This is confusing.
[09:44:31] <ayee> hmm I still get host unreachable with ^10.* in my Preferred Hostnames.
[11:17:22] <kurushiyama> ayee: Uhm, ips aren't exactly hostnames, are they? How does your name resolving work?
[13:09:37] <Mmike> Hi, lads. I'm trying to find out when opIdMem option was removed from mongod, does anyone know?
[13:30:30] <grug> this probably isn't the right place to ask but basically i have a nodejs application that connects to a mongo database using the node mongo driver. i am querying a large collection and my results stream prematurely terminates any time i try to stream results from a query: http://stackoverflow.com/questions/37188184/mongo-connection-terminates-early-unexpectedly even if someone could point me in the right direction (can't see a channel for the mongo driver) ...
[13:30:30] <grug> ... that would be much appreciated! (code is in the SO link i posted)
[14:49:25] <oky> grug: what is your batch size? is it relative to the number of docs? (as in, is it stopping after one batch?)
[14:55:33] <grug> oky: nah it's not stopping after one batch - the batch size is 1000 at the moment
[14:55:58] <grug> changing the batch size to be larger only slightly increases the number of docs that get processed before the stream ends (i.e. it's not linear)
[15:06:37] <oky> grug: is there a cursor timeout?
[15:07:31] <grug> oky: there seems to be a few ways i can set a timeout - which way would you recommend
[15:07:41] <oky> grug: in a way which unsets the timeout
[15:08:14] <grug> i mean, there are a lot of timeout settings that you can tweak, and i'm unsure which one i should be tweaking
[15:08:19] <grug> i could add a cursor flag
[15:08:44] <oky> i'm asking if you are running into a timeout
[15:08:56] <grug> well it's hard to determine whether i am or not
[15:09:21] <grug> there's no error thrown and it doesn't necessarily happen after x minutes every time
[15:09:37] <oky> grug: how long does your process tend to run for?
[15:09:58] <oky> try setting the noCursorTimeout option
[15:11:26] <grug> yeah i set the nocursortimeout option and it still seems to end. the process tends to run for about 5 minutes
[15:11:46] <grug> but when the amount of docs to process is smaller (i.e. under 100k documents) then it might run for less than a minute
[15:12:33] <oky> and regardless, it is not correctly getting all your batches
[15:13:19] <grug> correct :)
[15:13:41] <oky> maybe there's a bug in the way you are filling currentBatch and dispatching it - maybe check how many times it gets filled (+ how big it is) and so on
[15:13:48] <oky> and then also count the number of docs coming over the mongo stream
[15:15:22] <grug> yeah that's what i've been just messing with at the moment. i think it must be a bug with my logic in how i fill my batches and dispatch them because if i remove the amazon upload stuff and purely just count documents coming in over the stream (i.e. in the stream.on('data'... section) then it gives the correct count
[15:15:44] <mylord> how do you find inside the 0th array element of transactions? db.payments.find({"transactions[0].amounts.total":"1.00"}).pretty()
[15:15:45] <grug> so i'm just trying to work out where the flaw in the logic is - need to do a count of the docs coming in compared to how many were sent
[15:16:59] <oky> grug: good luck - add more and more prints :-D
[15:18:20] <oky> grug: do you dispatch when the stream ends? (what do you do with the remaining items in the batch)
[15:18:44] <oky> anyways, ttyl - i'm sure you'll figure it out
[15:19:32] <grug> well the problem is that i can't store all the documents in memory - node falls over, i believe, otherwise i'd just store it all in memory and dispatch once the stream ends
[15:20:55] <grug> i do pause the stream, so i imagine that stops the 'end' event from being fired
[15:42:13] <kurushiyama> oky: Could it be that the processing of the "stream" takes quite a while?
[15:43:29] <grug> kurushiyama: were you referring to me or oky?
[15:43:41] <grug> i'm not sure what you mean by the 'processing' of the stream
[15:44:42] <kurushiyama> grug: Sorry. yes. Ok. you read from a cursor, and after a while that reading terminates, as far as I got it?
[15:47:11] <kurushiyama> grug: Or, to put it differently: You do a query and iterate over the result set and then suddenly the connection breaks?
[15:48:46] <grug> kurushiyama: yep
[15:49:06] <kurushiyama> grug: How long does it take for the connection to break?
[15:49:09] <grug> the 'end' event is fired before i am done processing my results
[15:49:23] <grug> kurushiyama: depending on how many documents are returned by the query, anywhere between 1 minute-5 minutes
[15:50:29] <kurushiyama> grug: Hm. You might want to disable the cursor timeout just to make sure.
[15:51:34] <kurushiyama> grug: May I ask what you are trying to achieve? Maybe we can find an aggregation or something like that, which does not force you to iterate over a large result set?
[15:53:23] <cpama> hi all
[15:53:37] <cpama> just wondering if anyone can shed some insight into this: http://stackoverflow.com/questions/37189133/mongo-php-app-error-fatal-error-uncaught-exception-mongoexception-with-messa
[15:55:01] <grug> kurushiyama: i have disabled the timeout - it doesnt do anything, unfortunately. what i am trying to achieve is that for any documents in my collection, any of them that have to be added to an AWS search index are uploaded
[15:55:07] <grug> unfortunately an aggregation won't help here
[15:55:23] <grug> because there is always the chance that there will be a large set of documents that need to be added to the search index
[15:56:21] <kurushiyama> grug: Do you have MMS charts so we can identify the problem?
[15:56:56] <grug> i don't
[15:57:11] <grug> i think the logic flaw is likely in how im batching up requests and sending them to be processed
[15:57:17] <grug> but i can't see the logic flaw
[15:58:05] <kurushiyama> grug: Uhm, I do not get it. You have multiple requests? But expect a single result set?
[16:02:46] <grug> kurushiyama: i never said that. i have a query that has about 1.3 million results, which i stream since it's too big to fit in a single response (due to bson size limits) - what is happening is that i want to batch up the responses (i.e. every time a data event is fired, i add the document that is fired to a batch) and upload them in batches of X (which may be 1,000 for arguments sake) to my AWS service
[16:03:04] <grug> the problem is that the db fires the 'end' event while i am still processing documents from 'data' events
[16:04:47] <oky> "8:13 grug | i do pause the stream, so i imagine that stops the 'end' event from being fired"
[16:04:48] <kurushiyama> grug: Hm. I am no node expert, tbh.
[16:04:54] <oky> grug: lol
[16:05:36] <grug> oky: ?
[16:06:02] <oky> grug: oh, i was just appreciating how the problem solving went. it started with stating your assumptions, then drilling down and figuring out that the assumption wasn't valid
[16:06:34] <oky> the statements: "i imagine it behaves this way", followed by 30 minutes investigation and then "it does not behave this way" made me laugh
[16:07:55] <grug> well im still not entirely sure whether my assumption is valid or not
[16:12:21] <oky> grug: hopefully i'll get more lols, then :P
[16:15:56] <grug> oky: ok so i added a .on('error', function()... handler and got the following error
[16:16:05] <grug> { [MongoError: cursor killed or timed out] name: 'MongoError', message: 'cursor killed or timed out' }
[16:16:23] <grug> so it has to be a timeout thing... but it doesnt seem to matter what i set timeout values to, it still does the same thing
[16:16:36] <grug> and that error doesnt fire every time either
[16:16:52] <grug> sometimes the cursor ends without an error, but doesnt process everything like i expect
[16:18:39] <grug> it's pretty bizarre
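
A rough sketch of the pause/resume batching pattern grug describes, assuming the 2.x node driver; uploadBatch is a hypothetical stand-in for the AWS upload and is not taken from the original code:

    var MongoClient = require('mongodb').MongoClient;

    MongoClient.connect('mongodb://localhost:27017/mydb', function (err, db) {
      if (err) throw err;
      var cursor = db.collection('docs').find({})
        .addCursorFlag('noCursorTimeout', true)   // ask the server not to reap an idle cursor
        .batchSize(1000);
      var batch = [];

      cursor.on('data', function (doc) {
        batch.push(doc);
        if (batch.length >= 1000) {
          cursor.pause();                         // no more 'data' events while the upload runs
          uploadBatch(batch, function () {        // hypothetical upload helper
            batch = [];
            cursor.resume();
          });
        }
      });

      cursor.on('end', function () {              // flush the last partial batch
        uploadBatch(batch, function () { db.close(); });
      });

      cursor.on('error', function (err) {
        console.error(err);                       // e.g. "cursor killed or timed out"
      });
    });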
[16:23:16] <mylord> How to find an element in myarray[0].value == "x" ?
[16:37:14] <ioggstream> hi @all. anybody knows why the explain() indexOnly attribute has been removed in 3.0 ?
[17:10:23] <Bookwormser> What replaced auth=true/false in Mongo 3.2? I'm trying to enable username password authentication in the conf file, but i've not been successful.
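
For reference, in the YAML config format used by 3.x the old auth=true setting corresponds, as far as I can tell, to:

    # mongod.conf (YAML format)
    security:
      authorization: enabled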
[17:52:35] <mylord> how do you find in 0th position in a array?
[17:53:31] <mylord> ie, how to find({"myarray.0.amount":"123"})?
[18:02:05] <kurushiyama> mylord: Iirc, you can not rely on array order. If you have to, there is most likely something wrong with your data model in the first place.
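
For the literal question, the shell does accept positional dot notation for a fixed array index (kurushiyama's caveat about relying on array order still applies); assuming the field path from mylord's document:

    db.payments.find({ "transactions.0.amounts.total": "1.00" }).pretty()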
[18:04:35] <cpama> hi all. i need some ideas on where I might have gone astray with my PHP code. Trying to insert an array into a collection. I've tested to make sure it's legit json. I've manually added the data to the collection using robo mongo and it seems to work.
[18:04:41] <cpama> but i can't get the php code happy
[18:05:08] <cpama> http://stackoverflow.com/questions/37189133/mongo-php-app-error-fatal-error-uncaught-exception-mongoexception-with-messa
[18:05:10] <cpama> that's the code.
[18:05:27] <cpama> Last thing I tried was to play around with the write concerns options...
[18:05:46] <cpama> setting to 0 makes the errors go away, but nothing is actually written to the database
[18:06:01] <cpama> setting to 1 gives me the original error message about an invalid key value
[18:06:26] <cpama> thanks in advance for reading the post / question.
[18:07:31] <Derick> 'test.server.2'
[18:07:36] <Derick> you have . in your keyname
[18:07:38] <Derick> you can't do that
[18:07:42] <cpama> ?
[18:07:49] <cpama> but i can add it manually to db via robomongo
[18:08:06] <cpama> so when you say you can't do that... you mongodb can't handle it? or php?
[18:08:18] <cpama> Derick, ^^
[18:08:28] <Derick> MongoDB
[18:08:39] <Derick> robomongo might do something else to it
[18:08:48] <cpama> Derick, hm... let me do a query via command line to see what it created.
[18:08:53] <Derick> the drivers all check for this
[18:09:22] <cpama> it's in my database with "."
[18:09:24] <Derick> mongodb might accept it - but drivers guard against it
[18:09:29] <cpama> i see.
[18:09:34] <Derick> you can then no longer query on it.
[18:09:39] <Derick> don't put a . in your keynames
[18:10:15] <cpama> is there any documentation that might explain why this is the case Derick
[18:10:24] <cpama> (changing code as we speak...)
[18:10:29] <kurushiyama> cpama: Might well be that the dot gets escaped or sth.
[18:10:35] <Derick> answered on stackoverflow too
[18:10:48] <Derick> 73 $location['_id'] = getNextSequence("locationid");
[18:10:50] <cpama> ok cool. Thanks Derick
[18:11:00] <Derick> that looks like an antipattern too - do not reinvent autoincrements
[18:11:02] <cpama> confusing because the database accepts the value...
[18:11:12] <Derick> it does not work when you need to go beyond one node
[18:11:19] <Derick> (without locking or other issues)
[18:11:24] <kurushiyama> cpama: The reason is simple. A dot has a semantic meaning. Given {foo:{bar:"baz"}}} you'd reference bar as "foo.bar".
[18:11:33] <cpama> kurushiyama, ah!
[18:11:34] <Derick> stick to the auto-generated _id values, or use a real key
[18:11:35] <cpama> light just went on
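
A small shell illustration of the point, with made-up values: the dot in a query key is parsed as a path into nested documents, so a document stored with a literal "test.server.2" key can no longer be matched that way.

    // this query does NOT look for a key named "test.server.2" --
    // it matches { test: { server: { "2": "up" } } }
    db.things.find({ "test.server.2": "up" })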
[18:11:55] <cpama> Derick so I haven't shown the definition of that function... but it is using mongo counters:
[18:12:45] <cpama> Derick, http://pastebin.com/dbH7nmKj
[18:13:01] <Derick> cpama: "mongo counters" is not something real
[18:13:20] <Derick> you're reinventing auto increment there - don't do it, it will bite you later
[18:13:29] <kurushiyama> cpama: Uhm... You are aware of the fact that you should only initialize your client _once_ per application?
[18:13:56] <Derick> you should, although it's clever enough not to make two connections
[18:14:12] <cpama> Derick, this is the article i was trying to follow:
[18:14:18] <cpama> https://docs.mongodb.com/manual/tutorial/create-an-auto-incrementing-field/
[18:14:33] <cpama> Derick so i shouldn't be doing that?
[18:14:51] <kurushiyama> Why is it that the most unfortunate documents always seem to be the most read? ;)
[18:15:00] <cpama> kurushiyama, ok. i will do that. create a global connection object and reuse that...
[18:15:36] <cpama> Derick, kurushiyama so which document/tutorial should i follow?
[18:15:39] <Derick> cpama: you are however using the old deprecated driver, so if this is a new project, please please use the new one: https://github.com/mongodb/mongo-php-driver / https://pecl.php.net/package/mongodb - perhaps in combination with the library: http://mongodb.github.io/mongo-php-library/ / https://packagist.org/packages/mongodb/mongodb
[18:16:09] <Derick> cpama: just let the driver generate _id for you, if you don't have another candidate key yourself
[18:16:20] <kurushiyama> cpama: Here is my advice: KISS, and KISS a lot. Use as few moving parts as you can possibly manage.
[18:16:25] <cpama> Derick, I'm limited to this one for now... because I don't control the environment I build in. However, I have asked my sys admin to look into the updates for me
[18:17:02] <cpama> Derick, so if i insert a PHP array without the "_id" it should just create one, no?
[18:17:04] <Derick> cpama: you're using an old PHP version too
[18:17:13] <Derick> cpama: yes - it will create an ObjectID value as _id then
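
The shell equivalent of what Derick describes, with an illustrative document (the generated value will differ):

    > db.locations.insert({ name: "head office" })       // no _id supplied
    > db.locations.findOne()
    { "_id" : ObjectId("..."), "name" : "head office" }  // driver/shell generated it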
[18:17:41] <cpama> ok. and here I thought I was being a good little programmer by reading the docs.
[18:17:42] <cpama> hee hee
[18:19:14] <kurushiyama> cpama: Which is good. I would suggest playing around with the shell a bit, scrap any GUI tools and get used to MongoDB. One of the best parts you can read is https://docs.mongodb.com/v3.0/applications/data-models/
[18:19:32] <cpama> thanks guys
[18:19:37] <cpama> i will definitely give that a read.
[18:19:43] <kurushiyama> cpama: Experiment by hand, in the shell. use ".explain()" as often as you can.
[18:19:51] <cpama> i'm trying to push our (large) org to move away from RDBMS to nosql
[18:19:58] <cpama> so I have to get a working prototype...
[18:20:03] <cpama> appreciate all the help
[18:20:28] <cpama> fwiw, i started off with couchbase as well, and it's really good.. but i find the support and docs for mongodb are ... way better.
[18:20:36] <cpama> thanks for all your hard work!
[18:22:48] <Derick> no prob :-)
[18:23:18] <kurushiyama> I just say "aggregation pipeline"...
[18:29:05] <cpama> Derick, is there anything else I need to do after the insert? i've changed all "." to "-" and am using this for my insert statement:
[18:29:18] <cpama> $cursor = $collection->insert($location, array("w"=>1));
[18:29:18] <cpama> 75 var_dump($cursor);
[18:29:26] <cpama> oops . 75 = line number.
[18:29:33] <cpama> when I dump the cursor i see this:
[18:29:40] <cpama> array (size=4)
[18:29:40] <cpama> 'ok' => float 1
[18:29:40] <cpama> 'n' => int 0
[18:29:40] <cpama> 'err' => null
[18:29:40] <cpama> 'errmsg' => null
[18:29:48] <cpama> but no records in the collection.
[18:30:51] <cpama> bah. typo.
[18:30:55] <cpama> my bad. i figured it out
[18:31:36] <Derick> you don't have to specify w=1 anymore, it's the default
[18:32:00] <cpama> ok
[19:06:48] <dreamreal> w/in 3
[19:30:02] <erg0dic> if I run an aggregation on mongo 2.4, is there no way to send the result of the aggregation straight into a collection without first loading it in memory?
[19:32:47] <kurushiyama> erg0dic: You mean on the server?
[19:35:40] <erg0dic> kurushiyama: correct, like I run db.collection.aggregate([pipeline]) in the shell on the server
[19:36:59] <kurushiyama> erg0dic: Uhm, so how should that work, given the premise that the output of the current stage is supposed to be the input of the next?
[19:37:51] <erg0dic> so in 2.6 there is an $out stage operator that sends the result set straight to a collection
[19:38:36] <erg0dic> but it does not exist in 2.4, further db.collection.aggregate([pipeline]) does not appear to be able to return a cursor so you get an array of the entire result set at once
[19:39:22] <kurushiyama> erg0dic: Right, but still it is in RAM, sorta. On the server. What prevents you from upgrading, if I may ask? 2.4 is archeology, so to say. 2.6 is almost at EOL.
[19:40:01] <erg0dic> circumstances beyond my control :)
[19:41:33] <kurushiyama> Daring decision. Well, you have to live with what you are given, and walk down that line. I'd report back the estimated development time and impact on UX and drop a note that this functionality is implemented in 2.6
[19:42:43] <kurushiyama> Maybe with the additional info that 2.4 is not supported any more, should TSHTF.
[19:47:44] <deathanchor> erg0dic: I get around that by coding up my own queries that update another collection
[19:48:07] <deathanchor> the old aggregator was fun to play with, but sucked for performance and large result sets.
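
A sketch of the two approaches in the shell, with illustrative collection names; on 2.4 aggregate() hands back the whole result as one in-memory array, so the manual version deathanchor mentions looks roughly like this:

    // 2.6+: let the server write the pipeline result straight to a collection
    db.events.aggregate([ { $match: { status: "error" } }, { $out: "error_summary" } ])

    // 2.4: the result arrives on the client as a single array
    var res = db.events.aggregate([ { $match: { status: "error" } } ]);
    res.result.forEach(function (doc) { db.error_summary.insert(doc); });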
[19:49:12] <ayee> kurushiyama: /etc/hostname, or `hostname -f` has a random string that gets returned. It doesn't resolve to any name servers. So I want the automation agent to use the ip on the interface.
[19:50:12] <kurushiyama> ayee: Uhm, sorry. Can you describe the problem, again. It has been a while... ;)
[19:50:44] <erg0dic> kurushiyama: thanks anyways! our team is well aware of the limitations, its only a matter of time
[19:51:28] <ayee> kurushiyama: The ops manager / automation agent(s) combination is grabbing `hostname -f`, and passing that around and trying to connect to it. When I try to spin up a replica set of 3, I get host unreachable everywhere.. and I see 'host' as the random string from `hostname -f`.
[19:51:41] <ayee> I want the automation agents to use the interface ip, not `hostname -f`
[19:52:18] <kurushiyama> ayee: Basic requirement. Hostnames are required to be properly resolved. Period.
[20:26:47] <jr3> is there a way to write a query that find a document, selects a property, and then sets a new property based on the selected property?
[20:26:52] <jr3> s/find/finds
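
In the server versions being discussed there is no update operator that can read another field of the same document, so the usual shell workaround is a cursor forEach; a sketch with illustrative names:

    db.things.find({ newField: { $exists: false } }).forEach(function (doc) {
      db.things.update(
        { _id: doc._id },
        { $set: { newField: doc.existingField * 2 } }   // derive the new value however you need
      );
    });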
[20:45:57] <oky> deathanchor: what sort of perf do you get with the current aggregator?
[20:55:03] <kurushiyama> oky: Personally, I get results between "great" and "awesome".
[21:15:46] <oky> kurushiyama: any numbers? with my current mongo install, full table scans on 1 - 3 mill docs take a few seconds
[21:15:54] <oky> (all on a single node)
[21:16:22] <oky> that's with 2.6, though - so just curious what other numbers are like
[21:17:35] <kurushiyama> oky: I rarely do full table scans.
[21:18:27] <oky> kurushiyama: do you do time series queries? when i do a group by time, mongo seems to want to do full scans (regardless of if i'm filtering my docs or not) - i will take a look at the query plan and see what's up in a bit
[21:18:39] <kurushiyama> oky: If you'd give me a sample doc, I could generate an according number of docs and see what performance I can acquire for a given aggregation
[21:19:10] <kurushiyama> oky: Depends on your indices, ofc.
[21:19:21] <oky> kurushiyama: sample doc: {hostname: "blah blah", status_code: 200, timestamp: Date.now() / 1000}. i will verify the indices
[21:19:23] <kurushiyama> oky: And index order.
[21:20:03] <kurushiyama> oky: Date.now() / 1000?
[21:21:04] <kurushiyama> oky: So you want to group by sec?
[21:21:41] <oky> kurushiyama: yeah, group by 600 seconds or something, requires doing a math oper
[21:22:18] <kurushiyama> oky: Hmm. Not sure about this.
[21:22:52] <kurushiyama> Let me first generate the docs. Like 10M, one week span?
[21:23:20] <kurushiyama> oky: with selection of like 5 http codes?
[21:24:46] <oky> sure
[21:24:58] <oky> kurushiyama: now that you mention it, i remember the time query being ungainly - let me paste what mine looks like
[21:25:12] <oky> there's probably a bunch wrong with what i did
[21:25:37] <kurushiyama> oky: Well, nothing better than "accidentally" fixing a prob
[21:28:57] <oky> i don't even remember what it's doing any more, will take a little bit for me to understand how it's doing a group by time
[21:29:37] <oky> kurushiyama: i wouldn't worry about making a time series query, seems a bit annoying after all
[21:30:33] <kurushiyama> oky: Maybe, with that shallow model, InfluxDB is more suited to you?
[21:31:10] <oky> kurushiyama: ha. i was just giving an example doc - but yes, maybe influxdb is faster
[21:32:15] <kurushiyama> oky: That is not the point. I bet we can make MongoDB fast, too. But grouping by time intervals etc is simply easier.
[21:33:01] <oky> kurushiyama: regardless, i am just curious about current mongodb perf. for table scans
[21:34:52] <kurushiyama> oky: collscans always suck
[21:37:44] <kurushiyama> oky: Ok, docs are generated now.
[21:38:53] <kurushiyama> oky: If you want to group by time, I'd always use an according index. Given the fact that ISODates are stored as 64bit ints, even millions of entries translate just to a few MB.
[21:40:33] <kurushiyama> oky: So, how does your aggregation look like – and what do you want to achieve?
[22:07:31] <kurushiyama> oky: I assume roughly something like http://hastebin.com/zejelizenu.sm ?
[22:08:51] <oky> kurushiyama: sure, looks good to me
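
The hastebin paste is not preserved in the log; a plausible shape for that kind of pipeline, grouping the epoch-second timestamps from oky's sample doc into 600-second buckets per status code, might be:

    db.events.aggregate([
      { $match: { timestamp: { $gte: 1462838400, $lt: 1463443200 } } },  // illustrative one-week window
      { $group: {
          _id: {
            bucket: { $subtract: [ "$timestamp", { $mod: [ "$timestamp", 600 ] } ] },
            status: "$status_code"
          },
          count: { $sum: 1 }
      } },
      { $sort: { "_id.bucket": 1 } }
    ])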
[22:10:12] <kurushiyama> oky: Still generating. 3.5M atm, still some way till 10M ;)
[22:10:28] <oky> kurushiyama: what's a query on 1M look like?
[22:10:48] <oky> or even 3.5mil
[22:11:29] <kurushiyama> Let me give it another cigarette, then we'll find out ;)
[22:23:08] <kurushiyama> oky: Unoptimized, on my heavily loaded laptop, 32277 msecs, with 4506002 documents. That is 0.00716311266617 msecs/doc or 7.163... microseconds/doc
[22:23:20] <kurushiyama> oky: Not bad in my book.
[22:24:07] <kurushiyama> oky: But with the use case I took as an example, we have plenty of room to even optimize.
[22:24:50] <oky> is that 32 seconds to search 4million?
[22:25:06] <oky> what is msec? micro or milli?
[22:25:28] <kurushiyama> oky: No. Iterating over _all_ of them. msecs are milliseconds.
[22:25:55] <kurushiyama> oky: But this is rarely what you want. You want a stat for a given time frame, no?
[22:25:59] <oky> using aggregation framework?
[22:26:04] <kurushiyama> oky: Yes
[22:26:25] <oky> i tend to want to run a filter on the data, then take an aggregation grouped by time
[22:26:36] <oky> f.e. filter that host name = foo
[22:26:55] <kurushiyama> okay. Then, we obviously need an index on name
[22:27:08] <oky> did you already put an index on time?
[22:27:53] <kurushiyama> Oky: nope
[22:28:19] <kurushiyama> oky: We should stick to your use cases, however ;)
[22:28:54] <kurushiyama> So we put a match on "hostname", then?
[22:31:29] <oky> sure, but... this was an example dataset with a small number of fields. query usually looks like: "SELECT AGG(col1), AGG(col2) FROM dataset WHERE filter1, filter2, filter3 GROUP BY time_bucket, dim1, dim2, dim3;"
[22:31:49] <oky> the fields being filtered on are usually arbitrary, wouldn't know them ahead of time - but let's say i make an index of every field
[22:32:33] <oky> i think 32s for 4.5 million seems like too much, should be more like 3 seconds
[22:32:35] <kurushiyama> oky: That is not ideal, to say the least. Unindexed matches always lead to a collection scan
[22:33:06] <kurushiyama> oky: My Laptop is quite heavily loaded, mind you, has only 4GB of RAM and the dataset was not preheated ;)
[22:33:35] <kurushiyama> oky: Plus, we iterated over 4.5M docs. All of them
[22:34:03] <oky> kurushiyama: what's your CPU?
[22:34:29] <oky> or laptop model
[22:34:51] <kurushiyama> oky: 2.13GHz Core2 Duo...
[22:35:07] <kurushiyama> oky: Late 2010 MBA
[22:35:29] <oky> yeah, sure - i think the laptop being loaded is a problem
[22:35:46] <kurushiyama> oky: And still we are talking of some 7 _micro_seconds per doc.
[22:38:43] <kurushiyama> oky: What I would suggest for optimizing is to narrow down the time frame. For example for charts, I limit the date range. That is where you can get the biggest performance boosts. Another option is to do preaggregations.
[22:39:25] <oky> kurushiyama: if you limit the date range, doesn't it still do the full table scan? i think that's what i ended up having trouble with
[22:40:48] <oky> kurushiyama: thanks for looking into it
[22:40:59] <kurushiyama> oky: It should not. However, you need to do an early match. Let's say you want the statuses for a hostname named "a" on May 6th: db.example.aggregate({$match:{"hostname":"a",date:{$gt...,$lt}}...)
[22:41:45] <oky> it seems like the perf boosts come when aggregating across multiple machines
[22:43:01] <kurushiyama> oky: Well, that heavily depends. Making proper decisions with early matches, limiting the fields passed to the next stage to the bare minimum (hard to optimize with your data model) and narrowing down to what you really want helps a lot.
[22:55:56] <kurushiyama> oky: With something like that, I am even able to reduce the aggregation for a single day to slightly over a second
[22:56:08] <kurushiyama> On my crappy machine ;)
[23:08:10] <kurushiyama> oky: And we are talking of an average of 563250.25 events/day. Honestly, I can not see this as slow. Not the slightest. Especially on my rather crappy laptop.
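
A sketch of the early-match-plus-minimal-projection approach kurushiyama describes, reusing the illustrative schema from above (the exact pipeline he ran is not shown in the log):

    db.events.aggregate([
      // match first, so an index on { hostname: 1, timestamp: 1 } can be used
      { $match: { hostname: "a", timestamp: { $gte: 1462838400, $lt: 1462924800 } } },
      // pass only the fields later stages actually need
      { $project: { _id: 0, timestamp: 1, status_code: 1 } },
      { $group: { _id: "$status_code", count: { $sum: 1 } } }
    ])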