#mongodb logs for Monday the 24th of June, 2013

[02:02:02] <donCams> hi. how do i create a query to find several documents with "where", sorting, and pagination using spring data mongorepository? i can't find any example
[07:36:28] <[AD]Turbo> ciao all
[08:32:36] <fatih> Hi everyone
[08:32:55] <fatih> I have document like: http://d.pr/i/p9uW
[08:33:11] <fatih> I can fetch the document and find the greatest number in the bar field
[08:33:27] <fatih> however is there any way I can obtain it via mongo query?
[08:33:29] <Derick> uh, can you show actual (json) documents?
[08:33:50] <fatih> or it is better to make it on client side
[08:33:56] <fatih> Derick: no problem, wait a sec
[08:36:02] <fatih> https://gist.github.com/fatih/fe443b73ea4de3886b67
[08:37:14] <fatih> on the client side i'm fetching a single document (querying for username) and find the greatest number
[08:49:19] <Nodex> are you doing this in an aggregation?
[08:53:12] <fatih> Nodex: what do you mean in an aggregation ?
[08:56:39] <Nodex> are you trying to do this in an aggregation
[08:56:54] <Nodex> it really doesn't get any simpler than that when asking a question
[09:00:52] <fatih> Nodex: I don't use aggregation, I just simply use find() to get the document
[09:01:11] <fatih> my question was if it is possible to get the greatest number with a query
[09:01:20] <Nodex> and all you want is the largest number on a sub document?
[09:01:37] <fatih> I'm a newbie, so sorry if my questions don't mean anything to you
[09:01:49] <fatih> yeah the largest number in the bar sub document
[09:02:04] <Nodex> without a map/reduce you cannot do that in a query
[09:02:16] <fatih> ok thanks Nodex
[09:02:25] <fatih> then the best way is to do that on the client side
[09:02:31] <Nodex> much more efficient to do it client side
[09:02:51] <Nodex> most languages have a way to get the largest key in an array
[09:10:23] <fatih> Nodex: I have already implemented this on client side, but I just wanted to ask again to learn if my approach was right or not
[09:10:27] <fatih> thanks again
[09:10:34] <Nodex> no probs
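A minimal client-side sketch of what Nodex suggests, assuming (the collection name and document shape are guesses based on the gist description) a document whose "bar" sub-document holds numeric values:

    // fetch the single document and scan its "bar" sub-document for the largest value
    var doc = db.coll.findOne({ username: "fatih" });   // "coll" and the username filter are placeholders
    var largest = -Infinity;
    for (var key in doc.bar) {
        if (doc.bar[key] > largest) {
            largest = doc.bar[key];
        }
    }
    print(largest);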
[11:12:30] <neophy> I have a collection which has documents something like: http://pastebin.com/RsFK5CsT
[11:13:43] <neophy> how do I aggregate the total count for each hostname?
[11:13:46] <neophy> say
[11:14:03] <neophy> {"hostname" : "aaa", "total" : "5"} {"hostname" : "abc", "total" : "4"} {"hostname" : "bbb", "total" : "3"}
[11:19:34] <kali> neophy: you can use the aggregation framework ($group)
[11:19:39] <kali> neophy: or distinct
[11:19:56] <kali> neophy: or group()
[11:20:10] <kali> aggregation framework is recommended
[11:27:45] <neophy> kali: db.syslogevents.group({key: {'hostname': true},initial: {sum: 0},reduce: function(doc, prev) { prev.sum += 1}});
[11:28:02] <neophy> it will return the count for all days
[11:28:17] <neophy> but I want to get count for today only
[11:29:07] <neophy> how to alter the above query to group based on today timestamp and hostname?
[11:35:05] <aeggea> hello mongodb idiots
[11:36:17] <kali> neophy: add a "cond" to your group, but seriously, you should switch to the AF and get rid of this ugly javascript
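For reference, a sketch of both options kali mentions; the timestamp field name ("ts") and the exact date range are assumptions, since the schema in the pastebin isn't shown here:

    // group() restricted to today's documents via "cond"
    db.syslogevents.group({
        key: { hostname: true },
        cond: { ts: { $gte: ISODate("2013-06-24T00:00:00Z"), $lt: ISODate("2013-06-25T00:00:00Z") } },
        initial: { sum: 0 },
        reduce: function(doc, prev) { prev.sum += 1; }
    });

    // the same count with the aggregation framework
    db.syslogevents.aggregate([
        { $match: { ts: { $gte: ISODate("2013-06-24T00:00:00Z"), $lt: ISODate("2013-06-25T00:00:00Z") } } },
        { $group: { _id: "$hostname", total: { $sum: 1 } } }
    ]);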
[11:37:24] <Derick> aeggea: please keep it civil
[11:39:40] <unsleep> can i do a find inside a second level array?
[11:39:49] <Derick> yes
[11:40:26] <unsleep> i want to do some statistics but i think that putting every visit inside a second level will be a very bad thing. isn't it?
[11:41:47] <Derick> yes, as that makes the size of each document grow and grow and grow - and that means MongoDB will need to move it around on disk a lot
[11:41:58] <Derick> it's better to store each visit in its own document in a different collection
[11:42:24] <kali> and you're at risk of hitting the 16MB limit at some point
[11:43:06] <unsleep> so i was a bit right about it hihihihi
[11:43:32] <unsleep> i think i will split by hour
[11:44:56] <unsleep> i don't want to need a map reduce to process the info and show the stats so i must think it through very carefully
[11:45:26] <Derick> unsleep: instead of Map Reduce, you should most likely use the Aggregation Framework anyway
[11:46:00] <kali> unsleep: that will not completely solve the "growing document" issue
[11:47:20] <unsleep> sure, having a million visits an hour would be a big problem ¬_¬
[11:48:00] <Derick> unsleep: you can either do pre-aggregation (ie, just store the total, and not each visit) or really just have one document in a collection per visit and use A/F
[11:48:11] <Derick> i wouldn't use a mix
[11:49:02] <unsleep> i was thinking about that.. "preprocess" the info with php and take the stress off the db
[11:49:45] <unsleep> i think that using the db like an "on-the-fly calculator" is a big mistake
[11:51:19] <unsleep> i could do two collections but that is very similar to doing one and then a map reduce
[11:57:34] <unsleep> taking into account the 16MB limit i don't see any other way to do it than to store everything and then do an aggregation or map reduce
[12:02:18] <unsleep> or simply forget about storing who did what when.... like you said
[12:17:57] <unsleep> do you recommend to use $inc?
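A sketch of the two approaches Derick describes; the collection and field names here are illustrative, not from the channel:

    // pre-aggregation: one counter document per page per hour, bumped with $inc
    // (third argument = upsert, so the counter is created on the first hit)
    db.stats.update(
        { page: "/index.html", hour: ISODate("2013-06-24T11:00:00Z") },
        { $inc: { visits: 1 } },
        true
    );

    // or: one small document per visit, aggregated later with the A/F
    db.visits.insert({ page: "/index.html", ts: new Date() });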
[13:03:59] <dreamchaser> hi. i want to export all documents created between saturday and sunday to a csv file. when i use mongoexport, the query part always has some errors.
[13:04:27] <dreamchaser> mongoexport --db watersystemdatabase --collection nidatas -q '{date:{ "$gte":ISODate("2013-06-22T00:00:00.000Z"),"$lt": ISODate("2013-06-22T00:00:10.000Z")}}' --out dbdump.csv
[13:04:50] <dreamchaser> assertion: 16619 code FailedToParse: FailedToParse: Bad characters in value: offset:15
[13:05:12] <dreamchaser> while the code will execute correctly in db.nidatas.find
[13:05:17] <dreamchaser> need help...~~
[13:06:14] <dreamchaser> the query above only extracts very little data, i'll try to fix the syntax for now..
[13:10:52] <kali> dreamchaser: you can't use this javascript syntax in the query argument of an export
[13:11:03] <kali> dreamchaser: you need to fall back to the strict json syntax
[13:11:28] <kali> dreamchaser: http://docs.mongodb.org/manual/reference/mongodb-extended-json/ one of these should work :)
[13:12:08] <kali> dreamchaser: (i think it's the "strict" mode)
[13:12:34] <kali> dreamchaser: and you need double quotes around the key names ("date")
[13:12:52] <dreamchaser> thanks kali. looks like Date(epoch) works. another question, does mongoexport need fields to be specified?
[13:13:11] <dreamchaser> is there a way to export all fields?
[13:13:14] <aeggea> no it does not, as you can read from the docs
[13:13:17] <kali> dreamchaser: in csv/tsv, i think it's needed, yes
[13:23:40] <dreamchaser> thanks kali
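For reference, a hedged rewrite of the failing command in strict extended JSON, where dates are {"$date": <milliseconds since epoch>}; --csv needs the field list spelled out, and the field names other than "date" are guesses:

    mongoexport --db watersystemdatabase --collection nidatas \
        -q '{"date": {"$gte": {"$date": 1371859200000}, "$lt": {"$date": 1371859210000}}}' \
        --csv --fields date,value \
        --out dbdump.csv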
[13:38:49] <hotsnow> { "_id" : ObjectId("51c5a886b38ccf329cbb4258"), "net" : { "eth0" : { "ip" : 1 }, "eth1" : { "ip" : 2 }, "eth2" : { "ip" : 3 } } }
[13:38:50] <hotsnow> I have many documents like this. how can i get the doc where ip == 3 ?
[13:45:05] <kali> hotsnow: you can't. you should never use values as keys, only keywords
[13:45:29] <kali> hotsnow: you need to refactor to { "_id" : ObjectId("51c5a886b38ccf329cbb4258"), "net" : [ { "iface": "eth0", "ip": 1 }, ... ] }
[13:50:39] <hotsnow> thanks kali
[13:52:11] <hotsnow> but i don't know the difference between keys and keywords, as you said
[13:53:28] <aeggea> do you know anything at all?
[13:53:51] <hotsnow> I mean I didn't know why I shouldn't define it like that.
[13:53:57] <kali> the "keys" are what is to the left of the ":" in the documents. the keywords are the words that mean something to your application
[14:08:59] <pwned> hotsnow: because if you use an array, you can query it like find({"net.ip": 3})
[14:10:33] <pwned> hotsnow: and you can add to it like update({"_id": 5}, { "$push": { "net": { "iface": "lol0", "ip": 42 } } })
[14:11:41] <hotsnow> thanks pwned, I will try to change my doc definition
[14:11:55] <hotsnow> thanks pwned
[14:12:50] <pwned> in other words arrays are mutable and searchable
[14:15:34] <kali> ha ho ! there was a followup
[14:15:40] <kali> hotsnow: sorry, missed it :)
[14:16:13] <hotsnow> i defined the data structure like a perl hash, so i made a mistake
[14:18:31] <hotsnow> It doesn't matter, thanks everybody
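Putting kali's and pwned's suggestions together, a sketch of the refactored schema; the collection name is a placeholder:

    // store interfaces as an array of sub-documents instead of using names as keys
    db.hosts.insert({
        net: [
            { iface: "eth0", ip: 1 },
            { iface: "eth1", ip: 2 },
            { iface: "eth2", ip: 3 }
        ]
    });

    // the value can now be queried directly
    db.hosts.find({ "net.ip": 3 });

    // and new interfaces can be appended
    db.hosts.update({ _id: ObjectId("51c5a886b38ccf329cbb4258") }, { $push: { net: { iface: "lol0", ip: 42 } } });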
[14:18:47] <bobinator60> if I have an array inside my document, can I have two $elemMatch clauses to have them match two different elements of the array? e.g. values: $elemMatch{type: distance, value: 50}, values: $elemMatch{type: color, value: blue}
[14:19:04] <Nodex> yes
[14:19:18] <bobinator60> Nodex: yes, to me?
[14:19:21] <Nodex> yes
[14:19:34] <Nodex> I don't see any other question being asked ;)
[14:19:49] <bobinator60> Nodex: i just dropped in. there could have been 7 pending questions
[14:20:31] <bobinator60> anyway, thanks, and i thought so. mongoengine is the culprit
[14:20:39] <starfly> Not so much, bobinator60, although there was a pending idiot who's left
[14:21:02] <bobinator60> this completed idiot is leaving, too. thanks!
[14:21:09] <Nodex> I don't have joins visible sorry
[14:21:17] <starfly> not you, bobinator60!
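One caveat to bobinator60's example: repeating the same key ("values") in a single query document means the second clause overwrites the first, so the two $elemMatch conditions are normally wrapped in $and; a sketch, with the collection name assumed:

    db.things.find({
        $and: [
            { values: { $elemMatch: { type: "distance", value: 50 } } },
            { values: { $elemMatch: { type: "color", value: "blue" } } }
        ]
    });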
[14:39:32] <vandnas> test
[14:55:15] <bobinator60> has anyone ever seen this error: Error: unrecognized flag [-] 45
[14:55:41] <bobinator60> it's coming from the mongolab web query, but I don't know if it's a mongolab error or a mongodb one
[14:56:13] <kali> bobinator60: mongodb errors usually have a five-figure error code
[15:01:52] <bobinator60> grrr
[15:03:02] <bobinator60> that makes two pieces of software between me and mongodb that are unreliable today
[18:46:06] <n06> Does anyone know how indexes are distributed across a cluster? does each shard have to hold the full index?
[18:55:10] <kali> n06: each shard has a partial index for the values it owns
[18:55:29] <n06> kali thanks so much
[19:31:10] <saml> update["foo."+userInput] = someData; db.collection.update(someQuery, update); is this safe?
[19:31:42] <saml> I want to update foo.somesubfield of a document
[19:32:25] <n06> if i remember correctly mongo can execute arbitrary javascript, so no that is not safe
[19:32:30] <saml> uncaught exception: can't have . in field names [crop.default]
[19:32:33] <n06> you have to sanitize the user input
[19:32:41] <saml> db.Images.update({_id: "1.jpg"}, {"crop.default": {x:1}})
[19:32:49] <saml> i guess i can't update like that
[19:33:07] <n06> you have a big XSS vuln that way
[19:33:09] <n06> so be careful
[19:33:39] <crudson> saml: use $set to just update one attribute or else you will just replace the document
[19:34:10] <saml> yup just found that out
[19:34:13] <saml> haha my doc is gone
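A sketch of crudson's point using the document from the failed update: without $set the second argument replaces the whole document, with $set only the named field changes:

    // replaces the entire document apart from _id (what happened above)
    db.Images.update({ _id: "1.jpg" }, { crop: { default: { x: 1 } } });

    // updates only crop.default and leaves everything else in the document alone
    db.Images.update({ _id: "1.jpg" }, { $set: { "crop.default": { x: 1 } } });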
[19:37:32] <crudson> re. n06's comments: as long as you are not passing input directly to $where, db.eval() or map reduce function strings, user-inputted javascript can't just end up executing in a query like that.
[19:38:11] <n06> crudson, can or can't? I can't tell if you are agreeing with me or not
[19:38:28] <n06> do only $where and db.eval() execute code?
[19:38:52] <crudson> I am generally disagreeing :) http://docs.mongodb.org/manual/faq/developers/#how-does-mongodb-address-sql-or-query-injection
[19:39:42] <n06> crudson, interesting, thanks for the heads up
[19:39:56] <n06> i would still caution against not sanitizing user input. Never trust your users :)
[19:41:57] <crudson> n06: no probs. There are situations where you have to be careful, but it's a different deal than with SQL strings. A string containing javascript doesn't magically become query json unless using eval/where, as detailed above.
[19:42:56] <kali> using $where or eval() (or even map reduce) is a bad idea anyway
[19:43:14] <kali> it should be the exception
[19:43:35] <kali> (and then the effort to prevent injection becomes minimal)
[19:44:09] <n06> kali, mapreduce is a super useful function, but i have only used it in a user-isolated case.. it's never anywhere near an endpoint
[19:45:15] <n06> crudson, absolutely, i understand now. My point from before was simply that keeping good security practices is always a good idea. Never trust anything or anyone but yourself haha
[19:46:28] <crudson> n06: very true
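A small illustration of the distinction crudson is drawing; userInput is a hypothetical string coming from a client, and the collection name is a placeholder:

    var userInput = "x'; return true; var y='";   // hostile input

    // safe: the string is only ever treated as a value to match against
    db.users.find({ name: userInput });

    // risky: the string is concatenated into javascript the server evaluates
    db.users.find({ $where: "this.name == '" + userInput + "'" });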
[19:48:15] <kali> n06: map reduce is not so useful since the aggregation framework has been introduced :)
[19:53:05] <crudson> kali: eval is not so bad as before with 2.4, with less strict locks and multi-threading. Map reduce is very useful. Try doing a .aggregate on a big collection, or anything more complex than the exposed functions provide.
[21:53:31] <Bartzy> Yo :D
[21:54:52] <Bartzy> I have a comments collection and a photos collection. Each comment has a "photo_id". I want to do a map reduce that gets the number of comments for each photo, but something like a gaussian function
[21:55:21] <Bartzy> so essentially I want to see what is the count of comments for -most- of the photos.
[21:55:23] <Bartzy> How can I do that ?
[22:03:33] <paulkon> Bartzy: so a mean of the number of comment docs corresponding to all photos combined?
[22:03:45] <paulkon> still not sure what you want to do
[22:04:05] <Bartzy> paulkon: I want to see how many photos have 1 comment, how many have 2 comments, 3, 4.. up to something normalized
[22:04:08] <Bartzy> like
[22:04:20] <Bartzy> http://en.wikipedia.org/wiki/Gaussian_function
[22:05:39] <paulkon> and you want to render out a gaussian distribution graph of the comment data
[22:05:39] <paulkon> ?
[22:05:46] <Bartzy> eventually yeah
[22:06:27] <Bartzy> Have no idea how to normalize the data though. I now get that I need data that looks like: 1 comment -> 20 photos, 2 comments -> 35 photos, 3 comments -> 400 photos
[22:06:28] <Bartzy> etc
[22:06:35] <Bartzy> but have no idea how to normalize it to gaussian
[22:09:51] <Bartzy> paulkon: ?
[22:10:17] <crudson> Bartzy: do something like: db.grp.aggregate([ {$group: {_id:'$a', count:{$sum:1}}}, {$group: {_id:'$count', total:{$sum:1}}} ])
[22:10:33] <Bartzy> grp = comments ?
[22:11:04] <crudson> so two $groups in a row. First will create count for each 'thing', second will create distribution of those counts.
[22:11:05] <Bartzy> what is $a ?
[22:11:29] <crudson> yeah, substitute, grp/a etc for your comments/photo_id
[22:11:34] <Bartzy> so $a is photo_id ?
[22:11:34] <Bartzy> cool
[22:11:45] <Bartzy> and with map reduce, would it be 2 map reduces ?
[22:12:01] <Bartzy> one to count each photo's comments, and one to count the distribution (like you said) ?
[22:14:52] <Bartzy> crudson: ? :D
[22:15:11] <crudson> Bartzy: if you have too much data for aggregate then you could do it that way
[22:15:13] <Bartzy> it worked really well, and seems much much faster than the map reduce - is it because it's not in JS ?
[22:16:37] <Bartzy> crudson: If I want to ignore comments where there is no such field like "photo_id" ?
[22:17:48] <crudson> it doesn't have to allocate temp collections for example
[22:18:09] <Bartzy> ok
[22:18:24] <Bartzy> it just seems so much faster, maybe my map reduce was bad. Can aggregate write results to a collection ?
[22:18:43] <paulkon> for each photo in the photos collection return the number of comment docs and send that in an array to the client and render that data with http://www.jstat.org/
[22:18:49] <paulkon> if I understand you correctly
[22:19:01] <paulkon> unless you need that data on the server beforehand
[22:19:35] <crudson> Bartzy: to filter the input documents use $match in aggregation, or 'query' option in mapreduce
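Spelled out for the comments/photos case, crudson's two-stage pipeline with a $match added to skip comments that have no photo_id:

    db.comments.aggregate([
        { $match: { photo_id: { $exists: true } } },           // ignore comments without a photo_id
        { $group: { _id: "$photo_id", count: { $sum: 1 } } },  // comments per photo
        { $group: { _id: "$count", photos: { $sum: 1 } } }     // how many photos have each comment count
    ]);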
[22:20:05] <Bartzy> crudson: I meant write the result of the aggregation to a collection instead of seeing it on the shell
[22:20:09] <Bartzy> like map reduce can
[22:20:27] <crudson> Bartzy: without seeing the mapreduce code I couldn't comment on that
[22:20:34] <crudson> regarding performance
[22:21:19] <Bartzy> it's :
[22:21:37] <Bartzy> http://paste.laravel.com/yh1
[22:21:42] <crudson> if you want to save the output then you can do that when you get the result.
[22:22:57] <paulkon> are you doing this in node?
[22:23:02] <paulkon> client driver
[22:23:10] <Bartzy> paulkon: No, just in the shell
[22:23:13] <paulkon> ah ok
[22:23:31] <Bartzy> just testing stuff out, want to get around some of the questions we have about our data, and see how mongo can solve those
[22:23:43] <Bartzy> I don't really understand when to use aggregation and when to use map reduce
[22:24:02] <Bartzy> other than the fact that aggregation seems easier because of the piping
[22:25:16] <paulkon> aggregation is more efficient depending on what you're doing
[22:25:35] <crudson> it's a matter of data volume and complexity of calculation
[22:26:46] <Bartzy> ok
[22:26:56] <Bartzy> the more complex the more suitable map reduce becomes ?
[22:27:12] <Bartzy> or more accurately, the less suitable aggregation becomes ?
[22:28:17] <paulkon> other way around
[22:32:03] <Bartzy> paulkon: Why ?
[22:32:18] <Bartzy> paulkon: map reduce is code, you can do what ever you need
[23:00:21] <orngchkn> How can I disable the warning "warning: ClientCursor::yield can't unlock b/c of recursive lock ns"? It's flooding my logs (writing about a meg per second)…
[23:01:31] <orngchkn> As I understand it, findAndModify is the "culprit" and they say that it's an error that can be ignored. But how can I get mongo to stop printing it altogether so I don't have to babysit the log filesystem
[23:01:57] <orngchkn> I keep getting degraded mongo perf when the log filesystem hits 100% usage (even though I have logrotate running hourly…)
[23:05:45] <paulkon> https://groups.google.com/forum/#!topic/mongodb-user/s62QnfT8Vbc
[23:06:36] <paulkon> I guess that's expected behavior
[23:36:25] <orngchkn> paulkon: It says in that thread that a good index will usually help the issue. What's the best way to figure out if the indexes that I have in place are insufficient?
[23:36:51] <orngchkn> (I'm using Delayed::Job with Mongoid which ought to have pretty good indices)
[23:37:47] <orngchkn> Googling…
[23:38:57] <crudson> orngchkn: you can run with --notablescan if you want it to error on non-indexed queries
[23:39:35] <crudson> that's the strict way to ensure indexes are there, but that is not the same as them being "appropriate" or "optimal"
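Beyond --notablescan, running explain() on a representative query shows whether an index is being used; the collection and fields below are placeholders, not the actual Delayed::Job schema:

    db.jobs.find({ locked_at: null, run_at: { $lte: new Date() } }).explain();
    // "cursor": "BasicCursor" means no index was used; also compare "nscanned" against "n"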
[23:50:34] <kurtis> Hey guys; I'm working with millions of Documents. During a user's "session", they typically only deal with a subset of these tuples. The largest I've seen so far is 13 million+. I'm running multiple queries on these sets of Documents during a user's session. Would it be smart to cache the ObjectID to all of these Documents using Redis (or something similar?) for quicker queries? Or, is there a better mechanism for caching the set of Documents for all of the