#mongodb logs for Monday the 24th of June, 2013

[02:02:02] <donCams> hi. how do i create a query to find several documents with "where", sorting, and pagination using spring data mongorepository? i can't find any example
[07:36:28] <[AD]Turbo> ciao all
[08:32:36] <fatih> Hi everyone
[08:32:55] <fatih> I have document like: http://d.pr/i/p9uW
[08:33:11] <fatih> I can fetch the document and find the greatest number in the bar field
[08:33:27] <fatih> however is there any way I can obtain it via mongo query?
[08:33:29] <Derick> uh, can you show actual (json) documents?
[08:33:50] <fatih> or it is better to make it on client side
[08:33:56] <fatih> Derick: no problem, wait a sec
[08:36:02] <fatih> https://gist.github.com/fatih/fe443b73ea4de3886b67
[08:37:14] <fatih> on the client side i'm fetching a single document (querying for username) and find the greatest number
[08:49:19] <Nodex> are you doing this in an aggregation?
[08:53:12] <fatih> Nodex: what do you mean in an aggregation ?
[08:56:39] <Nodex> are you trying to do this in an aggregation
[08:56:54] <Nodex> it really doesn't get any simpler than that when asking a question
[09:00:52] <fatih> Nodex: I don't use aggregation, I just simply use find() to get the document
[09:01:11] <fatih> my question was if it is possible to get the greatest number with a query
[09:01:20] <Nodex> and all you want is the largest number on a sub document?
[09:01:37] <fatih> I'm a newbie, so sorry if my questions don't mean anything to you
[09:01:49] <fatih> yeah the largest number in the bar sub document
[09:02:04] <Nodex> without a map/reduce you cannot do that in a query
[09:02:16] <fatih> ok thanks Nodex
[09:02:25] <fatih> then the best way is to do that on the client side
[09:02:31] <Nodex> much more efficient to do it client side
[09:02:51] <Nodex> most languages have a way to get the largest key in an array
[09:10:23] <fatih> Nodex: I have already implemented this on client side, but I just wanted to ask again to learn if my approach was right or not
[09:10:27] <fatih> thanks again
[09:10:34] <Nodex> no probs
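A minimal client-side sketch of what Nodex suggests, assuming (the collection name and document shape are guesses based on the gist description) a document whose "bar" sub-document holds numeric values:

    // fetch the single document and scan its "bar" sub-document for the largest value
    var doc = db.coll.findOne({ username: "fatih" });   // "coll" and the username filter are placeholders
    var largest = -Infinity;
    for (var key in doc.bar) {
        if (doc.bar[key] > largest) {
            largest = doc.bar[key];
        }
    }
    print(largest);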
[11:12:30] <neophy> I have a collection which has documents something like: http://pastebin.com/RsFK5CsT
[11:13:43] <neophy> how do I aggregate the total count for each hostname?
[11:13:46] <neophy> say
[11:14:03] <neophy> {"hostname" : "aaa", "total" : "5"} {"hostname" : "abc", "total" : "4"} {"hostname" : "bbb", "total" : "3"}
[11:19:34] <kali> neophy: you can use the aggregation framework ($group)
[11:19:39] <kali> neophy: or distinct
[11:19:56] <kali> neophy: or group()
[11:20:10] <kali> aggregation framework is recommended
[11:27:45] <neophy> kali: db.syslogevents.group({key: {'hostname': true},initial: {sum: 0},reduce: function(doc, prev) { prev.sum += 1}});
[11:28:02] <neophy> it will return the count for all days
[11:28:17] <neophy> but I want to get count for today only
[11:29:07] <neophy> how to alter the above query to group based on today timestamp and hostname?
[11:35:05] <aeggea> hello mongodb idiots
[11:36:17] <kali> neophy: add a "cond" to your group, but seriously, you should switch to the AF and get rid of this ugly javascript
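For reference, a sketch of both options kali mentions; the timestamp field name ("ts") and the exact date range are assumptions, since the schema in the pastebin isn't shown here:

    // group() restricted to today's documents via "cond"
    db.syslogevents.group({
        key: { hostname: true },
        cond: { ts: { $gte: ISODate("2013-06-24T00:00:00Z"), $lt: ISODate("2013-06-25T00:00:00Z") } },
        initial: { sum: 0 },
        reduce: function(doc, prev) { prev.sum += 1; }
    });

    // the same count with the aggregation framework
    db.syslogevents.aggregate([
        { $match: { ts: { $gte: ISODate("2013-06-24T00:00:00Z"), $lt: ISODate("2013-06-25T00:00:00Z") } } },
        { $group: { _id: "$hostname", total: { $sum: 1 } } }
    ]);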
[11:37:24] <Derick> aeggea: please keep it civil
[11:39:40] <unsleep> can i do a find inside a second level array?
[11:39:49] <Derick> yes
[11:40:26] <unsleep> i want to do some statistics but i think that putting every visit inside a second level will be a very bad thing. isn't it?
[11:41:47] <Derick> yes, as that makes the size of each document grow and grow and grow - and that means MongoDB will need to move it around on disk a lot
[11:41:58] <Derick> it's better to store each visit in its own document in a different collection
[11:42:24] <kali> and you're at risk of hitting the 16MB limit at some point
[11:43:06] <unsleep> so i was a bit right about it hihihihi
[11:43:32] <unsleep> i think i will split by hour
[11:44:56] <unsleep> i don't want to need a map reduce to process the info and show the stats so i must think it through very carefully
[11:45:26] <Derick> unsleep: instead of Map Reduce, you should most likely use the Aggregation Framework anyway
[11:46:00] <kali> unsleep: that will not completely solve the "growing document" issue
[11:47:20] <unsleep> sure, having a million visits an hour would be a big problem ¬_¬
[11:48:00] <Derick> unsleep: you can either do pre-aggregation (ie, just store the total, and not each visit) or really just have one document in a collection per visit and use A/F
[11:48:11] <Derick> i wouldn't use a mix
[11:49:02] <unsleep> i was thinking about that.. "preprocess" the info with php and take the stress off the db
[11:49:45] <unsleep> i think that using the db like an "on-the-fly calculator" is a big mistake
[11:51:19] <unsleep> i could do two collections but that is very similar to doing one and then a map reduce
[11:57:34] <unsleep> taking into account the 16MB limit i don't see any other way to do it than to store everything and then do an aggregation or map reduce
[12:02:18] <unsleep> or simply forget about storing who did what when.... like you said
[12:17:57] <unsleep> do you recommend to use $inc?
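A sketch of the two approaches Derick describes; the collection and field names here are illustrative, not from the channel:

    // pre-aggregation: one counter document per page per hour, bumped with $inc
    // (third argument = upsert, so the counter is created on the first hit)
    db.stats.update(
        { page: "/index.html", hour: ISODate("2013-06-24T11:00:00Z") },
        { $inc: { visits: 1 } },
        true
    );

    // or: one small document per visit, aggregated later with the A/F
    db.visits.insert({ page: "/index.html", ts: new Date() });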
[13:03:59] <dreamchaser> hi. i want to export all documents created between saturday and sunday to a csv file. when i use mongoexport, the query part always has some errors.
[13:04:27] <dreamchaser> mongoexport --db watersystemdatabase --collection nidatas -q '{date:{ "$gte":ISODate("2013-06-22T00:00:00.000Z"),"$lt": ISODate("2013-06-22T00:00:10.000Z")}}' --out dbdump.csv
[13:04:50] <dreamchaser> assertion: 16619 code FailedToParse: FailedToParse: Bad characters in value: offset:15
[13:05:12] <dreamchaser> while the code will execute correctly in db.nidatas.find
[13:05:17] <dreamchaser> need help...~~
[13:06:14] <dreamchaser> the query above only extracts very little data, i'll try to fix the syntax for now..
[13:10:52] <kali> dreamchaser: you can't use this javascript syntax in the query argument of an export
[13:11:03] <kali> dreamchaser: you need to fall back to the strict json syntax
[13:11:28] <kali> dreamchaser: http://docs.mongodb.org/manual/reference/mongodb-extended-json/ one of these should work :)
[13:12:08] <kali> dreamchaser: (i think it's the "strict" mode)
[13:12:34] <kali> dreamchaser: and you need double quotes around the key names ("date")
[13:12:52] <dreamchaser> thanks kali. looks like Date(epoch) works. another question, does mongoexport need fields to be specified?
[13:13:11] <dreamchaser> is there a way to export all fields?
[13:13:14] <aeggea> no it does not, as you can read from the docs
[13:13:17] <kali> dreamchaser: in csv/tsv, i think it's needed, yes
[13:23:40] <dreamchaser> thanks kali
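For reference, a hedged rewrite of the failing command in strict extended JSON, where dates are {"$date": <milliseconds since epoch>}; --csv needs the field list spelled out, and the field names other than "date" are guesses:

    mongoexport --db watersystemdatabase --collection nidatas \
        -q '{"date": {"$gte": {"$date": 1371859200000}, "$lt": {"$date": 1371859210000}}}' \
        --csv --fields date,value \
        --out dbdump.csv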
[13:38:49] <hotsnow> { "_id" : ObjectId("51c5a886b38ccf329cbb4258"), "net" : { "eth0" : { "ip" : 1 }, "eth1" : { "ip" : 2 }, "eth2" : { "ip" : 3 } } }
[13:38:50] <hotsnow> I have many documents like this. how can i get the doc where ip == 3 ?
[13:45:05] <kali> hotsnow: you can't. you should never use values as keys, only keywords
[13:45:29] <kali> hotsnow: you need to refactor to { "_id" : ObjectId("51c5a886b38ccf329cbb4258"), "net" : [ { "iface": "eth0", "ip": 1 }, ... ] }
[13:50:39] <hotsnow> thanks kali
[13:52:11] <hotsnow> but i don't know the difference between keys and keywords, as you said
[13:53:28] <aeggea> do you know anything at all?
[13:53:51] <hotsnow> I mean I didn't know why I shouldn't define it like that.
[13:53:57] <kali> the "keys" are what is to the left of the ":" in the documents. the keywords are the words that mean something to your application
[14:08:59] <pwned> hotsnow: because if you use an array, you can query it like find({"net.ip": 3})
[14:10:33] <pwned> hotsnow: and you can add to it like update({"_id": 5}, { "$push": { "net": { "iface": "lol0", "ip": 42 } } })
[14:11:41] <hotsnow> thanks pwned, I will try to change my doc definition
[14:11:55] <hotsnow> thanks pwned
[14:12:50] <pwned> in other words arrays are mutable and searchable
[14:15:34] <kali> ha ho ! there was a followup
[14:15:40] <kali> hotsnow: sorry, missed it :)
[14:16:13] <hotsnow> i defined the data structure like a perl hash, so i made a mistake
[14:18:31] <hotsnow> It doesn't matter, thanks everybody
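Putting kali's and pwned's suggestions together, a sketch of the refactored schema; the collection name is a placeholder:

    // store interfaces as an array of sub-documents instead of using names as keys
    db.hosts.insert({
        net: [
            { iface: "eth0", ip: 1 },
            { iface: "eth1", ip: 2 },
            { iface: "eth2", ip: 3 }
        ]
    });

    // the value can now be queried directly
    db.hosts.find({ "net.ip": 3 });

    // and new interfaces can be appended
    db.hosts.update({ _id: ObjectId("51c5a886b38ccf329cbb4258") }, { $push: { net: { iface: "lol0", ip: 42 } } });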
[14:18:47] <bobinator60> if I have an array inside my document, can I have two $elemMatch clauses to have them match two different elements of the array? e.g. values: $elemMatch{type: distance, value: 50}, values: $elemMatch{type: color, value: blue}
[14:19:04] <Nodex> yes
[14:19:18] <bobinator60> Nodex: yes, to me?
[14:19:21] <Nodex> yes
[14:19:34] <Nodex> I don't see any other question being asked ;)
[14:19:49] <bobinator60> Nodex: i just dropped in. there could have been 7 pending questions
[14:20:31] <bobinator60> anyway, thanks, and i thought so. mongoengine is the culprit
[14:20:39] <starfly> Not so much, bobinator60, although there was a pending idiot who's left
[14:21:02] <bobinator60> this completed idiot is leaving, too. thanks!
[14:21:09] <Nodex> I don't have joins visible sorry
[14:21:17] <starfly> not you, bobinator60!
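One caveat to bobinator60's example: repeating the same key ("values") in a single query document means the second clause overwrites the first, so the two $elemMatch conditions are normally wrapped in $and; a sketch, with the collection name assumed:

    db.things.find({
        $and: [
            { values: { $elemMatch: { type: "distance", value: 50 } } },
            { values: { $elemMatch: { type: "color", value: "blue" } } }
        ]
    });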
[14:39:32] <vandnas> test
[14:55:15] <bobinator60> has anyone ever seen this error: Error: unrecognized flag [-] 45
[14:55:41] <bobinator60> it's coming from the mongolab web query, but I don't know if it's a mongolab error or a mongodb one
[14:56:13] <kali> bobinator60: mongodb errors usually have a five-figure error code
[15:01:52] <bobinator60> grrr
[15:03:02] <bobinator60> that makes two pieces of software between me and mongodb that are unreliable today
[18:46:06] <n06> Does anyone know how indexes are distributed across a cluster? does each shard have to hold the full index?
[18:55:10] <kali> n06: each shard has a partial index for the values it owns
[18:55:29] <n06> kali thanks so much
[19:31:10] <saml> update["foo."+userInput] = someData; db.collection.update(someQuery, update); is this safe?
[19:31:42] <saml> I want to update foo.somesubfield of a document
[19:32:25] <n06> if i remember correctly mongo can execute arbitrary javascript, so no that is not safe
[19:32:30] <saml> uncaught exception: can't have . in field names [crop.default]
[19:32:33] <n06> you have to sanitize the user input
[19:32:41] <saml> db.Images.update({_id: "1.jpg"}, {"crop.default": {x:1}})
[19:32:49] <saml> i guess i can't update like that
[19:33:07] <n06> you have a big XSS vuln that way
[19:33:09] <n06> so be careful
[19:33:39] <crudson> saml: use $set to just update one attribute or else you will just replace the document
[19:34:10] <saml> yup just found that out
[19:34:13] <saml> haha my doc is gone
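A sketch of crudson's point using the document from the failed update: without $set the second argument replaces the whole document, with $set only the named field changes:

    // replaces the entire document apart from _id (what happened above)
    db.Images.update({ _id: "1.jpg" }, { crop: { default: { x: 1 } } });

    // updates only crop.default and leaves everything else in the document alone
    db.Images.update({ _id: "1.jpg" }, { $set: { "crop.default": { x: 1 } } });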
[19:37:32] <crudson> re. n06's comments: as long as you are not passing input directly to $where, db.eval() or map reduce function strings, user-inputted javascript can't just end up executing in a query like that.
[19:38:11] <n06> crudson, can or can't? I can't tell if you are agreeing with me or not
[19:38:28] <n06> do only $where and db.eval() execute code?
[19:38:52] <crudson> I am generally disagreeing :) http://docs.mongodb.org/manual/faq/developers/#how-does-mongodb-address-sql-or-query-injection
[19:39:42] <n06> crudson, interesting, thanks for the heads up
[19:39:56] <n06> i would still caution against not sanitizing user input. Never trust your users :)
[19:41:57] <crudson> n06: no probs. There are situations where you have to be careful, but it's a different deal than with SQL strings. A string containing javascript doesn't magically become query json unless using eval/where, as detailed above.
[19:42:56] <kali> using $where or eval() (or even map reduce) is a bad idea anyway
[19:43:14] <kali> it should be the exception
[19:43:35] <kali> (and then the effort to prevent injection becomes minimal)
[19:44:09] <n06> kali, mapreduce is a super useful function, but i have only used it in a user-isolated case.. it's never anywhere near an endpoint
[19:45:15] <n06> crudson, absolutely, i understand now. My point from before was simply that keeping good security practices is always a good idea. Never trust anything or anyone but yourself haha
[19:46:28] <crudson> n06: very true
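A small illustration of the distinction crudson is drawing; userInput is a hypothetical string coming from a client, and the collection name is a placeholder:

    var userInput = "x'; return true; var y='";   // hostile input

    // safe: the string is only ever treated as a value to match against
    db.users.find({ name: userInput });

    // risky: the string is concatenated into javascript the server evaluates
    db.users.find({ $where: "this.name == '" + userInput + "'" });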
[19:48:15] <kali> n06: map reduce is not so useful since the aggregation framework has been introduced :)
[19:53:05] <crudson> kali: eval is not so bad as before with 2.4, with less strict locks and multi-threading. Map reduce is very useful. Try doing a .aggregate on a big collection, or anything more complex than the exposed functions provide.
[21:53:31] <Bartzy> Yo :D
[21:54:52] <Bartzy> I have a comments collection and a photos collection. Each comment has a "photo_id". I want to do a map reduce that gets the number of comments for each photo, but something like a gaussian function
[21:55:21] <Bartzy> so essentially I want to see what is the count of comments for -most- of the photos.
[21:55:23] <Bartzy> How can I do that ?
[22:03:33] <paulkon> Bartzy: so a mean of the number of comment docs corresponding to all photos combined?
[22:03:45] <paulkon> still not sure what you want to do
[22:04:05] <Bartzy> paulkon: I want to see how many photos have 1 comment, how many have 2 comments, 3, 4.. up to something normalized
[22:04:08] <Bartzy> like
[22:04:20] <Bartzy> http://en.wikipedia.org/wiki/Gaussian_function
[22:05:39] <paulkon> and you want to render out a gaussian distribution graph of the comment data
[22:05:39] <paulkon> ?
[22:05:46] <Bartzy> eventually yeah
[22:06:27] <Bartzy> Have no idea how to normalize the data though. I now get that I need data that looks like: 1 comment -> 20 photos, 2 comments -> 35 photos, 3 comments -> 400 photos
[22:06:28] <Bartzy> etc
[22:06:35] <Bartzy> but have no idea how to normalize it to gaussian
[22:09:51] <Bartzy> paulkon: ?
[22:10:17] <crudson> Bartzy: do something like: db.grp.aggregate([ {$group: {_id:'$a', count:{$sum:1}}}, {$group: {_id:'$count', total:{$sum:1}}} ])
[22:10:33] <Bartzy> grp = comments ?
[22:11:04] <crudson> so two $groups in a row. First will create count for each 'thing', second will create distribution of those counts.
[22:11:05] <Bartzy> what is $a ?
[22:11:29] <crudson> yeah, substitute, grp/a etc for your comments/photo_id
[22:11:34] <Bartzy> so $a is photo_id ?
[22:11:34] <Bartzy> cool
[22:11:45] <Bartzy> and with map reduce, would it be 2 map reduces ?
[22:12:01] <Bartzy> one to count each photo's comments, and one to count the distribution (like you said) ?
[22:14:52] <Bartzy> crudson: ? :D
[22:15:11] <crudson> Bartzy: if you have too much data for aggregate then you could do it that way
[22:15:13] <Bartzy> it worked really well, and seems much much faster than the map reduce - is it because it's not in JS ?
[22:16:37] <Bartzy> crudson: If I want to ignore comments where there is no such field like "photo_id" ?
[22:17:48] <crudson> it doesn't have to allocate temp collections for example
[22:18:09] <Bartzy> ok
[22:18:24] <Bartzy> it just seems so much faster, maybe my map reduce was bad. Can aggregate write results to a collection ?
[22:18:43] <paulkon> for each photo in the photos collection return the number of comment docs and send that in an array to the client and render that data with http://www.jstat.org/
[22:18:49] <paulkon> if I understand you correctly
[22:19:01] <paulkon> unless you need that data on the server beforehand
[22:19:35] <crudson> Bartzy: to filter the input documents use $match in aggregation, or 'query' option in mapreduce
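Spelled out for the comments/photos case, crudson's two-stage pipeline with a $match added to skip comments that have no photo_id:

    db.comments.aggregate([
        { $match: { photo_id: { $exists: true } } },           // ignore comments without a photo_id
        { $group: { _id: "$photo_id", count: { $sum: 1 } } },  // comments per photo
        { $group: { _id: "$count", photos: { $sum: 1 } } }     // how many photos have each comment count
    ]);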
[22:20:05] <Bartzy> crudson: I meant write the result of the aggregation to a collection instead of seeing it on the shell
[22:20:09] <Bartzy> like map reduce can
[22:20:27] <crudson> Bartzy: without seeing the mapreduce code I couldn't comment on that
[22:20:34] <crudson> regarding performance
[22:21:19] <Bartzy> it's :
[22:21:37] <Bartzy> http://paste.laravel.com/yh1
[22:21:42] <crudson> if you want to save the output then you can do that when you get the result.
[22:22:57] <paulkon> are you doing this in node?
[22:23:02] <paulkon> client driver
[22:23:10] <Bartzy> paulkon: No, just in the shell
[22:23:13] <paulkon> ah ok
[22:23:31] <Bartzy> just testing stuff out, want to get around some of the questions we have about our data, and see how mongo can solve those
[22:23:43] <Bartzy> I don't really understand when to use aggregation and when to use map reduce
[22:24:02] <Bartzy> other than the fact that aggregation seems easier because of the piping
[22:25:16] <paulkon> aggregation is more efficient depending on what you're doing
[22:25:35] <crudson> it's a matter of data volume and complexity of calculation
[22:26:46] <Bartzy> ok
[22:26:56] <Bartzy> the more complex the more suitable map reduce becomes ?
[22:27:12] <Bartzy> or more accurately, the less suitable aggregation becomes ?
[22:28:17] <paulkon> other way around
[22:32:03] <Bartzy> paulkon: Why ?
[22:32:18] <Bartzy> paulkon: map reduce is code, you can do what ever you need
[23:00:21] <orngchkn> How can I disable the warning "warning: ClientCursor::yield can't unlock b/c of recursive lock ns"? It's flooding my logs (writing about a meg per second)…
[23:01:31] <orngchkn> As I understand it, findAndModify is the "culprit" and they say that it's an error that can be ignored. But how can I get mongo to stop printing it altogether so I don't have to babysit the log filesystem
[23:01:57] <orngchkn> I keep getting degraded mongo perf when the log filesystem hits 100% usage (even though I have logrotate running hourly…)
[23:05:45] <paulkon> https://groups.google.com/forum/#!topic/mongodb-user/s62QnfT8Vbc
[23:06:36] <paulkon> I guess that's expected behavior
[23:36:25] <orngchkn> paulkon: It says in that thread that a good index will usually help the issue. What's the best way to figure out if the indexes that I have in place are insufficient?
[23:36:51] <orngchkn> (I'm using Delayed::Job with Mongoid which ought to have pretty good indices)
[23:37:47] <orngchkn> Googling…
[23:38:57] <crudson> orngchkn: you can run with --notablescan if you want it to error on non-indexed queries
[23:39:35] <crudson> that's the strict way to ensure indexes are there, but that is not the same as them being "appropriate" or "optimal"
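Beyond --notablescan, running explain() on a representative query shows whether an index is being used; the collection and fields below are placeholders, not the actual Delayed::Job schema:

    db.jobs.find({ locked_at: null, run_at: { $lte: new Date() } }).explain();
    // "cursor": "BasicCursor" means no index was used; also compare "nscanned" against "n"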
[23:50:34] <kurtis> Hey guys; I'm working with millions of Documents. During a user's "session", they typically only deal with a subset of these tuples. The largest I've seen so far is 13 million+. I'm running multiple queries on these sets of Documents during a user's session. Would it be smart to cache the ObjectID to all of these Documents using Redis (or something similar?) for quicker queries? Or, is there a better mechanism for caching the set of Documents for all of the