PMXBOT Log file Viewer


#mongodb logs for Sunday the 21st of June, 2015

[04:07:08] <redlegion> Is there much benefit to using bson for storage and delivery of files?
[07:54:15] <newtc> Hello, I'm having trouble with a query and I was wondering if anyone can be of assistance
[07:54:23] <newtc> It's similar to this: http://stackoverflow.com/questions/29368141/mongodb-query-latest-record-by-date-for-each-item
[07:55:56] <newtc> I have various records, of which I look for only some (let's say all of those where foo=bar). Of those, I want the latest record for each hash
[07:56:42] <newtc> I tried something like this but it evidently did not work: http://pastebin.com/87MkdNWV
[07:57:06] <styles> sure newtc
[07:57:52] <styles> So what's not working about your query, it's not grouping by the latest record?
[07:59:46] <newtc> Yeah
[08:00:00] <newtc> I'm getting a bunch of records with the same hash, while I should get only one of each
[08:02:33] <newtc> I'm really new to mongo so I'm probably misunderstanding some stuff
[08:11:48] <newtc> this seems to work: http://pastebin.com/f0Fb1XRv
[08:11:56] <newtc> but now I'm not getting the record's ID
[08:17:01] <styles> newtc, sorry you didn't respond fast enough, I ADDed out
[08:17:03] <styles> 1 sec
[08:17:14] <styles> currently importing like 5m records into mongo lol
[08:17:40] <styles> newtc, well if you're grouping the hash it can't know what the id is
[08:18:30] <styles> can you give a few example records and an example output you'd like
[08:48:00] <newtc> styles: For example, from the collection [{"_id": 1, "hash": "foo", date: today}, {"_id": 2, "hash": "foo", date: yesterday}, {"_id": 3, "hash": "bar", date: yesterday}]
[08:48:24] <styles> And you want the latest hash by date?
[08:48:35] <newtc> I would like to receive: [{"_id": 1, "hash": "foo", date: today}, {"_id": 3, "hash": "bar", date: yesterday}]
[08:48:38] <newtc> Yes
[08:48:47] <styles> Is that the data format?
[08:48:50] <styles> Also*
[08:50:22] <newtc> There's one extra key/val I would like to pass
[08:50:29] <newtc> But otherwise, yes (and some other fields I don't care about)
[08:54:56] <newtc> I figured there ought to be a way to receive the ID, since I'm not really aggregating anything but taking a single record. Maybe $group isn't the way?
[08:55:36] <styles> checking
[08:55:53] <newtc> Thanks
[08:56:03] <styles> Well you can just group by hash then sort by time
[08:56:33] <newtc> Isn't that what I'm doing atm?
[08:56:47] <newtc> here http://pastebin.com/f0Fb1XRv
[08:56:48] <styles> o wait blah
[08:56:51] <styles> you don't want to group
[08:57:06] <styles> what you want is a filter on the hash sort by time
[08:57:11] <newtc> not in the aggregational sense
[08:57:16] <styles> right you want find
[08:57:27] <newtc> Yes, but it needs to be the latest for each hash
[08:57:31] <styles> find({ "hash": "hash" }).sort("time")
[08:57:36] <styles> ah
[08:57:42] <newtc> But I don't know the hash, I want all of them
[08:57:55] <newtc> Is there no way to do this whilst retaining the ID?
[08:58:11] <newtc> Something akin to SQL's GROUP BY
[08:58:20] <styles> What do you have?
[08:58:29] <styles> Like what's your input here.. if it's not the hash
[08:58:38] <styles> Do you just want the latest hash and time unique?
[08:58:41] <newtc> A different field in the records
[08:58:43] <styles> show me the sql you'd have
[08:59:17] <styles> http://docs.mongodb.org/manual/reference/operator/aggregation/first/
[08:59:21] <styles> sounds like what you want
[08:59:50] <newtc> but it redefines the ID
[09:00:03] <styles> You mean the ID is wrong?
[09:00:16] <newtc> The "item" field becomes the ID in that example
[09:00:29] <styles> right, your id is _id though? right?
[09:00:50] <styles> So you'd say _id in that case .. or $_id (i forget)
[09:00:58] <styles> I think it' $_id
[09:02:48] <newtc> If I do that it returns all of the records
[09:02:50] <newtc> Not just the latest
[09:04:17] <newtc> I guess that includes the ID in the grouping and thus they aren't grouped together
[09:04:22] <newtc> Maybe $project?
[09:04:23] <styles> right
[09:04:33] <styles> 1 sec there's a way to output it in a new format
[09:04:36] <styles> you probably want to do that
[09:04:44] <styles> sorry trying to fix my project lol
[09:04:58] <newtc> It's cool, thanks for the assistance :)
[09:05:28] <styles> Returns a value from the first document for each group. Order is only defined if the documents are in a defined order.
[09:05:37] <styles> So it depends how you group now
[09:05:40] <styles> If you ONLY group on hash
[09:05:42] <styles> that should be right
[09:05:49] <styles> correct?
[09:08:39] <newtc> Correct
[09:08:45] <newtc> Only if I add ID does it get messed up
[09:09:27] <styles> let me check agg for another pipeline 1 sec
[09:09:36] <styles> http://docs.mongodb.org/manual/reference/operator/aggregation/out/#pipe._S_out
[09:09:52] <styles> wrong thing
[09:10:37] <styles> I wonder if map/reduce would be easier
[09:10:40] <styles> naw
[09:10:52] <styles> i forget agg is like 100x faster lol
[09:11:25] <styles> http://docs.mongodb.org/manual/reference/sql-aggregation-comparison/
[09:11:27] <styles> that may help you too
[09:13:24] <newtc> lesse
[09:18:46] <newtc> Yeah seems like $group isn't the way
[09:26:59] <styles> yeah
[09:27:03] <styles> Group loses the ID
[09:29:31] <newtc> So what other option do I have? Query everything and sort in client?
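For reference, the pattern behind the $first link styles posted is usually written as a sort followed by a group: sort by date descending so that $first sees the newest document per hash, and carry the original _id along explicitly. A minimal mongo-shell sketch, assuming a collection named records and the hash/date/foo fields from newtc's example documents:

    db.records.aggregate([
        { $match: { foo: "bar" } },               // keep only the records of interest
        { $sort:  { date: -1 } },                 // newest first, so $first picks the latest
        { $group: {
            _id:   "$hash",                       // one output document per hash
            docId: { $first: "$_id" },            // the _id of the newest record for that hash
            date:  { $first: "$date" }            // and its date
        } }
    ])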
[13:02:32] <PedroDiogo> hello everybody
[13:03:12] <PedroDiogo> my app will go into production in a week, and I haven't figured out yet how I should deploy my database
[13:04:23] <PedroDiogo> what do you think of using a replica set of 3 Digital Ocean VPSes at $10 each?
[13:05:05] <PedroDiogo> data-wise, I know 30GB will be enough, but I don't know about the speed and the handling of concurrent connections
[13:06:04] <PedroDiogo> also, in a replica set, should all the servers be equal when it comes to performance, or is the primary the one that needs more processing power/RAM?
[13:06:10] <PedroDiogo> sorry for all the questions.....
[13:06:17] <PedroDiogo> I'm still a noob
[13:07:55] <JamesHarrison> they should, broadly speaking, be homogeneous
[13:08:16] <JamesHarrison> as at any moment the master may fail and one of the other boxes becomes master
[13:08:35] <JamesHarrison> digital ocean VPSes have pretty awful performance and might not be the best option
[13:08:53] <JamesHarrison> you'll almost certainly get more performance out of spending $30 on a decent VPS or more on a dedi
[13:13:39] <cqdev> Is there an effective way in Mongo to do multiple sorts with a spatial index based query?
[13:14:38] <cqdev> I'm running into an issue where I need to pull back documents based on geo location and then sort them by another field. The 16MB limit on BSON seems to be killing any potential to solve this problem.
[13:17:33] <cqdev> Essentially I'm trying to pull all records in a 250 mile radius around a point, then sort them by distance/another field
[13:22:41] <bgardner> Pull the records and sort in the application code?
[13:23:36] <cqdev> bgardner: That's too inefficient, the result times would be astronomically high. I've tried a few things but the BSON document limit is just too small to be useful.
[13:24:10] <cqdev> The $near and $nearSphere operations really should return a cursor, it's practically useless if you operate on 1+ million records
[13:24:41] <bgardner> Sounds like your particular use-case may not be a good fit for mongo.
[13:26:55] <cqdev> bgardner: I'm thinking so as well. I've played with OrientDB too; I really wanted to go with a schema-less design, but there doesn't appear to be a NoSQL database that can handle this problem.
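One alternative to find() with $near is the $geoNear aggregation stage, which returns results through an ordinary aggregation cursor and writes the computed distance into a field that later stages can sort on. A rough sketch, assuming a 2dsphere index, a collection named places, and a hypothetical secondary sort field rating:

    db.places.aggregate([
        { $geoNear: {
            near: { type: "Point", coordinates: [ -73.99, 40.73 ] },  // query point (placeholder)
            distanceField: "dist",      // computed distance, in meters here
            maxDistance: 402336,        // ~250 miles in meters
            spherical: true,
            limit: 1000000              // older servers default to returning only 100
        } },
        { $sort: { rating: -1, dist: 1 } }   // then sort by the other field, then distance
    ], { allowDiskUse: true })               // let large sorts spill to disk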
[13:52:58] <PedroDiogo> JamesHarrison: thanks for the feedback! I've just tried to look up some benchmarks and Digital Ocean looked like a nice deal
[13:53:07] <PedroDiogo> what do you recommend then?
[14:38:27] <JamesHarrison> PedroDiogo: I use Memset for the few VPSes I need, most of my things require dedicated hardware or large cloud instances (mostly on AWS)
[14:53:06] <PedroDiogo> hm, ok, thanks. so, no recommendation on fast solutions for small projects ? :)
[18:49:19] <Constg> Hello, I've had a problem for several days now... On one server, I have a query which completely blocks the queue and CPU rises up to 1600%. I've tried db.currentOp({"active" : true, "numYields" : 0, "waitingForLock" : false } ) but the result is empty. Do you have any idea how to find what is blocking?
[19:22:14] <Constg> Do you know how I could find which query blocks the queue?
[20:07:11] <kba> Constg: you could log all queries, then check the log after it freezes?
[20:08:11] <Constg> kba, what's the best way to log all queries?
[20:09:42] <kba> Constg: that depends on your application
[20:10:16] <Constg> ha, I thought about already doing it app-side, but the app continues to send queries...
[20:10:21] <Constg> the server blocks them
[20:10:36] <kba> so they fail?
[20:10:51] <kba> If so, only log successful queries
[20:10:55] <Constg> If I can't find it by tomorrow, I'll log on the app side and log all answers, to check where I stop receiving answers
[20:11:27] <Constg> before that, I did a db.setProfilingLevel(1, 500)
[20:11:37] <Constg> and now I'm waiting for the problem to come back
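Once db.setProfilingLevel(1, 500) is active, operations slower than 500 ms are recorded in that database's system.profile collection, which can be queried like any other collection; currentOp can also be filtered on how long an operation has been running rather than on numYields/waitingForLock. A small sketch of both (the 5-second threshold is arbitrary):

    // most recent slow operations captured by the profiler
    db.system.profile.find().sort({ ts: -1 }).limit(10).pretty()

    // operations still running after more than 5 seconds
    db.currentOp({ active: true, secs_running: { $gt: 5 } })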
[20:34:42] <f31n> is there a way to search not with a string but with an array containing strings to match _ids? meaning I've got an array with ids and I want to select only the documents in the array
[20:41:52] <f31n> rtfm - $in :) thanks :)
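For completeness, the $in operator f31n found matches a field against any value in an array; a minimal sketch with a placeholder collection name and numeric ids:

    var ids = [101, 204, 305];              // the array of _id values to match
    db.things.find({ _id: { $in: ids } })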