PMXBOT Log file Viewer

#mongodb logs for Monday the 19th of October, 2015

[02:49:50] <terabyte> hey
[02:50:24] <terabyte> I ran out of inodes. I've now freed some up. running db.collection.count() still returns "can't take a write lock while out of disk space" I'm about 2gb away from being actually out of disk space. Is there something I've missed?
[03:40:53] <terabyte> how much memory should I have when indexing 20 million documents. each document is relatively small (as printed in json it's about 10 fields, each field is typically an integer or a 1 or 2 english word string).
[08:11:09] <jzp113> hi guys how to set the time range ? the layout like that ISODate("2015-09-14T13:30:21.340Z")
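(The question goes unanswered in the log; a typical range filter on an ISODate field looks roughly like the sketch below, where the collection and field names are illustrative, not from the log.)

```javascript
// hedged sketch: select documents whose created_at falls inside a time range
db.events.find({
  created_at: {
    $gte: ISODate("2015-09-14T00:00:00Z"),   // inclusive lower bound
    $lt:  ISODate("2015-09-15T00:00:00Z")    // exclusive upper bound
  }
})
```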
[12:32:34] <Kosch> hey folks. I want to drop all users from admin database using mongo --eval "JSON.stringify({dropAllUsersFromDatabase:1})" --quiet admin. But the result on stdout is "{dropAllUsersFromDatabase:1}". I use mongo 3.0.6 with disabled auth (i just want to clean up users). Did I miss something there? I'd expect {"ok": 0} or smth.
[12:37:18] <Kosch> ok, got it. must be JSON.stringify(db.runCommand({dropAllUsersFromDatabase:1}))
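(Putting Kosch's fix together, the working form is presumably something like the following; the --eval equivalent is noted in the comment, and the exact output shape may vary by server version.)

```javascript
// sketch of Kosch's working form: actually run the command instead of merely stringifying it
// CLI equivalent: mongo admin --quiet --eval 'JSON.stringify(db.runCommand({dropAllUsersFromDatabase:1}))'
JSON.stringify(db.getSiblingDB("admin").runCommand({ dropAllUsersFromDatabase: 1 }))
// expected output looks like {"n":2,"ok":1} rather than the echoed-back object
```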
[13:19:20] <symbol> I'm implementing pagination by using the objectID timestamp instead of skip() limit(). I'd like to send back a "more" boolean to my views if there's more docs left and if not, hide the button. I'm using the native driver for Node.js and got the query down but can't seem to get the "more" part.
[13:20:04] <symbol> Figured I'd have a cursor and call hasNext() on it but since I'm limiting my cursor it never returns true.
[13:20:13] <symbol> I suppose I could limit application side.
[13:53:28] <symbol> Eh, I've just decided to return one extra document and if its present I'll make the boolean true. Simple.
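(For reference, symbol's "fetch one extra document" trick with the Node.js native driver might look roughly like this sketch; collection and parameter names are made up.)

```javascript
// hedged sketch of ObjectID-based pagination with a "more" flag
// assumes `db` is a connected Db instance from the native Node.js driver
var ObjectID = require('mongodb').ObjectID;

function page(db, lastId, pageSize, callback) {
  var query = lastId ? { _id: { $lt: new ObjectID(lastId) } } : {};
  db.collection('posts')
    .find(query)
    .sort({ _id: -1 })
    .limit(pageSize + 1)                       // ask for one extra document
    .toArray(function (err, docs) {
      if (err) return callback(err);
      var more = docs.length > pageSize;       // the extra doc means another page exists
      callback(null, docs.slice(0, pageSize), more);
    });
}
```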
[14:42:11] <steeze> how would i go about using mongo's aggregation pipeline with $group to group my records by a property that isnt on the collection?
[14:42:34] <steeze> like, i have an array of intervals that have a start/end date. i want to group by these intervals
[14:42:52] <steeze> like where record's end date falls between the intervals begin/end
[15:22:58] <bartj3> I'm trying to create a collection with a TTL index but it never seems to clean up. i've posted a minimal log to gist, could someone check and see what i'm messing up? https://gist.github.com/bartj3/505dd5d57addd7ca1e9c it's mongodb version 3.0.6 btw
[15:25:24] <cheeser> you have /12
[15:25:27] <cheeser> grrr
[15:27:02] <cheeser> bartj3: ok. what you have there is a directive to expire those documents at a specific clock time not after a duration
[15:27:57] <bartj3> true, but that's what i want in this case, so i can specify certain logs to expire in 30 days, others in 1 or 2 days
[15:28:24] <bartj3> but shouldn't the expireAt with "new Date()" or the one with the 2014 date expire right away? (or when the cleaner passes them?)
[15:28:37] <cheeser> you'd need two different indexes on two different fields.
[15:29:02] <cheeser> now, in 3.2 with partial indexes you might be able to differentiate the indexes on the same field.
[15:29:57] <bartj3> i don't think i get you, the difference between clock time and duration is just whether i specify a "expireAfterSeconds" with 0 or for example 3600 correct? the field content, a date, would be the same in both cases right?
[15:30:49] <deathanchor> bartj3 is right http://docs.mongodb.org/manual/core/index-ttl/#expiration-of-data
[15:31:01] <cheeser> "clock time" means expire at this time of day. "duration" means expire after /n/ seconds.
[15:31:53] <bartj3> yeah but technically the "clock time" is just a field with a date in the future with a duration of 0 correct?
[15:31:58] <cheeser> http://docs.mongodb.org/manual/tutorial/expire-data/#expire-documents-at-a-specific-clock-time
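(The two TTL flavours under discussion, roughly as the linked tutorials describe them; collection and field names are illustrative.)

```javascript
// duration style: a document expires N seconds after the value stored in createdAt
db.log_events.createIndex({ createdAt: 1 }, { expireAfterSeconds: 3600 })

// clock-time style: store the exact expiry moment in expireAt and set the TTL to 0,
// which is the pattern bartj3's gist is using
db.log_events.createIndex({ expireAt: 1 }, { expireAfterSeconds: 0 })
```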
[15:32:15] <cheeser> bartj3: come again?
[15:33:24] <deathanchor> bartj3: can you tail your logs when you do the test?
[15:33:35] <deathanchor> it might clue in what is going on
[15:37:24] <bartj3> hms, started a new mongoDB, so i could generate a clean set of logs, and ofc it works now :s
[15:38:31] <bartj3> started it without the default homebrew config, i'll try once more with that config and see if that messes it up
[15:42:58] <bartj3> ok... no clue, sorry for wasting your time
[15:43:09] <bartj3> not sure if restarting did the trick or what happened there
[15:43:14] <bartj3> but works as expected now
[16:00:43] <mylord> When I do a[2] = x, mongo will insert null,null for array indexes 0 and 1. How to solve/remove this?
[16:09:07] <cheeser> what would it look like to resolve that?
[16:20:05] <symbol> If I have a collection that has two types of data stored in it...can I use find to retrieve 5 of each type or will that require aggregate?
[16:21:46] <mylord> cheeser: I don’t want users:[null, null, {uid:2} ], just users:[ { uid: 2} ]
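(What mylord describes, and the usual way around it, sketched in the shell; the collection name is made up, and $push is offered here as the common workaround rather than anything suggested in the log.)

```javascript
// hedged sketch of the null-padding behaviour and a workaround
db.test.insert({ _id: 1, users: [] })
db.test.update({ _id: 1 }, { $set: { "users.2": { uid: 2 } } })
// -> users: [ null, null, { uid: 2 } ]   (positions 0 and 1 are back-filled with null)

// appending with $push instead avoids the padding entirely:
db.test.update({ _id: 1 }, { $push: { users: { uid: 2 } } })
// -> users: [ { uid: 2 } ]
```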
[16:38:11] <symbol> Meh, I don't think that's even possible with aggregate.
[17:12:10] <ngl> Hello.
[17:12:34] <ngl> I've a Java question to do with the MongoDB Java Driver.
[17:13:49] <ngl> It has been a loooong time since I was up in the Java. In JS, however, I would do this: db[collectionName].remove({});
[17:15:03] <ngl> ...where collection is a string key.
[17:15:09] <ngl> collectionName that is.
[17:19:24] <ngl> mongoConnect.db.getCollection(collection).remove({}); ?
[17:22:07] <cheeser> what happened when you tried?
[17:46:53] <ngl> Well, that took awhile because this is nested in a junit TestSuite @BeforeClass setup...
[17:51:58] <ngl> This was the answer: db.getCollection("restaurants").deleteMany(new Document());
[18:02:46] <ngl> Or not... BasicDBObject empty = new BasicDBObject(); - and pass that to remove.
[18:07:36] <jaequery> hi guys, anyone know if theres a way to replicate from mysql to mongodb?
[18:08:16] <ngl> ?
[18:08:32] <ngl> You can migrate. What do you mean replicate?
[18:10:03] <ngl> "cannot delete from system namespace"
[18:10:22] <ngl> WriteConcernException
[18:10:24] <ngl> :\
[18:10:33] <SDr> jaequery, you can write a diff script, and periodically push data there. Be advised, though, that 1, this is a very fragile solution; and 2, mixing different DB systems is not a supported configuration
[18:11:06] <SDr> jaequery, what line of thought led you to ask this question?
[18:11:33] <jaequery> for my reporting server, i want it based on mongo
[18:11:40] <jaequery> all my writes go to mysql atm
[18:14:01] <GothAlice> https://github.com/mongodb-labs/mongo-connector would be the "correct" solution. Not sure if it currently has a MySQL adapter.
[18:14:44] <GothAlice> Ah, despite some documentation indicating MySQL ability (from the blog) and bidirectional use, the README only covers outbound from mongodb. :/
[18:20:20] <GothAlice> jaequery: What specifically can't you do with your data in-place that MongoDB and duplicating your data solves? AFAIK there's basically no efficient way to watch MySQL data "live", thus no possible efficient way to clone the data for reporting in another system.
[18:20:42] <GothAlice> (http://grimoire.ca/mysql/choose-something-else)
[19:10:07] <cerebral_monkey> How can I get rid of the default "admin"/"pass"?
[19:10:28] <GothAlice> There is no default admin password. :|
[19:10:53] <cerebral_monkey> Hrm... I can log in with "admin"/"pass"
[19:11:06] <GothAlice> cerebral_monkey: http://docs.mongodb.org/manual/tutorial/enable-authentication/#create-the-system-user-administrator < the admin user is created by the admin as part of enabling authentication.
[19:12:00] <GothAlice> It's probably a bad idea to remove the admin user (if you have none, you're not going to have much in the way of control over your own cluster) but you can certainly change the password: http://docs.mongodb.org/manual/tutorial/change-own-password-and-custom-data/
[19:12:02] <cerebral_monkey> GothAlice: Ahh, thank you. I see my confusion now..
[19:12:37] <cerebral_monkey> I got confused and didn't realize this package I was using uses a username/pass combo from a config file, not mongodb itself
[19:12:41] <steeze> with mapReduce, how can i access the record's values in the reduce function? in the map function, i can do this.property, but it doesnt seem to be the same in reduce
[19:23:07] <GothAlice> steeze: https://gist.github.com/amcgregor/1623352 may be helpful as an example.
[19:23:49] <GothAlice> Where map is run "over" each candidate record, and may emit zero or more values for reduction, reduce takes the values to reduce as an argument instead. (Important note: the output of the reduce function should match the input to the reduce function, or it can't be iteratively reduced.)
[19:24:09] <steeze> ive got mine working, just running into issues when it recursively does it
[19:24:19] <GothAlice> Likely because of the latter point.
[19:24:22] <steeze> my reduce doesnt return the same structure as my emits values
[19:24:31] <GothAlice> Which is a killer, as you've noticed. ;)
[19:24:33] <steeze> but i build my emits value in a way that allows me to get information from the record
[19:24:56] <steeze> http://pastebin.com/ns69aHiK
[19:25:02] <steeze> there's my mapReduce
[19:25:44] <steeze> in my reduce, im counting some stuff based on some conditions, but i need the record's timestamp and user to do my filtering. and doing this.timestamp/this.user like in the map doesnt work
[19:25:59] <GothAlice> Then you are emitting the wrong values.
[19:27:09] <GothAlice> Your filtering should happen in the query, where possible, then in the map. Not in the reduce.
[19:27:47] <steeze> the way my documents are structured, i dont think it's possible
[19:27:51] <steeze> that's why im using mapReduce in the first place
[19:27:58] <GothAlice> Unfortunately due to time constraints I can't do more than provide a working known-good example and an outline of the procedure, as above. (The output of reduce matching the input of reduce is the requirement that is currently not met by your code.)
[19:28:26] <steeze> i know why my code is wrong, but to fix it i can either do emit(interval, 1) but then i dont have timestamp and user available in reduce
[19:28:52] <GothAlice> So, don't put the entire set of conditionals in reduce…
[19:28:57] <steeze> or is it possible to add a new property to that object? and ill increment in my reduce and return the properly formatted obj
[19:29:16] <steeze> i guess i could use finalize to munge that data before my response?
[19:29:41] <steeze> ill need to delete some properties before responding to the client
[19:29:56] <GothAlice> There are no particular limitations placed on the values you are emitting in your map for feeding into the reduce, nor on the object resulting from the reduce, other than it be an object. What object it is, and what attributes it has, is entirely up to you.
[19:30:13] <steeze> okay, ill play around with that
[19:31:15] <GothAlice> As you can see from my example, I'm turning individual record dates into a per-month breakdown. Because the input to reduce needs to match the output to reduce, I have the map stage emit the final record structure from the get-go. (A mapping of months to counts.) On individual records, obv. the count will only ever be one. Reduce then merges the records and tallies them up.
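(A minimal sketch of the shape GothAlice describes, not her actual gist: the map stage emits the final structure with per-record counts of one, and reduce returns that same structure so it can safely be re-reduced. Field and collection names are assumptions.)

```javascript
// map emits the same structure reduce returns, so iterative re-reduction is safe
var map = function () {
  var month = this.timestamp.getUTCMonth();     // assumes a `timestamp` Date field
  var counts = {};
  counts[month] = 1;                            // on a single record the count is always 1
  emit("activity", { months: counts });
};

var reduce = function (key, values) {
  var out = { months: {} };
  values.forEach(function (v) {
    for (var m in v.months) {
      out.months[m] = (out.months[m] || 0) + v.months[m];   // merge and tally
    }
  });
  return out;                                   // same shape as what map emitted
};

db.events.mapReduce(map, reduce, { out: { inline: 1 } });
```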
[19:32:52] <steeze> i think using the key as your objects first value is what ill need to do
[19:33:00] <steeze> building out my objs differently anyway
[19:34:13] <GothAlice> On the other hand, your condition on line 13 seems to be duplicating the query on line 27/28 (not truly duplicating because of the unwinding going on) and in general looks like it would be far, far, far easier to do as an aggregate, not as a map/reduce.
[19:34:21] <GothAlice> (Same with my example; it'd be easier to do as an aggregate.)
[19:35:12] <steeze> i tried aggregate but i dont know how to group on a variable that's not part of my collection
[19:35:20] <steeze> let's take a step back
[19:35:29] <steeze> the query is filtering based on a large time period
[19:35:33] <GothAlice> In your case, you have a $match on your timestamp criteria, $unwind on intervals, $match on intervals, then $group on 1 (to collect all) or a time slice (to group by individual ranges) using $addToSet (which only stores unique values) to gather the unique users.
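(GothAlice's suggested pipeline, sketched with made-up field names and placeholder dates; note it assumes the intervals live on the documents, which steeze clarifies just below is not his case.)

```javascript
// hedged sketch of the $match / $unwind / $match / $group shape described above
var start = ISODate("2015-10-18T00:00:00Z"), end = ISODate("2015-10-19T00:00:00Z");

db.records.aggregate([
  { $match: { timestamp: { $gte: start, $lt: end } } },     // timestamp criteria
  { $unwind: "$intervals" },                                 // one doc per interval entry
  { $match: { "intervals.begin": { $lte: end },              // keep only intervals of interest
              "intervals.end":   { $gt:  start } } },
  { $group: { _id: "$intervals.begin",                       // or a constant to collect all
              users: { $addToSet: "$user" } } }              // unique users per interval
])
```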
[19:35:40] <steeze> the reduce filters on intervals that make up the larger timeperiod
[19:35:57] <steeze> my intervals are calculated on the server and are not part of a collection
[19:37:49] <GothAlice> https://gist.github.com/amcgregor/1ca13e5a74b2ac318017#file-eek-py-L9-L22 is an example aggregate that takes per-hour (or any interval) time series data and collects general count statistics broken down by day of the week.
[19:38:29] <GothAlice> You can see how I'm grabbing the data I care about ($project), then $group'ing on an extracted value of the "hour" field. The internal intervals matter less than the intervals you want to produce as output.
[19:39:11] <GothAlice> (Optionally using a $match stage if criteria were provided to the function.)
[19:40:12] <steeze> hmm
[19:40:20] <GothAlice> (The behind-the-scenes intervals for pre-aggregated data storage just saves on the individual number of records that need processing to answer any given question. Per-hour in my dataset means I'm querying at most 24 records per day to produce reports.)
[19:40:58] <GothAlice> See also http://www.devsmash.com/blog/mongodb-ad-hoc-analytics-aggregation-framework which dives into some of the querying patterns and implications for storage space, index efficiency, and performance, for this type of data.
[19:41:04] <steeze> not sure if mongo's date stuff gives me what i need to group by my intervals
[19:41:36] <steeze> this is getting closer though. ill look into aggregation more
[19:41:39] <GothAlice> If you're willing to have a second of inaccuracy, and timezone unawareness (I hope all your stored dates are in the same TZ…), convert to UNIX timestamp. From that point, it's just math.
[19:42:35] <steeze> they shouldnt be time zone aware, it's for building out a chart of active users in the last X hours
[19:43:14] <GothAlice> timestamp(record.value) % timestamp(hours=48) — group per-hour for a given 48 hour period. (P-code, but the idea is there. For grouping UNIX timestamp time, modulo is your friend.)
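(One way to express that "it's just math" idea inside an aggregation, assuming a `timestamp` Date field; $subtract between two dates yields milliseconds, which can then be floored to the hour with $mod.)

```javascript
// hedged sketch: bucket documents into hours by flooring the epoch-millisecond value
var HOUR = 1000 * 60 * 60;

db.events.aggregate([
  { $project: {
      user: 1,
      epochMs: { $subtract: ["$timestamp", new Date(0)] }   // Date - Date => milliseconds
  }},
  { $project: {
      user: 1,
      hour: { $subtract: ["$epochMs", { $mod: ["$epochMs", HOUR] }] }  // floor to the hour
  }},
  { $group: { _id: "$hour", activeUsers: { $addToSet: "$user" } } },
  { $sort: { _id: 1 } }
])
```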
[19:44:22] <GothAlice> Interestingly, I track active users very differently.
[19:44:37] <steeze> what do you do?
[19:45:01] <steeze> ive been through like 4 iterations with this module and ive tried to pick the best way but im still pretty fresh
[19:45:46] <steeze> my boss told me to record all api calls so ive got records that look like {timestamp: <time of api call>, user: {usr obj}, method_route: <method and route>}
[19:45:56] <steeze> and im trying to filter these out to build data for a chart that shows active users/time
[19:46:15] <GothAlice> Every login attempt saves a document to db.attempt indicating source, account, and success/failure. These records have a TTL index which automatically deletes data older than 90 days. When a success record is inserted, the pre-aggregated statistics are updated (using an upsert) to increment the active user count for the current hour. (One record per hour, also with a TTL index to nuke things that are older than 30 days.)
[19:46:18] <steeze> i have some other queries that group by user and push timestamps and methods to get a look into their workflow
[19:46:49] <GothAlice> Getting the active user count is then db.statistics.findOne({'_id': 'active_user_count'}, {'value': 1}).value
[19:46:51] <GothAlice> Instant query.
[19:47:16] <GothAlice> (Roughly. Ugh… typing too fast for my p-code to be good code. ;)
[19:47:55] <steeze> active for us is defined by doing something in the last ten minutes
[19:48:09] <steeze> but the idea is similar still, let me disect all that
[19:49:14] <GothAlice> I'm only tracking general counts, but if you wanted to actually identify which users were active in any given hour, an $addToSet in the upsert would work quite well.
[19:49:46] <steeze> and that's just a unique $push basically?
[19:49:50] <GothAlice> (Generally, pre-aggregate into slices as close to your presentation slice size as possible. I.e. the step value of the time axis on your chart.)
[19:49:54] <GothAlice> Exactly.
[19:50:13] <steeze> what do you mean by pre aggregate?
[19:50:22] <GothAlice> Have a gander at my last example again.
[19:50:31] <GothAlice> (The eek.py one. ;)
[19:52:26] <GothAlice> Storing a "hit" requires space for all of the various bits of information about the hit: source, destination, browser user agent, other headers, etc. The file at the bottom of the gist (sample.py) shows what that data pre-aggregated looks like. It's an extraction of the meaning of those values (i.e. which browser, OS, etc. is in use) for a given time period and unique combination of selection criteria (notably: company, job, and source
[19:52:26] <GothAlice> are the "unique key" here.)
[19:52:55] <GothAlice> It's done to speed up reporting by reducing the amount of data that needs to be processed to produce the charts.
[19:52:56] <GothAlice> http://www.devsmash.com/blog/mongodb-ad-hoc-analytics-aggregation-framework provides an excellent overview of the process.
[19:53:25] <steeze> ahhh okay i think i get it
[19:53:31] <GothAlice> Pre-aggregation is like pre-mapping your map/reduce.
[19:53:31] <GothAlice> :)
[19:53:48] <steeze> ahh who knew id be getting so deep into mongo today :)
[19:54:06] <GothAlice> (And upserts let you keep pre-aggregated data up-to-date with fire-and-forget queries. I.e. no need to check if a time period exists to insert it first.)
[19:54:08] <steeze> ive been in angular the last 3 months, so this is pretty different ha
[19:54:23] <steeze> im sure ill understand all of what you said before too long :) haha
[19:54:34] <steeze> it's cloudy, but it's coming together
[19:54:34] <GothAlice> Read that last article. It should greatly enlighten. :)
[19:56:45] <GothAlice> I've traded 14ms to fire off an extra query on each login for a reduction by orders of magnitude in the time it takes to answer the question "Can I get a chart of user activity comparing the last two weeks to the two weeks prior?"
[19:57:05] <GothAlice> http://s.webcore.io/image/3c1J2p091D1W < looks roughly like this in the end. ;)
[19:58:10] <steeze> haha nice
[19:59:02] <steeze> this stuff is kickin my butt haha
[19:59:14] <GothAlice> http://s.webcore.io/image/142o1W3U2y0x is an older sample of our overall dashboard, all powered using pre-aggregation and the aggregation framework. (No map/reduce was harmed in the production of that page.) Pre-aggregation is the only way we were able to get this page to generate in a reasonable amount of time.
[19:59:51] <steeze> oh wow i need your knowledfe
[19:59:55] <steeze> knowledge*
[20:00:01] <steeze> ill be needing to do a lot of similar stuff
[20:00:10] <steeze> ill keep reading the stuff you gave me and referencing your examples
[20:00:39] <GothAlice> Making the stats comparative ("vs. others") was painful from a query performance perspective. ;)
[20:00:50] <steeze> that's definitely a down the road thing
[20:01:02] <steeze> i just want them to be able to see one set of data for now haha
[20:04:05] <GothAlice> But, fun fact, pre-aggregation saved our bacon on the comparisons, too. We store pre-aggregations of the data grouped to the (company, job), but also just (company), and by (job.category), and (company.category), and by "1". Then comparing a value vs. everybody means we load up the "1" pre-aggregate record. Or vs. everyone in your industry just needs the "company.category" pre-aggregate.
[20:04:12] <plodder__> hello, can anyone point me to the latest documentation for the mongodb wire protocol?
[20:04:23] <plodder__> the one on the website is legacy..
[20:04:27] <GothAlice> I.e. it doesn't go from vs. one company O(1) to vs. everyone O(n) where n is the number of companies, it's simply O(2).
[20:04:38] <steeze> oh wow very nice
[20:04:53] <steeze> O(n) es no bueno
[20:05:06] <GothAlice> steeze: https://github.com/mongodb/specifications
[20:05:11] <GothAlice> Oops, sorry.
[20:05:12] <GothAlice> plodder__: https://github.com/mongodb/specifications
[20:05:15] <steeze> heheh
[20:07:28] <plodder__> @GothAlice: thanks, but short of parsing the source code, is there a separate doc?
[20:08:12] <plodder__> @GothAlice: oh sorry, i guess this *is* the doc
[20:08:22] <plodder__> thanks!
[20:21:16] <mpg456> Hi all, what should I do if I'm trying to get 10gen-gpg-key.asc , but it's 404?
[20:24:41] <steeze> there's another one here i think https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=2&ved=0CCMQFjABahUKEwjjy-zzrc_IAhUE0WMKHQE5BdM&url=https%3A%2F%2Fdocs.mongodb.org%2F10gen-security-gpg-key.asc&usg=AFQjCNEKUo9x1CakGOFPg9xK1ub5aYETDQ&sig2=RP_JwKJ15z9699t8kNYONA&bvm=bv.105454873,d.cGc&cad=rja
[20:25:01] <steeze> whoops, https://docs.mongodb.org/10gen-security-gpg-key.asc
[20:25:39] <mpg456> thanks steeze!
[20:37:06] <steeze> GothAlice, so doing the pre-aggregation makes some assumptions right? like the length of my intervals (in your case 1hr)
[20:37:33] <GothAlice> Yes, you base the assumptions on the facts of the display, though.
[20:38:05] <steeze> hmm okay. would a resolution of 10min cause performance issues?
[20:38:10] <GothAlice> An hourly period was chosen as the finest-grained because that's the closest our charts let you zoom. (Where each step in the X axis is one hour.)
[20:38:18] <GothAlice> You can do napkin calculations.
[20:39:06] <GothAlice> If you have a chart showing 24 hours of data and your data is divided up into 10 minute segments, that's 144 records to aggregate for display.
[20:39:16] <GothAlice> If you have a chart showing 24 hours of data and your data is divided up into hourly minute segments, that's 24 records to aggregate for display.
[20:39:54] <GothAlice> s/hourly minute/hourly/
[20:40:10] <steeze> ah gotcha. i wasnt sure of all the mechanisms to know where performance costs lie
[20:40:11] <steeze> lay
[20:40:45] <GothAlice> The pre-aggregation article I linked goes into other ramifications of how you pre-aggregate, such as document storage overhead and index size/efficiency.
[20:41:20] <steeze> just to make sure i follow the steps:
[20:41:32] <steeze> you do pre-aggregation whenever a login attempt is made? in that upsert?
[20:41:47] <GothAlice> That's an upsert.
[20:41:50] <steeze> and you only have one record/user/hour?
[20:41:57] <GothAlice> One record per hour.
[20:42:12] <steeze> one record/hour of all login attempts and user?
[20:42:29] <GothAlice> One record/hour for all login attempts regardless of user.
[20:43:09] <steeze> ok
[20:43:22] <GothAlice> db.activity.update({period: utcnow().replace(minute=0, second=0, microsecond=0)}, {$inc: {count: 1}, $addToSet: {users: user.username}}, upsert=True)
[20:43:37] <GothAlice> It's also quite efficient: if there is no activity in a given hour, there is no record for that hour.
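(GothAlice's p-code from a moment ago, rendered as actual shell syntax; the username is a placeholder.)

```javascript
// hedged sketch of the per-hour pre-aggregation upsert described above
var username = "alice";                          // placeholder for the authenticated user
var hour = new Date();
hour.setUTCMinutes(0, 0, 0);                     // truncate to the top of the current hour

db.activity.update(
  { period: hour },                              // one document per hour
  { $inc: { count: 1 },                          // bump the activity counter
    $addToSet: { users: username } },            // optionally track which users were active
  { upsert: true }                               // create the hour document if it is missing
)
```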
[20:44:21] <steeze> im thinking i can structure my data differently to make this easier
[20:44:45] <GothAlice> Pre-aggregation gives you the opportunity to store data in a way that more closely matches how you are going to use/display it.
[20:44:59] <steeze> instead of creating a doc for each api call, ill do something similar to you. ill create a doc for each user/hour and use $addToSet to add the timestamps and methods_and_route
[20:45:12] <steeze> upsert when the user makes an api call
[20:45:13] <GothAlice> That still sounds backwards.
[20:45:19] <steeze> ahhhh
[20:45:38] <GothAlice> $addToSet and timestamp together are the "bad sign" in my eyes.
[20:45:58] <steeze> well id have an array of timestamps that correspond to an array of timestamps so i can see when my user did what
[20:46:22] <steeze> and since there's 1 record/hour/user, i can easilly count those unique users for my charts
[20:46:33] <steeze> correspond to an array of method's and verbs*
[20:46:44] <GothAlice> Why do you keep including "user" in that set?
[20:47:03] <GothAlice> And additionally, do you actually report with second-accurate values?
[20:47:04] <steeze> i need history of what users are doing and when
[20:48:03] <GothAlice> There's a world of difference between needing a history of what users are doing and when for the purpose of accurate automated auditing, for example, vs. display on a page for consumption by human eyeballs. The latter doesn't require much in the way of accuracy, thus in my example completely throwing away the minute/second/microsecond data.
[20:48:36] <steeze> more like, all this stuff got deleted at this time, let's see what user was doing what when
[20:48:37] <GothAlice> I.e. the question that started this revolved around the _number_ of active users for a given period or range of periods. That in no way requires time data more accurate than the period size.
[20:49:31] <steeze> im not only charting active users/time, but also actions/time per user
[20:49:48] <GothAlice> But what is the granularity of your display?
[20:49:56] <GothAlice> I.e. the step size in the X axis on your chart.
[20:50:41] <steeze> the first one will be 1hr (just decided that for the pre-aggregation), the latter will just be a chart of actions and timestamps
[20:50:57] <GothAlice> :|
[20:51:11] <GothAlice> "just a chart of actions and timestamps" — what granularity on the X axis, then?
[20:51:17] <steeze> im still typing ha
[20:51:51] <GothAlice> I.e. if your chart for a 24 hour period is 300 pixels wide, and you throw infinite accuracy data at it, the best you're going to get is 4.8 minutes per pixel.
[20:53:06] <steeze> im at a loss for words right now, cant think of how to say what im thinking ha
[20:54:37] <steeze> i dont need the second chart to be accurate at all, it really just shows the order of their actions
[20:55:11] <steeze> (with timestamps, but the data doesnt have to map accurately to the x axis' linear scale)
[20:58:26] <steeze> wish i had a better project manager that told me what i needed to do for this. this is only my 4th month of full time devving
[20:58:48] <steeze> i enjoy solving these problems, but ive been chasing rabbit hole after rabbit hole it feels :/
[21:10:41] <steeze> thanks for the help GothAlice! gotta walk home from the office
[21:10:48] <GothAlice> No worries.
[21:18:54] <ah-> hi, what's the most performant way to look up a ton of documents by key?
[21:19:23] <ah-> so i have a lot of ~1kb documents, keyed by the hash of the document
[21:19:50] <ah-> and i want to query lets say 1000000 of them at once
[21:20:02] <ah-> is there anything to keep in mind to make that read fast?
[21:20:22] <ah-> currently i'm doing a huge find $in query on the _id
[21:20:23] <cheeser> index that key?
[21:20:36] <ah-> should be indexed if its the _id?
[21:21:03] <cheeser> yes. the _id is always indexed.
[21:24:03] <ah-> it still takes a while to read all that, and suspiciously longer than it takes to write these documents
[21:24:29] <ah-> which i think must be doing pretty much the same lookup, since it does return me duplicate key errors
[21:26:09] <ah-> is there a way to help mongo find things faster? by sorting the lookup keys first or something?
[21:26:26] <ah-> or would bulk queries make sense?
[21:30:05] <cheeser> probably a bad query. run explain on your query and see what indexes it's using.
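(cheeser's suggestion in shell form, roughly; the collection name and the key list are placeholders.)

```javascript
// hedged sketch: confirm the $in lookup is actually an index scan on _id
var listOfHashes = [ /* the document hashes being looked up */ ];
db.docs.find({ _id: { $in: listOfHashes } }).explain("executionStats")
// look for an IXSCAN stage, and compare totalKeysExamined / nReturned with executionTimeMillis
```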
[22:46:17] <morenoh149> how do I query for all items that match any of XYZ tags but only 10 of each (limit for each)
[23:02:50] <symbol> Is findAndModify overkill for finding or creating a user?
[23:04:28] <cheeser> what part would you be modifying?
[23:05:04] <symbol> If it's found, nothing. If not, create the user. I'm using github authentication.
[23:05:59] <symbol> I did notice that it's deprecated in the native driver 3.0
[23:06:04] <cheeser> and then you want to return the user account, yes? findAndModify is probably fine then. use $setOnInsert for any fields you only want to set when doing the initial creation.
[23:06:17] <cheeser> that particular method. not the feature.
[23:06:52] <symbol> Huh? The method is deprecated but not the feature...not sure what that means?
[23:07:21] <cheeser> the feature on the server (and the driver) isn't going away. just that particular function to access it.
[23:07:59] <symbol> Ah, so I still shouldn't be using findAndModify via the driver?
[23:08:12] <cheeser> you should. just a different function in the driver, i'm guessing.
[23:08:26] <joannac> morenoh149: aggregation
[23:09:16] <symbol> Looks like I have to use findOneAndUpdate instead. I suppose that makes sense.
[23:09:32] <symbol> The intention is more clear there anyhow.
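(A rough sketch of the find-or-create pattern with findOneAndUpdate and $setOnInsert in the Node.js native driver; the field names and the GitHub profile shape are assumptions, not from the log.)

```javascript
// hedged sketch: return the existing user, or create one on first GitHub login
function findOrCreateUser(db, profile, done) {
  db.collection('users').findOneAndUpdate(
    { githubId: profile.id },                        // look up by the GitHub identity
    { $setOnInsert: {                                // only applied when the upsert inserts
        githubId: profile.id,
        username: profile.username,
        createdAt: new Date()
    }},
    { upsert: true, returnOriginal: false },         // hand back the post-upsert document
    function (err, result) {
      if (err) return done(err);
      done(null, result.value);                      // result.value is the user document
    }
  );
}
```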
[23:11:37] <cheeser> the last round of drivers (from mongodb) all conform to a uniform spec which dictates certain API shapes
[23:11:56] <symbol> That's neat.
[23:15:04] <symbol> Thanks for the help