#mongodb logs for Wednesday the 26th of November, 2014

[00:00:50] <Boomtime> always set wtimeout to less than socket timeout
[00:01:47] <Boomtime> wtimeout applies to writes, use maxTimeMS for queries
[00:02:02] <wsmoak> I don’t think I’ll set wtimeout at all … I just need it to not wait forever which is apparently the default
[00:02:10] <Boomtime> if you set a socket timeout you need to always set one of these two
[00:02:24] <wsmoak> … now if only there were a usage example in the docs …
[00:02:26] <Boomtime> if you don't set wtimeout as well, you'll be sorry
[00:03:23] <Boomtime> socket timeout used by itself is a really wonderful way to DOS yourself
[00:09:55] <wsmoak> because … if I tell the driver to give up after 30 seconds without configuring wtimeout. I give up. what horrible thing happens?
[00:11:18] <Boomtime> the server keeps working
[00:11:32] <Boomtime> the DRIVER gives up after 30 seconds
[00:17:49] <wsmoak> Boomtime: oh, okay. I was looking at it from the perspective of the server being down. thanks again!
[00:20:50] <Boomtime> everybody does, people set socket timeout to take care of one particular condition without realizing the impact it has on other similar looking conditions
[00:48:37] <wsmoak> still completely unable to get the driver to time out after 5 seconds. following http://mongodb.github.io/node-mongodb-native/1.4/driver-articles/mongoclient.html#mongoclient-connect-options
[00:49:17] <wsmoak> and code is at https://github.com/wsmoak/geojson-example/blob/master/server.js
[00:51:39] <Boomtime> I don't know about Node crazy extension options, but please try using the connection URI instead
[00:51:56] <Boomtime> I note you have a ? at the end of your URI
[00:52:12] <Boomtime> normally you only have that if you actually have options, so add your connect timeout after that
[00:52:31] <Boomtime> ?connectTimeoutMS=5000
[00:52:58] <Boomtime> sorry, socketTimeoutMS
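
A minimal sketch of the URI form Boomtime suggests, assuming the Node driver 1.4 API from the article wsmoak linked; the host, database name, and timeout values are placeholders:

    // socketTimeoutMS caps how long the driver waits on the socket;
    // wtimeoutMS makes the server give up acknowledging a write on a
    // similar schedule, so the two timeouts stay in step.
    var MongoClient = require('mongodb').MongoClient;

    var uri = 'mongodb://localhost:27017/test?socketTimeoutMS=5000&wtimeoutMS=4000';

    MongoClient.connect(uri, function (err, db) {
      if (err) { return console.error('connect failed:', err); }
      // ... run queries here ...
      db.close();
    });

(As wsmoak discovers below, the Node driver of the time had trouble honouring the socket timeout option from the URI: NODE-288.)
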
[00:55:39] <freeone3000> I'm trying to get the number of unique requestInfo.userIds, grouped by domain and subdomain, such that I have (ALARM, ALARM_BY_TIME, 30), (ALARM, ALARM_MOVE, 22), (CALENDAR, CALENDAR_SEARCH, 35)... I've tried https://gist.github.com/freeone3000/25e83e37bbb34820e75b but it gives me one entry.
[00:56:25] <freeone3000> SQL query in question would be "SELECT DISTINCT COUNT(requestInfo.userIds) FROM base WHERE dateAdded > startDate AND source IN routes GROUP BY domain, subdomain".
[00:59:47] <Boomtime> What is your intention of this stage -> { "$group": { "_id": 1, "userCount": { "$sum": 1 } } }
[01:03:58] <freeone3000> Boomtime: Trying to apply "distinct".
[01:04:32] <freeone3000> Boomtime: If I use "_id": { "uid": "$uid", "domain": "$domain", "subdomain": "subdomain" } there, I instead get the number of queries each user has performed in that (domain, subdomain).
[01:07:11] <freeone3000> DISTINCT isn't quite in the "SQL to aggregation" mapping chart. There's a distinct *op*, but that only seems to apply to queries...
[01:07:51] <Boomtime> yeah, I see exactly what is going on, I will reply when I have time in 5 minutes or so
[01:14:56] <wsmoak> Boomtime: I had tried that with and without the ?, and with url parameters. Turns out, it’s broken: https://jira.mongodb.org/browse/NODE-288
[01:19:02] <Boomtime> wsmoak: bugger
[01:19:32] <Boomtime> good job finding the ticket and making a comment there though
[01:20:00] <Boomtime> freeone3000: ok, how about we go through your pipeline, because it does a few things that I think are not what you expect
[01:20:27] <Boomtime> you should try running it with just the first three stages, in the mongo shell, and observe the results
[01:21:31] <Boomtime> your third stage is where it starts to run off the rails - the documents being fed into the next stage do not have the information pertaining to 'domain' and 'subdomain'
[01:25:23] <Boomtime> freeone3000: you can achieve what you want with exactly two $group stages only (nothing else)
[01:25:42] <freeone3000> Okay. Still waiting on the first stage to complete.
[01:25:49] <Boomtime> (oops, you need your $match too, sorry I forgot about that)
[01:26:37] <Boomtime> I assume your $match is correct.. have you run that as a query with .count() to see how many documents would feed into the next stage?
[01:27:07] <Boomtime> because if a billion documents pass the $match then the next $group is going to be a while
[01:28:46] <freeone3000> Boomtime: What would a count look like as an aggregate step?
[01:32:13] <Boomtime> this is a pretty effective count -> { "$group": { "_id": 1, "count": { "$sum": 1 } } }
[01:32:19] <Boomtime> you may recognise it :-)
[01:32:50] <Boomtime> to test a $match though, you can just use the regular find() method in the shell, the syntax is the same
[01:32:52] <freeone3000> Boomtime: Alright, query is running.
[01:33:06] <Boomtime> and add .count() to the end of the .find()
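
A shell sketch of that testing sequence, using the collection and field names from freeone3000's SQL above; startDate and routes are placeholder variables:

    // Run the $match as a plain query first -- find() takes the same
    // syntax -- and count how many documents would feed the pipeline:
    db.base.find({ dateAdded: { $gt: startDate }, source: { $in: routes } }).count()

    // The same count expressed as Boomtime's aggregation stage:
    db.base.aggregate([
      { $match: { dateAdded: { $gt: startDate }, source: { $in: routes } } },
      { $group: { _id: 1, count: { $sum: 1 } } }
    ])
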
[01:34:21] <freeone3000> Yeah, that's also running.
[01:36:00] <Boomtime> uhuh.. is it indexed?
[01:36:02] <freeone3000> Boomtime: It doesn't usually take this long in code.
[01:36:10] <freeone3000> Boomtime: routes and dateAdded are.
[01:38:21] <freeone3000> Boomtime: I can't seem to determine what's taking so long without admin access.
[01:39:10] <freeone3000> I'll report back once it's solved. Hopefully we can figure out what goes wrong before you go away. :)
[03:00:00] <tim_t> how can I search an already retrieved array list for embedded elements?
[03:00:53] <Boomtime> "already retrieved array list" <- you have an array in some language then?
[03:01:15] <tim_t> sorry. yes, using java and morphia
[03:01:53] <Boomtime> so your question is "I have an array in Java, how can I find if a certain element is present?"
[03:02:40] <Boomtime> alternatively, if you only want a specific element from a search, you can have MongoDB do that part for you
[03:02:48] <tim_t> using the morphia API if possible, the objects to search over are tagged as embedded
[03:03:20] <Boomtime> I have not used morphia, sorry
[03:03:27] <tim_t> ah so i have to do it from a "top level" rather than get some results and search over those?
[03:04:58] <tim_t> mongo is okay also
[03:05:26] <Boomtime> searching for stuff you want is basically what a database is good for
[03:05:53] <Boomtime> if you are looking to search your array because you only want specific elements then you can instruct the database to do that
[03:06:23] <Boomtime> if you are planning to consume the whole array, but need to identify certain elements as well, then you need the whole array from the database (thus, too bad, do it in code)
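
When only specific array elements are wanted, the server can do that filtering; a shell sketch with hypothetical collection and field names, using $elemMatch both to match and to project (the projection returns just the first matching element):

    db.things.find(
      { items: { $elemMatch: { tag: "wanted" } } },  // match documents by array content
      { items: { $elemMatch: { tag: "wanted" } } }   // return only the first matching element
    )
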
[03:06:39] <tim_t> yeah i just wanted to know if there was a way to get some part of the database using mongo/morphia and then possibly use the api to search more within that returned data
[03:07:09] <tim_t> but it looks like it has to be with a single query instead of two in sequence
[03:07:17] <tim_t> thats fine though i was just wondering
[03:08:16] <tim_t> that makes sense because the results once out to java-land are no longer the responsibility of the database API
[03:08:24] <Boomtime> right
[03:09:05] <tim_t> alright thanks Boomtime. I'll adjust my model and make this thing work.
[04:01:48] <rh1n0> Greetings all. I need to write some code to query records from one collection in a remote mongodb server. I then need to import some of these records into a different mongodb server. Because I have to check various conditions I can't simply use copy (or similar). What would be a fast language to do this? By trade I'm a ruby dev but I don't think ruby can do this very fast.
[05:48:25] <blizzow> when/where do I add a user for authentication when deploying a replicated sharded cluster? I have my three config servers set up with keyfiles and three routers set up with keyfiles. I'm confused as to when I add a db/cluster admin user. Do I add a user to the admin database the config servers and the same user to the admin database on replica sets I add as shards?
[05:54:13] <Boomtime> http://docs.mongodb.org/manual/tutorial/add-user-administrator/#first-user-restrictions
[05:55:24] <Boomtime> blizzow: the short answer is that you create a user admin by utilizing the localhost exception
[05:55:55] <Boomtime> the localhost exception only applies when you have no users, be sure to create your first user as a user-admin or you're hosed
[05:56:20] <Boomtime> the link I provided goes through all of this step-by-step
[05:58:01] <blizzow> Boomtime: I know that part, I'm confused as to where I do this...on each config server, or do I do it on the replica set before I add it as a shard?
[06:00:05] <Boomtime> connect on a mongos
[06:00:23] <Boomtime> in a cluster you should do all things connected to a mongos, unless you hate having data
[06:01:23] <Boomtime> (there are exceptions, they are rare)
[06:03:33] <blizzow> So I only have to add a user on one router and the config magically replicates to my other routers or do I have to do this on all routers?
[06:11:17] <Boomtime> all routers (presumably you mean mongoS) use the config servers to auth against, all users are stored in the config servers, so now you know the answer :-)
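
A sketch of that first step, assuming a 2.6-era shell connected to a mongos on the same machine so the localhost exception applies; the user name and password are placeholders:

    use admin
    db.createUser({
      user: "useradmin",
      pwd: "changeme",
      roles: [ { role: "userAdminAnyDatabase", db: "admin" } ]
    })
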
[09:56:49] <elatomo> Hi! Is it possible to exclude "_id" from one embedded document with $project? `{$project': {'item._id': 0}}` raises a "The top-level _id field is the only field currently supported for exclusion"
[09:57:52] <elatomo> If possible, I would like to get rid of it while aggregating, so I don't have to run extra operations on the results
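
Since $project only supports excluding the top-level _id, one workaround (an assumption, not something confirmed in the channel) is an inclusion-style projection that names the embedded fields you do want and simply never lists item._id; the field names here are hypothetical:

    db.orders.aggregate([
      { $project: { "item.name": 1, "item.qty": 1, _id: 0 } }  // item._id is never included
    ])
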
[12:50:19] <techfreak> hi guys
[12:50:55] <techfreak> how can I make a replicaset slave into a standalone server without losing data?
[13:00:03] <techfreak> guys?
[13:24:20] <Ron4ldinho> what causes stopIteration in a map_reduce?
[14:18:13] <agenteo> hi there, by looking at the mongodb logs how can I see if a query is a cache hit?
[14:21:43] <remonvv> There is no caching. Assuming you mean if mongo served the result of your query from data that has been swapped to memory then the answer is that you cannot from the log.
[14:46:03] <agenteo> @remonvv thanks for clarifying that.
[15:56:27] <ZenDoodles> if I have a collection which has a reference-to-one relationship to another... say a user references a group. So the group can have many users, but a user can only have one group.
[15:57:27] <ZenDoodles> If I want to find all the groups without a user, how would I go about that?
[15:57:53] <ZenDoodles> I can write the sql query, but that is unhelpful here...
[16:05:23] <remonvv> agenteo: You can look at mongostat output. Page faults (the value being non-zero) means that data is being swapped in and out of memory.
[16:06:02] <remonvv> ZenDoodles: Depends on your schema. Do you have one already?
[16:07:00] <remonvv> ZenDoodles: I'm asking because there are a few ways to go about that. Some scale, some don't. Some are good ideas, some aren't. The options available to you depend on what data you're storing where.
[16:28:45] <ZenDoodles> remonvv: We do have one already
[16:33:49] <remonvv> ZenDoodles: okay, well if it's easy to share through a pastie or something that'd be nice. If not; basically without denormalizing something (e.g. tagging a user document with hasGroup: true or something) you're stuck with a $nin type query
[16:34:19] <remonvv> ZenDoodles: Not necessarily a problem if the group count stays relatively small.
[16:34:36] <ZenDoodles> there will be tens of thousands of records
[16:34:56] <ZenDoodles> Oh... actually, in this case no
[16:35:55] <ZenDoodles> the group count is small.
[16:36:07] <ZenDoodles> many users tho.
[16:36:54] <ZenDoodles> remonvv: so inefficient is probably okay
[17:11:28] <remonvv> ZenDoodles: Many users is not a problem. In any case, i'd go for something like db.users.find({groupId:{$nin:[group1, group2, .., groupN]}})
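
remonvv's query finds users outside a known group list; a variant matching the stated goal (groups that no user references), assuming user documents carry a groupId pointing at groups._id, might look like:

    // Collect every groupId in use, then find groups no user references.
    var usedGroupIds = db.users.distinct("groupId");
    db.groups.find({ _id: { $nin: usedGroupIds } })
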
[17:27:58] <jeromegn> anybody with experience with the ruby driver? I’m using master right now. I’m wondering if it’s possible to close the connection(s) for an instance of Mongo::Client. I tried client.cluster.servers.each {|s| s.disconnect! } which works, but it still prints {ismaster: 1} commands and a connection is still open from my MongoDB server logs.
[17:39:44] <blizzow> How do I determine the status and health of my 3 config servers for a sharded cluster? According to the docs, they should not be set up in a replicaset configuration.
[17:52:34] <blizzow> How can I verify that they're talking to each other and all is well?
[17:53:31] <Cor__> my question is.. I have sql
[17:53:48] <Cor__> table: article_group
[17:54:44] <Cor__> my query SELECT article Where group IN (LARGE, array) and group IN (LARGE array)
[17:54:56] <Cor__> my query SELECT distinct(article) Where group IN (LARGE, array) and group IN (LARGE array)
[17:55:05] <Cor__> can I optimize this with mongodb?
[17:56:50] <Cor__> please help :)
[18:02:25] <freeone3000> After an "ensureIndex()" call, new records inserted don't seem to be indexed. Why would this be?
[18:04:31] <Derek57> Hi there, just a quick one for someone who knows it off the top of their head. In what way would indexes update faster? If I had 7 indexes, and inserted/updated 200k documents, would the indexes update faster if they were kept separate, in a compound index, or no difference between the two?
[18:27:22] <GothAlice> Derek57: Compound indexes, carefully chosen, will speed up quite a number of operations. In terms of incrementally adding to the index (which is what inserts would do if they happen after the index is already built) either you loop over the fields in a compound index, or you loop over multiple indexes… you're looping either way, so no *real* performance gain.
[18:27:32] <agenteo> hi, I’ve set db.setProfilingLevel(2) but the mongo log isn’t showing some of my queries. I have indexes on them and they're taking less than 1ms I think.
[18:27:51] <agenteo> isn’t db.setProfilingLevel(2) meant to profile all?
[18:28:07] <Derek57> @GothAlice Got it, makes sense. Thanks. :)
[18:28:46] <GothAlice> Derek57: However the question suggests a lack of understanding of indexes in MongoDB, so here is a handy link: http://docs.mongodb.org/manual/core/indexes/ and a note that on any given query, only _one_ index can be used. (This is why compound indexes are so important!) If you aren't adept at reading the output of .explain(), there are tools like http://blog.mongolab.com/2012/06/introducing-dex-the-index-bot/ to help with figuring out
[18:28:46] <GothAlice> optimal indexes. :)
[18:31:25] <GothAlice> agenteo: I believe there is a split between what gets logged (into the log file or syslog) and what gets placed into the system.profile collection. With a profiling level of 2, everything should certainly go into the system.profile collection, but it may still rely on the "slowms" speed to determine what gets logged to the log. (I may be wrong, but the docs aren't very clear on this.)
[18:31:27] <GothAlice> http://docs.mongodb.org/manual/reference/method/db.setProfilingLevel/
[18:32:23] <GothAlice> Fact: "mongod writes the output of the database profiler to the system.profile collection." Fact: "mongod prints information about queries that take longer than the slowOpThresholdMs to the log even when the database profiler is not active." These facts do not combine to "a level of 2 will log all operations", only that it will record them.
[18:33:29] <agenteo> I am now changing it in the admin db, running: db.runCommand( { setParameter: 1, logLevel: 1 } ) and that though increases the log verbosity on all ops, which is something I’d rather not have… :\
[18:33:51] <agenteo> at least I see the queries taking “0ms”
[18:34:29] <GothAlice> agenteo: I believe the correct pattern of use, here, is to increase the level, but then examine the system.profile collection for results. (And you'd probably want to prune old data out of that occasionally… enabling profiling has performance and storage impact.)
[18:35:09] <GothAlice> (Or simply set slowms to 0… that may work.)
[18:36:48] <agenteo> I tried that when running setProfilingLevel(2, 0)
[18:36:50] <agenteo> but no dice
[18:37:03] <agenteo> thanks for reminding me about system.profile, I’ll check it there rather than tailing the log
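
A shell sketch of that pattern: profile every operation, then read results back from system.profile rather than the log.

    db.setProfilingLevel(2, 0)  // level 2 = all operations, slowms = 0
    db.system.profile.find().sort({ ts: -1 }).limit(5).pretty()  // most recent ops
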
[18:37:04] <GothAlice> There's an alternate approach, which we took at work for a variety of reasons, in that you can watch the oplog yourself and process it in realtime (depending on workload, of course).
[18:38:37] <GothAlice> We use "tailing" of the oplog and liberal use of $comment within our queries to pass data to the audit process that records changes. (So we can let the audit process know the logical user from the front-end who issues every query.)
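
A sketch of that $comment tagging, using the query-meta form accepted by shells of the era; the collection, filter, and tag are hypothetical:

    // The tag travels with the operation and shows up in the profiler
    // and logs, letting an audit process attribute the query to a user.
    db.orders.find({ $query: { status: "open" }, $comment: "app-user:42" })
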
[18:48:18] <Cor__> please can anyone answer the question above?
[18:48:43] <shoerain> hm does mongo have a bulk lookup? Should I be using $in ( http://docs.mongodb.org/manual/reference/operator/query/in/#op._S_in ) if I have a list of N ObjectIDs?
[18:48:44] <unholycrab> i run mongodb on aws and get the startup warning "WARNING: You are running on a NUMA machine. We suggest launching mongod like this to avoid performance problems numactl --interleave=all mongod [other options]"
[18:48:52] <unholycrab> are all AWS machines like this ?
[18:49:17] <shoerain> unholycrab: what's your AWS instance? I don't remember seeing that on m1.small and c3.large
[18:51:24] <shoerain> Cor__: well, there's a mapping of some SQL concepts to Mongo: http://docs.mongodb.org/manual/reference/sql-aggregation-comparison/, have you taken a look? I don't remember my SQL that well, to be honest
[18:54:02] <unholycrab> shoerain: r3.8xlarge
[18:54:04] <unholycrab> woo
[18:55:45] <Cor__> Thnx! shoerain I will go and read
[18:56:30] <freeone3000> unholycrab: You'll get that on r instances, deliberately, because some of their memory is actually mapped instead of being purely virtualized.
[19:05:10] <freeone3000> How do you get .distinct() as a group operator? I have a uid: ["a", "a", "b"] and want an aggregate stage that gives me uid: 3
[19:05:28] <freeone3000> Err. Sorry, uid: 2, and {"$sum": 1} currently gives me 3.
[19:06:02] <Derek57> GothAlice: Sorry for the late reply. Thanks for the link, some reading would do me some good. :) So, it only hits one index. That means if I make a query with the mindset that 7 indexes will be used to complete it, it'll only use one?
[19:06:32] <GothAlice> Derek57: Correct. And it might not even pick the "best" index in that case.
[19:06:57] <GothAlice> (I'm beginning to suspect mongod calculates the expense of working that out, and won't try if it'll take too long.)
[19:07:28] <GothAlice> Derek57: Note that 95% of my queries are handled by three indexes per collection.
[19:07:57] <GothAlice> (But I also have compound indexes involving up to six fields.)
[19:08:28] <Derek57> GothAlice: Hmmm, fair enough, that probably explains some slowness on bigger queries. Though, would it be crazy to make a compound index with something like 20-25 keys (might be exaggerating a bit)? Some of our queries can get pretty specific.
[19:08:58] <unholycrab> freeone3000: looks like its a feature of the cpu that gets turned on for r and i instances
[19:09:16] <freeone3000> unholycrab: Makes things faster, yeah.
[19:09:17] <unholycrab> oh you said that. yes thanks
[19:09:21] <GothAlice> Yes; there's a point of diminishing return. Generally don't index a field if it has low cardinality (i.e. it isn't overly useful for removing large numbers of possibilities).
[19:09:29] <GothAlice> Derek57: ^
[19:10:52] <GothAlice> (If half of your documents have an indexed value "a", and the other half have "b", indexing on that isn't useful since scanning the whole table will be just as fast as trying to use the index.)
[19:13:05] <GothAlice> Derek57: This makes it important to sort your compound indexes with more selective fields up front, also considering that leading subsets of the compound index can be used on queries. I.e. indexing on ["username", "joined"] will use that one index for queries that are specific (use both) as well as just "username" searches.
[19:14:21] <Derek57> GothAlice: Oh, so a query that uses a single field in a compound index will still use the compound index? That would make life quite easy.
[19:15:04] <GothAlice> Derek57: Yes, but only _leading_ subsets. An index on [foo, bar, baz] ~= [foo, bar] ~= [foo]. (Not [bar, baz]!)
[19:16:09] <GothAlice> (Or [bar], or [baz]…)
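
A shell sketch of the prefix rule using GothAlice's ["username", "joined"] example; someDate is a placeholder:

    db.users.ensureIndex({ username: 1, joined: 1 })

    db.users.find({ username: "alice", joined: { $gte: someDate } })  // uses the index
    db.users.find({ username: "alice" })                              // uses it too (leading prefix)
    db.users.find({ joined: { $gte: someDate } })                     // cannot use it (not a prefix)
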
[19:17:53] <Derek57> GothAlice: Got it, the order is extremely important then. In order to use the third field in the compound, it has to use the first and second as well. Good stuff! Just taking a look at the documentation, I am still able to include a geospatial index in the compound, correct?
[19:18:24] <unholycrab> freeone3000: i want to benchmark a couple of r3 instances with numa/not numa. do yout think mongoperf would be a good test? its supposed to be a disk test
[19:18:56] <freeone3000> numa has nothing to do with disk; if you're using disk, the performance differential between your fastest and slowest memory isn't discernable.
[19:19:25] <unholycrab> yeah maybe not, since it doesn't benchmark mongodb
[19:19:43] <GothAlice> Derek57: Yes, though there may be restrictions. (Such as only one geo field per compound index.) Fulltext indexes have similar restrictions. See: http://docs.mongodb.org/manual/core/index-compound/ for more. :)
[19:24:22] <Derek57> GothAlice: Fantastic, I'm going to give these a couple tests and see what comes out. Thanks again for the help, really appreciate it.
[19:25:44] <GothAlice> Derek57: It never hurts to help.
[19:26:38] <ikus060> Hello, using mongodb. I've store date value as epoch. Is it possible to query the data as date with a projection or something ?
[19:28:05] <freeone3000> How can I manipulate the query in https://gist.github.com/freeone3000/95b639d6f70011f45e21 to produce results like the desired result? I want to apply "distinct sum" to the "uid" output field.
[19:28:32] <GothAlice> ikus060: http://docs.mongodb.org/manual/reference/operator/aggregation/year/#exp._S_year — looks like they need to be real dates. Storing things as epoch seconds sounds like a useful optimization, but it hurts most uses for the data.
[19:29:23] <GothAlice> freeone3000: Distinct on which field, summing uid?
[19:29:34] <ikus060> <GothAlice> yeah I know it's not good to store date as epoch. It's a design constraint
[19:29:37] <freeone3000> GothAlice: Distinct on uid. I want the counts of unique uids for each (domain, subdomain).
[19:30:59] <GothAlice> freeone3000: So you want to $group on {_id: {domain: "…", subdomain: "…"}} and {$sum: "$uid"}… which actually looks like what you have. Ah! I see, I'm confusing the desired result with the actual result. Give me a moment.
[19:33:59] <GothAlice> That's a tricky one, and there's probably a simpler way to do it, but: [{$match: … your match …}, {$project: {domain: 1, subdomain: 1, uid: 1}}, {$group: {_id
[19:34:07] <GothAlice> Er, didn't mean to press enter there, sorry, still working. ^_^
[19:35:34] <GothAlice> freeone3000: Confirming: there will always be a uid for each record?
[19:35:40] <freeone3000> GothAlice: Yes.
[19:36:04] <GothAlice> freeone3000: [{$match: … your match …}, {$project: {domain: 1, subdomain: 1}}, {$group: {_id: {domain: "$domain", subdomain: "$subdomain"}, uid: {$sum: 1}}}]
[19:36:29] <GothAlice> freeone3000: Because there is always a uid, and you only care about the "count" of UIDs, you can omit the uid field completely and just count the number of records that were grouped into each group.
[19:36:48] <freeone3000> GothAlice: That's the number of queries, yes, but it doesn't count *unique* uids.
[19:37:10] <freeone3000> GothAlice: In "result", some uids are listed twice, because there are two queries with the same uid. ("User ID", not "Unique ID").
[19:37:18] <GothAlice> Ahh. That's a bit different, but not much different.
[19:37:23] <GothAlice> ^_^ Silly acronyms.
[19:37:29] <freeone3000> GothAlice: Okay. The distinct part is what I'm having trouble with.
[19:38:46] <GothAlice> freeone3000: [{$match: … your match …}, {$project: {domain: 1, subdomain: 1, uid: 1}}, {$group: {_id: {domain: "$domain", subdomain: "$subdomain"}, uids: {$addToSet: "$uid"}}}, {$project: {uidc: {$size: "$uids"}}}]
[19:39:14] <GothAlice> (Note the re-adding of uid, addToSet to only add unique uids to each group, then subsequent $project to get the length of that set.)
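
The same pipeline laid out in the shell; the $match contents remain a placeholder:

    db.base.aggregate([
      { $match: { /* ... your match ... */ } },
      { $project: { domain: 1, subdomain: 1, uid: 1 } },
      { $group: {
          _id:  { domain: "$domain", subdomain: "$subdomain" },
          uids: { $addToSet: "$uid" }             // a set keeps only unique uids per group
      } },
      { $project: { uidc: { $size: "$uids" } } }  // count of distinct uids
    ])
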
[19:41:04] <sekyms> am I missing an obvoious reason as to why the callback isn't firing?
[19:41:06] <sekyms> advertisementSchema.statics.findByIds = function (ids, cb) {this.find({'id': {$in: ids}}, cb);};
[19:43:14] <freeone3000> GothAlice: Ah, you can use aggregates in $project! Thanks.
[19:43:59] <GothAlice> $group and $project both use generic expressions. http://docs.mongodb.org/manual/meta/aggregation-quick-reference/#expressions
[19:44:34] <GothAlice> Some, such as {$sum: 1}, only make sense in the context of $group, though.
[19:44:37] <freeone3000> GothAlice: That worked great, thanks.
[19:45:06] <GothAlice> freeone3000: It never hurts to help.
[19:48:05] <sekyms> :D
[19:51:03] <GothAlice> sekyms: I don't JS, sorry. https://www.youtube.com/watch?v=Othc45WPBhA and http://callbackhell.com ;) (The dotJS 2012 WTFJS presentation, top related, is also highly appropriate. typeof new String('bob') !== typeof 'bob' is a nice implementation detail, only 48-bit integer accuracy, and the two-page definition of soft comparators are bowel-loosening.)
[19:51:36] <sekyms> thanks
[19:51:42] <sekyms> for responding
[19:52:08] <GothAlice> sekyms: While I can't help with JS problems, at least I can inject some JS humour (via that video, which is rolling-on-the-floor hilarious). :)
[19:52:28] <sekyms> I will have to take a look later
[19:53:07] <GothAlice> Array(8).join("wat" - 1) + "Batman!" :)
[19:53:34] <sekyms> don't hate on JS, it's a beautifully fucked up language
[19:55:48] <sekyms> its getting there
[19:55:48] <GothAlice> The WTFJS presentation actually illustrates how to avoid most of the issues, i.e. "the right way" to do things like typeof comparisons.
[19:55:58] <GothAlice> T'was very informative.
[19:55:58] <sekyms> ecma 6 has some nice features
[19:56:13] <sekyms> thats why you use deep equals
[19:56:19] <sekyms> instead of just ==
[19:57:03] <GothAlice> Equal signs won't save you from "typeof new String()" being "object". ¬_¬
[19:57:26] <sekyms> well everything is an object in javascript
[19:58:48] <GothAlice> sekyms: typeof "foo" -> "string"
[19:59:12] <GothAlice> This is OO, Jim, but not like we know it. XP
[20:00:10] <sekyms_> sorry my connection drops for no reason
[20:00:17] <cheeser> startrekking across the universe!
[20:00:31] <GothAlice> sekyms: typeof "foo" -> "string" (This is OO, Jim, but not like we know it.) XP [my last]
[20:47:03] <pbunny> hi
[20:47:10] <GothAlice> Hello, pbunny.
[20:47:32] <pbunny> is there a method in mongo-cxx-driver (libmongoclient?) for sending json or some object to server?
[20:47:38] <pbunny> f.e. { isMaster: 1 }
[20:47:51] <pbunny> i know there's IsMaster() method
[20:48:08] <pbunny> but i'm writing mongo proxy so i'd rather just send requests as they come
[20:48:36] <pbunny> don't feel like coding tons of IFs
[20:48:54] <GothAlice> pbunny: MongoDB is all about passing objects between client and server; Mongo uses BSON for this. You're going to want to truly grok BSON, and MongoDB's wire protocol, before attempting to write a proxy for it. Ref: http://bsonspec.org
[20:49:26] <pbunny> what method(s) should i look into for turning string like { isMaster: 1 } to BSON object?
[20:49:27] <GothAlice> (JSON itself doesn't make for a very good wire protocol, since the length of things is undefined.)
[20:49:49] <pbunny> and, more importantly, which method can be used to actually send BSON object?
[20:49:54] <GothAlice> pbunny: https://github.com/mongodb/mongo-cxx-driver/wiki/BSON%20Helper%20Functions
[20:50:16] <pbunny> i think i'm familiar enough with creating BSON object
[20:50:18] <pbunny> but how to send it?
[20:50:19] <GothAlice> pbunny: For that, you're going to want to spelunk through the code powering mongo-cxx-driver.
[20:51:03] <GothAlice> The driver uses https://github.com/mongodb/mongo-cxx-driver/blob/legacy/src/mongo/client/wire_protocol_writer.cpp internally.
[20:52:11] <GothAlice> Used in the constructor (I think? I don't C++) here: https://github.com/mongodb/mongo-cxx-driver/blob/legacy/src/mongo/client/dbclient.cpp#L1580-L1583
[20:52:21] <pbunny> what's StringData ?
[20:52:35] <GothAlice> And here: https://github.com/mongodb/mongo-cxx-driver/blob/legacy/src/mongo/client/dbclient.cpp#L1829-L1843
[20:53:08] <pbunny> ok thx, will look into
[21:00:03] <jto3> hello, I have problem understanding proper use of mongorestore. I've restored mongodb volume from AWS snapshot, and now I wont to replay oplog, but when I do this, mongorestore creates new index ( allocating new ns file , allocating new datafile ). is there a way to prevent this ?
[21:00:30] <jto3> want*
[21:05:01] <GothAlice> jto3: There are several concerning points you have mentioned. First, mongorestore is designed to restore the output of mongodump. Second, restoring from an EBS snapshot is what's referred to as a filesystem snapshot, and there are some highly important notes to go with using that approach: http://docs.mongodb.org/manual/tutorial/backup-with-filesystem-snapshots/
[21:05:50] <GothAlice> Generally, if there was a journal present on the EBS volume when you took the snapshot, simply starting MongoDB on the restored data will apply any pending journal commits and return the database to a known good state.
[21:05:51] <jto3> GothAlice: I'm doing backups two ways, one is a snapshot every hour, and in the meantime I'm uploading the oplog
[21:06:09] <GothAlice> jto3: So, you're emulating the Postgresql approach of WAL-E?
[21:06:53] <GothAlice> (WAL-E being an S3 streaming oplog backup system.)
[21:07:04] <jto3> hah you saved me some reading
[21:07:07] <jto3> but an interesting project
[21:07:13] <jto3> yes, I upload oplog to S3
[21:07:50] <jto3> so when I create a new instance from the snapshot I already have data from at most the last hour
[21:08:25] <jto3> and then I would like to replay oplog, and even when I do oplogLimit it still creates new index/files so instead of model_stage_002 db I have model_stage
[21:08:30] <jto3> duplicated data :/
[21:08:41] <GothAlice> Unfortunately, this approach is not supported by MongoDB. The usual approach is to have a replica set with one member hidden and marked un-votable (priority zero) as a backup. (This gives you the streaming oplog backup and a "secondary" that can be easily cloned.) http://docs.mongodb.org/manual/administration/backup/ enumerates the supported options.
[21:11:00] <GothAlice> At work our replica set has a secondary in a different datacenter, a secondary at the office, and a secondary at my home. (All three are non-electable hidden members for backup purposes.)
[21:11:37] <jto3> mhmm good that you have in 3 different location
[21:11:59] <jto3> btw do you snapshot hidden replica as well
[21:12:06] <GothAlice> Additionally, since my home is a block from the office, my home secondary is set to delay replication by 24 hours. (Just in case someone accidentally hits the "nuke it from orbit" button.)
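
A sketch of those member settings; hostnames and _ids are hypothetical, and slaveDelay is given in seconds:

    // Priority-0 hidden members replicate but can never become primary
    // and are invisible to clients -- useful as live backups.
    rs.add({ _id: 3, host: "backup1.example.com:27017", priority: 0, hidden: true })
    rs.add({ _id: 4, host: "home.example.com:27017",
             priority: 0, hidden: true, slaveDelay: 86400 })  // 24 hours behind
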
[21:12:17] <jto3> or just check if hidden replica is still online, and if dies bring a new one
[21:13:12] <jto3> interesting
[21:13:32] <GothAlice> Our VM automation, when starting a new MongoDB shard, will automatically attempt to join an existing replica set, and if one does not exist it will pull the latest mongodump, apply that, and call for reinforcements. (I.e. two additional hosts need to be brought online to make a happy replica set.)
[21:15:10] <pbunny> any way to make async requests with mongo-cxx-driver?
[21:15:16] <pbunny> providing callback
[21:16:02] <jto3> yeah at least 3 nodes
[21:17:10] <GothAlice> jto3: "vm deploy role=mongo tag=HEAD cluster=brandnewclient rs=rs0 count=1" — If "brandnewclient" is a new cluster with no hosts, this will actually spawn three hosts.
[21:17:13] <jto3> GothAlice: thank you, you were very helpful
[21:17:48] <jto3> yeah I have a different challenge, customer is using AWS Cloudformation, so I cannot easily add / remove new hosts without breaking their deployment
[21:19:22] <jto3> furthermore they have script in cron that runs every 5 minutes, and reloads configuration from Zookeeper ;) madness
[21:19:26] <GothAlice> Ouchies.
[21:20:06] <GothAlice> Could be worse, they could be using buildout+puppet.
[21:20:08] <GothAlice> ;^)
[21:21:07] <jto3> ;) chef-solo haha
[21:21:48] <GothAlice> (All of my automation is event triggered; basically nothing other than monitoring tasks run on schedules like that. I push a change to the server repo, the VMs pull the change and automatically adjust as needed via this insane #truepowerofunix snippit: git diff --name-only @{1}.. | xargs qfile -Cq | xargs --no-run-if-empty equery files | grep init.d/ | sort -u)
[21:22:59] <GothAlice> (That finds all installed packages whose files were modified by the last pull, filtered to only those with init.d scripts, prior to looping over the result and calling conftest/reload/restart as appropriate.)
[21:23:07] <jto3> ;) geek power 3
[21:23:35] <jto3> btw does mongo encrypts transfer of data to its members ?
[21:23:41] <GothAlice> Not unless you enable SSL.
[21:23:52] <GothAlice> Wait. Replica set members, or clients, or both?
[21:23:57] <jto3> or did you have to set up a VPN tunnel
[21:24:05] <jto3> replica set members
[21:24:08] <jto3> or hidden members
[21:26:51] <GothAlice> In the DC the members (and app clients) talk over an IPSec-secured private LAN. Outside the DC I use SSL. Additionally, cross-talk between replica members is authenticated using a large random shared secret.
[21:29:53] <deanclkclk> hey guys
[21:29:55] <deanclkclk> i have an issue
[21:30:06] <deanclkclk> I have a replica setup....one is primary and the other is slave
[21:30:24] <deanclkclk> I reinstall the slave box and specify mongo there to be primary
[21:30:49] <jto3> deanclkclk: only 1 slave ?
[21:31:06] <GothAlice> (primary/secondary are the usual terms; master/slave generally refers to the very old and deprecated method of redundancy)
[21:31:10] <deanclkclk> ok..it's a matter of switching the primary box to become slave
[21:31:15] <deanclkclk> and slave the reverse
[21:31:16] <deanclkclk> by doing that
[21:31:34] <deanclkclk> I reinstall mongo as --primary on the new primary box
[21:31:45] <deanclkclk> and deleted the instance of mongo on the now slave box
[21:31:48] <deanclkclk> when I check mongo
[21:31:51] <GothAlice> deanclkclk: Not quite. There are several issues at play: first, you have two members. This means that when one goes offline the other has no way to know which side of the connection just died. ("Did *I* lose my connection, or did they?")
[21:31:58] <deanclkclk> the primary mongo is still saying secondary
[21:31:59] <jto3> haha I remember some github issue where some group wanted to change the naming convention from master/slave to primary/secondary
[21:32:22] <GothAlice> jto3: In MongoDB the different terms actually refer to different features. ;^) And yeah, I remember that ticket…
[21:33:45] <GothAlice> deanclkclk: The result of not knowing which side of the connection dropped is that the server will have no way to identify if it should take over primary duties. For a secondary to even try to become primary it must be able to see *greater than* 50% of the other hosts. (With two hosts, this can never happen.) When it fails, it'll enter read-only secondary mode and wait for the other host to come back online.
[21:34:07] <GothAlice> deanclkclk: When the other host does, the first one will see that its data is newer and resume operation as the primary.
[21:34:15] <deanclkclk> it did
[21:35:12] <GothAlice> So, to resolve this little snag, you need an 'arbiter'.
[21:35:49] <GothAlice> A third machine that doesn't store data, but just lets the other two know they're alive. (And if one goes offline, the other can ask the arbiter to confirm.)
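
Adding one is a single shell call; the hostname is hypothetical:

    rs.addArb("arbiter.example.com:27017")  // data-less voting member, breaks the tie
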
[21:36:31] <deanclkclk> this is my rs.status
[21:36:32] <deanclkclk> http://pastie.org/9745586
[21:37:10] <deanclkclk> so i'm saying i want to promote the secondary to be indeed primary
[21:37:14] <deanclkclk> in that rs.status()
[21:37:32] <deanclkclk> that makes sense? which is the LiveProd mongo to be primary
[21:37:52] <GothAlice> Note: two hosts in that list are unreachable (offline) — the secondary can't reach > 50% of the hosts and thus will never become primary.
[21:40:21] <jto3> I'm off as I live in the old world and it's getting late, thanks again GothAlice ! tada
[21:43:49] <deanclkclk> ok so I don't get it
[21:44:03] <deanclkclk> all i'm trying to do is set liveProd to be primary
[21:44:10] <deanclkclk> and the other node to be secondary
[21:44:14] <deanclkclk> is that possible?
[22:03:00] <deanclkclk> ok so I have the then primary backup
[22:03:06] <deanclkclk> and I do the necessary reconfiguration
[22:03:09] <deanclkclk> i'm getting this
[22:03:09] <deanclkclk> "errmsg" : "exception: invalid parameter: expected an object (0)", "code" : 10065, "ok" : 0
[22:11:46] <joannac> deanclkclk: pastebin an rs.status()
[22:14:36] <joannac> also, what you typed to reconfigure
[22:22:27] <sparc> I heard that the MongoDB journal flushes to disk about every 60 seconds.
[22:22:50] <sparc> I do see that we have some files in the journal directory, that persist past this 60 second window
[22:22:54] <sparc> Is that a problem?
[22:23:04] <sparc> I would have thought that Mongo would delete them
[22:23:31] <sparc> There's just two though: j._2 and lsn
[22:25:21] <sparc> Trying to fend off questions from people about "What if this fills up the disk!?"
[22:25:46] <sparc> I'm like.. It will sync the journal to disk, every minute, chill out.
[22:26:11] <sparc> If we can write, 20G of data in 60 seconds, that would be amazing in itself.
[22:38:49] <joannac> sparc: the flush interval is at least every 60 seconds; the OS is free to flush more
[22:39:17] <sparc> Cool
[22:39:38] <joannac> I just checked my running instance and I have the same as you
[22:40:14] <joannac> given you only have j._2 I'm guessing your disks are at least reasonably fast
[22:40:16] <sparc> Since there's still a single journal file in my journal/ directory, do you think Mongo just keeps a single file around, and doesn't bother to reduce the size, until it has to make another?
[22:40:29] <roadrunneratwast> Is there a best practice for packing an ENUM into a set in MongoDB? EG: "A" = 0x0001; "B" = 0x0010; "C" = 0x0100; "D" = 0x1000. Then pack into 4 bits
[22:40:31] <joannac> How big is it?
[22:40:37] <sparc> I'm on amazon's EBS with 250 iops i think
[22:40:49] <roadrunneratwast> Is there a "set" type?
[22:41:04] <sparc> thanks joannac, happy holidays
[22:41:56] <joannac> sparc: actually, i confused myself with your question
[22:43:00] <joannac> flushes from journal to actual data files is every 100ms by default. flushes from the storage layer to disk is every 60 seconds (at least, maybe more often)
[22:43:05] <joannac> I need a coffee :(
[22:43:11] <roadrunneratwast> Is there a "bit" type?
[22:43:11] <sparc> ok
[22:49:36] <Derek57> I'm currently in the middle of creating a compound index with 8 keys. First is a bool, second a 2dsphere, last 6 are arrays. There's about 2.5 million keys in the collection. System has 32GB of ram and 8 virtual cores, and is currently using the minimal of that. Currently it's only done maybe 10,000 of the 2.5mil in the last 10 minutes. Is this normal? Any way to speed it up?
[22:53:27] <tim_t> java mongo question. i run a query and it returns a set of documents each with two fields A and B. Is there a query function I can call that returns a list of field A from each document instead of the entire documents?
[23:05:54] <joannac> tim_t: db.coll.find({query}, {A:1, _id:0})
[23:06:30] <joannac> Derek57: erm, you can't index more than one array
[23:06:39] <joannac> (in the same index)
[23:06:46] <Derek57> Ah, is that the reason?
[23:07:28] <Derek57> Even if inside the arrays it's the objects I want?
[23:15:56] <tim_t> thanks joannac. do you know the morphia equivalent of that?
[23:29:14] <Derek57> In a compound index, would it be smarter to put more 'simpler' values first? For example, put all the booleans first, then the ints, strings and arrays? Or does it really not matter?
[23:29:14] <Derek57> (I of course would alter my query to fit the new order)
[23:33:15] <Boomtime> Derek57: you should put the highest-cardinality fields first, it doesn't matter what their type is
[23:34:00] <Boomtime> having an integer field first is pointless if you have 10 billion documents with the same value for that field
[23:38:36] <Derek57> Boomtime: Smart, got it! Should I order the entire index by cardinality? Or would it be smart to put something like a bool second that you drastically reduce the amount of documents returned?
[23:38:49] <Derek57> *that could drastically
[23:40:35] <Boomtime> Derek57: boolean fields are usually the worst field to add to an index ever
[23:41:02] <Derek57> Boomtime: Haha that bad, eh? Would 1s and 0s be a better idea?
[23:41:09] <Boomtime> a boolean doesn't consume a single bit - it takes a good few bytes to store, but it has a guaranteed cardinality of 2
[23:42:34] <Boomtime> highly variable strings like names are good, short identity-type values (like ObjectID) are king
[23:43:07] <Boomtime> the fastest query you will ever perform is on _id where you are using ObjectID
[23:43:53] <GothAlice> Gotta love that rapid descent to a b-tree leaf.
[23:44:34] <Derek57> Boomtime: Well that's something new, didn't realize ObjectID was that quick. Brings a couple new ideas to the table. Thanks for that. :)
[23:45:43] <GothAlice> Derek57: MongoDB will actually completely ignore indexes with low cardinality (i.e. half the records have true, half the records have false) because scanning the collection will take just as much time as using the index.
[23:46:11] <Boomtime> if there is an atlernative, yes
[23:46:42] <Boomtime> it will never _completely_ ignore an index, it's just that the race that occurs from time to time always loses for a bad index
[23:46:59] <Boomtime> so the query planner is never observed to select it
[23:47:02] <GothAlice> Boomtime: Twice in two weeks I've supported a user with a query that chose to use no indexes over a badly thought-out index.
[23:47:23] <Boomtime> yes, collection scan might be a better query plan
[23:47:49] <Boomtime> it was still not ignoring the index, it's just that the race always favoured the collection scan
[23:48:16] <GothAlice> Boomtime: One of those users had a perfectly good index and noticed MongoDB stopped using the index after a version bump. He had to force the hint to improve his query performance by 10x. That one was _really_ weird.
[23:48:18] <Boomtime> one day it might happen that the index wins, then you'll suddenly see a change in performance
[23:48:34] <GothAlice> Yeah, like that. ;)
[23:49:46] <Boomtime> 2.6 had some interesting bugs in the new query planner which caused some grief early on, most of those are fixed now it seems
[23:50:07] <GothAlice> I believe it resolved down to needing to reverse the sort order on one of the fields in the compound index.
[23:51:12] <Derek57> Mongo will favor a collection scan over a bool index, though for argument's sake would it still not be smarter to include it if it would be smack in the middle of a compound? Or am I just too tired to catch on to all of this right now? :P
[23:51:44] <Boomtime> generally, a boolean offers no advantage, use it if you have absolutely no choice
[23:52:23] <Boomtime> mongodb will NOT categorically favour a collection scan over a boolean index, by definition, the race would be won by the index half of the time
[23:53:33] <Boomtime> and there is no randomness either, you can construct the dataset to do either
[23:53:50] <Boomtime> however, in a real world scenario you don't get to control your dataset
[23:54:21] <GothAlice> (General rule of life: never trust a dataset, even if it's yours.)
[23:56:49] <Derek57> Haha, fair enough, good rule to follow. And thanks for all the info Boom and Alice, I'm going to give this another try and see what kind of results I can reproduce. Thanks again. :)