[00:13:33] <lgm> Is there anyone there who could look at a problem with mongo sharding setup?
[00:18:35] <tomas> hi, can anyone please advise if it's generally better to do aggregation calculations in mongo, or to run find and use something like Python pandas? I'm finding that converting the find result from a cursor to a Python list is what takes most of the time
[00:23:41] <Boomtime> @tomas: "converting find result from cursor to python list is what is taking most of the time" -> how big is your result set?
[00:24:40] <Boomtime> also, how are you measuring that this conversion is what takes the most time?
[01:00:31] <Boomtime> @tomas: a cursor only contains the first 101 results when it returns
[01:00:58] <Boomtime> as you consume the items, the cursor goes back to the database to retrieve more
[01:01:47] <lgm> Some queries get answers here! Yay!
[01:01:57] <Boomtime> if you ask for a cursor to be converted to an array, how do you separate the time spent converting to an array from the time spent going back to the server for more results?
[01:02:37] <Boomtime> have you tried simply consuming all the results of the cursor without the array part?
[01:02:41] <lgm> i have what is probably a simple query about sharding, that's completely blocking me right now.
[01:12:13] <tomas> Boomtime: many thanks for your answer. I'm using pymongo, this is my find query: cursor = db['hits'].find(form, inc_field) I then use this to convert the result to a pandas dataframe: df = pd.DataFrame(list(cursor))
[01:13:01] <tomas> will my cursor contain 101 results or will it contain all?
[01:13:57] <Boomtime> when the "find" returns it will contain 101 results
[01:14:21] <Boomtime> if you consume those items, it will magically have more
[01:14:47] <Boomtime> in your case, the rest of the results are not retrieved until the list(cursor) call
[01:15:25] <Boomtime> measuring the length of time that takes includes the time it takes to get the vast majority of the results from the database since you are inexplicably returning millions of items
[01:17:41] <tomas> oh I see... so it's not that conversion from cursor to list takes time because of the data, it's because data is getting extracted from the db
[01:18:35] <Boomtime> it's because you requested a silly amount of data
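The batching behaviour Boomtime describes can be made concrete with a toy cursor. This is a sketch, not pymongo's real implementation: the class name, batch sizes, and `round_trips` counter are illustrative. The point is that `find()` only delivers the first 101 documents, and `list(cursor)` is what triggers every remaining round trip to the server — which is why timing the list conversion really times the data transfer.

```python
# Toy stand-in for a pymongo cursor, to show why list(cursor) looks slow:
# the initial find() reply carries only the first 101 documents, and
# further batches are fetched lazily as the cursor is consumed.

class FakeCursor:
    """Illustrative cursor; names and batch sizes are assumptions."""
    def __init__(self, docs, first_batch=101, batch_size=1000):
        self._docs = docs
        self._pos = 0
        self._buffer = []
        self._batch_size = batch_size
        self.round_trips = 0      # how many times we "went to the server"
        self._fetch(first_batch)  # the find() reply itself

    def _fetch(self, n):
        batch = self._docs[self._pos:self._pos + n]
        self._pos += len(batch)
        self._buffer.extend(batch)
        self.round_trips += 1

    def __iter__(self):
        return self

    def __next__(self):
        if not self._buffer:
            if self._pos >= len(self._docs):
                raise StopIteration
            self._fetch(self._batch_size)  # a "getMore" round trip
        return self._buffer.pop(0)

docs = [{"_id": i} for i in range(5000)]
cursor = FakeCursor(docs)
print(cursor.round_trips)   # 1 -- only the first 101 docs are local so far
result = list(cursor)       # the remaining fetches all happen here
print(len(result), cursor.round_trips)
```

Timing `list(cursor)` on the real driver includes all of those extra round trips, exactly as described above.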
[01:35:27] <tomas> is there any way it returns: { "_id" : "en-US", "hits" : 2567284, "id": 1}
[01:35:43] <tomas> and if there are multiple groups, increment that for each
[01:37:22] <tomas> also is there a way of renaming "_id" to say "lang" ?
[01:40:18] <Boomtime> yes, you can do all you just asked, however, i'm not about to give you personal training on something that is well documented: http://docs.mongodb.org/manual/reference/operator/aggregation/
[03:37:42] <oRiCLe> hi, i have a dev server running mongodb as a replica set, but with only the one node (for oplog access). I am currently getting an error "Can't write to a secondary", but there is no secondary. What could that be, and how can i check it?
[03:38:24] <Boomtime> where do you get that error?
[03:41:12] <Boomtime> otherwise, it will be the writeConcern setting in your connection settings
[03:42:17] <Boomtime> only the driver will have a problem with not writing to a secondary - and it will only have that problem if the writeConcern is set to 2 or more
[03:42:58] <oRiCLe> it uses the mongoose node driver, so there are no connection settings other than the url
[03:43:12] <oRiCLe> its an old app im trying to keep alive until the new one is finished
[03:51:54] <Boomtime> then it is most likely an app setting
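Boomtime's diagnosis — a write concern demanding acknowledgement from more members than exist — is easiest to see in the connection string. The URIs below are a hedged sketch (hostnames and database names are made up); `w=2` asks two replica-set members to acknowledge each write, which can never succeed with a single-node set.

```python
# w=N in a MongoDB URI means "wait for acknowledgement from N members".
# With a one-node replica set there is no secondary to acknowledge, so
# w=2 writes fail; w=1 waits only for the primary.
from urllib.parse import urlparse, parse_qs

uri_bad  = "mongodb://localhost:27017/app?replicaSet=rs0&w=2"  # fails on 1 node
uri_good = "mongodb://localhost:27017/app?replicaSet=rs0&w=1"  # primary only

def write_concern(uri):
    """Pull the w= option back out of the URI's query string."""
    return int(parse_qs(urlparse(uri).query)["w"][0])

print(write_concern(uri_bad), write_concern(uri_good))
```

The same `w` option can usually also be set per-operation or in the driver's connection options rather than in the URI.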
[04:25:12] <tomas> Boomtime: can you please give me some hints on adding $inc to my query? Do I need to put $inc in my $group or $project bit to have id for each group starting with 1 and incrementing for each group?
[04:27:03] <Boomtime> you said you just wanted a generated field for an id type value, if you want a plain index counter then do it on the client
[04:27:37] <tomas> yea it's plain index counter, cool will do (;
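Putting the two answers together — rename `_id` in the pipeline (or on the client) and add the plain index counter on the client — might look like the sketch below. Field names (`lang`, `hits`) and the sample documents are assumptions taken from tomas's example output, not a real schema.

```python
# $group sums hits per language; the grouping key always comes back as _id.
pipeline = [
    {"$group": {"_id": "$lang", "hits": {"$sum": "$hits"}}},
    # The rename can also be done server-side with $project:
    # {"$project": {"_id": 0, "lang": "$_id", "hits": 1}},
]

# Pretend these came back from db.hits.aggregate(pipeline):
raw = [
    {"_id": "en-US", "hits": 2567284},
    {"_id": "de-DE", "hits": 1042},
]

# Client-side: rename _id -> lang and attach a 1-based index counter.
results = [
    {"lang": doc["_id"], "hits": doc["hits"], "id": i}
    for i, doc in enumerate(raw, start=1)
]
print(results[0])
```

Doing the counter on the client, as suggested, keeps the pipeline simple; the aggregation framework has no built-in row-number operator in this era of MongoDB.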
[06:03:42] <WarAndGeese> I have a mongo database hosted somewhere else, I have the url and username and password I need to access it and all that
[06:03:56] <WarAndGeese> I was wondering what the easiest way to get data from it is
[06:04:20] <WarAndGeese> Like, I can create a ruby on rails application and host it and link it to the database for this simple task, but I'd rather not because that's a lot of work
[06:05:23] <WarAndGeese> PHP is usually pretty straightforward because you don't need to do much: you just go to wherever is hosting your websites and make a page called pagename.php, and that's it. But I don't know how to connect to an external database with PHP; some tutorials I found required installing packages and I don't know how to do that. I might take that route though
[06:05:32] <WarAndGeese> I should go to bed, I'm super tired
[06:05:39] <WarAndGeese> I'll continue this another day I guess
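For a one-off "just read some data" task like WarAndGeese describes, a throwaway script is usually less work than a web app. The sketch below only builds the connection URI from the pieces mentioned (host, username, password); the credentials and hostnames are invented, and the commented pymongo call is the assumed next step.

```python
# Build a MongoDB connection URI from credentials. The username and
# password must be percent-escaped in case they contain ':' or '@'.
from urllib.parse import quote_plus

def mongo_uri(user, password, host, db):
    return "mongodb://%s:%s@%s/%s" % (
        quote_plus(user), quote_plus(password), host, db)

uri = mongo_uri("reader", "p@ss:word", "db.example.com:27017", "mydb")
print(uri)

# With pymongo installed, fetching data is then just:
#   from pymongo import MongoClient
#   client = MongoClient(uri)
#   docs = list(client.mydb.mycoll.find())
```

No framework, no hosting: run it locally against the remote database.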
[09:07:23] <n^izzo> I'm building a node app where I need to store a polygon of GPS locations and test if a point is inside it, and I'm not sure how to do this with mongoose
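The usual shape for this kind of query: store the polygon as GeoJSON under a `2dsphere` index, then ask for documents whose polygon intersects the point with `$geoIntersects`. The sketch below shows only the document and query shapes (the collection name `zones` and field name `area` are assumptions); the same structures apply whether the driver is mongoose or pymongo.

```python
# A GeoJSON polygon: one outer ring, coordinates as [lng, lat],
# and the first point repeated as the last to close the ring.
polygon = {
    "type": "Polygon",
    "coordinates": [[
        [-73.99, 40.75], [-73.98, 40.76], [-73.97, 40.75], [-73.99, 40.75],
    ]],
}

point = {"type": "Point", "coordinates": [-73.98, 40.755]}

# Index the polygon field, then query with the point:
#   db.zones.create_index([("area", "2dsphere")])
#   db.zones.find(query)  -> documents whose polygon contains the point
query = {"area": {"$geoIntersects": {"$geometry": point}}}
print(query["area"]["$geoIntersects"]["$geometry"]["type"])
```

If the point is stored and the polygon is the query argument instead, `$geoWithin` on the point field is the mirror-image approach.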
[11:12:37] <jdj_dk> anybody online who has used npm modules in a meteor package?
[11:50:27] <kas84> guys, does anybody know if mongodb reIndex when deleting data?
[11:51:19] <Derick> every insert, update or delete causes mongodb to update its index
[11:51:24] <kas84> I’m deleting like 200GB of data and see 200% write lock even after it’s finished
[11:51:28] <Derick> there is no such thing as a reIndex
[11:53:22] <kas84> aha but as soon as it finishes deleting the data it shouldn’t do anything else, right?
[11:54:01] <Derick> no, but if you do the deletions without acknowledgement (which is possible), the command returns immediately while the server is still deleting
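Derick's point in option form: with an unacknowledged write concern the driver fires the delete and returns at once, so the server can still be grinding through a 200GB delete (and holding write locks) long after the client thinks it is "finished". The dicts below are pymongo-style option shapes, shown as a hedged sketch rather than live driver calls.

```python
# Write-concern option shapes (pymongo-style; adjust for your driver).
unacknowledged = {"w": 0}  # fire-and-forget: the call returns immediately,
                           # the server keeps deleting in the background
acknowledged   = {"w": 1}  # wait for the primary to confirm the operation

# e.g. with pymongo:
#   from pymongo import WriteConcern
#   coll.with_options(
#       write_concern=WriteConcern(**acknowledged)).delete_many({...})
print(unacknowledged["w"], acknowledged["w"])
```

With `w: 1` the call blocks until the server confirms, which makes "it finished" actually mean finished.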
[12:59:15] <oznt> which role I have to have in mongo in order to be able to create new databases?
[12:59:49] <oznt> I added a user admin with "userAdminAnyDatabase" but this user can't create new databases ...
[14:13:15] <derek-g> can I do replication across datacenters with mongo-db?
[14:15:01] <duncan> has anyone run into “DR102 too much data written uncommitted” followed shortly by a fatal crash “[repl writer worker 1] ERROR: writer worker caught exception: passes >= maxPasses in NamespaceDetails::cappedAlloc:”
[14:34:27] <voidDotClass> guys, is it better to make a connection to mongo and then maintain / re-use it throughout the application, or should you make a connection at the start of each request and then disconnect it?
[14:34:39] <voidDotClass> i'm only doing read only operations
[14:37:09] <jdj_dk> voidDotClass: I would reuse.. Does your connector lib not use a connection pool?
[14:37:33] <voidDotClass> yes it does. thanks. so when i do client.getDb, do i need to close that afterwards
[14:54:08] <voidDotClass> Should I cache the collection, or should i get the collection each time with each request?
[15:04:38] <cheeser> you should reuse the same MongoClient. closing a connection just returns it to the pool (which you should do)
[15:05:21] <cheeser> caching a collection doesn't really save you anything. iirc, the driver maintains an internal cache of any relevant metadata anyway.
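The "one client, reused" pattern cheeser recommends can be sketched as a lazily created module-level singleton. A real app would call `MongoClient(uri)` in the factory; here a stand-in factory is injected so the sketch runs without a server, and the function name `get_client` is illustrative.

```python
# Create the client once at startup and share it everywhere: MongoClient
# maintains its own connection pool, so per-request connect/disconnect
# just throws that pool away each time.

_client = None

def get_client(factory):
    """Lazily create and cache the shared client.
    `factory` would be e.g. lambda: MongoClient(uri) in real code."""
    global _client
    if _client is None:
        _client = factory()
    return _client

# Demo with a stand-in factory so this runs without a database:
calls = []
c1 = get_client(lambda: calls.append(1) or object())
c2 = get_client(lambda: calls.append(1) or object())
print(c1 is c2, len(calls))
```

Both calls return the same object and the factory runs only once; "closing" a connection in a pooled driver just returns it to that shared pool, as noted above.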
[15:24:56] <s2013> dumb question, but since mongodb has no joins.. do you just fill in the userID on another collection with the userID, or is there some other method?
[15:26:14] <Eremiell> s2013: yes, you generally fill in the foreign value, unless you can collapse the second collection into this one.
[15:26:56] <Eremiell> if you pick something reasonable for the id, you might often not need to read the second collection at all.
[15:26:57] <s2013> userid is usually string right? so i guess it would also be a string?
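What Eremiell describes is usually called a manual reference: store the user's `_id` in the other document and look it up on the client when you need it. The sketch below uses invented string ids and collection-free dicts; in practice `_id` is often an `ObjectId`, and whichever type you pick, you store the same type in both places.

```python
# A manual reference between two collections (documents shown as dicts).
user = {"_id": "u123", "name": "Ada"}
post = {"_id": "p1", "author_id": user["_id"], "title": "Hello"}

# The client-side "join" is a second query:
#   users.find_one({"_id": post["author_id"]})
print(post["author_id"])
```

If the id is chosen well (say, a natural key you already have), the referencing document may carry enough information that the second lookup is rarely needed — Eremiell's point above.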
[18:30:27] <ctaloi> Hello - I’m attempting to update a field within an embedded document. It’s a value in a list of dicts (python speak); when I set the value I lose the other dict in the list. Here’s the code and the document before and after http://pastie.org/9556370 - thoughts on what I might be doing wrong?
[18:31:34] <ctaloi> I basically want to update the dict that matches the “national_id” I already know but I’m not sure how to write that out properly
[18:32:04] <EmmEight> You can reference values like this translations.national_id
[18:32:46] <kfb4> ctaloi the core problem is that your $set is telling it what value to set for the whole array, not a member of the array.
[18:32:54] <ctaloi> I tried that first but I was getting an error that threw me off - one sec, will grab that snippet.. sorry for not using the proper mongo nomenclature, bit of a newb
[18:32:56] <kfb4> i *think* this will do what you want, but i'm not 100% sure: http://docs.mongodb.org/manual/reference/operator/update/positional/
[18:34:54] <ctaloi> @EmmEight - using dot notation http://pastie.org/9556397
[18:35:22] <ctaloi> @kfb4 I was reading that earlier - wasn’t sure if I am on the right track
[18:36:30] <kfb4> ctaloi it's definitely the "<array>.$" syntax for the update statement. The doc I linked to says that "$" will be the first member of the array that matches your query. However, you need to make sure that your query section includes the "translated_national_id" so it knows which member of the array it's looking for.
[18:36:44] <kfb4> I haven't worked a lot with arrays, so i can't tell you the specific syntax.
[18:37:43] <ctaloi> @kfb4 thanks.. I’m wondering if what i’m doing is the _right_ way or if I should have these as separate referenced documents
[18:41:08] <ceejh> Some question about mongo clusters. I finally figured out my mongos wouldn't start up because I had auth=true in the config file. But, when you're trying to do user management, mongos needs a user that matches the admin on mongos, right?
[18:41:52] <kfb4> ctaloi possible. i couldn't tell you that for sure. i tend to not use arrays much, personally. the alternative could be to have a {national_id: translated_national_id} subdoc, but that comes with its own set of problems depending on what characters could end up in that key name.
[18:45:45] <ctaloi> @kfb4 agree, the national_id will always be a unique ID and only the translated value would change.
[18:48:18] <ctaloi> Maybe it would be easier if I had the {national_id:translated_id} in a separate collection and include just the national_id as a ref in the account doc
[18:48:31] <ctaloi> @kfb4 just kinda thinking it through..
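The positional-operator fix kfb4 points at takes this shape: the filter must match on the array element, so `$` in the update resolves to that matched member instead of replacing the whole array (which is what ctaloi's original `$set` was doing). Field names below (`translations`, `national_id`, `translated_value`) are assumptions loosely based on the pastie, and the commented `update_one` call is the assumed driver invocation.

```python
# Update one member of an embedded array with the positional operator "$".
filter_doc = {
    "_id": "acct1",                        # invented example id
    "translations.national_id": "US-42",   # selects WHICH array member
}
update_doc = {
    # "$" refers back to the first array member matched by the filter,
    # so only that element's field changes; its siblings are untouched.
    "$set": {"translations.$.translated_value": "new text"},
}
# accounts.update_one(filter_doc, update_doc)
print(list(update_doc["$set"]))
```

Setting `{"$set": {"translations": ...}}` without `$`, by contrast, overwrites the entire list — the symptom described at the start of this thread.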
[20:12:54] <saml> how can I diagnose mongodb health
[20:13:39] <EmmEight> Can you be more specific? What is wrong?
[20:30:09] <q85> Hello, I'm having an issue that is affecting write throughput of a long-running script. These http://pastie.org/9556678 queries pop up. They cause mongo to scan from disk a lot http://pastie.org/9556681. The odd thing is, we don't have a collection called "ns" in that database. I'm thinking the query is actually an internal mongo command (I suspect it is coming from one of our mongos instances). I'm trying to confirm this and find an ac
[20:34:05] <cheeser> "ns" is namespace. it's the "database/collection" combination.
[20:34:40] <cheeser> oh, i see what you're talking about maybe. the \u0002ns ?
[20:34:41] <sssilver> Hey guys, is it possible to _remove_ a SON manipulator?
[20:34:50] <sssilver> I only seem to find docs for add_son_manipulator
[20:38:20] <q85> cheeser: yes. The \u0002ns is evaluated to "ns". You can see "ns" listed at the top of the mongotop. We do NOT have a collection called "ns". I'm trying to find out what this thing is doing.
[20:38:54] <cheeser> \u0002ns is not the same, no. but that's probably beside the point.
[20:39:41] <q85> What do you mean it is not the same?
[20:40:24] <cheeser> it might print the same but it's not the same.
[20:45:04] <q85> anyway, mongotop was collecting stats for 5 minutes. The host has 4 cores. 8179559ms is... unreasonable. Any ideas?
[20:46:09] <q85> I searched for that connection in the logs and came up with this: http://pastie.org/9556682
[20:49:59] <EmmEight> That tells me your logs are full and there is prolly a lot of disk operations going on
[20:56:46] <q85> The log is only 454M (rotated monthly). At this time of the day, this cluster should be upserting all the time to the insights collection (~400/sec). But, you are correct, there are a lot of operations going on. These operations should not be there. I'm trying to figure out why they are there.