#mongodb logs for Monday the 15th of September, 2014

[00:13:11] <lgm> Hello!
[00:13:33] <lgm> Is there anyone there who could look at a problem with mongo sharding setup?
[00:18:35] <tomas> hi, can anyone please advise if it's generally better to do aggregation calculations in mongo or to run find and use something like python pandas? I'm finding that converting the find result from cursor to python list is what takes most of the time
[00:23:41] <Boomtime> @tomas: "converting find result from cursor to python list is what is taking most of the time" -> how big is your result set?
[00:24:40] <Boomtime> also, how are you measuring that this conversion is what takes the most time?
[00:36:16] <tomas> Boomtime: millions
[00:36:52] <tomas> I'm using linux time also with Timer() as t: http://preshing.com/20110924/timing-your-code-using-pythons-with-statement/
[00:37:32] <tomas> I'm storing hits to urls in mongo
[00:38:18] <tomas> kind of diy web analytics based on what strings urls contain
[00:40:05] <tomas> 1 document = 1 hit
[01:00:31] <Boomtime> @tomas: a cursor only contains the first 101 results when it returns
[01:00:58] <Boomtime> as you consume the items, the cursor goes back to the database to retrieve more
[01:01:47] <lgm> Some queries get answers here! Yay!
[01:01:57] <Boomtime> if you ask for a cursor to be converted to an array, how do you measure the conversion to an array versus the trips back to the server for more results?
[01:02:37] <Boomtime> have you tried simply consuming all the results of the cursor without the array part?
[01:02:41] <lgm> i have what is probably a simple query about sharding, that's completely blocking me right now.
[01:12:13] <tomas> Boomtime: many thanks for your answer, I'm using pymongo, this is my find query: cursor = db['hits'].find(form, inc_field) I then use this to convert the result to a pandas dataframe: df = pd.DataFrame(list(cursor))
[01:13:01] <tomas> will my cursor contain 101 results or will it contain all?
[01:13:57] <Boomtime> when the "find" returns it will contain 101 results
[01:14:21] <Boomtime> if you consume those items, it will magically have more
[01:14:47] <Boomtime> in your case, the rest of the results are not retrieved until the list(cursor) call
[01:15:25] <Boomtime> measuring the length of time that takes includes the time it takes to get the vast majority of the results from the database since you are inexplicably returning millions of items
[01:17:41] <tomas> oh I see... so it's not that conversion from cursor to list takes time because of the data, it's because data is getting extracted from the db
[01:18:35] <Boomtime> it's because you requested a silly amount of data
[01:18:43] <tomas> (:
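
A minimal pymongo sketch of the behaviour Boomtime describes, assuming a local server and hypothetical db/collection names; find() returns a lazy cursor almost immediately, and the real cost is paid while the cursor is consumed:

```python
import time

from pymongo import MongoClient

hits = MongoClient("mongodb://localhost:27017/")["analytics"]["hits"]  # hypothetical names

start = time.time()
cursor = hits.find({"lang": "en-US"})
print("find() returned in %.4fs" % (time.time() - start))  # fast: nothing fetched yet

start = time.time()
docs = list(cursor)  # the remaining batches are pulled from the server here
print("list(cursor) took %.2fs for %d docs" % (time.time() - start, len(docs)))
```
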
[01:18:47] <Boomtime> how big are the documents?
[01:19:05] <tomas> tiny
[01:19:23] <Boomtime> how tiny?
[01:20:09] <Boomtime> let's say the docs are 100 bytes
[01:20:28] <Boomtime> and you return 2 million of them (the lowest version of the plural "millions")
[01:20:34] <tomas> > Object.bsonsize(db.hits.findOne({lang:"en-US"}))
[01:20:34] <tomas> 307
[01:20:38] <Boomtime> you just sent 190MB across the network
[01:21:35] <Boomtime> congratulations, you just sent 600MB across the network and through various buffers and handling, etc
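
(Checking the arithmetic: 2,000,000 documents × 100 bytes ≈ 190 MiB, the first figure; at the measured 307 bytes each, 2,000,000 × 307 ≈ 614 MB, hence "600MB".)
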
[01:21:56] <Boomtime> i hope you make good use of it
[01:22:20] <Boomtime> how often do you perform this operation?
[01:23:28] <Boomtime> @lgm: what is your question about sharding?
[01:23:32] <tomas> quite often... I tried using db['hits'].aggregate([{"$match": {...}}, {"$group": { "_id": "$lang", "hits": { "$sum": 1 } }}])
[01:23:40] <tomas> and that's much faster
[01:24:16] <Boomtime> that's a group at the end?
[01:24:23] <tomas> yea
[01:24:44] <Boomtime> OMG you only want a sum of one field in the documents involved?
[01:24:53] <tomas> yea
[01:25:14] <Boomtime> @lgm: what is your question about sharding?
[01:25:41] <tomas> Boomtime: you recommend I do grouping?
[01:26:01] <Boomtime> of course
[01:26:19] <tomas> (: I kind of thought something isn't quite right here...
[01:26:50] <Boomtime> the aggregation framework can be incredibly efficient for certain tasks, yours is a perfect fit
[01:26:53] <tomas> is there any way I could add ids 1,2,3,4,... to every grouped result?
[01:27:41] <tomas> I can do it later just by running a loop, but I wonder if mongo can do that for me
[01:27:58] <Boomtime> your question needs more definition
[01:28:22] <tomas> sure 1 sec.
[01:35:04] <tomas> > db.hits.aggregate([ {$match: { lang : "en-US" }}, {$group: { _id : "$lang", hits: { $sum : 1 } }} ]);
[01:35:07] <tomas> { "_id" : "en-US", "hits" : 2567284 }
[01:35:27] <tomas> is there any way it returns: { "_id" : "en-US", "hits" : 2567284, "id": 1}
[01:35:43] <tomas> and if there are multiple groups, increment that for each
[01:37:22] <tomas> also is there a way of renaming "_id" to say "lang" ?
[01:40:18] <Boomtime> yes, you can do all you just asked, however, i'm not about to give you personal training on something that is well documented: http://docs.mongodb.org/manual/reference/operator/aggregation/
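
For the record, both requests map onto documented operators; a hedged sketch in pymongo (db/collection names assumed), renaming _id via $project and adding the running id on the client, as Boomtime suggests later in the log:

```python
from pymongo import MongoClient

hits = MongoClient()["analytics"]["hits"]  # hypothetical names

pipeline = [
    {"$group": {"_id": "$lang", "hits": {"$sum": 1}}},
    {"$project": {"_id": 0, "lang": "$_id", "hits": 1}},  # rename _id -> lang
]

# plain incrementing id, added client-side per group
results = [dict(doc, id=i) for i, doc in enumerate(hits.aggregate(pipeline), start=1)]
# e.g. [{'lang': 'en-US', 'hits': 2567284, 'id': 1}, ...]
```
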
[01:41:04] <tomas> (: thanks
[01:41:30] <tomas> mostly thanks for clarifying aggregate vs find
[01:41:37] <Boomtime> goodo
[03:37:42] <oRiCLe> hi, i have a dev server running mongodb, replica set but only the one node for oplog access. I am currently getting an error "Can't write to a secondary" but there is no secondary, what could that be and how can i check it?
[03:38:24] <Boomtime> where do you get that error?
[03:38:41] <oRiCLe> in my node app on an update
[03:38:57] <Boomtime> what is your writeConcern?
[03:40:14] <oRiCLe> my what?
[03:40:50] <Boomtime> in your connection-string, it would be the "w" parameter if present
[03:40:56] <oRiCLe> not present
[03:41:12] <Boomtime> otherwise, it will be the writeConcern setting in your connection settings
[03:42:17] <Boomtime> only the driver will have a problem with not writing to a secondary - and it will only have that problem if the writeConcern is set to 2 or more
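
For context, the write concern Boomtime is asking about can be set directly in the connection string; a sketch with a hypothetical replica-set name:

```python
from pymongo import MongoClient

# w=1 (acknowledgement from the primary, the default) never involves a secondary;
# w=2 waits for a secondary as well, which fails on a single-node replica set
client = MongoClient("mongodb://localhost/portal?replicaSet=rs0&w=2")
```
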
[03:42:58] <oRiCLe> uses the mongoose node driver so no connection settings other than the url
[03:43:12] <oRiCLe> its an old app im trying to keep alive until the new one is finished
[03:43:26] <Boomtime> ok, what is the url then
[03:43:45] <oRiCLe> mongodb://localhost/portal
[03:43:59] <Boomtime> oh wait, it will also say that error if your single node is _NOT_ primary for some reason
[03:44:11] <Boomtime> and you are not using a replica-set connection so there's that
[03:44:48] <oRiCLe> okay, must have been a mongo update or something cos it was working, how can i fix that
[03:44:51] <Boomtime> connect to your mongodb using the mongo shell and do rs.status()
[03:45:03] <Boomtime> i suspect you somehow have no primary
[03:46:05] <oRiCLe> there is a myState: 1 for the set, and members "stateStr": "PRIMARY" and self: true
[03:46:50] <Boomtime> so try an update via the shell
[03:51:05] <oRiCLe> hmm works via shell
[03:51:09] <oRiCLe> just not in the app
[03:51:54] <Boomtime> then it is most likely an app setting
[04:25:12] <tomas> Boomtime: can you please give me some hints on adding $inc to my query? Do I need to put $inc in my $group or $project bit to have id for each group starting with 1 and incrementing for each group?
[04:27:03] <Boomtime> you said you just wanted a generated field for an id type value, if you want a plain index counter then do it on the client
[04:27:37] <tomas> yea it's plain index counter, cool will do (;
[04:27:42] <tomas> thanks
[04:28:04] <Boomtime> unless you sort by some field afterwards the order is not relevant, so the index isn't either
[04:48:43] <tomas> I use the ids to label a donut chart because the full strings won't fit
[05:50:38] <WarAndGeese_> Hi
[05:51:10] <WarAndGeese> I have a database hosted on mongohq, now known as compose.io
[05:51:20] <WarAndGeese> What's the easiest way to have a web page display content on that database?
[05:51:31] <WarAndGeese> I was thinking of making something in php, but I don't know if that's the easiest way
[05:54:27] <WarAndGeese> Is anyone here?
[05:56:14] <Boomtime> do you have a mongodb question?
[05:56:29] <Boomtime> also, hello!
[06:03:19] <WarAndGeese> Hi
[06:03:24] <WarAndGeese> I was about to go to bed
[06:03:42] <WarAndGeese> I have a mongo database hosted somewhere else, I have the url and username and password I need to access it and all that
[06:03:56] <WarAndGeese> I was wondering what the easiest way to get data from it is
[06:04:20] <WarAndGeese> Like, I can create a ruby on rails application and host it and link it to the database for this simple task, but I'd rather not because that's a lot of work
[06:05:23] <WarAndGeese> PHP is usually pretty straightforward because you don't need to do much: you just go to wherever is hosting your websites and make a page called pagename.php, and that's it. But I don't know how to connect to an external database with PHP; some tutorials I found required installing packages and I don't know how to do that. I might take that route though
[06:05:32] <WarAndGeese> I should go to bed, I'm super tired
[06:05:39] <WarAndGeese> I'll continue this another day I guess
[06:05:40] <WarAndGeese> goodnight
[06:06:29] <Boomtime> ok then
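
For anyone with the same question: the choice of web stack matters less than the driver call, since every driver accepts the hosted connection URI directly. A pymongo sketch with hypothetical host, credentials, and collection:

```python
from pymongo import MongoClient

# compose.io-style URI: mongodb://<user>:<password>@<host>:<port>/<database>
client = MongoClient("mongodb://user:secret@host.example.com:10047/mydb")

for doc in client["mydb"]["posts"].find().limit(10):  # hypothetical collection
    print(doc)
```
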
[08:30:28] <kas84> hi there!
[09:05:43] <n^izzo> test
[09:06:06] <n^izzo> /msg NickServ identify ww2wf1939t1945
[09:06:38] <n^izzo> test
[09:07:23] <n^izzo> I'm building a node app where I need to store a polygon of GPS locations and test if a point is in it, and I'm not sure how to do this with mongoose
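
The question goes unanswered below; the usual approach is a 2dsphere index plus a $geoIntersects query, which mongoose exposes unchanged. A pymongo sketch with hypothetical names:

```python
from pymongo import MongoClient, GEOSPHERE

regions = MongoClient()["geo"]["regions"]  # hypothetical names
regions.create_index([("area", GEOSPHERE)])

regions.insert_one({
    "name": "zone-1",
    "area": {  # GeoJSON polygon: the first and last coordinates must match
        "type": "Polygon",
        "coordinates": [[[0, 0], [0, 1], [1, 1], [1, 0], [0, 0]]],
    },
})

# find every stored polygon containing a GPS point (note: longitude, latitude order)
point = {"type": "Point", "coordinates": [0.5, 0.5]}
containing = list(regions.find({"area": {"$geoIntersects": {"$geometry": point}}}))
```
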
[09:20:54] <cofeineSunshine> n^izzo: [12:02] < n^izzo> /msg NickServ identify ww2wf1939t1945
[09:20:57] <cofeineSunshine> :)
[09:21:00] <cofeineSunshine> that's too bad
[09:21:01] <cofeineSunshine> change it
[09:21:14] <cofeineSunshine> i bet it's your email passwd also :)
[09:23:40] <n^izzo> lol ty cofeineSunshine
[09:30:57] <cofeineSunshine> kek, WW2:D
[11:12:37] <jdj_dk> anybody online who has used npm modules in a meteor package?
[11:50:27] <kas84> guys, does anybody know if mongodb reIndex when deleting data?
[11:51:19] <Derick> every insert, update or delete causes mongodb to update its index
[11:51:24] <kas84> I’m deleting like 200GB of data and see 200% write locks even after it’s finished
[11:51:28] <Derick> there is no such thing as a reIndex
[11:53:22] <kas84> aha but as soon as it finishes deleting the data it shouldn’t do anything else, right?
[11:54:01] <Derick> no, but if you do deletions without acknowledgement (which is possible), the command returns immediately while the server is still deleting
[11:55:29] <kas84> aha
[11:55:40] <kas84> that could be it!
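
A sketch of the distinction Derick describes, in pymongo 3.x style with a hypothetical collection; with w=0 the call returns before the server has finished, so the write lock can stay hot long after the command "completes":

```python
from pymongo import MongoClient, WriteConcern

events = MongoClient()["app"]["events"]  # hypothetical names

# acknowledged (default): returns once the server has applied the delete
events.delete_many({"expired": True})

# unacknowledged: returns immediately while the server keeps deleting
# (and holding write locks) in the background
events.with_options(write_concern=WriteConcern(w=0)).delete_many({"expired": True})
```
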
[12:59:15] <oznt> which role do I have to have in mongo in order to be able to create new databases?
[12:59:49] <oznt> I added a user admin with "userAdminAnyDatabase" but this user can't create new databases ...
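
oznt's question gets no reply here. The likely explanation: databases are created implicitly on first write, and userAdminAnyDatabase only grants user/role administration, so a write-capable role such as readWriteAnyDatabase is needed. A sketch with hypothetical credentials:

```python
from pymongo import MongoClient

admin_db = MongoClient("mongodb://admin:secret@localhost/admin")["admin"]  # hypothetical

# grant a role that can actually write (and therefore implicitly create databases)
admin_db.command("grantRolesToUser", "admin",
                 roles=[{"role": "readWriteAnyDatabase", "db": "admin"}])
```
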
[14:13:15] <derek-g> can I do replication across datacenters with mongo-db?
[14:15:01] <duncan> has anyone run into “DR102 too much data written uncommitted” followed shortly by a fatal crash “[repl writer worker 1] ERROR: writer worker caught exception: passes >= maxPasses in NamespaceDetails::cappedAlloc:”
[14:34:27] <voidDotClass> guys, is it better to make a connection to mongo and then maintain / re-use it throughout the application, or should you make a connection at the start of each request and then disconnect it?
[14:34:39] <voidDotClass> i'm only doing read only operations
[14:37:09] <jdj_dk> voidDotClass: I would reuse.. Does your connector lib not use a connection pool?
[14:37:33] <voidDotClass> yes it does. thanks. so when i do client.getDb, do i need to close that afterwards
[14:54:08] <voidDotClass> Should I cache the collection, or should i get the collection each time with each request?
[15:03:04] <cheeser> which driver?
[15:03:17] <voidDotClass> java
[15:04:38] <cheeser> you should reuse the same MongoClient. closing a connection just returns it to the pool (which you should do)
[15:05:21] <cheeser> caching a collection doesn't really save you anything. iirc, the driver maintains an internal cache of any relevant metadata anyway.
[15:06:51] <voidDotClass> got it, thanks
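
The same advice carries over to other drivers; a pymongo sketch of the one-client-per-process pattern (names hypothetical):

```python
from pymongo import MongoClient

# one shared client per process; it owns a connection pool and is
# safe to reuse across requests and threads
client = MongoClient("mongodb://localhost:27017/", maxPoolSize=50)

def handle_request():
    # cheap dictionary-style lookups on the shared client; no new connection per request
    return list(client["app"]["items"].find().limit(10))
```
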
[15:24:56] <s2013> dumb question but since mongodb has no joins.. do you just fill in the userID on another table, or is there some other method?
[15:26:14] <Eremiell> s2013: yes, you generally fill in the foreign value, unless you can collapse the second collection into this one.
[15:26:56] <Eremiell> if you pick something reasonable for id, you might often not need to read the second collection at all.
[15:26:57] <s2013> userid is usually string right? so i guess it would also be a string?
[15:27:03] <s2013> what do you mean?
[15:27:21] <Eremiell> _id can be anything, it just has to be unique.
[15:28:14] <s2013> yeah i understand that part
[15:28:29] <s2013> but what do you mean by the "often not need to read the second collection at all"
[15:29:17] <Eremiell> if all you need is the username and you use that as _id...
[15:33:49] <s2013> k
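
A sketch of the pattern Eremiell describes (hypothetical collections): store the foreign value yourself, and if the _id already carries what you need, the second read can be skipped:

```python
from pymongo import MongoClient

db = MongoClient()["app"]  # hypothetical database

# using the username itself as _id makes the reference self-describing
db.users.insert_one({"_id": "s2013", "joined": "2014-09-15"})
db.posts.insert_one({"title": "no joins?", "user_id": "s2013"})

post = db.posts.find_one({"title": "no joins?"})
print(post["user_id"])  # often enough on its own: no second query needed
user = db.users.find_one({"_id": post["user_id"]})  # the "manual join" when it isn't
```
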
[18:30:27] <ctaloi> Hello - I’m attempting to update a field within an embedded document.. it’s a value in a list of dicts (python speak); when I set the value I lose the other dict in the list. Here’s the code and the document before and after http://pastie.org/9556370 - thoughts on what I might be doing wrong?
[18:30:32] <ctaloi> using pymongo
[18:31:34] <ctaloi> I basically want to update the dict that matches the “national_id” I already know but I’m not sure how to write that out properly
[18:32:04] <EmmEight> You can reference values like this: translations.national_id
[18:32:46] <kfb4> ctaloi the core problem is that your $set is telling it what value to set for the whole array, not a member of the array.
[18:32:54] <ctaloi> I tried that first but I was getting an error that threw me off - one sec, will grab that snippet.. sorry for not using the proper mongo nomenclature, bit of a newb
[18:32:56] <kfb4> i *think* this will do what you want, but i'm not 100% sure: http://docs.mongodb.org/manual/reference/operator/update/positional/
[18:34:54] <ctaloi> @EmmEight - using dot notation http://pastie.org/9556397
[18:35:22] <ctaloi> @kfb4 I was reading that earlier - wasn’t sure if I am on the right track
[18:36:30] <kfb4> ctaloi it's definitely the "<array>.$" syntax for the update statement. The doc I linked to says that "$" will be the first member of the array that matches your query. However, you need to make sure that your query section includes the "translated_national_id" so it knows which member of the array it's looking for.
[18:36:44] <kfb4> I haven't worked a lot with arrays, so i can't tell you the specific syntax.
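
A sketch of the positional-operator update kfb4 points to, in pymongo, reusing the field names from the chat (collection name and exact document shape are assumptions):

```python
from pymongo import MongoClient

accounts = MongoClient()["app"]["accounts"]  # hypothetical names

# the query must match on the array element so "$" knows which member it is;
# $set then targets only that element instead of replacing the whole array
accounts.update_one(
    {"translations.national_id": "ABC123"},  # the national_id ctaloi already knows
    {"$set": {"translations.$.translated_national_id": "new translation"}},
)
```
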
[18:37:43] <ctaloi> @kfb4 thanks.. I’m wondering if what i’m doing is the _right_ way or if I should have these as separate referenced documents
[18:41:08] <ceejh> Some question about mongo clusters. I finally figured out my mongos wouldn't start up because I had auth=true in the config file. But, when you're trying to do user management, mongos needs a user that matches the admin on mongos, right?
[18:41:52] <kfb4> ctaloi possible. i couldn't tell you that for sure. i tend to not use arrays much, personally. the alternative could be to have a {national_id: translated_national_id} subdoc, but that comes with its own set of problems depending on what characters could end up in that key name.
[18:45:45] <ctaloi> @kfb4 agree, the national_id will always be a unique ID and only the translated value would change.
[18:48:18] <ctaloi> Maybe it would be easier if I had the {national_id:translated_id} in a separate collection and include just the national_id as a ref in the account doc
[18:48:31] <ctaloi> @kfb4 just kinda thinking it through..
[20:12:54] <saml> how can I diagnose mongodb health
[20:12:57] <saml> performance is too poor
[20:13:39] <EmmEight> Can you be more specific? What is wrong?
[20:30:09] <q85> Hello, I'm having an issue that is affecting write throughput of a long-running script. These http://pastie.org/9556678 queries pop up. They cause mongo to scan from disk a lot http://pastie.org/9556681. The odd thing is, we don't have a collection called "ns" in that database. I'm thinking the query is actually an internal mongo command (I suspect it is coming from one of our mongos instances). I'm trying to confirm this and find an ac
[20:34:05] <cheeser> "ns" is namespace. it's the "database/collection" combination.
[20:34:40] <cheeser> oh, i see what you're talking about maybe. the \u0002ns ?
[20:34:41] <sssilver> Hey guys, is it possible to _remove_ a SON manipulator?
[20:34:50] <cheeser> a what?
[20:34:50] <sssilver> I only seem to find docs for add_son_manipulator
[20:38:20] <q85> cheeser: yes. The \u0002ns is evaluated to "ns". You can see "ns" listed at the top of the mongotop. We do NOT have a collection called "ns". I'm trying to find out what this thing is doing.
[20:38:54] <cheeser> \u0002ns is not the same, no. but that's probably beside the point.
[20:39:41] <q85> What do you mean it is not the same?
[20:40:13] <cheeser> "ns" != "\u0002ns" fwiw
[20:40:24] <cheeser> it might print the same but it's not the same.
[20:45:04] <q85> anyway, mongotop was collecting stats for 5 minutes. The host has 4 cores. 8179559ms is... unreasonable. Any ideas?
[20:46:09] <q85> I searched for that connection in the logs and came up with this: http://pastie.org/9556682
[20:49:59] <EmmEight> That tells me your logs are full and there are prolly a lot of disk operations going on
[20:56:46] <q85> The log is only 454M (rotated monthly). At this time of the day, this cluster should be upserting all the time to the insights collection (~400/sec). But, you are correct, there are a lot of operations going on. These operations should not be there. I'm trying to figure out why they are there.