PMXBOT Log file Viewer


#mongodb logs for Wednesday the 2nd of April, 2014

[01:38:44] <nicken> if I'm referencing a document in a different collection, should I use manual IDs or DBRefs?
[01:40:15] <cheeser> either one works
[01:40:33] <nicken> also, on a side note, I'm using collections as a sort of way to differentiate between types of documents
[01:40:42] <nicken> e.g. I'll have a users collection, articles collection, etc.
[01:40:49] <nicken> kind of in the same way as you would tables in SQL DBs
[01:41:04] <nicken> is that abusing the idea of collections?
[01:41:10] <cheeser> DBRefs keep explicit track of the remote collection, but if it's always to the one collection, save some space and just use the ID values
[01:41:25] <cheeser> not necessarily no.
[01:41:41] <nicken> is there a different, preferred way?
[01:42:44] <cheeser> for data modeling? no. it depends on usage patterns and such
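A minimal shell sketch of the two options discussed above, using hypothetical users and articles collections: a manual reference stores only the _id, while a DBRef also records which collection it points to.

    // manual reference: store just the _id of the related document (placeholder collections/ids)
    db.articles.insert({ title: "Intro", author_id: ObjectId("533b7d7de4b0c3a5f1a9d001") })

    // DBRef: also records the referenced collection, at the cost of a little extra space
    db.articles.insert({ title: "Intro", author: DBRef("users", ObjectId("533b7d7de4b0c3a5f1a9d001")) })

    // either way, resolving the reference is an explicit second query
    var article = db.articles.findOne({ title: "Intro" })
    db.users.findOne({ _id: article.author_id })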
[03:07:09] <sinelaw> hi, I have about 2GB of json data. When I import it into mongo it takes up 20GB of space. Is that normal?
[03:07:30] <retran> dont spend time thinking about such things
[03:07:42] <retran> if you're using the database
[03:07:58] <retran> if you're coding the source code for mongo-daemon... sure
[03:08:19] <retran> your job as a db user, is to provide whatever space it needs for the data you give it
[03:08:56] <retran> "Is that normal?", could be.
[03:09:02] <retran> doesn't matter
[03:10:07] <sinelaw> retran the problem is "as a db user" that my machine can't possibly keep 20gb in memory, and it looks like mongo hits the disk way too often. I'm trying to understand if these numbers are "the usual" or if there is some config I should know about
[03:10:22] <sinelaw> or maybe mongo isn't a good match for this use case
[03:10:57] <retran> what does 20GB in memory have to do with anything
[03:11:25] <retran> stop looking at disk hits and disk space usage
[03:11:33] <retran> that's idiotic way to DBA
[03:11:52] <retran> same thing if it was mysql
[03:12:04] <retran> or any db system
[03:12:20] <retran> you should be benchmarking by using it, seeing speed of results
[03:12:33] <retran> and how it handles under load / multiple connections
[03:12:44] <sinelaw> ok, the speed sucks. now what?
[03:13:05] <retran> what kind of system
[03:13:09] <retran> are you on ec2
[03:13:17] <retran> i've never seen speed sucking on mongo
[03:13:31] <sinelaw> no. my laptop.
[03:13:33] <sinelaw> core i5 with about 6gb ram
[03:13:44] <retran> it sounds like you got wrong fields indexed
[03:13:51] <retran> for the finds you're doing
[03:14:02] <retran> so you might have tons of needless index data
[03:14:22] <retran> anyway.. this is pointless speculation
[03:14:31] <retran> you should pastie examples of schema/data and finds
[03:15:43] <sinelaw> ok. thanks, I'll start by reading more about indexes.
[03:15:55] <retran> you could pastie sample data inserts
[03:16:02] <retran> and sample finds you're doing
[03:16:18] <retran> and then also pastie .getIndexes() of collection
[03:16:45] <retran> some types of finds can't be indexed, also
[03:17:08] <retran> it would help you to pastie what i suggested
[03:17:19] <retran> someone (if i'm not here) could point you in right direction
[03:17:27] <sinelaw> thanks, I'll do that as soon as my machine gets unstuck. probably not anytime soon ;)
[03:17:28] <retran> you'd have same issue in any database system
[03:17:35] <retran> ok, no prob man
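For reference, the shell calls retran mentions look roughly like this; the collection and field names are only placeholders.

    db.mycollection.getIndexes()                      // list the indexes that already exist
    db.mycollection.find({ user_id: 42 }).explain()   // check whether a given find can use an index
    db.mycollection.ensureIndex({ user_id: 1 })       // add an index on a frequently-queried field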
[03:35:13] <Jules`> hey guys, anyone know how to perform a collection.copyTo('target') via the node driver please?
[03:35:35] <Jules`> I see that there's a db.command() function, but I'm not sure if that's relevant and the documentation is super thin
[03:57:54] <joannac> Jules`: if you type db.coll.copyTo
[03:57:59] <joannac> (leave out the brackets)
[03:58:33] <joannac> you can see how it's implemented, and can translate it to Node
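A sketch of what joannac describes, with placeholder collection names: typing the helper without parentheses prints its source, and the simplest translation to the Node driver is just a read-and-insert loop.

    db.source.copyTo          // without (), the shell prints the helper's implementation

    // a naive shell equivalent that maps directly onto the Node driver's cursor API
    db.source.find().forEach(function (doc) {
        db.target.insert(doc);
    });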
[04:22:09] <Soothsayer> I want to keep track of a Customer Session in my e-commerce site (e.g. User X viewed Product A, then Product B, then Category C, logged in, added to cart, etc.).. Should I be storing one document per event or create a document per hour with an array of events?
[04:22:27] <Soothsayer> one document per hour per customer session*
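For illustration only, the two shapes Soothsayer is weighing might look like this (field names are made up): one document per event, versus one document per customer session per hour with the events embedded in an array.

    // one document per event
    { session: "abc", ts: ISODate("2014-04-02T04:20:00Z"), type: "view_product", product: "A" }

    // one document per session per hour, events embedded
    { session: "abc", hour: ISODate("2014-04-02T04:00:00Z"),
      events: [ { ts: ISODate("2014-04-02T04:20:00Z"), type: "view_product", product: "A" },
                { ts: ISODate("2014-04-02T04:21:10Z"), type: "add_to_cart",  product: "A" } ] }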
[05:48:26] <nicken> aren't DBRefs good to use for efficiency?
[05:48:55] <nicken> I guess what I'm wondering is, do they provide any advantage over manual IDs?
[06:30:22] <Soothsayer> nicken: how do you define efficiency?
[06:30:34] <Soothsayer> nicken: if you want to be efficient, dbrefs might not have an advantage over simple references
[06:30:56] <nicken> do DBRefs reduce the number of queries I need to write?
[06:31:07] <Soothsayer> nicken: what language are you using?
[06:31:10] <nicken> or rather, the amount of code I need to write?
[06:31:11] <nicken> node.js
[06:31:17] <Soothsayer> use an ODM
[06:31:26] <Soothsayer> mongoose or whatever exists
[06:31:37] <Soothsayer> the code you will have to write will remain the same then.
[06:31:44] <nicken> I'll consider it, but I'm getting by for now without one.
[06:33:28] <proteneer> how many nodes have you guys been able to scale out to?
[06:33:35] <proteneer> and how big is your dataset?
[06:39:24] <Soothsayer> nicken: got disconnected
[06:39:32] <Soothsayer> i don't see why you wouldn't use an ODM
[07:31:27] <c3l> I have two models, objA and objB, they both reference each other. When I create a new objA, a new objB should also be created, whose id should be put in objA.refB, and the id of objA should be put in objB.refA. Can I do this with less than 3 database requests? :)
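Nobody answers this in the channel, but one common pattern (an assumption here, not something suggested in the discussion) is to generate both _ids client-side with ObjectId(), which gets it down to two inserts:

    var aId = ObjectId();   // ObjectIds can be generated client-side before inserting
    var bId = ObjectId();
    db.objA.insert({ _id: aId, refB: bId });
    db.objB.insert({ _id: bId, refA: aId });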
[08:39:10] <KamZou> Hi, when i've the following entries : http://pastebin.com/yJucA887 what request could i type to list those with a date lower than 20140101 please ?
[09:11:33] <kas84> Hi, anybody up?
[09:12:24] <kas84> I am having trouble with an aggregate query that takes a nondeterministic amount of time to execute depending on whether I make other queries to the mongodb or not
[09:12:34] <kas84> doesn’t make a lot of sense to me :/
[09:12:47] <Nodex> perhaps pastebin your uery
[09:12:48] <kas84> here’s my query http://hastebin.com/qimihexiru.bash
[09:12:49] <Nodex> query
[09:13:11] <kas84> there you go Nodex, thanks!
[09:13:33] <Nodex> are your other queries writing to the db?
[09:14:46] <kas84> hang on a sec, please
[09:17:46] <kas84> it’s the same query against other collections
[09:18:20] <kas84> what’s causing it to take an enormous amount of time
[09:18:54] <kas84> I have 20 million documents on the db
[09:19:16] <kas84> I mean on each collection
[09:20:18] <kas84> it’s taking 7 seconds every time I query after I restart mongo
[09:20:53] <kas84> if I repeat the query right after I queried the same collection it takes 4 seconds, so it’s caching it somehow
[09:21:15] <Nodex> yes the OS is ejecting the RU
[09:21:17] <Nodex> LRU
[09:21:24] <kas84> and if I do the same query against any other collection and then back to the original one
[09:21:29] <kas84> it’s taking 35 seconds
[09:21:42] <Nodex> how much ram does your system have?
[09:22:10] <kas84> 16GB
[09:22:22] <Nodex> how large is your working set?
[09:22:49] <kas84> you mean how large is the collection?
[09:22:56] <Nodex> http://docs.mongodb.org/manual/faq/diagnostics/#what-is-working-set-and-how-can-i-estimate-its-size
[09:23:04] <kas84> okay thanks
[09:26:54] <KamZou> Since i did a find request, my db has a 'null' size when i type "show dbs"; any idea ?
[09:27:34] <kas84> there you go: http://hastebin.com/sabitirudi.coffee
[09:27:45] <kas84> Nodex
[09:39:11] <Nodex> "mappedWithJournal" : 159732 <---- I think your system is thrashing when you're doing the aggregations
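The figure Nodex quotes comes from the server status output; a quick way to pull the same memory and size numbers in the shell (exact fields vary a little between versions, and the collection name is a placeholder):

    db.serverStatus().mem     // resident / virtual / mapped / mappedWithJournal, in MB
    db.stats()                // per-database dataSize, indexSize, storageSize
    db.mycollection.stats()   // per-collection sizes, useful for estimating the working set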
[09:44:44] <KamZou> Nodex, do you know how i could construct a request to get elements whose "max" element is lower than a specific date, with these : http://pastebin.com/yJucA887 ?
[09:57:39] <Nodex> I don't even know what your data is supposed to be. Can you format it?
[10:23:14] <future28> Hi there, say I have a file full of ID's that I wish to find the address for in my database, is there a way to use mongo to iterate these ID's? Or shall I just write a script to output queries?
[10:23:53] <Vejeta> Hello, I have a question about django and mongodb. Is this a good place to ask?
[10:24:36] <future28> So I have in my database: UserID and EmailAddr - I collect the user ID's that log into the site within the last 2 weeks and want to send them all an email so I need to get the email address from the DB. My file full of UserID is just newline delimited
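One way to avoid a per-ID loop is a single $in query; a minimal sketch assuming a users collection with the UserID and EmailAddr fields future28 describes, with the ids loaded from the newline-delimited file by a small script:

    var ids = ["u1", "u2", "u3"];   // in practice, read these from the file
    db.users.find({ UserID: { $in: ids } }, { EmailAddr: 1, _id: 0 })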
[10:31:17] <harryy> Hey. :) I realise MongoDB always listens on a Unix socket - but I can't find a way to connect to a unix socket using the mongo shell - is this supported?
[10:45:56] <Number6> harryy: Yes, it can be done
[10:46:04] <Number6> harryy: mongo --host /tmp/mongodb-27017.soc
[10:46:11] <Number6> harryy: You might need to be root
[10:48:03] <harryy> Ahh, brilliant
[10:48:25] <Zelest> .soc?!
[10:48:30] <Zelest> .sock if anything :P
[10:48:45] <Zelest> oh, nvm me
[10:48:56] <Zelest> thought you were talking about the server
[10:49:33] <Number6> Zelest: Copy/paste fail on my part
[10:49:48] <Zelest> Hehe
[10:50:00] <Zelest> the real path is /tmp/mongo<tab> ;)
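Putting Number6's command and Zelest's correction together, the usual invocation (the socket path depends on your configuration) is:

    mongo --host /tmp/mongodb-27017.sock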
[10:50:45] <harryy> i changed my pidfile path to /var/run/mongodb :P
[10:50:52] <harryy> and socketfile path*
[10:56:29] <Zelest> harryy, ps auxwww | grep mongod | grep -oe '[a-z\.\/\-\_]\+\.sock'
[10:56:32] <Zelest> harryy, does that work? :)
[10:57:42] <harryy> unfortunately not :(
[10:57:51] <harryy> i'm not passing it through the command line anyway, it's through a config file
[10:58:09] <Zelest> ah :/
[10:58:22] <harryy> i'm sure you could do something like that with netstat though ;)
[10:58:55] <Zelest> different kinds of output sadly
[11:04:00] <harryy> :[
[11:04:01] <harryy> freebsd?
[11:07:45] <Guest19432> Will indexing sub documents result in compound indexes?
[11:09:38] <Vejeta> What's the best subdocument structure to query django-tastypie with mongoDB as backend? https://stackoverflow.com/questions/22792828/best-subdocument-structure-to-query-django-tastypie-with-mongodb-backend Hopefully, someone here knows :)
[11:44:06] <KamZou> Nodex, http://pastebin.com/F57bB3m4
[11:44:21] <KamZou> Here the data is formatted
[12:13:12] <harryy> welcome back harry1
[13:28:35] <_boot> is there a way to increment date fields in an update?
[13:29:28] <Nodex> sounds like a job for your app to me
[13:29:57] <_boot> that will take far too long |:
[13:30:05] <Zelest> https://ifconfig.se/images/app.png :D
[13:30:06] <_boot> need to do the whole db
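A rough shell sketch of the "do it in the app" route Nodex suggests, assuming a hypothetical docs collection, a Date field called when, and a one-day shift:

    db.docs.find().forEach(function (doc) {
        // add one day (in milliseconds) to the existing date, then write it back
        db.docs.update({ _id: doc._id },
                       { $set: { when: new Date(doc.when.getTime() + 24 * 60 * 60 * 1000) } });
    });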
[13:31:12] <harryy> aha, Zelest
[13:31:46] <Zelest> also, if you dig through /images/, heads up, plenty of nsfw there..
[13:31:48] <Zelest> just so you know
[13:36:02] <slap_stick> hey i have set auth to true in mongo yet i am still able to login without any authentication, is there something else that is required? i'd kind of have presumed that it would just fail if i made a connection unless credentials were supplied
[13:41:16] <harryy> slap_stick: there's a localhost exception by default
[13:41:25] <harryy> i think it's removed after the first user is created, not sure
[13:41:48] <harryy> or after mongo is restarted, idk
[13:42:33] <slap_stick> yeh it is still allowing me to login remotely without any auth
[13:42:39] <slap_stick> and mongo has been restarted
[13:42:46] <slvrbckt> hey guys, i was wondering if there was a way to query for records /after/ a specific _id?
[13:42:49] <slap_stick> well to /test and then i can use db and switch db's
[13:43:01] <slvrbckt> ie. sorted by date, give me 5 records that occur after X id
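One reading of slvrbckt's question, sketched with a placeholder events collection and date field: look up the anchor document by _id, then take the next five in date order.

    var anchor = db.events.findOne({ _id: ObjectId("533b7d7de4b0c3a5f1a9d002") });  // the "X id"
    db.events.find({ date: { $gt: anchor.date } }).sort({ date: 1 }).limit(5)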
[13:49:36] <vparham> @slap_stick, can you actually see the dbs (show dbs) or see any data (show collections, db.collection.find())?
[13:51:56] <slap_stick> if i do show dbs i get unauthorized or if i run any queries then yeh it wont work
[13:52:37] <vparham> That is similar to the behavior I see using keyfile auth. I couldn't find anything in the doc but assume it's "as intended" behavior.
[13:52:45] <vparham> Would love to know otherwise.
[13:53:50] <slap_stick> yeh kind of odd behavior, presumed it just would fail to connect completely as opposed to allowing a connection through
[14:28:59] <balboah> anyone know why mongorestore has chosen to restore indexes as the last step? is it a pro or was that just how it ended up being?
[14:30:17] <kali> balboah: it's just more efficient this way
[14:30:58] <skot> That is how it is designed for various reasons, mostly due to efficiency and optimizations.
[14:33:31] <balboah> ok
[14:38:15] <cheeser> if you had to update the indexes with every write, it'd be really, really slow
[14:38:30] <cheeser> even in the sql world, disabling indexes before imports is recommended.
[14:46:55] <MattBo> hi all!
[14:48:00] <MattBo> I think I have a simple question, but new to mongo and kinda new to document databases. need a little help seeing if this is a problem mongo can even solve or if I write my own...
[14:48:53] <MattBo> let's say I've got a whole buttload of these types of documents: { _id:"asdf", people:["matt","steve","mark"] }
[14:49:51] <MattBo> now, I want to know what _ids "matt" has access to, but doing a contains query against all of these is inefficient, I'd rather like to map the documents to something like: { person:"matt", ids:["asdf","asdf2","asdf3"] }
[14:50:10] <MattBo> so, basically, I want to map the original document to a set of new documents and "flip" the relationship.
[14:50:38] <MattBo> I feel like this is something a map/reduce could do, but I'm not sure how I could handle this. the alternative is I have to write a routine that manages the second documents when the first one is edited.
[14:51:13] <kali> first, forget about map/reduce. this is achievable with the aggregation pipeline
[14:51:24] <MattBo> oooh, please tell me more! =-}
[14:51:39] <kali> well, google it :)
[14:52:05] <MattBo> Looking at the docs on docs.mongodb.com but not quite seeing how I'd do it
[14:52:23] <MattBo> so the match phase... I want all docs, so there isn't much of a match
[14:52:37] <kali> http://docs.mongodb.org/manual/core/aggregation-pipeline/ look for $unwind and $group
[14:52:54] <kali> and $match, indeed
[14:53:00] <MattBo> k, thanks!
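A sketch of the $unwind/$group pipeline kali is pointing at, run against MattBo's example documents (the collection name is a placeholder):

    // documents look like { _id: "asdf", people: ["matt", "steve", "mark"] }
    db.access.aggregate([
        { $unwind: "$people" },                                     // one doc per (_id, person) pair
        { $group: { _id: "$people", ids: { $addToSet: "$_id" } } }  // flip: collect _ids under each person
    ])
    // yields documents like { _id: "matt", ids: ["asdf", "asdf2", "asdf3"] };
    // a $match stage can be added to restrict the input or the output to one person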
[14:54:07] <kali> secondly, if this is an occasional query, aggregation pipeline (with an index on people) may be enough. But if this is a frequent query, you need to maintain the denormalisation at write-time.
[14:54:44] <MattBo> ok, that's what I was thinking of... the initial document is strictly for management, the second document would be queried pretty often
[14:54:48] <kali> basically, you have to pay for it at each read, or at each write. if your app is read intensive, you'll probably want to move this to write time
[14:54:58] <kali> well, then, here you go.
[14:55:06] <MattBo> yup, sounds like I have to manage this myself
[14:55:11] <MattBo> ok, thanks!
[14:55:48] <kali> MattBo: also, have a look at this: http://blog.mongodb.org/post/65612078649/schema-design-for-social-inboxes-in-mongodb
[14:56:08] <kali> MattBo: it's not stricly the problem you're trying to solve, but it may help you figure it out
[14:56:59] <MattBo> k, thanks
[14:58:27] <kali> to be honest, I think there is a theoretically possible optimisation with a covered index in your case, but I think the optimiser does not know how to perform it yet
[15:00:40] <MattBo> kali, thanks for the help, I think this puts me in a great direction.
[15:00:53] <kali> MattBo: you may want to vote for SERVER-4463 :)
[15:01:03] <kali> MattBo: https://jira.mongodb.org/browse/SERVER-4463
[15:18:56] <pradeepc_> Hi, I want to understand mongotop output.
[15:27:50] <Nodex> cool
[15:31:53] <pradeepc_> I am running mongotop 1 but the total time it is showing is greater than 1
[15:35:36] <Katafalkas> Hey, is there a way to install only mongos on a server using official 10-gen distribution ?
[15:39:01] <pradeepc_> can anyone please explain the mongotop output to me
[15:40:34] <andrewferk> Yesterday, a co-worker and I were benchmarking Mongo with bulk importing. We started by running the node import script and mongo on the same vm.
[15:41:28] <andrewferk> We saw really good performance. Then, we moved the import script onto its own VM, as the mongo VM was I/O bound, and we saw even greater performance
[15:43:17] <andrewferk> We then setup a shard environment, and the bulk import benchmark tanked, taking about 20% longer than when everything was on one vm
[15:43:49] <andrewferk> It seemed that the bottleneck was the mongo router (mongos)
[15:44:22] <andrewferk> We were under the impression that mongos was lightweight, but it was consuming about 50% of one of the vCPUs
[15:44:37] <andrewferk> The only queries being run were inserts
[15:45:21] <andrewferk> Is there an explanation why mongos would be slow on batch inserts? Or maybe our shard is misconfigured?
[15:46:39] <pradeepc_> My read queries are taking an extremely long time. The database is also very small, just 1GB. I want to know how i can debug
[15:46:45] <pradeepc_> can someone help me out please
[15:48:18] <cplantijn> Has anyone attempted to install mongo on a Hostgator/Bluehost server? I followed this tutorial http://rcrisman.net/article/11/installing-mongodb-on-hostmonster-bluehost-accounts. I get an error however
[15:49:32] <cplantijn> My bluehost account has a dedicated IP, but i get an error "couldn't connect" at src/mongo/shell/mongo.js:145
[16:12:44] <pradeepc_> anyone please ?
[16:14:05] <rafaelhbarros> pradeepc_: there is the command explain()
[16:14:28] <rafaelhbarros> pradeepc_: http://docs.mongodb.org/manual/reference/operator/meta/explain/
[16:14:41] <rafaelhbarros> pradeepc_: with that you can check if you have the right indexes and what not
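A sketch of the workflow rafaelhbarros is suggesting, with placeholder collection and field names; on 2.4-era servers explain() reports the cursor type, nscanned and millis, which is usually enough to spot a missing index:

    db.mycollection.find({ status: "open" }).explain()   // BasicCursor + large nscanned => no usable index
    db.mycollection.ensureIndex({ status: 1 })           // create the index the query needs
    db.mycollection.find({ status: "open" }).explain()   // should now report a BtreeCursor and far fewer scans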
[16:16:59] <Katafalkas> Is there a way to install only mongos, without installing entire mongodb from ubuntu distro ?
[16:19:21] <kali> andrewferk: well, mongos has an overhead... but there might be something else. have you pre-split the data ? if you haven't, all writes are sent to the same shard
[16:19:51] <pradeepc_> rafaelhbarros: Thank you for the help, I actually have the same data on two mongodb machines, both with no indexes. When I try db.collection.find().explain(), I see that it takes longer on one machine than on the other
[16:20:05] <kali> andrewferk: then, the point of sharding is not better absolute performance, it's better scalability. so sharding will harm you before it helps you
[16:20:38] <rafaelhbarros> pradeepc_: well, what is the distribution of data?
[16:20:51] <kali> andrewferk: then, a struggling mongos is easy to scale: just add one and split the load
[16:22:27] <pradeepc_> rafaelhbarros: I didnt get you. Can you please explain what do you mean by distribution of data ?
[16:23:04] <rafaelhbarros> pradeepc_: how much of the data is coming from the slow machine?
[16:24:21] <pradeepc_> rafaelhbarros: mongotop shows 182502ms for one of the namespace in read column.
[16:24:29] <kali> pradeepc_: the two machines have a copy of the same data, and one is lwloser than the other ?
[16:24:36] <kali> slower :)
[16:24:52] <pradeepc_> kali: correct
[16:25:13] <kali> pradeepc_: show us a typical document, your query, and the explain() for the query
[16:25:16] <pradeepc_> rafaelhbarros: and the data size is just around 1GB
[16:25:19] <kali> pradeepc_: on pastebin
[16:27:50] <pradeepc_> kali: http://pastebin.com/WhW3b2EV
[16:29:12] <kali> pradeepc_: where is the rest ? one server will be enough
[16:29:30] <kali> pradeepc_: you obviously need index or indexes
[16:31:53] <pradeepc_> kali: a document looks like this : { "_id" : ObjectId("507d8f16e4b0a7079dd6e205"), "brand" : DBRef("Brand", 15), "aliases" : [ "5.11 tactical", "tactical 5.11", "5.11" ] }
[16:32:43] <pradeepc_> the interesting thing is both machines dont have indexes.
[16:33:09] <kali> pradeepc_: and the query ?
[16:33:37] <pradeepc_> kali: db["BrandAlias"].find()
[16:33:50] <kali> ha.
[16:34:11] <rafaelhbarros> you're trying to get all your records at once?
[16:34:16] <kali> ok. index is not the issue :)
[16:34:54] <pradeepc_> rafaelhbarros: I think it will return just 10 at a time by default
[16:35:07] <rafaelhbarros> pradeepc_: ... no
[16:35:18] <kali> pradeepc_: in the shell, yeah
[16:35:29] <kali> pradeepc_: not from... is it php ?
[16:35:54] <pradeepc_> kali: ohh ok .. this time I am in shell only
[16:36:01] <rafaelhbarros> same for python, it returns a cursor, but once you try to iterate through it, it locks
[16:36:49] <pradeepc_> kali: One difference which I can see is the mongo version. The better performing machine is on 2.4.3 and the worse performing machine is on 2.4.8
[16:37:09] <rafaelhbarros> pradeepc_: are the machines with the same hardware?
[16:37:21] <pradeepc_> yes both are AWS c1.xlarge instances
[16:37:48] <abhishek__> hey, guys. DO you think i can query time difference in mongo if i have date format day-month-year stored as string .
[16:38:14] <kali> rafaelhbarros: well, performance between two instance can vary a lot, so...
[16:38:21] <kali> pradeepc_: ^ not rafaelhbarros
[16:38:47] <pradeepc_> But this difference is huge .. And I dont see any system parameter which is becoming bottleneck
[16:39:19] <rafaelhbarros> pradeepc_: well, a find like that, with no limit, can be the factor. are you reading the second one over the network or directly in the other shell?
[16:39:21] <andrewferk> kali: We were expecting an initial decrease in throughput, but not at the size we saw. And why would mongos be utilizing so much CPU?
[16:39:26] <rafaelhbarros> pradeepc_: is the second one a replica?
[16:39:33] <kali> pradeepc_: is that relevant ? because fetching 44k docs is not really a real-life usecase...
[16:40:19] <andrewferk> kali: also, we noticed that we never set the ulimit for mongos. Could that be a possible issue if we are slamming it with inserts?
[16:40:42] <pradeepc_> rafaelhbarros: there is no replication at all.
[16:40:49] <kali> andrewferk: which ulimit are you concerned about ?
[16:40:51] <rafaelhbarros> andrewferk: well, you need to do that anyways
[16:40:59] <kali> pradeepc_: how did you copy the data from one server to the other ?
[16:41:01] <andrewferk> kali: and our data was splitting correctly, it was just really slow
[16:41:42] <kali> andrewferk: you would probably get better results with a pre-split
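Pre-splitting, as kali means it, is roughly: shard the collection, then create empty chunks at chosen key boundaries before loading, so inserts don't all land on one shard. A sketch with placeholder names and split points (real split points have to come from your own key distribution):

    sh.enableSharding("mydb")
    sh.shardCollection("mydb.mycoll", { shardkey: 1 })
    // create chunk boundaries up front, at values spread across the expected key range
    sh.splitAt("mydb.mycoll", { shardkey: "g" })
    sh.splitAt("mydb.mycoll", { shardkey: "s" })
    // the balancer (or manual moveChunk commands) then spreads the empty chunks across shards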
[16:42:02] <andrewferk> i think the default ulimits are 1024. We use the recommended for the shards at 64000. But the mongos was still at the default of 1024
[16:42:17] <kali> andrewferk: which one ? -n ?
[16:42:57] <andrewferk> kali: what do you mean which one?
[16:43:29] <rafaelhbarros> -u is process
[16:43:31] <rafaelhbarros> -n is open files
[16:44:00] <kali> andrewferk: ulimit allows to set various limits. try "ulimit -a" to see them all
[16:44:43] <andrewferk> on the vm running the mongo router?
[16:44:50] <rafaelhbarros> andrewferk: yup
[16:45:40] <andrewferk> i don't have that one running atm. I'll get it up.
[16:46:06] <rafaelhbarros> well, spawining an instance to query data might be the bottleneck there
[16:46:10] <rafaelhbarros> spawning*
[16:46:10] <pradeepc_> kali: sorry for delay .. i used mongorestore
[16:46:46] <kali> pradeepc_: with a dump from the other one ?
[16:46:59] <andrewferk> what do you mean? this isn't production. we are just benchmarking and testing
[16:47:11] <kali> pradeepc_: or mongorestore on both hosts ?
[16:47:43] <pradeepc_> kali: actually this data was in s3
[16:48:02] <pradeepc_> I used mongorestore to restore the data on these two machines
[16:48:26] <kali> ok. well, apart from ec2 variability, i have no idea
[16:48:33] <kali> you may try with a third one
[16:48:37] <kali> +want
[16:49:10] <pradeepc_> kali: you mean create one more machine and try ?
[16:49:21] <kali> yeah
[16:49:47] <pradeepc_> ok kali thank you for helping me :) :)
[16:49:54] <kali> also, consider using m1 class machine. mongo is not cpu greedy, but loves its memory
[16:50:12] <kali> you may get better performance for the same price
[16:51:33] <pradeepc_> kali : I will try m1 instances as well and see performance difference
[16:52:29] <kali> pradeepc_: benchmarking on ec2 is hard, you need to bench on several "identical" machines and aggregate the results, or else you'll just get crap
[16:53:27] <pradeepc_> kali: exactly
[17:02:35] <andrewferk> output of "ulimit -a" on the mongos vm: http://pastebin.com/PUS8RkNq
[17:07:30] <kali> well, that's nice, but which one were you concerned about ? :)
[17:08:47] <andrewferk> which one would limit the number of requests?
[17:09:10] <kali> the number of requests ? as in "concurrent requests" ?
[17:09:21] <andrewferk> ugh
[17:09:47] <andrewferk> Should I not be worried about it? I was reading through the ulimit docs on mongo
[17:10:24] <kali> -n may limit the number of connections, but i don't think that's your problem right now
[17:10:56] <kali> the rest seems harmless
[17:11:20] <andrewferk> OK. It was the only thing I could think of that we didn't do
[17:12:16] <andrewferk> I thought -n 1024 vs -n 64000 was pretty significant when we are trying to do batch inserts, but i really have no clue :)
[17:13:10] <kali> not really. -n is file descriptors, so it has to cover tcp connection and actual files. It would generate errors anyway, not slow it down
[17:14:01] <andrewferk> we were getting about 7500 inserts/sec when we had a mongo setup on a m3.2xlarge ec2 instance mostly using ephemeral ssd
[17:14:30] <andrewferk> without vertically scaling, we would like to see this number at 10K+ inserts/sec, so that's why we are benchmarking sharding
[17:15:09] <andrewferk> having 2 shards, we were not coming close to 7500 inserts/sec, so we assume we are doing something wrong
[17:15:52] <andrewferk> we backtracked, and discovered that the issue happens when we setup sharding.
[17:16:11] <andrewferk> we had node-vm inserting into mongo0-vm
[17:16:49] <pradeepc_> one question: do mongodump and mongorestore copy indexes as well ?
[17:16:57] <cheeser> yes
[17:16:59] <kali> andrewferk: how many mongos are you using ?
[17:17:23] <andrewferk> we are just using one, on the same vm as the node import script
[17:17:40] <kali> andrewferk: you can try using two
[17:17:44] <andrewferk> yes we can
[17:17:51] <kali> andrewferk: on different hosts
[17:18:03] <andrewferk> but we would like to understand why mongos is running at 50% CPU
[17:18:14] <andrewferk> and why is it so much slower
[17:18:23] <andrewferk> even when we only have one shard
[17:18:23] <kali> can't help there :)
[17:18:36] <andrewferk> and it knows to only send all requests to a single shard
[17:18:48] <pradeepc_> cheeser: so if i create some index on some custom field.. and take a mongodump and restore on a different machine, the indexes will also get copied ?
[17:18:51] <andrewferk> mongos uses 50% cpu and slows the insertions/sec
[17:19:02] <cheeser> pradeepc_: try it and see but yes it should
[19:29:37] <LucasTT> hello, i have an issue
[19:30:10] <LucasTT> i'm on windows, and the last time i was running a mongodb server,i closed it with Control + C which forced it
[19:30:25] <LucasTT> but now that i'm trying to run mongodb again it won't run it
[19:31:55] <LucasTT> this is the log
[19:31:56] <LucasTT> http://pastebin.com/vUf4buCs
[19:32:39] <coreyfinley> Does anyone know of an easy way to clear the cache for MongoDB? I'm running some aggregations on a local dataset, and I need to benchmark the run times. But after the first one it just returns the copy from memory.
[19:33:39] <cheeser> i'm not sure that's really true. i don't recall ever hearing that aggregation results are cached in memory.
[19:33:58] <cheeser> now, the collection data processed might still be in memory but that's what you want anyway
[19:34:28] <coreyfinley> ruby benchmark results estimate something like 20 seconds for one of my jobs, then when rerunning it, it comes back as around 4. With no changes to the calculation.
[19:34:36] <LucasTT> nevermind, ran it with --repair and worked
[20:17:51] <proteneer> wtf there is a 10MB overhead for each mongodb connection?
[20:19:10] <proteneer> oh it's 1MB now
[20:25:17] <skot> It is the default stack size (per thread) and configurable
[20:38:45] <andrewferk> kali rafaelhbarros thanks for your help earlier. We are no longer having mongos issues today. We weren't able to figure out the difference from yesterday. I'm guessing it was an incorrect setup
[20:38:54] <andrewferk> Someone else did it yesterday, and did it today
[20:38:59] <andrewferk> didn't work yesterday, works today
[20:39:02] <andrewferk> enough said :)
[20:39:13] <andrewferk> i did it today*
[20:39:17] <rafaelhbarros> andrewferk: hit them with a shovel.
[20:39:33] <rafaelhbarros> the FNG is the issue
[20:39:41] <andrewferk> i'm just excited to see it working
[21:57:51] <ctorp> Is there an easy way to target an array index by an attribute on one of the contained objects with mongo?
[21:59:20] <ctorp> I basically have {foo: [{id:1},{id:2}]} and want to do something like db.test.findById where id==2, but hopefully with some mongo internal instead of doing a js loop
[21:59:44] <ctorp> where foo[index].id == 2, I mean
[22:02:22] <skot> You can just issue this query: find({"foo.id" : 2}) using dot-notation
[22:03:00] <skot> http://docs.mongodb.org/manual/core/document/#dot-notation
[22:03:42] <skot> http://docs.mongodb.org/manual/tutorial/query-documents/#read-operations-subdocuments
[22:04:27] <ctorp> oh. nice!
[22:10:49] <ctorp> skot: if I do a lookup based on another property like db.test.findByIdAndUpdate({vers:123}...) can I still do a subdocument update only where "foo.id"==2 ?
[22:14:17] <skot> http://docs.mongodb.org/manual/reference/operator/update/positional/
[22:14:37] <ctorp> thank you!
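Combining skot's two answers for ctorp's example: match both the document and the array element, then use the positional $ operator so only that element is updated (vers and foo.id come from the example above; the field being set is a placeholder).

    db.test.update(
        { vers: 123, "foo.id": 2 },            // match the document and the array element
        { $set: { "foo.$.name": "updated" } }  // $ refers to the array element matched in the query
    )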
[22:26:37] <andrewferk> So, now I'm having issues with chunks in my shards
[22:27:06] <andrewferk> The chunk size is 64, the default, but my data is 79MB, and it's still one chunk
[22:27:12] <andrewferk> so all my writes are going to only one shard
[22:27:19] <andrewferk> what would be causing this?
[22:28:03] <andrewferk> i tried changing the chunk size to something smaller (4), but all the data continues to be in one chunk
[22:37:03] <joannac> the chunk can't be split? all the docs in it have the same shard key?
[22:41:21] <andrewferk> they all have an attribute "geohash" which is the shard key
[22:41:30] <andrewferk> there are over 700k unique values for geohash
[22:42:24] <andrewferk> i'm adding even more records, and it continues to stay at one chunk
[22:46:11] <joannac> sh.getBalancerState() ?
[22:46:34] <andrewferk> true
[22:46:53] <jdavid> hey guys, my friend brandon black says i can find help here
[22:46:56] <jdavid> https://groups.google.com/forum/#!topic/mongodb-csharp/euVPtNEPE0M
[22:46:59] <joannac> sh.status() ? (pastebin this one please)
[22:48:04] <andrewferk> http://pastebin.com/tfWfUPBZ
[22:48:31] <joannac> jdavid: where's the query and update?
[22:49:30] <joannac> andrewferk: use geohash; db.geohash.stats() ?
[22:49:45] <joannac> gah, "use benchmark", sorry
[22:50:43] <andrewferk> http://pastebin.com/qTUqA2RS
[22:51:16] <jdavid> @joannac
[22:51:39] <jdavid> i don’t think i can share that code, but we are using bson to locate the docs
[22:51:40] <joannac> andrewferk: hmmm. inserting via multiple shards, by any chance?
[22:52:12] <jdavid> if we are using the VisualStudio console and set a break point before the query, it will work
[22:52:17] <andrewferk> joannac: what do you mean?
[22:52:24] <jdavid> if we set the break point after the query it fails
[22:52:37] <joannac> jdavid: so the query doesn't work?
[22:52:41] <andrewferk> i was inserting into shard0000 before it was a shard
[22:53:29] <andrewferk> then i removed all the data, setup my mongo router and config, and added the mongod as a shard
[22:56:18] <andrewferk> would that be a problem? do i need to go back and clear my two shards of data and readd them?
[22:56:57] <joannac> andrewferk: how are you adding data?
[22:57:18] <andrewferk> with the node.js mongo driver
[22:57:27] <joannac> to the mongos?
[22:57:32] <andrewferk> yes
[22:57:42] <joannac> check the mongos log?
[22:57:55] <joannac> it should be trying to split and migrate
[22:58:01] <andrewferk> wait... no
[22:58:03] <andrewferk> i'm stupid :)
[22:58:11] <andrewferk> haha, ty though
[22:58:16] <jdavid> it works if we give the 1st write/update query more time by setting a breakpoint
[22:58:22] <joannac> you said it, not me ;)
[22:59:12] <joannac> jdavid: okay. are you not waiting for the getLastError response?
[22:59:23] <jdavid> we tried that too
[22:59:27] <joannac> anyway, i gtg
[22:59:30] <jdavid> same result
[23:00:01] <joannac> jdavid: post more information on the ticket. there's not enough information there to go on
[23:00:48] <joannac> jdavid: for a start, change step 4 to "retrieve the document I just inserted"
[23:00:54] <joannac> and see if that works
[23:01:04] <joannac> and either way, update the ticket
[23:01:22] <jdavid> ok
[23:01:52] <jdavid> but if we do a findOne on it with the query and do it at human time scales it works
[23:02:21] <jdavid> i’m wondering if the key to this is two HTTP requests on IIS 7.x
[23:02:47] <jdavid> is journaling keyed to a connection
[23:02:59] <proteneer> does mongodb not support set semantics? (container of unique items)
[23:03:10] <jdavid> and an IIS Mongo Connection to a request?
[23:03:18] <jdavid> so two requests would see two journals?
[23:03:43] <jdavid> writeconcern means consistent across connections right?
[23:39:45] <proteneer> how well does Mongo handle data in the 500TB range?
[23:46:50] <rkgarcia> proteneer: I haven't 500TB :(
[23:56:41] <VooDooNOFX> proteneer: 500TB of storage size doesn't matter. It's about index size.
[23:57:54] <VooDooNOFX> proteneer: mongo uses a mmap on the storage files, so access time is kept low on any item.