PMXBOT Log file Viewer

#mongodb logs for Thursday the 27th of August, 2015

[00:24:21] <jmo> Hey there. I am confused about the proper use case for mongo. If I have a data model that seems to fit just fine in a SQL database, is it a good idea to think about using mongo?
[00:26:18] <Boomtime> jmo: are you prepared to learn something new or would you rather only ever use what you are familiar with?
[00:27:17] <Boomtime> MongoDB is a pretty general purpose database - but like _all_ databases, unless you know how to use it, you're probably not going to have a good experience
[00:27:42] <Boomtime> that means you need to be prepared to learn something new, that is all
[00:28:39] <jmo> Boomtime: very willing to learn something new. the flexible schema part for development is mostly what draws me. i'm worried that it's meant more for unstructured / inconsistently structured data
[00:28:57] <jmo> could you really take a relational model and map it into mongo without trouble?
[00:30:28] <Boomtime> of course
[00:31:29] <Boomtime> just because mongodb is schema-less, does not mean you get to ignore the schema
[00:32:12] <Boomtime> mongodb does not enforce a schema on you, but _you_ better think about your schema or you won't be able to manipulate your data
[00:32:18] <jmo> fair enough
[00:32:39] <jmo> i know this channel will tend to be biased, but is there a case when you definitely *shouldn't* use mongo?
[00:33:30] <Boomtime> when you don't know what you're doing - otherwise nearly anything can work - just like any SQL database
[00:36:52] <jmo> Boomtime: thanks for the tip, sounds like a plan :)
[00:37:20] <jmo> what about all the stuff i keep reading about losing data?
[00:37:41] <jmo> it seems like that's just because people aren't using 'safe'
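For context on the "safe" writes jmo mentions: in current shells and drivers, writes are acknowledged by default, and a stricter write concern can be requested per operation. A minimal shell sketch; the "orders" collection and document are placeholders, not from the discussion above:

    // Acknowledged writes are the default; ask for a stricter write concern explicitly.
    db.orders.insert(
        { item: "abc", qty: 1 },
        { writeConcern: { w: "majority", j: true, wtimeout: 5000 } }
    )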
[04:48:00] <alaee_> Hi. Can anyone help me with mongodb optimization please? I have this very simple 100 documents long collection and when I'm using it in a Flask app with uwsgi and nginx, the bottleneck is my mongodb and I can't go beyond 200 reqs/sec.
[04:48:57] <alaee_> BTW, I use mongoengine for connecting to my mongo
[06:36:06] <macwinner> alaee_: are you using any indexes?
[06:36:37] <alaee_> I have the default index on _id and a unique index on one of my four fields.
[06:36:54] <macwinner> are you querying by id?
[06:37:07] <alaee_> I'm querying all documents. There are 100
[06:37:26] <macwinner> are they big documents?
[06:37:42] <alaee_> They're like this : post 1
[06:37:42] <alaee_> Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passa
[06:37:42] <alaee_> ges, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum. 1394/6/5
[06:38:21] <macwinner> i'm not familiar with flask and uwsgi
[06:38:40] <macwinner> and haven't used nginx really.. so i'm not sure if they support some sort of connection pooling
[06:52:51] <alaee> I got disconnected :|
[06:53:28] <alaee> They're like this : post 1
[06:53:28] <alaee> <alaee_> Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem I
[06:53:28] <alaee> psum passa
[06:53:28] <alaee> <alaee_> ges, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum. 1394/6/5
[06:53:30] <alaee> * albertom has quit (Ping timeout: 245 seconds)
[06:53:32] <alaee> <alaee_> 100 of them.
[06:53:34] <alaee> <alaee_> I send requests using ab -n 10000 -c 100 and I get the results in 50 seconds. Meaning 200 reqs/sec
[07:30:54] <alaee> Hi. Can anyone help me with mongodb optimization please? I have this very simple 100 documents long collection and when I'm using it in a Flask app with uwsgi and nginx, the bottleneck is my mongodb and I can't go beyond 200 reqs/sec.
[07:54:13] <mtree> hello
[07:54:47] <mtree> is it better to do a couple of aggregate queries, or make one search and reduce the results to the required format?
[07:56:29] <kali> mtree: i try to offload as much work as is reasonable to the application server, as that part is easier to scale than a database
[07:56:52] <kali> mtree: not sure if that answers your question, but without cardinalities anyway, there's not much that can be said
[07:58:05] <mtree> kali: i wrote code that gets LOTS of records from the collection and I filter/reduce these results in code to get what I need (sum of some conditional values)
[07:58:21] <mtree> but my boss says its not cool and that I should do it all using aggregate
[07:58:38] <mtree> even if it means triggering more than 1 query
[07:58:48] <kali> mtree: well, moving LOTS of records over the network does not sound good :)
[07:59:03] <mtree> so he got the point, right?
[07:59:47] <kali> well, you at least ought to try what he's suggesting and see how it behaves compared to your solution
[08:00:19] <kali> but we're slowly moving to a non-technical field here, not sure i'm the best person to give advice :)
[08:00:41] <mtree> thats fine, u helped me for sure
[08:00:45] <mtree> thanks
[09:38:33] <sedavand> what users and roles do I need to create in order to use local user/simple password auth?
[09:46:34] <mtree> is it possible to have two unrelated $group in one aggregate query?
[09:55:23] <Derick> mtree: sure
[09:55:40] <Derick> you'd still need to do it in two pipeline stages though
[09:55:50] <mtree> Derick: can u provide me example?
[09:56:50] <Derick> mtree: it's probably easier if you show me your input documents, and describe (with example) what you'd like as output (in a pastebin)
[10:00:55] <mtree> ok, give me a second
[10:05:48] <mtree> Derick: http://pastebin.com/VE40EFix
[10:10:51] <mtree> so, is it doable in one query?
[10:11:53] <Derick> let me see
[10:12:44] <Derick> is the number of values in "otas" a set, or unbound?
[10:13:42] <mtree> its the sum of same ota values
[10:13:51] <mtree> not sure if I understand your question
[10:14:59] <Derick> are there just two values of otas, or many more?
[10:15:11] <mtree> oh, many more
[10:15:31] <Derick> in that case, no. as you want to move a value into a key
[10:15:53] <mtree> :-) thank you very much!
[10:16:51] <mtree> for now i'm doing something like this
[10:16:52] <mtree> $group: { _id: '$result.name', count: { $sum: 1 }}
[10:17:14] <mtree> s/$result.name/$ota
[10:17:34] <mtree> and it works fine, but it requires me to execute another query to group by hotelNames
[10:17:51] <Derick> hmm
[10:17:54] <mtree> its sad but still better than getting whole filtered collection and reducing it in code
[10:17:55] <Derick> actually
[10:18:24] <Derick> you have a match as well?
[10:18:27] <Derick> hm
[10:19:10] <mtree> yes
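mtree's pastebin isn't reproduced in the log, so the collection name below is a placeholder; the $ota and hotelName fields come from the chat. A sketch of the two-pipeline approach mtree ends up using, one $match plus $group per grouping key:

    // Count per OTA, then count per hotel name, as two separate aggregations.
    db.bookings.aggregate([
        { $match: { /* same filter mtree already applies */ } },
        { $group: { _id: "$ota", count: { $sum: 1 } } }
    ])
    db.bookings.aggregate([
        { $match: { /* same filter mtree already applies */ } },
        { $group: { _id: "$hotelName", count: { $sum: 1 } } }
    ])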
[10:24:05] <charnel> How can I optimize a query which is looking for ip range. I am using $gt and $le which is slow. But I converted the Ip addresses to integer which gained me some time. Basically now I am only looking if the given integer is between min and max values. Can I improve performance more ?
[10:35:42] <kali> charnel: you have an index on the field, i assume ?
[10:36:04] <charnel> Yep.
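A sketch of the range lookup charnel describes, with placeholder collection and field names (the range bounds stored as integers and indexed together):

    db.ipRanges.createIndex({ ipMin: 1, ipMax: 1 })
    // Find the range containing 10.0.0.5 (167772165 as an integer):
    db.ipRanges.find({ ipMin: { $lte: 167772165 }, ipMax: { $gte: 167772165 } }).limit(1)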
[12:34:34] <ssutch> hey
[12:35:02] <ssutch> we are trying to add a member to our replicaset but it stays in startup2 state forever and continually crashes with this message: 2015-08-27T12:31:10.370+0000 E REPL [rsBackgroundSync] sync producer problem: 16235 going to start syncing, but buffer is not empty
[12:35:17] <ssutch> running 3.0.6 on the entire cluster
[12:35:45] <ssutch> it's a 2-shard cluster with 3 configs and we are trying to startup a node on rs0
[12:44:48] <deathanchor> ssutch: no other errors around that line?
[12:44:57] <deathanchor> or warnings?
[12:48:45] <ssutch> deathanchor: several
[12:49:01] <ssutch> the other that commonly occurs directly after a replicaset failure is [rsSync] 10276 DBClientBase::findN: transport error (PRIMARYNAME)
[12:49:19] <ssutch> only ever happens after it gets all the indexes
[12:49:56] <ssutch> if someone AT mongodb were available for immediate support im sure my boss would be happy to pay for someone to help us out. today is our launch day hah
[12:51:17] <cheeser> support staff are standing by. :)
[12:51:36] <deathanchor> ssutch: can you gist/pastebin the logs with the errors? (you can sanitize if you want)
[12:51:58] <ssutch> deathanchor: https://gist.github.com/samuraisam/83fed4d9ce21b4614388
[12:52:24] <ssutch> ill post the other error when it pops up
[12:52:59] <deathanchor> are you starting with an empty dir for the dbpath?
[12:53:10] <ssutch> yes
[12:53:27] <ssutch> completely fresh node, 6tb EBS at /data with 10000 IOPS
[12:55:43] <deathanchor> jez.
[12:56:10] <ssutch> the only potentially weird thing is we are using an ELB to route traffic (so we can set up a cname in route53). it's worked for us for weeks though
[12:56:50] <deathanchor> single CNAME resolution? (meaning the CNAME points to an A record)
[12:57:19] <deathanchor> I have had problems with CNAME -> CNAME -> A resolution.
[12:57:36] <ssutch> CNAME points to an ELB
[12:57:41] <ssutch> (which is an A record)
[12:58:09] <ssutch> internal-mongodb-rs0-node0-$BLAH.aws.blah
[12:58:11] <deathanchor> yeah, not sure about this error, new to me, but I'm not on 3+ mongo yet
[12:58:35] <deathanchor> nor do I use an ELB with my mongo set.
[12:59:11] <deathanchor> but like cheeser said, mongodb.com <- support for $$
[12:59:21] <deathanchor> I don't work for them
[12:59:26] <deathanchor> I don't use them :D
[12:59:35] <ssutch> lol thanks
[12:59:39] <ssutch> cheeser: how do we get started?
[12:59:52] <deathanchor> I'm a cowboy mongodb user :D I break it, I fix it
[13:00:38] <cheeser> ssutch: here perhaps: https://www.mongodb.com/products/development-support
[13:00:55] <cheeser> they've renamed things and i'm not quite sure which is which now. :)
[13:01:34] <deathanchor> https://www.mongodb.com/contact
[13:04:04] <ssutch> tried calling the new york office. nothing
[13:10:29] <ssutch> well, darn.
[13:12:31] <mbwe> hi everybody i have a collection devices and it has a name field, how could i do an insert only when the name field does not exist already
[13:13:37] <mbwe> thus {name: "#010101"} and if there is already a record with name: "#010101" i want to cancel that insert
[13:14:59] <deathanchor> mbwe: unique index
[13:15:35] <deathanchor> you'll get an exception if you try to insert a record that already exists
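A sketch of deathanchor's suggestion, using the field from mbwe's example; the "devices" collection name comes from mbwe's description:

    db.devices.createIndex({ name: 1 }, { unique: true })
    db.devices.insert({ name: "#010101" })   // succeeds the first time
    db.devices.insert({ name: "#010101" })   // fails with an E11000 duplicate key error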
[13:16:01] <deathanchor> does anyone here use morphia?
[13:22:59] <ssutch> are any of the devs around? or that i can ping?
[13:23:07] <ssutch> not able to get in touch with anyone at mongodb via phone
[13:23:36] <ssutch> kind of crucial. our app just went live :(
[13:34:09] <ssutch> welp. restarted from a manual snapshot of another node and it worked
[14:01:37] <Grixa> Hi all!
[14:03:18] <coudenysj> hi
[14:07:17] <Grixa> I have a problem with slow finds in a collection (~2500000 records). My script executes the search query about 500000 times and it takes about 10 minutes. The collection contains records with windows paths (for example: C:\Users\admin\Documents\1.pdf).
[14:07:51] <deathanchor> Grixa: index? did you do an explain on the query?
[14:08:24] <Grixa> "cursor" : "BtreeCursor path_1","isMultiKey" : false,"n" : 1,"nscannedObjects" : 1,"nscanned" : 1,"nscannedObjectsAllPlans" : 1,"nscannedAllPlans" : 1,"scanAndOrder" : false,"indexOnly" : false,"nYields" : 0,"nChunkSkips" : 0,"millis" : 45
[14:08:49] <Grixa> Of course, i am using index
[14:09:14] <deathanchor> so 45ms is too slow for you?
[14:11:05] <Grixa> It is too slow when I am running the search 500000 times in my python script. My idea is to store some hash (CRC32 for example) and use it for indexed search. Or would that be pointless?
[14:12:00] <deathanchor> my calc is 45ms per search x 500k searches = 6.25 hours
[14:12:24] <deathanchor> what are you returning from the doc?
[14:12:53] <deathanchor> "indexOnly" : false <- if you want faster try to make this true by making a compound index of what you want and what you are looking for
[14:13:21] <Grixa> I am using find_one() from pymongo. Collection contains only "path" field
[14:13:50] <deathanchor> then use the filter option to return only the path.
[14:14:13] <deathanchor> find_one( { path : "pathyouwant"}, { path : 1 } )
[14:14:34] <Grixa> This can help?
[14:14:44] <deathanchor> "indexOnly" : false because you want the doc, which includes the _id, which isn't in that index
[14:15:31] <deathanchor> do an explain on findOne( { path : "pathyouwant"}, { path : 1 } )
[14:17:28] <Grixa> I used this query: db.aps2_paths.find({"path": "..." }, {"path": 1, "_id": 0}).explain()
[14:17:39] <deathanchor> and?
[14:17:42] <deathanchor> results?
[14:17:43] <Grixa> "indexOnly" : true,
[14:17:47] <Grixa> "millis" : 0,
[14:17:47] <deathanchor> ms?
[14:17:49] <deathanchor> NICE
[14:17:52] <deathanchor> better?
[14:17:55] <Grixa> Yeah!
[14:18:01] <Grixa> Thanks!
[14:18:18] <Grixa> :)
[14:18:40] <deathanchor> yeah, gotta remember that mongo will return the full doc by default, which requires a read from disk; but if the index is in memory and everything you need is in the index, use a projection to avoid disk reads
[14:22:35] <deathanchor> Grixa: just keep your index size in mind when spec'ing out your machine memory, if it gets too large then you'll need more memory to avoid paging.
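A sketch of the covered query deathanchor walks Grixa through; the collection and field names are from the chat, the path value is a placeholder:

    db.aps2_paths.createIndex({ path: 1 })
    // Project only the indexed field and exclude _id so the query can be
    // answered from the index alone ("indexOnly" : true in the explain output).
    db.aps2_paths.find(
        { path: "C:\\Users\\admin\\Documents\\1.pdf" },
        { path: 1, _id: 0 }
    ).explain()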
[14:30:08] <Grixa> Thanks! Aaand another stupid question: if i want to receive only the _id from my collection, i use this query: db.ups_paths.find({"path" : "..." }, {"path": 0}). And again this query is not index-only ( "indexOnly" : false )
[14:31:04] <Grixa> ("cursor" : "BtreeCursor path_1")
[14:32:13] <deathanchor> yeah because the _id isn't in the index
[14:32:28] <deathanchor> you can create another index : { _id : 1, path : 1}
[14:33:13] <deathanchor> but you need to decide if you want another index built and taking up sweet precious memory
[14:34:05] <Grixa> this index already exist
[14:34:30] <deathanchor> ah, then use a hint
[14:34:47] <deathanchor> it will force it to use that index instead
[14:42:53] <Grixa> db.ups_paths.find({"path" : "..." }, {"path": 0}).hint({"path": 1, "_id": 1}).explain() -> "cursor" : "BtreeCursor path_1__id_1","isMultiKey" : false,"nscannedObjects" : 1,"nscanned" : 1,"indexOnly" : false,"millis" : 0
[14:43:03] <Grixa> Again, indexOnly: false
[14:43:05] <Grixa> :/
[14:49:46] <jpbjpb> hi gang!
[14:50:20] <jpbjpb> I’m trying to add a key:value pair where the value is a pointer to another collection. How can I do this?
[14:50:22] <saml> Grixa, nscannedObjects:1 is good
[14:50:53] <saml> jpbjpb, how would you dereference your _pointer_ ?
[14:51:02] <saml> your app has to make second query
[14:51:29] <jpbjpb> saml: I’m not using the app, I’m just in the REPL
[14:51:43] <saml> http://docs.mongodb.org/master/reference/database-references/#dbrefs this is convention
[14:51:56] <saml> but you do need to do secondary queries
[14:52:13] <jpbjpb> saml: that’s all a little over my head
[14:52:40] <saml> db.docs.find(yolo).forEach(function(x) { db.someOtherCollection.find({_id: x.idOfSomeOtherCollection}) ....
[14:52:46] <jpbjpb> I used mongoimport to pull in some JSON, but I couldn’t figure out how to import the references to another collection, so I’m trying to update by hand
[14:53:09] <StephenLynx> because
[14:53:12] <StephenLynx> there isn't references
[14:53:13] <cheeser> references should import just fine
[14:53:13] <jpbjpb> saml: I get the `find`, I’m a little unclear on the `update` or `set`
[14:53:31] <StephenLynx> ah
[14:53:32] <StephenLynx> dbrefs
[14:53:40] <StephenLynx> I always forget about those :v
[14:53:45] <saml> maybe give us example json you're importing. probably better to use mongodump and restore
[14:53:56] <cheeser> StephenLynx: i wish i could. currently fixing morphia bugs around those. :)
[14:53:58] <jpbjpb> cheeser: mongoimport seemed to have trouble w/ non-strings (eg ObjectId(“foo”))
[14:54:20] <Derick> jpbjpb: that's not valid as an ObjectId
[14:54:22] <StephenLynx> >morphia
[14:54:28] <StephenLynx> is always some kind of ODM, isn't it?
[14:54:38] <cheeser> it is, yeah. java.
[14:54:53] <StephenLynx> >everything is burningo, mongo is the suxor
[14:54:57] <StephenLynx> theres always an ODM behind it
[14:55:06] <cheeser> meh
[14:55:07] <StephenLynx> people never learn
[14:55:07] <jpbjpb> Derick: it seemed like mongoimport cannot handle `ObjectId`s
[14:55:14] <jpbjpb> so my hope was that I could do it manually
[14:55:21] <jpbjpb> I’m trying this: `db.plans_copy.update({"slug":"200w41st"}, {$set: {"_building":ObjectId("55dcd6aa88a093e0032e8a97"),}, false, true}`
[14:55:42] <cheeser> for the most part it works like a champ. just some corner cases where people want to reuse instances that weren't designed for it.
[14:55:43] <jpbjpb> but when I run that in the REPL, it returns three dots and hangs
[14:55:56] <saml> that's unmatching { or (
[14:55:57] <jpbjpb> (which I’ve never seen before)
[14:56:07] <Derick> jpbjpb: you can't have dangling commas either
[14:56:10] <saml> check your syntax of jervascript
[14:56:14] <jpbjpb> saml: the three dots are unmatching brackets? thanks!
[14:56:18] <jpbjpb> i’ll check my syntax
[14:56:21] <Derick> jpbjpb: you miss an ) at the end
[14:56:26] <jpbjpb> yep!
[14:56:29] <Derick> to close the update( one
[14:56:33] <cheeser> jervascript. love it.
[14:58:16] <jpbjpb> thanks gang! I’ll work on the syntax a bit
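For reference, jpbjpb's update fails only because of the unbalanced braces, missing closing parenthesis, and dangling comma pointed out above; a corrected version, keeping the upsert/multi flags from the original attempt, would look roughly like this:

    db.plans_copy.update(
        { "slug": "200w41st" },
        { $set: { "_building": ObjectId("55dcd6aa88a093e0032e8a97") } },
        false,   // upsert
        true     // multi
    )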
[15:17:56] <Grixa> Why db.ups_paths.find({"path" : "..."}, { "path": 0 }).explain() return "indexOnly" : false, but it uses "cursor" : "BtreeCursor path_1"?
[15:19:46] <cheeser> because it's using the index path_1 to resolve the query but you're suppressing the path field so it has to use more than the index to return documents
[15:19:54] <cheeser> all you want is the _id?
[15:21:29] <Grixa> yes
[15:22:03] <cheeser> right. so it has to load the full document from disk in order to get those _id values.
[15:22:12] <cheeser> it's not "index only" at that point.
[15:23:27] <cheeser> http://docs.mongodb.org/manual/core/query-optimization/#covered-query
[15:25:36] <Grixa> Is there no way to optimize it? Or is this a normal situation?
[15:26:54] <cheeser> you could add _id to that index.
[15:29:03] <saml> why not make "path" _id ?
[15:31:02] <Grixa> I created new index (path: 1, _id: 1) and, again: "cursor" : "BtreeCursor path_1__id_1", "indexOnly" : false
[15:31:44] <saml> db.ups_paths.find({"path" : "..."}, { "path": 1 }).explain() try this
[15:32:21] <Grixa> indexOnly: true
[15:32:30] <Grixa> but i need _id
[15:32:35] <saml> you do get _id
[15:32:39] <saml> along with path
[15:33:51] <Grixa> will it be received from memory?
[15:35:08] <saml> no idea
[15:35:38] <saml> my .explain() doesn't show indexOnly stuff. i'm using mongo 3
[15:36:05] <Grixa> indexOnly: true
[15:37:28] <Grixa> I will test this on my server. Thanks!
[15:41:43] <deathanchor> saml: really? that's strange.
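A sketch of saml's suggestion: with Grixa's compound index { path: 1, _id: 1 }, projecting { path: 1 } still returns _id (included by default), and both fields live in the index, so the query can remain covered; the path value is a placeholder:

    db.ups_paths.find(
        { path: "C:\\Users\\admin\\Documents\\1.pdf" },
        { path: 1 }
    ).hint({ path: 1, _id: 1 }).explain()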
[15:43:28] <christo_m> hello, does this schema structure make sense: http://pastebin.com/QMJhq3DH
[15:43:52] <christo_m> im trying to figure out how to create shows without overwriting the season and episode properties.. i want it to append
[15:44:07] <christo_m> so i can have a subcollection of seasons and then episodes respectively
[15:45:46] <christo_m> i guess i have to treat the title as the primary key and do a $push
[16:01:21] <christo_m> anyone able to help?
[16:04:08] <StephenLynx> don't use mongoose.
[16:04:12] <christo_m> too late
[16:04:17] <StephenLynx> it will give you nothing but trouble and run incredibly slow.
[16:04:40] <StephenLynx> if you are trying to figure your model, why is too late?
[16:05:19] <christo_m> because other things are built already.
[16:05:37] <StephenLynx> you can just not use mongo on this case.
[16:05:44] <christo_m> ?
[16:05:45] <StephenLynx> mongoose*
[16:06:25] <christo_m> StephenLynx: i started the project with this: https://github.com/DaftMonk/generator-angular-fullstack
[16:06:30] <christo_m> and it scaffolds some mongoose crap.
[16:06:44] <christo_m> i dont want to start deviating from the norm unless im actually doing it for everything.
[16:08:06] <christo_m> StephenLynx: anyway, you see what im trying to do, i just dont know how to shape my queries to do it.
[16:08:30] <StephenLynx> ugh
[16:08:35] <StephenLynx> you started with the wrong foot
[16:08:40] <christo_m> dude listen
[16:08:42] <StephenLynx> and the norm is not to use all this crap
[16:08:52] <christo_m> it doesnt matter where i go on IRC, im using the wrong framework or tool or technique
[16:08:56] <christo_m> i dont care anymore
[16:09:04] <christo_m> its all bullshit
[16:09:21] <cheeser> yeah. enough with the "mongoose is shit" screeds. it doesn't help answer questions.
[16:09:44] <christo_m> StephenLynx: im sorry i didnt do things your way the first time i started with node/mongo
[16:09:52] <christo_m> im trying to scramble together an MVP
[16:09:59] <christo_m> not engineer facebook.
[16:34:27] <deathanchor> when I have nothing good to say, I don't say anything at all. Keeps me out of trouble.
[16:38:29] <sewardrobert> deathanchor, your mother taught you well. :-)
[16:41:24] <StephenLynx> yeah, nah
[16:41:33] <StephenLynx> knowledge without criticism is useless.
[16:41:59] <StephenLynx> torvalds is a great role model.
[16:47:06] <deathanchor> please critique when I ask, "What do you think about?" but unwanted criticism is just that... unwanted.
[16:50:40] <StephenLynx> thats not how criticism works.
[16:51:24] <StephenLynx> let's say you create some bullshit, ok?
[16:51:34] <StephenLynx> it doesn't work, its completely wrong
[16:51:44] <StephenLynx> but you say "I don't want criticism :^)"
[16:52:04] <StephenLynx> the less someone wants criticism, the more they need it.
[16:52:46] <StephenLynx> because when you just cover your ears and scream "LALALALA I CAN'T HEAR YOU", it's a sign you are on your way to completely screwing up
[16:53:18] <deathanchor> agreed, but if he's doing that? then why provide the criticism anyway?
[16:53:28] <deathanchor> just wasting both people's time
[16:53:31] <deathanchor> and efforts
[16:53:33] <StephenLynx> so others won't follow his patch.
[16:53:37] <StephenLynx> path*
[16:53:52] <StephenLynx> if he doesn't listen, at least others have the chance.
[16:54:07] <StephenLynx> and its my time to be used as I wish, and he is free to use /ignore on me.
[16:54:25] <StephenLynx> so the "wasted time" is a moot point.
[16:55:20] <deathanchor> StephenLynx: so I'm thinking of using a framework for my app to work with mongo. My main codebase is python. What do you suggest?
[16:55:23] <deathanchor> serious question
[16:55:31] <StephenLynx> Zero knowledge of python.
[16:55:34] <StephenLynx> I can't tell you that.
[16:55:44] <deathanchor> what do you work in and use?
[16:55:44] <StephenLynx> but as a general rule, I wouldn't use any web framework.
[16:55:51] <StephenLynx> io.js
[16:55:56] <StephenLynx> with no frameworks.
[16:56:19] <deathanchor> well I'm not using any web frameworks
[16:56:30] <deathanchor> just a framework to communicate with mongo
[16:56:34] <StephenLynx> ah, an ODM?
[16:56:39] <deathanchor> I was planning on using mongoengine
[16:56:54] <StephenLynx> I don't use them either.
[16:57:06] <StephenLynx> the driver is already meant to be used as it is.
[16:57:12] <deathanchor> so just straight up pymongo?
[16:57:17] <StephenLynx> is what I would do.
[16:57:31] <StephenLynx> is pymongo endorsed by 10gen?
[16:57:39] <StephenLynx> I know the driver I use for node.js is.
[16:57:44] <StephenLynx> so I trust it on itself.
[16:58:20] <StephenLynx> GothAlice is experienced with mongoengine and might be able to tell you the benefits of using it.
[16:58:32] <deathanchor> yeah, I know the benes
[16:58:45] <deathanchor> that's why I'm using it for a startup project
[16:58:47] <deathanchor> LLOE
[16:58:48] <GothAlice> deathanchor: pymongo is the official MongoDB driver. Pretty much everything else gets built on top of it.
[16:58:49] <StephenLynx> so take into consideration whether you actually need it.
[16:58:57] <StephenLynx> if you do, use it.
[16:59:11] <StephenLynx> if you don't, keep it simple with the regular driver.
[16:59:17] <GothAlice> At the same time, I use the MongoEngine ODM, as it provides many features the raw driver does not.
[16:59:19] <StephenLynx> more dependencies, more problems.
[17:00:04] <GothAlice> I.e. it provides a proper "model" layer to your app, with schema, data validation and transformation, event callbacks / triggers, and can even handle pseudo-relational stuff for you, including reverse delete rules.
[17:00:24] <deathanchor> yeah, I prefer frameworks that specialize in one thing. I was using django for playing around, but it is too overreaching with some things.
[17:00:37] <GothAlice> Indeed. It's difficult to grow past Django's built-in limits.
[17:00:59] <StephenLynx> imo, these ODMs provide stuff that you shouldn't use in the first place, according to mongo's design.
[17:01:01] <StephenLynx> but its up to you.
[17:01:21] <StephenLynx> make sure to look for benchmarks
[17:01:21] <deathanchor> yeah that's why I'm just going to use mongoengine and find a web framework that just handles requests and another to handle serving content.
[17:01:29] <StephenLynx> so you know what compromises you are making.
[17:01:38] <deathanchor> the ODM just makes things easy for initial setup
[17:01:46] <StephenLynx> from a performance point of view
[17:01:49] <deathanchor> you can relax the rules on mongoengine, I believe
[17:02:01] <GothAlice> deathanchor: Mild bit of self-promotion, here, but my web framework (WebCore) is about as light-weight as you can get. (The web.core module is < 300 lines, and everything is optional.)
[17:02:26] <deathanchor> I'll check it out, was going to look at tornado
[17:02:51] <GothAlice> pymongo isn't asynchronous, thus you'll need a different base driver to use Tornado properly.
[17:03:11] <Owner> so someone told me they shutdown one mongo of a cluster of 3
[17:03:19] <Owner> and the other two "broke"
[17:03:43] <deathanchor> Owner: depends on the config
[17:03:44] <GothAlice> deathanchor: https://github.com/marrow/WebCore/blob/rewrite/example/monster.py is the "one example to rule them all" for WebCore 2.
[17:04:09] <Owner> deathanchor, what could be missing?
[17:05:10] <deathanchor> thx GothAlice
[17:05:25] <deathanchor> Owner: are the other two members priority zero?
[17:05:29] <deathanchor> or hidden?
[17:05:33] <deathanchor> what do the logs say?
[17:05:44] <Owner> ill have to check on those things...
[17:05:59] <deathanchor> I'd start with the logs
[17:06:06] <deathanchor> logs always tell you what went wrong
[17:06:30] <Owner> if we have logs ill check that first
[17:07:27] <Owner> well we got a tinkerer...
[18:13:59] <christo_m> okay well
[18:14:05] <christo_m> all i wanted to do was do a proper insert query
[18:14:12] <christo_m> not debate why my choice in mongoose is incorrect
[18:14:24] <christo_m> its literally a red herring in the problem i was presenting.. i was trying to show how i wanted my data structured
[18:15:00] <christo_m> its like "yo my timing belt is broken on my car" and you go "well why did you buy a Ford"
[18:15:42] <deathanchor> use topics; db.mongoose.insert({ 'mouth' : 'gagball' });
[18:16:22] <christo_m> lol
[18:16:43] <gcfhvjbkn> i've setup a small cluster
[18:16:54] <gcfhvjbkn> with sharding done with sharding tags
[18:17:09] <gcfhvjbkn> i see mongo created an unique index on my sharding key somehow
[18:17:14] <christo_m> GothAlice: im trying to setup a model like this: http://pastebin.com/QMJhq3DH
[18:17:16] <gcfhvjbkn> which i don't want at all
[18:17:16] <deathanchor> that's required
[18:17:29] <gcfhvjbkn> that's required? ok
[18:17:30] <christo_m> how would i go about inserting seasons and episodes based on the title of the show?
[18:17:36] <gcfhvjbkn> thanks, didn't know about it
[18:17:45] <deathanchor> you have to have an index on the shardkey
[18:17:55] <gcfhvjbkn> necessarily unique?
[18:18:01] <deathanchor> no
[18:18:20] <gcfhvjbkn> oh
[18:18:29] <deathanchor> but all updates/inserts require that shardkey as part of the query.
[18:18:57] <gcfhvjbkn> i wonder why mongo made it unique though.. i didn't create the index
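A sketch of the behaviour deathanchor describes; the database, collection, and shard key below are placeholders, not gcfhvjbkn's setup:

    sh.enableSharding("mydb")
    // shardCollection creates an index on the shard key if one doesn't exist;
    // it is only made unique if the unique option is passed (or such an index already exists).
    sh.shardCollection("mydb.events", { userId: 1 })
    // Single-document updates against a sharded collection must include the shard key:
    db.events.update({ userId: 42, status: "new" }, { $set: { status: "seen" } })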
[18:19:30] <christo_m> deathanchor: maybe you know?
[18:21:01] <deathanchor> christo_m: not sure what you mean by inserting seasons/eps based on title of show.
[18:21:33] <christo_m> deathanchor: i mean i look up a show by its title, and insert a season and episode...
[18:21:54] <christo_m> the season will have a name, maybe Season 3, maybe just 3, and the episode will have a title
[18:22:01] <christo_m> maybe Episode 2, maybe an actual title
[18:22:03] <deathanchor> so a title can have multiple seasons/episodes?
[18:22:08] <christo_m> of course..
[18:22:11] <christo_m> that's how shows work.
[18:22:57] <deathanchor> can we clarify: the first title is the "show title" and the second title is the "episode title"?
[18:23:14] <christo_m> that is correct.
[18:23:26] <deathanchor> what info do you want to insert?
[18:23:59] <christo_m> deathanchor: http://pastebin.com/CpNUuYat something like this
[18:24:16] <christo_m> i want to post data i find.. and have my insert or upsert or whatever on the server side push the information appropriately
[18:24:24] <christo_m> im pretty sure i want an upsert
[18:24:29] <christo_m> to create shows that dont already exist etc
[18:24:36] <deathanchor> christo_m: correct
[18:24:43] <christo_m> im sorry, i dont know what to use.. i pasted pastie last time and people didnt like that
[18:24:58] <deathanchor> gist is nice because you can comment on it
[18:25:04] <deathanchor> gist.github.com
[18:25:08] <deathanchor> but it's not anonymous
[18:25:45] <christo_m> deathanchor: okay ill use that from now on
[18:26:10] <christo_m> anyway, do you know what im trying to do?
[18:26:12] <christo_m> im not sure if im being clear
[18:26:32] <christo_m> im scraping some links with casperjs and trying to hit the api to insert them as i find them.
[18:28:38] <deathanchor> my best guess without trying: https://gist.github.com/deathanchor/441f0dc2e80254a46fbb
[18:29:01] <christo_m> deathanchor: i need to match by season title as well though
[18:32:11] <christo_m> deathanchor: like potentially ill be inserting a new season, or appending an episode to an existing season by season name
[18:32:55] <deathanchor> yeah well this is where a model helps and should know how to handle this, I haven't done many updates into nested docs like that
[18:33:56] <christo_m> deathanchor: true.
[18:34:17] <christo_m> are you saying i should define that in my schema as a nested collection instead of doing it how im doing it?
[18:35:07] <deathanchor> well with mongoengine you can just define another class as a subdoc
[18:35:13] <deathanchor> not sure how mongoose does things
[18:35:21] <deathanchor> updated gist again with another guess
[18:35:47] <christo_m> thats how you do it with mongoose, you just define another Schema
[18:36:16] <christo_m> ya i see your gist i guessed that might be it
[18:36:18] <christo_m> ill test
[18:36:23] <christo_m> deathanchor: thank you for your help :)
[18:36:33] <deathanchor> eh, best I can do
[18:36:59] <deathanchor> now to mess with some shards, configs, and syncing...
[18:45:19] <Owner> well im not sure where ill find the logs from last night,...its logging so much that i would be digging for a while
[18:45:51] <Owner> trying to figure out how the replica is even setup
[18:47:49] <deathanchor> can you connect directly on one member?
[18:47:58] <Owner> i can connect to all 3
[18:47:59] <deathanchor> do an rs.config()
[18:48:05] <deathanchor> and rs.status()
[18:48:42] <Owner> ok
[18:49:56] <Owner> ive never used mongo before so i dont know how to do that
[18:52:52] <Owner> i have ssh access
[18:58:11] <Owner> i ran mongo but i get this errno:104 Connection reset by peer 127.0.0.1:27017
[18:58:18] <MacWinne_> does compacting your mongo collection only make sense in situations where you are adding and removing documents? if our collection is basically only adding activity documents, would there be any reason to do a compact?
[18:58:30] <cheeser> MacWinne_: probably not
[18:59:52] <MacWinne_> cool, thanks
[19:01:52] <Owner> why is mongo saying connection reset by peer
[19:03:16] <cheeser> check the logs
[19:04:52] <Owner> the logs are being spammed too much
[19:05:45] <deathanchor> Owner: ssh to one of the machines, run mongo on there
[19:05:51] <Owner> thats what im doing
[19:05:59] <Owner> thats where the error comes from
[19:06:02] <deathanchor> Owner: is it a shard replset?
[19:06:13] <Owner> i dont know what it is, i didnt build it
[19:06:15] <deathanchor> Owner: what port is mongod running on?
[19:06:31] <Owner> 27017 i think, but anyway nevermind, i gave up on it
[19:06:42] <Owner> ill do something else
[19:07:06] <Owner> ill let someone else worry about it, i didnt pick mongo...
[19:07:52] <deathanchor> ^over
[19:39:49] <christo_m> deathanchor: hmm looks like that query didnt work
[19:39:49] <christo_m> :(
[19:40:51] <deathanchor> yeah I don't do much nested stuff.
[19:43:22] <christo_m> damnit
[19:46:30] <christo_m> anyone else around?
[19:47:05] <saml> what are you doing christo_m ?
[19:47:32] <charnel> can I create a compound index from 2 fields in an existing collection with more than 20K documents? Both fields are integers
[19:47:58] <christo_m> saml: https://gist.github.com/christomitov/54618a62f501ff29cf50
[19:48:13] <christo_m> saml: my schema and the update im trying to do
[19:49:01] <saml> nested $addToSet works?
[19:49:07] <christo_m> saml: no.
[19:49:10] <christo_m> well, not in my case
[19:49:24] <saml> what's req?
[19:49:33] <saml> req.body.episode_title. a constant?
[19:49:52] <christo_m> saml: this is what gets set http://i.imgur.com/4XJnFMM.png
[19:49:54] <christo_m> saml: no it changes.
[19:49:58] <christo_m> based on the parameter
[19:50:03] <christo_m> those are POST request params
[19:51:40] <christo_m> saml: basically im trying to upsert show episodes as they come in from the post
[19:51:56] <christo_m> but they come with a season too, which may exist in the database, in which case i want to just append to that season.
[19:52:39] <cheeser> $addToSet the season then $push the episode
[19:54:47] <saml> christo_m, i'd build episodes array in app and do $set
[19:55:01] <saml> actually what cheeser said
[19:55:25] <saml> can client POST repeating episodes?
[19:55:48] <saml> probably saner to reconstruct episodes array in the app
[19:56:25] <christo_m> saml: im scraping them as they come in and posting to the api
[19:56:43] <christo_m> cheeser: sorry can you edit my gist im not sure how to do that
[19:56:47] <christo_m> syntax is unfamiliar to me still
[19:56:56] <saml> a tv show has multiple seasons and each season has episodes
[19:57:10] <christo_m> correct.
[19:57:27] <christo_m> but an episode comes in as a single call with a season and show
[20:00:04] <fewknow_> holla
[20:00:21] <christo_m> tru
[20:00:51] <saml> christo_m, are you crawling tv review articles?
[20:00:56] <christo_m> saml: no sir.
[20:02:34] <christo_m> saml: so...
[20:02:34] <saml> so seasons is an array, sorted by what?
[20:03:01] <saml> seasons: [{name:'Season 1', eps:[]}, {name: 'Season 0', eps:[...]}]
[20:03:27] <saml> you must guarantee there's no repeating Season 1 in seasons array
[20:03:45] <christo_m> https://gist.github.com/christomitov/54618a62f501ff29cf50
[20:03:47] <christo_m> is this how?
[20:04:04] <christo_m> well no, there could be many Season 1's coming in for that show
[20:04:07] <christo_m> it should be appending to its episode list.
[20:04:09] <christo_m> array*
[20:04:11] <saml> you're considering seasons as object
[20:04:49] <saml> give me an example document. your Schema says seasons is an array but your update assumes seasons is an object
[20:05:00] <saml> and what's that Schema?
[20:05:06] <christo_m> the schema is correct
[20:05:10] <christo_m> it should be an array
[20:05:18] <christo_m> of season objects, each having an array of episode objects
[20:05:25] <christo_m> i clearly cant write the query which is why im asking.
[20:06:03] <saml> db.tvshows.update({_id: TVSHOWID}, {$push: {seasons: YOURSEASON}})
[20:07:08] <saml> build a full season, YOURSEASON, and update the tvshow with TVSHOWID?
[20:07:22] <christo_m> i cant build a full season..
[20:07:29] <christo_m> i get things coming in one at a time man.
[20:07:35] <saml> i'm a girl sorry
[20:07:36] <christo_m> im going to get episodes one by one
[20:08:52] <christo_m> i guess im not explaining this clearly
[20:08:58] <christo_m> oh well gg i guess
[20:10:14] <christo_m> im getting episodes one by one that may belong to a season that exists or not
[20:10:27] <christo_m> if it does exist i need to update the existing entry (the title of the seasons will match)
[20:10:32] <christo_m> then i need to push the episode into it.
[20:10:48] <christo_m> saml: does that make sense?
[20:10:56] <saml> yah do it in the app
[20:11:01] <christo_m> i dont know what that means.
[20:11:49] <saml> var show = db.tvshows.find({_id: TVSHOWID}); var updatedShow = insertNewEpisode(httpRequest, show); db.shows.update({_id: TVSHOWID}, updatedShow);
[20:12:27] <christo_m> that doesnt make sense..
[20:12:32] <christo_m> im hitting the db twice?
[20:12:38] <saml> yes
[20:12:42] <christo_m> why would i do that..
[20:12:50] <saml> because otherwise, you can't web scale
[20:13:02] <saml> your schema sucks
[20:13:09] <christo_m> then how should my schema be
[20:13:27] <christo_m> in my head a show is a collection of seasons, and seasons are a collection of episodes.
[20:13:33] <christo_m> doesnt get easier than that really.
[20:14:11] <saml> insert into episodes(showId, seasonId, epNumber, epTitle) VALUES ('Seinfeld', 2, 14, 'Somthing something bad episode');
[20:15:33] <christo_m> the reason i want to do it this way is so that i can build the accordions on the front end very easily..
[20:15:40] <christo_m> if i do it that way im going to have to manually construct those things.
[20:17:09] <saml> you could get (season,episode) pair as follows: (1,1), (1,2), (3,1), (2,1), (3,2), (2,2) ...
[20:17:26] <saml> i don't think mongodb has ability to sort array during update
[20:18:12] <christo_m> interesting..
[20:18:18] <christo_m> i guess ill flatten my structure then.
[20:18:35] <christo_m> that *is* what you're suggesting isn't it?
[20:18:54] <saml> i'm suggesting your app or script to construct seasons array
[20:19:11] <saml> instead of relying on $push or $addToSet
[20:19:51] <saml> i mean, i'm suggesting you do two db operations. first to GET show data. manipulate seasons array. then SET it back
[20:20:13] <christo_m> i see
[20:20:23] <christo_m> i just dont think its proper to hit the db twice for this.
[20:20:25] <christo_m> but whatever
[20:21:12] <saml> if you're using normalized db, you don't have to hit twice
[20:21:23] <saml> but your schema is denormalized, which is common for mongodb
[20:21:50] <saml> just think like REST/HTTP. you GET resource. modify representation. and PUT
[20:22:00] <saml> you read a file. modify. write the file.
[20:22:57] <christo_m> ok
[20:23:05] <saml> in normalized world, you can keep appending to file, or create new files.. and you can join things to construct final data during read. but mongodb doesn't have such capability
[20:23:29] <saml> instead, mongo is heavily read optimized. so during write, you do some calculation
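christo_m's gist isn't reproduced in the log, so every field name below is a guess; this is one common shape for cheeser's "$addToSet the season then $push the episode" suggestion, done as two updates against an existing show document:

    // 1) Add the season if this show doesn't have it yet:
    db.shows.update(
        { title: "Some Show", "seasons.title": { $ne: "Season 1" } },
        { $push: { seasons: { title: "Season 1", episodes: [] } } }
    )
    // 2) Push the episode into that season (the positional $ refers to the matched season):
    db.shows.update(
        { title: "Some Show", "seasons.title": "Season 1" },
        { $push: { "seasons.$.episodes": { title: "Episode 2", link: "http://example.com/ep2" } } }
    )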
[20:29:31] <gcfhvjbkn> speaking of write operations, how does sharding affect write performance?
[20:29:44] <gcfhvjbkn> i used to have 1.5-2k writes/sec on one instance
[20:30:20] <gcfhvjbkn> now i sharded my data so that each one out of 5 servers has a portion that it has to write
[20:30:35] <gcfhvjbkn> surprisingly, i still have 1.5-2k writes/sec
[20:30:53] <gcfhvjbkn> i don't even see what it is that i am running into
[20:31:14] <cheeser> well, you'll still write to one host (a mongos) which will then fan out writes according to any sharding you have set up.
[20:31:33] <cheeser> sharding is more for scaling than performance.
[20:31:41] <cheeser> well, s/performance/speed/
[20:32:05] <gcfhvjbkn> hmm, that's a pity; if i recall correctly write performance was one of the three reasons mongo docs have for using sharding
[20:32:15] <gcfhvjbkn> what if i have more mongos?
[20:32:24] <gcfhvjbkn> i've got 5 atm
[20:32:48] <gcfhvjbkn> i mean mongos instances
[20:33:45] <gcfhvjbkn> http://www.slideshare.net/daumdna/mongodb-scaling-write-performance
[20:33:58] <gcfhvjbkn> these guys here too have some success with sharding..
[20:37:33] <deathanchor> gcfhvjbkn: did you balance the data over to the shards?
[20:38:11] <gcfhvjbkn> deathanchor: i did
[22:00:26] <Kamuela> Does mongo support its own HTTP server?
[23:30:13] <Doyle> Will changing the hostname of a server while mongod is running cause any issues?
[23:44:45] <cheeser> it'll probably break your replica set
[23:54:49] <Doyle> cheeser, the instances will still remain resolvable by the names entered in the config
[23:55:47] <Doyle> a dhcp option set is incorrect. It's appending a misc domain on the end of the private DNS names so the instances report their hostnames as ip-x-x-x-x.ec2.internalmiscdomain.com
[23:56:03] <Doyle> this breaks things that rely on the instances reporting their own hostname
[23:56:10] <Doyle> like some monitoring and control scripts
[23:56:55] <Doyle> Their bad hostnames aren't even resolvable...
[23:57:17] <Doyle> I think I should be able to hostname xyz them and be fine
[23:57:30] <Doyle> or adjust the dns option set and be fine
[23:57:44] <Doyle> thoughts?