[00:24:21] <jmo> Hey there. I am confused about the proper use case for mongo. If I have a data model that seems to fit just fine in a SQL database, is it a good idea to think about using mongo?
[00:26:18] <Boomtime> jmo: are you prepared to learn something new or would you rather only ever use what you are familiar with?
[00:27:17] <Boomtime> MongoDB is a pretty general purpose database - but like _all_ databases, unless you know how to use it, you're probably not going to have a good experience
[00:27:42] <Boomtime> that means you need to be prepared to learn something new, that is all
[00:28:39] <jmo> Boomtime: very willing to learn something new. the flexible schema part for development is mostly what draws me. i'm worried that it's meant more for unstructured / inconsistently structured data
[00:28:57] <jmo> could you really take a relational model and map it into mongo without trouble?
[00:32:39] <jmo> i know this channel will tend to be biased, but is there a case when you definitely *shouldn't* use mongo?
[00:33:30] <Boomtime> when you don't know what you're doing - otherwise nearly anything can work - just like any SQL database
[00:36:52] <jmo> Boomtime: thanks for the tip, sounds like a plan :)
[00:37:20] <jmo> what about all the stuff i keep reading about losing data?
[00:37:41] <jmo> it seems like that's just because people aren't using 'safe'
[04:48:00] <alaee_> Hi. Can anyone help me with mongodb optimization please? I have this very simple 100 documents long collection and when I'm using it in a Flask app with uwsgi and nginx, the bottleneck is my mongodb and I can't go beyond 200 reqs/sec.
[04:48:57] <alaee_> BTW, I use mongoengine for connecting to my mongo
[06:36:06] <macwinner> alaee_: are you using any indexes?
[06:36:37] <alaee_> I have the default index on _id and a unique index on one of my four fields.
[06:37:42] <alaee_> Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum. 1394/6/5
[06:38:21] <macwinner> i'm not familiar with flask and uwsgi
[06:38:40] <macwinner> and haven't used nginx really.. so i'm not sure if they support some sort of connection pooling
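(A minimal sketch of the pooling idea, assuming a plain Flask + pymongo setup; the database name, collection name and route below are placeholders, not from the chat. pymongo keeps its own connection pool, so the usual approach is to create one MongoClient at import time and reuse it across requests rather than connecting per request.)

```python
from flask import Flask, jsonify
from pymongo import MongoClient

app = Flask(__name__)

# One shared client for the whole process; pymongo pools connections internally.
client = MongoClient("mongodb://localhost:27017", maxPoolSize=100)
db = client["mydb"]  # "mydb" is a placeholder database name

@app.route("/items/<name>")
def get_item(name):
    # Reuses a pooled connection instead of opening a new one per request.
    doc = db.items.find_one({"name": name}, {"_id": 0})
    return jsonify(doc or {})
```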
[06:53:34] <alaee> I send requests using ab -n 10000 -c 100 and I get the results in 50 seconds. Meaning 200 reqs/sec
[07:54:47] <mtree> is it better to do a couple of aggregate queries, or to make one search and reduce the results to the required format?
[07:56:29] <kali> mtree: i try to offload as much work as is reasonable to the application server, as this part is easier to scale than a database
[07:56:52] <kali> mtree: not sure if that answers your question, but without cardinalities there's not much that can be said anyway
[07:58:05] <mtree> kali: i wrote code that gets LOTS of records from the collection and I filter/reduce these results in code to get what I need (sum of some conditional values)
[07:58:21] <mtree> but my boss says it's not cool and that I should do it all using aggregate
[07:58:38] <mtree> even if it means triggering more than 1 query
[07:58:48] <kali> mtree: well, moving LOTS of records over the network does not sound good :)
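(A hedged sketch of what pushing that work server-side might look like; the field names "customerId", "status" and "amount" are invented for illustration. The conditional sum is computed inside the pipeline, so only the summary crosses the network.)

```python
from pymongo import MongoClient

coll = MongoClient()["mydb"]["orders"]  # placeholder names

pipeline = [
    {"$match": {"customerId": 42}},  # filter first so an index can be used
    {"$group": {
        "_id": "$customerId",
        # only add "amount" when status == "paid", otherwise add 0
        "paidTotal": {"$sum": {"$cond": [{"$eq": ["$status", "paid"]}, "$amount", 0]}},
        "count": {"$sum": 1},
    }},
]
result = list(coll.aggregate(pipeline))
```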
[09:55:40] <Derick> you'd still need to do it in two pipeline stages though
[09:55:50] <mtree> Derick: can u provide me example?
[09:56:50] <Derick> mtree: it's probably easier if you show me your input documents, and describe (with example) what you'd like as output (in a pastebin)
[10:24:05] <charnel> How can I optimize a query that looks up an IP range? I am using $gt and $lte, which is slow. But I converted the IP addresses to integers, which gained me some time. Basically now I am only checking whether the given integer is between min and max values. Can I improve performance more?
[10:35:42] <kali> charnel: you have an index on the field, i assume ?
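(Rough sketch of the integer-range approach being described, assuming each document stores a range as integer fields "min" and "max"; those names and the collection name are guesses, not from the chat. The compound index is the part kali is asking about.)

```python
import socket
import struct
from pymongo import MongoClient, ASCENDING

coll = MongoClient()["geo"]["ip_ranges"]  # placeholder names

# A compound index lets MongoDB seek straight to the candidate ranges.
coll.create_index([("min", ASCENDING), ("max", ASCENDING)])

def ip_to_int(ip):
    # "1.2.3.4" -> 16909060
    return struct.unpack("!I", socket.inet_aton(ip))[0]

def lookup(ip):
    n = ip_to_int(ip)
    # find the range containing n; integer $lte/$gte instead of string comparisons
    return coll.find_one({"min": {"$lte": n}, "max": {"$gte": n}})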
[12:35:02] <ssutch> we are trying to add a member to our replicaset but it stays in startup2 state forever and continually crashes with this message: 2015-08-27T12:31:10.370+0000 E REPL [rsBackgroundSync] sync producer problem: 16235 going to start syncing, but buffer is not empty
[12:35:17] <ssutch> running 3.0.6 on the entire cluster
[12:35:45] <ssutch> it's a 2-shard cluster with 3 configs and we are trying to startup a node on rs0
[12:44:48] <deathanchor> ssutch: no other errors around that line?
[12:49:01] <ssutch> the other that commonly occurs directly after a replicaset failure is [rsSync] 10276 DBClientBase::findN: transport error (PRIMARYNAME)
[12:49:19] <ssutch> only ever happens after it gets all the indexes
[12:49:56] <ssutch> if someone AT mongodb were available for immediate support im sure my boss would be happy to pay for someone to help us out. today is our launch day hah
[12:51:17] <cheeser> support staff are standing by. :)
[12:51:36] <deathanchor> ssutch: can you gist/pastebin the logs with the errors? (you can sanitize if you want)
[12:56:10] <ssutch> the only potentially weird thing is we are using an ELB to route traffic (so we can set up a cname in route53). it's worked for us for weeks though
[12:56:50] <deathanchor> single CNAME resolution? (meaning the CNAME points to an A record)
[12:57:19] <deathanchor> I have had problems with CNAME -> CNAME -> A resolution.
[13:12:31] <mbwe> hi everybody i have a collection devices and it has a name field, how could i do an insert only when the name field does not exist already
[13:13:37] <mbwe> thus {name: "#010101"} and if there is already a record with name: "#010101" i want to cancel that insert
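(One possible way to get that behaviour, sketched with pymongo: a unique index on "name" so the server rejects duplicate inserts, or an upsert with $setOnInsert that leaves an existing document untouched. The collection and field names follow the chat; everything else is illustrative.)

```python
from pymongo import MongoClient, ASCENDING
from pymongo.errors import DuplicateKeyError

devices = MongoClient()["mydb"]["devices"]  # "mydb" is a placeholder
devices.create_index([("name", ASCENDING)], unique=True)

def add_device(doc):
    try:
        devices.insert_one(doc)      # rejected by the server if name already exists
        return True
    except DuplicateKeyError:
        return False                 # insert cancelled, as requested

# Equivalent upsert form: inserts {name: "#010101", ...} only when no match exists;
# an existing document is matched but not modified.
devices.update_one({"name": "#010101"},
                   {"$setOnInsert": {"created": True}},  # hypothetical extra field
                   upsert=True)
```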
[14:07:17] <Grixa> I have a problem with slow lookups in a collection (~2500000 records). My script executes the search query about 500000 times and it takes about 10 minutes. The collection contains records with windows paths (for example: C:\Users\admin\Documents\1.pdf).
[14:07:51] <deathanchor> Grixa: index? did you do an explain on the query?
[14:09:14] <deathanchor> so 45ms is too slow for you?
[14:11:05] <Grixa> It is too slow when I am running the search 500000 times in my python script. My idea is to store some hash (CRC32 for example) and use it for an indexed search. Or would that be pointless?
[14:12:00] <deathanchor> my calc is 45ms per search x 500k times = 6.25 hours
[14:12:24] <deathanchor> what are you returning from the doc?
[14:12:53] <deathanchor> "indexOnly" : false <- if you want it faster, try to make this true by making a compound index of what you want returned and what you are looking for
[14:13:21] <Grixa> I am using find_one() from pymongo. Collection contains only "path" field
[14:13:50] <deathanchor> then use the filter option to return only the path.
[14:18:40] <deathanchor> yeah, gotta remember that mongo will return the full doc by default, which requires a read from disk, but if the index is in memory and all you need is in the index, use a projection to avoid disk reads
[14:22:35] <deathanchor> Grixa: just keep your index size in mind when spec'ing out your machine memory, if it gets too large then you'll need more memory to avoid paging.
[14:30:08] <Grixa> Thanks! Aaand another stupid question: if i want to receive only the _id from my collection, i use this query: db.ups_paths.find({"path" : "..." }, {"path": 0}). And this query is again not covered by the index ( "indexOnly" : false )
[14:52:46] <jpbjpb> I used mongoimport to pull in some JSON, but I couldn't figure out how to import the references to another collection, so I'm trying to update by hand
[14:55:42] <cheeser> for the most part it works like a champ. just some corner cases where people want to reuse instances that weren't designed for it.
[14:55:43] <jpbjpb> but when I run that in the REPL, it returns three dots and hangs
[15:19:46] <cheeser> because it's using the index path_1 to resolve the query but you're suppressing the path field so it has to use more than the index to return documents
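(A sketch of the covered-query idea cheeser describes, in pymongo; the collection and field names follow the chat. A query can, in principle, be answered entirely from the index only when every field being returned is in the index: either project just "path", or add _id to the index if _id is what you need back.)

```python
from pymongo import MongoClient, ASCENDING

paths = MongoClient()["mydb"]["ups_paths"]  # "mydb" is a placeholder

# Option 1: index on path and return only path -> the query can be covered.
paths.create_index([("path", ASCENDING)])
doc = paths.find_one({"path": r"C:\Users\admin\Documents\1.pdf"},
                     {"_id": 0, "path": 1})

# Option 2: if _id is what you need back, include it in the index too,
# then project only _id.
paths.create_index([("path", ASCENDING), ("_id", ASCENDING)])
doc = paths.find_one({"path": r"C:\Users\admin\Documents\1.pdf"},
                     {"_id": 1})
```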
[16:41:33] <StephenLynx> knowledge without criticism is useless.
[16:41:59] <StephenLynx> torvalds is a great role model.
[16:47:06] <deathanchor> please criticize when I ask, "What do you think about it?" but unwanted criticism is just that... unwanted.
[16:50:40] <StephenLynx> thats not how criticism works.
[16:51:24] <StephenLynx> let's say you create some bullshit, ok?
[16:51:34] <StephenLynx> it doesn't work, its completely wrong
[16:51:44] <StephenLynx> but you say "I don't want criticism :^)"
[16:52:04] <StephenLynx> the less someone wants criticism, the more they need it.
[16:52:46] <StephenLynx> because when you just cover your ears and scream "LALALALA I CAN'T HEAR YOU", it's a sign you are on your way to completely screwing up
[16:53:18] <deathanchor> agreed, but if he's doing that? then why provide the criticism anyway?
[16:53:28] <deathanchor> just wasting both people's time
[16:53:52] <StephenLynx> if he doesn't listen, at least others have the chance.
[16:54:07] <StephenLynx> and its my time to be used as I wish, and he is free to use /ignore on me.
[16:54:25] <StephenLynx> so the "wasted time" is a moot point.
[16:55:20] <deathanchor> StephenLynx: so I'm thinking of using a framework for my app to work with mongo. My main codebase is python. What do you suggest?
[16:59:11] <StephenLynx> if you don't keep it simple with the regular driver.
[16:59:17] <GothAlice> At the same time, I use the MongoEngine ODM, as it provides many features the raw driver does not.
[16:59:19] <StephenLynx> more dependencies, more problems.
[17:00:04] <GothAlice> I.e. it provides a proper "model" layer to your app, with schema, data validation and transformation, event callbacks / triggers, and can even handle pseudo-relational stuff for you, including reverse delete rules.
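(A small illustrative MongoEngine model showing the schema/validation layer and a reverse delete rule GothAlice mentions; the class and field names are invented for the example.)

```python
from mongoengine import (Document, StringField, IntField,
                         ReferenceField, CASCADE, connect)

connect("mydb")  # placeholder database name

class Author(Document):
    name = StringField(required=True, max_length=120)

class Article(Document):
    title = StringField(required=True)
    views = IntField(min_value=0, default=0)
    # reverse delete rule: deleting an Author also removes their Articles
    author = ReferenceField(Author, reverse_delete_rule=CASCADE)

a = Author(name="Ann").save()
Article(title="Hello", author=a).save()   # validated before hitting MongoDB
```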
[17:00:24] <deathanchor> yeah, I prefer frameworks that specialize in one thing. I was using django for playing around, but it is too overreaching with some things.
[17:00:37] <GothAlice> Indeed. It's difficult to grow past Django's built-in limits.
[17:00:59] <StephenLynx> imo, these ODMs provide stuff that you shouldn't use in the first place, according to mongo's design.
[17:01:21] <StephenLynx> make sure to look for benchmarks
[17:01:21] <deathanchor> yeah that's why I'm just going to use mongoengine and find a web framework that just handles requests and another to handle serving content.
[17:01:29] <StephenLynx> so you know what compromises you are making.
[17:01:38] <deathanchor> the ODM just makes things easy for initial setup
[17:01:46] <StephenLynx> from a performance point of view
[17:01:49] <deathanchor> you can relax the rules in mongoengine, I believe
[17:02:01] <GothAlice> deathanchor: Mild bit of self-promotion, here, but my web framework (WebCore) is about as light-weight as you can get. (The web.core module is < 300 lines, and everything is optional.)
[17:02:26] <deathanchor> I'll check it out, was going to look at tornado
[17:02:51] <GothAlice> pymongo isn't asynchronous, thus you'll need a different base driver to use Tornado properly.
[17:03:11] <Owner> so someone told me they shut down one mongod of a cluster of 3
[17:03:43] <deathanchor> Owner: depends on the config
[17:03:44] <GothAlice> deathanchor: https://github.com/marrow/WebCore/blob/rewrite/example/monster.py is the "one example to rule them all" for WebCore 2.
[17:04:09] <Owner> deathanchor, what could be missing?
[18:25:45] <christo_m> deathanchor: okay ill use that from now on
[18:26:10] <christo_m> anyway, do you know what im trying to do?
[18:26:12] <christo_m> im not sure if im being clear
[18:26:32] <christo_m> im scraping some links with casperjs and trying to hit the api to insert them as i find them.
[18:28:38] <deathanchor> my best guess without trying: https://gist.github.com/deathanchor/441f0dc2e80254a46fbb
[18:29:01] <christo_m> deathanchor: i need to match by season title as well though
[18:32:11] <christo_m> deathanchor: like potentially ill be inserting a new season, or appending an episode to an existing season by season name
[18:32:55] <deathanchor> yeah, well, this is where a model helps and should know how to handle this; I haven't done many updates into nested docs like that
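(A guess at the shape being discussed: a show document with an embedded "seasons" array. A two-step approach that works on MongoDB 3.0 with pymongo: try to push onto the matching season via the positional operator, and if nothing matched, push a whole new season. All names here are hypothetical.)

```python
from pymongo import MongoClient

shows = MongoClient()["mydb"]["tvshows"]  # placeholder names

def add_episode(show_id, season_title, episode):
    # 1) append to the matching season via the positional $ operator
    res = shows.update_one(
        {"_id": show_id, "seasons.title": season_title},
        {"$push": {"seasons.$.episodes": episode}})
    if res.matched_count == 0:
        # 2) no such season yet: add it with this episode as its first entry
        shows.update_one(
            {"_id": show_id},
            {"$push": {"seasons": {"title": season_title,
                                   "episodes": [episode]}}})
```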
[18:58:11] <Owner> i ran mongo but i get this errno:104 Connection reset by peer 127.0.0.1:27017
[18:58:18] <MacWinne_> does compacting your mongo collection only make sense in situations where you are adding and removing documents? if our collection is basically only adding activity documents, would there be any reason to do a compact?
[20:11:01] <christo_m> i dont know what that means.
[20:11:49] <saml> var show = db.tvshows.findOne({_id: TVSHOWID}); var updatedShow = insertNewEpisode(httpRequest, show); db.tvshows.update({_id: TVSHOWID}, updatedShow);
[20:23:05] <saml> in a normalized world, you can keep appending to a file, or create new files.. and you can join things to construct the final data at read time. but mongodb doesn't have such a capability
[20:23:29] <saml> instead, mongo is heavily read-optimized. so during writes, you do some of the calculation up front
[20:29:31] <gcfhvjbkn> speaking of write operations, how does sharding affect write performance?
[20:29:44] <gcfhvjbkn> i used to have 1.5-2k writes/sec on one instance
[20:30:20] <gcfhvjbkn> now i sharded my data so that each one out of 5 servers has a portion that it has to write
[20:30:35] <gcfhvjbkn> surprisingly, i still have 1.5-2k writes/sec
[20:30:53] <gcfhvjbkn> i don't even see what it is that i'm running into
[20:31:14] <cheeser> well, you'll still write to one host (a mongos) which will then fan out writes according to any sharding you have set up.
[20:31:33] <cheeser> sharding is more for scaling than performance.
[22:00:26] <Kamuela> Does mongo support its own HTTP server?
[23:30:13] <Doyle> Will changing the hostname of a server while mongod is running cause any issues?
[23:44:45] <cheeser> it'll probably break your replica set
[23:54:49] <Doyle> cheeser, the instances will still remain resolvable by the names entered in the config
[23:55:47] <Doyle> a dhcp option set is incorrect. It's appending a misc domain on the end of the private DNS names so the instances report their hostnames as ip-x-x-x-x.ec2.internalmiscdomain.com
[23:56:03] <Doyle> this breaks things that rely on the instances reporting their own hostname
[23:56:10] <Doyle> like some monitoring and control scripts
[23:56:55] <Doyle> Their bad hostnames aren't even resolvable...
[23:57:17] <Doyle> I think I should be able to hostname xyz them and be fine
[23:57:30] <Doyle> or adjust the dns option set and be fine