PMXBOT Log file Viewer


#mongodb logs for Sunday the 22nd of July, 2012

[02:19:08] <ra21vi> I have a quick question. I searched the docs, but it's still not clear how to achieve this. My doc has a field which is an array, meant to store string values. I want to query for all docs where that array field is not empty. Is it possible?
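(The question above goes unanswered in the log. A minimal pymongo sketch of the usual approach, assuming a hypothetical database "mydb", collection "docs", and array field "tags":

    from pymongo import MongoClient

    client = MongoClient()
    docs = client.mydb.docs  # hypothetical database/collection names

    # Match documents whose "tags" array has at least one element:
    # "tags.0" only exists when the array has an element at index 0.
    non_empty = docs.find({"tags.0": {"$exists": True}})

    # An equivalent form: the field exists and is not the empty array.
    non_empty_alt = docs.find({"tags": {"$exists": True, "$ne": []}})
)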
[02:47:01] <DigitalKiwi> are replica sets of 2 machines alright or should it be 3 or is that only necessary if you have enough traffic that you need two handling requests and one for "disaster recovery"?
[02:50:11] <jY> DigitalKiwi: it is 3
[02:50:22] <jY> you can put an arbiter on one of the 2 machines
[02:50:27] <vsmatck> DigitalKiwi: You need 3 to be able to do a master election. But you can do 2 and there is a way to put up a phoney server that only participates in elections.
[02:50:33] <jY> 2 votes isn't enough for a majority
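(A hedged sketch of the two-data-nodes-plus-arbiter setup described above, using pymongo's generic command interface; the host names are made up, the replSetGetConfig/replSetReconfig commands need a reasonably recent server and a connection to the current primary, and the mongo shell helper rs.addArb("host:port") does the same thing in one step:

    from pymongo import MongoClient

    client = MongoClient("mongo1.example.com", 27017)  # hypothetical primary

    # Fetch the current replica set config, bump its version, and append
    # an arbiter: a member that votes in elections but holds no data.
    config = client.admin.command("replSetGetConfig")["config"]
    config["version"] += 1
    config["members"].append({
        "_id": max(m["_id"] for m in config["members"]) + 1,
        "host": "mongo2.example.com:27018",  # arbiter on one of the 2 data hosts
        "arbiterOnly": True,
    })
    client.admin.command("replSetReconfig", config)

With two data-bearing members plus the arbiter there are three votes, so a majority of two survives the loss of any single machine.)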
[02:51:49] <DigitalKiwi> and what are "config servers"?
[02:52:58] <vsmatck> DigitalKiwi: They're not related to replica sets.
[02:53:09] <jY> DigitalKiwi: that is when you shard
[02:53:10] <vsmatck> It's a sharding thing.
[02:53:12] <DigitalKiwi> ahh
[02:53:30] <DigitalKiwi> I was reading the "simple sharding guide" and it uses both replica sets and sharding
[02:53:57] <DigitalKiwi> http://www.mongodb.org/display/DOCS/Simple+Initial+Sharding+Architecture that
[02:53:59] <jY> yes that is the correct way to shard
[02:54:05] <jY> each shard should be a replicaset
[02:54:21] <vsmatck> Replica sets are needed for sharding. If you have two shards with no replicas you double your risk of failure. So you use replica sets to mitigate the risk.
[02:54:52] <vsmatck> But sharding should be avoided unless you need it.
[02:55:11] <vsmatck> But then, it's good to know about. Because if you think you may need it you can make choices that will help you later if you do.
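(A hedged sketch of what "each shard should be a replica set" looks like when registering shards through a mongos router, which in turn relies on the config servers asked about earlier; all host, database, and key names here are hypothetical:

    from pymongo import MongoClient

    mongos = MongoClient("mongos1.example.com", 27017)  # hypothetical mongos router

    # Each shard is added as a whole replica set, not a single mongod,
    # so losing one machine in a shard does not take that shard down.
    mongos.admin.command("addShard", "rs0/host1.example.com:27017,host2.example.com:27017")
    mongos.admin.command("addShard", "rs1/host3.example.com:27017,host4.example.com:27017")

    # Enable sharding for a database and shard one collection on a key.
    mongos.admin.command("enableSharding", "mydb")
    mongos.admin.command("shardCollection", "mydb.users", key={"user_id": 1})
)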
[02:55:55] <DigitalKiwi> sharding is for when the size of your data grows beyond the size of the storage on each replica set?
[02:56:25] <deoxxa> DigitalKiwi: not always
[02:57:11] <DigitalKiwi> or maybe the size of a collection is the more appropriate measure: you could get around sharding as long as no single collection was bigger than each server, right?
[02:57:11] <vsmatck> Some people use it for performance. You can scale writes with it. Some people have trouble keeping their indexes in memory without sharding.
[02:57:18] <deoxxa> DigitalKiwi: can also be for when you have too much query load, or you want to do some tricky stuff with putting shards in different locations for response times, etc
[02:57:26] <deoxxa> DigitalKiwi: that is sharding
[02:59:32] <DigitalKiwi> The guys at urban airship were using mongodb and they switched most of their stuff to postgresql because they said they were having problems. one of the more interesting things I got out of it, which I'm not sure if it's good or bad, but interesting...was running more than one mongod on the same server as shards to increase write speed or something like that
[03:00:14] <vsmatck> Infamous presentation. :)
[03:00:19] <deoxxa> yep
[03:00:24] <DigitalKiwi> hmm?
[03:01:16] <vsmatck> It'd be interesting to talk about specific parts of it. http://schmichael.com/files/schmongodb/Scaling%20with%20MongoDB%20(with%20notes).pdf
[03:01:28] <DigitalKiwi> and then infoq lost all credibility with me by posting about that anonymous pastebin of that FUD some guy was trying to spread :(
[03:02:27] <vsmatck> Another infamous thing. heh
[03:02:47] <vsmatck> You know that saying, "there are tools that people complain about, and tools that no one uses".
[03:04:32] <DigitalKiwi> and foursquare had some issues recently too, right?
[03:05:00] <vsmatck> Need to be specific.
[03:05:55] <DigitalKiwi> anyway, I'm not saying any of these are reasons I shouldn't use mongo, as I like it so far and want to use it, but I do want to avoid some of these problems that the others have had
[03:06:00] <DigitalKiwi> uh they lost a bunch of data
[03:06:36] <DigitalKiwi> maybe it wasn't recent I don't know, I thought it was but...trying to find
[03:06:54] <jY> 4sq didn't lose data
[03:07:43] <jY> they use aws and added shards too late and ran out of IO
[03:07:55] <jY> as chunks moved around from their shards
[03:08:38] <DigitalKiwi> so it was a preventable situation that they got into by ignoring all the warnings i've seen about waiting too long to start adding shards
[03:11:24] <jY> pretty much
[03:12:10] <jY> there was some other stuff.. like most of their documents were under the 4k block size of their file system
[03:12:23] <jY> so it was doing a lot more work to move chunks than it needed to.
[03:12:29] <jY> mongo is a great tool
[03:12:37] <jY> but it isn't the right tool for every job
[03:19:39] <DigitalKiwi> do you think with urban airship that was just a case of them not knowing the tools or was mongo to blame or?
[03:20:29] <vsmatck> That question is too reduced. You can see exactly what they did and decide whether it's an unavoidable problem with mongo, or not.
[03:20:48] <vsmatck> You'd want to, if you were actually going to use mongo for something real, anyways.
[03:22:37] <DigitalKiwi> I have this theory that for every 1 person/company that whinges loudly about a tool not working there are dozens of people that it works great for that remain quiet
[03:24:17] <DigitalKiwi> I'm probably not the first or only person with that theory but still
[03:24:48] <deoxxa> damn it, i'm trying to use this hammer to make pasta and it's not working
[03:24:51] <deoxxa> nobody should use hammers
[03:25:06] <deoxxa> i mean first of all it didn't stir properly, took way too long to make anything
[03:25:16] <deoxxa> then when i tried to taste the sauce, i smashed my own teeth out!
[03:25:32] <deoxxa> hammers aren't scalable
[03:26:30] <deoxxa> though i did find it was easy to get started with - all my carpenter friends had been talking about it for a while
[03:38:13] <DigitalKiwi> vsmatck: I think it looks like they tried to scale vertically instead of horizontally and that wasn't how mongod is meant to work. I'm not sure either way whether the dataset they were using for it was appropriate to mongo but I do think they tried to solve problems they encountered assbackwards. does any of that seem reasonably correct?
[03:39:40] <DigitalKiwi> and yeah I've read this a few times http://blog.schmichael.com/2011/11/05/failing-with-mongodb/ and read those slides as well, but I feel I'm missing a lot of the story by only seeing the slides and not hearing what was said with them
[06:44:55] <jasonl_> i need to use skip() on a large data set in mongodb but i hit the exception "getMore: cursor didn't exist on server, possible restart or timeout?" would setting batchSize help in this case? or can only setting notimeout solve this issue?
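(The question above is never answered in the log. One hedged pymongo sketch: rather than a large skip(), which makes the server walk past every skipped document and keeps a cursor open a long time, page on an indexed field so each query is cheap and each cursor is short-lived; the database, collection, and batch sizes below are hypothetical:

    from pymongo import ASCENDING, MongoClient

    client = MongoClient()
    coll = client.mydb.items  # hypothetical database/collection names

    # Page by _id range instead of skip(): each query resumes from the last
    # _id seen, so no cursor lives long enough to hit the idle timeout.
    last_id = None
    while True:
        query = {} if last_id is None else {"_id": {"$gt": last_id}}
        batch = list(coll.find(query).sort("_id", ASCENDING).limit(1000))
        if not batch:
            break
        for doc in batch:
            ...  # process doc
        last_id = batch[-1]["_id"]

A smaller batch size or a no-timeout cursor can also work around the error, but a no-timeout cursor that is never closed stays open on the server until it is explicitly killed.)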
[10:17:58] <circlicious> where are the mongo hipsters?
[12:37:46] <UnSleep> http://www.mongodb.org/display/DOCS/Full+Text+Search+in+Mongo im working on this...
[12:38:08] <UnSleep> im thinking about how to manage it... does the order/number of tags affect the performance?
[12:39:03] <deoxxa> wait, people take that page seriously?
[12:45:05] <UnSleep> its better to try to do relationships between documents
[12:45:07] <UnSleep> i think
[12:46:47] <UnSleep> "i think its better than being trying to..."
[12:49:23] <UnSleep> im thinking of ordering the tags by the most searched ones to speed up searches, but i dont know if its stupid
[12:59:18] <UnSleep> i could create a collection for each initial letter.... or for the first two letters...
[12:59:44] <uatecuk> hi
[13:09:09] <UnSleep> or create an array with results and search with a unique tag... (the inverse way)
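(A hedged sketch of the keyword-array approach from that docs page, with hypothetical names: with a multikey index over the array, the position of a tag inside a document's array does not matter for lookups, though the total number of tags does grow the index and the write cost:

    from pymongo import MongoClient

    client = MongoClient()
    coll = client.mydb.articles  # hypothetical database/collection names

    # Multikey index over the keyword array.
    coll.create_index("keywords")

    coll.insert_one({"title": "Full text search in Mongo",
                     "keywords": ["full", "text", "search", "mongo"]})

    # Documents containing all of the given tags, in any order.
    results = coll.find({"keywords": {"$all": ["text", "search"]}})
)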
[17:00:59] <mattg_> i set the connection to notimeout and in the java code i do explicitly close the cursor in a finally block. but just want to double check for sure. if the code fails to close the cursor, is it possible to manually perform the close operation in the mongo shell? or any other way to achieve this?
[19:51:03] <ra21vi> Hi
[19:53:07] <ra21vi> I am trying to iterate over all users stored in mongodb and divide them into buckets. I need to distribute some users to async tasks to process. In pymongo, how can I do it without loading them all in memory?
[19:53:49] <ra21vi> right now I have around 2000 records. But i am more concerned about when it grows to 1M. Is there an effective way to do it?
[19:53:57] <vsmatck> "all in memory" on the mongo server, or your application?
[19:54:10] <ra21vi> vsmatck: in my application
[19:54:38] <vsmatck> If you use a cursor you can get stuff in chunks.
[19:54:51] <ra21vi> I am using pymongo. That's where i am stuck
[19:55:14] <vsmatck> Ah, I'm not familiar with pymongo. Maybe someone in here is.
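(For reference, a hedged pymongo sketch of vsmatck's cursor suggestion: a cursor pulls documents from the server in batches as it is iterated, so the application only ever holds one batch plus the current bucket in memory. The collection name, bucket size, and dispatch function are hypothetical:

    from pymongo import MongoClient

    client = MongoClient()
    users = client.mydb.users  # hypothetical database/collection names

    def dispatch_async_task(bucket):
        ...  # hypothetical: hand a bucket of users to an async worker

    bucket, bucket_size = [], 100
    for user in users.find().batch_size(500):  # fetched lazily, 500 per round trip
        bucket.append(user)
        if len(bucket) == bucket_size:
            dispatch_async_task(bucket)
            bucket = []
    if bucket:
        dispatch_async_task(bucket)

This works the same at 2000 records or 1M; memory use is bounded by the batch and bucket sizes, not by the collection size.)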
[20:08:40] <ra21vi> can a map reduce run access other collections?
[23:59:18] <Siegfried> Hello