[02:19:08] <ra21vi> I have a quick question. I searched in the docs, but it's still not clear how to achieve this. My doc has an array field meant to store string values. I want to query for all docs where that list field is not empty. Is it possible?
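A minimal pymongo sketch of the query ra21vi is after; the `mydb`/`things`/`tags` names are assumptions. Asking whether array position 0 exists is the usual idiom, since a plain `$ne: []` also has to be paired with `$exists` to exclude docs that lack the field entirely:

```python
from pymongo import MongoClient

coll = MongoClient().mydb.things  # assumed database/collection names

# 'tags.0' exists only when the array has at least one element
non_empty = coll.find({'tags.0': {'$exists': True}})

# alternative: the field exists and is not the empty array
non_empty_alt = coll.find({'tags': {'$exists': True, '$ne': []}})
```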
[02:47:01] <DigitalKiwi> are replica sets of 2 machines alright, or should it be 3? or is that only necessary if you have enough traffic that you need two handling requests and one for "disaster recovery"?
[02:50:22] <jY> you can put an arbiter on one of the 2 machines
[02:50:27] <vsmatck> DigitalKiwi: You need 3 to be able to do a master election. But you can do 2, and there is a way to put up a phoney server (an arbiter) that only participates in elections.
[02:50:33] <jY> 2 votes isn't enough for a majority
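A sketch of the layout jY and vsmatck describe: two data-bearing nodes plus an arbiter running on one of the same machines at a different port, so elections have three voters. Hostnames are made up, and it assumes a recent pymongo (for `directConnection`); `rs.initiate()` in the mongo shell does the same job:

```python
from pymongo import MongoClient

# connect straight to one of the would-be data nodes
client = MongoClient('db1.example.com', 27017, directConnection=True)

config = {
    '_id': 'rs0',
    'members': [
        {'_id': 0, 'host': 'db1.example.com:27017'},
        {'_id': 1, 'host': 'db2.example.com:27017'},
        # arbiter on the first box, separate port: it votes but holds no data
        {'_id': 2, 'host': 'db1.example.com:27018', 'arbiterOnly': True},
    ],
}
client.admin.command('replSetInitiate', config)
```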
[02:51:49] <DigitalKiwi> and what are "config servers"?
[02:52:58] <vsmatck> DigitalKiwi: They're not related to replica sets.
[02:53:09] <jY> DigitalKiwi: that is when you shard
[02:54:21] <vsmatck> Replica sets are needed for sharding. If you have two shards with no replicas you double your risk of failure. So you use replica sets to mitigate the risk.
[02:54:52] <vsmatck> But sharding should be avoided unless you need it.
[02:55:11] <vsmatck> But then, it's good to know about. Because if you think you may need it you can make choices that will help you later if you do.
[02:55:55] <DigitalKiwi> sharding is for when the size of your data grows beyond the size of the storage on each replica set?
[02:57:11] <DigitalKiwi> or maybe the size of a collection is the more appropriate measure; you could get around sharding as long as no single collection is bigger than a single server, right?
[02:57:11] <vsmatck> Some people use it for performance. You can scale writes with it. Some people have trouble keeping their indexes in memory without sharding.
[02:57:18] <deoxxa> DigitalKiwi: can also be for when you have too much query load, or you want to do some tricky stuff with putting shards in different locations for response times, etc
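For context on what "turning sharding on" actually looks like (which fits vsmatck's advice to avoid it until needed, since it is an explicit, deliberate step): a hedged sketch assuming a running mongos router and a `user_id` shard key chosen purely for illustration:

```python
from pymongo import MongoClient

mongos = MongoClient('router.example.com', 27017)  # a mongos, not a mongod

# shard the database, then shard a collection on a chosen key
mongos.admin.command('enableSharding', 'mydb')
mongos.admin.command('shardCollection', 'mydb.users', key={'user_id': 1})
```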
[02:59:32] <DigitalKiwi> The guys at urban airship were using mongodb and they switched most of their stuff over to postgresql because they said they were having problems. one of the more interesting things I got out of it, which I'm not sure if it's good or bad, but interesting... was running more than one mongod on the same server as shards to increase write speed, or something like that
[03:01:16] <vsmatck> It'd be interesting to talk about specific parts of it. http://schmichael.com/files/schmongodb/Scaling%20with%20MongoDB%20(with%20notes).pdf
[03:01:28] <DigitalKiwi> and then infoq lost all credibility with me by posting about that anonymous pastebin of that FUD some guy was trying to spread :(
[03:05:55] <DigitalKiwi> anyway I'm not saying any of these are reasons I shouldn't use mongo as I like it so far and want to use it, but I do want to avoid some of these problems that the others have had
[03:06:00] <DigitalKiwi> uh they lost a bunch of data
[03:06:36] <DigitalKiwi> maybe it wasn't recent, I don't know. I thought it was, but... trying to find it
[03:12:37] <jY> but it isn't the right tool for every job
[03:19:39] <DigitalKiwi> do you think with urban airship that was just a case of them not knowing the tools or was mongo to blame or?
[03:20:29] <vsmatck> That question is too reduced. You can see exactly what they did and decide whether it's an unavoidable problem with mongo or not.
[03:20:48] <vsmatck> You'd want to, if you were actually going to use mongo for something real, anyway.
[03:22:37] <DigitalKiwi> I have this theory that for every 1 person/company that whinges loudly about a tool not working there are dozens of people that it works great for that remain quiet
[03:24:17] <DigitalKiwi> I'm probably not the first or only person with that theory but still
[03:24:48] <deoxxa> damn it, i'm trying to use this hammer to make pasta and it's not working
[03:26:30] <deoxxa> though i did find it was easy to get started with - all my carpenter friends had been talking about it for a while
[03:38:13] <DigitalKiwi> vsmatck: I think it looks like they tried to scale vertically instead of horizontally, and that isn't how mongod is meant to work. I'm not sure either way whether the dataset they were using was appropriate for mongo, but I do think they tried to solve the problems they encountered ass-backwards. does any of that seem reasonably correct?
[03:39:40] <DigitalKiwi> and yeah I've read this a few times http://blog.schmichael.com/2011/11/05/failing-with-mongodb/ and read those slides as well, but I feel I'm missing a lot of the story by only seeing the slides and not hearing what was said over them
[06:44:55] <jasonl_> i need to use skip() on a large data set in mongodb but i hit the exception "getMore: cursor didn't exist on server, possible restart or timeout?" would setting batchSize help in this case? or does only setting notimeout solve this issue?
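Two ways jasonl_'s problem is commonly handled in pymongo (modern option names assumed; `process` is a hypothetical per-document handler). batchSize alone only helps if each batch is consumed before the server's idle-cursor timeout; disabling the timeout works but then the cursor must be closed explicitly, and paging on `_id` avoids both the timeout risk and the cost of large `skip()` offsets:

```python
from pymongo import MongoClient

coll = MongoClient().mydb.items  # assumed names

# option 1: disable the server-side idle timeout, and close the cursor yourself
cursor = coll.find({}, no_cursor_timeout=True)
try:
    for doc in cursor:
        process(doc)  # hypothetical handler
finally:
    cursor.close()

# option 2: drop skip() entirely and page on _id, which stays fast at any depth
last_id = None
while True:
    query = {'_id': {'$gt': last_id}} if last_id else {}
    page = list(coll.find(query).sort('_id', 1).limit(1000))
    if not page:
        break
    for doc in page:
        process(doc)
    last_id = page[-1]['_id']
```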
[10:17:58] <circlicious> where are the mongo hipsters?
[12:37:46] <UnSleep> http://www.mongodb.org/display/DOCS/Full+Text+Search+in+Mongo I'm working on this...
[12:38:08] <UnSleep> I'm thinking about how to manage it... does the order/number of tags affect the performance?
[12:39:03] <deoxxa> wait, people take that page seriously?
[12:45:05] <UnSleep> it's better to try to do relationships between documents
[13:09:09] <UnSleep> or create an array with results and search with a unique tag... (the inverse way)
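The docs page UnSleep links describes exactly this keyword-array approach: store the words in an array field and put a multikey index on it. On the earlier question, element order in the array doesn't matter to a multikey index, though the number of elements does grow the index. A minimal sketch with assumed names:

```python
import re
from pymongo import MongoClient

articles = MongoClient().mydb.articles  # assumed names
articles.create_index('_keywords')      # multikey index over the array

def save_article(doc):
    # naive tokenization; real code would stem and drop stopwords
    doc['_keywords'] = sorted(set(re.findall(r'\w+', doc['body'].lower())))
    articles.insert_one(doc)

# find documents containing all of the given words
hits = articles.find({'_keywords': {'$all': ['mongo', 'search']}})
```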
[17:00:59] <mattg_> i set the connection to notimeout, and in the java code i explicitly close the cursor in a finally block. but i just want to double check to be sure: if the code fails to close the cursor, is it possible to manually perform the close operation in the mongo shell? or any other way to achieve this?
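On manually killing a leaked cursor: newer servers (MongoDB 3.2+) expose a `killCursors` database command, runnable from the shell as `db.runCommand({killCursors: "events", cursors: [NumberLong(...)]})` or from any driver; older servers only let drivers kill cursors over the wire protocol. A sketch via pymongo, with a placeholder id and assumed names:

```python
from bson import Int64
from pymongo import MongoClient

db = MongoClient().mydb  # assumed database name

# 123456 is a placeholder; a real id comes from e.g. pymongo's Cursor.cursor_id
db.command('killCursors', 'events', cursors=[Int64(123456)])
```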
[19:53:07] <ra21vi> I am trying to iterate over all users stored in mongodb and divide them into buckets. I need to distribute some users to async tasks for processing. In pymongo, how can I do it without loading them all in memory?
[19:53:49] <ra21vi> right now I have around 2000 records, but I am more concerned about when it grows to 1M. Is there an effective way to do it?
[19:53:57] <vsmatck> "all in memory" on the mongo server, or your application?