[00:31:34] <Fieldy> can I get some pointers for apps that will use mongodb where my use case is several TB of gzip compressed proxy log files? I'd like to stick a few key fields in mongodb, including the filename it came from, so we can search for terms in fields and then know which filename to search, versus searching all of them manually as we do now.
[00:51:08] <nyov> how do I escape special characters in a mongo shell regex? specifically I want to escape [ and ]
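The usual answer to nyov's question is to backslash-escape every regex metacharacter before building the pattern. A minimal helper (the name `escapeRegex` is my own) that works in the mongo shell or any JavaScript runtime:

```javascript
// Escape regex metacharacters, including [ and ], so the input is
// matched literally when passed to new RegExp(...) or a $regex query.
function escapeRegex(str) {
  return str.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const pattern = new RegExp(escapeRegex("foo[1]"));
// pattern.test("foo[1]") === true; pattern.test("foo1") === false
```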
[00:56:35] <harttho> Can you parse strings as ints in the Aggregation Framework?
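For harttho's question: there was no string-to-int operator in the aggregation framework at the time; modern MongoDB (4.0+) added `$convert`/`$toInt`. A sketch of both routes, with the client-side fallback runnable here (field names are made up):

```javascript
// Modern pipeline form (MongoDB 4.0+), shown as a sketch:
//   db.coll.aggregate([{ $addFields: { n: { $toInt: "$nStr" } } }])
// Era-appropriate fallback: convert on the client after aggregating.
const docs = [{ nStr: "42" }, { nStr: "7" }];
const withInts = docs.map(d => ({ nStr: d.nStr, n: parseInt(d.nStr, 10) }));
// withInts[0].n === 42
```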
[03:16:00] <Fieldy> how can I bring gzip compressed proxy logfiles (selected fields) into mongodb? i'm not a programmer, i'm not a scripter... am I out of my league?
[03:31:56] <jdelgado> Quick question - Best way to prevent data loss in the following event: ServerA reads Doc1, ServerB reads Doc1, ServerA writes Doc1 (increment something), ServerB writes Doc1 with an increment, but didn't get the chance to take ServerA's increment - thus data is lost.
[03:32:36] <jdelgado> Was looking up the two phase commit, but would it prevent writes?
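A sketch of jdelgado's lost-update scenario and the usual MongoDB answer: don't read-modify-write, send the increment as a server-side atomic `$inc`. The in-memory `applyInc` below stands in for the server applying `{$inc: ...}`; it is an illustration, not driver API:

```javascript
// Stand-in for MongoDB atomically applying {$inc: {field: amount}}:
// the delta is applied to the current stored value, not a stale read.
function applyInc(doc, field, amount) {
  doc[field] = (doc[field] || 0) + amount;
  return doc;
}

// Lost-update pattern: both servers read n=0, then each writes n+1.
const doc1 = { _id: 1, n: 0 };
const readA = doc1.n, readB = doc1.n;
doc1.n = readA + 1; // ServerA writes 1
doc1.n = readB + 1; // ServerB overwrites with 1 -- A's increment is lost
const lost = doc1.n; // 1, not 2

// Atomic pattern: each server sends only a delta; both survive.
const doc2 = { _id: 2, n: 0 };
applyInc(doc2, "n", 1); // ServerA
applyInc(doc2, "n", 1); // ServerB
const kept = doc2.n; // 2
```

In the shell the fix is simply `db.docs.update({_id: id}, {$inc: {n: 1}})`. Two-phase commit targets multi-document changes and does not block concurrent writers, so for a single counter `$inc` (or `findAndModify` with a version field, for richer updates) is the lighter tool.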
[08:59:17] <styles_> Hey guys, GridFS... would this be worth using to store videos (like youtube style) then just have frontend servers stream them? it seems like this would be more overhead than writing them to disk and using the filesystem
[10:07:28] <qsd> Hi, I have a schema where I can have different types of accounts, each account is a document. I now need to represent a many-to-many relation among accounts: one account (of type: 0) can follow multiple other accounts (of type: 1) with a given weight, and inversely it's as important to know which are the followers from the type:1 accounts (at some events they need to modify and replicate data on their followers)
[10:08:52] <qsd> so to sum up I can have find operations to read which accounts this account follows with which weight, and who the followers of this account are with which weight
[10:09:56] <qsd> I thought of separate collections for that, is it a good idea?
[10:30:08] <qsd> the difference would be account would either have a from: [{id: id1, w: w1}, ...] (for follower accounts) or a to: [{id: id2, w:w2},..] (for 'emitting' accounts)
[10:30:28] <qsd> with such a schema, everything fits in account collection
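The two shapes qsd describes can be sketched as plain documents (per the wording above, a `from` array on follower accounts and a `to` array on 'emitting' accounts; ids and weights here are made up):

```javascript
// Both account shapes live in one `accounts` collection.
const followerAcct = { _id: "a1", type: 0, from: [{ id: "b1", w: 0.5 }] };
const emittingAcct = { _id: "b1", type: 1, to: [{ id: "a1", w: 0.5 }] };

// Both directions stay queryable, e.g. in the shell (sketch):
//   db.accounts.find({ "from.id": "b1" })
//   db.accounts.find({ "to.id": "a1" })
```

A multikey index on `from.id` / `to.id` would serve both lookups without a separate link collection.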
[10:34:20] <kali> qsd: i meant that one http://blog.mongodb.org/post/65612078649/schema-design-for-social-inboxes-in-mongodb
[10:36:23] <qsd> in the post I don't understand: some examples mention 'recipient', although their documents don't have such a field, 'to' maybe?
[10:50:01] <qsd> is replication important for data availability in find queries? or to guard against data corruption / data loss
[10:50:03] <sinclair|work> Nodex: so, im a bit curious as to how to scale out over redis, if the redis master is the one getting the insane writes from each application
[10:50:30] <kali> qsd: availibility, and read scalability
[10:50:31] <sinclair|work> Nodex: im specifically thinking about scaling out over redis + socket.io btw
[10:51:01] <Nodex> the reads get directed at the slaves and writes at the master
[10:51:31] <sinclair|work> Nodex: so, from that point of view, how should i configure socket.io for that?
[10:51:43] <sinclair|work> Nodex: also noting that i have a pub/sub component to this
[10:52:09] <sinclair|work> Nodex: also noting that the socket.io redis library expects a pub/sub and client (for read and writes)
[10:56:17] <Nodex> I've just started playing with it for (laugh) nodejs (as the front to an API)
[10:57:34] <kali> why bother ? just setup a proxy_pass in nginx
[10:58:00] <Nodex> it was more for the capabilities of spawning multiple processes and taking the blocking aspect out of play
[10:58:10] <Nodex> I had it under a proxy pass before
[10:58:47] <kali> yeah, for spawning multiple backends, passenger is great
[10:59:25] <Nodex> i like the fact it does all the management for you and I don't have to worry about spawning X processes for X cpu's
[11:00:52] <kali> yeah, that's good, particularly if you have several apps to manage on the same server, with resource allocation to balance among them
[11:01:18] <Nodex> it's got some terrible quirks I found though :/
[11:01:48] <kali> i don't remember having had big issues with it... but that was three years ago
[11:02:26] <Nodex> currently I can't upload a file bigger than 25k LOL
[11:02:36] <Nodex> sorry, a request bigger than 25k
[11:03:00] <kali> Nodex: we had big files upload going through it (with our ruby backends)
[11:03:10] <kali> so i don't think it's a structural problem
[11:04:03] <kali> Nodex: you know about nginx client_max_body_size ?
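The directive kali is pointing at caps the size of request bodies nginx will accept before returning 413; raising it is one line (the 50m value is illustrative, and whether it explains the 25k limit Nodex saw is an assumption):

```nginx
# nginx.conf sketch: raise the request-body limit for uploads.
# Valid in http, server, or location context.
http {
    client_max_body_size 50m;
}
```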
[11:51:25] <Nodex> best to just ask the question :)
[11:52:15] <BlueRayONE> does anyone here know how to set up a replica set on only _one_ virtual server, to be able to run the mongodb-connector? I need the rs really only for the connector. i already tried the tutorial "Deploy a Replica Set for Testing and Development", but there i already get errors.
[11:54:09] <BlueRayONE> @nodex jojo, I'm just such a nice guy who says first everywhere hello ;) #courtesy
[11:57:45] <BlueRayONE> sitting there already for hours and days to get this connector running... (to sync mongoDB <> solr)
[13:18:27] <joannac> BlueRayONE: what errors are you getting?
[13:20:26] <BlueRayONE> "replSet can't get local.system.replset config from self or any seed (EMPTYCONFIG)" - i want to set that config via rs0.initiate() in the shell, but when i can't start it, that's a bit difficult. and when i start it without the --replSet option it doesn't work either.
[13:22:52] <BlueRayONE> @joannac sorry, I should have mentioned you^^ and sorry #2, but i have to leave, time for lunch. come back later ;)
[13:23:20] <joannac> what's the output of rs.initiate()
[13:24:32] <BlueRayONE> @joannac "rs0 is not defined." (rs0 is the name). /afk
[14:20:31] <BlueRayONE> with filled jelly back. so, rs.initiate() returns { "ok" : 0, "errmsg" : "server is not running with --replSet" }. but that's not true, i started the mongod with "sudo /etc/init.d/mongod start --dbpath /var/lib/mongodb/data --replSet rs0 --smallfiles --oplogSize 128"
[15:11:06] <BigOrangeSU> Hi I had a quick question about a recommended data model. I have a list of users and which groups they are in (all ints). { user : 12341, groups : [{id:123,ts:123123123},{id:231,ts:231321321}]} . I was wondering what the optimal data model would be for db.users.find({ groups: { $elemMatch : {id : 123 } } }).count() ? I need to scale to about 1 billion users in an average of 50 groups. Thanks!
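A tiny in-memory stand-in for the semantics of BigOrangeSU's query (sample data made up), with the shell commands as comments:

```javascript
// Stand-in for db.users.find({groups: {$elemMatch: {id: 123}}}).count().
// With a single condition, the dotted form { "groups.id": 123 } matches
// the same documents as the $elemMatch.
const users = [
  { user: 12341, groups: [{ id: 123, ts: 111 }, { id: 231, ts: 222 }] },
  { user: 56789, groups: [{ id: 999, ts: 333 }] },
];
const count = users.filter(u => u.groups.some(g => g.id === 123)).length;
// count === 1
```

At a billion users with ~50 groups each, a multikey index on the embedded field (`db.users.ensureIndex({"groups.id": 1})` in shells of that era) is what keeps such counts from becoming collection scans; per-shard counts would then be summed in the app.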
[15:19:20] <BlueRayONE> btw, problem solved: adding "replSet=rs0" and "oplogSize=100" to the config, starting the mongod without any params and then executing in the mongo-shell "rs.initiate({"_id" : "rs0","version" : 1,"members" : [{"_id" : 0,"host" : "localhost:27017"}]})". kthxbai ;)
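For the record, a likely reason the earlier command-line attempt failed is that `/etc/init.d/mongod start` typically reads its options from the config file and silently ignores extra arguments. BlueRayONE's working setup as a config-file sketch (ini-style options of 2.x-era packages; path as given earlier in the log):

```ini
# /etc/mongodb.conf
dbpath = /var/lib/mongodb/data
replSet = rs0
oplogSize = 100
smallfiles = true
```

After a plain `sudo /etc/init.d/mongod start`, the `rs.initiate({...})` call shown above completes the single-node replica set.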
[15:19:21] <Nodex> what are your machine specs for 1 billion documents?
[15:20:29] <qsd> BigOrangeSU: just wondering, when you say changing the model to {user : 213, groups: [], groups_ts: []}, ids and corresponding timestamps will have the same position in the arrays?
[15:41:09] <BigOrangeSU> then sum the totals over the shards
[15:41:33] <Nodex> perhaps a separate document that's a master of all the shards that has a total amount and amount per shard etc - for convenience more than anything
[15:42:26] <Nodex> plus a grand total, but you can work that out by adding up the shards in your app fairly easily
[15:42:27] <BigOrangeSU> its going to get even more complex because eventually i will want to know the distinct number of users in a given combination of groups
[15:49:02] <qsd> I have documents like {_id:.., followers:[{id:231,w:0.2},{id:3, w:1.2}]}, the following query will do what I want (selecting the item with that id), but I don't understand the mechanism of projection. also, should I do like you said above, with 2 separate arrays folrs, folrs_w? ..... db.accounts.find({"followers.id":3}, {"followers.$": 1})
[15:58:18] <qsd> BigOrangeSU: yes it's doing the same as my query above without the projection. I'm interested in just returning the {id:.., w: ..} wanted. actually, the ids are unique in this array, maybe an object is more adapted :)
[15:59:34] <Nodex> the projection should work fine
[16:05:35] <qsd> hmm with an object {_id: .., folrs:{"231": 0.2, "3": 1.3}} , are there ways to query the value associated with id="3" in folrs? that's why I went with arrays, I didn't find example for this case, I could of course query the whole map and treat it on client
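For the map-style shape qsd asks about, the dotted path addresses one key directly; a sketch of the document and the shell query that selects by key (field names from the discussion, values made up):

```javascript
// qsd's map-style document: "folrs.3" is the entry for id "3".
const account = { _id: 1, folrs: { "231": 0.2, "3": 1.3 } };
const w = account.folrs["3"]; // 1.3

// Shell equivalent (sketch): match on key existence and project
// only that entry:
//   db.accounts.find({ "folrs.3": { $exists: true } }, { "folrs.3": 1 })
```

The catch with dynamic keys is indexing: no single index covers "every key of folrs", which is why the array-of-subdocuments form with a multikey index is the usual recommendation.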
[16:09:47] <mattapp__> Hey guys. Is there a way to know the resources consumed by each DB query?
[16:14:09] <qsd> I'd need a sort of db.accounts.find({"folrs": "11"}, {"folrs.11": 1}) or db.accounts.find({"folrs.11": $}, {"folrs.11": 1})
[16:14:46] <qsd> db.accounts.find({}, {"folrs.11": 1}) on the other hand projects it from all documents and reveals the right one, but that's probably not ideal
[16:28:33] <Nodex> projection is for parts of arrays, not for documents
[16:37:15] <luto> http://docs.mongodb.org/manual/core/replication-introduction/ says "Arbiters do not require dedicated hardware." but http://docs.mongodb.org/manual/core/replica-set-arbiter/ says "Do not run an arbiter on systems that also host the primary or the secondary members"
[16:38:17] <algernon> you can run arbiters on virtualized hardware, just don't run the primary or secondary members on the same host.
[16:39:30] <luto> that sounds expensive to set up. I'd like to have one primary server and a 2nd one to which I can fail-over if something is wrong with my first one
[16:42:09] <mattapp__> Hey guys. Is there a way to know the resources consumed by each DB query?
[16:58:58] <qsd> Nodex: ok, sadly I think like they say here http://stackoverflow.com/questions/8213637/mongo-db-design-of-following-and-feeds-where-should-i-embed that a separate table for linking followers can fit better for me, it's quite close to sql..
[17:09:45] <qsd> well for the fun of it I prefer the initial solution, with 1 collection :)
[20:21:40] <nofxx> App running a 5 member replica set, userbase increasing, it's getting bottlenecked... (linode IO doesn't help either). Idea: move the 5-member replica set to 7 SSD machines, 2 shards of 3 members + 1 for mongos. Sounds like a plan?
[20:24:32] <nofxx> * 5 vps/8 cores/HDD to 7 vps/1 core/SSD ... guess it'll be a hell of an improvement