PMXBOT Log file Viewer


#mongodb logs for Saturday the 23rd of November, 2013

[00:31:34] <Fieldy> can I get some pointers for apps that will use mongodb where my use case is several TB of gzip compressed proxy log files? I'd like to stick a few key fields in mongodb, including the filename it came from, so we can search for terms in fields and then know which filename to search, versus searching all of them manually as we do now.
[00:51:08] <nyov> how do I escape special characters in a mongo shell regex? specifically I want to escape [ and ]
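For nyov's question: metacharacters like `[` and `]` are backslash-escaped in a shell regex. A small helper for doing this generically (plain JavaScript, usable as-is in the mongo shell; `escapeRegex` is a made-up name, not something the shell provides):

```javascript
// Escape regex metacharacters so a literal string can be embedded in a
// pattern, e.g. db.logs.find({line: new RegExp(escapeRegex(userInput))}).
function escapeRegex(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

const literal = new RegExp(escapeRegex('[error]'));
literal.test('[error] disk full');   // true: matches the literal brackets
literal.test('e');                   // false: no character-class behaviour
```

For the one-off case in the question, writing `/\[foo\]/` directly in the shell does the same thing.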
[00:56:35] <harttho> Can you parse strings as ints in the Aggregation Framework?
[03:16:00] <Fieldy> how can I bring gzip compressed proxy logfiles (selected fields) into mongodb? i'm not a programmer, i'm not a scripter... am I out of my league?
[03:31:56] <jdelgado> Quick question - Best way to prevent data loss in the following event: ServerA reads Doc1, ServerB reads Doc1, ServerA writes Doc1 (increment something), ServerB writes Doc1 with an increment, but didn't get the chance to take ServerA's increment - thus data is lost.
[03:32:36] <jdelgado> Was looking up the two phase commit, but would it prevent writes?
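jdelgado's scenario is the classic lost update from a read-modify-write cycle. For a plain counter the usual MongoDB answer is to push the increment to the server as an atomic `$inc` update instead of reading and writing back, so no two-phase commit is needed. A toy in-memory sketch of the difference (plain JavaScript, nothing MongoDB-specific):

```javascript
// Toy illustration of the race: two "servers" each read the counter,
// then write back read + 1, so one increment is silently lost.
const raced = { counter: 0 };
const readA = raced.counter;   // ServerA reads 0
const readB = raced.counter;   // ServerB reads 0
raced.counter = readA + 1;     // ServerA writes 1
raced.counter = readB + 1;     // ServerB writes 1 -- ServerA's update is lost

// Letting the store apply a delta (what $inc does server-side) can't lose one.
const safe = { counter: 0 };
function incAtomic(doc, n) { doc.counter += n; }
incAtomic(safe, 1);            // ServerA's increment
incAtomic(safe, 1);            // ServerB's increment
// raced.counter ends at 1, safe.counter at 2
```

In the shell that would be along the lines of `db.docs.update({_id: docId}, {$inc: {count: 1}})`; `findAndModify` additionally returns the updated document if the caller needs the new value.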
[08:25:50] <cspeak> Anyone somewhat comfortable with aggregation optimization? hah
[08:59:17] <styles_> Hey guys, GridFS... would this be worth using to store videos (like youtube style) then just have frontend servers stream them? it seems like this would be more overhead than writing them to disk and using the filesystem
[10:07:28] <qsd> Hi, I have a schema where I can have different types of accounts, each account is a document. I now need to represent a many-to-many relation among accounts: one account (of type: 0) can follow multiple other accounts (of type: 1) with a given weight, and inversely it's as important to know which are the followers from the type:1 accounts (at some events they need to modify and replicate data on their followers)
[10:08:52] <qsd> so to sum up I can have find operations to read whom this account follows with which weight, and who this account's followers are with which weight
[10:09:56] <qsd> I thought of 1 separate collection for that - is it a good idea?
[10:14:50] <qsd> 1 document per relation is maybe a bit awkward
[10:22:28] <kali> qsd: check out this for inspiration, even if it is not exactly matching your need
[10:22:54] <kali> qsd: you probably want to group relations in buckets
[10:24:42] <qsd> you mean the first post in http://blog.mongodb.org/ ?
[10:25:02] <qsd> if yes, I was reading it :
[10:30:08] <qsd> the difference would be account would either have a from: [{id: id1, w: w1}, ...] (for follower accounts) or a to: [{id: id2, w:w2},..] (for 'emitting' accounts)
[10:30:28] <qsd> with such a schema, everything fits in account collection
[10:34:20] <kali> qsd: i meant that one http://blog.mongodb.org/post/65612078649/schema-design-for-social-inboxes-in-mongodb
[10:36:23] <qsd> in the post I don't understand why some examples mention 'recipient', although their documents don't have such a field - 'to' maybe?
[10:37:13] <kali> i would guess so
[10:37:29] <qsd> db.shardCollection("mongodbdays.inbox", {"recipient": 1, "sent":1})
[10:37:56] <qsd> else recipient is used to iterate over to, so it's ok
[10:42:49] <sinclair|work> does anyone here know anything about redis?
[10:42:55] <kali> a bit
[10:42:58] <sinclair|work> specifically master / slave replication
[10:43:05] <kali> haha, nice joke
[10:43:14] <sinclair|work> kali: ?
[10:43:43] <kali> sinclair|work: i stopped using redis because it had no support for failover, but that was a few years ago
[10:43:47] <ron> sinclair|work: you asked in #redis. keep it there.
[10:43:50] <kali> sinclair|work: not sure where it stands now
[10:44:13] <sinclair|work> kali: sentinel is supposed to handle failover
[10:44:21] <sinclair|work> ron: i did ask in redis
[10:44:31] <sinclair|work> ron: care to help me in redis?
[10:44:53] <qsd> sinclair|work: do you combine mongo with redis, or just use redis?
[10:45:00] <ron> I know you asked in #redis. I said you should keep it in that channel and not randomly ask in other channels.
[10:45:15] <sinclair|work> ron: ...why?
[10:45:47] <ron> basic irc etiquette. for the same reason you don't ask that question in #justarandomchannel
[10:46:01] <ron> (and btw, that was not the channel name I originally thought of writing).
[10:46:09] <sinclair|work> ron: your chat is distracting people from helping me with my question
[10:46:32] <sinclair|work> sooo, as you already said you were unable to help, i might ignore
[10:46:35] <ron> based on your behavior, I consider that a good thing.
[10:46:59] <sinclair|work> qsd: not mongo to redis replication, rather redis to redis replication, specifically a slave to master
[10:47:26] <sinclair|work> qsd: from my understanding, the master replicates to the slave, so it doesn't seem to make sense to write to the slave at all
[10:47:45] <sinclair|work> qsd: perhaps an understanding of how mongo replication works might lend some insights tho
[10:47:58] <qsd> no idea
[10:48:13] <qsd> I'm not that far
[10:49:07] <Nodex> sinclair|work : replication is one way, it's not bidirectional in redis
[10:49:13] <kali> the principle of master/slave is that you perform all writes to the master
[10:49:21] <kali> or else, it's master/master
[10:49:26] <sinclair|work> Nodex: that's fine
[10:50:01] <qsd> replication is important for data availability in find queries? or for possible data corruption, data loss
[10:50:03] <sinclair|work> Nodex: so, im a bit curious as to how to scale out over redis, if the redis master is the one getting the insane writes from each application
[10:50:30] <kali> qsd: availability, and read scalability
[10:50:31] <sinclair|work> Nodex: im specifically thinking about scaling out over redis + socket.io btw
[10:51:01] <Nodex> the reads get directed at the slaves and writes at the master
[10:51:31] <sinclair|work> Nodex: so, from that point of view, how should i configure socket.io for that?
[10:51:43] <sinclair|work> Nodex: also noting that i have a pub/sub component to this
[10:52:09] <sinclair|work> Nodex: also noting that the socket.io redis library expects a pub/sub and client (for read and writes)
[10:52:25] <Nodex> we should take this to #redis
[10:52:32] <Nodex> kali : you use Ruby right?
[10:52:35] <sinclair|work> Nodex: sounds good
[10:52:52] <kali> Nodex: among others, yes
[10:53:16] <Nodex> do you happen to use Phusion passenger?
[10:53:23] <kali> i used to
[10:54:16] <Nodex> is there a reason you don't anymore ? - like it's shit or anything?
[10:55:01] <kali> it was a bad fit for us... it makes a lot of sense if you're using apache with several ruby application running
[10:55:19] <Nodex> ugg apache :/
[10:55:36] <kali> we're not using apache anymore for the main platform, and we have just one big rails app
[10:55:49] <kali> so passenger was not helping
[10:55:56] <kali> we switched to unicorn
[10:56:17] <Nodex> I've just started playing with it for (laugh) nodejs (as the front to an API)
[10:57:34] <kali> why bother ? just setup a proxy_pass in nginx
[10:58:00] <Nodex> it was more for the capabilities of spawning multiple processes and taking the blocking aspect out of play
[10:58:10] <Nodex> I had it under a proxy pass before
[10:58:47] <kali> yeah, for spawning multiple backends, passenger is great
[10:59:25] <Nodex> i like the fact it does all the management for you and I don't have to worry about spawning X processes for X cpu's
[11:00:52] <kali> yeah, that's good, particularly if you have several apps to manage on the same server, with resource allocation to balance among them
[11:01:18] <Nodex> it's got some terrible quirks I found though :/
[11:01:48] <kali> i don't remember having had big issues with it... but that was three years ago
[11:02:26] <Nodex> currently I can't upload a file bigger than 25k LOL
[11:02:36] <Nodex> sorry, a request bigger than 25k
[11:03:00] <kali> Nodex: we had big files upload going through it (with our ruby backends)
[11:03:10] <kali> so i don't think it's a structural problem
[11:04:03] <kali> Nodex: you know about nginx client_max_body_size ?
[11:04:34] <Nodex> yeh, it's set at 5g
[11:04:55] <Nodex> i know it's not nginx and I know it's not NodeJS coz it works when I proxy it
[11:07:10] <Nodex> it's got to be some Phusion temp directory permission error or something
[11:08:17] <kali> there might be a switch from an in-memory to an in-file strategy at about 25k, that would kinda make sense
[11:08:36] <kali> strace's your friend :)
[11:11:04] <Nodex> yer, gonna have to trace it :/
[11:19:39] <Nodex> seems it's using 8k buffers for some reason :S
[11:48:24] <BlueRayONE> hey guys
[11:51:25] <Nodex> best to just ask the question :)
[11:52:15] <BlueRayONE> does anyone here know how to set up a replica set on only _one_ virtual server, to be able to run the mongodb-connector? I need the rs really only for the connector. i already tried the tutorial "Deploy a Replica Set for Testing and Development", but there i already get errors.
[11:54:09] <BlueRayONE> @nodex jojo, I'm just such a nice guy who says first everywhere hello ;) #courtesy
[11:57:45] <BlueRayONE> been sitting on this for hours and days trying to get this connector running... (to sync mongoDB <> solr)
[11:59:34] <Nodex> :/
[13:12:19] <BlueRayONE> hmkay...no one..
[13:18:27] <joannac> BlueRayONE: what errors are you getting?
[13:20:26] <BlueRayONE> "replSet can't get local.system.replset config from self or any seed (EMPTYCONFIG)" - and these I want to set by rs0.initiate() in the shell. but when i can't start it that's a bit difficult. and when i start it without the -replset option it doesn't work.
[13:22:52] <BlueRayONE> @joannac sorry, I should have mentioned you^^ and sorry #2, but i have to leave, time for lunch. come back later ;)
[13:23:20] <joannac> what's the output of rs.initiate()
[13:24:32] <BlueRayONE> @joannac "rs0 is not defined." (rs0 is the name). /afk
[13:25:31] <joannac> not rs0, rs.initiate()
[14:20:31] <BlueRayONE> back with a full belly. so, rs.initiate() returns { "ok" : 0, "errmsg" : "server is not running with --replSet" }. but that's not true, i started the mongod with "sudo /etc/init.d/mongod start --dbpath /var/lib/mongodb/data --replSet rs0 --smallfiles --oplogSize 128"
[15:11:06] <BigOrangeSU> Hi I had a quick question about a recommended data model. I have a list of users and which groups they are in (all ints). { user : 12341, groups : [{id:123,ts:123123123},{id:231,ts:231321321}]} . I was wondering what the optimized data model would be for db.users.find({ groups: { $elemMatch : {id : 123 } } }).count() ? I need to scale to about 1 billion users in an average of 50 groups. Thanks!
[15:12:29] <qsd> ts is a timestamp?
[15:12:46] <BigOrangeSU> yes
[15:13:25] <BigOrangeSU> I can change the model to have {user : 213 , groups: [], groups_ts: []} if that will improve performance
[15:13:41] <Nodex> it will
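The reshaped model BigOrangeSU describes keeps `groups` and `groups_ts` as index-aligned parallel arrays, so entry i of one pairs with entry i of the other. A sketch of reading them back together (plain JavaScript, data invented); the trade-off is that every insert or removal has to touch both arrays at the same position:

```javascript
// One user document in the parallel-array shape.
const userDoc = {
  user: 213,
  groups:    [123, 231],
  groups_ts: [123123123, 231321321],
};

// Rebuild {id, ts} pairs by walking both arrays in lockstep.
function memberships(doc) {
  return doc.groups.map((id, i) => ({ id: id, ts: doc.groups_ts[i] }));
}

memberships(userDoc);
// [{id: 123, ts: 123123123}, {id: 231, ts: 231321321}]
```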
[15:13:58] <BigOrangeSU> i have tried placing an index in the original model on groups.id
[15:14:12] <BigOrangeSU> but still not getting the sub-second performance i need to host this app as a web application
[15:17:53] <BigOrangeSU> any thoughts?
[15:19:20] <BlueRayONE> btw, problem solved: adding "replSet=rs0" and "oplogSize=100" to the config, starting the mongod without any params and then executing in the mongo-shell "rs.initiate({"_id" : "rs0","version" : 1,"members" : [{"_id" : 0,"host" : "localhost:27017"}]})". kthxbai ;)
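BlueRayONE's fix, restated: the extra arguments passed to the init script apparently never reach mongod, so `--replSet` has to go into the config file, and the one-member replica set config is then fed to `rs.initiate()`. Roughly (2.x-era config syntax, names and paths as in the log):

```
# /etc/mongodb.conf -- the init script seems to ignore extra
# command-line arguments, so --replSet must live here:
replSet=rs0
oplogSize=100

# then restart mongod and run, in the mongo shell:
#   rs.initiate({"_id": "rs0", "version": 1,
#                "members": [{"_id": 0, "host": "localhost:27017"}]})
```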
[15:19:21] <Nodex> what are your machine specs for 1billion documents?
[15:20:29] <qsd> BigOrangeSU: just wondering when you say changing the model like {user : 213 , groups[], groups_ts[]}, id and corresponding timesstamps will have same position in arrays?
[15:20:40] <BigOrangeSU> yes
[15:20:48] <Nodex> [15:17:57] <Nodex> what are your machine specs for 1billion documents?
[15:22:25] <BigOrangeSU> 16gm ram, 2tb 10,000rpm solid state, 8 core, not sure exact speed
[15:22:37] <Nodex> then you probably won't get the performance you're after
[15:22:50] <Nodex> depending on your indexes they probably don't fit in your RAM
[15:23:00] <BigOrangeSU> how do you determine the specs?
[15:23:07] <BigOrangeSU> ensuring the indexes fit in ram?
[15:23:47] <Nodex> db.stats()
[15:24:05] <Nodex> "indexSize" : 1000000000000,
[15:24:07] <Nodex> in bytes
[15:25:27] <BigOrangeSU> gotcha
[15:25:57] <BigOrangeSU> is that normally the best way to determine what the best specs are for a machine?
[15:26:06] <BigOrangeSU> ensure the indexes fit in memory
[15:26:15] <Nodex> that's the general guidelines to keeping it fast
[15:26:27] <BigOrangeSU> will mongo prioritize to keep the indexes in memory?
[15:26:34] <BigOrangeSU> or is there additional config that I need to do?
[15:26:42] <Nodex> it's managed by the operating system
[15:26:50] <BigOrangeSU> k
[15:26:56] <Nodex> it's all memory mapped so it pages in and out on an LRU
[15:27:20] <Nodex> if you have enough ram it should never need to page in and out
[15:27:27] <BigOrangeSU> sweet
[15:27:33] <Nodex> (aside first load of course)
[15:27:55] <Nodex> what times are they taking at the moment (your queries) ?
[15:28:07] <BigOrangeSU> i havent fully tested
[15:28:14] <Nodex> you can check in the mongodb.log - it will tell you anything over 100ms iirc
[15:28:46] <BigOrangeSU> at 7.6 million records its taking 4ish seconds
[15:29:18] <Nodex> something's not right there, can you show me the indexes?
[15:29:30] <Nodex> db.foo.getIndexes() ... where foo is your collection name
[15:29:35] <Nodex> pastebin / gist
[15:29:56] <BigOrangeSU> i cant this second since im rebuilding the index
[15:30:09] <Nodex> and perhaps an explain.... db.foo.find({your_query}).explain();
[15:30:36] <BigOrangeSU> i appreciate your help, this is good info to get me started
[15:30:49] <BigOrangeSU> i feel like i need to do some more work on my end before i engage anyone else's help
[15:30:50] <BigOrangeSU> i appreciate it
[15:30:51] <Nodex> no probs
[15:31:02] <Nodex> it's how we learn ;)
[15:31:34] <Nodex> do you only ever query by user id ?
[15:32:49] <BigOrangeSU> no never
[15:33:18] <BigOrangeSU> i only query by the groups
[15:33:25] <Nodex> group id only?
[15:34:40] <Nodex> just trying to get a handle on what data you're trying to extract from the DB
[15:35:02] <BigOrangeSU> yes all users who are in x group
[15:35:03] <Nodex> are you trying to get all the groups that a user is a member of OR are you trying to get all the users of a particular group
[15:35:18] <BigOrangeSU> all users of a particular group or combination of group
[15:35:30] <BigOrangeSU> for example give me all users who are in group 1, 2 and 3
[15:35:44] <Nodex> I would have a separate collection for that tbh
[15:36:00] <BigOrangeSU> im nervous that i will get killed by the 16gb limit
[15:36:04] <BigOrangeSU> sorry 16mb limit
[15:36:27] <Nodex> I would store all the users in a groups collection
[15:36:41] <Nodex> members : [1,2,3,4,5,6,7]
[15:36:43] <BigOrangeSU> i could have something like { group_id : 1 , users : [1231,1231,12312314] }
[15:36:56] <BigOrangeSU> if each user is bigint
[15:37:11] <Nodex> you can use an $in with that
[15:37:12] <BigOrangeSU> how many do you think I could fit into the array?
[15:37:22] <BigOrangeSU> with the 16mb limit
[15:37:35] <Nodex> db.groups.find({members : { $in : [1,2,4]}});
[15:37:39] <Nodex> a lot
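One thing worth noting about that `$in` query: it returns group documents whose members contain *any* of the listed ids. Getting "all users who are in groups 1, 2 and 3" means fetching those group documents and intersecting their member lists app-side; a sketch (plain JavaScript, group data invented):

```javascript
// Users present in every one of the fetched group documents
// (app-side intersection over the group-centric model).
function usersInAllGroups(groupDocs) {
  const [first, ...rest] = groupDocs.map(g => new Set(g.users));
  return [...first].filter(u => rest.every(s => s.has(u)));
}

const fetched = [
  { group_id: 1, users: [10, 11, 12, 13] },
  { group_id: 2, users: [11, 12, 14] },
  { group_id: 3, users: [12, 13, 11] },
];
usersInAllGroups(fetched);   // [11, 12]
```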
[15:38:12] <Nodex> you can always shard it anyway on an app level
[15:38:47] <Nodex> group_id:1, users : [1,3,5,7] ......new document group_id:1, users:[2,4,6,8].....
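Nodex's app-level sharding -- splitting one group's members across several documents to stay under the 16MB document cap -- could be sketched like this (plain JavaScript; the bucket size and field names are invented, and the per-bucket `total` is the field that would be maintained with `$inc` as members are added):

```javascript
// Split a group's member ids into fixed-size bucket documents,
// each carrying its own running total.
const BUCKET_SIZE = 4;   // a real deployment would use a much larger bucket

function bucketGroup(groupId, memberIds) {
  const docs = [];
  for (let i = 0; i < memberIds.length; i += BUCKET_SIZE) {
    const members = memberIds.slice(i, i + BUCKET_SIZE);
    docs.push({ group_id: groupId, members: members, total: members.length });
  }
  return docs;
}

const buckets = bucketGroup(1, [1, 2, 3, 4, 5, 6, 7, 8, 9]);
// 3 documents: [1..4], [5..8], [9]; grand total = 4 + 4 + 1 = 9
const grandTotal = buckets.reduce((sum, d) => sum + d.total, 0);
```

The grand total per group is just the sum of the per-bucket totals, which is the "master document" idea mentioned below in the log.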
[15:39:10] <BigOrangeSU> yea interesting
[15:39:27] <BigOrangeSU> my groups will have millions of users in them
[15:39:32] <BigOrangeSU> hundreds of million
[15:39:53] <Nodex> you will certainly have to shard it then
[15:40:16] <Nodex> work out the optimal number per document and keep a running total for each shard of the group
[15:40:40] <BigOrangeSU> how do you keep the running total?
[15:40:50] <BigOrangeSU> like a background process?
[15:40:51] <Nodex> id:1, members:[....],'total' : 12345
[15:40:54] <Nodex> $inc
[15:40:59] <BigOrangeSU> got it
[15:41:09] <BigOrangeSU> then sum the totals over shards
[15:41:33] <Nodex> perhaps a separate document that's a master of all the shards that has a total amount and amount per shard etc - for convenience more than anything
[15:42:03] <Nodex> {shard_id:ObjectId(...), total:12345, group_id:1....}
[15:42:09] <BigOrangeSU> yea
[15:42:26] <Nodex> plus a total total but you can work that out by adding the shards in your app fairly easily
[15:42:27] <BigOrangeSU> it's going to get even more complex because eventually i will want to know the distinct number of users in a given combination of groups
[15:43:20] <Nodex> that's fairly easy but it's going to have to be done app side or in a map/reduce job
[15:43:58] <BigOrangeSU> yea i figured that
[15:46:07] <BigOrangeSU> ok i got quite a project ahead of me
[15:46:13] <BigOrangeSU> thanks for your help
[15:46:20] <BigOrangeSU> i will report back the progress
[15:46:29] <Nodex> no probs, good luck
[15:46:41] <BigOrangeSU> yea
[15:49:02] <qsd> I have documents like {_id:.., followers:[{id:231,w:0.2},{id:3, w:1.2}]} , the following query will do what I want (selecting the item with that id) but I don't understand the mechanism of projection, also should I do like you said above, with 2 separate arrays folrs, folrs_w? ..... db.accounts.find({"followers.id":3}, {"followers.$": 1})
[15:51:43] <Nodex> that should work yeh
[15:52:00] <BigOrangeSU> qsd: db.accounts.find({followers: { $elemMatch : {id : 231 }}})
[15:58:18] <qsd> BigOrangeSU: yes it's doing the same as my query above without the projection, I'm interested in just returning the {id:.., w: ..} wanted, actually, the ids are unique in this array, maybe an object is more adapted :)
[15:59:34] <Nodex> the projection should work fine
[16:05:35] <qsd> hmm with an object {_id: .., folrs:{"231": 0.2, "3": 1.3}} , are there ways to query the value associated with id="3" in folrs? that's why I went with arrays, I didn't find an example for this case, I could of course query the whole map and process it on the client
[16:09:47] <mattapp__> Hey guys. Is there a way to know the resources consumed by each DB query?
[16:14:09] <qsd> I'd need a sort of db.accounts.find({"folrs": "11"}, {"folrs.11": 1}) or db.accounts.find({"folrs.11": $}, {"folrs.11": 1})
[16:14:46] <qsd> db.accounts.find({}, {"folrs.11": 1}) on the other hand projects over all documents and reveals the right one, but that's probably not ideal
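For the object-map shape qsd describes, dot notation does reach into the map: something along the lines of `db.accounts.find({"folrs.3": {$exists: true}}, {"folrs.3": 1})` should match only accounts following id 3 and project just that entry (worth verifying against a real server). The path lookup itself works like this (plain JavaScript sketch; `getPath` is a made-up helper, not a driver function):

```javascript
// Resolve a MongoDB-style dot path such as "folrs.3" against a document.
function getPath(doc, path) {
  return path.split('.').reduce(
    (cur, key) => (cur == null ? undefined : cur[key]),
    doc);
}

const account = { _id: 1, folrs: { '231': 0.2, '3': 1.3 } };
getPath(account, 'folrs.3');    // 1.3 -- the weight for follower id 3
getPath(account, 'folrs.99');   // undefined -- not following 99
```

The caveat with the map shape is that the follower ids become field names, so you can't index "all weights" the way a multikey index covers an array.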
[16:28:33] <Nodex> projection is for parts of arrays not for documents
[16:37:15] <luto> http://docs.mongodb.org/manual/core/replication-introduction/ says "Arbiters do not require dedicated hardware." but http://docs.mongodb.org/manual/core/replica-set-arbiter/ says "Do not run an arbiter on systems that also host the primary or the secondary members"
[16:37:33] <luto> which one is true? o.o
[16:37:50] <algernon> both
[16:38:17] <algernon> you can run arbiters on virtualized hardware, just don't run the primary or secondary members on the same host.
[16:39:30] <luto> that sounds expensive to set up. I'd like to have one primary server and a 2nd one to which I can fail-over if something is wrong with my first one
[16:42:09] <mattapp__> Hey guys. Is there a way to know the resources consumed by each DB query?
[16:58:58] <qsd> Nodex: ok, sadly I think like they say here http://stackoverflow.com/questions/8213637/mongo-db-design-of-following-and-feeds-where-should-i-embed that a separate table for linking followers can fit better for me, it's quite close to sql..
[17:09:45] <qsd> well for the fun of it I prefer the initial solution, with 1 collection :)
[20:21:40] <nofxx> App running a 5-member replica set; userbase increasing, it's getting bottlenecked... (linode IO doesn't help either). Idea: move the 5-member replica set to 7 SSD machines, 2 shards of 3 members + 1 for mongos. Sounds like a plan?
[20:24:32] <nofxx> * 5 vps/8 cores/HDD to 7 vps/1 core/SSD ... guess it'll be a helluva improvement
[23:23:37] <ProLoser> hallo
[23:23:45] <ProLoser> i'm having some problems debugging a mongoose error
[23:23:59] <ProLoser> i'm getting this: MongoError: not okForStorage
[23:24:05] <ProLoser> i have no idea what's wrong with the payload
[23:24:13] <ProLoser> is there some way i can glean more information?