#mongodb logs for Saturday the 8th of September, 2012

[01:12:29] <Gavilan2> Hi! What's the best ORM for JavaScript? And most "transparent" to the business model? I'll be trying to port it to meteor...
[01:19:07] <bizzle> just asked this really simple question on SO about casbah/scala http://stackoverflow.com/questions/12327269/is-there-a-more-idiomatic-way-to-use-casbah-to-check-a-password
[01:19:18] <bizzle> can anyone help a brotha?
[01:33:13] <bizzle> not too lively in here on a friday night I guess haha
[02:20:56] <ecksit> hi, I am looking to convert from mongodb to mysql. are there any online docs for this? i am struggling to find consistent sources.
[02:21:32] <ecksit> example, i have a few tables in mongo that need to be converted so i can import them to a mysql instance.
[02:38:17] <UForgotten> ecksit: that's not an easy conversion path... dunno if anyone has written code to do such a thing.
[02:38:36] <ecksit> :|
[02:38:40] <ecksit> really?
[02:39:18] <ecksit> is there a way to get a dump of the mongo db tables in different formats?
[02:39:45] <UForgotten> I doubt it
[02:39:56] <UForgotten> not without someone writing a program to do it.
[02:40:02] <UForgotten> ask teh googles.
[02:40:18] <ecksit> (just fyi, i have zero mongo experience. i inherited this project and they no longer want to use mongo.)
[02:41:47] <UForgotten> sorry :( too hard for them? ;)
[02:42:49] <futini> what is the best approach for working with ids in mongo? for example, the default _id is too long, which makes it hard to work with in a url
[02:43:02] <ecksit> the dev they had used mongo for an internal app and now he left no one can manage it.
[02:43:09] <futini> is it a good way to create another, incremental id?
[02:43:25] <ecksit> so, they are getting me to change it back to mysql for their devs
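For the record, mongodump only emits BSON, but mongoexport can dump a collection as JSON or CSV, which is a workable starting point for a MySQL import (embedded documents and arrays will not flatten cleanly, though). A minimal sketch; the database, collection, and field names are placeholders, and older mongoexport releases spell the CSV flag as --csv where newer ones use --type=csv:

    # Dump one collection as CSV with an explicit field list (CSV requires one)
    mongoexport --db inherited_app --collection users \
        --csv --fields name,email,created_at --out users.csv
    # The resulting CSV can then be loaded into MySQL with LOAD DATA INFILE.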
[02:46:46] <timeturner_> what's the best way to store stuff in a subdocument that grows without limit?
[02:47:04] <timeturner_> I want to store them in a subdocument for the purposes of querying speed
[02:47:24] <timeturner_> and then grab the rest of the documents that couldn't fit in the document from another collection individually
[02:55:56] <mrpoundsign> hello everyone. I am trying to figure out how to do an atomic findAndModify that updates a field on all subdocuments to be the same thing. For example:
[02:56:01] <mrpoundsign> db.foo.insert({status: 'waiting', name: 'foo', thingies: [{name: "one", status: 'waiting'}, {name: "two", status: 'waiting'}]})
[02:56:31] <mrpoundsign> I want to set the status to 'ready' and the status on both sub objects to 'ready' in one command. I tried the following:
[02:57:10] <mrpoundsign> db.foo.findAndModify({query: {status: 'waiting'}, update: {$set: {status: 'ready', "thingies.status": 'ready'}}}) which I suspected wouldn't work. Do I need to iterate over every sub object in the array and specify them all?
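mrpoundsign's suspicion is right: a bare "thingies.status" path will not reach into the array. At the time the workaround was to address each element by index; later servers (3.6+) added the all-positional operator. A sketch using his example document:

    // Pre-3.6: name each array element explicitly (only works when the
    // number of elements is known -- here, the two from the insert above)
    db.foo.findAndModify({
        query:  { status: 'waiting' },
        update: { $set: { status: 'ready',
                          'thingies.0.status': 'ready',
                          'thingies.1.status': 'ready' } }
    })

    // MongoDB 3.6+: '$[]' applies the $set to every element of the array
    db.foo.findAndModify({
        query:  { status: 'waiting' },
        update: { $set: { status: 'ready', 'thingies.$[].status': 'ready' } }
    })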
[04:35:42] <phatduckk> hey guys - anyone here have experience identifying causes of high lock% ?
[04:46:00] <mrpoundsign> phatduckk: what do you mean?
[04:52:22] <phatduckk> mrpoundsign: i see a bunch of slow inserts, updates due to lock waits
[04:52:43] <phatduckk> i also see my lock% in mongostat spike high (almost 200%) from time to time
[04:53:29] <phatduckk> trying to track down what my code is doing that mongo doesnt like - or find out if I need beefier hardware
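One way to track down the offending operations is MongoDB's built-in profiler, which records slow operations in the system.profile collection. A minimal sketch; the 100 ms threshold is an arbitrary choice:

    // Profiling level 1 = record operations slower than the threshold (in ms)
    db.setProfilingLevel(1, 100)

    // Later, inspect the slowest recent operations and their lock waits
    db.system.profile.find().sort({ millis: -1 }).limit(10).pretty()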
[04:54:16] <mrpoundsign> well more hardware always helps. haha
[04:54:20] <mrpoundsign> but
[04:54:43] <phatduckk> ya, exactly
[04:54:51] <phatduckk> so for example. here's a line from mongodb.log
[04:54:51] <phatduckk> Sat Sep  8 00:40:46 [conn69259] insert jeraff.activityStream keyUpdates:0 locks(micros) w:3100666 3105ms
[04:55:05] <phatduckk> now - this write took 3 seconds - and essentially all due to a lock
[04:55:07] <phatduckk> WTF
[04:55:08] <phatduckk> lol
[04:55:24] <mrpoundsign> haha. what are you running on? are you on a cloud/virtualization?
[04:56:06] <phatduckk> Xen virtualization against a RAID 10 on a filer
[04:56:15] <phatduckk> 2 core box - 8GB RAM
[04:56:38] <mrpoundsign> how's your filer mounted?
[04:57:38] <phatduckk> not exactly sure - im pretty sure its not NFS tho
[04:57:45] <phatduckk> here's db.stats()
[04:57:45] <phatduckk> "db" : "jeraff", "objects" : 12005850, "avgObjSize" : 245.65176776321545, "dataSize" : 2949258276, "storageSize" : 3885572096, "numExtents" : 262, "indexes" : 75, "indexSize" : 1573217744, "fileSize" : 14958985216, "nsSizeMB" : 16,
[04:58:11] <phatduckk> iostat and mongotop dont really look troubling at all
[04:58:54] <mrpoundsign> yeah I'm not an expert in this area, but I suspect it's the FS...
[04:59:03] <mrpoundsign> but
[04:59:12] <mrpoundsign> you should ask someone who knows more. I'm just trying to help haha
[04:59:18] <phatduckk> how do i "make sure" its the FS
[04:59:32] <phatduckk> i mean, i could just be doing some stupid queries
[04:59:32] <phatduckk> lol
[05:00:24] <mrpoundsign> well make sure you're not on NFS, hopefully you're using iscsi
[05:00:42] <mrpoundsign> also are you using SSDs or real disks?
[05:01:10] <phatduckk> real disks
[05:01:16] <phatduckk> here's a graph of my disks: http://cl.ly/image/2w050B3y1M2I
[05:01:31] <phatduckk> they're not really hitting any sort of "oh shit" levels there
[05:02:03] <phatduckk> they peak at 1MB/sec which seems to correspond to about 25% saturation
[05:02:39] <mrpoundsign> right but with a remote filer, your network / protocol will more often be the bottleneck before the disks.
[05:03:18] <phatduckk> http://cl.ly/image/3V1A3B1d040u
[05:03:27] <phatduckk> that's looking pretty mellow too
[05:03:42] <phatduckk> that would work fine over wifi
[05:03:43] <phatduckk> lol
[05:05:50] <mrpoundsign> what os?
[05:05:57] <phatduckk> ubuntu
[05:06:18] <phatduckk> 11.10 to be exact
[05:06:47] <mrpoundsign> you might try iozone
[05:06:54] <mrpoundsign> to test your FS
[05:06:56] <mrpoundsign> http://www.cyberciti.biz/tips/linux-filesystem-benchmarking-with-iozone.html
[05:07:09] <phatduckk> ill do that now
[05:07:33] <mrpoundsign> there's a ton of tools for filesystem testing.
[05:08:26] <mrpoundsign> you can even see if you can max out the network using something like dd as a simple test.
[05:08:41] <phatduckk> cool
[05:08:49] <phatduckk> lemme get iozone installed real quick
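The dd idea from above, sketched out; the mount point and sizes are placeholders, and oflag=direct bypasses the page cache so the writes actually hit the filer:

    # Write 1 GB of zeroes to the remote mount and note the reported MB/s,
    # then compare that against the NIC's theoretical throughput
    dd if=/dev/zero of=/mnt/filer/ddtest bs=1M count=1024 oflag=direct
    rm /mnt/filer/ddtest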
[05:08:56] <mrpoundsign> also do you know if the filer is on the same switch as the host?
[05:09:36] <mrpoundsign> might be going through a 100mb hub somewhere. haha
[05:09:45] <mrpoundsign> have had those kinds of issues in the past.
[05:10:43] <mrpoundsign> Again, I am no expert, but I would personally go for a more distributed mongo cluster over more, smaller hardware with local storage. I think remotely mounted storage is not recommended. At least make sure you're using iscsi, if you can.
[05:11:03] <phatduckk> lol
[05:11:09] <phatduckk> these are good tips tho
[05:11:26] <phatduckk> lemme hop on a secondary
[05:11:31] <phatduckk> dont wanna kill the primary
[05:16:39] <mrpoundsign> indeed.
[05:17:05] <mrpoundsign> also are the filesystems mounted with the native flag? that can help quite a bit.
[05:17:29] <phatduckk> the test's running now
[05:17:33] <mrpoundsign> and is the entire VM on the filer or just the mongodb data directory?
[05:17:36] <phatduckk> ill post a gist once its ready
[05:17:38] <mrpoundsign> cool beans.
[05:17:42] <phatduckk> the data dir is
[05:17:50] <mrpoundsign> ok cool
[05:17:51] <phatduckk> VMs come with small local storage
[05:18:09] <phatduckk> ok https://gist.github.com/3672032
[05:18:13] <mrpoundsign> right. so you should be able to see what type of filesystem it is in /etc/fstab
[05:18:59] <phatduckk> yup
[05:19:37] <phatduckk> its ext4 (defaults 0 2)
[05:19:56] <mrpoundsign> hmm that's not nfs
[05:20:18] <phatduckk> the iozone output looks like its pretty fast too
[05:20:36] <phatduckk> ya im certain its not NFS
[05:23:50] <phatduckk> im assuming im doing some sort of abusive queries
[05:23:57] <phatduckk> just have no clue how to identify them
[05:24:13] <mrpoundsign> Well I think locking is typically for writes.
[05:24:20] <mrpoundsign> I *think*
[05:25:46] <mrpoundsign> how did the network look as you ran ozone?
[05:25:52] <mrpoundsign> iozone*
[05:26:02] <phatduckk> about to random friend on phone
[05:26:02] <phatduckk> lol
[05:27:09] <phatduckk> what should i run to check the network?
[05:31:28] <mrpoundsign> sorry, I meant the IO graphs you sent before haha
[05:32:37] <phatduckk> lemme grab em
[05:32:38] <phatduckk> one sec
[05:33:31] <phatduckk> i have partial graphs
[05:33:31] <phatduckk> http://cl.ly/image/270O0i351V0R
[05:33:35] <phatduckk> looks like 25%
[05:33:48] <phatduckk> thats the disk io graph while iozone is running
[05:33:53] <phatduckk> disk IO seems fine...
[05:36:31] <mrpoundsign> is that much higher than you saw when mongodb is churning? if it's not much higher, it means, for whatever reason (CPU, network, mount options), you're not able to get more performance out of the filer from that server.
[05:37:30] <mrpoundsign> if it's significantly (hundreds of %) higher, then it could be something else (but you can probably still get better performance with things like mounting with noatime, or other tweaking).
[05:37:50] <phatduckk> heh
[05:38:36] <phatduckk> mongo doesnt seem to go over 25% io
[05:40:14] <mrpoundsign> looks like it's around the 25% mark, which is what I saw on the mongo graphs as well. First thing I would try is mounting noatime, see if that helps and how much. That's basically a locked write every time you access the file (read and/or write). It's often turned off for remote mounts.
[05:42:18] <phatduckk> k
[05:42:36] <phatduckk> i read lately its not needed as much as it used to be b/c ubuntu's defaults are better now
[05:42:39] <phatduckk> but ill try it
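For reference, noatime is set in the options column of /etc/fstab, or applied live with a remount; the device and mount point below are placeholders:

    # /etc/fstab entry with noatime added to the options column
    /dev/xvdb1  /var/lib/mongodb  ext4  defaults,noatime  0  2

    # or, without editing fstab, remount in place:
    mount -o remount,noatime /var/lib/mongodb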
[05:42:49] <mexlex> is ubuntu any good for servers?
[05:43:41] <mrpoundsign> mexlex: yeah.
[05:44:13] <mexlex> what about centOS?
[05:44:14] <mrpoundsign> mexlex: but that's subjective. I like it. *shrug
[05:44:24] <mrpoundsign> mexlex: yeah it's good too.
[05:44:31] <mexlex> well in terms of updates and patches and stuff like that
[05:47:38] <mrpoundsign> mexlex: they're both actively supported. Generally I use ubuntu because I am not afraid of newer versions of software. CentOS tends to lag. But again, it's subjective -- what you know best you'll do the best with.
[05:48:45] <mrpoundsign> my current employer loves CentOS so we use that for all our stuff. But I laugh every time they spend 2 days getting something that is an apt-get away in ubuntu. Again, my statements are my opinion, and I could be wrong ( before I get yelled at :P )
[05:50:54] <phatduckk> ya this is baffling
[05:51:30] <mrpoundsign> phatduckk: have you tried native yet? It really does make a difference :)
[05:51:37] <mrpoundsign> noatime*
[05:51:40] <phatduckk> native disk?
[05:51:42] <mrpoundsign> fricking autocorrect!
[05:51:54] <phatduckk> ill try it in a second...
[05:51:59] <mrpoundsign> kk
[05:52:46] <phatduckk> running another iozone
[05:59:41] <phatduckk> damn the default iozone test takes forever!
[06:05:57] <mrpoundsign> haha
[06:06:17] <mrpoundsign> how's the IO graph looking? you can just type in the % here. :)
[06:19:37] <mrpoundsign> phatduckk: don't leave me hangin' bro! :P
[06:19:45] <phatduckk> lol sorry
[06:19:50] <mrpoundsign> no stress haha
[06:19:51] <phatduckk> i put the node on noatime
[06:19:58] <phatduckk> and now its stuck in STARTUP2 state
[06:20:03] <phatduckk> i think its fucked
[06:20:09] <phatduckk> Sat Sep 8 06:18:58 [conn105] problem detected during query over config.system.profile : { $err: "not master or secondary; cannot currently read from this replSet member", code: 13436 }
[06:25:03] <phatduckk> apps super fast with 1 server dead
[06:25:04] <phatduckk> lol
[06:26:25] <mrpoundsign> haha ugh
[06:26:34] <mrpoundsign> what version of mongodb are you using?
[06:27:09] <phatduckk> 2.2
[06:27:47] <phatduckk> stuck in state of STARTUP2
[06:30:14] <mrpoundsign> hmm you have a replication set? or master/slave?
[06:30:56] <phatduckk> replication
[06:31:10] <mrpoundsign> how many nodes?
[06:31:31] <phatduckk> 3
[06:31:38] <phatduckk> this one's a secondary
[06:31:42] <phatduckk> (obviously)
[06:31:58] <phatduckk> oh, looks like it got its shit together
[06:32:03] <phatduckk> took a while tho
[06:32:12] <mrpoundsign> ha!
[06:32:38] <phatduckk> last fucking thing i needed on a friday night
[06:32:40] <phatduckk> LOL
[06:33:00] <mrpoundsign> Well buddy, let's take a shot of something.
[06:33:04] <mrpoundsign> haha
[06:33:23] <phatduckk> seriously
[06:33:24] <phatduckk> http://cl.ly/image/1a0h1m1j3q10
[06:33:25] <phatduckk> LOL
[06:33:33] <phatduckk> io went fucking insane
[06:34:32] <mrpoundsign> nice hostname haha
[06:34:45] <phatduckk> root@jfab:~/server-automation/jfab (master) $ ./mongostatus -c prod
[06:34:45] <phatduckk> Replica Set: jeraffprod
[06:34:45] <phatduckk> Members:
[06:34:45] <phatduckk> 4 chainsaw.mongo.prod 192.168.255.186:27017 SECONDARY
[06:34:45] <phatduckk> 5 scream.mongo.prod 192.168.254.82:27017 PRIMARY
[06:34:45] <phatduckk> 6 nightmare.mongo.prod 192.168.254.189:27017 SECONDARY
[06:34:47] <phatduckk> lol
[06:34:56] <mrpoundsign> lol
[06:35:00] <mrpoundsign> was there a failover?
[06:35:15] <mrpoundsign> or was scream the primary before?
[06:35:57] <phatduckk> scream was primary
[06:36:10] <mrpoundsign> might have been just catching up with the oplog.
[06:36:12] <phatduckk> fuck, i only have rum
[06:36:17] <phatduckk> must have been
[06:36:35] <phatduckk> might as well change the other secondary now
[06:36:35] <phatduckk> lol
[06:38:46] <phatduckk> second one was instant recovery
[06:39:16] <phatduckk> alright mrpoundsign thanks a ton dude
[06:39:22] <phatduckk> i owe u a couple drinks
[06:39:24] <phatduckk> where u at?
[06:39:35] <phatduckk> any chance youre near SF?
[06:40:15] <mrpoundsign> haha. I am in Corning, CA
[06:40:33] <mrpoundsign> did it help at all? I still don't know if it's any better haha
[06:42:54] <phatduckk> neither do i
[06:42:55] <phatduckk> lol
[06:43:06] <mrpoundsign> HA!
[06:43:09] <phatduckk> i think im gonna take your advice and drink
[06:43:21] <mrpoundsign> I'll be here all weekend. in and out. so if you need anything lmk. haha
[06:43:23] <phatduckk> i was toying with the idea of using noatime anyways
[06:43:27] <phatduckk> ya ditto
[06:43:36] <phatduckk> what app are you building?
[06:43:47] <mrpoundsign> I work for sugarcrm
[06:43:57] <phatduckk> oh nice
[06:44:17] <mrpoundsign> we're making a system to do private deployments in the cloud for larger clients and multi-tenant systems for partners.
[06:44:27] <mrpoundsign> how about you?
[06:44:43] <phatduckk> im doing an iPhone app called Well
[06:44:55] <phatduckk> we just got featured a couple weeks ago and shit melted hard
[06:45:02] <mrpoundsign> when's the android app coming out? :P
[06:45:31] <mrpoundsign> oh hey sounds pretty cool. I am using Astrid right now.
[06:45:37] <phatduckk> true story - i wrote one a bit ago
[06:45:40] <phatduckk> BUT
[06:45:58] <phatduckk> 2 of my investors' admin's phones would OOM and the app was using 3MB of RAM
[06:46:00] <phatduckk> so i gave up
[06:46:01] <phatduckk> lol
[06:46:21] <mrpoundsign> haha!
[06:46:32] <phatduckk> how do u oom at 3mb?
[06:46:34] <phatduckk> ll
[06:46:36] <phatduckk> lol
[06:46:39] <phatduckk> so i said fuck it
[06:46:53] <phatduckk> probably gonna do some phonegap thing soon tho
[06:46:58] <phatduckk> just to tide us over
[06:47:03] <mrpoundsign> yeah
[06:47:40] <phatduckk> + i gotta figure out if im gonna keep mongo or not
[06:48:18] <phatduckk> gonna go to one of the office hours and see how it goes before i take switching DBs seriously tho
[06:48:53] <mrpoundsign> indeed.
[06:50:30] <phatduckk> OK poor man's benchmarks are in
[06:50:32] <phatduckk> ready?
[06:50:40] <mrpoundsign> sure haha
[06:51:22] <phatduckk> Timing cached reads: 17220 MB in 1.98 seconds = 8681.34 MB/sec
[06:51:22] <phatduckk> vs
[06:51:29] <phatduckk> Timing cached reads: 18482 MB in 1.98 seconds = 9326.03 MB/sec
[06:52:26] <mrpoundsign> hmm, not a huge difference. <10%
[06:53:53] <phatduckk> better than 0%
[06:54:06] <mrpoundsign> haha indeed.
[06:54:08] <phatduckk> actually
[06:54:18] <phatduckk> w/ O_DIRECT is a big diff
[06:54:25] <phatduckk> Timing O_DIRECT cached reads: 942 MB in 2.00 seconds = 471.07 MB/sec
[06:54:25] <phatduckk> vs
[06:54:29] <phatduckk> Timing O_DIRECT cached reads: 1640 MB in 2.00 seconds = 820.53 MB/sec
[06:54:37] <mrpoundsign> oh wow
[06:54:48] <mrpoundsign> not sure what O_DIRECT is but that's good news :)
[06:55:34] <phatduckk> "In many cases, this can produce results that appear much faster than the usual page cache method, giving
[06:55:34] <phatduckk> a better indication of raw device and driver performance."
[06:55:39] <phatduckk> its a read only test tho
[06:55:46] <mrpoundsign> yeah
[06:55:55] <mrpoundsign> how's mongostat looking?
[06:56:09] <phatduckk> probably fine
[06:56:12] <phatduckk> its midnight
[06:56:13] <phatduckk> lol
[06:56:19] <mrpoundsign> yeah
[06:56:54] <mrpoundsign> are they all remounted or just one of the slaves? lol
[06:57:01] <mrpoundsign> switch master. it's midnight. :P
[06:57:08] <mrpoundsign> haha jk
[06:57:08] <phatduckk> LOL
[06:57:14] <phatduckk> VPN just booted me. its a sign!
[06:57:22] <mrpoundsign> time for bed. :P
[06:57:38] <phatduckk> lol
[06:57:39] <phatduckk> yup
[06:57:49] <phatduckk> aight. im out for tonight.
[06:57:53] <phatduckk> thanks buddy
[06:57:57] <phatduckk> ttyl
[10:26:36] <aroman> is there an IRC channel for mongolab?
[14:05:48] <mongooo> hi. are ttl collections not available in 2.2 too?
[14:56:54] <bizzle> in the ensureIndex method on the java DBCollection, there is a bollean called "force", what does that mean?
[14:57:06] <bizzle> boolean* sorry
[15:27:53] <Antaranian> hi ladies
[15:29:34] <Antaranian> how can I retrieve docs with nested doc id ?
[15:30:14] <fg3> Antaranian, glad to see I'm not the only one with this problem
[15:30:47] <Antaranian> fg3: any luck ?
[15:30:57] <ron> huh? nested doc id?
[15:31:35] <fg3> Antaranian, no luck yet
[15:32:01] <Antaranian> ron: I have a property of a nested doc, which is known. now i want to find all docs containing an embedded doc with that attribute value
[15:32:54] <ron> db.collection.find({"embeddeddoc.docid":"whatever"})?
[15:34:53] <Antaranian> getting no result with that
[15:35:13] <ron> well, try pastebining a sample doc.
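Dot-notation is the right tool here, so an empty result usually means the path or the value's type does not match what is stored. A quick sanity check with made-up names:

    db.things.insert({ _id: 1, child: { ref: "abc" } })
    db.things.find({ "child.ref": "abc" })   // matches the document above
    // Common pitfall: if child.ref is stored as an ObjectId, querying with a
    // plain string returns nothing -- the query value's type must match too.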
[16:06:13] <fg3> ron, you there
[16:06:44] <fg3> need help with this question: http://pastebin.com/uwG9kNjq
[16:10:09] <fg3> oops posted that last question to wrong channel
[16:17:22] <ron> fg3: sorry, not familiar with the language in the sample.
[16:20:39] <fg3> ron, my fault posted to wrong channel
[16:23:19] <fg3> ron, correct me if I'm wrong -- it's not possible to edit documents if they have nested arrays of 2 or more levels, because the positional operator cannot handle it, correct?
[16:27:19] <ron> fg3: not sure, honestly. embedded docs within arrays and arrays within arrays are limited in mongo.
[16:27:40] <fg3> ron, thanks
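fg3 is essentially right for the MongoDB of this era: an update path may contain only one positional $, and it binds to the first matched array, so two nested array levels cannot both be addressed. A sketch with hypothetical names; the second form needs MongoDB 3.6+, which added arrayFilters for exactly this case:

    // One array level works: $ is the index of the matched outer element
    db.grid.update({ "rows.id": 5 }, { $set: { "rows.$.label": "x" } })

    // Two levels, MongoDB 3.6+ only: named filters pick the row and the cell
    db.grid.update(
        {},
        { $set: { "rows.$[r].cells.$[c].label": "x" } },
        { arrayFilters: [ { "r.id": 5 }, { "c.id": 7 } ] }
    )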
[16:33:41] <darklrd> hello, if I am using mongodb to log chat messages for multiple websites and say for each user of a particular website, I allocate a collection to store his messages, then I keep hitting the namespace limit
[16:33:59] <darklrd> how do I solve this problem? Any suggestion?
[16:53:49] <vsmatck> darklrd: I don't know if/how you want to query the data. But have you considered append only files?
[16:54:30] <darklrd> vsmatch, for querying I just need to extract latest say 50 msgs first
[16:54:48] <darklrd> vsmatch, and then later on keeping on repeating this procedure
[16:55:12] <darklrd> vsmatch, what do you mean by append only files?
[16:56:24] <darklrd> vsmatch, are you suggesting to use system files?
[16:56:50] <vsmatck> Yeah. Appending to files on the filesystem. But it doesn't sound like that fully meets your needs.
[16:57:33] <vsmatck> Redis would be perfect for keeping the last 50 messages in memory. Then you could keep a total history in append only files, or mongo.
[16:58:17] <darklrd> vsmatch, yes I can use append only files, I am using redis at the moment for recent messages, but I was looking to use mongo somehow
[16:58:40] <vsmatck> a
[16:58:42] <vsmatck> ah
[16:59:08] <darklrd> vsmatch, thank you so much for your response, so is there any way to solve this problem, a different implementation perhaps?
[16:59:25] <vsmatck> Thinking about it. :)
[16:59:52] <vak> hi all
[17:01:08] <darklrd> vsmatch, mongo will enable me to perform some basic query operations and allow me to set up multiple servers :)
[17:01:58] <darklrd> vsmatch, I thought mongo would be best for this kind of scenario
[17:02:22] <vak> for my directory per db storage I am getting something new for no clear reason: [initandlisten] exception in initAndListen: 14043 clear tmp files caught exception exception: boost::filesystem::is_directory: Permission denied: "/var/brain-storage/mongodb-storage-pdb/mybd"
[17:03:13] <vsmatck> I'm not sure how well it would work but you could put all chat logs in one collection. Then build a secondary index on username and post timestamp. A descending index.
[17:03:34] <vsmatck> You could shard on username (I think) *reads*.
[17:04:16] <ron> vsmatck: for a minute there I thought this is #redis ;)
[17:04:36] <vsmatck> ron: heyo! :)
[17:04:55] <ron> sup ;)
[17:05:02] <darklrd> all chat longs in one collection? :O
[17:05:15] <darklrd> *logs
[17:05:21] <vsmatck> You have to shard on the collection level.
[17:06:24] <darklrd> Hmm, but still I am apprehensive about storing everything in one collection :|
[17:10:10] <darklrd> vsmatch, then it effectively becomes append only file system ;)
[17:11:40] <darklrd> ron, can you suggest something?
[17:11:48] <ron> about what?
[17:11:52] <ron> eat less sugar.
[17:12:26] <darklrd> ron, lemme copy my ques
[17:12:42] <darklrd> ron, if I am using mongodb to log chat messages for multiple websites and say for each user of a particular website, I allocate a collection to store his messages, then I keep hitting the namespace limit
[17:13:07] <ron> you keep a collection per user? o_O
[17:13:26] <darklrd> ron, yeah :|
[17:13:32] <ron> okay. why?
[17:14:17] <darklrd> ron, because I need to pick messages specific to a user
[17:14:34] <darklrd> ron, could there be a better implementation?
[17:14:46] <ron> you could... query?
[17:15:11] <darklrd> ron, against all site messages?
[17:15:38] <ron> what if you needed to get specific messages to a specific user of a specific day? would you have a collection per user per day?
[17:16:10] <vsmatck> It's a btree. So you'd get there in log(<total number of messages>).
[17:16:39] <darklrd> Hmm, I see, I thought if I index timestamp
[17:16:52] <vsmatck> You shouldn't do SKIP in mongo. It's too slow.
[17:17:00] <vsmatck> But you can go directly to a date.
[17:17:07] <vsmatck> Like show all messages for last friday.
[17:17:10] <ron> right.
[17:17:14] <darklrd> I see
[17:17:20] <ron> but, you're losing me.
[17:17:32] <darklrd> :D
[17:18:00] <ron> just have a collection for the messages, and pour it all in. index on the fields you'd normally use to query.. and that's it.
[17:18:14] <ron> if you query many fields, index wisely.
[17:18:36] <darklrd> ok! :)
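Concretely, ron's single-collection layout might look like this; the collection and field names are illustrative, and ensureIndex was the era's spelling of today's createIndex:

    // Compound index: user first, then newest-first timestamp
    db.messages.ensureIndex({ user: 1, ts: -1 })

    // "Latest 50 messages for this user" walks that index directly
    db.messages.find({ user: "alice" }).sort({ ts: -1 }).limit(50)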
[17:20:16] <vsmatck> Oh, sounds like this is a lot of traffic too. You may want to divide websites in to different databases. Mongo has a write lock on the database level in the latest version.
[17:20:53] <darklrd> I see, yes I was planning to use different db per website
[17:20:54] <vsmatck> Unless the users for all websites are the same. In which case it seems like they'd need to be together *shrugs*.
[17:21:11] <darklrd> no, they are all different
[17:21:17] <darklrd> :)
[17:21:46] <darklrd> so it seems I need just one collection per DB
[17:22:42] <darklrd> initially, when I had started, I thought I was doing justice by making collection per user and using mongo wisely :) lol
[17:23:10] <vsmatck> I think it is possible to increase the number of collections mongo supports through a config option also.
[17:23:28] <vsmatck> I think it maxes out at 3 million. But I don't know the implications of going that big.
[17:23:37] <darklrd> yes, but then does it affect performance somehow?
[17:23:39] <darklrd> I see
[17:26:07] <darklrd> what beats me is the fact that if I store all messages in a single collection, and they are already increasing at an alarming rate, won't it become a problem later, or would sharding take care of this?
[17:27:53] <vsmatck> Partitioning data is the only way to increase write performance. Sharding accomplishes that.
[17:28:14] <ron> in MongoDB at least.
[17:28:19] <vsmatck> If you sharded on username it'd effectively uniformly (or fairly uniformly) spread writes across the servers.
[17:28:31] <darklrd> hmm
[17:29:15] <darklrd> it won't affect querying?
[17:29:50] <darklrd> read performance I mean
[17:29:58] <vsmatck> It does. Sharding limits querying in specific ways documented here. http://www.mongodb.org/display/DOCS/Sharding+Limits
[17:30:16] <vsmatck> It would increase read performance.
[17:31:09] <darklrd> vsmatch, ron, thank you so much, at least I am heading somewhere now
[17:31:18] <vsmatck> Well, as long as each of your queries was only getting routed to one server.
[17:31:33] <vsmatck> :)
[17:31:42] <darklrd> Yeah, then it would be okay
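A sketch of what sharding that collection on username looks like; the database and collection names are placeholders, and sharding has to be enabled on the database first:

    sh.enableSharding("chat")
    sh.shardCollection("chat.messages", { user: 1 })
    // Queries that include { user: ... } route to a single shard;
    // queries without the shard key are scatter-gathered across all shards.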
[17:32:37] <darklrd> Is there a limit on number of documents too that you can store in a single collection?
[17:33:51] <vsmatck> There are some limits you have to worry about. They're not really limits in mongo though.
[17:34:23] <darklrd> Open file limits and all?
[17:34:26] <vsmatck> For example if you want any type of decent performance your indexes should be in memory. Disks are so huge now relative to main memory.
[17:34:53] <darklrd> Yes, indexes should be in RAM
[17:35:16] <vsmatck> I was watching a talk by jeremy zawodny over at craigslist. He was talking about how they don't fill up their disks because they get to a point where their indexes start falling out of memory.
[17:35:42] <darklrd> so, that becomes the deciding factor then, I will search on net for that video, thank you again :)
[17:35:54] <vsmatck> I know it's on the 10gen website somewhere.
[17:36:17] <darklrd> sweet, now I am definitely heading in right direction :)
[17:36:31] <vsmatck> http://www.10gen.com/presentations/mongosf-2012/mongodb-at-craigslist-one-year-later
[17:36:54] <vsmatck> Seems like your problem may be similar to theirs. I doubt their indexes are complicated. They just have lots of documents.
[17:37:43] <darklrd> thank you so much! :D
[17:37:46] <vsmatck> Oh, also if you're planning on doing pagination of chat logs you can group in to pages. heh
[17:38:59] <vsmatck> Or chunks based on time range. Like you just keep dumping all logs for a particular hour in to one document.
[17:40:01] <darklrd> Hmm, yes
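vsmatck's hourly-bucket idea, sketched as an upsert; the names and the hour rounding are assumptions:

    // Append each message to the document for its (user, hour) bucket,
    // creating the bucket document on first use
    var hour = new Date()
    hour.setMinutes(0); hour.setSeconds(0); hour.setMilliseconds(0)
    db.chatBuckets.update(
        { user: "alice", hour: hour },
        { $push: { msgs: { ts: new Date(), text: "hi" } } },
        { upsert: true }
    )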
[18:23:00] <taf2> hey, i'm trying to use addToSet with timestamps and it's not working…
[18:23:19] <taf2> these are all the same: "tsv" : [ ISODate("2012-09-08T18:20:00.842Z"), ISODate("2012-09-08T18:20:00.985Z"), ISODate("2012-09-08T18:20:00.583Z") ]
[18:23:28] <taf2> and my update/upsert was this
[18:23:37] <taf2> '$addToSet': {tsv: params.ts}
[18:23:49] <taf2> where params.ts was a new Date(), that I rounded the seconds to 0
[18:24:07] <taf2> oh… maybe it's the Z field?
[18:24:21] <taf2> 842Z and 985Z being different
[18:25:00] <taf2> oh yeah, must set the millisecond
[18:26:27] <taf2> yep, needed to also set setMilliseconds(0)
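taf2's fix, spelled out: $addToSet compares dates down to the millisecond, so every component below the intended granularity must be zeroed. A sketch following his snippet; the collection name and _id are placeholders:

    var ts = new Date()
    ts.setSeconds(0)
    ts.setMilliseconds(0)   // without this, near-identical dates all get added
    db.visitors.update(
        { _id: "visitor-1" },
        { $addToSet: { tsv: ts } },
        { upsert: true }
    )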
[18:26:47] <taf2> wow, when i started out with mongodb i was really dumb about how i used it
[18:27:01] <taf2> learning to use upsert now
[18:27:08] <taf2> and actually setting my _id's to meaningful values…
[18:27:18] <taf2> anyways thanks for mongodb
[18:30:18] <taf2> oh.. question, so i have a sharded mongodb… on a collection visitors… using the default _id field… if i changed that to something a bit more meaningful to my domain… what do i need to consider in terms of the mongodb sharding?
[18:30:21] <taf2> if anything?
[21:48:02] <gigo1980> hi i have big local files in my collection what is going wrong ?
[21:51:20] <mrpoundsign> what do you mean by big local files?
[21:52:30] <mrpoundsign> gigo1980: might want to look at http://www.mongodb.org/display/DOCS/Excessive+Disk+Space
[22:18:04] <hallas> How can i insert a document to an embedded collection?
[22:18:15] <hallas> upsert of some kind?
[22:26:35] <gigo1980> mrpoundsign: on the filesystem of the set there are some local.* files, about 20 GB
[22:26:42] <MrHeavy> I'm trying to set up my first replica set and I'm having trouble following the rs.status() output
[22:26:53] <MrHeavy> stateStr is "RECOVERING" but errmsg is "initial sync couldn't connect to it-mongodb01:27017"
[22:27:09] <MrHeavy> Is that errmsg indicating a current or historical condition w.r.t replication?
[22:54:45] <Vile> hi guys! Would aggregation framework work with multiple different fields from array?
[22:56:15] <Vile> i.e. i have { a:[{b:"123"},{b:"456"}] } and want to get {result:["123","456"]}
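For Vile's shape the answer is yes: $unwind the array, then $group with $push. A sketch; in the 2.2-era shell the pipeline stages are passed as separate arguments rather than as an array:

    db.stuff.insert({ a: [ { b: "123" }, { b: "456" } ] })
    db.stuff.aggregate(
        { $unwind: "$a" },
        { $group: { _id: "$_id", result: { $push: "$a.b" } } }
    )
    // yields one document per original _id with result: [ "123", "456" ]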