PMXBOT Log file Viewer

#mongodb logs for Wednesday the 8th of August, 2012

[00:59:33] <vsmatck> acidjazz: Google for "mongodb journaling" click first link.
[01:08:54] <acidjazz> vsmatck: it points to the mongodb doc page which does not specify how to check if its on
[01:09:03] <acidjazz> vsmatck: ive clicked every link on the results page
[02:42:39] <warz> can you create an index on a property and put ObjectIds in it? ive got an index on a field, and am putting ObjectIds into it. my queries don't seem to be returning anything, though.
[02:42:53] <warz> and i've got the query params correct, it's just not returning results.
[02:43:10] <warz> (this would result in both an _id index, and another index)
[02:44:39] <warz> doh. im an idiot, nevermind.
[03:59:28] <icicled> is there a way to pipe input to mongorestore ?
[04:00:42] <skot> no, not really
[04:00:47] <skot> What are you trying to do?
[04:36:00] <circlicious> bro
[04:36:07] <circlicious> how do i know if i am running with journaling enabled or not?
[04:41:02] <circlicious> my mongod wont start, this is the error, what should i do, should i repair ? http://pastie.org/private/fstcecyi0solcpxbgoppta i can see the /data/db/journal along with files in it, dunno why it's happening, ugh
[04:49:25] <circlicious> good
[04:49:34] <circlicious> repair doesnt work, journal files arnt there, mongod wont start :|
[05:02:49] <skot> What version is it?
[05:03:00] <skot> Is it part of a replica set?
[05:06:19] <circlicious> dont think so, i dont use replication
[05:06:34] <circlicious> db version v2.0.0, pdfile version 4.5
[05:11:47] <circlicious> skot: can you help me please?
[05:15:45] <skot> What startup options are you using?
[05:16:20] <skot> by default journaling is on for 2.0.0 so you would have to turn it off.
[05:16:39] <skot> Do you have the logs from last time it was shutdown, before the current error message?
[05:17:01] <circlicious> not sure
[05:17:06] <circlicious> i removed mongod.lock
[05:17:23] <circlicious> now whenever i start it, it preallocates journal files, EVERYTIME
[05:17:41] <circlicious> and doesnt start with service mongodb start :/
[05:18:56] <circlicious> quite frustrating
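
A quick way to answer the journaling question raised by acidjazz and circlicious above: from the mongo shell you can inspect the options mongod was started with and look at the durability section of serverStatus(), which (on a 2.0-era server) is only present while journaling is active. A minimal sketch:

    // show the options mongod was actually started with
    db.adminCommand({ getCmdLineOpts: 1 });

    // the "dur" section only exists while journaling is enabled
    printjson(db.serverStatus().dur);   // undefined => journaling is off
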
[07:10:49] <lizzin> so i have a db full of collections that consist of info on various businesses including location (in lat, long), name, address, type of business and a few other misc info
[07:12:09] <lizzin> then the foursquare checkin provides my app with a lat and a long. but this lat and long rarely ever match up 100% with the data in the db. kind of expected...
[07:12:28] <lizzin> but what is a good way to verify that the two locations are the same?
[07:12:35] <lizzin> with mongodb
[07:13:30] <lizzin> the best i can come up with is to take the foursquare lat+long then do a near query on the db and then do some sort of funky 'address/name' matching
[07:14:27] <lizzin> what is a better way to do this
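
A common approach to lizzin's problem is a 2d geospatial index plus a $near query bounded by $maxDistance, then fuzzy name/address matching on the handful of candidates returned. A rough sketch, assuming a hypothetical businesses collection with a loc: [lng, lat] field and checkinLng/checkinLat taken from the foursquare checkin:

    // legacy 2d index over [longitude, latitude] pairs
    db.businesses.ensureIndex({ loc: "2d" });

    // candidates within roughly 100m of the checkin
    // ($maxDistance for a 2d index is in degrees; ~0.001 deg is about 110m)
    db.businesses.find({
        loc: { $near: [checkinLng, checkinLat], $maxDistance: 0.001 }
    }).limit(10);
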
[07:44:14] <[AD]Turbo> hola
[08:51:05] <mbuf> is there a tutorial that I can follow to write unit, functional tests for Rails with mongodb?
[08:56:38] <remonvv> unit tests with mongodb are going to be tricky ;)
[08:56:49] <remonvv> integration tests are a bit of a pain too
[08:58:11] <mbuf> remonvv: I see
[08:58:24] <remonvv> the problem is that there is no way to mock or embed MongoDB
[08:58:24] <mbuf> remonvv: so how do people write tests?
[08:58:41] <remonvv> there is no in-memory version and I'm not aware of any language that has a full MongoDB mock implementation.
[08:58:48] <mbuf> remonvv: I see
[08:59:05] <remonvv> With difficulty. Our integration tests run mongod instances, load test data, run the test, and kill the instance.
[08:59:07] <mbuf> remonvv: or, is there a way to keep a check on the code?
[08:59:15] <remonvv> It's slow, but it works.
[08:59:22] <mbuf> remonvv: do you have some documentation that I can refer for the same?
[09:00:16] <remonvv> Not really, certainly not for rails.
[09:00:39] <mbuf> remonvv: okay
[09:00:42] <remonvv> In Java we use basic JUnit for our tests where each test case starts a new mongod instance.
[09:00:48] <remonvv> I suppose you'd have to do the same,.
[09:01:06] <remonvv> It's a pain. 10gen needs to spend time on "testability" of MongoDB ;)
[09:01:51] <remonvv> Or someone else really. We need mock implementation of at least the basic subset of MongoDB functionality.
[09:10:53] <redwing> Has anyone run mongodb on sparc64 before?
[09:18:18] <remonvv> Those still exist? It won't build. MongoDB is little endian only for now.
[09:20:25] <_johnny> they're always making fun of the little indian guy :(
[09:21:08] <ron> last week we discovered that a guy from work things that indie games are made... in india.
[09:21:58] <NodeX> lmao
[09:21:59] <_johnny> :)
[09:22:41] <ron> obviosuly /s/things/thinks
[09:22:51] <ron> wtf is wrong with me? obviously!
[09:23:02] <NodeX> *where do we start*
[09:23:05] <_johnny> s/obvio/seriou
[09:23:38] <_johnny> that runs default in my irssi. makes discussions far less .. serious
[09:24:53] <redwing> hmm... isn't the SparcV9 bi-endian ?
[09:26:32] <remonvv> All Sparcs are big-endian iirc, hence MongoDB not compiling on them for the time being ;)
[09:26:36] <NodeX> I think that's a personal lifestyle choice and we should judge V9
[09:26:50] <ron> shouldn't.
[09:27:02] <ron> and I wholeheartedly agree.
[09:27:03] <NodeX> err yup lol
[09:27:56] <remonvv> ron, do something useful here for a change
[09:28:04] <remonvv> commence..
[09:28:22] <ron> remonvv: I _knew_ that was your fantasy!
[09:28:28] <ron> I'm not some piece of meat!
[09:28:38] <ron> you are not the boss of me! I am the boss of me!
[09:29:54] <remonvv> I think we both know that that's not true.
[09:30:08] <remonvv> Anyone ever had issues with indexes magically disappearing after removeShard?
[09:30:37] <remonvv> or actually..removeShard -> addShard
[09:30:37] <ron> niemeyer: hello, and welcome to #mongodb. we hope you enjoy your stay. if you have any questions, please feel free to address them to the general public in the channel. please come again.
[09:31:01] <NodeX> ron is now the doorman?
[09:31:17] <ron> I'm the little indian.
[09:31:26] <kali> another fantasy of yours ?
[09:31:32] <remonvv> No I think business tends to be better if the doorman is attractive
[09:31:37] <remonvv> Visually, I mean.
[09:31:50] <NodeX> get him a tiarra
[09:31:50] <remonvv> As in not horrible looking.
[09:31:54] <NodeX> tiara *
[09:31:58] <remonvv> As in not an eyesore
[09:32:11] <remonvv> You know, because ron isn't pretty
[09:32:20] <remonvv> Just in case someone missed the implication.
[09:32:25] <NodeX> hahaah
[09:32:31] <NodeX> I don't get it?
[09:32:37] <NodeX> can you be more specific ?
[09:32:56] <ron> remonvv: you're just jealous.
[09:38:12] <remonvv> NodeX, you never get it :(
[09:39:36] <ron> remonvv: that's why he uses PHP.
[09:48:11] <NodeX> lol
[09:50:20] <NodeX> and yet my apps/sites are top 1% of speed in the world, imagine what would happen if I wrote in a fast language :P
[09:51:17] <_johnny> NodeX: out of curiocity, do you use php -> bytecode at all in any of your projects?
[09:51:28] <_johnny> curiousity*
[09:56:11] <NodeX> what do yo mean by bytecode?
[09:56:17] <NodeX> compiled using hiphop or somehting?
[09:57:47] <_johnny> yeah
[09:58:02] <_johnny> i'm out of the php loop, so not sure what's the latest, but yeah, that's what i meant
[09:58:47] <NodeX> on some sites where i need absolute speed yes
[10:00:07] <NodeX> my bottlenecks are -almost- always network card / bandwidth so it's not always needed
[10:00:29] <_johnny> i agree. or db
[10:00:55] <ron> NodeX: that's because nobody uses your sites.
[10:00:57] <_johnny> at least in my experience. sure languages will vary, but it's not the most speed-influential thing :P
[10:01:00] <NodeX> ron : LOL
[10:01:03] <_johnny> ouch, burn
[10:01:05] <_johnny> :p
[10:01:43] <NodeX> BAD php will kill a site
[10:01:50] <NodeX> same as BAD java or BAD python
[10:02:01] <_johnny> exactly
[10:02:17] <ron> of course.
[10:02:18] <NodeX> doesn't make every Java/Python coder bad coders
[10:02:28] <NodeX> also doens't make the language's bad
[10:02:32] <NodeX> doesn't
[10:02:57] <ron> unfortunately, there's no such thing as GOOD php :)
[10:03:00] <_johnny> except php
[10:03:04] <_johnny> i kid
[10:05:20] <NodeX> *yawn*
[10:05:26] <ron> heh :)
[10:10:13] <_johnny> ron: i do, however, think facebook disagrees :)
[10:10:29] <ron> _johnny: facebook doesn't use php.
[10:10:41] <redwing_> it abuses it?
[10:11:01] <_johnny> well, not at the end, but wasn't the whole point of hiphop to use it before compilation?
[10:11:21] <ron> it's because they thought they're smart.
[10:11:26] <ron> they're not.
[10:11:50] <_johnny> actually it was more of a timebubble thing
[10:12:02] <ron> well, lunchtime!
[10:12:24] <_johnny> php was big back then, and rewriting everything just because people prefer ruby now, and then to scala in a few years because whatever, isn't "smart" :P
[10:12:39] <_johnny> which is why you have thrift
[10:13:48] <ron> yes. thrift was the best invention of them all.
[10:14:09] <ron> seriously though, lunchtime.
[10:14:13] <_johnny> :)
[10:22:17] <Derick> PHP is still big :-þ
[10:31:07] <NodeX> it's the biggest language or has the largest share iirc
[10:31:17] <NodeX> somehting like 74% of websites run php
[10:43:35] <Bartzy> Hey
[10:43:45] <Bartzy> Journal isn't written to disk every time it's written to ?
[10:46:17] <Derick> Bartzy: no, not guaranteed to be on disk *every time*
[10:46:35] <Derick> however, it syncs to disk a lot more often than the normal data files
[10:46:37] <Bartzy> so how journaling makes mongo safe ?
[10:46:56] <Bartzy> if the server crashed before the journal synced to disk - it is in dirty state ?
[10:47:00] <Derick> it makes it safer
[10:47:00] <Bartzy> or can be ?
[10:47:08] <Bartzy> how come ?
[10:47:25] <Derick> it's safer than having to wait the minute for a data file sync
[10:47:46] <Derick> the journal is synced every 100ms IIRC
[10:47:59] <Derick> (and it's a lot faster to sync than the data files)
[10:49:33] <Bartzy> Why it's a lot faster to sync ?
[10:49:49] <Bartzy> And still, if it the data files are corrupted during a fail - why the journal helps ?
[10:49:55] <Derick> the files are smaller, and it's append only, so that just makes it a lot faster operation
[10:50:14] <Derick> the data files can't get corrupted, just missing information
[10:50:22] <Derick> which could be readded from the journal
[11:10:39] <Bartzy> Derick: But their state is unknown
[11:10:43] <Bartzy> both for the data files and the journal files
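
For context on the journal discussion above: with a 2.0-era server an individual write can be made to wait for the next journal commit (rather than only the in-memory write) by passing j:true to getLastError; the commit interval itself is a startup option (--journalCommitInterval, around 100ms by default). A minimal shell sketch with a placeholder collection:

    db.things.insert({ _id: 1, payload: "important" });
    // block until the write has been group-committed to the journal on disk
    db.runCommand({ getLastError: 1, j: true });
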
[11:14:55] <Bartzy> Also - What is the working set ?
[11:15:15] <Bartzy> I read an example - if you have a million users, but only half are active, then only that half's data is your working set
[11:15:21] <Bartzy> active = ? Once a day? 10 times a second ?
[11:15:30] <NodeX> LRU probably
[11:15:36] <NodeX> or the opposite of LRU
[11:16:03] <Bartzy> NodeX: Can you explain ? :)
[11:16:07] <Bartzy> What does it have to do with LRU ?
[11:16:35] <Bartzy> working set is different in time. The working set per second is very different than the working set per minute
[11:19:17] <cmex> hi all
[11:19:44] <remonvv> Bartzy, working set is your hot data. Hot data in the MongoDB sense is data that is in physical memory due to the OS swapping it in there based on frequent access/MRU.
[11:19:58] <cmex> i have a question
[11:20:09] <NodeX> ^
[11:20:38] <cmex> someone is working with c# driver here?
[11:21:08] <cmex> hello?
[11:21:10] <remonvv> Bartzy, the working set isn't really defined per specific timespan. It's all the data MongoDB can keep into physical memory. That's why things like right balancing for large indexes is as important as limiting the amount of hot data you have at any point in time (if possible, which it isn't always)
[11:21:40] <Bartzy> How can I know how much RAM do I need then
[11:21:44] <remonvv> cmex, take a chill pill dude :) C# is not as common as other options. If someone is working with it and is reading this I'm sure they'll respond.
[11:22:03] <Bartzy> Ideally it's working set + indexes in RAM. But what is the working set. If it's the data mongodb can keep into RAM, then I can never know how much RAM I need :)
[11:22:47] <remonvv> Bartzy, hard to say. Easy answer is as much as you can afford, the slightly more complicated answer involves testing or estimating your data sizes and which chunk of it will be hot at any point in time.
[11:23:06] <remonvv> My strategy tends to involve worst case scenario tests with real data.
[11:23:23] <S7> Hi, i'm coming from the sql world, we're usually creating a stored procedure and then calling it from the code, is it a good practise creating a stored javascript and calling from the code or it's best to avoid?
[11:23:58] <remonvv> There's a reason a lot of developers/companies lean towards "try it and see" ways of determining hardware costs/utilization. It's becoming increasingly hard to predict what kind of resources deliver what kind of performance.
[11:24:06] <remonvv> S7, bad idea.
[11:24:45] <cmex> remonvv because of performance or othe reason
[11:24:46] <remonvv> S7, anything JavaScript is a bad idea for anything where performance is an issue so if you'd normally write a stored procedure (performance optimalisation) then in MongoDB you should go for a native query.
[11:25:15] <S7> tnx remonvv =)
[11:25:16] <remonvv> cmex, not sure. Just not that many C# developers here. Java, PHP, Python, C and Ruby are most common.
[11:25:24] <remonvv> S7, np
[11:25:41] <kali> S7: avoid server side javascript at all cost
[11:26:17] <Bartzy> remonvv: We can afford a 512GB RAM server. Doesn't mean we want to waste money :|
[11:26:28] <Bartzy> So I guess my question is how do I estimate when I need more RAM
[11:26:33] <Bartzy> and what are my initial RAM requirements.
[11:26:35] <cmex> remonvv a question about performance it was with question of S7
[11:26:42] <Bartzy> Index size is easy to know.
[11:27:16] <Bartzy> even a rough estimation
[11:29:10] <NodeX> when your app slows down :P
[11:29:59] <remonvv> Bartzy, I'm not saying get the biggest server you can find. I'm just saying create a realistic test and run it on different hardware. Performance profiles/bottlenecks are pretty straightforward for MongoDB.
[11:30:27] <remonvv> Bartzy, you need more RAM if your pagefaults/second are non-zero/high continuously.
[11:30:33] <gigo1980> hi all, i have a map reduce function. in the map function can i create a dynamic array with hashes as keys? is this possible?
[11:30:41] <Bartzy> remonvv: Right, but still, there are recommendations everywhere on the web to get as much RAM as the index size + working set. And I didn't find a clue about what a working set really is.
[11:30:41] <remonvv> But please note, even querying data on disk is relatively fast on MongoDB.
[11:30:55] <remonvv> A lot of people that ask these things tend to find out they can do what they need on relatively little hardware.
[11:31:31] <remonvv> Working set is hard to define
[11:32:09] <remonvv> Well, it's easy to define, hard to estimate its size ;)
[11:33:13] <remonvv> Basically think of it like this; for queries that are important for your application speed as much *AS POSSIBLE* should be in memory. Most application we make at my company tend to have working sets of only a few Gb even though we serve over 100,000 concurrent users on a frequent basis.
[11:34:03] <remonvv> So look at your software, evaluate which queries are executed frequently and go from there.
[11:34:19] <remonvv> I'm assuming this is so relevant for you because you intend to buy hardware rather than rent it?
[11:34:39] <Bartzy> remonvv: No, we already rented the hardware
[11:34:57] <Bartzy> But I just reread some of the stuff in MongoDB in action and was curious, because it's so unintuitive.
[11:35:04] <Bartzy> (the working set estimation)
[11:35:48] <remonvv> It's not. It's roughly similar to any other database. What makes it slightly less intuitive is that MongoDB's storage engine is built directly on top of OS mapped memory functionality.
[11:35:59] <remonvv> Rather than hard configuring query buffer and cache sizes.
[11:36:15] <Bartzy> Well, I know the queries that are executed frequently. Those are for active users. But what happens when a active user that didn't visit the app, visits it now and needs his/her data shown. It's not in RAM, so they have a bad experience. If they come back soon, it may be in RAM.... Very hard to know
[11:36:32] <Bartzy> remonvv: Of course, didn't say this is specfiic to MongoDB
[11:36:39] <Bartzy> It's a vague term :)
[11:36:42] <remonvv> Well, keep in mind we're usually talking about the difference between 0.01ms and 1ms
[11:36:52] <remonvv> Especially if the backing IO solution is any good.
[11:37:00] <Bartzy> I don't think so.
[11:37:01] <kali> it's more like 10ms it is has to come from disk
[11:37:04] <kali> +if
[11:37:25] <remonvv> Nah, our average from disk response time for single doc queries is sub 1ms
[11:37:26] <Bartzy> We have SSDs as the storage backend, in RAID 10, and queries not from RAM takes 500ms+
[11:37:39] <Bartzy> remonvv: How do you know average?
[11:37:43] <Bartzy> How do you know if you hit disk or not
[11:37:43] <remonvv> Bartzy, that seems incredibly unlikely.
[11:37:51] <remonvv> cold tests
[11:37:53] <kali> remonvv: with magnetic disks ?
[11:37:57] <remonvv> no, SSD
[11:37:59] <Bartzy> remonvv: Well, 100 million docs
[11:38:00] <kali> ha :)
[11:38:05] <Bartzy> remonvv: What are cold tests
[11:38:24] <NodeX> why dont you save a headache and get smaller servers and shard?
[11:38:25] <Bartzy> BTW - there is no query result cache at all in MongoDB ?
[11:38:31] <Bartzy> i.e. I run my query twice - it happens twice ?
[11:38:46] <Bartzy> NodeX: I don't know if sharding is easier than getting big servers.
[11:38:47] <remonvv> Bartzy, that isn't really relevant. The time added for a pagefault is the diskswap. Are you sure you're testing single doc queries and that your indexes are hitting?
[11:38:59] <remonvv> 500ms is very slow regardless for a lookup on an indexed field.
[11:39:09] <Bartzy> remonvv: No, these are multiple docs
[11:39:18] <remonvv> Bartzy, ah :) How many?
[11:39:25] <Bartzy> remonvv: ~1500 nscanned
[11:39:31] <Bartzy> sometime more
[11:39:46] <Bartzy> I can test now for not-in-RAM single doc query. How do I clean the cache? Only by restart ?
[11:39:52] <remonvv> Okay, yeah that can happen in that case.
[11:39:58] <remonvv> Yep, restart.
[11:40:01] <Bartzy> damn
[11:40:02] <Bartzy> :)
[11:40:09] <Bartzy> no such thing as echo 3 > /sys/something? :)
[11:40:15] <Bartzy> I remember something like that
[11:40:26] <remonvv> It's not a super useful test anyway. Just do a smoke test on your hardware and see what's bottlenecking
[11:40:29] <kali> Bartzy: sync && echo 1 > /proc/sys/vm/drop_caches
[11:40:30] <remonvv> You know about mongostat?
[11:40:35] <Bartzy> smoke test ?
[11:40:37] <kali> or 3, yes
[11:40:45] <Bartzy> remonvv: Yeah, but what does it give me in terms of bottlenecks ?
[11:41:31] <remonvv> well, high pagefaults means it's swappig a lot, in/out stats give you network bottlenecks if any, and lock % is extremely relevant to overall query performance
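
The counters mongostat summarizes can also be spot-checked from the shell via serverStatus(); field layout varies a little between 2.0 and 2.2, but roughly:

    var s = db.serverStatus();
    print(s.extra_info.page_faults);       // cumulative page faults (Linux only)
    printjson(s.globalLock.currentQueue);  // readers/writers queued on the lock
    printjson(s.network);                  // bytesIn / bytesOut since startup
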
[11:41:49] <Bartzy> wow that was fast.
[11:42:02] <Bartzy> now I don't know an ObjectId to query :D
[11:42:29] <remonvv> that's rather relevant ;)
[11:42:53] <Bartzy> ok so now I do db.things.find({_id: one-id-here}).explain and see the ms ?
[11:43:13] <remonvv> yep
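
A sketch of the cold-vs-warm test being described, where someId is a placeholder for an _id fetched beforehand:

    // run once right after dropping caches, once again warm, and compare "millis"
    db.things.find({ _id: someId }).explain();
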
[11:43:25] <Derick> restart of mongodb doesn't really help either... you need to reboot the machine really :-)
[11:43:37] <Bartzy> mongod still has 72G mapped in htop. Irrelevant ?
[11:43:50] <kali> Derick: /proc/sys/vm/drop_caches should work, i think
[11:43:50] <Bartzy> Derick: But the command kali provided did clear the cache.
[11:43:59] <Derick> ok
[11:44:02] <Derick> neat trick
[11:44:03] <Bartzy> but now indexes are not in RAM :|
[11:44:07] <Bartzy> so that's an issue :p
[11:44:42] <Bartzy> remonvv: How did you do that one-doc test after restart - if the index is not in RAM ? :p
[11:44:50] <Bartzy> 0ms
[11:44:58] <remonvv> I don't want the index in ram
[11:45:20] <remonvv> Worst case scenario is nothing in ram, and the index pages are swapped in and out just like normal data.
[11:45:39] <Bartzy> when I do that one-doc search - it means the entire _id index is getting into RAM ? Or only part of it ?
[11:45:40] <remonvv> Yeah it's 0ms according to explain from SSD backed volumes, but it's slightly more
[11:45:49] <remonvv> its int math rather than fractional ;)
[11:46:02] <remonvv> do the test by running mongostat
[11:46:05] <remonvv> side by side
[11:46:10] <remonvv> you should see one or two pagefaults
[11:47:01] <remonvv> followed by quite a few more
[11:47:34] <remonvv> Typically I see 1 or 2, and I think on sharded environment followed by a swap in of an entire chunk or something.
[11:47:49] <remonvv> Not exactly sure if MongoDB is prefetching something or if it actually needs 40-50 pages.
[11:48:14] <Bartzy> but how come mongo doesn't get the entire index ?
[11:48:21] <NodeX> Bartzy : are you using a PHP framework for your app ?
[11:48:22] <remonvv> Why would it? An index can be 100Gb
[11:48:24] <Bartzy> and how does it know where the index is in the data files ? :o
[11:48:28] <Bartzy> NodeX: No, why ?
[11:48:51] <Bartzy> remonvv: Because how else can it know where in the btree it should look for - it will need an index for the index :p
[11:49:00] <NodeX> I was going to comment that all the performance you gain with micromanaging things you will lose in a framework
[11:49:27] <Derick> the indexes are btrees, so only the relevant parts of the btree have to be pulled into memory
[11:49:30] <Derick> and yes, there is an "indx" to teoo mongodb where everything is on disk
[11:49:43] <remonvv> Bartzy, that's not quite how it works though is it ;) It knows where the index data is and is addressing that data in the virtualized memory. The OS will swap that page into physical memory if it isn't already (and is hot enough).
[11:49:45] <Bartzy> NodeX: I'm just learning that way about mongo. Of course I don't really care if a document takes 4ms to fetch or 0.1ms.
[11:50:25] <Bartzy> Derick: But how does it know where are the relevant parts ?
[11:50:31] <Bartzy> of the btree
[11:50:44] <remonvv> Bartzy, it's more an OS MRU paging thing than a MongoDB thing. MongoDB uses memory mapped files. The data files (data and indexes) are addressed through that and hot pages are swapped in based on the OS memory management.
[11:51:03] <Derick> Bartzy: it's part of the btree structure
[11:51:09] <Bartzy> right - but how MongoDB knows what pages to ask for
[11:51:11] <remonvv> Bartzy, every collection maintains metadata concerning the index b-trees.
[11:51:20] <remonvv> MongoDB isn't asking for pages.
[11:51:23] <remonvv> the OS is.
[11:51:28] <NodeX> lolol#
[11:51:46] <Bartzy> And what is in that metadata? where in the data files are parts of the indexes ?
[11:51:49] <NodeX> mongo is asking for this mapped file and the OS is doing the work
[11:52:10] <Bartzy> If it's asking for a mapped file only then the OS would load the entire thing into memory
[11:52:15] <Bartzy> so I guess it asks for a specific part of a mapped file ?
[11:52:44] <remonvv> No, it doesn't care.
[11:52:45] <NodeX> that assumes the OS maps entire indexes to 1 file no?
[11:52:58] <remonvv> It simply says "load memory from address X to X+S"
[11:53:25] <remonvv> OS figures out if that range of the memory mapped file is in memory and if it chooses to (usually based on an MRU scheme) it swaps those file pages to physical memory.
[11:53:41] <remonvv> MongoDB knows where to look for what because it maintains index metadata per collection.
[11:53:41] <kali> or rather "ensure address X to X+S is loaded"
[11:53:48] <remonvv> kali, right
[11:53:50] <remonvv> well
[11:54:15] <remonvv> not really, it doesn't ensure it's loaded, not all OSs swap directly at the first read if physical memory is highly contended
[11:55:01] <Bartzy> And Mongo knows to ask for Address X to X+S because it has this metadata about all addresses ?
[11:55:05] <remonvv> right
[11:55:18] <Bartzy> And mongo loads that metadata from disk at start up or something ?
[11:55:40] <remonvv> it knows where the b-tree for that collection is located based on some metadata (I think .ns files right Derrick?) and it simply addresses that space.
[11:55:47] <NodeX> [12:49:19] <@Derick> and yes, there is an "indx" to teoo mongodb where everything is on disk
[11:55:52] <remonvv> I *think* it loads it when it needs to
[11:56:09] <NodeX> I think translated that means "there is an index to tell mongod where everything is
[11:56:16] <remonvv> But b-tree offsets might be the exception. It would be a good reason why there's a 14k collection limit.
[11:56:31] <remonvv> There is, I think it's the .ns (namespace) file.
[11:56:46] <remonvv> Derrick can tell us. He has the 10gen stamp of approval.
[11:56:50] <remonvv> :)
[11:56:54] <Bartzy> :)
[11:56:58] <Bartzy> Thanks for all the info
[11:57:06] <kali> as for the rationale behind it, there is a very good article there: https://www.varnish-cache.org/trac/wiki/ArchitectNotes
[11:57:14] <kali> it's about varnish, but the same apply to mongodb
[11:58:03] <remonvv> Bartzy, also, if you're worried about performance start small but enable sharding from the start.
[11:58:25] <remonvv> Very easy to scale up as needed and sharding scales up memory along with all other hardware resources.
[11:58:48] <Bartzy> Yeah it was not our first thought and now we're already deep into "scale vertically as much as possible before sharding"
[11:58:49] <Bartzy> :|
[11:58:51] <NodeX> [12:38:12] <NodeX> why dont you save a headache and get smaller servers and shard?
[11:58:56] <NodeX> 20 mins ago!
[11:59:01] <kali> NodeX :)
[11:59:06] <NodeX> lmfao
[12:01:10] <kali> i also like to have an easy "scale up" solution: i'm not using the biggest server available (on aws). if i find myself cornered, i can scale up easily and then start to find out what's wrong
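
For reference, enabling sharding up front (as remonvv and NodeX suggest) is a couple of admin commands once a cluster with a mongos is in place; the database, collection, and shard key below are only illustrative:

    // run against a mongos
    db.adminCommand({ enableSharding: "mydb" });
    db.adminCommand({ shardCollection: "mydb.things", key: { user_id: 1 } });
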
[12:01:33] <cmex> noone is using c# driver here? :(
[12:01:46] <NodeX> people in here have brains cmex :P
[12:01:54] <NodeX> best to ask in the google group
[12:02:21] <cmex> i dont want to start a holy war, just asking a question NodeX
[12:02:35] <NodeX> and I am just saying. Best to ask in the google group
[12:02:57] <NodeX> not alot of people in here do use it
[12:39:55] <remonvv> kali, we use that strategy too. The only concern is that 10 big nodes are much more reliable than 80 small nodes.
[12:40:10] <remonvv> So there is some guesstimation needed to know what to start with.
[12:41:34] <remonvv> cmex, ignore the haters! :) C# is pretty nice. Just not hugely popular around here.
[12:55:27] <gigo19801> what is the best way to format the system
[12:55:59] <gigo19801> sorry, is there a way to get a custom format of the DateTime inside mongodb
[12:59:17] <NodeX> that's an appside problem
[13:13:55] <PDani> hi
[13:16:13] <PDani> i have a single-instance mongodb with a collection with 3 fields: _id, block_id, payload. payloads are always 4096byte binaries. _id is an ever-incremented unique integer. there is a secondary index on the collection, { "v" : 1, "key" : { "block_id" : 1, "_id" : -1 }, "ns" : "mongobd.testdev", "name" : "_block_id_id" }
[13:18:15] <PDani> i'm doing many queries like: query: { query: { block_id: 868413 }, orderby: { _id: -1 } } ntoreturn:1 nscanned:1 nreturned:1 reslen:4166 163ms, there's no other query during these. when i read sequentially by block_id, it's 10 times faster than when i query with random block_id
[13:19:53] <PDani> i have low cpu usage, low storage utilization. the collection is 2-3 times bigger than the memory size. I don't know what can be the bottleneck?
[13:30:57] <remonvv> PDane, all else being equal you're looking at the performance difference between reading from a virtualized page that is in physical memory and one that is not. Sequential data is likely to be in memory while randomly accessed data typically is not.
[13:31:30] <remonvv> run mongostat and see if faults/sec rises when you switch to random reads compared to sequential ones.
[13:31:50] <remonvv> Also, share your indexes
[13:31:54] <remonvv> pastie.org
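
PDani's access pattern (latest payload for a given block_id) is what the compound index { block_id: 1, _id: -1 } is for; a quick way to confirm that both the match and the sort are being answered from that index is to explain the query:

    db.testdev.find({ block_id: 868413 })
              .sort({ _id: -1 })
              .limit(1)
              .explain();
    // expect a BtreeCursor over { block_id: 1, _id: -1 } with nscanned close to 1
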
[14:08:56] <gigo19801> how can i make sub one datetime from an other datetime ?
[14:11:33] <remonvv> App side, yes
[14:12:15] <remonvv> Or through the AF if you're 2.2+, I think there are some date functions
[14:30:08] <zakg> i am encountering an error importing pymongo
[14:31:19] <ranman> zakg: are you sure you're using the right python and that you're installing pymongo to the correct version of python?
[14:31:56] <zakg> yes.i m using django-mongodb
[14:34:25] <zakg> ranman,http://dpaste.com/783266/
[14:35:20] <zakg> i am using python version 2.7.2 and pymongo 2.1.1
[14:37:49] <zakg> objectid stays in bson not in pymongo
[15:01:40] <Bilge> If I have a LOT of transaction records should I store precalculated totals in another collection?
[15:02:04] <Bilge> Or must I always calculate aggregate values using queries
[15:03:04] <Bilge> I'm assuming there would be a significant performance difference that would make maintaining stored totals more efficient even though they could be vulnerable to going out of sync
[15:04:55] <PDani> I have as much page faults as queries, and on pagefaults, mongodb seems to read more data than needed (aggregated size of read documents is 10 times smaller than actual data read from disk)
[15:05:20] <PDani> is there some readahead, or whole-page-read in mongodb?
[15:05:47] <Derick> a pagefault will always read a whole page
[15:06:11] <PDani> and how can i set the pagesize to a smaller value?
[15:06:17] <Derick> you can't
[15:06:25] <Derick> it's an OS/hardware thing
[15:07:01] <PDani> it means, that random reads will always have a big overhead on ~5000byte documents?
[15:07:15] <linsys> The PAGESIZE is set at kernel compile time.
[15:07:16] <linsys> Furthermore, that selection is only valid for i386 hardware. If you run
[15:07:16] <linsys> a 64-bit system or any other architecture, the page size is 4K and
[15:07:16] <linsys> cannot be changed.
[15:07:31] <PDani> oh
[15:07:35] <PDani> it's 4k then
[15:08:03] <Derick> yes
[15:08:06] <PDani> it doesn't explain the 10 times bigger read speed on storage
[15:08:13] <PDani> hm
[15:08:16] <ranman> zakg: That seems like an error in django_mongodb_engine
[15:08:56] <PDani> one pagefault causes one pageread, right?
[15:09:32] <ranman> zakg: you can do pip install pymongo==1.7 and that will probably make it work
[15:09:45] <Derick> PDani: the index also needs to be read
[15:10:05] <PDani> Derick, there's no index miss on mongostat
[15:10:36] <Derick> doesn't mean the index shouldn't be read from disk if it's not in memory
[15:11:20] <PDani> my index size is 300MB, and the instance have 3.7G memory, mongo seems to be using only 2G of it
[15:11:31] <PDani> i think it should fit into memory
[15:12:18] <linsys> Derick: index miss means the index wasn't in memory when it was asked for or referenced.
[15:12:47] <PDani> so index fits into memory in my case
[15:12:56] <PDani> because there's no index miss
[15:12:59] <remonvv> It wont be unless it is hit by MongoDB
[15:13:15] <remonvv> it wont load it into memory, it will end up in memory if needed.
[15:13:22] <remonvv> There is a subtle difference
[15:13:22] <linsys> right..
[15:14:51] <remonvv> and yes, index misses is similar to faults/sec in that it tells you how often a b-tree node is accessed that isn't in physical memory
[15:15:08] <remonvv> unfortunately percentage isn't the most useful statistic ever but hey ;)
[15:15:08] <PDani> is there any readahead in mongodb when i do a query like { $query: { block_id: 685233 }, $orderby: { _id: -1 } } ntoreturn:1 nscanned:1 nreturned:1 reslen:4166 134ms?
[15:16:16] <PDani> because it seems that mongodb reads from disk much more data than the size of documents i query for
[15:16:16] <remonvv> if there are multiple matching document it'll prepare an initial resultset that the returned cursor can iterate over. If it needs more it will issue getMore commands
[15:16:18] <remonvv> well it reads at minimum 1 page
[15:16:22] <remonvv> 1 memory page
[15:16:36] <remonvv> and usually quite a bit more
[15:16:42] <remonvv> Why are you so worried about all this though?
[15:16:43] <linsys> PDani: How are you determining this? That mongodb reads from disk more data than the size of the document you query?
[15:16:47] <PDani> yes, I have a limit(1) so it should read at most 2 pages
[15:16:52] <remonvv> This is all pretty irrelevant
[15:17:13] <zakg> ranman,it works.
[15:17:26] <remonvv> limit(1) has nothing to do with it, that just tells mongo to only return 1 document. Is very speedy if you're not sorting but you are so it has to sort the resultset before it can determine what the first document is.
[15:17:28] <PDani> linsys, I calculate the size of read documents per sec on client side, and i see the iostat output
[15:17:35] <ranman> zakg: yeah -- it just isn't using the latest version of the python driver.
[15:18:03] <PDani> remonvv, but i have an index for this: blockid:1, _id:-1
[15:18:07] <remonvv> PDani, you're looking at the wrong things. If you have performance issues insight into how MongoDB's mmap engine works isn't going to fix much for you. It's not something you can affect
[15:18:22] <zakg> ranman,are there no other upgraded versions?
[15:18:51] <remonvv> PDane, okay, that fixes that
[15:18:57] <remonvv> Anyway, i'm off, good luck ;)
[15:19:03] <ranman> not sure -- I guess I could submit a pull request later today
[15:51:53] <addisonj> has anyone noticed that the mongodb distro site is unbearably slow of late?
[16:25:47] <cedrichurst> dumb question… if i wanted to compute the cross-product of keys in two collections, would i need to do that at the application layer
[16:26:02] <cedrichurst> for example if i had one collection containing salesperson and another containing fiscal quarters
[16:26:24] <cedrichurst> and i wanted to create a new collection with a compound key for every salesperson in every fiscal quarter
[16:26:58] <cedrichurst> is that something mongo can handle natively?
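
There are no server-side joins, so cedrichurst's cross-product does have to be driven by the application (or the shell); a straightforward, if not fast, sketch with hypothetical collection and field names:

    db.salespeople.find().forEach(function(sp) {
        db.quarters.find().forEach(function(q) {
            db.salesperson_quarters.insert({
                _id: { salesperson: sp._id, quarter: q._id }
            });
        });
    });
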
[16:58:07] <Bilge> If I have a LOT of transaction documents should I store precalculated totals in another collection or always calculate aggregate values every time?
[17:09:00] <nickswe> Annoying: on Windows 7 when trying to run "mongod.exe" it says that I am missing folder /data/db/... but I have "dbpath=C:\mongodb\data" in my mongod.cfg... why does it not understand this?
[17:45:19] <brynary> hello -- I'm experiencing extremely slow MongoDB reads under what I think is a relatively low write load (few hundred/s). in mongostat, locked % gets very high, faults is 0
[17:46:32] <brynary> I'm running fast hardware, plenty of RAM, 8 CPUs. 15k RPM SAS drives. I'm not sure what sort of perf I should expect. many simple inserts are taking 350ms+
[17:46:41] <brynary> any pointers?
[18:21:07] <jarrod> namlook here?
[18:53:53] <Bilge> If I have a LOT of transaction documents should I store precalculated totals in another collection or always calculate aggregate values every time?
[19:17:19] <aster1sk> Hey guys, please critique http://pastie.org/private/r4ecadqwxurc1ibvqcps2q
[19:17:54] <aster1sk> New to mongo, I'd love some feedback on whether this appears to be a reasonable aggregate object or if I'm totally doing it wrong.
[19:37:22] <federated_life> whats a good way to tell if a replica member was a step down, something about op counters ?
[19:40:13] <crudson> aster1sk: really depends on what problem you are trying to solve, what your input documents are etc.
[19:41:28] <aster1sk> crudson: Thanks. This model is what I want however I'm afraid of the ridiculous nesting.
[19:42:14] <aster1sk> For instance I couldn't (in one query) determine how many android device views from canada... I suppose the frontend could figure that out but I'm not sure the boss will buy it.
[19:43:21] <aster1sk> Also when upsert / incrementing the view counts will require two queries to determine if the indexed array exists.
[19:47:05] <crudson> aster1sk: That epends whether you want this updated in realtime. I wouldn't be afraid to have as many map reduce operations as suits your querying needs, which shoul be your primary concern.
[19:47:22] <crudson> (sorry my cat jumpe on my keyboar and broke my D most of the time)
[19:48:46] <crudson> If you have a specific aggregate question you can get advice here for sure (I have to run out now but there'll be plenty of experts around)
[19:49:04] <aster1sk> Hah, yeah close to realtime would be a plus. The aggregate documents are 'per day per issue'. Doesn't have to be up to the minute but I'm sure they'd be happier with near-realtime.
[19:49:34] <aster1sk> Excellent feedback crudson, much appreciated.
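
On the two-queries worry raised above: if the per-day/per-issue counters are modelled as nested objects rather than arrays, a single upsert with $inc and dot notation creates any missing fields and bumps the counter in one round trip. A sketch with made-up field names:

    db.daily_stats.update(
        { issue: "issue-42", day: "2012-08-08" },
        { $inc: { "views.total": 1, "views.android.CA": 1 } },
        true   // upsert: create the document (and nested fields) if absent
    );
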
[21:06:56] <ninegrid> anyone have any experience with the haskell driver?
[21:33:07] <wereHamster> I once wrote a 10 line snap server which displays data from mongodb.
[21:33:37] <wereHamster> I literally have no idea how it works. But it works. Which was good enough for me :)
[21:47:31] <fdv> Hi guys. It seems to me that renameCollection is a privileged command, and needs to be run while using the admin db. Now, the command takes two parameters, the old name of the collection and the new name, but can anybody tell me how to specify the database?
[21:51:22] <fdv> when I try to add the db name like db.runCommand({renameCollection: "thedb.foo", to: "thedb.bar"}), I get an error stating "exception: source namespace does not exist"
[21:52:15] <crudson> fdv: renameCollection is for within a single db only
[21:52:54] <crudson> fdv: additionally, it can be run from any db: db.fromCol.renameCollection('toCol')
[21:53:14] <fdv> crudson: but when I 'use thedb' and then try to rename "foo" to "bar", I get another error, { "errmsg" : "access denied; use admin db", "ok" : 0 }
[21:53:40] <fdv> and the docs say "You must run this command against the admin database. and thus requires you to specify the complete namespace (i.e., database name and collection name.)"
[21:54:12] <fdv> but I haven't tried the other syntax..
[21:54:50] <fdv> heh. that worked!
[21:55:38] <fdv> there's obviously something I don't get here... :p
[21:56:52] <crudson> it's an "administrative helper", so even though it may get execute in that namespace, the translation is done for you. http://www.mongodb.org/display/DOCS/dbshell+Reference#dbshellReference-AdministrativeCommandHelpers
[21:58:13] <fdv> crudson: ok, that makes it a bit clearer. thanks!
[22:01:50] <crudson> no probs
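
The two working forms from this exchange, side by side:

    // administrative helper, run from the database that owns the collection
    use thedb
    db.foo.renameCollection("bar");

    // raw command, run against the admin database with full namespaces
    db.adminCommand({ renameCollection: "thedb.foo", to: "thedb.bar" });
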
[22:12:01] <rossdm> question: I have a MongoDB replica set using IP over Infiniband. RCP'ing a file between mongo servers averages 200MB/s transfer speed. Sticking a file in GridFS and waiting for replication to two replicates yields 140MB/s out (70MB/s in for each server). Any ideas why replication would be slower?
[22:53:40] <Bilge> If I have a lot of transaction documents should I store precalculated totals in another collection or always calculate aggregate values every time?
[23:02:24] <dstorrs> Bilge: it depends on your use cae
[23:02:26] <dstorrs> *case
[23:02:44] <dstorrs> if you can calculate on the fly (in the browser), do so. that reduces your storage and CPU needs
[23:03:26] <dstorrs> if you can get away with just caching the latest data in (e.g.) memcached and doing the totals from there, fall back to there.
[23:03:34] <dstorrs> if you have to precalc and store in DB, do so.
[23:04:12] <dstorrs> but it's always best to distribute your processing onto the client if that doesn't compromise security and site performance. much cheaper than scaling your own hardware
[23:12:52] <Bilge> It's not heavy processing if you're constantly keeping totals updated
[23:13:03] <Bilge> It could even be just a simple +/-1 each time
[23:13:59] <Bilge> But it's conceivable that if you just maintain totals and records separately that they could somehow become out of sync, particularly if there are bugs in the code or transactions don't happen atomically
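
For Bilge's recurring question, the two options discussed boil down to computing totals on demand (with the 2.2 aggregation framework or map/reduce) or maintaining a running total with an atomic $inc at write time, accepting the drift risk Bilge describes. Roughly, with hypothetical collection and field names:

    // on demand (2.2+ aggregation framework)
    db.transactions.aggregate(
        { $group: { _id: "$accountId", total: { $sum: "$amount" } } }
    );

    // maintained incrementally: bump a precalculated total with each transaction
    db.totals.update({ _id: accountId }, { $inc: { total: amount } }, true);
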