[00:59:33] <vsmatck> acidjazz: Google for "mongodb journaling" click first link.
[01:08:54] <acidjazz> vsmatck: it points to the mongodb doc page which does not specify how to check if its on
[01:09:03] <acidjazz> vsmatck: ive clicked every link on the results page
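(For what acidjazz is asking, a minimal sketch of checking from the shell, assuming a 2.x-era mongod: serverStatus() only reports a "dur" durability section when journaling is active, and getCmdLineOpts shows the options the server was started with.)

    // hedged sketch: check whether journaling is enabled on a running mongod
    var status = db.serverStatus();
    if (status.dur) {
        print("journaling is on; commits so far: " + status.dur.commits);
    } else {
        print("journaling appears to be off");
    }
    // the startup options (including --journal / --nojournal) can also be inspected:
    db.adminCommand({ getCmdLineOpts: 1 });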
[02:42:39] <warz> can you create an index on a property and put ObjectIds in it? ive got an index on a field, and am putting ObjectIds into it. my queries don't seem to be returning anything, though.
[02:42:53] <warz> and i've got the query params correct, it's just not returning results.
[02:43:10] <warz> (this would result in both an _id index, and another index)
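(Indexing ObjectId values in a regular field works like any other type; a common reason such queries return nothing is passing the hex string where an ObjectId instance is expected. A sketch with made-up collection and field names:)

    // hypothetical collection "items" with an indexed field holding ObjectIds
    db.items.ensureIndex({ owner_id: 1 });
    db.items.insert({ owner_id: ObjectId("5044aa21e4b0d3ac12345678"), name: "x" });

    db.items.find({ owner_id: "5044aa21e4b0d3ac12345678" });            // no match: string != ObjectId
    db.items.find({ owner_id: ObjectId("5044aa21e4b0d3ac12345678") });  // matches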
[04:36:07] <circlicious> how do i know if i am running with journaling enabled or not?
[04:41:02] <circlicious> my mongod wont start, this is the error, what should i do, should i repair? http://pastie.org/private/fstcecyi0solcpxbgoppta i can see the /data/db/journal along with files in it, dunno why its happening, ugh
[07:10:49] <lizzin> so i have a db full of collections that consist of info on various businesses including location (in lat, long), name, address, type of business and a few other misc info
[07:12:09] <lizzin> then the foursquare checkin provides my app with a lat and a long. but this lat and long rarely ever match up 100% with the data in the db. kind of expected...
[07:12:28] <lizzin> but what is a good way to verify that the two locations are the same?
[07:13:30] <lizzin> the best i can come up with is to take the foursquare lat+long then do a near query on the db and then do some sort of funky 'address/name' matching
[07:14:27] <lizzin> what is a better way to do this
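(One common shape for this, sketched with hypothetical collection/field names and the legacy 2d index that was current at the time: index the stored coordinates, pull a handful of candidates near the checkin point, then do the name/address matching in the application.)

    // loc stored as [lng, lat]; 2d index for $near queries
    db.venues.ensureIndex({ loc: "2d" });

    // candidates within roughly 100m of the checkin (legacy 2d distances are in
    // degrees; ~0.001 degrees is on the order of 100m)
    var checkinLng = -73.98, checkinLat = 40.75;   // example coordinates
    var candidates = db.venues.find({
        loc: { $near: [checkinLng, checkinLat], $maxDistance: 0.001 }
    }).limit(10);

    // then compare candidate.name / candidate.address against the Foursquare
    // venue name in the app (case-insensitive prefix or edit-distance check)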
[09:28:38] <ron> you are not the boss of me! I am the boss of me!
[09:29:54] <remonvv> I think we both know that that's not true.
[09:30:08] <remonvv> Anyone ever had issues with indexes magically disappearing after removeShard?
[09:30:37] <remonvv> or actually..removeShard -> addShard
[09:30:37] <ron> niemeyer: hello, and welcome to #mongodb. we hope you enjoy your stay. if you have any questions, please feel free to address them to the general public in the channel. please come again.
[10:12:24] <_johnny> php was big back then, and rewriting everything just because people prefer ruby now, and then to scala in a few years because whatever, isn't "smart" :P
[11:19:44] <remonvv> Bartzy, working set is your hot data. Hot data in the MongoDB sense is data that is in physical memory due to the OS swapping it in there based on frequent access/MRU.
[11:21:10] <remonvv> Bartzy, the working set isn't really defined per specific timespan. It's all the data MongoDB can keep in physical memory. That's why things like right balancing for large indexes is as important as limiting the amount of hot data you have at any point in time (if possible, which it isn't always)
[11:21:40] <Bartzy> How can I know how much RAM do I need then
[11:21:44] <remonvv> cmex, take a chill pill dude :) C# is not as common as other options. If someone is working with it and is reading this I'm sure they'll respond.
[11:22:03] <Bartzy> Ideally it's working set + indexes in RAM. But what is the working set? If it's the data mongodb can keep in RAM, then I can never know how much RAM I need :)
[11:22:47] <remonvv> Bartzy, hard to say. Easy answer is as much as you can afford, the slightly more complicated answer involves testing or estimating your data sizes and which chunk of it will be hot at any point in time.
[11:23:06] <remonvv> My strategy tends to involve worst case scenario tests with real data.
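(The numbers people usually compare when making that estimate are available from the shell; a sketch using the 2.x field names, with mem values in MB:)

    db.stats();   // dataSize, indexSize, storageSize for the current db
    var m = db.serverStatus().mem;
    printjson({ residentMB: m.resident, mappedMB: m.mapped, virtualMB: m.virtual });
    // rule of thumb from the thread: RAM >= total indexSize + whatever slice of
    // dataSize is actually hot ("working set"), which has to be estimated from
    // access patterns rather than read off a counter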
[11:23:23] <S7> Hi, i'm coming from the sql world, we're usually creating a stored procedure and then calling it from the code. is it good practice to create a stored javascript and call it from the code, or is it best to avoid?
[11:23:58] <remonvv> There's a reason a lot of developers/companies lean towards "try it and see" ways of determining hardware costs/utilization. It's becoming increasingly hard to predict what kind of resources deliver what kind of performance.
[11:24:45] <cmex> remonvv because of performance or other reason?
[11:24:46] <remonvv> S7, anything JavaScript is a bad idea for anything where performance is an issue, so if you'd normally write a stored procedure (performance optimisation) then in MongoDB you should go for a native query.
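(For context, "stored JavaScript" here means saving a function into db.system.js and running it with db.eval(), which executes as single-threaded server-side JS under a lock; that is why the advice is to prefer a plain query. A sketch with hypothetical names:)

    db.system.js.save({ _id: "countByStatus", value: function (s) {
        return db.orders.find({ status: s }).count();
    }});
    db.eval(function () { return countByStatus("open"); });   // server-side JS path (slow, locks)

    db.orders.find({ status: "open" }).count();               // equivalent native query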
[11:29:59] <remonvv> Bartzy, I'm not saying get the biggest server you can find. I'm just saying create a realistic test and run it on different hardware. Performance profiles/bottlenecks are pretty straightforward for MongoDB.
[11:30:27] <remonvv> Bartzy, you need more RAM if your pagefaults/second are non-zero/high continuously.
[11:30:33] <gigo1980> hi all, i have a map reduce function. in the map reduce function can i create a dynamic array with hashes as keys? is this possible?
[11:30:41] <Bartzy> remonvv: Right, but still, there are recommendations everywhere on the web to get as much RAM as the index size + working set. And I didn't find a clue about what a working set really is.
[11:30:41] <remonvv> But please note, even querying data on disk is relatively fast on MongoDB.
[11:30:55] <remonvv> A lot of people that ask these things tend to find out they can do what they need on relatively little hardware.
[11:31:31] <remonvv> Working set is hard to define
[11:32:09] <remonvv> Well, it's easy to define, hard to estimate its size ;)
[11:33:13] <remonvv> Basically think of it like this; for queries that are important for your application speed as much *AS POSSIBLE* should be in memory. Most applications we make at my company tend to have working sets of only a few Gb even though we serve over 100,000 concurrent users on a frequent basis.
[11:34:03] <remonvv> So look at your software, evaluate which queries are executed frequently and go from there.
[11:34:19] <remonvv> I'm assuming this is so relevant for you because you intend to buy hardware rather than rent it?
[11:34:39] <Bartzy> remonvv: No, we already rented the hardware
[11:34:57] <Bartzy> But I just reread some of the stuff in MongoDB in action and was curious, because it's so unintuitive.
[11:35:48] <remonvv> It's not. It's roughly similar to any other database. What makes it slightly less intuitive is that MongoDB's storage engine is built directly on top of OS mapped memory functionality.
[11:35:59] <remonvv> Rather than hard configuring query buffer and cache sizes.
[11:36:15] <Bartzy> Well, I know the queries that are executed frequently. Those are for active users. But what happens when a user that hasn't visited the app in a while visits it now and needs his/her data shown? It's not in RAM, so they have a bad experience. If they come back soon, it may be in RAM.... Very hard to know
[11:36:32] <Bartzy> remonvv: Of course, didn't say this is specific to MongoDB
[11:38:24] <NodeX> why dont you save a headache and get smaller servers and shard?
[11:38:25] <Bartzy> BTW - there is no query result cache at all in MongoDB ?
[11:38:31] <Bartzy> i.e. I run my query twice - it happens twice ?
[11:38:46] <Bartzy> NodeX: I don't know if sharding is easier than getting big servers.
[11:38:47] <remonvv> Bartzy, that isn't really relevant. The time added for a pagefault is the diskswap. Are you sure you're testing single doc queries and that your indexes are hitting?
[11:38:59] <remonvv> 500ms is very slow regardless for a lookup on an indexed field.
[11:39:09] <Bartzy> remonvv: No, these are multiple docs
[11:40:45] <Bartzy> remonvv: Yeah, but what does it give me in terms of bottlenecks ?
[11:41:31] <remonvv> well, high pagefaults means it's swapping a lot, in/out stats give you network bottlenecks if any, and lock % is extremely relevant to overall query performance
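(Those same counters can be pulled from serverStatus() if mongostat isn't handy; a sketch against the 2.x field names, noting that extra_info.page_faults is Linux-only:)

    var s = db.serverStatus();
    printjson({
        pageFaults:  s.extra_info ? s.extra_info.page_faults : "n/a",
        lockRatio:   s.globalLock.lockTime / s.globalLock.totalTime,  // rough lock %
        netInBytes:  s.network.bytesIn,
        netOutBytes: s.network.bytesOut
    });
    // sample twice and diff the values to get per-second rates like mongostat shows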
[11:48:51] <Bartzy> remonvv: Because how else can it know where in the btree it should look for - it will need an index for the index :p
[11:49:00] <NodeX> I was going to comment that all the performance you gain with micromanaging things you will lose in a framework
[11:49:27] <Derick> the index are btrees, so only the relevant parts of the btree have to be pulled into memory
[11:49:30] <Derick> and yes, there is an "indx" to teoo mongodb where everything is on disk
[11:49:43] <remonvv> Bartzy, that's not quite how it works though is it ;) It knows where the index data is and is addressing that data in the virtualized memory. The OS will swap that page into physical memory if it isn't already (and is hot enough).
[11:49:45] <Bartzy> NodeX: I'm just learning that way about mongo. Of course I don't really care if a document takes 4ms to fetch or 0.1ms.
[11:50:25] <Bartzy> Derick: But how does it know where are the relevant parts ?
[11:50:44] <remonvv> Bartzy, it's more an OS MRU paging thing than a MongoDB thing. MongoDB uses memory mapped files. The data files (data and indexes) are addressed through that and hot pages are swapped in based on the OS memory management.
[11:51:03] <Derick> Bartzy: it's part of the btree structure
[11:51:09] <Bartzy> right - but how MongoDB knows what pages to ask for
[11:51:11] <remonvv> Bartzy, every collection maintains metadata concerning the index b-trees.
[11:51:20] <remonvv> MongoDB isn't asking for pages.
[11:52:45] <NodeX> that assumes the OS maps entire indexes to 1 file no?
[11:52:58] <remonvv> It simply says "load memory from address X to X+S"
[11:53:25] <remonvv> OS figures out if that range of the memory mapped file is in memory and if it chooses to (usually based on an MRU scheme) it swaps those file pages to physical memory.
[11:53:41] <remonvv> MongoDB knows where to look for what because it maintains index metadata per collection.
[11:53:41] <kali> or rather "ensure address X to X+S is loaded"
[11:55:18] <Bartzy> And mongo loads that metadata from disk at start up or something ?
[11:55:40] <remonvv> it knows where the b-tree for that collection is located based on some metadata (I think .ns files right Derick?) and it simply addresses that space.
[11:55:47] <NodeX> [12:49:19] <@Derick> and yes, there is an "indx" to teoo mongodb where everything is on disk
[11:55:52] <remonvv> I *think* it loads it when it needs to
[11:56:09] <NodeX> I think translated that means "there is an index to tell mongod where everything is"
[11:56:16] <remonvv> But b-tree offsets might be the exception. It would be a good reason why there's a 14k collection limit.
[11:56:31] <remonvv> There is, I think it's the .ns (namespace) file.
[11:56:46] <remonvv> Derick can tell us. He has the 10gen stamp of approval.
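(The metadata being discussed is visible from the shell on the mmap storage engine; a small sketch:)

    // every collection and every index has an entry in system.namespaces,
    // which is backed by the <dbname>.ns file on disk
    db.system.namespaces.find();
    // e.g. { "name" : "mydb.users" }, { "name" : "mydb.users.$_id_" }, ...
    // the fixed size of the .ns file is what caps the number of namespaces
    // (collections + indexes) per database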
[12:01:10] <kali> i also like to have an easy "scale up" solution: i'm not using the biggest server available (on aws). if i find myself cornered, i can scale up easily and then start to find out what's wrong
[12:01:33] <cmex> noone is using c# driver here? :(
[12:01:46] <NodeX> people in here have brains cmex :P
[12:01:54] <NodeX> best to ask in the google group
[12:02:21] <cmex> i don't want to start a holy war, just asking the question NodeX
[12:02:35] <NodeX> and I am just saying. Best to ask in the google group
[12:02:57] <NodeX> not a lot of people in here do use it
[12:39:55] <remonvv> kali, we use that strategy too. The only concern is that 10 big nodes are much more reliable than 80 small nodes.
[12:40:10] <remonvv> So there is some guesstimation needed to know what to start with.
[12:41:34] <remonvv> cmex, ignore the haters! :) C# is pretty nice. Just not hugely popular around here.
[12:55:27] <gigo19801> what is the best way to format the system
[12:55:59] <gigo19801> sorry, is there a way to get a custom format of the DateTime inside mongodb?
[13:16:13] <PDani> i have a single-instance mongodb with a collection with 3 fields: _id, block_id, payload. payloads are always 4096byte binaries. _id is an ever-incremented unique integer. there is a secondary index on the collection, { "v" : 1, "key" : { "block_id" : 1, "_id" : -1 }, "ns" : "mongobd.testdev", "name" : "_block_id_id" }
[13:18:15] <PDani> i'm doing many queries like: query: { query: { block_id: 868413 }, orderby: { _id: -1 } } ntoreturn:1 nscanned:1 nreturned:1 reslen:4166 163ms, there's no other query during these. when i read sequentially by block_id, it's 10 times faster than when i query with random block_id
[13:19:53] <PDani> i have low cpu usage, low storage utilization. the collection is 2-3 times bigger than the memory size. I don't know what can be the bottleneck?
[13:30:57] <remonvv> PDani, all else being equal you're looking at the performance difference between reading from a virtualized page that is in physical memory and one that is not. Sequential data is likely to be in memory while randomly accessed data typically is not.
[13:31:30] <remonvv> run mongostat and see if faults/sec rises when you switch to random reads compared to sequential ones.
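(Before pinning it all on disk, it's also worth confirming that query shape really uses the compound index; a sketch using the collection name from the paste:)

    db.testdev.find({ block_id: 868413 }).sort({ _id: -1 }).limit(1).explain();
    // expect "cursor" : "BtreeCursor _block_id_id" and nscanned : 1; a BasicCursor
    // or scanAndOrder:true would mean the latency is an index problem rather than
    // a page-fault problem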
[14:35:20] <zakg> i am using python version 2.7.2 and pymongo 2.1.1
[14:37:49] <zakg> objectid stays in bson not in pymongo
[15:01:40] <Bilge> If I have a LOT of transaction records should I store precalculated totals in another collection?
[15:02:04] <Bilge> Or must I always calculate aggregate values using queries
[15:03:04] <Bilge> I'm assuming there would be a significant performance difference that would make maintaining stored totals more efficient even though they could be vulnerable to going out of sync
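(For the "always calculate" side of that trade-off, the 2.2 aggregation framework can produce a total on demand; a sketch with hypothetical field names:)

    db.transactions.aggregate(
        { $match: { account: "acct42" } },
        { $group: { _id: "$account", total: { $sum: "$amount" }, count: { $sum: 1 } } }
    );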
[15:04:55] <PDani> I have as many page faults as queries, and on page faults, mongodb seems to read more data than needed (aggregated size of read documents is 10 times smaller than actual data read from disk)
[15:05:20] <PDani> is there some readahead, or whole-page-read in mongodb?
[15:05:47] <Derick> a pagefault will always read a whole page
[15:06:11] <PDani> and how can i set the pagesize to a smaller value?
[15:14:51] <remonvv> and yes, index misses are similar to faults/sec in that they tell you how often a b-tree node is accessed that isn't in physical memory
[15:15:08] <remonvv> unfortunately percentage isn't the most useful statistic ever but hey ;)
[15:15:08] <PDani> is there any readahead in mongodb when i do a query like { $query: { block_id: 685233 }, $orderby: { _id: -1 } } ntoreturn:1 nscanned:1 nreturned:1 reslen:4166 134ms?
[15:16:16] <PDani> because it seems that mongodb reads from disk much more data than the size of documents i query for
[15:16:16] <remonvv> if there are multiple matching documents it'll prepare an initial resultset that the returned cursor can iterate over. If it needs more it will issue getMore commands
[15:16:18] <remonvv> well it reads at minimum 1 page
[15:17:26] <remonvv> limit(1) has nothing to do with it, that just tells mongo to only return 1 document. Is very speedy if you're not sorting but you are so it has to sort the resultset before it can determine what the first document is.
[15:17:28] <PDani> linsys, I calculate the size of read documents per sec on client side, and i see the iostat output
[15:17:35] <ranman> zakg: yeah -- it just isn't using the latest version of the python driver.
[15:18:03] <PDani> remonvv, but i have an index for this: blockid:1, _id:-1
[15:18:07] <remonvv> PDani, you're looking at the wrong things. If you have performance issues, insight into how MongoDB's mmap engine works isn't going to fix much for you. It's not something you can affect
[15:18:22] <zakg> ranman, are there no other upgraded versions?
[15:18:57] <remonvv> Anyway, i'm off, good luck ;)
[15:19:03] <ranman> not sure -- I guess I could submit a pull request later today
[15:51:53] <addisonj> has anyone noticed that the mongodb distro site is unbearably slow of late?
[16:25:47] <cedrichurst> dumb question… if i wanted to compute the cross-product of keys in two collections, would i need to do that at the application layer
[16:26:02] <cedrichurst> for example if i had one collection containing salesperson and another containing fiscal quarters
[16:26:24] <cedrichurst> and i wanted to create a new collection with a compound key for every salesperson in every fiscal quarter
[16:26:58] <cedrichurst> is that something mongo can handle natively?
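(There's no server-side join or cross-product, so this usually ends up in the application or a small script; a sketch in the shell with made-up collection names:)

    db.salespeople.find().forEach(function (sp) {
        db.quarters.find().forEach(function (q) {
            db.salesQuarterTargets.insert({
                _id: { salesperson: sp._id, quarter: q._id },   // compound key
                target: 0
            });
        });
    });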
[16:58:07] <Bilge> If I have a LOT of transaction documents should I store precalculated totals in another collection or always calculate aggregate values every time?
[17:09:00] <nickswe> Annoying on Windows 7: when trying to run "mongod.exe" it says that I am missing the folder /data/db/... but I have "dbpath=C:\mongodb\data" in my mongod.cfg... why does it not understand this?
[17:45:19] <brynary> hello -- I'm experiencing extremely slow MongoDB reads under what I think is a relatively low write load (few hundred/s). in mongostat, locked % gets very high, faults is 0
[17:46:32] <brynary> I'm running fast hardware, plenty of RAM, 8 CPUs. 15k RPM SAS drives. I'm not sure what sort of perf I should expect. many simple inserts are taking 350ms+
[18:53:53] <Bilge> If I have a LOT of transaction documents should I store precalculated totals in another collection or always calculate aggregate values every time?
[19:17:54] <aster1sk> New to mongo, I'd love some feedback on whether this appears to be a reasonable aggregate object or if I'm totally doing it wrong.
[19:17:54] <federated_life> whats a good way to tell if a replica member was stepped down, something about op counters?
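(One way to check from the shell, sketched: rs.status() shows each member's current state, and a former primary that stepped down will be listed as SECONDARY; the opcounters in db.serverStatus() on that node flattening out would corroborate it.)

    rs.status().members.forEach(function (m) {
        print(m.name + "  " + m.stateStr + "  (uptime " + m.uptime + "s)");
    });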
[19:40:13] <crudson> aster1sk: really depends on what problem you are trying to solve, what your input documents are etc.
[19:41:28] <aster1sk> crudson: Thanks. This model is what I want however I'm afraid of the ridiculous nesting.
[19:42:14] <aster1sk> For instance I couldn't (in one query) determine how many android device views came from canada... I suppose the frontend could figure that out but I'm not sure the boss will buy it.
[19:43:21] <aster1sk> Also when upsert / incrementing the view counts will require two queries to determine if the indexed array exists.
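(A sketch of the single-query alternative being weighed here: if the per-dimension counters live as subdocument keys rather than array elements, one upsert with $inc both creates and increments them; all names are hypothetical.)

    db.dailyStats.update(
        { issue: "issue-123", day: "2012-09-03" },
        { $inc: { "views.android.CA": 1, "views.total": 1 } },
        true   // upsert: first hit creates the document
    );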
[19:47:05] <crudson> aster1sk: That epends whether you want this updated in realtime. I wouldn't be afraid to have as many map reduce operations as suits your querying needs, which shoul be your primary concern.
[19:47:22] <crudson> (sorry my cat jumpe on my keyboar and broke my D most of the time)
[19:48:46] <crudson> If you have a specific aggregate question you can get advice here for sure (I have to run out now but there'll be plenty of experts around)
[19:49:04] <aster1sk> Hah, yeah close to realtime would be a plus. The aggregate documents are 'per day per issue'. Doesn't have to be up to the minute but I'm sure they'd be happier with near-realtime.
[19:49:34] <aster1sk> Excellent feedback crudson, much appreciated.
[21:06:56] <ninegrid> anyone have any experience with the haskell driver?
[21:33:07] <wereHamster> I once wrote a 10 line snap server which displays data from mongodb.
[21:33:37] <wereHamster> I literally have no idea how it works. But it works. Which was good enough for me :)
[21:47:31] <fdv> Hi guys. It seems to me that renameCollection is a privileged command, and needs to be run while using the admin db. Now, the command takes two parameters, the old name of the collection and the new name, but can anybody tell me how to specify the database?
[21:51:22] <fdv> when I try to add the db name like db.runCommand({renameCollection: "thedb.foo", to: "thedb.bar"}), I get an error stating "exception: source namespace does not exist"
[21:52:15] <crudson> fdv: renameCollection is for within a single db only
[21:52:54] <crudson> fdv: additionally, it can be run from any db: db.fromCol.renameCollection('toCol')
[21:53:14] <fdv> crudson: but when I 'use thedb' and then try to rename "foo" to "bar", I get another error, { "errmsg" : "access denied; use admin db", "ok" : 0 }
[21:53:40] <fdv> and the docs say "You must run this command against the admin database. and thus requires you to specify the complete namespace (i.e., database name and collection name.)"
[21:54:12] <fdv> but I haven't tried the other syntax..
[21:55:38] <fdv> there's obviously something I don't get here... :p
[21:56:52] <crudson> it's an "administrative helper", so even though it may get executed in that namespace, the translation is done for you. http://www.mongodb.org/display/DOCS/dbshell+Reference#dbshellReference-AdministrativeCommandHelpers
[21:58:13] <fdv> crudson: ok, that makes it a bit clearer. thanks!
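(The two forms from that exchange, side by side, using the same names as above:)

    // 1. administrative helper, run from the db that owns the collection
    use thedb
    db.foo.renameCollection("bar")

    // 2. raw command, run against the admin db with full namespaces
    use admin
    db.runCommand({ renameCollection: "thedb.foo", to: "thedb.bar" })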
[22:12:01] <rossdm> question: I have a MongoDB replica set using IP over Infiniband. RCP'ing a file between mongo servers averages 200MB/s transfer speed. Sticking a file in GridFS and waiting for replication to two replicas yields 140MB/s out (70MB/s in for each server). Any ideas why replication would be slower?
[22:53:40] <Bilge> If I have a lot of transaction documents should I store precalculated totals in another collection or always calculate aggregate values every time?
[23:02:24] <dstorrs> Bilge: it depends on your use case
[23:02:44] <dstorrs> if you can calculate on the fly (in the browser), do so. that reduces your storage and CPU needs
[23:03:26] <dstorrs> if you can get away with just caching the latest data in (e.g.) memcached and doing the totals from there, fall back to there.
[23:03:34] <dstorrs> if you have to precalc and store in DB, do so.
[23:04:12] <dstorrs> but it's always best to distribute your processing onto the client if that doesn't compromise security and site performance. much cheaper than scaling your own hardware
[23:12:52] <Bilge> It's not heavy processing if you're constantly keeping totals updated
[23:13:03] <Bilge> It could even be just a simple +/-1 each time
[23:13:59] <Bilge> But it's conceivable that if you just maintain totals and records separately that they could somehow become out of sync, particularly if there are bugs in the code or transactions don't happen atomically
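(A sketch of the "maintain running totals" pattern under discussion, with hypothetical collection/field names. The two writes are separate operations, which is exactly the out-of-sync risk Bilge raises, so a periodic recount via aggregation can be used to reconcile drift.)

    db.transactions.insert({ account: "acct42", amount: 25, ts: new Date() });
    db.totals.update(
        { _id: "acct42" },
        { $inc: { balance: 25, count: 1 } },
        true   // upsert: create the totals doc on first use
    );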