[02:11:47] <ferrouswheel> sirpengi, it repeatedly wouldn't work... I tried about a dozen times, and had about 30 prior error emails in my inbox beforehand. It started working again immediately after a restart.
[02:12:28] <ferrouswheel> but it does seem pretty weird, I would've expected it to reconnect without issue. will see if it happens again.
[02:49:12] <dstorrs> anyone know what this means (approximate, as I can't cut/paste it and console just died, so it's from memory)? "DB 0 130004 keys volatile in HT something-or-other"
[02:49:35] <dstorrs> my server just got nuked by something. that message was being spewed to the console over and over.
[02:49:52] <dstorrs> only thing I could do was restart it
[02:50:38] <dstorrs> it looks like it's a Mongo thing, but I can't find anything on Google about it.
[03:00:32] <jstout24> how bad is it to have ~20 queries on a page? (all indexed queries, 95% on _id)
[03:01:05] <jstout24> (of course i'm going to start doing result caching, but if i can go live with something now, i'd rather do that, then work on result caching later)
[03:01:17] <jstout24> basically in one request of my app
[03:14:51] <dstorrs> hey all. I've got mongod v2.0.3. The machine just crashed and got restarted (as mentioned above). Now when I start Mongo, I see "listDatabases failed: ... "assertion" : "can't map file memory", ..."
[03:15:26] <dstorrs> what is this telling me? how do I proceed?
[07:30:40] <jondot> NodeX: im looking at that, thanks
[07:31:08] <NodeX> it depends on how you are using it, as with most things in mongo it's on a per use case
[07:31:14] <guest232131> only fools store trees inside a rdbms or a nosql DB
[07:31:51] <jondot> guest232131: graph databases have a class of scale problems of their own.
[07:32:09] <guest232131> trees inside a standard db as well
[07:32:22] <jondot> guest232131: per use case it can be very efficient, actually.
[07:32:34] <jondot> querying for subtrees is very efficient and easy
[07:33:05] <NodeX> by use case I mean your data is different to other people's and you query it differently, what works for me might not work for you and vice versa
[07:33:24] <jondot> if you need to walk arbitrary relationships where each step yields thousands of nodes - use graph db
[07:34:57] <jondot> NodeX: well the use case is simple. given a node N, i want to be able to fetch all its parents up to level X
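A rough mongo shell sketch of one common way to model that lookup, using an "ancestors" array on each node (the collection and field names here are made up, not jondot's actual schema):

    // each node stores the ids of its ancestors, nearest parent last
    db.nodes.insert({ _id: "N", ancestors: ["root", "A", "B"] })
    // parents of N up to X levels: take the last X entries of its ancestors array
    var upToX = db.nodes.findOne({ _id: "N" }).ancestors.slice(-3);   // X = 3 here
    db.nodes.find({ _id: { $in: upToX } })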
[07:35:26] <NodeX> is that the -only- way or just the -most common- way you will query your data though?
[09:16:22] <robx> Hello! I have a quick question. I have a document structure where each document has a sub document like: { field : value, subdocument : { field : value } }. I need to do a query where one of the fields in the subdocument is unique. Any suggestions? Or is the only way to do a distinct first, save those results in memory, then do a findOne for each of those results?
[09:28:29] <robx> These results are sent to a front-end and we dont want duplicates of each user
[09:29:15] <robx> Say we have 100k results, and 40k users, so some users will exist many times; what we want is to get those out but make sure they are unique
[09:31:04] <robx> No problem I appreciate you taking your time :)
[09:32:23] <_johnny> ok. i'm not sure i understand your query. you select all the objects with different titles, and pull the users from that? you don't have a userdb in another collection, or?
[09:33:04] <robx> Not exactly, let's make this a real world example, sec :P
[09:33:48] <robx> Say we have blog posts, and each of those blog posts has a subdocument of the user posting it. We have 20k blog posts, among 100 users. Then we have a "list all bloggers" page, where we want to get the data from the blog posts
[09:34:00] <robx> (since we don't have any other data to go by)
[09:34:33] <_johnny> so, no user lookup/collection?
[09:37:04] <robx> This was just an example :) But the authentication could've been done via form posting over HTTPS and using crypto on the receiving end
[09:37:30] <kandie> guys, I'm really in over my head with this and would really appreciate any help - I'm using the C++ driver v2.1.0, and no matter what I do, I get assertions (not exceptions) whenever I try to look up a field that might not exist on a _valid_ BSONObject object
[09:37:48] <_johnny> well, if you don't have the user data anywhere else, i think you'll have to pull all, and do a client (or whichever retrieves the query) unique method.
[09:38:09] <kandie> what's funny is that it works fine so long as the fields are actually defined, but it asserts otherwise (and I can't even use obj.hasField() because that breaks all the same)
[09:38:09] <_johnny> robx: right, but it's "anonymous" then? i mean, my user/pw is not checked anywhere
[09:38:29] <robx> You can save it crypted in memory
[09:39:58] <robx> kandie: are you running mongo under virtualbox?
[09:40:24] <_johnny> hehe, sorry, i really don't understand how you can have users without a user db. i can log on right now with "herp/derp" and write blog posts? if your userdb is in memory, it would be cleared and you'd lose all those 40k users you have
[09:41:28] <kandie> robx: not really, I'm using arch linux x86_64, and this is happening on our live servers which run ubuntu
[09:42:49] <robx> _johnny you could cache it in an encrypted flatfile :)
[09:43:18] <_johnny> heh, true. but you can't use it for lookup?
[09:43:52] <robx> _johnny indeed you can, you read the cache into memory (decrypt it) and validate using that
[09:43:58] <kandie> robx: hey this worked! running BSONObject::getOwned() on the object i get from the cursor made it tick.. but that's strange because the documentation clearly states that BSONObject uses reference-based smart ptrs, so why should i need to do this?
[09:44:29] <_johnny> robx: ok, then what stops you from listing the users from that, rather than collecting from your blog posts and running a uniq() based on user id ?
[09:44:59] <robx> _johnny I don't have any listings of users in any file - as I said this was just an example :]
[09:45:36] <_johnny> still, i'd imagine if you put an index on posts.user you should just do a distinct query for posts.user. it should do what you want
[09:45:59] <robx> kandie: I'm not really sure - I would wait a couple of hours and then ask in this room again :)
[09:46:06] <_johnny> where a post is { title: "test", user: { user_id: .., username: .. } }, like you mentioned before
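A minimal shell sketch of what _johnny is suggesting; the "posts" collection and the nested field names are taken from his example above:

    db.posts.ensureIndex({ "user.user_id": 1 })
    db.posts.distinct("user.user_id")   // unique user ids
    db.posts.distinct("user")           // or the whole embedded user object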
[09:46:26] <robx> _johnny the index wouldn't do anything tho
[09:46:48] <_johnny> what do you mean wouldn't do anything? it would let you query posts.user
[09:48:24] <robx> Okay, let's say this: I've taken out all the user ids and stored them in an array, [100, 101, 102, 103, ...]. Now I need to get all users with this, but only want to get each result once.
[09:48:39] <robx> Then I would have to iterate over the array and do findOne (20 times)
[09:48:47] <_johnny> right. i'm setting up a test collection right now to check
[10:05:31] <robx> Then I still need to save those results in memory and then use those to query mongo, since I need limit, offset and queries
[10:06:00] <NodeX> what are you trying to achieve - I missed the first part
[10:06:16] <robx> Hello! I have a quick question. I have a document structure where each document has a sub document like: { field : value, subdocument : { field : value } }. I need to do a query where one of the fields in the subdocument is unique. Any suggestions? Or is the only way to do a distinct first, save those results in memory, then do a findOne for each of those results?
[10:06:22] <robx> Say we have blog posts, and each of those blog posts has a subdocument of the user posting it. We have 20k blog posts, among 100 users. Then we have a "list all bloggers" page, where we want to get the data from the blog posts
[10:06:23] <robx> (since we don't have any other data to go by)
[10:07:40] <NodeX> are they always unique (the subdocuments)?
[10:08:54] <NodeX> right I see ... but how come you can't query the "users" collection to get this list
[10:09:06] <NodeX> seems very expensive to distinct just for that
[10:18:44] <_johnny> is that not the unique user objects? :P
[10:18:44] <robx> _johnny if you need to list all those users, including their firstnames, lastnames, etc., what would the IDs give you?
[10:19:32] <NodeX> and herein lies the problem of thinking of Mongo and data storage in relational terms
[10:19:43] <_johnny> well, that's what i said earlier. just put the key you want. i took user.user_id, but you can just put user if you want the entire object
[10:20:11] <_johnny> robx: the id doesn't give me anything. that was just an example of what you can represent it as. look at my pastebin above
[10:23:39] <_johnny> NodeX: i just tried it, but i can't query on that
[10:24:03] <NodeX> you must have the syntax wrong - that is how I distinct things
[10:24:29] <robx> The only alternative seems to be saving the entire distinct result in memory and then using that to serve the results to the client.
[10:24:45] <NodeX> http://www.mongodb.org/display/DOCS/Aggregation#Aggregation-Distinct <--- distinct may also reference a nested key
[10:25:00] <_johnny> NodeX: i just made the example for distincting: db.runCommand({ distinct: 'posts', key: 'user'}).values <- that's what he wants to query lastname: 'Smith' on
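For what it's worth, the distinct command also accepts a query filter, which may cover the lastname case server-side (a sketch; the field names are assumed from the example above):

    db.runCommand({ distinct: 'posts', key: 'user', query: { 'user.lastname': 'Smith' } }).values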
[10:25:15] <robx> johnny now you understand my problem! :D
[10:25:23] <NodeX> robx : if it's just a list of users, why are you even querying the blogs collection?
[10:31:00] <robx> I have an application, people can upload DOCUMENTS, they have registered through a php site. They have a user ID. The document upload server is a node app. Node receives the user ID with all upload requests. Each user can upload 10 documents, thus meaning the user exists in 10 mongo objects. Now, I want to list all the users who use my application.
[10:31:23] <robx> But I only want to list them once, and I want people to be able to filter the users as well.
[10:31:31] <NodeX> do you want to know the easiest way to do it?
[10:32:31] <NodeX> second piece of advice ... if you want to count the number of documents each user has without looping all the time - save a count with each user
[10:32:43] <NodeX> $inc : 1 on upload, $inc : -1 on delete
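A quick shell sketch of the counter NodeX describes; the "users" collection and "doc_count" field are assumed names:

    var userId = 101;   // id of the uploading user (example value)
    db.users.update({ _id: userId }, { $inc: { doc_count: 1 } })    // on upload
    db.users.update({ _id: userId }, { $inc: { doc_count: -1 } })   // on delete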
[11:21:59] <augustl> we totally need "dbrotate", that gzips old data.
[11:22:18] <NinjaPenguin> does anyone have an ETA on the next update of the php driver?
[11:22:20] <unsleep> so then mongo is not a scalable distributed document database.. is it? :)
[11:22:43] <augustl> unsleep: I'd say the ability to add more disks to store more data is "scalable"
[11:23:40] <unsleep> but if you add more nodes to shard the same content without adding more space then.. it's not scalable.. is it?
[11:24:25] <augustl> I wouldn't say that a system that isn't able to store more data than what is available on the disks "isn't scalable"
[11:25:01] <augustl> "scalable" applies to the entire stack, not just the mechanics of the software
[11:25:30] <unsleep> i think it's scalable in resources but not space :-?
[11:25:53] <augustl> unsleep: sounds like you're only talking about CPU scaling
[11:26:12] <augustl> so it does seem like in some cases, scaling for better CPU utilization also requires adding more storage
[11:26:29] <wereHamster> unsleep: adding more nodes == adding more space
[11:27:27] <augustl> unsleep: so I think what you're saying is that some times, scaling for more distributed CPU usage also requires adding more disk
[11:27:38] <augustl> I disagree that this means "isn't scalable" though
[11:27:40] <unsleep> so if i add a new node i don't need to do anything special to store a collection bigger than the first node's size?
[11:28:00] <unsleep> i'm confused about the terms, sorry
[11:28:47] <wereHamster> unsleep: well, depends on your mongodb cluster. If you are using replica sets then all nodes contain the same data (and so the db is restricted by the size of the smallest node). But if you shard, then mongodb distributes the data across all available nodes.
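For reference, a rough shell sketch of setting up that kind of sharded collection, run against mongos (the database, collection and shard key here are placeholders):

    db.adminCommand({ enableSharding: "mydb" })
    db.adminCommand({ shardCollection: "mydb.posts", key: { userId: 1 } })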
[11:29:21] <augustl> no idea how mongo handles a 200mb large document if you only have db shards that are 100mb in size. Don't think that's a possible scenario though. Disclaimer: I haven't used mongo yet ;)
[11:29:57] <wereHamster> augustl: the size limit on a single document is 16M
[11:31:01] <augustl> and the minimum size of a db is 200mb iirc? So it seems like it is indeed an impossible scenario
[11:44:20] <Derick> that makes little sense actually
[11:44:50] <NodeX> NinjaPenguin : a possible work around is to have your FPM restart after a lower number of requests
[11:44:52] <NinjaPenguin> could there be some issue with cycling the connections in FPM perhaps?
[11:45:13] <NodeX> it's the max_requests paramter iirc
[11:45:18] <Derick> NinjaPenguin: what happens if you set http://uk.php.net/manual/en/mongo.configuration.php#ini.mongo.is-master-interval and http://uk.php.net/manual/en/mongo.configuration.php#ini.mongo.ping-interval to 0
[11:49:21] <Sim0n> Hiya. I'm doing maintenance on a 3-node replica set. I have brought one of the nodes down for reasons not related to my question. Question is: can I install mongodb 2.2 on the down node and add it back to the replica set without complications?
[12:15:15] <FerchoDB> does anyone use the C# mongo driver? I'm trying to do queries and return results as "dynamic". Is that possible? Or must I use a mapped class?
[12:19:44] <Gabrys> I'm playing with mongodb sharding
[12:19:58] <FerchoDB> it seems that there was a "findOne" method that returned dynamic, but now there is only findOneAs
[12:20:39] <Gabrys> I'm pushing millions of documents to mongos using the mongoimport command and it seems it distributes the data to one shard at a time, then switches to the second
[12:21:35] <Gabrys> the performance is 10,000 inserts per second (and if pushing data to mongod directly it's 15,000/s)
[12:21:57] <Gabrys> I thought sharding should increase insert throughput, not limit it
[12:22:08] <Gabrys> and that shards would be equally loaded during massive import
[12:24:47] <NodeX> I would think by reading that document that an _id shard key would fill up shard 1, then move to shard 2 and so on
[12:26:41] <FerchoDB> correction again: the official doc mentions findOne() as a method, although it is not available in the latest version of the library
[12:27:51] <Gabrys> NodeX, I tried more "natural" shard key before, but results were similar, I'll try again
[12:29:07] <Gabrys> NodeX, is it possible (and makes sense) to shard by ObjectId hash?
[12:30:00] <NodeX> you're probably better off waiting for someone who knows more about sharding than me, certainly about write scaling anyway
[12:36:18] <Derick> Gabrys: depends on why you shard
[12:51:38] <Gabrys> ok, I'll try with the countryId, userId, categoryId sharding key
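A sketch of declaring that compound shard key (the "mydb.items" namespace is a placeholder; the key fields are the ones Gabrys names):

    db.adminCommand({ shardCollection: "mydb.items",
                      key: { countryId: 1, userId: 1, categoryId: 1 } })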
[12:52:00] <NodeX> are you indexing on import as well?
[12:52:09] <Derick> phira: only in the case of an exact full row... you can however add a timestamp into the hash to fix that
[12:53:13] <phira> this is ProbablyTrue(™), in that a cryptographically secure hash should generate results that are roughly indistinguishable from a random output in that case. Non-crypto hashes can give you unbalanced results in that case.
[12:55:30] <Derick> http://programmers.stackexchange.com/questions/49550/which-hashing-algorithm-is-best-for-uniqueness-and-speed/145633#145633 is a great post on this btw
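A hedged Node.js sketch of the hash-plus-timestamp idea phira and Derick are discussing, computed client-side and stored as its own field to shard on (all names here are made up):

    var crypto = require('crypto');

    // hash the row content plus a timestamp, so two identical rows still get distinct keys
    function shardKeyFor(doc) {
        return crypto.createHash('sha1')
                     .update(JSON.stringify(doc) + Date.now())
                     .digest('hex');
    }

    var doc = { countryId: 1, userId: 42, categoryId: 7 };
    doc.shardKey = shardKeyFor(doc);   // then shard the collection on { shardKey: 1 }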
[13:08:57] <_ollie> hi all, anyone aware of details of the $and operator?
[13:09:55] <_ollie> different documentation pages imply different semantics unfortunately…
[13:11:00] <_ollie> the first one shows multiple constraints on a single key, whereas the latter implies you can also use it for different keys… esp. for the latter, I wonder what the difference is compared to a plain { foo : "bar", bar : "foo" }…
[13:37:03] <FerchoLP> has anyone run mongodb on Damn Small Linux?
[14:08:05] <souza> Has anyone manipulated dates in C with mongoDB? I saw in the API that there's a typedef "bson_date_t", but I didn't find any more documentation about it. If you know some link that talks about it, or know the answer to my question, I'll be very happy - it would solve my problem! :)
[14:09:24] <kali> souza: as far as i remember, it's a int64, number of millisec since epoch...
[14:10:17] <souza> Humm, and can I set this value with a simple assignment? Like "bson_date_t date = 56468418465;"
[14:11:03] <kali> souza: http://api.mongodb.org/c/current/api/bson_8h_source.html look at line 151
[14:13:00] <souza> kali: humm, well it looks good, but i'm a newbie in C - do you know how I can set this value?
[15:21:42] <guest232131> what "would" be is of zero interest; of interest is what is actually implemented
[17:01:04] <augustl> seeing some weird behaviour with the mongodb driver for Node.js. A call to db.open() causes the whole process to terminate, after "undefined" is logged
[17:12:37] <anlek> Hello everyone. I'm trying to export a few documents from a collection in one DB into a second DB. Any ideas on what the best way to do this is?
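One shell-only way to do that, in case it helps (the database/collection names and the filter are placeholders):

    var src = db.getSiblingDB("dbA").mycoll;
    var dst = db.getSiblingDB("dbB").mycoll;
    src.find({ someField: "someValue" }).forEach(function (doc) { dst.insert(doc); });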
[19:21:32] <dgottlieb> giessel: 16MB is definitely enforced on the server side. The constant can be configured at the compilation stage.
[19:23:07] <giessel> dgottlieb: awesome. ya i'm looking at the pymongo driver and it has a constant- but as written it is still at 4mb, which is obviously old
[19:23:09] <dgottlieb> giessel: as for drivers, I believe the standard is supposed to be that the driver sends a maxBsonSize query to a server after connecting, but I believe not all drivers are written to that spec
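That handshake value is also visible from the shell, for reference:

    db.isMaster().maxBsonObjectSize   // 16777216 (16MB) on a stock build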
[19:23:38] <giessel> ya- there is a class property which i've changed, but i think it is overwritten at some point in the connection init
[19:23:56] <giessel> it def says 16mb on the error, even tho the constant is defined at 4mb
[19:24:00] <giessel> i will look at the config flag
[19:24:17] <giessel> we're working with large scientific data sets
[19:25:59] <giessel> i've written a workaround that checks for size limits and puts things in a gridfs instead, swapping out the entries in the doc for gridfs refs
[19:26:04] <giessel> but this is fragile and annoying, too
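Roughly what that check looks like, sketched in the mongo shell rather than giessel's actual pymongo code:

    var doc = { payload: "...large scientific data..." };   // stand-in document
    var maxBytes = db.isMaster().maxBsonObjectSize;
    if (Object.bsonsize(doc) > maxBytes) {
        // too big for a single document: store the payload in GridFS
        // and keep only a reference to the GridFS file in the doc instead
    }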
[19:27:29] <dgottlieb> i believe* you can try recompiling mongod with a bigger bson size and it sounds like the pymongo driver would pick up on the change
[19:27:54] <dgottlieb> if everything is coded correctly, there should be no weird dependencies :)
[19:28:01] <giessel> that's exactly what i'm going to try. HAH let's hope, huh?
[19:28:43] <giessel> i'll bbl. thanks for your help!
[21:19:53] <dstorrs> next question -- I've got queries in the currentOp output that say "query not recording (too large)". What does this mean in practical terms? presumably I've hit an unindexed query, or something...?
[21:21:10] <dstorrs> according to the bug in JIRA, there's no way to know which query this is "unless it happens to appear in the log". What should I be looking for?
[21:25:18] <dstorrs> Derick: I've got a pair of map/reduce jobs running over a very large collection. they are taking way too long. If I kill the server, am I putting my data at risk?
[21:25:43] <dstorrs> I have journalling enabled, so I THINK I'm safe, but I'd like to check
[21:33:36] <locojay1> hi anyone running mongo-hadoop-streaming on cdh3u3 or cdh3u4? keep on getting java.io.IOException: Cannot run program "mapper.py": error=2, No such file or directory
[21:33:57] <locojay1> regular python streaming works fine
[21:35:30] <dstorrs> stupid question -- is that a literal $cmd ?
[21:35:38] <halcyon918> hey folks, I've been digging around the mongo docs and not seeing an answer… I'd like to see if I can connect to a remote replica set from the CLI to test it out (our DBAs set us up a replica set and I'd like to test it before I code into it). do I need a particular CLI command to do that?
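One way to sanity-check it from the plain mongo shell: connect to any member with something like "mongo db1.example.com:27017/mydb" (the hostname is a placeholder), then run:

    rs.status()   // lists the members and their current state
    rs.conf()     // shows the replica set configuration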
[21:53:17] <Derick> https://jira.mongodb.org/browse/DOCS but I think it's 10gen private
[21:55:38] <kchodorow_> it would be cool if we had some shell tool packaging system to "install" js files that weren't built-in
[21:56:44] <tychoish> Derick: the DOCS project is public
[21:57:01] <tychoish> and shell improvements should be filed in SERVER
[22:08:56] <ukd1> Hey guys, I'm doing some ruby and have been using the mongo driver directly, I'd like to use either mongoid or mongomapper - is there any clear reason to use one over the other?
[22:50:48] <syphon7> does anyone know if FindAndModify will create a document if it doesn't exist?
[22:53:48] <giessel> dgottlieb: hello again- finally back at a computer. do you have a reference for setting the max bson document size at compile time?
[22:54:01] <giessel> dgottlieb: i've looked in the Sconstruct file and didn't see any options...
[22:58:33] <dgottlieb> giessel: Looking into things a bit deeper, I believe I was mistaken when I said it was "configurable", but it seems all that's necessary is changing the constant in the source here: https://github.com/mongodb/mongo/blob/master/src/mongo/bson/util/builder.h
[22:59:59] <dgottlieb> another bit of "evidence": https://github.com/mongodb/mongo/commit/b357c3ea89ef9374dd775c326f75d404bebe7f68
[23:00:54] <giessel> dgottlieb: thank you for finding that for me, that's really super kind
[23:01:28] <giessel> let it be known that today, someone got EXACTLY the answer they were looking for when asking on irc ;)
[23:02:40] <dgottlieb> No worries. I've heard about the max bson size being changeable at compile time and all that, but I never had a need to try it myself. I'm curious to hear if everything works as expected or if maybe some other limitations pop up.
[23:05:07] <dgottlieb> syphon7: I'm pretty sure it does not, but you can pass in an upsert: true option
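A shell sketch of the upsert variant dgottlieb mentions (the collection and field names are made up):

    db.things.findAndModify({
        query:  { _id: "job-42" },
        update: { $set: { status: "active" } },
        upsert: true,    // create the document if it doesn't exist
        'new':  true     // return the modified (or newly created) document
    })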
[23:06:45] <augustl> for the record, emptying all collections between each test is much faster than deleting the db between each test ;) </randomobservation>
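The between-test cleanup augustl describes can be done from the shell roughly like this (a sketch):

    db.getCollectionNames().forEach(function (name) {
        if (name.indexOf("system.") !== 0) {
            db.getCollection(name).remove({});   // empty it, but keep the collection and its files
        }
    });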
[23:08:57] <dgottlieb> I doubt it would be hard to add as a compile option, I think it may just be more of an issue where scary things may happen if, say, a replica set or sharded cluster has mongods with mixed max bson sizes
[23:10:18] <dgottlieb> augustl: dropping a DB actually deletes files on the filesystem so the speed of that depends on which filesystem is being used. Dropping a collection does not reclaim any space on disk, but does have the advantage of being fast regardless of filesystem choice
[23:11:06] <augustl> dgottlieb: I noticed, it created 3 files for each test, one 16mb, one 64mb and one 128mb, that will take a little bit of time on most systems I guess :)
[23:11:34] <augustl> it seems to write zeroes to the entire file too, so it's not a sparse file
[23:14:30] <dgottlieb> augustl: yes all new database files get pre-filled with 0s so (once again depending on the filesystem choice) it can take some time for creation
[23:37:23] <giessel> dgottlieb: ya i don't even know how to think about that
[23:38:04] <giessel> dgottlieb: i'm running into some compile issues because i think the large size i changed it to makes some comparisons/assertions fail due to int type issues
[23:43:52] <giessel> dgottlieb: or maybe i just need to change "UncommittedBytesLimit" as well
[23:57:19] <landongn> can anyone here help me understand why i've got 240+ active cursors open at all times? it's only one primary on one replica set that just spikes and then just sits there.
[23:57:44] <landongn> i can't seem to find any docs on it- and the db.currentOp() detail isn't all that helpful- there's nowhere near 240 active processes
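For what it's worth, the raw counters behind that cursor number are visible from the shell; a quick check (the example output fields are from memory, so treat them as approximate):

    db.serverStatus().cursors   // e.g. { totalOpen: 240, clientCursors_size: 240, timedOut: 0 }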