[02:57:44] <pasky_> Hi! Is there an easy way to make a newly inserted object have an _id that is a plain string instead of ObjectId? (Unfortunately, having it ObjectId seems to produce endless pain in conjunction with Meteor.)
[02:57:56] <pasky_> (I'm using pymongo to insert objects.)
[03:05:40] <pasky_> ok, adding '_id': str(bson.objectid.ObjectId()) to the inserted object seems to do the trick \o/
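For reference, a minimal sketch of pasky_'s trick, assuming pymongo with a local mongod and a made-up 'items' collection:

    from pymongo import MongoClient
    from bson.objectid import ObjectId

    client = MongoClient()  # assumes a local mongod on the default port
    db = client.test
    # generate the ObjectId client-side and store its string form as _id
    db.items.insert({'_id': str(ObjectId()), 'title': 'example'})
    # the stored _id is now a plain string rather than an ObjectId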
[07:53:12] <TraceRoute> hey all, I'm looking at using mongodb to add redundancy to my infrastructure. Could someone confirm this is possible? I have servers at different locations and a parent mongodb which acts as the master. The servers at different locations should only have data which relates to their location. Is there some way I can use replication to only pull down records that relate to the location and also push local changes back to the master?
[07:56:36] <sag> i think you have to manage that in the schema
[07:57:17] <sag> i mean you have to define your collections that way..
[07:59:29] <TraceRoute> ideally that's how the root schema would start…a location has a bunch of objects inside. I only want the one server to have replication of 1 root object
[07:59:32] <sag> i dont think this can be possible with replica sets
[08:02:17] <shmoon> i do db.coll.find()... and also db.coll.find({..}) , assign it to a variable, i need to access _id now in the mongo console but i am failing
[08:02:36] <shmoon> basically i want to get the date and time for each document from find() off their _id (since i think _id has a timestamp)
[08:02:43] <TraceRoute> thx sag, I'll look into it more
[08:03:33] <sag> @TraceRoute : why do you need a primary mongo.. as you already have some store..
[08:03:50] <sag> @TraceRoute and you just want redundancy
[08:13:24] <sag> shmoon: you need all ids from all collections? or all ids from some collection?
[08:14:45] <shmoon> sag: let's do 1 by 1. let's say i do this db.coll.find({table_id:5}).limit() now how do I get the _id from it and maybe store it in a variable in the console?
[08:18:33] <sag> shmoon: find always returns a document
[08:21:20] <sag> shmoon: so you will always get the whole document unless specified, e.g. db.coll.find({table_id:5},{title:1}) will only return _id and title..
[08:59:49] <shmoon> or maybe i am missing something
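What shmoon is after works because the first four bytes of an ObjectId encode its creation time; the mongo shell exposes this as ObjectId.getTimestamp(), and pymongo as the generation_time property. A sketch using the collection and query from the question above:

    from pymongo import MongoClient

    db = MongoClient().test  # assumes a local mongod
    for doc in db.coll.find({'table_id': 5}):
        # generation_time decodes the timestamp embedded in the ObjectId
        print(doc['_id'], doc['_id'].generation_time)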
[09:14:13] <untaken> Anyone familiar with the CPAN modules for MongoDB? What is the most efficient way to set up paging? When I check the cursor afterwards with has_next, it seems to have pulled all the rows from the collection. There must be a more efficient way when I set limit and skip? Maybe it's just the perl module, but was hoping someone may know around here?
[09:15:36] <Derick> untaken: that doesn't seem related to just the perl driver
[09:17:12] <untaken> Derick: I know, but wasn't sure where to look next... really was after some pointers :)
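A hedged pointer for untaken: the usual paging pattern is sort + skip + limit, which the Perl driver's cursor also exposes; shown here as a pymongo sketch with made-up names. Note the server still walks the skipped documents, so range queries on an indexed field page more efficiently on large collections:

    from pymongo import MongoClient

    db = MongoClient().test  # assumes a local mongod
    page, per_page = 3, 10   # fetch the third page of ten documents
    cursor = (db.items.find()
                .sort('_id', 1)                 # stable order for paging
                .skip((page - 1) * per_page)
                .limit(per_page))               # caps what the server returns
    docs = list(cursor)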
[09:17:23] <untaken> bit of channel attack there :/
[09:17:36] <Derick> seems odn is a bit flakey at the moment
[09:20:00] <[AD]Turbo> Sorry, I must post my question again (split sucks)
[09:20:03] <[AD]Turbo> I have a collection (Items) with a 2dsphere index on a 'pos' field (db.Items.getIndexes() returns me "key" : { "pos" : "2dsphere" }, "name" : "pos_2dsphere") but when I query that collection db.Items.find({ 'pos': { $geoWithin: { $box: [ [0, 0], [50, 50] ] } } }).explain() I see that a "BasicCursor" is chosen instead of the index. Is there a reason?
[09:20:42] <Derick> [AD]Turbo: geowithin with box wants a 2d index
[09:21:07] <Derick> "$box" is a flatland feature, not a spherical one
[09:21:16] <[AD]Turbo> ah, but I need to query that collection with $nearSphere too
[09:21:45] <Derick> [AD]Turbo: try geowithin + geojson where you construct the polygon yourself
[09:22:34] <[AD]Turbo> does the "2d" index support $nearSphere and geoWithin+box at the same time?
[09:22:55] <[AD]Turbo> if so, i can use a 2d (not a 2dsphere)
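A sketch of what that would look like, on the understanding (hedged) that a 2d index over legacy [x, y] pairs serves both the flat $box query and $nearSphere, with $maxDistance given in radians:

    from pymongo import MongoClient

    db = MongoClient().test  # assumes a local mongod
    db.Items.create_index([('pos', '2d')])
    # $box is a flat-plane operator, so it wants the 2d index
    db.Items.find({'pos': {'$geoWithin': {'$box': [[0, 0], [50, 50]]}}})
    # $nearSphere also runs against a 2d index on legacy pairs
    db.Items.find({'pos': {'$nearSphere': [10, 10], '$maxDistance': 0.1}})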
[09:23:19] <perplexa> http://pastebin.com/2rrzaXrR < can anybody please explain why i get this error? :(
[09:54:13] <moogway> hi, i am trying to use mongodb with python and wondering about ORMs... I prefer pymongo over anything else but it doesn't support declaring models, and I'm not sure how to tell mongo to create/ensure an index on a collection from my app. Do I have to create the db and collections manually using the mongo shell?
[09:57:21] <shmoon> is this a planned feature? wonder if you know or not
[09:57:56] <moogway> I am using db.test.create_index([("FieldName", pymongo.DESCENDING)]) but isn't that going to create an index every time an instance of the app is called?
[09:58:16] <Derick> moogway: it's going to try - is there an "ensure_index" perhaps?
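pymongo 2.x does have ensure_index; it remembers recently created indexes client-side and skips the server round-trip, so calling it on every app start is cheap. A sketch with moogway's field name:

    import pymongo
    from pymongo import MongoClient

    db = MongoClient().test  # assumes a local mongod
    # no-op if the driver already created this index recently
    db.test.ensure_index([('FieldName', pymongo.DESCENDING)])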
[10:00:01] <Zelest> Derick, like, I want to store my sessions in mongodb and use a ttl collection for it.. but I don't want to reindex it every page hit :P
[10:00:30] <moogway> thanks Derick, I like mongoengine because it is declarative but prefer to use pymongo for the sake of avoiding third party libraries
[10:00:35] <Derick> Zelest: sorry, ttl is on a collection, isn't it?
[10:01:56] <Derick> Use the expireAfterSeconds option to the ensureIndex method in conjunction with a TTL value in seconds to create an expiring collection
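A sketch of that TTL setup in pymongo, with made-up collection and field names; documents expire roughly expireAfterSeconds after the value in the indexed date field:

    import datetime
    from pymongo import MongoClient

    db = MongoClient().test  # assumes a local mongod
    # create once; sessions vanish ~1 hour after their createdAt
    db.sessions.ensure_index('createdAt', expireAfterSeconds=3600)
    db.sessions.insert({'sid': 'abc123',
                        'createdAt': datetime.datetime.utcnow()})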
[10:03:34] <Zelest> yet I'm speeded like a freak :D
[10:04:16] <[AD]Turbo> how to remove the limit of 100 documents returned by $nearSphere, is it possible? my nearSphere queries all have a $maxDistance parameter but I'm not assured that results would be less than 100 docs
[10:05:04] <Derick> [AD]Turbo: you can add $limit: 500 f.e.
[10:06:38] <[AD]Turbo> from the official documentation only the geoNear command has an additional 'limit' / 'num' parameter; anyway I don't see the reason for such a limitation (usually .find returns all documents)
[10:07:44] <Derick> let me shut up and try it first
[10:10:20] <[AD]Turbo> I can't really understand, from a developer point of view, why such a 'limit' limitation was introduced for geo queries (and not for standard non-geo queries)
[10:11:17] <Derick> [AD]Turbo: because it's not a fast operation to have it unbound
[10:12:12] <[AD]Turbo> couldn't the same approach used for standard queries be applied to such geo-queries? memory pools?
[10:13:06] <[AD]Turbo> if I set limit to, for example, 100000000, does mongodb allocate some memory of that amount or so?
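For the 100-document cap itself, the geoNear command accepts an explicit num; a pymongo sketch with made-up coordinates (command documents are order-sensitive, hence SON):

    from bson.son import SON
    from pymongo import MongoClient

    db = MongoClient().test  # assumes a local mongod
    result = db.command(SON([
        ('geoNear', 'Items'),
        ('near', [10, 10]),
        ('spherical', True),
        ('maxDistance', 0.1),   # radians for legacy coordinate pairs
        ('num', 500),           # raise the default cap of 100 results
    ]))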
[10:47:15] <sag> kali: as per the release note, "the MongoDB can perform a more efficient sort that does not require keeping the entire result set in memory"
[11:51:58] <perplexa> rofl, turn on music, chill with headphones on, suddenly everybody starts looking into my direction, realise speakers are at max volume and headphones unplugged :D
[11:54:09] <borior> hi all, so I'm having trouble configuring a replicaset when the nodes are portmapped (NATed) behind another IP address
[11:54:45] <borior> specifically, in my argument to rs.initiate, I'm specifying the hosts with their "external" IP addresses: 192.168.X.X
[11:54:53] <huahax> ey, someone want to help me with some basic queries in nested arrays? :)
[11:55:42] <borior> then the node that's running initiate runs getMyAddrs(), which returns a list that doesn't include that address, and the replSet initialization bombs out with: "couldn't initiate : can't find self in the replset config"
[11:56:17] <borior> is there any way around this? why on earth does mongo need to know the IP on which it is exposed to the rest of the world?
[12:15:53] <perplexa> kali: so, when i have db.auctions.aggregate( { $match : { bId: 893634 } } ) which yields a result >16MB, will it produce an error?
[12:17:06] <kali> perplexa: well, all tools have an application domain
[12:37:12] <borior> so if my interpretation of this is correct, it's basically impossible to run a replicaset in which members refer to one another using NATed IP addresses...
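One workaround, sketched here with hypothetical hostnames: put DNS names rather than raw IPs in the replica set config, so each side of the NAT resolves them to whichever address works locally:

    from pymongo import MongoClient

    client = MongoClient('node0.example.internal')  # the would-be primary
    config = {'_id': 'rs0', 'members': [
        {'_id': 0, 'host': 'node0.example.internal:27017'},
        {'_id': 1, 'host': 'node1.example.internal:27017'},
    ]}
    client.admin.command('replSetInitiate', config)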
[13:17:29] <dahankzter> Is it possible to have a more complex GeoJSON structure, say each point has time data associated
[13:17:57] <dahankzter> or is it locked to the type/coordinates structure?
[13:59:45] <bdiu> Anyone interested in a full time gig in Bloomington, IN w/ paid relocation? :-)
[14:10:32] <dahankzter> Is it possible to have a more complex GeoJSON structure, say each point has time data associated
[14:11:01] <dahankzter> No one knows? The online GeoJSON parsers out there give ambiguous info
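On dahankzter's question: the GeoJSON value itself is locked to the type/coordinates shape, but extra fields can live beside it in the same document. A sketch with made-up names:

    import datetime
    from pymongo import MongoClient

    db = MongoClient().test  # assumes a local mongod
    db.tracks.insert({
        'pos': {'type': 'Point', 'coordinates': [13.4, 52.52]},  # GeoJSON proper
        'ts': datetime.datetime.utcnow(),  # time data next to it, not inside it
    })
    db.tracks.create_index([('pos', '2dsphere')])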
[14:37:03] <richthegeek> I have an array of values [1,2], and I want to find all rows which contain any of those values in their own array... so a row might be {_id: 42, vals: [2, 3]}
[14:37:17] <richthegeek> is {vals: {$in: [1,2]}} the right query?
[14:40:34] <rspijker> richthegeek: from the documentation: If the field holds an array, then the $in operator selects the documents whose field holds an array that contains at least one element that matches a value in the specified array (e.g. <value1>, <value2>, etc.)
[14:41:07] <richthegeek> rspijker: yeah, it's working now - I was using $not instead of $ne !
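richthegeek's query was indeed right; a self-contained sketch of $in matching against array fields, with a made-up collection:

    from pymongo import MongoClient

    db = MongoClient().test  # assumes a local mongod
    db.rows.insert({'_id': 42, 'vals': [2, 3]})
    # matches: 'vals' shares the element 2 with the query array
    print(list(db.rows.find({'vals': {'$in': [1, 2]}})))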
[15:45:17] <zymogens> Am a complete noob to mongodb. Have a quick question. I have a few object arrays inside a mongodb doc. It seems that mongodb assigns an object_id to all of them. I realise every doc needs an object_id, but does every object inside a doc also need one? Thanks.
[15:45:20] <rodrigofelix> ok, so all replicas have the records, right?
[15:46:46] <rodrigofelix> is this configurable? could I say that I want to have, for instance, only two replicas storing a specific record, even if my cluster has 10 nodes?
[15:47:45] <rodrigofelix> I'm trying to have a similar env to compare mongodb with cassandra
[15:48:06] <rspijker> rodrigofelix: afaik you can't. But I am not sure
[15:48:43] <rodrigofelix> ok. I'll try to figure out if I can change cassandra to work like mongo, replicating all data in all nodes
[15:52:32] <rspijker> zymogens: you shouldn't have to do anything… If you insert a document into mongo and you don't include an _id, mongo will create it for you. For embedded documents this is simply not the case...
[15:52:45] <nickmbailey> rodrigofelix: because it isn't mongo
[15:54:31] <rodrigofelix> :) .. ok, but what aspect of the cassandra architecture (or strategy) are you considering when you say that cassandra does not perform well when all nodes have the same data?
[15:54:34] <zymogens> rspijker: thanks. there seems to have been an object_id for every object I have in each document. Not sure how they got there… Will look into it some more.
[15:55:17] <rodrigofelix> there are some points that I need to align when benchmarking cassandra and mongodb and I think that replication factor is one of them
[15:55:23] <nickmbailey> well it would probably do fine if you only have 3 nodes (3 is a very common replication factor) but if you are trying to apply that generally (with N nodes) then you are missing the point of cassandra
[15:56:23] <rodrigofelix> I think about varying from 3 to 5 nodes in my experiments, both on cassandra and mongodb
[15:56:36] <rspijker> zymogens: you can easily test it. db.Collection.insert({}) will create a document with an _id. db.Collection.insert({"c":{}}) will create a document with an _id and an empty subdocument "c" inside.
[15:56:44] <rodrigofelix> I'll try to run both with their default config and gather some results
[15:57:10] <rspijker> why do you want such a large number of replicated nodes if you don't mind me asking?
[15:59:15] <zymogens> rspijker: Is that a question for me?
[15:59:47] <rodrigofelix> well, I could have fewer replicated nodes. my main concern is trying to have a fair comparison between cassandra and mongodb, trying to have similar configs although I know this is not that easy, since they have many different strategies
[16:00:33] <rodrigofelix> my idea was changing cassandra, because I can't see (for now) how to change replication factor of mongodb
[16:01:37] <rspijker> if you want to compare them, then you shouldn't look at having the exact same amount of nodes and replication in both cases (imo). You should look at a similar setup in terms of failover...
[16:01:42] <rodrigofelix> but I believe the best I can do is to benchmark with default configs of both and then compare how elastic they are according to some metrics I'm defining
[16:01:59] <zymogens> rspijker: Just tried it out there… Seems to not need an object_id… Thanks
[16:02:01] <rspijker> as in, if you have 9 cassandra nodes with a replication factor of 3 then you could have 3 mongo shards with 3 replica set members
[16:02:50] <rspijker> anyway, I got to go. Good luck :)
[16:03:10] <rodrigofelix> I understood your point. I'm gonna think about it
[16:41:29] <huahax> anyone have experience with nested arrays?
[16:55:39] <zymogens> Hi, Am using Mongoose. Object_IDs are being automatically generated for nested objects in a doc before I do a save() … Any idea how I'd prevent them being generated?
[16:58:12] <huahax> anyone have experience with nested arrays?
[17:06:18] <huahax> is it possible to push to an array in another array..?
[17:08:09] <mmlac-bv> Can a compound index a -> b -> c -> d replace having another index a -> b -> d?
[17:09:52] <mmlac-bv> And does the order matter to a query? Or is an index a -> b -> c as good as c -> b -> a if the query is e.g. find id: a where b=1 and order by c?
[17:32:54] <huahax> would you say it is practically impossible at the moment to make a nested array dynamic?
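On huahax's nested-array question: one level of nesting is workable via the positional $ operator, but only a single $ is allowed per update, which is why deeper dynamic nesting is so painful. A sketch with made-up field names:

    from pymongo import MongoClient

    db = MongoClient().test  # assumes a local mongod
    db.coll.insert({'outer': [{'name': 'a', 'inner': []}]})
    # $ resolves to the index of the 'outer' element the query matched
    db.coll.update({'outer.name': 'a'}, {'$push': {'outer.$.inner': 42}})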
[17:32:59] <kali> mmlac-bv: if you want the index to be efficient, you need mongodb to be able to answer the query without making too many random accesses
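To mmlac-bv's two questions: an index serves only queries that use a prefix of its keys, so a -> b -> c -> d cannot replace a -> b -> d, and key order matters. A sketch:

    from pymongo import MongoClient

    db = MongoClient().test  # assumes a local mongod
    db.coll.create_index([('a', 1), ('b', 1), ('c', 1), ('d', 1)])
    # served: {a, b} is a prefix and c is the next key, so the sort is free
    db.coll.find({'a': 5, 'b': 1}).sort('c', 1)
    # not served for the sort: skipping c breaks the prefix, so a separate
    # (a, b, d) index would still be needed
    db.coll.find({'a': 5, 'b': 1}).sort('d', 1)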
[17:33:17] <mmlac-bv> well the issue right now is the indexes are so massive they won't even fit into memory
[17:52:03] <Dark_Sun> "errmsg" : "still syncing, not yet to minValid optime 51dc40a8:49b"
[17:52:16] <Dark_Sun> what's the exact meaning of this ?
[18:12:31] <TheDeveloper___> why does update()'ing $inc counters on a single node work, but as soon as I switch on sharding it drops most of the writes?
[19:34:16] <AndrewD> Hey, I'm trying to use pymongo to create unique IDs for some dictionaries, but when I call " bson.objectid.ObjectId " it always returns the same value. Any ideas?
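AndrewD's symptom usually means the ObjectId class (or one cached instance) is being referenced rather than called; each ObjectId() call yields a fresh id. A sketch:

    from bson.objectid import ObjectId

    print(ObjectId)    # the class object itself: always 'the same value'
    ids = [str(ObjectId()) for _ in range(3)]
    print(ids)         # three distinct ids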
[20:04:28] <oogabubchub> Anyone know which performs better: compound indexes or embedded doc indexes? It would be on IDs stored in an array vs. an embedded doc for each ID. Typical circumstance is 1 or 2 IDs over millions of docs
[20:05:43] <solars> hi there, the compact docs say "but unlike repairDatabase it does not free space on the file system" - but other sites say that it compacts the db and frees disk space - what's true now?
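Both of solars's sources can be right at once: compact defragments a collection inside the existing data files (freed space becomes reusable by MongoDB) but does not shrink the files on disk, whereas repairDatabase rewrites the files and can return space to the OS. A sketch of issuing both from pymongo, with a made-up collection:

    from pymongo import MongoClient

    db = MongoClient().test  # assumes a local mongod
    db.command('compact', 'sessions')  # in-place defrag; blocks its database
    db.command('repairDatabase')       # rewrites files; needs free disk headroom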