[01:45:11] <Antiarc> I have an array "values" which contains arrays which are pairs of [timestamp, value]. I'm unwinding values, then I want to sort the unwound set on values[0][0], but $sort: {"v.0.0": 1} doesn't seem to do anything useful
[01:45:37] <Antiarc> So I thought maybe I could $project the fields into date and value fields, but I can't project $value.0 either - it ends up projecting an empty array
[01:47:34] <Antiarc> That is, after unwinding, I end up with something like this: https://gist.github.com/cheald/8c3408b2d609424b9125 (though it's several thousand documents)
[01:48:12] <Antiarc> I want to sort on v[0][0] there, but I can neither project v[0][0] to another field, nor $sort on it.
[01:51:05] <Antiarc> https://gist.github.com/cheald/8c3408b2d609424b9125 -- that's effectively what I'm seeing. The first bit there is the result of just the unwind, then the second bit is what I get once I attempt to project from the array.
[01:51:31] <Antiarc> I could switch to using subdocuments here, but since it's always a 2-member array, the array seemed like it'd be more efficient. No real need for field names here.
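For reference, the subdocument variant Antiarc mentions sorts cleanly after $unwind; a minimal sketch, assuming a collection named metrics whose values array holds {t, v} subdocuments instead of two-element arrays:

    db.metrics.aggregate([
        { $unwind: "$values" },                                 // one document per {t, v} pair
        { $sort: { "values.t": 1 } },                           // named fields can be sorted after $unwind
        { $project: { date: "$values.t", value: "$values.v" } }
    ])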
[01:57:10] <dangayle> Does anyone know the #irc channel for the M101P MongoDB course that starts today?
[02:07:38] <joannac> I didn't know we had irc channels for courses
[02:11:40] <richo> I've got some collections I want to convert to capped. I don't really care about data loss. If I replicate into a slave, then try to cap it on the slave, is that going to upset the master?
[02:12:10] <Antiarc> I may be wrong but I don't think mongo's going to let a slave modify a collection
[02:13:48] <richo> Well, probably the crux of my question is "will a master let a slave upset it?"
[02:13:56] <richo> because if that's fine I can fiddle with my slave with wild impunity
[02:15:42] <joannac> richo: offtopic but do I know you from another network?
[02:16:45] <richo> joannac: halp me unsplode my mongo deployment :D
[02:21:26] <Antiarc> richo: I don't think you're going to be able to do that from a slave - mongo will just refuse to execute it on a slave. Maybe just mapreduce your origin data into your target collection?
[02:22:00] <richo> Antiarc: that sounds massively unpleasant. plus the whole issue is that the master is basically out of space
[02:22:07] <richo> I can migrate to the slave and masterfy it now
[02:22:29] <Antiarc> Ah, yeah. If the concern isn't the write lock then I'd just rs.stepDown the master and convert it on your slave
[02:22:30] <richo> which would give me the required breathing room, but capping a collection without write locking the db doesn't seem like it should have that many operational constraints
[02:23:59] <Antiarc> If fragmentation is significant you could regain some breathing room on the master by stepping it down, stopping mongod, blowing away the data dir, starting it back up, and letting it resync from the replica set
[02:24:12] <Antiarc> Basically performing a repair except without the need to have the extra disk space.
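A rough sketch of that resync dance (the dbpath and service details are assumptions):

    rs.stepDown()    // run in the mongo shell on the current primary; another member takes over
    // then, on that node at the OS level: stop mongod, empty its dbpath
    // (e.g. /var/lib/mongodb), and start mongod again. On startup it rejoins
    // the set and performs a full initial sync from the other members,
    // rewriting its data files without the old fragmentation.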
[02:24:57] <joannac> Antiarc: that means putting replication load on your old secondary / new primary
[02:25:17] <richo> that also hinges on us already having a replset
[02:25:23] <Antiarc> It does. But if you're just flat out of disk space on your old primary, you can't repair/compact
[02:25:28] <joannac> richo: hope you're prepared for the initial massive delete though
[02:25:38] <richo> is the behaviour of trying to convert from master-slave to a replset even defined?
[02:27:52] <Antiarc> replsets are a superset of master-slave - it's basically the recommended way to do master-slave setups now. You aren't going to get away from the write lock easily, though - that tends to be a big problem for this kind of thing, in my experience
[02:27:54] <richo> but but the whole "global write lock" thing is a problem
[02:28:08] <richo> well, they're not strictly a superset
[02:28:15] <richo> unless you create nothing but priority 0 readers I guess
[02:28:18] <joannac> you could probably do that when the secondary is in standalone mode, and then when you put it back it'll keep replicating, but I'm not 100% on that
[02:29:02] <richo> I might just see if our dataset is append only on the other tables and just replicate the missing data by hand after I lock the slave for a while
[02:29:17] <richo> can I replicate only some collections in master-slave?
[02:29:43] <richo> I saw the "only" config option but the docs don't really mention how/if it works with several dbs or collections
[02:30:08] <richo> because that'd be easiest, I'll cap the collections I don't care about, then replicate again to pull in the new data in the other collections, then just promote and blow away the old instance
[02:34:00] <Antiarc> You might want to spin up a miniature deployment to test behavior when you have a replication scenario with a collection capped on one node and not on the other
[02:34:34] <richo> yeah, I've been trying to avoid that if I can but it looks like that's what I'm doing
[02:35:06] <Antiarc> Something else to consider: depending on how long your secondary will be in standalone mode, make sure your primary's oplog is large enough to hold all the writes that happen while the secondary is offline
[02:35:19] <Antiarc> else you'll have to do a full replication from master, which would undo your capping work
[02:36:46] <Antiarc> If this is just trying to get some breathing room, you could run a manual delete to drop any documents that'd be dropped by the capping anyhow. Quick fix to get you some space to work.
[02:37:37] <Antiarc> I suspect that may make the capping faster, too, but I'm not 100% on the internals of how capped conversions happen when you have more data than the collection will allow
[02:42:47] <richo> the docs suggest that to reclaim space from delorted documents you need to run a repair
[02:42:51] <richo> (Which write locks the whole db)
[02:54:21] <Antiarc> If you're using BSON ObjectIds, they're orderable by insert time, since the most significant bytes of the ID are the timestamp at which it was created. So you can just sort by ID desc, find the (n+1)th document where n is the capped collection size in documents, and delete where _id: {$lte: whateverID}
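A sketch of that manual trim in the mongo shell; the collection name and the number of documents to keep are assumptions:

    var keep = 1000000;                               // target capped size, in documents
    var cutoff = db.events.find({}, { _id: 1 })
                          .sort({ _id: -1 })          // ObjectIds sort by creation time
                          .skip(keep).limit(1)
                          .next()._id;                // the (keep+1)-th newest document
    db.events.remove({ _id: { $lte: cutoff } });      // drop it and everything older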
[02:55:35] <Antiarc> What's the size of the db/collection in question?
[02:55:49] <richo> the whole mongo dataset is about 480G
[02:56:56] <Antiarc> is that storageSize or fileSize?
[02:57:16] <richo> nfi, it's the amount of disk that lib/mongo is consuming with its .n files on disk
[02:57:35] <Antiarc> okay, so that's fileSize -- you may be using significantly less actual data. check db.stats() from a console
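For reference, a quick way to compare the numbers ("events" is a placeholder collection name):

    db.stats()          // reports dataSize, storageSize and fileSize for the current db
    db.events.stats()   // per-collection breakdown of data size vs allocated storage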
[02:57:49] <richo> I'm not well versed in mongo, we've always been pretty clear about not putting data we care about into it
[02:57:54] <richo> but evidently some of our devs missed the memo :/
[02:58:14] <richo> almost certainly, I nuked a ton of data the other day to make runway
[02:58:45] <richo> but when that failed to help, I just found some other crap on the same partition to nuke in the meantime. The on disk footprint is growing again, so I'd suggest it's eaten up all that reclaimed space now
[03:02:24] <Antiarc> Well, if you can get your devs on board, you could have them start writing to a different capped collection so that data moving forward is capped, then you could just drop the collections in question once they would have been rolled out of the capped collection, but that may not be viable without more disk space
[03:03:16] <richo> It's pretty trivial to switch collections, but doing it in a coordinated fashion across all the things that want to poke at them won't be that much fun
[03:03:32] <richo> I effectively have tons of space
[03:03:35] <Antiarc> Another option might be to just set TTL on the collection to help limit growth
[03:03:37] <richo> I've got this slave that I can just promote
[03:03:46] <richo> and then I've got a master with ~200G free
[03:03:50] <Antiarc> You can build a background index for the TTL so that won't create a lock
[03:04:06] <richo> but that doesn't really solve the longterm problem
[03:04:32] <richo> I did toy with just sticking it on an 8TB EBS volume and planning to have died of something before it became a problem again though
[03:04:36] <Antiarc> Well, kinda, it's a time-constrained capped collection rather than recordcount-constrained (and doesn't have the write profile of a capped collection obviously)
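A sketch of the TTL variant, assuming documents carry a BSON date field such as createdAt (field name and expiry window are assumptions):

    db.events.ensureIndex(
        { createdAt: 1 },
        { expireAfterSeconds: 60 * 60 * 24 * 30, background: true }   // ~30 days; built in the background
    )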
[03:04:58] <richo> none of these problems actually deal with my underlying problem
[03:05:05] <richo> tl;dr can I tell master slave to only look at a subset of things?
[03:05:10] <richo> if that works, then I can fix this pretty easily
[03:05:35] <Antiarc> not AFAIK, but it's not something I've tried to do before
[03:06:37] <Antiarc> That would typically be more of a sharding concern
[03:06:53] <Antiarc> since replication is designed to permit failover, which would necessitate that both nodes have a complete view of the data
[03:07:20] <richo> so "consistent" is not exactly strongly defined in the first place
[06:58:40] <floatingpoint> how would I go about using the findOne() method in the context of nodeJS?
[07:00:03] <ron> "people ask the oddest questions"
[07:01:03] <floatingpoint> I don't know if that was directed at me, but I can't get findOne() to work in the context of a nodeJS route
[07:01:19] <floatingpoint> however, I can run the exact same query from mongoshell just fine
[07:06:33] <floatingpoint> the following code should return a single document containing a user's password. however, NodeJS is crashing hard when I run the function from a route. Yes, I've checked the DB connection. Yes, I've checked the input's validity. the findOne() function is the problem. http://pastebin.com/YbusXb4A
[07:33:06] <joannac> floatingpoint: what version of the node driver?
[07:43:54] <floatingpoint> it looks like I am getting the data, but I am having problems rendering it into a nodejs route
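For reference, a minimal callback-style sketch of findOne inside an Express route with the 1.x node native driver; the collection, field, and route names are assumptions, not floatingpoint's actual code:

    app.get('/password/:user', function (req, res) {
        db.collection('users').findOne({ username: req.params.user }, function (err, doc) {
            if (err) { return res.send(500); }      // surface driver errors instead of crashing the route
            if (!doc) { return res.send(404); }     // findOne yields null when nothing matches
            res.json({ password: doc.password });   // render inside the callback, not after it
        });
    });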
[08:02:29] <yati> Hi. This is about mongoengine. I have a Document called User and when I call disconnect() and then connect('some-new-db'), I expect User.objects to be empty. But that is not the case. In short, the Documents are using the old database (the first connect() call is in the settings module). Any insights?
[10:21:20] <tiller> What's the best tree structure for retrieving the full path from a child? I mean, I have a document per node, and I want to retrieve all documents that lie on the path to the given child
[10:21:46] <tiller> it looks like there is no structure that would allow me to do that in 1 query
[10:24:19] <Derick> you need to store the paths for this
[10:24:38] <Derick> then you can do a simple anchored regexp search
[10:25:06] <tiller> yup, but I'll have to do this in 2 steps, right?
[10:25:14] <tiller> First I find the child to retrieve the path
[10:25:22] <tiller> and then I find all its ancestors
[10:25:28] <cheeser> it'd be interesting to try writing a (dynamic) aggregation pipeline to find those without having to store the whole path on each node.
[10:26:42] <tiller> cheeser> I'll fetch the collection way more often than I'll modify it. So it's better if the search cost is minimized :)
[10:32:13] <tiller> yes, but in the end I'll have to first find the child to get the path, and then query again to get its ancestors (because I don't want only the ids)
[10:32:31] <tiller> Nodex> I don't really know, it can grow fast
[10:33:08] <Derick> tiller: you can alternatively *also* store all ancestors for each node
[10:33:12] <tiller> the collection may contain millions of nodes, but with at most 4 levels
[10:34:51] <tiller> Well, I think I'll have to do it in 2 requests anyway
[10:35:04] <Derick> there is nothing wrong with that :)
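A sketch of that two-query materialized-path lookup, assuming node documents shaped like { _id: "c", path: ",a,b," }:

    var child = db.nodes.findOne({ _id: "c" });                 // 1st query: the child carries its own path
    var ancestorIds = child.path.split(",").filter(Boolean);    // ",a,b," -> ["a", "b"]
    db.nodes.find({ _id: { $in: ancestorIds } });               // 2nd query: fetch the ancestor documents
    db.nodes.find({ path: /^,a,b,/ });                          // the anchored regexp goes the other way: all descendants of "b"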
[10:57:45] <tiller> Just to know, can we do a "recursive" aggregate with mongodb? I mean, if I have for example: {val: 1, childs: [{val: 2, childs: [{val: 3, childs: []}]}, {val: 4, childs: []}]}, to find whether I have "val = 4" somewhere in the "tree", without knowing the depth
[11:00:52] <Nodex> personally I wouldn't use mongo for that, I would use a redis hash
[12:10:08] <tijmen> Hi all, if I restart a running mongo with oplogging enabled can I then still do a mongodump with --oplog? I understand it will only contain oplog data from the moment of restart. If this works, will this also allow mongorestore to read this partial oplog data?
[13:23:52] <iliyas> I'm working on a particular use-case wherein I have imported a JSON log file in MongoDB. The log file has the following format as provided in the reference below,
[13:25:59] <Nodex> for example. db.foo.find({"Records.awsRegion":"us-east-1"});
[13:30:05] <iliyas> Nodex: A single log file is loaded as a single document in Mongo. When I execute the above query it prints the entire document. I'm looking for how I can fetch the respective values for a given key.
[13:31:59] <scrdhrt> iliyas: db.foo.find({}, {"Records.awsRegion": 1}) will extract the value
[13:34:19] <scrdhrt> But that will match all awsRegions in the log file. If you want to match a specific document in the collection you have to match on a unique value
[13:37:05] <Nodex> use $elemMatch and a projection operator
[13:38:04] <Nodex> something like .. db.foo.find({"Regions":{$elemMatch:{awsRegion:"us-east-1"}}},{"Regions.awsRegion.$":1});
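Adapted to the field names from the earlier query, that would look roughly like this (it returns only the first matching array element per document):

    db.foo.find(
        { Records: { $elemMatch: { awsRegion: "us-east-1" } } },
        { "Records.$": 1 }
    )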
[13:45:46] <ckrause> I *think* I want to use the code data type to store functions, but I am not sure. I cannot find an example that uses it. Can somebody point me to info on how to properly use the code data type?
[13:57:08] <Nodex> ckrause : what do you hope to do with the functions?
[13:58:17] <ckrause> Nodex: Let me try to explain. I want to store documents that are "nouns" and documents that are "verbs"
[13:58:44] <ckrause> noun documents will have properties. Imagine an RPG one noun document is a character
[13:59:02] <ckrause> it has strength, hit points, etc as properties
[13:59:02] <Nodex> what does that have to do with functions?
[13:59:09] <ckrause> I want a verb to be a function
[13:59:20] <Nodex> and what do you want to do with the function
[13:59:25] <ckrause> a function might be a spell that reduces hit points by 10%
[13:59:42] <Nodex> it's just text so store it however you like
[13:59:43] <ckrause> I want to be able to write different spells/verbs and store them in the db
[13:59:54] <Nodex> if you're hoping that mongodb will execute these functions then think again
[14:00:00] <ckrause> I want to then have the verbs operate on the nouns
[14:00:30] <ckrause> I assume I can retrieve a function from mongodb and "type cast" it as a function and use it in javascript
[14:00:56] <Nodex> I'm not sure why you would think you couldn't tbh
[14:01:22] <Nodex> your function is just a string simple
[14:01:23] <ckrause> I haven't been able to find a single example of using the code data type
[14:10:15] <kali> ckrause: i'm not surprised. i can't say for sure what it was intended for; i've never heard of anybody using it at the application level
[14:11:45] <ckrause> kali: I don't want to write all kinds of code at my application level only to find out later that I could just have used the code data type. But at this point, I have no idea what the code data type can and cannot do.
[14:11:59] <ckrause> That's why I was hoping to find some examples
[14:14:09] <kali> ckrause: i think you can quite safely consider it deprecated
[14:14:39] <kali> -> don't use it, just forget it exists :)
[14:41:42] <Repox> Hello. I have some time data that I'd like to have converted to an ISODate object. See http://pastie.org/private/pded4zfyvwczr29a1tenw - Is it possible to get the field "updated_at" converted correctly to an ISODate object?
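Since the paste isn't shown, a generic shell sketch, assuming updated_at is currently a string that new Date() can parse:

    db.coll.find({ updated_at: { $type: 2 } }).forEach(function (doc) {      // $type 2 = string
        db.coll.update(
            { _id: doc._id },
            { $set: { updated_at: new Date(doc.updated_at) } }               // stored back as an ISODate
        );
    });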
[14:55:30] <paulo_cv> hi, how do I use an AND condition in a remove command? for instance db.collection.remove({condition1: "xyz" **AND** condition2: "abc"}) ?
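Conditions listed in the same query document are combined with AND implicitly, so either form works:

    db.collection.remove({ condition1: "xyz", condition2: "abc" })
    db.collection.remove({ $and: [ { condition1: "xyz" }, { condition2: "abc" } ] })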
[15:31:23] <Neptu> I was thinking of running mongo on some arm quads or the new intel atom quads with 8GB ram; unsure if minimalistic nodes work well with mongo
[15:31:43] <Derick> MongoDB is not supported on ARM
[15:32:05] <Neptu> i saw some guys running version 2.1
[15:57:27] <tiller> Is there a way, within an aggregate when grouping, to merge arrays? I thought it would work with $addToSet but it gives me a set of arrays :)
[16:03:17] <ckrause> I have made some progress with the code data type.
[16:03:36] <ckrause> I have been able to use the node native driver to create a document with a code data type
[16:03:56] <ckrause> I am able to execute that function on the server using the mongo shell
[16:04:11] <ckrause> I am unable to determine how to do it on the client via the driver.
[16:04:33] <ckrause> Anybody use the code data type before?
[16:12:30] <timhansen> morning all. i'm having an issue getting my rails app to connect to my mongo db. it was working fine a month or two ago, and nothing on the server side has changed. was wondering if anyone can help me debug this issue: https://gist.github.com/willc0de4food/403d3a99f668e92c64e3
[16:15:52] <tiller> Any idea for my grouping issue? =/
[16:18:14] <ckrause> I am looking at the source code for the native driver. In particular https://github.com/mongodb/node-mongodb-native/blob/1.4/lib/mongodb/index.js
[16:18:34] <ckrause> I see exports.Code = require('bson').Code;
[16:18:56] <ckrause> but I don't see bson.js in the lib directory. So where is that coming from?
[16:21:56] <tiller> http://pastebin.com/gyRfaExx * sorry. I forgot to remove one pair of parentheses
[16:23:46] <ckrause> looks like bson is a separate module and is a dependency
[16:27:04] <tiller> hmm, I see a way to do it, but it seems to be "the hard way". unwind, before grouping
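That "hard way" is the usual pattern, though: $unwind first, then $addToSet collects the individual elements rather than whole arrays (field names are assumptions):

    db.coll.aggregate([
        { $unwind: "$tags" },                                            // one document per array element
        { $group: { _id: "$groupKey", tags: { $addToSet: "$tags" } } }   // merged set of elements, not a set of arrays
    ])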
[16:30:15] <sec^nd> Is there a place I can get an in-depth guide to mongodb, covering usage, best practices, administration, schema design, and maintenance?
[16:32:53] <sec^nd> I want to store massive amounts of data in mongodb (terabytes maybe petabytes eventually). I want to upload a bunch of images and videos to a deduped gridfs-like schema, hash each file, store metadata on it and make it easily searchable via an api and frontends. Can mongodb support this? Also I need to scale with replication and sharding w/ servers that have about 20G of ram and a lot of hard drive space.
[16:33:43] <Derick> sec^nd: that's quite something you want to do - my (biased) opinion is that MongoDB can do this - but it will need a lot of thinking
[16:35:00] <Derick> I'm not quite sure if IRC as a medium works for such a large question
[16:35:10] <sec^nd> Derick: the dedup storing of the images will use a schema almost identical to gridfs, except with deduplication of the blocks. I will also use a chunk size of either 256K or 512K.
[16:35:16] <sec^nd> Derick: which would be better?
[16:35:38] <Derick> you're thinking of doing your own gridfs implementation really?
[16:45:27] <sec^nd> What is the best way to take a backup / snapshot of a large shard / replication setup?
[16:45:37] <du2x> Derick, I have a lot of connections opening and closing. I do this when I am importing data. Looking at the log, there are always one or two open connections. My question is: do I have to avoid all this opening and closing of connections? I'm referring to the intermittent connection failures problem.
[16:45:43] <ckrause> I am now able to create a document with a property of code data type. That function takes two documents and performs math on properties of those two documents. But I can still only do this in the shell. Haven't figured out how to do it with the node native driver.
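A sketch of round-tripping a Code value with the 1.x node native driver; collection and field names are assumptions, error handling is omitted, and the eval step runs client-side only (nothing executes inside mongod):

    var Code = require('mongodb').Code;                     // re-exported from the bson dependency
    var spell = new Code("function (target) { return target.hp * 0.9; }");

    db.collection('verbs').insert({ name: "drain", fn: spell }, function (err) {
        db.collection('verbs').findOne({ name: "drain" }, function (err, doc) {
            var fn = eval("(" + doc.fn.code + ")");         // a Code value exposes its source as .code
            console.log(fn({ hp: 100 }));                   // 90 -- the "verb" runs in the client
        });
    });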
[16:46:24] <Derick> sec^nd: an extra replica set member that's hidden per shard that you can do backups off, or just use as backup. Also have a look at delayed replica set members.
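A sketch of marking one member of a shard's replica set as a hidden (optionally delayed) backup target, run against that set's primary:

    var cfg = rs.conf();
    cfg.members[2].priority = 0;          // never eligible to become primary
    cfg.members[2].hidden = true;         // invisible to clients, but still replicates -- back up from here
    // cfg.members[2].slaveDelay = 3600;  // optional: lag the primary by an hour as an undo window
    rs.reconfig(cfg);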
[16:46:47] <Derick> du2x: one or two shouldn't matter really... do you have a replica set?
[16:47:06] <sec^nd> ahh Derick so I could use the replica sets and take a point-in-time snapshot from them?
[16:48:23] <Derick> du2x: something must be connected to it - perhaps the mongo shell?
[16:48:37] <sec^nd> Derick: I'll be using ZFS most likely.
[16:48:56] <Derick> sec^nd: I did look for the gridfs specification, but that doesn't have useful info: http://docs.mongodb.org/manual/reference/gridfs/
[16:49:03] <Derick> sec^nd: can't ZFS do FS snapshots as well?
[16:51:14] <Derick> sec^nd: Not sure how to do that with ZFS, I'm not an expert
[16:51:17] <Derick> http://antydba.blogspot.co.uk/2010/02/mongodb-backup-with-zfs.html has some info
[17:12:34] <du2x> hey Derick, I have some details about my problem. Maybe worth reporting, then. I made a script that handles the connection failure by sleeping one second and trying again, repeating until it gets connected and does its work. Then I detected a pattern: a lot of connections happen, one connection failure occurs and keeps failing for 5 to 8 seconds, then I get a lot of connections happening again.
[18:40:51] <ron> still can't get over today's interview question - if you didn't use Big Data why did you use MongoDB?
[19:54:37] <bui> hi! I have a couple of documents, and I want to check whether the number of unique('id') values among them represents more than X% of the unique('id') values among all documents in the whole collection (where my documents came from) that share a property (i.e. 'zone'). Is x.find({'zone': zone}).unique('id').count() a proper way to do so?
[19:55:06] <bui> or maybe the map-reduce mecanism is more adapted ?
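One shell-side way to express that is distinct(), which returns an array you can count; the extra "subset" filter is an assumption about how "my documents" are identified:

    var allIds    = db.x.distinct('id', { zone: zone });                 // unique ids across the whole zone
    var subsetIds = db.x.distinct('id', { zone: zone, subset: true });   // unique ids among "my documents"
    var pct = 100 * subsetIds.length / allIds.length;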
[22:16:32] <DanielKarp> Having trouble getting queries that I think should be indexOnly to be indexOnly. Can anyone tell me why explain() on this simple example doesn't show indexOnly? http://pastebin.com/6zaNs9vQ
[22:17:55] <DanielKarp> (obviously, the real-life example is more complicated, but first I want to see if I can get it working in a simple test)
[22:18:48] <joannac> you need to return both "foo" and "_id"
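For context, the usual recipe for a covered (indexOnly) query in this era is to query on an indexed field and project only fields the index contains; a sketch:

    db.test.ensureIndex({ foo: 1 });
    db.test.find({ foo: "bar" }, { foo: 1, _id: 0 }).explain();   // indexOnly: true -- _id excluded
    // or, to keep _id in the results, put it in the index as well:
    db.test.ensureIndex({ foo: 1, _id: 1 });
    db.test.find({ foo: "bar" }, { foo: 1 }).explain();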
[22:21:18] <astropirate> I've got an array filled with ObjectIDs, and I want to return all of the corresponding documents; they are all in the same collection. What is the maximum number of IDs I can pass using the $in operator?
[22:39:38] <cheeser> i prefer the string version because it makes the database contents readable.
[22:39:58] <cheeser> others don't like that because you can't rename them, and so prefer the ordinal value.
[22:40:32] <cheeser> i don't like *that* because now you can't reorder them (even if only by adding a new one in the middle) and that's a far more common change than renaming
[22:41:52] <tehroflmaoer> I've got a thing that I'm storing that's got at least 4 enum fields, so I think I'm kind of leaning towards storing ordinal values
[22:42:11] <tehroflmaoer> but it does make them a lot less readable...
[22:43:18] <cheeser> spend enough time debugging queries and their output and you'll wish you had those names, i'll bet.
[22:43:36] <cheeser> but then, again, spend enough time doing that and you'll have the ordinals memorized.
[22:57:41] <floatingpoint> where is the function documentation for the nodejs driver? I can't find anything obvious for .update
[23:22:48] <DanielKarp> joannac (if you see that, never mind, my mistake!)
[23:36:14] <DanielKarp> joannac: Final word: it seems as if I can't get a query to hit indexOnly through PHP when going through findOne and restricting the fields appropriately.