pmxbot IRC Log Viewer

[01:43:37] <Antiarc> How can I $project an array value to a field (or otherwise reference a specific array element in the aggregation framework)?

[01:44:10] <joannac> $unwind

[01:44:16] <Antiarc> No, sorry - let me expound

[01:45:11] <Antiarc> I have an array "values" which contains arrays which are pairs of [timestamp, value]. I'm unwinding values, then I want to sort the unwound set on values[0][0], but $sort: {"v.0.0": 1} doesn't seem to do anything useful

[01:45:37] <Antiarc> So I thought maybe I could $project the fields into date and value fields, but I can't project $value.0 either - it ends up projecting an empty array

[01:47:34] <Antiarc> That is, after unwinding, I end up with something like this: https://gist.github.com/cheald/8c3408b2d609424b9125 (though it's several thousand documents)

[01:48:12] <Antiarc> I want to sort on v[0][0] there, but I can neither project v[0][0] to another field, or $sort on it.

[01:48:23] <Antiarc> err, v[0]

[01:48:26] <Antiarc> Since that's been unwound

[01:51:05] <Antiarc> https://gist.github.com/cheald/8c3408b2d609424b9125 -- that's effectively what I'm seeing. The first bit there is the result of just the unwind, then the second bit is what I get once I attempt to project from the array.

[01:51:31] <Antiarc> I could switch to using subdocuments here, but since it's always a 2-member array, the array seemed like it'd be more efficient. No real need for field names here.

[01:57:10] <dangayle> Does anyone know the #irc channel for the M101P MongoDB course that starts today?

[02:07:38] <joannac> I didn't know we had irc channels for courses

[02:11:40] <richo> I've got some collections I want to convert to capped. I don't really care about data loss. If I replicate into a slave, then try to cap it on the slave, is that going to upset the master?

[02:12:10] <Antiarc> I may be wrong but I don't think mongo's going to let a slave modify a collection

[02:13:48] <richo> Well, probably the crux of my question is "will a master let a slave upset it?"

[02:13:56] <richo> because if that's fine I can fiddle with my slave with wild impunity

[02:15:42] <joannac> richo: offtopic but do I know you from another network?

[02:16:28] <richo> zomg

[02:16:29] <richo> broannac

[02:16:45] <richo> joannac: halp me unsplode my mongo deployment :D

[02:21:26] <Antiarc> richo: I don't think you're going to be able to do that from a slave - mongo will just refuse to execute it on a slave. Maybe just mapreduce your origin data into your target collection?

[02:22:00] <richo> Antiarc: that sounds massively unpleasant. plus the whole issue is that the master is basically out of space

[02:22:07] <richo> I can migrate to the slave and masterfy it now

[02:22:29] <Antiarc> Ah, yeah. If the concern isn't the write lock then I'd just rs.stepDown the master and convert it on your slave

[02:22:30] <richo> which would give me the required breathing room, but capping a collection without write locking the db doens't seem like it should have that many operational constraints

[02:23:59] <Antiarc> If fragmentation is significant you could regain some breathing room on the master by stepping it down and then just stop mongo, blow away the data dir, start it back up, and let it resync from the replica set

[02:24:12] <Antiarc> Basically performing a repair except without the need to have the extra disk space.

[02:24:57] <joannac> Antiarc: that means putting replication load on your old secondary / new primary

[02:25:17] <richo> that also hinges on us already having a replset

[02:25:23] <Antiarc> It does. But if you're just flat out of disk space on your old primary, you can't repair/compact

[02:25:28] <joannac> richo: hope you're prepared for the initial massive delete thoguh

[02:25:38] <richo> is the behaviour of trying to convert from master-slave to a replset even defined?

[02:25:41] <joannac> waaaait

[02:26:04] <Antiarc> wait, how old is your server if you're running master-slave?

[02:26:15] <richo> I'm running master slave as of yesterday

[02:26:20] <richo> it was just an instance running mongo that we put shit into

[02:26:34] <richo> but then I discovered that some of our genius devs put data they actually care about into mongo

[02:26:34] <Antiarc> master-slave has been deprecated for a long while now, AFAIK

[02:26:35] <dangayle> joannac: I know we (the students) created an ad-hoc channel the first time the M101 course was offered last year.

[02:26:46] <Antiarc> Anyhow there's a doc on upgrading master-slave to a replset

[02:26:49] <Antiarc> Let me dig it up

[02:26:56] <richo> well, I don't actually want a replset

[02:27:07] <richo> I just want to do some blocking operations without actually writelocking the whole isntance

[02:27:19] <joannac> richo: actually, i don't know how this will work.

[02:27:27] <joannac> richo: http://docs.mongodb.org/manual/core/capped-collections/#convert-a-collection-to-capped

[02:27:40] <richo> yeah, I've read that

[02:27:52] <Antiarc> replsets are a superset of master-slave - it's basically the recommended way to do master-slave setups now. You aren't going to get away from the write lock easily, though - that tends to be a big problem for this kind of thing, in my experience

[02:27:54] <richo> but but the whole "global write lock" thing is a problem

[02:28:08] <richo> well, they're not strictly a superset

[02:28:15] <richo> unless you create nothing but priority 0 readers I guess

[02:28:18] <joannac> you could probably do that when the secondary is in standalone mode, and then when you put it back it'll keep replicating, but I'm not 100% on that

[02:29:02] <richo> I might just see if our dataset is append only on the other tables and just replicate the missing data by hand after I lock the slave for a while

[02:29:08] <richo> actually

[02:29:17] <richo> can I replicate only some collections in master-slave?

[02:29:43] <richo> I saw the only config option but the docs don't really mention how/if it works with several db's or collections

[02:30:08] <richo> because that'd be easiest, I'll cap the collections I don't care about, then replicate again to pull in the new data in the other collections, then just promote and blwow away the old instance

[02:34:00] <Antiarc> You might want to spin up a miniature deployment to test behavior when you have a replication scenario with a collection capped on one node and not on the other

[02:34:34] <richo> yeah, I've been trying to avoid that if I can but it looks like that's what I'm doing

[02:35:06] <Antiarc> Something else to consider is that based on how long your secondary will be in standalone mode, make sure your primary's oplog size is large enough to hold all transactions during the time it's offline

[02:35:19] <Antiarc> else you'll have to do a full replication from master, which would undo your capping work

[02:36:46] <Antiarc> If this is just trying to get some breathing room, you could run a manual delete to drop any documents that'd be dropped by the capping anyhow. Quick fix to get you some space to work.

[02:37:37] <Antiarc> I suspect that may make the capping faster, too, but I'm not 100% on the internals of how capped conversions happen when you have more data than the collection will allow

[02:42:38] <richo> how does a manual delete work?

[02:42:47] <richo> the docs suggest that to reclaim space from delorted documents you need to run a repair

[02:42:51] <richo> (Which write locks the whole db)

[02:54:21] <Antiarc> If you're using BSON ObjectIds, they're orderable by insert time, since the most significant bytes of the ID is the date the ID was created. So you can just sort by ID desc, find the nth+1 document where n is the capped collection size, and delete where _id: {$lte: whateverID}

[02:54:37] <richo> but to be clear

[02:54:38] <Antiarc> That won't free up disk space, but it'll add usable space for that collection, and will mean less data to move around

[02:55:16] <richo> Yeah, that doesn't really solve my problem though

[02:55:22] <richo> I've got about 11G of runway

[02:55:35] <Antiarc> What's the size of the db/collection in question?

[02:55:49] <richo> the whole mongo dataset is about 480G

[02:56:56] <Antiarc> is that storageSize or fileSize?

[02:57:16] <richo> nfi, it's the mount of disk that lib/mongo is consuming with it's .n files on disk

[02:57:35] <Antiarc> okay, so that's fileSize -- you may be using significantly less actual data. check db.stats() from a console

[02:57:49] <richo> I'm not well versed in mongo, we've always been pretty clear about not putting data we care about into it

[02:57:54] <richo> but evidently some of our devs missed the memo :/

[02:58:14] <richo> almost certainly, I nuked a ton of data the other day to make runway

[02:58:45] <richo> but when that failed to help, I just found some other crap on the same partition to nuke in the meantime. The on disk footprint is growing again, so I'd suggest it's eaten up all that reclaimed space now

[03:02:24] <Antiarc> Well, if you can get your devs on board, you could have them start writing to a different capped collection so that data moving forward is capped, then you could just drop the collections in question once they would have been rolled out of the capped collection, but that may not be viable without more disk space

[03:03:16] <richo> It's pretty trivial to switch collections, but doing it in a coordinated fashion across all the things that want to poke at them won't be that much fun

[03:03:32] <richo> I effectively have tons of space

[03:03:35] <Antiarc> Another option might be to just set TTL on the collection to help limit growth

[03:03:37] <richo> I've got this slave that I can just promote

[03:03:46] <richo> and then I've got a master with ~200G free

[03:03:50] <Antiarc> You can build a background index for the TTL so that won't create a lock

[03:04:06] <richo> but that doesn't really solve the longterm problem

[03:04:32] <richo> I did toy with just sticking it on a 8TB EBS volume and planning to have died of something before it became a problem again though

[03:04:36] <Antiarc> Well, kinda, it's a time-constrained capped collection rather than recordcount-constrained (and doesn't have the write profile of a capped collection obviously)

[03:04:58] <richo> none of these problems actually deal with my underlying problem

[03:05:05] <richo> tl;dr can I tell master slave to only look at a subset of things?

[03:05:10] <richo> if that works, then I can fix this pretty easily

[03:05:35] <Antiarc> not AFAIK, but it's not something I've tried to do before

[03:06:37] <Antiarc> That would typically be more of a sharding concern

[03:06:53] <Antiarc> since replication is designed to permit failover, which would necessitate that both nodes have a complete view of the data

[03:07:10] <richo> well

[03:07:11] <richo> it's capped

[03:07:20] <richo> so "consistent" is not exactly strongly defined in the first place

[06:58:40] <floatingpoint> how would I go about using the findOne() method in the context of nodeJS?

[07:00:03] <ron> "people ask the oddest questions"

[07:01:03] <floatingpoint> I don't know if that was directed at me, but I can't get findOne() to work in the context of a nodeJS route

[07:01:19] <floatingpoint> however, I can run the exact same query from mongoshell just fine

[07:06:33] <floatingpoint> the following code should return a single document containing a user's password. however, NodeJS is crashing hard when I run the function from a route. Yes, I've cheked the DB connection. Yes, I've checked the input's validity. the findOne() function is the problem. http://pastebin.com/YbusXb4A

[07:33:06] <joannac> floatingpoint: what version of the node driver?

[07:43:37] <floatingpoint> hm

[07:43:54] <floatingpoint> it looks like I am getting the data, but I am having problems rendering it into a nodejs route

[08:02:29] <yati> Hi. This is about mongoengine. I have a Document called User and when I call disconnect() and then connect('some-new-db'), I expect User.objects to be empty. But that is not the case. In short, the Documents are using the old database(the first connect() call is in the settings module). Any insights?

[08:35:14] <[AD]Turbo> hi there

[10:18:44] <tiller> Hi there

[10:18:49] <Derick> hi there

[10:21:20] <tiller> What's the better tree structure to retrieve the full path from a child? I mean, I've a document per node, and I want to retrieve all documents which are within the path to the given child

[10:21:46] <tiller> it looks like there are no structure that would allow me to do that in 1 query

[10:24:19] <Derick> you need to store the paths for this

[10:24:24] <Derick> with each node

[10:24:38] <Derick> then you can do a simple anchored regexp search

[10:25:06] <tiller> yup, but I'll have to do this in 2 times, right?

[10:25:14] <tiller> First I find the child to retrieve the path

[10:25:22] <tiller> and then I find all its ancestors

[10:25:28] <cheeser> it'd be interesting to try writing a (dynamic) aggregation pipeline to find those without having to store the whole path on each node.

[10:26:42] <tiller> cheeser> I'll fetch the collections way more often then I'll modify it. So it's better if the search cost is minimized :)

[10:27:00] <tiller> than*

[10:29:33] <Nodex> how many paths are there?

[10:30:42] <Derick> tiller: no, store not only the parent and childer, but the whole path - it's called materialized paths or something

[10:30:53] <Derick> one sec, I'll look it up

[10:31:08] <Derick> http://docs.mongodb.org/manual/tutorial/model-tree-structures-with-materialized-paths/

[10:31:09] <tiller> Derick> http://docs.mongodb.org/manual/tutorial/model-tree-structures-with-materialized-paths/

[10:31:32] <Derick> yes :P

[10:32:13] <tiller> yes but at the end I'll have to first find the child to get the path, and then find again to get its parents (Because I don't want only the ids)

[10:32:31] <tiller> Nodex> I don't really know, it can grow fast

[10:33:08] <Derick> tiller: you can alternatively *also* store all ancesters for each node

[10:33:12] <tiller> the collection may contains millions of nodes, but with at most 4 levels

[10:34:51] <tiller> Well, I think I'll have to do it into 2 requests anyway

[10:35:04] <Derick> there is nothing wrong with that :)

[10:35:38] <tiller> yup :) Thanks guys

[10:57:45] <tiller> Just to know, can we do "recursive" aggregate with mongodb? I mean, if I've for example: {val: 1, childs: [{val: 2, childs: [{val: 3, childs: []}]}, {val: 4, childs: []}]}]} to find if I've "val = 4" somewhere into the "tree", without knowing the deepth

[11:00:52] <Nodex> personaly I wouldn't use mongo for that, I would use a redis hash

[11:01:40] <tiller> hmm m'okay :)

[11:01:53] <Nodex> :)

[11:03:14] <nullsign> mongo <3

[11:33:22] <Peeter> http://stackoverflow.com/questions/20215508/normalized-vs-denormalized-data-in-mongo

[11:35:51] <Derick> Peeter: you're storing tweets directly from their API?

[11:36:08] <Peeter> Nope

[11:36:28] <Peeter> That's just demo data with a simplified schema

[11:36:35] <Peeter> Why?

[11:37:44] <Derick> just wondering :-)

[11:38:02] <Peeter> Now I'm curious why you're wondering :P

[11:38:22] <Derick> it does a similar thing with attachments like this

[11:38:25] <Peeter> Doesn't seem to be in any way related to the question I posted, more like something else :p

[11:38:32] <Derick> and : "serviceName": "twitter"

[11:38:36] <Derick> aaanyway

[11:39:11] <Peeter> The schema isn't copied from twitter if you're wondering that :P?

[11:39:14] <Derick> Peeter: which part author information do you need to show with a post?

[11:39:21] <Peeter> all of it

[11:39:35] <Peeter> well, not all of it

[11:39:39] <Peeter> description can be ignored

[11:39:46] <Peeter> That's it though

[11:40:28] <Derick> because you do a lot more views than updates, it makes littles sense to split this out.

[11:40:36] <Derick> Let me answer in more detail at SO.

[11:41:08] <Peeter> Would greatly appreciate it

[11:44:04] <Derick> there

[11:44:09] <Derick> let me know if you miss something, I can add it

[11:50:29] <Peeter> Could you expand on the data locality point? Wouldn't it require more memory if data is duplicated?

[11:50:44] <Derick> yes, but it's still not a lot of data

[11:50:57] <Derick> if you have post in positionl 400 and author in position 91231

[11:51:09] <Derick> it uses two pages to fetch data - if it's all together, only one

[11:51:16] <Derick> (a page is likely 4096 bytes)

[12:10:08] <tijmen> Hi all, if I restart a running mongo with oplogging enabled can I then still do a mongodump with --oplog? I understand it will only contain oplog data from the moment of restart. If this works, will this also allow mongorestore to read this partial oplog data?

[13:23:51] <iliyas> Greetings !

[13:23:52] <iliyas> I'm working on a particular use-case wherein I have imported a JSON log file in MongoDB. The log file has the following format as provided in the reference below,

[13:23:52] <iliyas> http://docs.aws.amazon.com/awscloudtrail/latest/userguide/eventreference.html

[13:23:53] <iliyas> Can anyone help me in understanding how I can query the inner elements such as awsRegion, eventName etc ?

[13:25:34] <Nodex> Records.awsRegion

[13:25:59] <Nodex> for example. db.foo.find({"Records.awsRegion":"us-east-1"});

[13:30:05] <iliyas> Nodex: A single log file is loaded as a single document in Mongo. When I execute the above query it prints the entire document. I'm looking for how I can fetch the respective values for a given key.

[13:31:59] <scrdhrt> iliyas: db.foo.find({}, {"Records.awsRegion": 1}) will extract the value

[13:33:18] <Nodex> you didn't specify that

[13:33:35] <Nodex> but that ^^ will do it

[13:34:19] <scrdhrt> But that will match all awsRegions in the log file. If you want to match a specific document in the collection you have to match on a unique value

[13:37:05] <Nodex> use $elemMatch and a projection operator

[13:38:04] <Nodex> something like .. db.foo.find({"Regions":{$elemMatch:{awsRegion:"us-east-1"}}},{"Regions.awsRegion.$":1});

[13:38:13] <Nodex> or something to that effect

[13:45:46] <ckrause> I *think* I want to use the code data type to store functions, but I am not sure. I cannot find an example that uses it. Can somebody point me to info on how to properly use the code data type?

[13:57:08] <Nodex> ckrause : what do you hope to do with the functions?

[13:58:17] <ckrause> Nodex: Let me try to explain. I want to store documents that are "nouns" and documents that are "verbs"

[13:58:44] <ckrause> noun documents will have properties. Imagine an RPG one noun document is a character

[13:59:02] <ckrause> it has strength, hit points, etc as properties

[13:59:02] <Nodex> what does that have to do with functions?

[13:59:09] <ckrause> I want a verb to be a function

[13:59:20] <Nodex> and what do you want to do with the function

[13:59:25] <ckrause> a function might be a spell that reduces hit points by 10%

[13:59:42] <Nodex> it's just text so store it howveer you like

[13:59:43] <ckrause> I want to be able to write different spells/verbs and store them in the db

[13:59:54] <Nodex> if you're hoping that mongodb will execute these functions then think again

[14:00:00] <ckrause> I want to then have the verbs operate on the nouns

[14:00:30] <ckrause> I assumes I can retrieve a function from mongodb and "type cast" it as a function and use it in javascript

[14:00:56] <Nodex> I'm not sure why you would think you couldn't tbh

[14:01:22] <Nodex> your function is just a string simple

[14:01:23] <ckrause> I haven't been able to find a single example of using the code data type

[14:10:15] <kali> ckrause: i'm not surprise. i can not say for sure what it was intended for, i've never heard about anybody using it at the application level

[14:10:18] <kali> +d

[14:11:45] <ckrause> kali: I don't want to write all kinds of code at my applications level only to find out later that I could just have used the code data type. But at this point, I have no idea what they code type can and cannot do.

[14:11:59] <ckrause> That's why I was hoping to find some examples

[14:14:09] <kali> ckrause: i think you can quite safely consider it deprecated

[14:14:19] <kali> or internal

[14:14:39] <kali> -> don't use it, just forget it exists :)

[14:41:42] <Repox> Hello. I have some time data that I'd like to have converted to an ISODate object. See http://pastie.org/private/pded4zfyvwczr29a1tenw - Is it possible to get the field "updated_at" converted correctly to an ISODate object?

[14:55:30] <paulo_cv> hi, how do I use an AND condition in a remove command? for instance db.collection.remove({condition1: "xyz" **AND** condition2: "abc"}) ?

[14:55:39] <Derick> jstu like this:

[14:55:51] <Derick> db.collection.remove({condition1: "xyz", condition2: "abc"});

[14:56:20] <Derick> the same way how you would find those documents (so try that first!)

[14:56:36] <paulo_cv> Derick: thank you and what would change to switch to an OR condition?

[14:56:50] <Derick> http://docs.mongodb.org/manual/reference/operator/query/or/

[14:56:58] <paulo_cv> Derick: thanks!

[15:18:32] <Neptu> hej

[15:19:04] <Neptu> if you mongo a secondary machine what was the command to let you utilize it and do some search querys

[15:19:18] <Neptu> do not remember now and is been hard to find something in google

[15:20:15] <Derick> which language?

[15:20:23] <Neptu> on the mongo console

[15:20:28] <Derick> oh

[15:20:30] <Neptu> mongo 10.1.1.10

[15:20:33] <Derick> db.slaveOkay()

[15:20:40] <Neptu> thanks I did not remember it

[15:20:44] <Derick> no

[15:20:47] <Derick> setSlaveOk()

[15:22:12] <Neptu> Derick, rs.slaveOK()

[15:22:23] <Neptu> i forgot the command completely :P

[15:23:10] <Derick> I wonder what db.slaveOkay() does really now

[15:23:34] <Derick> it's not documented

[15:24:23] <Derick> i wonder whether the shell supports read preferences

[15:24:37] <cheeser> db.slaveOk()

[15:24:37] <cheeser> Tue Nov 26 16:22:52.436 TypeError: Property 'slaveOk' of object test is not a function

[15:24:55] <Derick> cheeser: you need rs.

[15:25:23] <cheeser> i know. but you were asking about db :)

[15:25:44] <Derick> yes, which is db.setSlaveOk()

[15:25:50] <Derick> at least in mine :S

[15:29:59] <Neptu> Derick, hehehe

[15:31:23] <Neptu> I was thinking on running mongo on some arm quads or the new intel atoms quads with 8GB ram unsure if minimalistics nodes works good with mongo

[15:31:43] <Derick> MongoDB is not supported on ARM

[15:32:05] <Neptu> i saw some guys running version 2.1

[15:32:10] <Neptu> but not more than that

[15:32:16] <Derick> Neptu: yeah, but that's old - don't do that :P

[15:32:30] <Neptu> I was looking at the new atoms

[15:32:44] <Derick> mongo likes having data backed by memory

[15:32:46] <Neptu> 58$ atom quad up to 8GB ram

[15:33:08] <Derick> you don't really need lots of CPU if you don't do complicated calculations (through f.e. the A/F)

[15:33:11] <Derick> (or M/R)

[15:33:37] <Neptu> Derick, I seen it but I was thinking small nodes with a small SSD

[15:33:47] <Derick> Neptu: what would you use that for?

[15:34:05] <Neptu> Insertions over 1TB a day of meteo data

[15:34:23] <Neptu> throput replication

[15:34:32] <Neptu> if i could runn it on small nodes

[15:34:34] <Neptu> will be nice

[15:34:36] <Derick> are you going to want to read that much too?

[15:35:01] <Neptu> Derick, probaby this is base for other apps

[15:35:14] <Neptu> use geograpical extensions

[15:35:17] <Neptu> and indexing

[15:35:49] <Nodex> 1tb a day

[15:36:10] <Neptu> and cast out in a day aswell

[15:36:20] <Neptu> meteo data gets old quick

[15:36:30] <Neptu> so I was thinking on small unexpensive

[15:36:35] <Neptu> HW

[15:36:49] <Neptu> more than bigger things

[15:37:27] <Neptu> I saw some boards with sata conection GB ethernet and 2GB ram

[15:37:38] <Neptu> as a node for an small disk

[15:37:43] <Neptu> feels good

[15:38:23] <Neptu> donno its an idea

[15:41:58] <Derick> I'd try it out with simulated data

[15:52:23] <du2x> hello there, I would like to know if there are anybody who had problem with mongodb intermittent connection failures.

[15:52:37] <Derick> I've seen rumours

[15:53:49] <du2x> Derick, well, at least, looks like I am not alone, then.

[15:54:15] <du2x> Derick, what about solutions... have you heard about?

[15:54:50] <Derick> du2x: can you please tell a bit more about setup, language, driver, versions, hosting platform?

[15:55:56] <du2x> sure

[15:56:03] <tiller> hi there (again)

[15:57:27] <tiller> Is there a way, within an aggregate <hen grouping, to merge arrays? I thought it would work with $addToSet but it creates me a set of arrays :)

[15:57:36] <du2x> mongodb v2.4.6; ubuntu 13.10; python 2.7; pymongo 2.6.3

[15:57:46] <tiller> when*

[16:03:17] <ckrause> I have made some progress with the code data type.

[16:03:36] <ckrause> I have been able to use the node native driver to create a document with a code data type

[16:03:56] <ckrause> I am able to execute that function on the server using the mongo shell

[16:04:11] <ckrause> I am unable to determine how to do it on the client via the driver.

[16:04:33] <ckrause> Anybody use the code data type before?

[16:12:30] <timhansen> morning all. i'm having an issue getting my rails app to conned to my mongo db. it was working fine a month or two ago, and nothing on the server side has change. was wondering if anyone can help me debug this issue: https://gist.github.com/willc0de4food/403d3a99f668e92c64e3

[16:15:52] <tiller> Any idea for my grouping issue? =/

[16:16:10] <tiller> Tried something like this: db.permissions.aggregate({ "$match" : {"_id": { "$in" : [ ObjectId("...")]}}}, { "$group" : { "_id" : 1 , "roles" : { "$addToSet" : {"$each": "$roles"}}}});

[16:16:14] <tiller> failed :p

[16:16:39] <Derick> tiller: can you put your example in a pastebing - complete with start document, a/f query and expected output?

[16:16:51] <tiller> yup sure

[16:18:14] <ckrause> I am looking at the source code for the native driver. In particular https://github.com/mongodb/node-mongodb-native/blob/1.4/lib/mongodb/index.js

[16:18:34] <ckrause> I see exports.Code = require('bson').Code;

[16:18:56] <ckrause> but I don't see bson.js in the lib directory. So where is that coming from?

[16:20:49] <tiller> Derick> http://pastebin.com/0FhtjBYr

[16:21:38] <Derick> one sec

[16:21:56] <tiller> http://pastebin.com/gyRfaExx * sorry. I forgot to remove one pair of parentesis

[16:23:46] <ckrause> looks like bson is a separate module and is a dependency

[16:27:04] <tiller> hmm, I see a way to do it, but it seems to be "the hard way". unwind, before grouping

[16:30:15] <sec^nd> Is there a place I can get an in depth guide of mongodb? From both using the database, best practices, administration, schema design, and maintenance?

[16:30:23] <cheeser> mongodb.org

[16:31:00] <cheeser> also, http://www.mongodb.org/books

[16:31:42] <Derick> tiller: yes, you need the $unwind

[16:32:00] <tiller> hmm m'okay

[16:32:53] <sec^nd> I want to store massive amounts of data in mongodb (terabytes maybe petabytes eventually). I want to upload a bunch of images and videos to a deduped gridfs like schema, hash each file, store meta data on it and make it easily searchble via api and frontends. Can mongodb support this? Also I need to scale with replication and sharding w/ servers that have about 20G of ram and a lot of hard drive space.

[16:33:43] <Derick> sec^nd: that's quite something you want to do - my (biased) opinion is that MongoDB can do this - but it will need a lot of thinking

[16:34:34] <sec^nd> Derick: where should I start?

[16:35:00] <Derick> I'm not quite sure if IRC as a medium works for such a large question

[16:35:10] <sec^nd> Derick: the dedup storing of the images will use a schema almost exact to gridfs except with deduplication of the blocks, I will also use a chunk size of either 256K or 512K.

[16:35:16] <sec^nd> Derick: which would be better?

[16:35:38] <Derick> you're thinking of doing your own gridfs implementation really?

[16:36:07] <sec^nd> Derick: yar, I want dedup.

[16:36:13] <sec^nd> I may add transparent compression too.

[16:36:24] <Derick> okay - there are some guidelines on tihs - let's see if I can find it

[16:36:30] <Derick> when my browser unfreezes...

[16:45:27] <sec^nd> What is the best way to take a backup / snapshot of a large shard / replication setup?

[16:45:37] <du2x> Derick, I have a lot of connections, opening, and closing. I do this when I am importing data. I see the log and there is always one or two opened connections. My question is: do I have to avoid these opening and closing connections? I refer to the intermitent connection failures problem.

[16:45:43] <ckrause> I am now able to create a document with a property of code data type. That function takes two documents and performs math on properties of those two documents. But I can still only do this in the shell. Haven't figured out how to do it with the node native driver.

[16:46:24] <Derick> sec^nd: an extra replica set member that's hidden per shard that you can do backups off, or just use as backup. Also have a look at delayed replica set members.

[16:46:47] <Derick> du2x: one or two shouldn't matter really... do you have a replica set?

[16:47:06] <sec^nd> ahh Derick so I could use the replica sets and take a moment in time snapshot from them?

[16:47:18] <du2x> Derick, no replica set

[16:47:58] <sec^nd> Derick: is there a better way to do this than using mongodb?

[16:48:13] <Derick> sec^nd: LVM might help too

[16:48:23] <Derick> du2x: something must be connected to it - perhaps the mongo shell?

[16:48:37] <sec^nd> Derick: I'll be using ZFS most likely.

[16:48:56] <Derick> sec^nd: I did look for the gridfs specification, but that doesn't have useful info: http://docs.mongodb.org/manual/reference/gridfs/

[16:49:03] <Derick> sec^nd: can't ZTS do FS snapshots as well?

[16:49:19] <sec^nd> Derick: yep

[16:49:35] <du2x> pymongo

[16:49:39] <Derick> well, there you go then - you do need to "freeze" the member then.

[16:49:59] <du2x> Derick, I use pymongo

[16:50:03] <sec^nd> Derick: how would I restore it then?

[16:50:28] <du2x> Derick, oh I see

[16:51:14] <Derick> sec^nd: Not sure how to do that with ZFS, I'm not an expert

[16:51:17] <Derick> http://antydba.blogspot.co.uk/2010/02/mongodb-backup-with-zfs.html has some info

[17:12:34] <du2x> hey Derick, I have some details about my problem. Maybe worth reporting, then. The made a script the treats the connection failure by sleeping one second and try again, and repeat until it gets connected and make stuff. Then I detected a pattern: a lot of connections happens, one connection failure happen, and keeps failing by 5 to 8 seconds, then I get a lot of connections happening again.

[17:13:17] <du2x> it's repetitive

[17:13:34] <du2x> it seems the db goes down for some seconds.

[17:21:45] <Derick> du2x: I think the python driver uses connection pooling, so it will then create a few connections if it can.

[17:22:05] <Derick> du2x: what you do need to find out is why the server breaks connections. Does something show up in the MongoDB log?

[17:28:36] <du2x> Derick, well, the log says only connections opening and connections closing. So, can I conclude that it's a python driver issue?

[17:29:36] <Derick> du2x: it sounds like that, but I have too little evidence. I do have to leave now though, but I'll be back tomorrow.

[17:30:05] <du2x> no problem. thanks!

[18:40:51] <ron> still can't get over today's interview question - if you didn't use Big Data why did you use MongoDB?

[19:54:37] <bui> hi ! I have a couple of documents, but I want to check if the number of unique('id') from those represents more than X% of unique('id') from documents sharing a property (ie, 'zone') present in the whole collection (from where my documents came from), is x.find(['zone' : zone]).unique('id').count() a proper way to do so ?

[19:55:06] <bui> or maybe the map-reduce mecanism is more adapted ?

[22:16:32] <DanielKarp> Having trouble getting queries that I think should be indexOnly to be indexOnly. Can anyone tell me why this simple example is not explaining to indexOnly? http://pastebin.com/6zaNs9vQ

[22:17:55] <DanielKarp> (obviously, the real-life example is more complicated, but first I want to see if I can get it working in a simple test)

[22:18:48] <joannac> you need to return both "foo" and "_id"

[22:19:04] <DanielKarp> hmmm

[22:19:08] <DanielKarp> let me try that.

[22:19:16] <joannac> No, you misunderstood.

[22:19:41] <joannac> Your query returns both fields, and you can't satisfy that by just an index

[22:20:13] <joannac> you probably want {foo:1, _id:0} as your projection argument

[22:20:16] <DanielKarp> I still don't understand. how would an indexOnly ever work?

[22:20:30] <DanielKarp> Ahh

[22:20:48] <DanielKarp> That might do the trick!

[22:21:18] <astropirate> I got an array filled with ObjectIDs. I want to return all of the corresponding documents, they are all in the same collection. What is the maximum number of IDs I can' pass using the $in operator?

[22:21:36] <astropirate> is there a size limit?

[22:21:40] <astropirate> performance considerations?

[22:22:02] <astropirate> this is an analystics tool running against production DB

[22:22:09] <astropirate> analytics*

[22:37:04] <tehroflmaoer> hey, what would be the best way to store an enum? I feel like using a string would be pretty unnecessary

[22:39:15] <cheeser> "best" is subjective

[22:39:38] <cheeser> i prefer the string version because it makes the database contents readable.

[22:39:58] <cheeser> others don't like that because you can't rename them and so prefer to ordinal value.

[22:40:32] <cheeser> i don't like *that* because now you can't reorder them (even if only by adding a new on in the middle) and that's a far more common change than renaming

[22:40:38] <cheeser> in my experience.

[22:40:53] <cheeser> otoh, storing just the ordinal values takes much less space

[22:41:17] <tehroflmaoer> ah right yeah

[22:41:52] <tehroflmaoer> I've got a thing that I'm storing that's got at least 4 enum fields, so I think I'm kind of leaning towards storing ordinal values

[22:42:11] <tehroflmaoer> but it does make them a lot less readable...

[22:43:18] <cheeser> spend enough time debugging queries and their output and you'll wish you had those names, i'll bet.

[22:43:36] <cheeser> but then, again, spend enough time doing that and you'll have the ordinals memorized.

[22:43:44] <tehroflmaoer> lol

[22:44:14] <tehroflmaoer> hmmmm I'll think about it some more, thanks cheeser

[22:44:21] <cheeser> np

[22:49:55] <DanielKarp> joannac: hey, I was able to get that query to work indexOnly in the mongo command line, but not through the php driver.

[22:50:33] <DanielKarp> joannac: paste coming:

[22:52:33] <DanielKarp> http://pastebin.com/VSwySKmP

[22:57:41] <floatingpoint> where is the function documentation for the nodejs driver? I can't find anything obvious for .update

[23:22:48] <DanielKarp> joannac (if you see that, never mind, my mistake!)

[23:36:14] <DanielKarp> joannac: Final word: it seems as if I can't get a query to hit indexOnly through PHP when going through findOne and restricting the fields appropriately.

Log file Viewer

Help | Karma | Search:

#mongodb logs for Tuesday the 26th of November, 2013