[05:23:27] <zeus13i> I'm still interested in knowing how to achieve at-rest encryption without application-layer complications, in case anyone can help. Thanks
[06:35:14] <Viesti> I'm using mongodb basically as kv-storage with a 3 node replica set, writing to master and reading from secondaries
[06:36:11] <Viesti> daily I'm updating the dataset by writing to the master node (currently ~75GB worth of json documents)
[06:36:24] <maisapride786> how to implement eager loading at db level
[06:36:41] <Viesti> using mongoimport for writing takes now ~13hours
[06:37:29] <Viesti> was able to cut that into 6 hours in a development cluster which doesn't have the read activity that the production cluster has
[06:38:01] <Viesti> but haven't been able to make same progress in the production replica set :/
[06:38:13] <Viesti> suspecting that I'd need document level locking...
[06:49:11] <Repox> Hello. I have this query: db.members.find({"list_id": 57, "created_at": { "$gt": ISODate("2014-05-24T00:00:00.000Z") }, "$where": "created_at == updated_at", } ).count(); This gives me this error: 16722 ReferenceError: created_at is not defined near 't == updated_at'
[06:50:43] <Nodex> mongodb doesn't put bare field names like "created_at" in scope inside $where
[06:51:29] <Nodex> ergo you will have to give it "this.created_at"
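For reference, a minimal sketch of Repox's query with Nodex's fix applied, assuming the same members collection. Note that comparing Date objects with == checks object identity, so comparing their millisecond values is the safer form (an addition beyond what was said above):

    db.members.find({
        "list_id": 57,
        "created_at": { "$gt": ISODate("2014-05-24T00:00:00.000Z") },
        // fields must be referenced via this inside $where
        "$where": "this.created_at.getTime() === this.updated_at.getTime()"
    }).count();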
[08:18:12] <danijoo> hey, is it possible to have an index in mongodb that's unique across two fields?
[08:18:45] <danijoo> for example if i have a name and a country field and want to reject all documents where there's already an entry with the same name+country
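A minimal sketch of what danijoo is asking for, assuming a hypothetical people collection (2.6-era shell syntax):

    // compound unique index: rejects a second document with the same name+country pair
    db.people.ensureIndex({ name: 1, country: 1 }, { unique: true });
    db.people.insert({ name: "Alice", country: "DE" });   // ok
    db.people.insert({ name: "Alice", country: "DE" });   // rejected with a duplicate key error
    db.people.insert({ name: "Alice", country: "FR" });   // ok, different pair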
[10:02:01] <remonvv> Meaning, what is your question? Can MongoDB be used as a caching solution? Does MongoDB cache query results?
[10:03:23] <tilerendering> remonvv: ok my question was if there existed a mongodb cache. and yes the 2nd question would have been: does mongodb cache queries. and then: can it be made a distributed cache
[10:05:37] <remonvv> tilerendering: It doesn't cache query results but due to its storage engine being mmap based it will usually have "hot" parts of collections in memory (provided there is enough of it). That would result in pretty good query performance although it's not strictly caching. If that is good enough for you then sharding your MongoDB cluster would make it effectively a distributed "cache".
[10:06:15] <remonvv> All that said, if you want caching I'd use a caching solution.
[10:07:27] <tilerendering> mh. nothing is better than caching in the same process as the db server
[10:07:49] <tilerendering> well at least for some cases
[10:16:50] <_boot> that works on a mongos? cool, thanks
[10:17:12] <Nopik> not sure if it works on mongos, might be
[10:40:08] <Nomikos> In the docs I'm reading "mongoimport and mongoexport do not reliably preserve all rich BSON data types because JSON can only represent a subset of the types supported by BSON"
[10:40:20] <Nomikos> but also "JSON can only represent a subset of the types supported by BSON. To preserve type information, mongoimport accepts strict mode representation for certain types."
[10:40:53] <Nomikos> and the same for mongoexport - does that mean it's safe to use these together, as long as they only use each other's data?
[10:42:15] <Nomikos> reason I was wondering (and not using mongodump/restore) was if I could use mongoexport/import to copy a collection to another db using pipe '|' on the commandline
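The pipe Nomikos describes is possible, since mongoexport writes to stdout and mongoimport reads from stdin by default; a sketch with placeholder host, database, and collection names (mongodump/mongorestore remain the safer choice where exact type fidelity matters):

    mongoexport --db sourcedb --collection mycoll \
        | mongoimport --host otherhost --db targetdb --collection mycoll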
[11:14:21] <Viesti> remonvv: I'm also seeing a slowdown when writing into a replica set master compared to writing into a single node
[11:28:34] <rspijker> Viesti: what is your writeConcern?
[11:55:56] <Viesti> rspijker: hmm, seems I'm using the default that pymongo gives, which is looked up from the collection...
[12:04:23] <rspijker> ok, I don’t know what the writeConcern of your collections is… That’s a driver specific thing, I suppose...
[12:04:27] <rspijker> anyway, that’s an unfair comparison
[12:04:37] <rspijker> if you use writeconcern: 2, it will be slower of course
[12:04:51] <rspijker> it has to do more work, it has to replicate to another machine before it tells you it’s done
[12:05:15] <rspijker> if you use writeconcern: 1, it should theoretically do the same amount of work before telling you it's completed
[12:05:28] <rspijker> that is, acknowledge the write on the primary (or on the single node, if not a RS)
[12:10:24] <Viesti> yep, and with w:0 I'm probably just dropping data into primary memory
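For reference, the write concern levels discussed above can be set per operation in the 2.6 shell; a sketch against a hypothetical kv collection:

    db.kv.insert({ _id: "k1", v: 1 }, { writeConcern: { w: 0 } });  // fire-and-forget, no acknowledgement
    db.kv.insert({ _id: "k2", v: 2 }, { writeConcern: { w: 1 } });  // acknowledged by the primary only
    db.kv.insert({ _id: "k3", v: 3 }, { writeConcern: { w: 2 } });  // also waits for one secondary to replicate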
[12:13:35] <Viesti> what I'm wondering about is that both secondaries get ~20 queries/s
[12:27:13] <Viesti> and actually there's one other process doing updates continuously at ~30 updates/s into the same database but different collection
[12:28:30] <Viesti> mongo 2.6 doesn't have collection level locking, I guess?
[12:28:41] <Viesti> but 2.8 will have document level locks, right?
[12:38:19] <Viesti> but the reads & writes during this larger import don't seem that big to me...
[12:39:34] <Viesti> and write concern gives me somewhat similar control as, say, in riak probably (choosing how many nodes to write to)
[12:41:53] <Viesti> well anyway, the other problem is that my testing replica set data set and read/write activity don't quite match with production and testing in production isn't too neat... :P
[13:21:41] <AlexejK> Hi.. I've got a requirement recently to ensure that we only keep x months' worth of data (meaning we want to backup and remove the data that is older than this threshold). TTL Collections don't work for this, and it's really some maintenance task that we could schedule via cron to e.g backup data, and then remove.. But is it a good idea to just do a delete of data that is older than a specific period? This DB we are operating on will get regular
[13:21:41] <AlexejK> writes all the time and i'm assuming they will be "waiting for lock" during the delete operation..
[13:22:05] <AlexejK> any ideas/tips/things to think about when trying to achieve this?
[13:22:48] <kali> AlexejK: the long running delete will yield the lock
[13:23:08] <kali> AlexejK: so the insert can take place
[13:23:32] <kali> AlexejK: unless the combination of the deletion and the regular insertion overwhelm your system, it should work
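A sketch of the cron-style cleanup kali is describing, assuming a hypothetical events collection with a created_at field:

    // delete everything older than 3 months; the long-running remove yields the lock
    // periodically, so concurrent inserts can interleave with it
    var cutoff = new Date();
    cutoff.setMonth(cutoff.getMonth() - 3);
    db.events.remove({ created_at: { $lt: cutoff } });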
[13:24:50] <adamcom> the most efficient thing to do is segregate data into one DB per timeframe (e.g. one per month), then when you want to delete, just drop the database - granted it only works if you can deal with changing DB names over time, but it's by far the most efficient way to do it (from a locking perspective, and a disk space use perspective)
[13:25:51] <adamcom> also makes restoration from backups pretty easy, assuming you take file based backups
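And a sketch of adamcom's per-timeframe variant, with hypothetical month-suffixed database names; dropping a whole database frees its files on disk immediately:

    // writes for June 2014 go to their own database, e.g. logs_2014_06;
    // expiring March is then a single drop instead of a document-by-document remove
    db.getSiblingDB("logs_2014_03").dropDatabase();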
[13:26:07] <AlexejK> thanks.. so splitting the data per DB would be good.. but i think right now we are unable to achieve this, even though this probably is the smartest way indeed.. this could be solution v2 :-)
[13:26:36] <cheeser> one recommendation from mongoworld yesterday was similar but to use separate collections instead of DBs
[13:26:56] <AlexejK> so as our writes to this specific DB are unacknowledged, it should be relatively safe for our application when mongo has a long lock
[13:27:02] <adamcom> separate collections is good, but doesn't get you the space efficiency
[13:27:28] <adamcom> drop collection = extents freed within files (files stay the same)
[13:28:24] <adamcom> AlexejK: would also recommend adding a compact/repair cycle to your cron (run after delete) - compact will help with fragmentation and space reuse, repair will reclaim space
[13:28:49] <AlexejK> @adamcom: thanks, was about to ask about the space reuse etc
[13:29:05] <adamcom> usual caveats apply - best to run on a secondary when out of the set, then return secondary to set then repeat for each member of set until all are done
[13:29:28] <adamcom> you can also just wipe a secondary, resync from scratch and repeat rather than repair
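The compact/repair cycle adamcom mentions, as shell commands run against a secondary taken out of the set per the caveats above; the collection name is a placeholder:

    db.runCommand({ compact: "events" });   // defragments extents, blocks the database while it runs
    db.repairDatabase();                    // rewrites the data files and returns free space to the OS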
[13:30:20] <AlexejK> Got it.. Will see if we can get that working (repair is a must i think) and then also see if we can do the DB split of some kind in near future
[13:30:38] <AlexejK> if we would've sharded by e.g. time/month.. would that achieve the same thing?
[13:31:01] <adamcom> note - for repair you need a lot of free disk space - it rewrites the files in a temp folder while the old ones are there
[13:31:19] <adamcom> if you resync a secondary you are down a member, but no such free space requirement
[13:31:29] <AlexejK> we didn't start sharding yet (i know, we should've done it from day 1)... so considering different things to shard on.. would time be smart considering this case?
[13:32:05] <adamcom> if you shard using time, then you will generally get hot chunks and get no scaling benefits from a write perspective
[13:32:36] <adamcom> basically Google for monotonically increasing shard key or similar - plenty of detailed explanations available on that one
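For contrast, a hashed shard key spreads a monotonically increasing field across chunks instead of funnelling all inserts into one hot chunk; a sketch with placeholder names:

    sh.enableSharding("mydb");
    // hashing _id (or a timestamp field) distributes writes across shards,
    // avoiding the single hot chunk a raw, ever-increasing key produces
    sh.shardCollection("mydb.events", { _id: "hashed" });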
[13:32:47] <AlexejK> A lot of free disk space = correct to assume that as long as we have at least the size of the current data free it should be safe?
[13:32:48] <adamcom> and far easier to see in a vid than to explain via text :)
[13:33:07] <adamcom> AlexejK: to be completely safe, current free space + 2GB
[13:33:42] <adamcom> err, make that current data + 2GB free
[13:33:52] <adamcom> my original sentence made no sense :)
[13:34:48] <rspijker> sharding on time might be good for your use case.. It depends really. Like adamcom said, you won't get write scaling. But in return you'll get good query isolation, which might be important for you, I don't know…
[13:36:34] <AlexejK> Got it.. really didn't get a full grip on sharding yet, but was wondering if this may be smart to do based on time for this specific case.. Will read up more and see how we can solve this short-term and then long-term
[13:36:34] <adamcom> agree, can be suitable for particular use cases as long as you are aware of the down sides - can also be important for certain types of sharding strategies - I've seen people have a short term storage (recent data) set of shards and a long term storage (old data) set of shards. To achieve that with shard tags you really need time as a piece of the shard key
[13:37:08] <rspijker> yeah, and in your case you would have to be on top of it in your app layer
[13:37:14] <rspijker> since you have the ‘rotation’ factor in the mix
[13:38:11] <rspijker> basically you have to update your maxRange to rotate the oldest stuff out and route the new stuff to that server again
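A sketch of the tag-aware setup being described, with placeholder shard names and a shard key that is assumed to include a month field; the app (or a cron job) would move the range boundaries as months roll over:

    sh.addShardTag("shard0000", "recent");
    sh.addShardTag("shard0001", "archive");
    // route documents by month: the current month to the "recent" shard, older data to "archive"
    sh.addTagRange("mydb.events", { month: "2014-06" }, { month: "2014-07" }, "recent");
    sh.addTagRange("mydb.events", { month: "2000-01" }, { month: "2014-06" }, "archive");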
[13:40:11] <AlexejK> I see.. thanks :-) Sharding is next on my to-do list so will have to really consider these things and see now if i can solve this short-term and apply a smarter & better way at the same time we do sharding :-)
[14:10:48] <richthegeek> hi - I'm using the aggregation framework to get the number of interactions of each type over a timeframe (like so, plus some date filtering: http://pastie.org/9326749)
[14:11:20] <richthegeek> is there any way to at the same time get the total number of interactions (ie, in this case 948) included in the results?
[14:18:04] <rspijker> richthegeek: can’t think of any way to do that in the same aggregation
[14:18:34] <richthegeek> rspijker: no worries, figured it was a long shot!
[14:18:58] <richthegeek> perhaps there's a way with a group (_id: null), project, and then second group? probably not very efficient though
[14:21:18] <theRoUS> tscanausa: by the way, i forgot to say 'thank you' for your help. thanks!
[14:21:33] <rspijker> well.. once you do the initial group, your pipeline will only have 1 document in the stream (the grouped result) and you won’t be able to ‘dis-aggregate’ it anymore to get the groups you really want
[14:22:10] <rspijker> afaik, there’s no way to get a ‘side-result’ or add a document in the pipeline
[14:22:20] <rspijker> which is kind of what you're after
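For what it's worth, richthegeek's second-group idea does work if the per-type $group comes first; a sketch against a hypothetical interactions collection with a type field:

    db.interactions.aggregate([
        // ...the $match on the date range would go here...
        { $group: { _id: "$type", count: { $sum: 1 } } },
        // roll the per-type counts up into one document that also carries the total
        { $group: { _id: null, total: { $sum: "$count" },
                    byType: { $push: { type: "$_id", count: "$count" } } } }
    ]);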
[14:39:14] <tscanausa> theRoUS: what did I do? I have been at mongo world so it's been like 5 days
[14:40:05] <theRoUS> tscanausa: equivalent of SELECT foo,COUNT(*) FROM bar GROUP BY foo
[14:40:29] <tscanausa> That was me hinting with the aggregations right?
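The aggregation theRoUS is referring to, roughly (collection and field names stand in for the real ones):

    // equivalent of: SELECT foo, COUNT(*) FROM bar GROUP BY foo
    db.bar.aggregate([
        { $group: { _id: "$foo", count: { $sum: 1 } } }
    ]);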
[16:19:06] <og01> if i have a replica set with two mongod instances, a is primary, b is secondary, and the network connection between the two drops, will b elect itself as primary? and will a stay primary? what happens when the connection is re-established?
[16:19:46] <nukulb> og01: I think b stays as primary in that case but should really test it out
[17:14:59] <jkhl> new, am I doing this right? http://pastebin.com/PjRpr5MP
[17:15:28] <jkhl> (or should all the node server code be within the mongo.Db.Connect() { } body ?
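Without seeing the pastebin, a common shape for this with the Node.js driver of that era is to open the connection once and start the server inside the callback; a sketch with a placeholder URL and collection name:

    var MongoClient = require('mongodb').MongoClient;
    var http = require('http');

    MongoClient.connect('mongodb://localhost:27017/mydb', function (err, db) {
        if (err) throw err;
        // keep the db handle around and only start serving once the connection is up
        http.createServer(function (req, res) {
            db.collection('items').count(function (err, n) {
                res.end('items: ' + n);
            });
        }).listen(3000);
    });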
[17:31:29] <cozby> hi, I'm running into a weird problem. I launched a mongodb instance I had with data, I wiped out all the data by removing the files in /data /journal /log
[17:31:44] <cozby> In my config I'm starting a replica set
[17:32:08] <cozby> when I connect using mongo and then switch to use admin
[17:32:25] <cozby> when I try and create a user it says "couldn't add user, not master"
[17:32:47] <cozby> do I need to initialize the rs before adding users to admin?
[17:45:49] <mattt_> Is it possible to create a compound index on a set of sub-document keys and properties of the sub-documents? For example, given the document { 'stuff': {'foo': {'id': 123}, 'bar': {'id': 456}, 'baz': {'id': 789} } } is it possible to create a compound index on the keys of 'stuff' and the key 'id' of each sub-doc? What I'm struggling with is how to refer to foo, bar, baz.
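There's no way to index "whatever the keys of stuff happen to be"; the usual answer is to restructure the map into an array so both parts become addressable fields. A sketch, with names made up for illustration:

    // instead of { stuff: { foo: { id: 123 }, bar: { id: 456 } } }
    // store:      { stuff: [ { name: "foo", id: 123 }, { name: "bar", id: 456 } ] }
    db.things.ensureIndex({ "stuff.name": 1, "stuff.id": 1 });
    db.things.find({ stuff: { $elemMatch: { name: "foo", id: 123 } } });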
[17:50:02] <tscanausa> generally if you have a replica set you do
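For cozby's "not master" error: with a fresh data directory and a replSet configured, the node stays in a non-primary state until the set is initiated, so yes; a sketch of the usual order in the 2.6 shell (user name and password are placeholders):

    rs.initiate();     // the node becomes PRIMARY of a one-member set shortly afterwards
    rs.status();       // check that myState is 1 (PRIMARY) before continuing
    db.getSiblingDB("admin").createUser({
        user: "root", pwd: "secret",
        roles: [ { role: "root", db: "admin" } ]
    });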
[17:53:13] <q85> og01: you cannot perform write operations on a secondary. you'd likely receive an error of some kind.
[18:43:16] <alexgorbatchev> curious if anyone had issues recently getting mongodb to work in vagrant?
[22:59:48] <schimmy> So I've got an interesting mystery
[22:59:58] <schimmy> I am making a fairly large query (~4.3 MB) to mongo 2.6.1, and am receiving the error:
[23:00:05] <schimmy> "BSONObj size: 17630996 (0x10D0714) is invalid. Size must be between 0 and 16793600(16MB)"
[23:00:09] <schimmy> This indicates that either the query is over the 16MB limit, or the response is
[23:00:15] <schimmy> However I verified that the request is ~4MB over the wire with wireshark, and
[23:00:20] <schimmy> I have tried changing the ids to not match anything in the database, so that no data is returned, still get the error
[23:00:24] <schimmy> the query is along the lines of:
[23:00:37] <schimmy> db.cars.find({'owner': {'$in': ['aaaaaaaaaaaaaaaaaaaaaa', 'bbbbbbbbbbbbbbbbbbbbb' ... x 200k
[23:00:42] <schimmy> I have also tried against 2.4.10, and that works- no error is thrown.
[23:00:46] <schimmy> It seems as though my 4MB request somehow gets blown up into 17MB, but that sounds ridiculous.
[23:01:13] <schimmy> What has changed between 2.4.10 and 2.6.1 that would throw this error, and how is this error possible?
[23:01:32] <schimmy> sorry, that was a wall of text, but it is an interesting problem!
[23:05:23] <schimmy> oh - one correction, the find also has a limit(1) for the purposes of debugging, still get the error
[23:05:52] <schimmy> but that should not matter as even when returning nothing I get the error
[23:06:09] <cheeser> sounds like probably your query is too large
[23:06:44] <schimmy> but queries can be up to 16 MB, unless I read wrong