pmxbot IRC Log Viewer

[00:19:33] <Guest1828> Hi all. What are my options for at-rest encryption of my MongoDB?

[00:21:27] <cheeser> encrypt your data before writing. save it as a byte[]

[00:22:17] <zeus13i> Thanks cheeser. I was hoping to avoid that though. Any other options?

[00:25:05] <cheeser> nothing comes to mind. except something like truecrypt but that still leaves things visible running on the machine.

[00:25:19] <cheeser> but it at least encrypts things once you power off.

[00:55:44] <talus46> greetings

[01:27:42] <zeus13i> Okay, thanks cheeser.

[03:39:56] <MacWinner> f###, i just upgraded from 2.6.2 to 2.6.3.. a simple 2 node rpelica set..both nodes are stuck as secondary

[03:40:17] <MacWinner> oh wait.. fine now.. maybe

[03:40:17] <joannac> rolling upgrade?

[03:48:05] <gogolook> how could I delete massive data in mongodb without hurting server performance?

[03:48:31] <joannac> instead of remove, drop the collection or the database

[03:55:01] <gogolook> so the best way is to copy data from old collection to new collection?

[03:56:37] <joannac> how much data are we talking about?

[03:57:46] <gogolook> 22 million

[03:57:57] <joannac> how mnay documents in total?

[03:59:08] <gogolook> it's around 20 million documents

[03:59:33] <gogolook> should be deleted...

[04:00:34] <gogolook> There are 80% documents in this collection should be deleted.

[04:02:06] <joannac> move the other ones out, and drop this collection

[04:02:58] <gogolook> okay...Thanks!

[05:23:27] <zeus13i> I'm still interested in knowing how to achieve at-rest encryption without application-layer complications, in case anyone can help. Thanks

[06:35:14] <Viesti> I'm using mongodb basically as kv-storage with a 3 node replica set, writing to master and reading from secondaries

[06:36:11] <Viesti> daily I'm updating the dataset by writing to the master node (currently ~75GB worth of json documents)

[06:36:24] <maisapride786> how to implement eager loading at db level

[06:36:41] <Viesti> using mongoimport for writing takes now ~13hours

[06:37:29] <Viesti> was able to cut that into 6 hours in a development cluster which doesn't have the read activity that the production cluster has

[06:37:58] <Viesti> s/cluster/replica set/

[06:38:01] <Viesti> but haven't been able to make same progress in the production replica set :/

[06:38:13] <Viesti> suspecting that I'd need document level locking...

[06:49:11] <Repox> Hello. I have this query: db.members.find({"list_id": 57, "created_at": { "$gt": ISODate("2014-05-24T00:00:00.000Z") }, "$where": "created_at == updated_at", } ).count(); This gives me this error: 16722 ReferenceError: created_at is not defined near 't == updated_at'

[06:49:16] <Repox> How kan I fix this?

[06:50:43] <Nodex> mongodb has no concept of "this"

[06:51:29] <Nodex> ergo you will have to give it "this.created_at"

[08:18:12] <danijoo> hey, is it possible to have an index in mongodb thats unique for two fileds?

[08:18:45] <danijoo> for example if i have a name and a country field and want to reject all documents where theres already an entry with same name+country

[08:22:28] <danijoo> nvm i found it in the docs :

[08:52:20] <_boot> can mongos log all queries?

[09:56:18] <tilerendering> hi

[09:56:47] <tilerendering> what about cahing in mongodb ?

[09:56:51] <tilerendering> caching that is

[10:01:33] <remonvv> What about it?

[10:02:01] <remonvv> Meaning, what is your question? Can MongoDB be used as a caching solution? Does MongoDB cache query results?

[10:03:23] <tilerendering> remonvv: ok my question was if there existed a mongodb cache. and yes the 2nd question would have been: does mongodb cache queries. and then: can it be made a distributed cache

[10:05:37] <remonvv> tilerendering: It doesn't cache query results but due do its storage engine being mmap based it will usually have "hot" parts of collections in memory (provided there is enough of it). That would result in pretty good query performance although it's not strictly caching. If that is good enough for you then sharding your MongoDB cluster would make it effectively a distributed "cache".

[10:06:15] <remonvv> All that said, if you want caching I'd use a caching solution.

[10:07:27] <tilerendering> mh. nothing is better than caching in the same process as the db server

[10:07:49] <tilerendering> well at least for some cases

[10:08:10] <tilerendering> performance-wise

[10:08:17] <tilerendering> no interprocess communication required

[10:16:22] <Nopik> _boot: yes, http://docs.mongodb.org/manual/reference/method/db.setProfilingLevel/

[10:16:50] <_boot> that works on a mongos? cool, thanks

[10:17:12] <Nopik> not sure if it works on mongos, might be

[10:40:08] <Nomikos> In the docs I'm reading "mongoimport and mongoexport do not reliably preserve all rich BSON data types because JSON can only represent a subset of the types supported by BSON"

[10:40:20] <Nomikos> but also "JSON can only represent a subset of the types supported by BSON. To preserve type information, mongoimport accepts strict mode representation for certain ptypes."

[10:40:53] <Nomikos> and the same for mongoexport - does that mean it's safe to use these, as long as they only use eachother's data? together

[10:40:59] <Nomikos> *^W

[10:42:15] <Nomikos> reason I was wondering (and not using mongodump/restore) was if I could use mongoexport/import to copy a collection to another db using pipe '|' on the commandline

[11:14:21] <Viesti> remonvv: I'm also seeing a slowdown when writing into a replicate set master compared to writing into single node

[11:28:34] <rspijker> Viesti: what is your writeConcern?

[11:55:56] <Viesti> rspijker: hmm, seems I'm using the default that pymongo gives, which is lookep from the collection...

[11:56:11] <rspijker> ehm… what?

[11:58:03] <Viesti> http://api.mongodb.org/python/current/examples/bulk.html#write-concern

[12:03:14] <Viesti> I'll try using say, 2 (write to master and one secondary)

[12:04:18] <Viesti> still meh..

[12:04:23] <rspijker> ok, I don’t know what the writeConcern of your collections is… That’s a driver specific thing, I suppose...

[12:04:27] <rspijker> anyway, that’s an unfair comparison

[12:04:37] <rspijker> if you use writeconcern: 2, it will be slower of course

[12:04:51] <rspijker> it has to do more work, it has to replicate to another machine before it tells you it’s done

[12:05:15] <rspijker> if you use writeconcern: 1, it should theoretically do the same amount of work before telling you it;s completed

[12:05:28] <rspijker> that is, acknowledge the write on the primary (or on the single node, if not a RS)

[12:10:24] <Viesti> yep, and with w:0 I'm probably just fropping data into primary memory

[12:13:35] <Viesti> what I'm wondering is that both secondaries get ~20/s queries

[12:27:13] <Viesti> and actually there's one other process doing updates continuously at ~30 updates/s into the same database but different collection

[12:28:30] <Viesti> mongo 2.6 doesn't have collection level locking, I guess?

[12:28:41] <Viesti> but 2.8 will have document level locks, right?

[12:38:19] <Viesti> but the reads & writes during this larger import don't seem that big to me...

[12:39:34] <Viesti> and write concern gets me somewhat similar control as say in riak probably (to choose to how many nodes to write)

[12:41:53] <Viesti> well anyway, the other problem is that my testing replica set data set and read/write activity don't quite match with production and testing in production isn't too neat... :P

[13:21:41] <AlexejK> Hi.. I've got a requirement recently to ensure that we only keep x-month worth of data (meaning we want to backup and remove the data that is older than this threshhold). TTL Collections don't work for this, and It'r really some maintenance task that we could schedule via cron to e.g backup data, and then remove.. But is it a good idea to just do a delete of data that is older than specific period? This DB we are operating on will get regular

[13:21:41] <AlexejK> writes all the time and i'm assuming they will be "waiting for lock" during the delete operation..

[13:22:05] <AlexejK> any ideas/tips/things to think about when trying to achieve this?

[13:22:48] <kali> AlexejK: the long running delete will yield the lock

[13:23:08] <kali> AlexejK: so the insert can take place

[13:23:32] <kali> AlexejK: unless the combination of the deletion and the regular insertion overwhelm your system, it should work

[13:24:50] <adamcom> the most efficient thing to do is segregate date into one DB per timeframe (e.g one per month), then when you want to delete, just drop the database - granted it only works if you can deal with changing DB names over time, but it's by far the most efficient way to do it (from a locking perspective, and a disk space use perspective)

[13:25:51] <adamcom> also makes restoration from backups pretty easy, assuming you take file based backups

[13:26:07] <AlexejK> thanks.. so the splitting data per DB would be good.. but i think right now we are unable to achieve this, even though this probably is the smartes way indeed.. this could be solution v2 :-)

[13:26:36] <cheeser> one recommendation from mongoworld yesterday was similar but to use separate collections instead of DBS

[13:26:39] <cheeser> DBs

[13:26:56] <AlexejK> so as our writes to this specific DB are Unacknowledged, It should be relatively safe for our application when mongo has a long lock

[13:27:02] <adamcom> separate collections is good, but doesn't get you the space efficiency

[13:27:13] <adamcom> drop DB = files removed

[13:27:28] <adamcom> drop collection = extents freed within files (files stay the same)

[13:28:24] <adamcom> AlexejK: would also recommend adding a compact/repair cycle to your cron (run after delete) - compact will help with fragmentation and space reuse, repair will reclaim space

[13:28:49] <AlexejK> @adamcom: thanks, was about to ask about the space reuse etc

[13:29:05] <adamcom> usual caveats apply - best to run on a secondary when out of the set, then return secondary to set then repeat for each member of set until all are done

[13:29:28] <adamcom> you can also just wipe a secondary, resync from scratch and repeat rather than repari

[13:29:35] <adamcom> that will also reclaim space

[13:30:20] <AlexejK> Got it.. Will see if we can get that working (repair is a must i think) and then also see if we can do the DB split of some kind in near future

[13:30:38] <AlexejK> if we wouldve sharded by e.g time/month.. would that achieve same thing?

[13:31:01] <adamcom> note - for repair you need a lot of free disk space - it rewrites the files in a temp folder while the old ones are there

[13:31:19] <adamcom> if you resync a secondary you are down a member, but no such free space requirement

[13:31:26] <adamcom> that's the trade off

[13:31:29] <AlexejK> we didnt start sharding yet (i kow, we shouldve done it from day 1)... so considering different things to shard on.. would time be smart considering this case?

[13:32:05] <adamcom> if you shard using time, then you will generally get hot chunks and get no scaling benefits from a write perspective

[13:32:36] <adamcom> basically Google for monotonically increasing shard key or similar - plenty of detailed explanations available on that one

[13:32:47] <AlexejK> A lot of free disk space = correct to assume tht as long as we have at least the size of current data free it should be safe?

[13:32:48] <adamcom> and far easier to see in a vid than to explain via text :)

[13:33:07] <adamcom> AlexejK: to be completely safe, current free space + 2GB

[13:33:15] <adamcom> but yes, you should be fine

[13:33:42] <adamcom> err, make that current data + 2GB free

[13:33:52] <adamcom> my original sentence made no sense :)

[13:34:48] <rspijker> sharding on time might be good for your use case.. It depends really. Like adamcom said, you won;t get write scaling. But in return you’ll get good query isolation, which might be important for you, I don;t know…

[13:36:34] <AlexejK> Got it.. really didnt get a full grip on sharding yet, but was interesting if this may be smart to do based on time for this specific case.. Will read up more and see how we can solve this short-termna dn then long-term

[13:36:34] <adamcom> agree, can be suitable for particular use cases as long as you are aware of the down sides - can also be important for certain types of sharding strategies - I've seen people have a short term storage (recent data) set of shards and a long term storage (old data) set of shards. To achieve that with shard tags you really need time as a piece of the shard key

[13:37:08] <rspijker> yeah, and in your case you would have to be on top of it in your app layer

[13:37:14] <rspijker> sicne you have the ‘rotation’ factor in the mix

[13:38:11] <rspijker> basically you have to update your maxRange to rotate the oldest stuff out and route the new stuff to that server again

[13:40:11] <AlexejK> I see.. thanks :-) Sharding is on my next to-do list so will have to really consider this things and see now if i can solve this short-term and apply a smarter & better way the same time we do sharding :-)

[14:10:48] <richthegeek> hi - I'm using the aggregation framework to get the number of interactions of each type over a timeframe (like so, plus some date filtering: http://pastie.org/9326749)

[14:11:20] <richthegeek> is there any way to at the same time get the total number of interactions (ie, in this case 948) included in the results?

[14:18:04] <rspijker> richthegeek: can’t think of any way to do that in the same aggregation

[14:18:34] <richthegeek> rspijker: no worries, figured it was a long shot!

[14:18:58] <richthegeek> perhaps there's a way with a group (_id: null), project, and then second group? probably not very efficient though

[14:21:18] <theRoUS> tscanausa: by the way, i forgot to say 'thank you' for your help. thanks!

[14:21:33] <rspijker> well.. once you do the initial group, your pipeline will only have 1 document in the stream (the grouped result) and you won’t be able to ‘dis-aggregate’ it anymore to get the groups you really want

[14:22:10] <rspijker> afaik, there’s no way to get a ‘side-result’ or add a document in the pipeline

[14:22:20] <rspijker> which is kind of what you;re after

[14:39:14] <tscanausa> theRoUS: what did I do I have been at mongo world so it been like 5 days

[14:40:05] <theRoUS> tscanausa: equivalent of SELECT foo,COUNT(*) FROM bar GROUP BY foo

[14:40:29] <tscanausa> That was me hinting with the aggregations right?

[16:19:06] <og01> if i have a replica set with two mongod instances, a is primary, b is secondary, and the network connection between the two drops, b elect itself as a primary? and a will stay as a primary, what happens when the connection is re-established?

[16:19:46] <nukulb> og01: I think b stays as primary in that case but should really test it out

[16:19:57] <og01> how does the data get merged?

[16:20:05] <og01> if both b and a have new data

[16:20:42] <nukulb> how can both a and b have data? do you mean the oplog wasn't applied to b before the network connection dropped

[16:21:11] <og01> no, i mean wouldnt bot a and b think they are the primary for a while?

[16:21:16] <og01> *both

[16:21:29] <nukulb> ahhh I c what you mean

[16:21:31] <nukulb> interesting

[16:21:58] <nukulb> b will only elect itself as primary I think till it knows that its the only primary

[16:22:11] <nukulb> so I don't think there can be two primaries at the same time

[16:22:27] <og01> how will it know its the only primary?

[16:22:32] <q85> b will elect it's self if it has enough votes

[16:22:59] <q85> It doesn't care about what else might be primary.

[16:24:04] <nukulb> q85: so if the hearbeat between the two is lost, then both a and b can think they are primary?

[16:25:18] <og01> no i think he's saying they will stay the same, because there is no way to win a voting process

[16:25:31] <nukulb> og01: I c

[16:25:45] <nukulb> that makes sense

[16:25:56] <nukulb> also in line with this - http://docs.mongodb.org/manual/core/replica-set-elections/

[16:26:00] <og01> what happens when b (which is secondary) receives writes when there is no connection to the primary

[16:26:03] <q85> nukulb: They will probably both stepdown to secondary status.

[16:26:22] <q85> Neither member has enough votes to take primary status.

[16:26:38] <og01> q85: ok

[16:27:03] <og01> again what happens if a secondary with no access to a primary receives a write operation?

[16:27:35] <og01> does the operation just error?

[16:33:54] <daidoji> hello, I have a field stored as an ISODate() but I'm having trouble searching it

[16:34:21] <daidoji> would someone mind pointing me to an example?

[16:51:46] <saml> daidoji, db.docs.find({date: {$lt: ISODate("2014-01-1")}})

[17:14:59] <jkhl> new, am I doing this right? http://pastebin.com/PjRpr5MP

[17:15:28] <jkhl> (or should all the node server code be within the mongo.Db.Connect() { } body ?

[17:31:29] <cozby> hi, I'm running into a weird problem. I launched a mongodb instance I had with data, I wiped out all the data by removing the files in /data /journal /log

[17:31:44] <cozby> In my config I'm starting a replica set

[17:32:08] <cozby> when I connect using mongo and then switch to use admin

[17:32:25] <cozby> when I try and create a user it says "couldn't add user, not master"

[17:32:47] <cozby> do I need to initialize the rs before adding users to admin?

[17:45:49] <mattt_> Is it possilbe to create a compound index on a set of sub-document keys and properties of the sub-documents? For example, given the document { ‘stuff’: {‘foo’: {‘id’: 123}, ‘bar’: {‘id’:456}, ‘baz’: {‘id’:789} } } is it possible to create a compound index on the keys of ‘stuff’ and the key ‘id’ of each sub-doc? What I’m struggling with is how to refer to foo, bar, baz.

[17:50:02] <tscanausa> generally if you have a replica set you do

[17:53:13] <q85> og01: you cannot perform write operations on a secondary. you'd likely receive an error of some kind.

[18:43:16] <alexgorbatchev> curious if anyone had issues recently getting mongodb to work in vagrant?

[22:59:48] <schimmy> So I've got an interesting mystery

[22:59:58] <schimmy> I am making a fairly large query (~4.3 MB) to mongo 2.6.1, and am receiving the error:

[23:00:05] <schimmy> "BSONObj size: 17630996 (0x10D0714) is invalid. Size must be between 0 and 16793600(16MB)"

[23:00:09] <schimmy> This indicates that either the query is over the 16MB limit, or the response is

[23:00:15] <schimmy> However I verified that the request is ~4MB over the wire with wireshark, and

[23:00:20] <schimmy> I have tried changing the ids to not match anything in the database, so that no data is returned, still get the error

[23:00:24] <schimmy> the query is along the lines of:

[23:00:37] <schimmy> db.cars.find({'owner': {'$in': ['aaaaaaaaaaaaaaaaaaaaaa', 'bbbbbbbbbbbbbbbbbbbbb' ... x 200k

[23:00:42] <schimmy> I have also tried against 2.4.10, and that works- no error is thrown.

[23:00:46] <schimmy> It seems as though my 4MB request somehow gets blown up into 17MB, but that sounds ridiculous.

[23:01:13] <schimmy> What has changed between 2.4.10 and 2.6.1 that would throw this error, and how is this error possible?

[23:01:32] <schimmy> sorry, that was a wall of text, but it is an interesting problem!

[23:05:23] <schimmy> oh - one correction, the find also has a limit(1) for the purposes of debugging, still get the error

[23:05:52] <schimmy> but that should not matter as even when returning nothing I get the error

[23:06:09] <cheeser> sounds like probably your query is too large

[23:06:44] <schimmy> but queries can be up to 16 MB, unless I read wrong

[23:08:12] <schimmy> http://docs.mongodb.org/manual/reference/limits/

[23:08:25] <wfaulk> today I just happened to notice looking at logs that one of our config servers was corrupt and thus the config set was out of sync

[23:08:41] <wfaulk> is there any way to test for sync status?

[23:17:58] <joannac> erm

[23:18:12] <joannac> the mongos will complain like crazy if your config servers are out of sync

[23:18:48] <wfaulk> it had been that way for a week and a half and we hadn't noticed

[23:21:21] <joannac> did the mongos complain?

[23:22:09] <wfaulk> not apparently

[23:23:22] <wfaulk> wait. which one is "s"?

[23:24:44] <wfaulk> okay, that's the client-side server

[23:25:19] <wfaulk> "cluster .... pinged successfully" in the mongos log

[23:25:54] <wfaulk> at the same time that the mongoc on one of those systems was producing:

[23:26:15] <wfaulk> "local.oplog.$main Assertion failure !loc.isNull() src/mongo/db/pdfile.cpp 1816"

[23:26:21] <wfaulk> and a stack trace and

[23:27:05] <wfaulk> "update config.lockpings query ... exception: assertion src/mongo/db/pdfile.cpp:1816 locks(micros) w:6203 3ms"

[23:28:07] <joannac> it produced a stacktrace but didn't go down?

[23:29:14] <wfaulk> hang on; let me pastebin this

[23:31:21] <wfaulk> http://pastebin.com/3hy7wesx

[23:32:24] <wfaulk> not sure about where the error begins and ends, so that's several of them

[23:32:45] <joannac> that looks like disk corruption

[23:34:53] <wfaulk> got 1253240 of those over 12d1h31m

[23:35:42] <wfaulk> it's on a VM; probably not disk corruption

[23:36:00] <wfaulk> nothing in system logs to indicate that either

[23:37:28] <wfaulk> was rebooted earlier that morning for a maintenance window

[23:37:51] <wfaulk> only thing that happened within an hour of it starting was an 8s jump in time as NTP was fixed

[23:38:18] <wfaulk> but that was 30m earlier

[23:42:31] <wfaulk> ah, the very first one says:

[23:42:35] <wfaulk> "couldn't make room for new record (len: 336) in capped ns local.oplog.$main"

[23:45:46] <joannac> what

[23:46:47] <wfaulk> i take it that makes no sense?

[23:47:45] <joannac> looks exactly like this, tbh SERVER-12413

[23:47:52] <joannac> which was SAN / storage problems

[23:50:13] <wfaulk> says it might take a while to show up

[23:50:19] <wfaulk> lemme check earlier logs

[23:54:47] <wfaulk> looks like the system was not shut down properly. fsck not forced, but possibly should have been

[23:54:59] <wfaulk> could potentially be some corruption

[23:55:26] <wfaulk> still, it'd be nice if it actually *did* something

[23:56:24] <wfaulk> looks like it did in the case of the SERVER-12413 reporter

[23:56:56] <wfaulk> "update not consistent"

Log file Viewer

Help | Karma | Search:

#mongodb logs for Thursday the 26th of June, 2014