[09:55:14] <kas84> aha, what does a compact, then?
[09:55:17] <jordana> kas84, it will compact your data but your file size may stay the same.
[09:56:24] <jordana> i.e. internally in the file it reclaims any padding that is not needed, allowing you to store more in the file
[09:57:14] <jordana> Think of those files like a bookcase of a certain size: it may or may not have books in it, but it stays the same size as you take books out and put them in
[09:57:25] <jordana> until you have too many books for the bookcase.
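A minimal sketch of issuing a compact from the Java driver, assuming the 3.x driver, a local mongod, and a collection named books (all hypothetical). compact rewrites the collection's data and rebuilds its indexes inside the existing data files, which is why the on-disk file size usually stays the same:

```java
import com.mongodb.MongoClient;
import com.mongodb.client.MongoDatabase;
import org.bson.Document;

public class CompactExample {
    public static void main(String[] args) {
        MongoClient client = new MongoClient("localhost", 27017); // hypothetical host/port
        try {
            MongoDatabase db = client.getDatabase("test");
            // compact defragments the collection and reclaims internal padding,
            // but it does not return disk space to the operating system.
            Document result = db.runCommand(new Document("compact", "books"));
            System.out.println(result.toJson());
        } finally {
            client.close();
        }
    }
}
```

Note that on this era's storage engine compact blocks operations on the database while it runs, so it is typically run node by node during a maintenance window.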
[11:26:38] <jordana> Anyone a fan of Rage Against the Machine?
[11:26:48] <jordana> compact will Take The Power Back!
[11:29:55] <tscanausa> I am in a situation in which my inserts will outgrow the maximum rate at which I can delete objects. Is there something I can do?
[12:24:50] <ayprograms> i am using MongoDB GridFS to store files greater than 16 MB, but I wish to determine the file type first before storing, in Java. Can anyone help me?
[12:25:22] <ayprograms> if a file is an image, text, audio, or video
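This wasn't answered in channel, but one possible sketch, assuming the 3.x Java driver's GridFS API: ask the JDK for a MIME type guess with Files.probeContentType and record the result in the file's GridFS metadata. The file path, database name, and metadata field names below are made up for illustration:

```java
import com.mongodb.MongoClient;
import com.mongodb.client.MongoDatabase;
import com.mongodb.client.gridfs.GridFSBucket;
import com.mongodb.client.gridfs.GridFSBuckets;
import com.mongodb.client.gridfs.model.GridFSUploadOptions;
import org.bson.Document;

import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class TypedGridFSUpload {
    public static void main(String[] args) throws Exception {
        Path file = Paths.get("/tmp/sample.bin"); // hypothetical input file

        // Best-effort MIME type guess, e.g. "image/png" or "video/mp4";
        // returns null when the type cannot be determined.
        String mimeType = Files.probeContentType(file);
        String category = mimeType == null ? "unknown" : mimeType.split("/")[0]; // image/text/audio/video

        MongoClient client = new MongoClient("localhost", 27017);
        try (InputStream in = Files.newInputStream(file)) {
            MongoDatabase db = client.getDatabase("test");
            GridFSBucket bucket = GridFSBuckets.create(db);

            // Store the detected type in the file's metadata so it can be queried later.
            GridFSUploadOptions options = new GridFSUploadOptions()
                    .metadata(new Document("contentType", mimeType).append("category", category));
            bucket.uploadFromStream(file.getFileName().toString(), in, options);
        } finally {
            client.close();
        }
    }
}
```

Files.probeContentType is only a best-effort guess based on the platform's file-type detectors; a content-sniffing library such as Apache Tika is a common alternative when the file extension cannot be trusted.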
[13:35:52] <davipt> hello. Is there an easy and performant way to insert documents into mongo, using Java, when I already have the JSON documents as strings? The official JSON to BSON/Document converter is awfully slow (more than 20 times slower than a full JSON decode + JSON encode via json-smart)
[14:38:16] <davipt> I assume the ideal scenario would be to use the bulk method and to try to pass the JSON messages as-is without any BSON conversion. My struggle is that I can either use bulk and BSON, but the BSON is awfully slow, or the original JSON messages and the single insert is awfully slow. I’m about to try a System.exec() and the command line tool
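A hedged sketch of the bulk route, assuming the 3.x Java driver: parsing each string into a RawBsonDocument keeps the document as raw BSON bytes rather than building a full Document tree, and insertMany sends the batch in a single bulk write. Whether this beats the converter davipt benchmarked would need measuring; the collection and field names here are made up:

```java
import com.mongodb.MongoClient;
import com.mongodb.client.MongoCollection;
import org.bson.RawBsonDocument;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BulkJsonInsert {
    public static void main(String[] args) {
        // Already-serialized JSON documents, as received.
        List<String> jsonStrings = Arrays.asList(
                "{\"user\": \"a\", \"n\": 1}",
                "{\"user\": \"b\", \"n\": 2}");

        MongoClient client = new MongoClient("localhost", 27017);
        try {
            // A collection typed to RawBsonDocument stores the parsed bytes directly,
            // avoiding construction of a full Document object tree per message.
            MongoCollection<RawBsonDocument> coll =
                    client.getDatabase("test").getCollection("events", RawBsonDocument.class);

            List<RawBsonDocument> batch = new ArrayList<>();
            for (String json : jsonStrings) {
                batch.add(RawBsonDocument.parse(json));
            }
            coll.insertMany(batch); // one bulk round trip instead of one insert per document
        } finally {
            client.close();
        }
    }
}
```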
[14:47:44] <saml> can I only do updates on primary?
[14:48:05] <saml> ERROR (<class 'pymongo.errors.OperationFailure'>, OperationFailure(u'Not primary while updating www.docs')
[15:00:43] <cheeser> saml: yes. all writes must go to a primary.
[15:02:24] <saml> i think pymongo.MongoClient can't reconnect when mongod is down
[15:16:52] <avaq> Hi. How do I maintain atomicity when I have entries:tags (1:N), and I want every tag to keep track of the amount of entries it is used in?
[15:17:08] <avaq> The problem here is the cross-reference the two collections now have. So they can only be updated with two queries. Any thoughts?
[15:17:57] <avaq> I'm thinking that once I feel the need for transactions in MongoDB I must be doing something wrong.
[15:35:34] <tscanausa> avaq, I generally find that is a bad way to go.
[15:35:59] <tscanausa> I would create an index, and when you need the count, query for it
[15:36:49] <avaq> That's what I have been doing, but it made requests last an unacceptably long time.
[15:37:08] <avaq> So I've only encountered this problem now that I'm optimizing my app.
[15:40:38] <avaq> Adding an index gave me a roughly 50% performance boost. Pre-calculating gave a 97% performance boost.
[15:42:32] <cheeser> to be honest, i'd just tag an entry then have a scheduled job to recalc the usage counts. you'd get a slight delay in updates but unless you're running a stock trading site i doubt anyone would notice or care.
[16:02:22] <avaq> cheeser: Good point. It's not the perfect solution but it got me thinking and brainstorming. I might keep track of which tags are "dirty" and only recount them when I need them clean or something along those lines.
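A minimal sketch of the scheduled recount cheeser suggests, assuming the 3.x Java driver, that entries hold their tags in an array field named tags, and that tag documents are keyed by the tag value (all assumptions):

```java
import com.mongodb.MongoClient;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Accumulators;
import com.mongodb.client.model.Aggregates;
import com.mongodb.client.model.Filters;
import com.mongodb.client.model.Updates;
import org.bson.Document;

import java.util.Arrays;

public class RecountTags {
    public static void main(String[] args) {
        MongoClient client = new MongoClient("localhost", 27017);
        try {
            MongoCollection<Document> entries = client.getDatabase("test").getCollection("entries");
            MongoCollection<Document> tags = client.getDatabase("test").getCollection("tags");

            // Group entries by tag and count how many entries reference each one.
            for (Document row : entries.aggregate(Arrays.asList(
                    Aggregates.unwind("$tags"),
                    Aggregates.group("$tags", Accumulators.sum("count", 1))))) {
                // Write the fresh count back onto the corresponding tag document.
                tags.updateOne(Filters.eq("_id", row.get("_id")),
                               Updates.set("entryCount", row.getInteger("count")));
            }
        } finally {
            client.close();
        }
    }
}
```

Run from a cron-style job, this keeps the per-tag counts eventually consistent without needing a cross-collection transaction.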
[16:18:41] <culthero> Anyone got any idea of picking a shard key? I am sharding some mongos for tweets. I'd like to distribute the tweets evenly across all replicasets, tweets expire, and there is a fulltext index on the tweets. Would the ID_STR + hash of the tweet be a good choice of shard key?
[16:30:54] <culthero> jeeze you need 9 vps's to shard
[17:21:29] <ianblenke> I just did an upgrade from 1.8 -> 2.0 -> 2.2 -> 2.4 -> 2.6... all of the steps up to 2.4 seemed to be happy and painless, but updating to 2.6 started spewing collection index warnings on _id:
[17:21:34] <ianblenke> "WARNING: the collection 'xxx.yyy.mr.mapreduce_1310341420_7489_inc' lacks a unique index on _id. This index is needed for replication to function properly"
[17:21:48] <ianblenke> the problem is that we have many of those mapreduce collections.
[17:22:27] <ianblenke> (this is NOT a production cluster, we were testing the upgrade process, so there has been no data loss yet)
[17:23:24] <jordana> ianblenke, I may be wrong, but I believe _id indexes were not added by default in versions of mongo prior to 2.4/2.6
[17:50:05] <ianblenke> yeah. it looks like I can remove these.
[17:51:44] <ianblenke> I wonder if a db.tmp.mr.drop() would break mapreduce
[17:56:22] <ianblenke> http://stackoverflow.com/questions/4163157/mongodb-remove-mapreduce-collection looks like an answer to my quandary.
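A sketch of the approach the linked answer describes, assuming the 3.x Java driver; the database name and the collection-name filter are assumptions and should be adjusted to match the leftover temporaries actually present (here, the "*.mr.mapreduce_*" collections from the warning above):

```java
import com.mongodb.MongoClient;
import com.mongodb.client.MongoDatabase;

public class DropLeftoverMapReduce {
    public static void main(String[] args) {
        MongoClient client = new MongoClient("localhost", 27017); // hypothetical host/port
        try {
            MongoDatabase db = client.getDatabase("xxx"); // hypothetical database name
            for (String name : db.listCollectionNames()) {
                // Temporary map-reduce output collections carry generated
                // ".mr.mapreduce_<timestamp>_<n>" names; permanent output does not.
                if (name.contains(".mr.mapreduce_")) {
                    db.getCollection(name).drop();
                    System.out.println("dropped " + name);
                }
            }
        } finally {
            client.close();
        }
    }
}
```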
[18:02:28] <cheeser> are those the collections where you're not seeing _id indexes? or some other collection?
[18:03:08] <ianblenke> I see _id indexes on _most_ of the mapreduce collections, but there are quite a number of them that don't have indexes for whatever reason
[18:04:14] <ianblenke> there were so many of those temporary collections that this drop script did take a while to run... even on an m3.2xlarge with 4000 IOPs.
[18:05:56] <ianblenke> such collections. many index.
[18:44:45] <jekle> trying to install mongodb on ubuntu via the 10gen ppa. I can manually start a daemon process successfully, but the init script (i think upstart) doesn't seem to work: I get "initctl: Unknown job: mongod". google gives only one hit :| where some guy hit the same problem.
[18:47:46] <jekle> when I invoke the init start script this is written to syslog: mongod main process (4318) terminated with status 100
[18:48:05] <jekle> status 100 seems to be an uncaught error, google says
[18:48:21] <jekle> How could I debug this problem further?
[19:34:45] <culthero> hm, maybe I should rephrase my question... based on my desire to fan out on read, would a hashed shard key of user_id + timestamp be beneficial, if searching by a full-text field?
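For what it's worth, a sketch of declaring a single-field hashed shard key through the admin commands, assuming the 3.x Java driver, a mongos at localhost, and a social.tweets namespace (all hypothetical). A hashed key spreads inserts evenly across the shards, while full-text queries that don't include the shard key are broadcast to every shard, which is the fan-out-on-read behaviour described above:

```java
import com.mongodb.MongoClient;
import com.mongodb.client.MongoDatabase;
import org.bson.Document;

public class ShardTweets {
    public static void main(String[] args) {
        // Connect through a mongos router (hypothetical host/port).
        MongoClient client = new MongoClient("localhost", 27017);
        try {
            MongoDatabase admin = client.getDatabase("admin");
            admin.runCommand(new Document("enableSharding", "social"));
            // A hashed shard key distributes inserts evenly across the shards;
            // queries that omit the key (e.g. full-text searches) hit every shard.
            admin.runCommand(new Document("shardCollection", "social.tweets")
                    .append("key", new Document("user_id", "hashed")));
        } finally {
            client.close();
        }
    }
}
```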
[20:23:19] <quickdry21> Hey there... so I fubared one of my shards. All the members went down. I've recovered all the data from a backup using mongorestore, but when starting them back up, they all say "initial sync need a member to be primary or secondary to do our initial sync." It looks like none of them are able to recognize the data I've restored via mongorestore
[20:46:44] <skot> mongorestore only restores the user data, not the replica set config, unless you did something special.
[20:47:00] <skot> (or really I should say, mongodump only backs up user data)
[20:48:08] <skot> quickdry21: Also, a single shard backup isn't necessarily useful, since data moves between shards normally and an old backup won't have the data that was on the shard when it went down.
[20:48:49] <skot> in general you need to restore the whole sharded system when you lose a shard with sharded collections. Do you have sharded collections?
[20:50:01] <skot> so, unless you have turned off the balancer or no balancing has been done since your backup, you may not be able to get back to a good system.
[20:50:16] <skot> (by just restoring the damaged shard)
[20:51:07] <quickdry21> so is there nothing i can do if i have a snapshot of the data directory when the shard went down?
[20:51:42] <skot> Well, a snapshot of the filesystem and a mongodump are very different things. Which do you have?
[20:52:15] <quickdry21> the actual data directory of one of the primaries
[20:52:42] <skot> Are you running with journaling on (the default config)?
[20:53:05] <skot> Can you describe the events that happened? Why did the node go down?
[20:53:38] <skot> and, what happened to the rest of the nodes in the replica set?
[20:54:57] <quickdry21> we were resyncing members as part of maintenance (to release disk space back to the os); we were doing 2 members at a time, when the two remaining members that were seeding the resync crashed. the resync had just completed on one of the nodes (the primary), but they couldn't elect a primary
[20:55:55] <quickdry21> this happened this morning - there hasn't been much db activity, as our whole app is down
[20:56:12] <skot> oh, if you have a good copy of data then just copy it over to the other nodes and restart them.
[20:58:10] <skot> in general it might be a better approach for you to do a resync on one node, then copy the data files to the other nodes; file system copies are generally faster than a resync, but do require a little more work.
[21:00:16] <quickdry21> i tried something similar to this earlier, but each node was stuck in the STARTUP2 state
[21:00:29] <quickdry21> so unable to vote in a primary
[21:08:05] <jdj_dk> has anybody tried implementing client-side image resizing before uploading?
[21:36:08] <dgarstang> can someone tell me why I get an error "SyntaxError: Unexpected identifier at /var/lib/mongo/setup.js:5"? http://pastebin.com/EBn3G37A
[22:35:48] <tipsford> I sharded a collection that already has loads of documents, and it seems to be migrating the chunks very, very slowly, especially compared to what I've seen in the same cluster when we drain a shard. When I check currentOp(), the migrations are almost always on "step 5/6", which is just updating the metadata in the config db.
[22:36:12] <tipsford> That seems weird to me. I looked in MMS and the lock % on config is not high at all, so I'm not sure what's going on there
[22:36:58] <tipsford> Anyhow, if anyone has any insight on what might be so slow about that, I'd be grateful
[22:43:50] <babykosh> I need some wisdom… what is the best way to import a CSV file of one collection, created via mongoexport, into a Postgres database table?
[23:02:10] <tipsford> @babykosh: pgadmin has a CSV import feature that may be what you need
[23:02:47] <tipsford> It's in the "tools" menu when you have the target table selected. I have not tried it, but it does support CSVs
[23:17:36] <motaka2> how do I remove the mongodb lock permanently?
[23:29:41] <babykosh> @tipsford yeah I’ve tried it and it is a bit problematic… I was wondering if there might be another way… even from the CLI