[00:35:46] <modulus^> will document level locking increase performance in my case?
[00:42:15] <Boomtime> modulus^: it's hard to say, you have a 28 second single write - it doesn't matter how parallel the database is if it can't write to disk
[00:43:13] <Boomtime> you say you're only getting 40% lock, but that just means that we haven't observed the problem - a 28 second write that hits only a single document is ridiculous
[00:45:15] <modulus^> the higher the queued reads, the higher the active writes count
[00:45:48] <modulus^> maybe it's better to have more, smaller drives in the RAID1
[00:47:16] <Boomtime> modulus^: capture the output from mongostat for a minute or so, and pastebin/gist it
[00:47:28] <Boomtime> needs to be at least a minute actually
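For reference, a minimal way to capture that sample, assuming mongostat is on the PATH and the server is on the default local port (--rowcount and the trailing per-second poll interval are standard mongostat options):

    # Poll once per second for 60 rows, then pastebin/gist the saved file.
    mongostat --rowcount 60 1 > mongostat.txt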
[01:01:25] <Boomtime> i'm a little surprised it is as bad as we're seeing.. i suspect it all comes down to seek time
[01:01:53] <Boomtime> the trouble with seek time is that it doesn't improve no matter how many disks you add
[01:02:19] <Boomtime> (well, unless you add so many disks that their individual onboard caches collectively store everything)
[01:03:10] <modulus^> i guess 7.2k is crapping out on us
[01:03:56] <Boomtime> it may also be partly the controller, a RAID controller can, at best, leave read/write seek time unchanged - or it can make it worse
[01:04:20] <modulus^> let me see what raid controller...
[01:23:04] <modulus^> Boomtime: would 14k spindle drives improve io a lot?
[01:26:21] <Boomtime> a disk IO testing tool - one that will read/write some random data, in varying sized blocks, in varying places, and give a performance report
[01:28:33] <Boomtime> that means little unfortunately
[01:29:55] <Boomtime> raid controllers are highly configurable (does yours even have the same bios revision as his?) and that is without considering the disks, the OS, other software or motherboard that backs it all
[01:30:14] <Boomtime> unless you have done performance tests yourself, on your own hardware, you have no idea what your performance actually is
[01:54:47] <HMill> I'm assuming no one in here is using Meteor?
[01:55:00] <HMill> modulus^: Are you a devops ninja?
[01:55:17] <modulus^> HMill: no just a regular ninja
[01:55:51] <HMill> modulus^: that's cool. i'm down with regular ninjas
[01:58:39] <modulus^> real ninjas look like harmless fuzzy bunny rabbits
[01:58:50] <modulus^> until chuck norris pisses one off
[03:57:45] <kataracha> if querying on _id, should there be any difference between the time it takes to query a very large collection and a small one, or are both constant-time lookups regardless of size?
[04:10:28] <GothAlice> kataracha: Indexes are generally stored as b-trees. This gives them predictable O(log n) performance: lookup cost grows only logarithmically with collection size, ref: http://stackoverflow.com/questions/4694574/database-indexes-and-their-big-o-notation
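To see the b-tree at work, explain() on an _id lookup reports an index scan rather than a full collection scan. A quick shell sketch (the collection name is a placeholder):

    // Grab any document, then inspect how its _id lookup executes; the plan
    // should show the _id index (BtreeCursor / IXSCAN), not a collection scan.
    var doc = db.mycollection.findOne();
    db.mycollection.find({_id: doc._id}).explain();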
[04:19:20] <bmillham> Hi all. I have some questions about the best way to keep a remote MongoDB in sync with a local copy.
[04:19:40] <bmillham> Some background, I currently have a MySQL database (local and remote)
[04:20:10] <bmillham> New records are added locally, and I run a script to update the remote.
[04:20:58] <bmillham> It's a site for listeners to my internet radio show to make requests on.
[04:21:39] <Boomtime> be sure to make the priority zero for the remote secondary so it is never a valid option for primary
[04:21:43] <bmillham> So when I'm DJing, they make requests at the site, which I'd like to have updated locally
[04:21:51] <GothAlice> bmillham: At work we use a pair of replica secondaries, one in the office, one in my apartment. The one in my apartment is purely for backup purposes (it's actually delayed 24 hours to assist with data recovery in the event of deletion) and the one in the office gets queried.
[04:22:20] <bmillham> And my local system is NOT connectible from the outside world
[04:28:00] <GothAlice> bmillham: Just make sure your "oplog" size is sufficient to cover the period of time between "syncs" (plus a fair bit of head room for safety), and your in-office secondary can be spun up when you need it.
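A sketch of the member configuration being described, combining Boomtime's priority-zero advice with GothAlice's hidden, 24-hour-delayed backup (the hostname and member _id are hypothetical; on servers of this era the delay field is slaveDelay, in seconds):

    // Run on the primary: add a hidden, non-electable, delayed secondary.
    rs.add({
        _id: 3,                          // any unused member id
        host: "backup.example.com:27017",
        priority: 0,                     // never a valid candidate for primary
        hidden: true,                    // invisible to normal client reads
        slaveDelay: 86400                // apply the oplog 24 hours behind
    })
    // Check how much time the current oplog size covers:
    rs.printReplicationInfo()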
[04:28:13] <bmillham> With an SSH tunnel, I can't open any ports. That's just how HughesNet works.
[04:28:24] <GothAlice> Then SSH can't be used in your case.
[04:28:36] <GothAlice> One might suggest finding a better host, since… that's kinda basic.
[04:28:59] <bmillham> Oops, I misstated that. An SSH tunnel would work. I meant without a tunnel.
[04:29:09] <bmillham> And the only other host here is dialup
[04:32:05] <bmillham> As you may have guessed, I live out in the country. But it's sad that I live 60 miles from Washington DC and there is no high-speed internet available here other than satellite.
[04:34:30] <bmillham> Unless I want to pay an insane amount for a DS1/3
[04:34:58] <GothAlice> Ah, not quite what I meant.
[04:35:04] <bmillham> (And I'm not sure if even those are available from the local central office)
[04:41:28] <bmillham> OK, looking at replica sets, it looks like that won't work for what I'm hoping for. The remote replica can't accept updates, and I need that.
[04:43:44] <bmillham> (Updates from the app running on the server that the replica is located on)
[07:21:44] <linocisco> what programming language is best to work with mongodb?
[07:21:44] <borjagvo> Hi. It seems I found a bug: http://stackoverflow.com/questions/27381041/text-search-not-working?noredirect=1#comment43219076_27381041. The $search doesn't work with the word "mesías", for example. Neither word is a stop word.
[07:22:26] <borjagvo> Interestingly, I didn't see either of these two words in the Spanish stop words list: https://github.com/mongodb/mongo/blob/master/src/mongo/db/fts/stop_words_spanish.txt
[07:25:22] <linocisco> logic, is it a complete programming language? Node.js is one form of JavaScript. It is confusing
[07:25:54] <logic> borjagvo, create a multi-language text index
[07:26:36] <borjagvo> logic: I though I just did that: http://stackoverflow.com/questions/27381041/text-search-not-working?noredirect=1#comment43219076_27381041
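For anyone following along, pinning a text index to Spanish looks roughly like this (collection and field names are placeholders; a per-document "language" field overrides the index default):

    // Spanish stemming and stop words for the indexed field:
    db.articles.ensureIndex({body: "text"}, {default_language: "spanish"})
    db.articles.find({$text: {$search: "mesías"}})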
[07:27:53] <logic> linocisco, learning more about fullstack will make it clearer to you.
[07:28:30] <linocisco> logic, fullstack is the name of a programming language? i have never heard of it
[09:58:30] <linocisco> kali, why? not stable or not safe?
[10:03:06] <kali> linocisco: mongodb atomicity is limited to a single document update. so the typical banking transaction scenario gets difficult to implement correctly
[10:15:21] <linocisco> kali, what is a single document update? is it only ok for one transaction at a time?
[10:28:37] <kali> linocisco: how deep is your understanding of mongodb so far ?
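To illustrate kali's point with a sketch (collection and field names are made up): a single-document update is atomic, but a transfer that touches two documents is two separate operations.

    // Atomic: this either fully applies or doesn't apply at all.
    db.accounts.update({_id: "alice", balance: {$gte: 100}},
                       {$inc: {balance: -100}})
    // NOT atomic as a pair: a crash between these two updates strands
    // the money in transit -- the classic banking-transaction problem.
    db.accounts.update({_id: "bob"}, {$inc: {balance: 100}})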
[12:41:07] <borjagvo> Hi. It seems I found a bug: http://stackoverflow.com/questions/27381041/text-search-not-working?noredirect=1#comment43219076_27381041. The $search doesn't work with the word "mesías", for example. Neither word is a stop word.
[12:41:23] <borjagvo> Interestingly, I didn't see either of these two words in the Spanish stop words list: https://github.com/mongodb/mongo/blob/master/src/mongo/db/fts/stop_words_spanish.txt
[12:57:35] <borjagvo> Any help please from any person involved in the project? I posted some more details on the comments: http://stackoverflow.com/questions/27381041/text-search-not-working?noredirect=1#comment43219076_27381041
[13:36:58] <rioch> I have a document like so: {'field1' : 'value', 'field2': [ {'key': 'a', 'name': 'a'}, {'key': 'b', 'name': 'b'}, {'key': 'c', 'name': 'c'}]}. How can I update it so that all key fields are set to null?
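One version-agnostic way to do that from the shell is to rewrite the array client-side (a sketch using rioch's field names; newer servers can also do this in a single update with the all-positional $[] operator):

    // Null out "key" in every field2 element, one document at a time:
    db.mycoll.find({"field2.key": {$ne: null}}).forEach(function(doc) {
        doc.field2.forEach(function(el) { el.key = null; });
        db.mycoll.update({_id: doc._id}, {$set: {field2: doc.field2}});
    });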
[15:10:43] <Constg> Good afternoon, I have a question: how would you update ($inc) a value in an object nested in an array?
[15:13:12] <Constg> haaaa $ operator! I forgot this one :(
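The pattern Constg remembered, for later readers (collection and field names are hypothetical): the $ placeholder targets the array element matched by the query.

    // Increment "count" inside the first items element whose key is "a":
    db.mycoll.update({"items.key": "a"}, {$inc: {"items.$.count": 1}})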
[17:19:51] <harttho> Anyone have any tips for finding slow queries?
[17:20:11] <harttho> Specifically, we have some slow queries run and the performance of unrelated ones is affected too
[17:20:22] <harttho> So the logs/profiling show them all as slow
[17:26:15] <Constg> harttho, db.currentOp() will list running queries
[17:26:29] <Constg> And its output shows how long each one has been running
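db.currentOp() also accepts a filter document, so the long-running operations can be isolated directly (the 3-second threshold here is arbitrary):

    // Show only active operations running longer than 3 seconds:
    db.currentOp({active: true, secs_running: {$gt: 3}})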
[17:36:11] <rchickenman> I am running a very simple query that returns 20 or so simple documents with no filters, and it's taking 30 seconds in Node.JS, but only a few milliseconds when I do it from the Mongodb shell.
[17:36:28] <rchickenman> Can anyone please help explain this behavior?
[17:41:39] <GothAlice> rchickenman: Is that MongoDB shell running on the same host as the server itself? Also, what are you doing with that data, and is it overly large? (I.e. is each document multi-megabyte?)
[17:41:54] <GothAlice> Network transfer is a likely culprit.
[17:42:23] <rchickenman> No, each document has roughly five simple string fields.
[17:42:32] <rchickenman> And the mongodb shell is running on the same server as the node.js code.
[17:46:56] <rchickenman> I tried using the "lean" feature because my understanding is that it removes whatever overhead mongoose usually imposes.
[17:50:45] <GothAlice> rchickenman: Hmm. If it's working from the shell, but freaking out in Mongoose, sure sounds like a Mongoose issue. Have you asked in #mongoosejs? (They'd be more likely to be able to assist… I don't JS, so my help here is limited.)
[17:56:22] <rchickenman> Okay, before I go there I might take some time to try it with the vanilla node.js mongodb-native driver to isolate it as a node or a mongoose issue.
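A minimal timing test with the native driver, along the lines rchickenman describes (URL, database, and collection names are placeholders; this uses the driver API of that era):

    // npm install mongodb
    var MongoClient = require('mongodb').MongoClient;
    MongoClient.connect('mongodb://localhost:27017/mydb', function(err, db) {
        if (err) throw err;
        var start = Date.now();
        db.collection('mycoll').find({}).toArray(function(err, docs) {
            if (err) throw err;
            console.log(docs.length + ' docs in ' + (Date.now() - start) + 'ms');
            db.close();
        });
    });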
[18:09:45] <GothAlice> Welp, I now have a complete caching decorator library (both general memoize and Document-aware method decorator) for Python built on MongoDB ready for extraction from my work codebase. Mondays suck, but Tuesdays are very productive. :3
[19:58:47] <GothAlice> Does anyone know if MMS can be told to not stream-backup specific collections? I'd like to not have my cache (which can be rebuilt in its entirety or lazily) pointlessly transferred around. :/
[20:01:25] <cheeser> from one of the devs: on the backup page in the gear box of options on the right there should be a "Manage excluded namespaces". adding to that should take effect shortly
[20:02:52] <cheeser> it's still going to read changes to those collections from the oplog collection, but the agent will stop forwarding them onward
[20:33:17] <mike_edmr> are unordered writes faster than ordered?
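For context on that question: an unordered bulk operation lets the server apply writes in any order and continue past individual errors, so it is generally at least as fast as an ordered one. A shell sketch (collection name is a placeholder):

    var bulk = db.mycoll.initializeUnorderedBulkOp();
    bulk.insert({a: 1});
    bulk.insert({a: 2});
    bulk.execute();  // order of application is not guaranteed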
[20:36:10] <michaelq> Quick Mongoose.js question: User.count({}) is returning [Object, object]. How do I get it to instead return the results of the query?
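That symptom usually means the Query object itself is being logged; count() needs a callback (or exec()) to produce the number. A sketch against the User model from the question:

    // Without a callback, count() returns a Query object, not a number.
    User.count({}, function(err, n) {
        if (err) return console.error(err);
        console.log('users:', n);
    });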
[20:39:28] <ejb> Can anyone recommend some mongo based job queues? I'd like to queue jobs from meteor and process them on some other server(s)
[20:39:58] <Synt4x`> is there an easy way from mongo shell to output my .find (about 2,500 results) into a CSV or something that's easier to scroll through and look at?
[20:40:09] <mike_edmr> i've used monq but it's not exactly.. top of the line
[20:40:32] <mike_edmr> there is a bit of cruft in the "schema"
[20:41:12] <Synt4x`> cheeser: thanks I'll look into it, I thought mongodump was to save a whole DB
[20:42:42] <ejb> mike_edmr: can monq be used across servers?
[20:43:11] <mike_edmr> sure. it marks a job as in-progress when you pop it off the queue.
[21:05:50] <GothAlice> ejb: Task queues in MongoDB are easy to roll for yourself: https://gist.github.com/amcgregor/4207375 is an extract from a presentation I gave on the process, with full Python implementation (supporting immediate and scheduled tasks) linked in the comments.
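The core trick in queues like that gist (and monq) is an atomic claim via findAndModify, so two workers can never grab the same job. A minimal shell sketch (collection and field names are made up):

    // Atomically claim the oldest pending job and mark it running:
    var job = db.jobs.findAndModify({
        query:  {state: "pending"},
        sort:   {created: 1},
        update: {$set: {state: "running", worker: "worker-1"}},
        new:    true
    });
    // job is null when the queue is empty.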
[21:07:43] <joannac> Synt4x`: you probably actually want mongoexport
[21:09:18] <cheeser> yeah. i think dump dumps to bson. export will do json/csv
[21:09:42] <joannac> just don't try to insert it again, it might not preserve types
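A hypothetical invocation for Synt4x`'s 2,500-result case (older tools use --csv instead of --type=csv; CSV output requires an explicit field list):

    mongoexport --db mydb --collection mycoll --type=csv \
        --fields field1,field2,field3 \
        --query '{"status": "active"}' --out results.csv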
[21:14:35] <d-snp> hi, we have a high-write low-read environment, and I'm thinking of splitting our databases up to a database per customer, will this mess with the write throughput?
[21:15:17] <d-snp> yeah? what are the reasons for it improving? is it because of the smaller indexes?
[21:15:22] <GothAlice> d-snp: MongoDB currently has a database-level write lock. (Future versions will have document-level locking.) Splitting into separate DBs = spreading the locking around.
[21:15:53] <GothAlice> (No contention when simultaneously updating records for different clients in your case.)
[21:16:14] <d-snp> right, that sounds like it makes sense
[21:17:06] <GothAlice> d-snp: Smaller indexes will help update/find performance, too. Even lookups by ID should be faster, though depending on data size the difference might not be really measurable.
[21:17:13] <d-snp> I'm actually not sure if locking is an issue, it's not really important that the writes go through immediately, as long as the final throughput is optimal
[21:18:20] <d-snp> well, I was fearing that because the writes wouldn't all go to the same file anymore, there would be fewer sequential writes, which might reduce performance
[21:18:49] <d-snp> is it normal that locking is a bigger bottleneck than the ssd?
[21:20:06] <d-snp> I'm not a db expert, so maybe I'm not making sense :P
[21:21:33] <GothAlice> Locking is an issue if you have high "waiting for write lock" times or write lock percentages. Read-intensive databases have low percentages here, write-intensive databases (where a write is likely to happen before a previous one finished) will have a high percentage. (Sometimes >50%.)
[21:22:21] <GothAlice> When you're spending so much time literally doing nothing and waiting, splitting across DBs can eliminate much of the waiting, and thus utilize disk IO more efficiently.
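Two standard ways to check whether a deployment is actually lock-bound before splitting anything (mongostat's lock column reports the same data over time):

    // Cumulative time operations have spent queued on the global lock:
    db.serverStatus().globalLock
    // Per-database lock statistics:
    db.serverStatus().locks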
[21:32:28] <GothAlice> (This middleware also swaps out master templates for CNAME white-labelling purposes.)
[21:34:55] <d-snp> hmm do you plan on having the databases all hosted by the same mongod/s instances, or instances per database?
[21:35:13] <GothAlice> Same mongo cluster, but with authentication controls enabled.
[21:35:34] <d-snp> so you'd have databases with unique names right?
[21:35:48] <GothAlice> Aye. Based on the top and second-level domain. I.e. example_com
[21:36:03] <GothAlice> Where the CNAME is set up as something like "jobs.example.com" or "careers.example.com".
[21:36:04] <d-snp> right, that makes it pretty easy
[21:36:34] <d-snp> any specific reason for not doing a cluster per customer? or just sysadmin overhead?
[21:36:53] <GothAlice> Most clients don't have datasets worthy of that level of isolation and independent scaling.
[21:37:19] <GothAlice> (I.e. we have some clients with 10,000 jobs registered… and they're still on the shared infrastructure, since that comes to about 20MB of data. ;)
[21:38:05] <d-snp> our data is in the 10s of gb's per customer, and that's the small ones unfortunately
[21:40:14] <GothAlice> You mentioned a write-heavy load. Since your dataset for a single client won't fit in RAM anyway, the only benefit of splitting into a cluster-per-client is to have cache locality. (I.e. data cached in RAM for client A that client B's data can't push out if split, but can if shared.)