[00:00:47] <dandv> How does one truly *understand* what all the information dumped by system.profile means? What sorts of numbers should raise flags? I've read all the http://docs.mongodb.org/manual/administration/monitoring/ pages and am still in the dark.
[00:00:51] <zamnuts> dandv, it could be a number of reasons, is it consistently slow or just on a couple of occasions?
[00:01:32] <dandv> zamnuts: I'm not even sure how to figure out how often it's slower than it should be.
[00:03:15] <dandv> zamnuts: it's taking more than 10 seconds (!) on more than one occasion: db.system.profile.find({"ns": "stockbase-prod.companies", millis: {$gt: 300}}).limit(5).sort( { millis : -1 } )
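For reference, a sketch of that same profiler query narrowed to the fields usually worth inspecting first (field names as in the 2.x profiler format; the namespace and threshold are taken from the query above):

```javascript
// show the five slowest ops against the collection, with only the triage fields
db.system.profile.find(
    { ns: "stockbase-prod.companies", millis: { $gt: 300 } },
    { op: 1, query: 1, nscanned: 1, nreturned: 1, numYield: 1, lockStats: 1, millis: 1, ts: 1 }
).sort({ millis: -1 }).limit(5)
// the usual red flags: nscanned much larger than nreturned, or large waits in lockStats
```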
[00:04:18] <dandv> Here's the query being slow consistently: http://pastebin.com/uwgbvfca
[00:05:59] <zamnuts> dandv, well the yields look ok, that would've been my first guess
[00:06:34] <dandv> Here's just that "FSL" query: http://pastebin.com/uEP9gDSU
[00:08:28] <zamnuts> joannac, dandv says that when he runs the same query in the shell, it comes back quickly, at 27ms
[00:10:21] <joannac> okay, so where's the long one coming from?
[00:13:39] <dandv> joannac: without index, running the query takes <100ms: http://pastebin.com/S45M2Xqd
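One way to compare the two plans from the shell is explain(); a sketch, assuming the 2.6-era explain output and forcing the unindexed case with $natural (the actual FSL query is in the pastebin above):

```javascript
db.companies.find(/* the FSL query */).explain()                        // plan the optimizer picks
db.companies.find(/* the FSL query */).hint({ $natural: 1 }).explain()  // force a full collection scan
// compare cursor, nscanned/nscannedObjects, scanAndOrder and millis between the two runs
```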
[00:16:04] <dandv> Also, this is on Digital Ocean. SSD disks. I can't even begin to understand why scanning 16k records would take more than NINE seconds. Is Mongo a mature technology?
[00:16:17] <dandv> a 2-CPU server with 2GB of RAM.
[00:19:20] <dandv> joannac: the slow query is generated by a Meteor app
[00:20:06] <joannac> dandv: to be honest, if it works okay in the shell, but poorly through Meteor, I would be pointing the finger at meteor
[00:20:37] <dandv> But it's Mongo that executes the query sent by Meteor. "millis" is as recorded by the Mongo profiler.
[00:21:17] <dandv> presumably the profiler records the query exactly as received, with the projection and limit and sort?
[00:21:40] <joannac> repeatable through Meteor? If you run the query 10 times, it's always 27 seconds?
[00:30:10] <dandv> not repeatable, right now it took only ~200ms
[00:39:06] <dandv> would high CPU due to other processes explain that?
[00:41:57] <zamnuts> dandv, based on the screenshot you sent me, your cpu usage isn't "high" - even if your processes are not multithreaded, you still don't go over 50%; the kernel would allocate accordingly
[00:42:55] <zamnuts> dandv, in other words, i think mongod gets a good amount of CPU allocation when it needs it and still has room left over; if it were fighting for resources, your usage would be over 50% (you said you had 2 cores, right?)
[00:42:56] <dandv> htop often shows 90%+ spikes, maybe DigitalOcean averages over ~20 seconds or something?
[00:48:11] <zamnuts> the TIME+ column shows how "hungry" a process is; if it is climbing at a good rate, the process is using cycles
[00:48:48] <dandv> 29 hours vs. a lot less, http://snag.gy/apXAR.jpg
[00:49:16] <dandv> where would I learn more about "climbing" and using cycles?
[00:50:43] <dandv> So right now, that machine runs a Meteor frontend that accesses the DB. About 10 minutes ago, I stopped a backend application that continuously populated that DB, parsing feeds (so write activity). Maybe that explains the pattern? Now I'm going to run frontend queries and see how the system behaves.
[00:54:06] <zamnuts> dandv, where to learn about it? not sure, you could google 'linux cpu time'
[00:54:46] <dandv> not familiar with the cycles. So does it seem normal to have that discrepancy in TIME+ between mongod and other processes?
[00:55:04] <zamnuts> dandv, your lock acquire times are "high" but not relatively so; they're at ~4 seconds (timeAcquiringMicros)
[00:55:28] <zamnuts> dandv, judging by that screenshot, mongodb is using _a lot_ of cpu
[00:56:02] <dandv> thanks zamnuts, that's what I thought, but I'm just beginning to learn about profiling mongodb. Trying to understand what might cause that high CPU usage.
[00:57:26] <zamnuts> however, your timeLockedMicros is at ~45 seconds, so the acquire time is just a fraction - this indicates that the slow queries are not consistent, otherwise the acquire time would be just as high (waiting for other slow query reads to finish)
[00:58:07] <zamnuts> which is the good news in this debacle...
[00:58:25] <zamnuts> albeit harder to troubleshoot
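Those counters show up both in db.serverStatus().locks and, per operation, in the lockStats sub-document of a profiler entry; a sketch of pulling the per-op numbers (2.x field names):

```javascript
// print the lock stats of the single slowest op against this collection
db.system.profile.find({ ns: "stockbase-prod.companies" })
    .sort({ millis: -1 }).limit(1)
    .forEach(function (p) { printjson(p.lockStats); });
// lockStats.timeLockedMicros    -> time the op spent holding the read/write lock
// lockStats.timeAcquiringMicros -> time it spent waiting to acquire the lock
```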
[00:58:50] <zamnuts> dandv, i'm not sure why the high cpu usage - like you said, it's only 16k records
[00:59:57] <zamnuts> what's your typical document size? i'm not sure of the inner workings of mongod, but if it's gotta do a full table scan with large documents, that could account for the slowness - though IMO that would be disk-bound, not cpu-bound
[01:01:17] <dandv> in that collection with 16k records, 1k-2k/document.
[06:08:42] <inad922> How can I be sure that a database operation has finished with either pymongo or mongodb?
[06:10:03] <inad922> I would like to write a test where a celery application writes to the database and in another process I check if it's finished. It seems finished, but when I try to check for the data in the other process it's not in the db yet. So I just added a fixed delay, but it would be better if I could check whether the operation has finished or not
[07:25:02] <Sam____> Does anyone know if we can support Centralized Authentication in MongoDB's Basic/Standard Edition subscription? Is that part of mongo's base build, or an add-on?
[07:29:27] <cheeser> kerberos/gssapi is available but not in the community builds, iirc
[07:29:50] <stefuNz> Hi everybody, i asked here yesterday about performance improvements. It seems that MongoDB saturates the Disk IO (iostat %util is 100%) every minute and i guess that this is caused by the sync, because the syncdelay is on its default value of 60 seconds. i saw a peak for 10-20 seconds every minute, so this seems to be the issue. how do i go from here? i have a write-most read-some workload.
[07:31:08] <Sam____> Thanks Cheeser. The new website has dropped that information. There used to be a nice table comparing the 3 editions (B/S/E)... not anymore
[07:31:34] <cheeser> Sam____: you should file a docs bug on that.
[07:32:29] <stefuNz> cheeser: i didn't set it. i'm not using journals, because i'm working with a replicaset. how do i save IO?
[07:34:20] <cheeser> i'm not sure what would cause an IO spike like that.
[07:36:38] <Sam____> stefuNz, we have seen that too. I was doing a heavy write load for an initial update... we finally had to set syncDelay to something ridiculously low (20 sec) .. that caused more frequent disk flushes with less data each time. If you see that happen often, then I am afraid you might have to look at either faster disks (SSDs) or sharding out your writes.
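For reference, syncdelay can also be changed at runtime; a sketch, assuming the 2.x setParameter interface:

```javascript
// flush dirty pages every 20 seconds instead of the default 60
db.adminCommand({ setParameter: 1, syncdelay: 20 })
// or start mongod with --syncdelay 20 (syncdelay=20 in the config file)
```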
[07:37:45] <Sam____> how much data are you writing per sec (MB/sec, ops/sec)?
[07:38:05] <stefuNz> Sam____: That's a good question, i don't know yet. How can i see that?
[07:41:00] <stefuNz> it's varying from 100-300 updates a second. mostly updates. i'd say average 150
[07:41:53] <Sam____> I would do iostat (written blocks) and mongostat for inserts/sec. Also look at the collection stats (db.coll.stats() gives the average doc size)... that times the inserts/sec will give you a good idea
[07:42:13] <Sam____> what's the average doc size for that collection?
[07:42:26] <stefuNz> Sam____: Hang on, i'll look it up
[07:42:51] <stefuNz> Sam____: avgObjSize is 1179.045547
[07:43:36] <stefuNz> Sam____: I'd say 99% of all the writes are upserts.
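Back-of-the-envelope with the numbers above, following the calculation Sam____ describes (values hard-coded here for illustration):

```javascript
var avgObjSize    = 1179;   // bytes, from db.coll.stats().avgObjSize above
var upsertsPerSec = 150;    // the observed average rate
print((avgObjSize * upsertsPerSec / 1024).toFixed(0) + " KB/s of raw document data");
// ~173 KB/s before index updates, padding/moves and journal/replication overhead
```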
[07:44:24] <Sam____> Hmm... I was writing about 1200-1800 docs per sec when I reached that saturation... so you are hitting that point too early.
[07:44:41] <stefuNz> Sam____: What could cause this?
[07:44:49] <Sam____> What;s the storage subsystem like? SAN? Local?
[07:45:11] <Sam____> Have you striped your data volume?
[07:45:39] <stefuNz> It's a local EXT4 on an LVM on a RAID1 with HW Controller. The same logical volume is shared with a MySQL instance and with a SOLR instance.
[07:45:58] <stefuNz> System is Ubuntu 12.04 64 bit (3.2.0 kernel)
[07:46:33] <stefuNz> Would it be a good idea to do a separate LVM-volume for MongoDB?
[07:46:49] <stefuNz> I have no way to get a new harddrive
[07:50:23] <stefuNz> the PV is the RAID array from the hardware RAID controller, using two hard drives (7200rpm)
[07:50:56] <Sam____> I think you should consider getting more disks under the lvm... and stripe your data.. With just one pv (mirrored), you are limited by the speed of one moving spindle
[07:50:59] <Sam____> that's going to slow you down
[07:53:44] <stefuNz> Sam____: I'll ask my provider... Would an additional LV be of any benefit, too?
[07:54:25] <Sam____> If it's from the same VG, which has only so many PVs... you don't have much to gain from that
[07:55:44] <Sam____> You could consider creating a datavg just for mongo... add quite a few disks and stripe your data... Mongo recommends RAID10, but at least stripe it
[08:03:05] <Tr4d3r> Hello, all ... i have a question, maybe someone here can help ... i'm trying to make a find() that returns, for each file, the document with the biggest number in a field ... for example, i have 2 fields, file and version ... file can be repeated but version is a number ... i need to get the files with the highest version ... is this possible? ... thanks
[08:07:54] <rspijker> Tr4d3r: you want the maximum?
[08:08:16] <rspijker> maybe an example of some documents and what you would expect the query to return (in a pastebin) might be enlightening
[08:11:19] <rspijker> each file + version combination where version is the maximum within that file?
[08:11:46] <Tr4d3r> as you can see, i have 3 files with version 1 and 1 with version 2 ... i need to get the ones that have version 1, except the one that also has a version 2, because for that file i need the version 2
[08:12:13] <rspijker> ok, you can probably use the aggregation framework for that
[08:12:50] <Tr4d3r> ok, i'm gonna read about it thanks
[08:13:35] <rspijker> is there other stuff in the document?
[08:13:44] <rspijker> or do you only need a list of files and versions?
[08:17:15] <rspijker> nocturn: there are big differences between SSDs as well in terms of IOPS… ranging from 10K/s to over 1M/s
[08:17:49] <nocturn> rspijker, thanks for that, I'm looking for a typical setup for enterprise usage of MongoDB
[08:18:32] <rspijker> nocturn: I think the vertex drives we used for a while did about 100K IOPS
[08:19:26] <rspijker> that’s probably near the maximum you’d get over a SATA interface
[08:22:38] <Tr4d3r> ok, i tested the aggregate and it works ok ... but, is there a way to pass an exact version to the max command? for example, max:2 or max:5
[08:28:05] <rspijker> Tr4d3r: not to the max command, but that would just be a query…
[08:28:39] <rspijker> you can add a $match before the group, because I assume that if you say version=3, you’d want files with version no higher than 2 as well?
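A minimal sketch of the pipeline rspijker is describing, assuming documents shaped like { file: "a", version: 1 } (Tr4d3r's real collection and field names may differ):

```javascript
db.files.aggregate([
    { $match: { version: { $lte: 2 } } },                          // optional: cap the version first
    { $group: { _id: "$file", maxVersion: { $max: "$version" } } } // one row per file, highest version
])
```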
[09:56:52] <urban_> I was wondering if anyone has got this to work with Unity
[09:57:09] <rspijker> BaNzounet: well… it indexes on _id first, and those are unique.. so a compound index on that doesn't make a lot of sense to me...
[09:57:41] <BaNzounet> rspijker: in fact is "somethingElseId"
[09:57:42] <urban_> Unity as in http://unity3d.com/
[10:39:09] <ctp> hi folks. does anyone have a hint for me: http://stackoverflow.com/questions/24801234/mongodb-reading-from-slaves-when-synced? reading from secondaries doesn't work in the manner i expected :)
[14:33:44] <cheeser> remonvv: i don't see any but maybe you should open one either way. it'd mean more coming from a user and then at least you'd get a definitive response from the server team
[14:40:55] <rspijker> for one, azure is crap (we tried it, it’s crap). For another, mongo just fits our use case really well. We also tested some other noSQL DBs and mongo came out on top
[14:41:30] <rspijker> In all fairness, I arrived after the choice for mongo was already made, but I have faith in the guys that did the comparison and I’m fairly happy with mongo so far
[14:41:37] <bangbang> Interesting... I'm curious why you think it's crap. We use the shit out of it.
[14:41:44] <rspijker> some quirks/bugs, but pretty good all over
[14:42:11] <bangbang> Not trying to turn this into a "mine is better than yours" debate... I'm genuinely curious.
[14:42:25] <rspijker> We used it for a very high volume/frequency messaging app (millions of transactions per second) and the platform just wasn’t stable
[14:45:38] <bangbang> I'm just comparing my experience, and can actually relate to a certain extent
[14:45:39] <rspijker> Also I, personally, don’t like the lock-in that comes with OTT services from Azure, or for that matter AWS. I’m buying a commodity from them and a big advantage of that is that I can go elsewhere without having to redesign/change anything. As soon as I start using their specific brand of service, I’m committing myself far more
[14:45:56] <rspijker> This was not too long ago, actually… Like 6 to 12 months I’d say
[14:48:04] <bangbang> We also thought the same about having to redesign/change anything. In our case, we aren't attached to their platform because of how our application is architected.
[14:48:46] <bangbang> I haven't met many people who have really dug into Azure.
[14:49:11] <rspijker> another thing is the fact that they’re a US company…
[14:49:21] <rspijker> that’s a big thing for certain types of data in the EU
[14:49:55] <remonvv> That's not necessarily an issue. Amazon is a US company but their EU datacenters fall under EU privacy law
[14:51:00] <remonvv> We went for AWS well before Azure even existed but due to customer pressure (read: heavily sponsored by Microsoft) we have repeatedly looked at Azure. It's a very clean platform but its uptime track record is a little dodgy. They also used to have a bizarre pricing model for stopped instances.
[14:51:29] <wolfen> umm guys can i ask a stupid question please?
[14:51:38] <remonvv> Those are my favorite questions
[14:52:30] <rspijker> anyway, it doesn’t even really matter whether it is or it isn’t. What matters is how my customers percieve it. And I can tell you, it can be a deal breaker :)
[14:52:33] <remonvv> rspijker: Amazon Inc is but their EU datacenter is officially under a legal entity in Luxembourg iirc
[14:52:37] <wolfen> set @var = select count(*) from (select 1 as A, 2 as B where A=1) data
[14:53:38] <wolfen> right now i have to put the record in a collection and query it back, which is SLOOOW
[14:53:43] <remonvv> Anyway, I suppose my point is that Amazon/Azure data security is still miles ahead of anything you or a smaller hosting provider would likely offer.
[14:54:24] <wolfen> at least in financial services
[14:54:41] <wolfen> so that excludes azure and aws for us :(
[14:55:19] <luca3m> wolfen: even on encrypted devices?
[14:55:36] <luca3m> you can encrypt the mongodb partition
[14:55:37] <wolfen> yep, completely. no customer personal data may be stored on foreign servers
[14:57:01] <wolfen> so, anyone have a brilliant insight on how to write this one line of sql as javascript ?
[14:57:06] <luca3m> sometimes I have to think about these problems too. for important data, where to store it is a big deal
[14:57:11] <wolfen> only way i can think of is to go implement .find myself
[14:58:02] <wolfen> here i was begging the wider internet for help.. but no joy so far: http://stackoverflow.com/questions/24725950/mongodb-determine-if-a-record-would-pass-findone
[14:58:55] <rspijker_> wolfen: I don’t really know SQL that well… so maybe it would help if you could explain what you are actually after, without the SQL :)
[15:02:14] <remonvv> SQL to MongoDB isn't a matter of rewriting queries. A lot of things are different. Basically the overlap ends with "we both store stuff somewhere"
[15:02:29] <wolfen> and doing this 500+ times a second is serious load i wish to avoid
[15:02:39] <remonvv> wolfen: Well let me first point out that it's probably a design flaw that makes you need this in the first place.
[15:02:51] <remonvv> wolfen: So perhaps share with us what you want to do rather than how you'd like to do it
[15:06:39] <wolfen> it does, but im forced into this solution
[15:07:17] <remonvv> wolfen : It's very hard to comment without more insight which is probably beyond the scope of this channel. What you wanted to do (arbitraryDoc.find()) is not possible. You can either look at reworking your system or scaling up the performance.
[15:07:24] <wolfen> right now im considering making 1000 databases, and then adding the collection into that
[15:07:36] <wolfen> and then doing the insert and remove solution
[15:07:48] <wolfen> at least then i remove the write contention
[15:08:33] <wolfen> oh, it's cool, at least i now know that there is no trivial solution
[15:08:36] <remonvv> wolfen : That sounds properly dodgy ;)
[15:09:41] <remonvv> wolfen : There's almost certainly a good solution for this but it's hard to determine what that might be and how much legacy code/schemas you have to work with
[15:10:00] <wolfen> yeah, i think i;ll work with support
[15:10:22] <remonvv> As a general rule I'd not start using different logical databases just to avoid write contention.
[15:10:33] <remonvv> Yeah but we're so much more awesome.
[15:10:40] <wolfen> yeah me too, its just my desperation speaking
[15:11:37] <remonvv> wolfen : Well, sticking multiple shards on the same machine also reduces write contention without the drawback of having to manage 1000 databases. And there's only so many concurrent writes you want before something else becomes a bottleneck
[15:12:50] <remonvv> And you can always look at batching. We've gained quite a bit of performance by using 1ms batching for very high throughput queries/updates
[15:12:50] <wolfen> at the moment they are.. but perhaps i should rebuild find for the trivial cases
[15:13:54] <remonvv> Basically what we do is park certain requests (not quite but for easier discussion) for 1ms and create work batches that we throw to mongo or batch messages on our message queue.
[15:14:15] <remonvv> In some cases we gained huge jumps in throughput and it's never worse.
[15:14:44] <remonvv> When we get the batch back we compile responses to the parked requests and push them back.
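As a rough illustration of that batching idea, in shell Bulk API syntax for brevity (remonvv's system presumably does this in driver code; the names below are made up):

```javascript
// stand-in for ~1ms of accumulated requests
var pendingUpdates = [ { _id: 1, fields: { score: 10 } }, { _id: 2, fields: { score: 7 } } ];

// send the whole batch in one unordered bulk operation instead of one write per request
var bulk = db.events.initializeUnorderedBulkOp();
pendingUpdates.forEach(function (u) {
    bulk.find({ _id: u._id }).upsert().updateOne({ $set: u.fields });
});
var result = bulk.execute();   // one round trip instead of pendingUpdates.length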
[15:15:40] <remonvv> FYI, we make systems for apps that provide interactivity with large TV shows so hitting 100,000-200,000 req/sec is not an exception. Hence optimizing this particular sort of use case.
[15:16:31] <wolfen> at the moment im only achieving about 150 req/sec
[15:16:38] <wolfen> so looking for places to optimise
[15:20:10] <remonvv> Note that that throughput isn't all going to mongo (it's throttled by MQ throughput as most per-user persistence is async). I think a typical peak towards the db is about 30k-40k/sec
[15:20:39] <remonvv> mongod scales up almost linearly with the number of shards, only headache is balancing and chunk splitting difficulties
[15:21:35] <remonvv> yeah but that's mostly mongod, we don't do much more than stick all the work on MQs and send the updates to mongo as fast as it will take it.
[15:22:49] <remonvv> Anyway, off topic a bit. Point being, clean solutions > dodgy ones so only go for things like multiple databases/collections for a single logical one if you've exhausted other options
[17:14:15] <poeticninja> Question. Creating users for a large customer service application. When creating the user id, should I use the mongoose created id or use a separate UUID that is on file that a customer service rep would look up?
[17:15:19] <poeticninja> The reason I ask here is because part of me feels that the user id should not be based on the document _id. This is a discussion at work.
[17:16:09] <poeticninja> From a mongodb point of view, will that document id ever change, if we update the schema, or other reasons?
[18:19:17] <dan___> Hi all! I would like to know how to access to my apache log (access.log and error.log) from a running container?
[18:26:23] <mitc0185> I have to do a query with a given date range, but my documents all have a string representation of a date rather than a date object (my mistake)
[18:26:41] <mitc0185> is there a way I can convert to a date object in the query so I can search on this attribute?
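A couple of common workarounds, sketched under the assumption that the stored strings are ISO-8601 formatted (collection and field names here are illustrative):

```javascript
// ISO-8601 strings sort lexicographically in date order, so a plain string
// range query already works without any conversion:
db.events.find({ created: { $gte: "2014-07-01", $lt: "2014-08-01" } })

// alternatively, backfill real Date objects once and query on those afterwards:
db.events.find({ created: { $type: 2 } }).forEach(function (doc) {   // BSON type 2 = string
    db.events.update({ _id: doc._id }, { $set: { created: new Date(doc.created) } });
});
```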
[18:55:13] <loki97123> is there a specific irc for mongodb in the cloud?
[19:04:04] <xueye> Hi guys, I have a quick question about ensuring indexes; I'm using the C# driver to interface with mongo and noticed that Collection.CreateIndex will cause an index to be built, even if the index already exists -- is there any equivalent of EnsureIndex anymore?
[19:24:53] <nicolas_leonidas> hi, I have a collection holding documents that look like this http://paste.debian.net/110315/
[19:25:22] <nicolas_leonidas> I'd like to query for documents that have pid = 78112 for example, but since the keys in the history array are urls, I'm facing difficulties
[19:25:37] <nicolas_leonidas> is there anyway around this?
[20:04:36] <jekle> nicolas_leonidas: I am no mongo expert, but maybe with the aggregate, unwind and match functions you can do your query.
[20:46:12] <shin_> hi - i'm working on a data migration library similar to mongeez but doesn't use db.eval() to run migrations and i've got some questions about concurrent migrations. For example, if I have a cluster of application nodes starting up and all trying to run a migration, what kind of things am I going to have to look out for?
[20:54:47] <kali> shin_: why don't you just use a 'magic' document in a well known collection and _id as a lock ?
[20:57:39] <shin_> kali: thanks for the suggestion, and it sounds like a good one because I think this is what liquibase does also. Sometimes the lock gets left behind if a problem occurred and you have to clean it out manually, but I don't see a way around that.
[20:58:57] <kali> shin_: you can use a "locked_until" scheme... if the lock is forgotten somehow, when it becomes obsolete, another locker can take it
[20:59:24] <kali> shin_: in that case, you need to make sure you lock for long enough, or have the lock refreshed on a regular basis
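A minimal sketch of that locked_until scheme, using findAndModify and illustrative names:

```javascript
var now   = new Date();
var lease = 60 * 1000;   // hold the lock for 60s; refresh it while the migration runs
try {
    db.migration_lock.findAndModify({
        query:  { _id: "migrations", locked_until: { $lt: now } },       // free or expired lock
        update: { $set: { locked_until: new Date(now.getTime() + lease) } },
        upsert: true,
        new: true
    });
    // we hold the lock: run the change sets, bumping locked_until as each one finishes
} catch (e) {
    // another node holds an unexpired lock: the upsert collides on _id, so back off and retry
}
```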
[21:01:06] <shin_> kali: yeah i like that, i could refresh the timeout as I execute each change set
[21:04:45] <kali> shin_: https://github.com/kali/extropy/blob/master/core/src/main/scala/MongoLock.scala . no rocket science, but a few corner cases to think about
[21:12:02] <shin_> kali: thanks...now i've got more work to do :0