[00:00:47] <dandv> How does one truly *understand* what all the information dumped by system.profile means? What sorts of numbers should raise flags? I've read all the http://docs.mongodb.org/manual/administration/monitoring/ pages and am still in the dark.
[00:00:51] <zamnuts> dandv, it could be a number of reasons, is it consistently slow or just on a couple of occasions?
[00:01:32] <dandv> zamnuts: I'm not even sure how to figure out how often it's slower than it should be.
[00:03:15] <dandv> zamnuts: it's taking more than 10 seconds (!) on more than one occasion: db.system.profile.find({"ns": "stockbase-prod.companies", millis: {$gt: 300}}).limit(5).sort( { millis : -1 } )
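For reference, a sketch of that same profiler query narrowed to the fields usually worth inspecting first (field names as in the 2.x profiler format; the namespace and threshold are taken from the query above):

```javascript
// show the five slowest ops against the collection, with only the triage fields
db.system.profile.find(
    { ns: "stockbase-prod.companies", millis: { $gt: 300 } },
    { op: 1, query: 1, nscanned: 1, nreturned: 1, numYield: 1, lockStats: 1, millis: 1, ts: 1 }
).sort({ millis: -1 }).limit(5)
// the usual red flags: nscanned much larger than nreturned, or large waits in lockStats
```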
[00:04:18] <dandv> Here's the query being slow consistently: http://pastebin.com/uwgbvfca
[00:05:59] <zamnuts> dandv, well the yields look ok, that would've been my first guess
[00:06:34] <dandv> Here's just that "FSL" query: http://pastebin.com/uEP9gDSU
[00:08:28] <zamnuts> joannac, dandv says that when he runs the same query in the shell, it comes back quickly, at 27ms
[00:10:21] <joannac> okay, so where's the long one coming from?
[00:13:39] <dandv> joannac: without index, running the query takes <100ms: http://pastebin.com/S45M2Xqd
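One way to compare the two plans from the shell is explain(); a sketch, assuming the 2.6-era explain output and forcing the unindexed case with $natural (the actual FSL query is in the pastebin above):

```javascript
db.companies.find(/* the FSL query */).explain()                        // plan the optimizer picks
db.companies.find(/* the FSL query */).hint({ $natural: 1 }).explain()  // force a full collection scan
// compare cursor, nscanned/nscannedObjects, scanAndOrder and millis between the two runs
```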
[00:16:04] <dandv> Also, this is on Digital Ocean. SSD disks. I can't even begin to understand why scanning 16k records would take more than NINE seconds. Is Mongo a mature technology?
[00:16:17] <dandv> a 2-CPU server with 2GB of RAM.
[00:19:20] <dandv> joannac: the slow query is generated by a Meteor app
[00:20:06] <joannac> dandv: to be honest, if it works okay in the shell, but poorly through Meteor, I would be pointing the finger at meteor
[00:20:37] <dandv> But it's Mongo that executes the query sent by Meteor. "millis" is as recorded by the Mongo profiler.
[00:21:17] <dandv> presumably the profiler records the query exactly as received, with the projection and limit and sort?
[00:21:40] <joannac> repeatable through Meteor? If you run the query 10 times, it's always 27 seconds?
[00:30:10] <dandv> not repeatable, right now it took only ~200ms
[00:39:06] <dandv> would high CPU due to other processes explain that?
[00:41:57] <zamnuts> dandv, based on the screenshot you sent me, your cpu usage isn't "high" - even if your processes are not multithreaded, you still don't go over 50%; the kernel would allocate accordingly
[00:42:55] <zamnuts> dandv, in other words, i think mongod gets a good amount of CPU allocation when it needs it and still has room left over; if it were fighting for resources, your usage would be over 50% (you said you had 2 cores, right?)
[00:42:56] <dandv> htop often shows 90%+ spikes, maybe DigitalOcean averages over ~20 seconds or something?
[00:48:11] <zamnuts> the TIME+ column shows how "hungry" a process is; if it is climbing at a good rate, the process is using cycles
[00:48:48] <dandv> 29 hours vs. a lot less, http://snag.gy/apXAR.jpg
[00:49:16] <dandv> where would I learn more about "climbing" and using cycles?
[00:50:43] <dandv> So right now, that machine runs a Meteor frontend that accesses the DB. About 10 minutes ago, I stopped a backend application that continuously populated that DB, parsing feeds (so write activity). Maybe that explains the pattern? Now I'm going to run frontend queries and see how the system behaves.
[00:54:06] <zamnuts> dandv, where to learn about it? not sure, you could google 'linux cpu time'
[00:54:46] <dandv> not familiar with the cycles. So does it seem normal to have that discrepancy in TIME+ between mongod and other processes?
[00:55:04] <zamnuts> dandv, your lock acquire times are "high" but not relatively so; they're at ~4 seconds (timeAcquiringMicros)
[00:55:28] <zamnuts> dandv, judging by that screenshot, mongodb is using _a lot_ of cpu
[00:56:02] <dandv> thanks zamnuts, that's what I thought, but I'm just beginning to learn about profiling mongodb. Trying to understand what might cause that high CPU usage.
[00:57:26] <zamnuts> however, your timeLockedMicros is at ~45 seconds, so the acquire time is just a fraction - this indicates that the slow queries are not consistent, otherwise the acquire time would be just as high (waiting for other slow query reads to finish)
[00:58:07] <zamnuts> which is the good news in this debacle...
[00:58:25] <zamnuts> albeit harder to troubleshoot
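Those counters show up both in db.serverStatus().locks and, per operation, in the lockStats sub-document of a profiler entry; a sketch of pulling the per-op numbers (2.x field names):

```javascript
// print the lock stats of the single slowest op against this collection
db.system.profile.find({ ns: "stockbase-prod.companies" })
    .sort({ millis: -1 }).limit(1)
    .forEach(function (p) { printjson(p.lockStats); });
// lockStats.timeLockedMicros    -> time the op spent holding the read/write lock
// lockStats.timeAcquiringMicros -> time it spent waiting to acquire the lock
```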
[00:58:50] <zamnuts> dandv, i'm not sure why the high cpu usage - like you said, it's only 16k records
[00:59:57] <zamnuts> what's your typical document size? i'm not sure of the inner workings of mongod, but if it's gotta do a full table scan with large documents, that could account for the slowness - though IMO that would be disk-bound, not cpu-bound
[01:01:17] <dandv> in that collection with 16k records, 1k-2k/document.
[06:08:42] <inad922> How can I be sure that a database operation has finished with either pymongo or mongodb?
[06:10:03] <inad922> I would like to write a test where a celery application writes to the database and in another process I check if it's finished. It seems finished, but when I try to check for the data in the other process it's not in the db yet. So I just added a fixed delay, but it would be better if I could check whether the operation has finished or not
[07:25:02] <Sam____> Does anyone know if we can support Centralized Authentication in MongoDB's Basic/Standard Edition subscription? Is that part of mongo's base build, or an add-on?
[07:29:27] <cheeser> kerberos/gssapi is available but not in the community builds, iirc
[07:29:50] <stefuNz> Hi everybody, i asked here yesterday about performance improvements. It seems that MongoDB saturates the Disk IO (iostat %util is 100%) every minute and i guess that this is caused by the sync, because the syncdelay is on its default value of 60 seconds. i saw a peak for 10-20 seconds every minute, so this seems to be the issue. how do i go from here? i have a write-most read-some workload.
[07:31:08] <Sam____> Thanks Cheeser. The new website has dropped that information. There used to be a nice table comparing the 3 editions (B/S/E)... not anymore
[07:31:34] <cheeser> Sam____: you should file a docs bug on that.
[07:32:29] <stefuNz> cheeser: i didn't set it. i'm not using journals, because i'm working with a replicaset. how do i save IO?
[07:34:20] <cheeser> i'm not sure what would cause an IO spike like that.
[07:36:38] <Sam____> stefuNz, we have seen that too. I was doing a heavy write load for an initial update... we finally had to set syncDelay to something ridiculously low (20 sec) .. that caused more frequent disk flushes with less data each time. If you see that happen often, then I am afraid you might have to look at either faster disks (SSDs) or sharding out your writes.
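For reference, syncdelay can also be changed at runtime; a sketch, assuming the 2.x setParameter interface:

```javascript
// flush dirty pages every 20 seconds instead of the default 60
db.adminCommand({ setParameter: 1, syncdelay: 20 })
// or start mongod with --syncdelay 20 (syncdelay=20 in the config file)
```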
[07:37:45] <Sam____> how much data are you writing per sec (MB/sec, ops/sec)?
[07:38:05] <stefuNz> Sam____: That's a good question, i don't know yet. How can i see that?
[07:41:00] <stefuNz> it's varying from 100-300 updates a second. mostly updates. i'd say average 150
[07:41:53] <Sam____> I would do iostat (written blocks) and mongostat for inserts/sec. Also look at the collection stats (db.coll.stats() gives the average doc size)... that times the inserts/sec will give you a good idea
[07:42:13] <Sam____> what's the average doc size for that collection?
[07:42:26] <stefuNz> Sam____: Hang on, i'll look it up
[07:42:51] <stefuNz> Sam____: avgObjSize is 1179.045547
[07:43:36] <stefuNz> Sam____: I'd say 99% of all the writes are upserts.
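Back-of-the-envelope with the numbers above, following the calculation Sam____ describes (values hard-coded here for illustration):

```javascript
var avgObjSize    = 1179;   // bytes, from db.coll.stats().avgObjSize above
var upsertsPerSec = 150;    // the observed average rate
print((avgObjSize * upsertsPerSec / 1024).toFixed(0) + " KB/s of raw document data");
// ~173 KB/s before index updates, padding/moves and journal/replication overhead
```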
[07:44:24] <Sam____> Hmm... I was writing about 1200-1800 docs per sec when I reached that saturation... so you are hitting that point too early.
[07:44:41] <stefuNz> Sam____: What could cause this?
[07:44:49] <Sam____> What;s the storage subsystem like? SAN? Local?
[07:45:11] <Sam____> Have you striped your data volume?
[07:45:39] <stefuNz> It's a local EXT4 on an LVM on a RAID1 with HW Controller. The same logical volume is shared with a MySQL instance and with a SOLR instance.
[07:45:58] <stefuNz> System is Ubuntu 12.04 64 bit (3.2.0 kernel)
[07:46:33] <stefuNz> Would it be a good idea to do a separate LVM-volume for MongoDB?
[07:46:49] <stefuNz> I have no way to get a new harddrive
[07:50:23] <stefuNz> the PV is the RAID array from the hardware RAID controller, using two hard drives (7200rpm)
[07:50:56] <Sam____> I think you should consider getting more disks under the lvm... and stripe your data.. With just one pv (mirrored), you are limited by the speed of one moving spindle
[07:50:59] <Sam____> that's going to slow you down
[07:53:44] <stefuNz> Sam____: I'll ask my provider... Would an additional LV be of any benefit, too?
[07:54:25] <Sam____> If it's from the same VG, which has only so many PVs... you don't have much to gain from that
[07:55:44] <Sam____> You could consider creating a datavg just for mongo... add quite a few disks and stripe your data... Mongo recommends RAID10, but at least stripe it
[08:03:05] <Tr4d3r> Hello, all ... i have a question, maybe someone here can help ... i'm trying to make a find() that returns, for each file, the document with the biggest number in a field ... for example, i have 2 fields, file and version ... file can be repeated but version is a number ... i need to get the files with the highest version ... is this possible? ... thanks
[08:07:54] <rspijker> Tr4d3r: you want the maximum?
[08:08:16] <rspijker> maybe an example of some documents and what you would expect the query to return (in a pastebin) might be enlightening
[08:11:19] <rspijker> each file + version combination where version is the maximum within that file?
[08:11:46] <Tr4d3r> as you can see, i have 3 files with version 1 and 1 with version 2 ... i need to get the ones that have version 1, except the one that also has a version 2, because for that file i need the version 2
[08:12:13] <rspijker> ok, you can probably use the aggregation framework for that
[08:12:50] <Tr4d3r> ok, i'm gonna read about it thanks
[08:13:35] <rspijker> is there other stuff in the document?
[08:13:44] <rspijker> or do you only need a list of files and versions?
[08:17:15] <rspijker> nocturn: there are big differences between SSDs as well in terms of IOPS… ranging from 10K/s to over 1M/s
[08:17:49] <nocturn> rspijker, thanks for that, I'm looking for a typical setup for enterprise usage of MongoDB
[08:18:32] <rspijker> nocturn: I think the vertex drives we used for a while did about 100K IOPS
[08:19:26] <rspijker> that’s probably near the maximum you’d get over a SATA interface
[08:22:38] <Tr4d3r> ok, i tested the aggregate and it works ok ... but, is there a way to pass an exact version to the max command? for example, max:2 or max:5
[08:28:05] <rspijker> Tr4d3r: not to the max command, but that would just be a query…
[08:28:39] <rspijker> you can add a $match before the group, because I assume that if you say version=3, you’d want files with version no higher than 2 as well?
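A minimal sketch of the pipeline rspijker is describing, assuming documents shaped like { file: "a", version: 1 } (Tr4d3r's real collection and field names may differ):

```javascript
db.files.aggregate([
    { $match: { version: { $lte: 2 } } },                          // optional: cap the version first
    { $group: { _id: "$file", maxVersion: { $max: "$version" } } } // one row per file, highest version
])
```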
[09:56:52] <urban_> I was wondering if anyone has got this to work with Unity
[09:57:09] <rspijker> BaNzounet: well… it indexes on _id first, and those are unique.. so a compound index on that doesn't make a lot of sense to me...
[09:57:41] <BaNzounet> rspijker: in fact is "somethingElseId"
[09:57:42] <urban_> Unity as in http://unity3d.com/
[10:39:09] <ctp> hi folks. does anyone have a hint for me: http://stackoverflow.com/questions/24801234/mongodb-reading-from-slaves-when-synced? reading from secondaries doesn't work in the manner i expected :)
[14:33:44] <cheeser> remonvv: i don't see any but maybe you should open one either way. it'd mean more coming from a user and then at least you'd get a definitive response from the server team
[14:40:55] <rspijker> for one, azure is crap (we tried it, it’s crap). For another, mongo just fits our use case really well. We also tested some other noSQL DBs and mongo came out on top
[14:41:30] <rspijker> In all fairness, I arrived after the choice for mongo was already made, but I have faith in the guys that did the comparison and I’m fairly happy with mongo so far
[14:41:37] <bangbang> Interesting... I'm curious why you think it's crap. We use the shit out of it.
[14:41:44] <rspijker> some quirks/bugs, but pretty good all over
[14:42:11] <bangbang> Not trying to turn this into a "mine is better than yours" debate... I'm genuinely curious.
[14:42:25] <rspijker> We used it for a very high volume/frequency messaging app (millions of transactions per second) and the platform just wasn’t stable
[14:45:38] <bangbang> I'm just comparing my experience, and can actually relate to a certain extent
[14:45:39] <rspijker> Also I, personally, don’t like the lock-in that comes with OTT services from Azure, or for that matter AWS. I’m buying a commodity from them and a big advantage of that is that I can go elsewhere without having to redesign/change anything. As soon as I start using their specific brand of service, I’m committing myself far more
[14:45:56] <rspijker> This was not too long ago, actually… Like 6 to 12 months I’d say
[14:48:04] <bangbang> We also thought the same about having to redesign/change anything. In our case, we aren't attached to their platform because of how our application is architected.
[14:48:46] <bangbang> I haven't met many people who have really dug into Azure.
[14:49:11] <rspijker> another thing is the fact that they’re a US company…
[14:49:21] <rspijker> that’s a big thing for certain types of data in the EU
[14:49:55] <remonvv> That's not necessarily an issue. Amazon is a US company but their EU datacenters fall under EU privacy law
[14:51:00] <remonvv> We went for AWS well before Azure even existed but due to customer pressure (read: heavily sponsored by Microsoft) we have repeatedly looked at Azure. It's a very clean platform but its uptime track record is a little dodgy. They also used to have a bizarre pricing model for stopped instances.
[14:51:29] <wolfen> umm guys can i ask a stupid question please?
[14:51:38] <remonvv> Those are my favorite questions
[14:52:30] <rspijker> anyway, it doesn’t even really matter whether it is or it isn’t. What matters is how my customers percieve it. And I can tell you, it can be a deal breaker :)
[14:52:33] <remonvv> rspijker: Amazon Inc is but their EU datacenter is officially under a legal entity in Luxembourg iirc
[14:52:37] <wolfen> set @var = select count(*) from (select 1 as A, 2 as B where A=1) data
[14:53:38] <wolfen> right now i have to put the record in a collection and query it back, which is SLOOOW
[14:53:43] <remonvv> Anyway, I suppose my point is that Amazon/Azure data security is still miles ahead of anything you or a smaller hosting provider would likely offer.
[14:54:24] <wolfen> at least in financial services
[14:54:41] <wolfen> so that excludes azure and aws for us :(
[14:55:19] <luca3m> wolfen: even on encrypted devices?
[14:55:36] <luca3m> you can encrypt the mongodb partition
[14:55:37] <wolfen> yep, completely. no customer personal data may be stored on foreign servers
[14:57:01] <wolfen> so, anyone have a brilliant insight on how to write this one line of sql as javascript ?
[14:57:06] <luca3m> sometimes I have to think about these problems too. for important data, where to store it is a big deal
[14:57:11] <wolfen> only way i can think of is to go implement .find myself
[14:58:02] <wolfen> here i was begging the wider internet for help.. but no joy so far: http://stackoverflow.com/questions/24725950/mongodb-determine-if-a-record-would-pass-findone
[14:58:55] <rspijker_> wolfen: I don’t really know SQL that well… so maybe it would help if you could explain what you are actually after, without the SQL :)
[15:02:14] <remonvv> SQL to MongoDB isn't a matter of rewriting queries. A lot of things are different. Basically the overlap ends with "we both store stuff somewhere"
[15:02:29] <wolfen> and doing this 500+ times a second is serious load i wish to avoid
[15:02:39] <remonvv> wolfen: Well let me first point out that it's probably a design flaw that makes you need this in the first place.
[15:02:51] <remonvv> wolfen: So perhaps share with us what you want to do rather than how you'd like to do it
[15:06:39] <wolfen> it does, but im forced into this solution
[15:07:17] <remonvv> wolfen : It's very hard to comment without more insight which is probably beyond the scope of this channel. What you wanted to do (arbitraryDoc.find()) is not possible. You can either look at reworking your system or scaling up the performance.
[15:07:24] <wolfen> right now im considering making 1000 databases, and then adding the collection into that
[15:07:36] <wolfen> and then doing the insert and remove solution
[15:07:48] <wolfen> at least then i remove the write contention
[15:08:33] <wolfen> oh, it's cool, at least i now know that there is no trivial solution
[15:08:36] <remonvv> wolfen : That sounds properly dodgy ;)
[15:09:41] <remonvv> wolfen : There's almost certainly a good solution for this but it's hard to determine what that might be and how much legacy code/schemas you have to work with
[15:10:00] <wolfen> yeah, i think i;ll work with support
[15:10:22] <remonvv> As a general rule I'd not start using different logical databases just to avoid write contention.
[15:10:33] <remonvv> Yeah but we're so much more awesome.
[15:10:40] <wolfen> yeah me too, its just my desperation speaking
[15:11:37] <remonvv> wolfen : Well, sticking multiple shards on the same machine also reduces write contention without the drawback of having to manage 1000 databases. And there's only so many concurrent writes you want before something else becomes a bottleneck
[15:12:50] <remonvv> And you can always look at batching. We've gained quite a bit of performance by using 1ms batching for very high throughput queries/updates
[15:12:50] <wolfen> at the moment they are.. but perhaps i should rebuild find for the trivial cases
[15:13:54] <remonvv> Basically what we do is park certain requests (not quite but for easier discussion) for 1ms and create work batches that we throw to mongo or batch messages on our message queue.
[15:14:15] <remonvv> In some cases we gained huge jumps in throughput and it's never worse.
[15:14:44] <remonvv> When we get the batch back we compile responses to the parked requests and push them back.
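As a rough illustration of that batching idea, in shell Bulk API syntax for brevity (remonvv's system presumably does this in driver code; the names below are made up):

```javascript
// stand-in for ~1ms of accumulated requests
var pendingUpdates = [ { _id: 1, fields: { score: 10 } }, { _id: 2, fields: { score: 7 } } ];

// send the whole batch in one unordered bulk operation instead of one write per request
var bulk = db.events.initializeUnorderedBulkOp();
pendingUpdates.forEach(function (u) {
    bulk.find({ _id: u._id }).upsert().updateOne({ $set: u.fields });
});
var result = bulk.execute();   // one round trip instead of pendingUpdates.length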
[15:15:40] <remonvv> FYI, we make systems for apps that provide interactivity with large TV shows so hitting 100,000-200,000 req/sec is not an exception. Hence optimizing this particular sort of use case.
[15:16:31] <wolfen> at the moment im only achieving about 150 req/sec
[15:16:38] <wolfen> so looking for places to optimise
[15:20:10] <remonvv> Note that that throughput isn't all going to mongo (it's throttled by MQ throughput as most per-user persistence is async). I think a typical peak towards the db is about 30k-40k/sec
[15:20:39] <remonvv> mongod scales up almost linearly with the number of shards, only headache is balancing and chunk splitting difficulties
[15:21:35] <remonvv> yeah but that's mostly mongod, we don't do much more than stick all the work on MQs and send the updates to mongo as fast as it will take it.
[15:22:49] <remonvv> Anyway, off topic a bit. Point being, clean solutions > dodgy ones so only go for things like multiple databases/collections for a single logical one if you've exhausted other options
[17:14:15] <poeticninja> Question. Creating users for a large customer service application. When creating the user id, should I use the mongoose created id or use a separate UUID that is on file that a customer service rep would look up?
[17:15:19] <poeticninja> The reason I ask here is because part of me feels that the user id should not be based on the document _id. This is a discussion at work.
[17:16:09] <poeticninja> From a mongodb point of view, will that document id ever change, if we update the schema, or other reasons?
[18:19:17] <dan___> Hi all! I would like to know how to access to my apache log (access.log and error.log) from a running container?
[18:26:23] <mitc0185> I have to do a query with a given date range, but my documents all have a string representation of a date rather than a date object (my mistake)
[18:26:41] <mitc0185> is there a way I can convert to a date object in the query so I can search on this attribute?
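A couple of common workarounds, sketched under the assumption that the stored strings are ISO-8601 formatted (collection and field names here are illustrative):

```javascript
// ISO-8601 strings sort lexicographically in date order, so a plain string
// range query already works without any conversion:
db.events.find({ created: { $gte: "2014-07-01", $lt: "2014-08-01" } })

// alternatively, backfill real Date objects once and query on those afterwards:
db.events.find({ created: { $type: 2 } }).forEach(function (doc) {   // BSON type 2 = string
    db.events.update({ _id: doc._id }, { $set: { created: new Date(doc.created) } });
});
```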
[18:55:13] <loki97123> is there a specific irc for mongodb in the cloud?
[19:04:04] <xueye> Hi guys, I have a quick question about ensuring indexes; I'm using the C# driver to interface with mongo and noticed that Collection.CreateIndex will cause an index to be built, even if the index already exists -- is there any equivalent of EnsureIndex anymore?
[19:24:53] <nicolas_leonidas> hi, I have a collection holding documents that look like this http://paste.debian.net/110315/
[19:25:22] <nicolas_leonidas> I'd like to query for documents that have pid = 78112 for example, but since the keys in the history array are urls, I'm facing difficulties
[19:25:37] <nicolas_leonidas> is there anyway around this?
[20:04:36] <jekle> nicolas_leonidas: I am no mongo expert, but maybe with the aggregate, unwind and match functions you can do your query.
[20:46:12] <shin_> hi - i'm working on a data migration library similar to mongeez but doesn't use db.eval() to run migrations and i've got some questions about concurrent migrations. For example, if I have a cluster of application nodes starting up and all trying to run a migration, what kind of things am I going to have to look out for?
[20:54:47] <kali> shin_: why don't you just use a 'magic' document in a well known collection and _id as a lock ?
[20:57:39] <shin_> kali: thanks for the suggestion, and it sounds like a good one because I think this is what liquibase does also. Sometimes the lock gets left behind if a problem occurred and you have to clean it out manually, but I don't see a way around that.
[20:58:57] <kali> shin_: you can use a "locked_until" scheme... if the lock is forgotten somehow, when it becomes obsolete, another locker can take it
[20:59:24] <kali> shin_: in that case, you need to make sure you lock for long enough, or have the lock refreshed on a regular basis
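A minimal sketch of that locked_until scheme, using findAndModify and illustrative names:

```javascript
var now   = new Date();
var lease = 60 * 1000;   // hold the lock for 60s; refresh it while the migration runs
try {
    db.migration_lock.findAndModify({
        query:  { _id: "migrations", locked_until: { $lt: now } },       // free or expired lock
        update: { $set: { locked_until: new Date(now.getTime() + lease) } },
        upsert: true,
        new: true
    });
    // we hold the lock: run the change sets, bumping locked_until as each one finishes
} catch (e) {
    // another node holds an unexpired lock: the upsert collides on _id, so back off and retry
}
```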
[21:01:06] <shin_> kali: yeah i like that, i could refresh the timeout as I execute each change set
[21:04:45] <kali> shin_: https://github.com/kali/extropy/blob/master/core/src/main/scala/MongoLock.scala . no rocket science, but a few corner cases to think about
[21:12:02] <shin_> kali: thanks...now i've got more work to do :0