#mongodb logs for Tuesday the 5th of April, 2016

[04:24:26] <bz-> how come a simple command to count the number of rows (> db.products.find().toArray().length;
[04:24:27] <bz-> ) returns.... "Killed" mere seconds after the command is issued?
[05:02:55] <macwinne_> not sure.. i just do .find().count()
[05:03:44] <joannac> My suspicion would be OOM killer, from turning the whole collection of documents into an array
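
As a side note, asking the server to count avoids materialising the whole collection on the client, which is what makes toArray() a candidate for the OOM killer. A minimal sketch in Python with pymongo (the products collection name comes from the question; connection details are assumed):

    from pymongo import MongoClient

    # Connection details are assumed; adjust host, port and database.
    client = MongoClient("mongodb://localhost:27017")
    db = client["test"]

    # Ask the server for the count instead of pulling every document into
    # client memory with find().toArray().length.
    total = db.products.count()              # pymongo 3.x era API
    # total = db.products.count_documents({})  # preferred in newer pymongo
    print("products:", total)
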
[07:21:16] <tylkas> Hello. Can I ask something? In the repo https://repo.mongodb.org/yum/amazon/2013.03/mongodb-org/3.2/x86_64/ the files referenced from repomd.xml are not present in the directory listing
[07:21:27] <tylkas> *is the repo ok?
[07:24:52] <mkozjak> hi
[07:26:05] <mkozjak> is it normal behaviour to use mongodb just as a temporary document store and another system for persistent storage? i want to use elasticsearch for persistent storage, but have yet to find a library that supports syncing inserts and ignoring deletes...
[08:30:30] <krion> hello
[08:30:51] <krion> i got three config servers, one had an unclean shutdown, i'm running mongo 2.6
[08:31:02] <krion> can i just do a --repair ? should be enough i guess...
[09:02:17] <krion> i can ;)
[09:02:34] <Derick> :D
[09:43:18] <Ange7> Hello
[09:43:46] <Ange7> I don't understand why, for an insert in mongo, I get this error: Uncaught exception 'MongoCursorTimeoutException' with message 'localhost:27017: Read timed out after reading 0 bytes, waited for 30.000000 seconds
[09:44:20] <Derick> it means that PHP waited 30 seconds for a reply from the server, but it didn't come
[09:44:27] <Derick> i.e. your query is taking too long
[09:45:14] <Derick> I would advise setting the cursor timeout (through the connection string) to something akin to 5 minutes (300 seconds), and using maxTimeMS as a query option to get better control over what the server does. A cursor timeout is client side, and will also abort the connection.
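
Derick's advice is about the PHP driver; the same two knobs are shown here with pymongo purely for illustration (the connection string, database and query are assumptions): socketTimeoutMS raises the client-side timeout, and maxTimeMS bounds how long the server may work on the query.

    from pymongo import MongoClient

    # socketTimeoutMS (assumed value) raises the client-side timeout to
    # 5 minutes so slow but legitimate operations are not aborted early.
    client = MongoClient("mongodb://localhost:27017/?socketTimeoutMS=300000")
    coll = client["mydb"]["mycoll"]

    # maxTimeMS (max_time_ms in pymongo) bounds how long the *server* may
    # spend on the query, which is finer-grained than a client-side timeout.
    for doc in coll.find({"status": "active"}).max_time_ms(10000):
        print(doc["_id"])
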
[09:45:32] <Ange7> Derick: ok but it's insert with j option
[09:45:52] <Derick> it's possible that that takes more than 30 seconds... what does your mongodb log file say?
[09:46:02] <Derick> (also, don't set the j option, it's not needed)
[09:46:19] <Ange7> I set the j option to make the insert async
[09:46:46] <Derick> That sentence makes no sense. How did you come to that conclusion?
[09:47:59] <Ange7> I read the PHP doc; I wanted my insert to be async, so I set the j option
[09:48:15] <Derick> sorry, that still makes no sense
[09:48:19] <Derick> j is for journalling
[09:48:28] <Derick> you can't async insert from PHP
[09:48:33] <Ange7> ok ... lol
[09:48:43] <Ange7> even with w => 0 option ?
[09:48:59] <Derick> it's not async. You just never *see* the response, the driver still needs to wait for it
[09:49:03] <Derick> w=0 is something from 3 years ago
[09:49:27] <Ange7> so it's impossible to do async with the PHP driver?
[09:49:43] <Derick> correct
[09:49:51] <Ange7> ok...
[09:50:13] <Derick> where does it say this in the PHP docs? Cause if it really does, I'll fix that right away.
[09:51:47] <Ange7> When using unacknowledged writes (w=0) and the replica set has failed over, there will be no way for the driver to know about the change so it will continue and silently fail to write.
[09:51:54] <Ange7> I understood that to mean it's async.
[09:56:04] <Ange7> so i need to find an async driver
[10:00:33] <Derick> Ange7: why? For most apps, it's not needed.
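
For reference, this is roughly what w=0 configures, sketched with pymongo since the PHP code is not in the log; the collection and document contents are made up. The driver still performs the insert on the calling thread, it simply does not request an acknowledgement, so failures are silent rather than the call being asynchronous.

    from pymongo import MongoClient
    from pymongo.write_concern import WriteConcern

    client = MongoClient("mongodb://localhost:27017")
    db = client["mydb"]

    # w=0: the driver sends the insert but does not request an acknowledgement.
    # The call still happens synchronously on this thread (it is unacknowledged,
    # not asynchronous) and failures, e.g. after a failover, are silently lost.
    unacked = db.get_collection("events", write_concern=WriteConcern(w=0))
    unacked.insert_one({"user": "example", "action": "login"})  # illustrative

    # Default write concern (w=1): the insert is acknowledged and errors surface.
    db.events.insert_one({"user": "example", "action": "login"})
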
[10:01:51] <kurushiyama> Derick The driver is strange, as far as I read it. When w = 9 && ! j , it handles the queries async, as far as I can see.
[10:02:04] <kurushiyama> s/9/0/
[10:03:47] <Ange7> Derick: because i have 1500 inserts per second (~2M / day)
[10:03:56] <kurushiyama> Derick Could you explain why j is not needed from your point of view?
[10:04:02] <kurushiyama> Ange7 Your math doesn't add up
[10:04:38] <Ange7> It's my stats
[10:04:55] <kurushiyama> Ange7 one of them is wrong
[10:05:08] <Ange7> it's an average. i don't have the same insert rate at night as during the day
[10:05:30] <kurushiyama> Ange7 2M/86400 is a bit more than 23.
[10:06:45] <Derick> kurushiyama: j doesn't do anything anymore IIRC
[10:06:51] <Ange7> actually i have 1600 inserts/sec
[10:07:04] <Derick> kurushiyama: at least not with WiredTiger
[10:07:17] <Derick> Ange7: the amount of inserts/sec does not change if you use async
[10:07:27] <Derick> you're concerned about throughput
[10:07:57] <Ange7> yes, but I don't want my script to be blocked
[10:08:07] <Ange7> actually my script waits until the insert is done
[10:08:16] <Derick> so batch up your inserts
[10:09:16] <Ange7> i can't
[10:09:26] <Derick> why not?
[10:09:38] <Derick> do you spawn a script for each insert?
[10:10:06] <Ange7> Yes, because it's a front-end script; one insert for each connected user
[10:10:26] <Derick> that's going to be your bottleneck then, not inserting into the database
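
A rough illustration of the "batch up your inserts" suggestion using pymongo's insert_many; the batch size, collection name and input loop are assumptions, and in a one-script-per-request setup the buffering would have to live in a separate queue or worker.

    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")
    events = client["mydb"]["events"]

    BATCH_SIZE = 500          # assumed threshold; tune to the workload
    buffer = []

    def record(doc):
        """Accumulate documents and flush them in one round trip."""
        buffer.append(doc)
        if len(buffer) >= BATCH_SIZE:
            # ordered=False lets the server process independent documents in parallel
            events.insert_many(buffer, ordered=False)
            buffer.clear()

    for i in range(2000):                  # stand-in for incoming traffic
        record({"seq": i, "source": "demo"})

    if buffer:                             # flush whatever is left
        events.insert_many(buffer, ordered=False)
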
[10:11:09] <Ange7> huhu
[10:11:22] <kurushiyama> Are we talking of PHP?
[10:17:51] <kurushiyama> Derick Just checked the docs. If j does not have an effect on WT (which I could imagine, given how WT works), it is not documented.
[10:18:27] <Derick> kurushiyama: I even thought it doesn't do anything in 3.2 at all anymore, but I need to check that
[10:20:21] <kurushiyama> Ange7 Ok, let us take 1.6k inserts/s for granted – that is quite a lot.
[10:21:51] <kurushiyama> Ange7 can you describe your setup a bit? Replset/sharded/standalone? RAM/CPU? SSDs or HDDs? RAID level?
[10:22:07] <Derick> and how an insert gets "created"
[10:22:08] <kurushiyama> Ange7 If sharded: what shard key?
[16:04:02] <crazyphil> are pymongo questions handled here or in another channel?
[16:04:34] <kurushiyama> crazyphil You may ask, but you may not get an answer ;)
[16:04:40] <crazyphil> ok then
[16:05:12] <crazyphil> I am wondering if the pymongo tailable cursor *can only* be used with capped collections, or if it would work with a regular collection?
[16:05:50] <crazyphil> (I am basically wanting to write a python program to pull all documents from a given collection and export them to another location - so I want to read from the beginning, and keep grabbing new records from the end as they arrive)
[16:08:50] <kurushiyama> crazyphil Tailable cursors in general can only be used with capped collections. This has something to do with how capped collections work on the storage level. Basically, only capped collections can guarantee that documents are returned in insertion order.
[16:09:24] <crazyphil> ok, this would explain why I see examples that use the oplog
[16:09:47] <kurushiyama> crazyphil Of course you can do a normal query, sorted by an insertion date and use a cursor to iterate.
[16:09:51] <crazyphil> what if I don't care about the order though?
[16:10:01] <kurushiyama> crazyphil ^
[16:10:19] <crazyphil> ah, I get it now
[16:11:15] <kurushiyama> crazyphil A tailable cursor basically gets "notified" when a new document is inserted into the capped collection being tailed.
[16:11:40] <crazyphil> and because the oplog is pretty much a capped collection, the tailable works there
[16:11:57] <crazyphil> ok, makes sense
[16:12:03] <crazyphil> so I'll need to write two programs now
[16:12:09] <kurushiyama> crazyphil The oplog is nothing more than a capped collection, with special semantic meaning.
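
A minimal pymongo sketch of the point above: tailable cursors only work against capped collections, which is also why tailing the oplog works. The collection name and size are assumptions.

    import time
    import pymongo
    from pymongo import MongoClient
    from pymongo.errors import CollectionInvalid

    client = MongoClient("mongodb://localhost:27017")
    db = client["mydb"]

    # Tailable cursors require a capped collection; the size is an assumption.
    try:
        db.create_collection("events", capped=True, size=10 * 1024 * 1024)
    except CollectionInvalid:
        pass  # collection already exists

    # TAILABLE_AWAIT waits briefly on the server for new documents, much like
    # `tail -f`; insertion order is guaranteed by the capped collection.
    cursor = db.events.find(cursor_type=pymongo.CursorType.TAILABLE_AWAIT)
    while cursor.alive:
        for doc in cursor:
            print(doc)
        time.sleep(1)  # nothing new yet; back off before polling again
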
[16:12:18] <crazyphil> I get it
[16:12:26] <crazyphil> just needed some better perspective on it
[16:12:29] <crazyphil> thanks kurushiyama
[16:12:32] <kurushiyama> crazyphil Wait. What are you trying to _do_?
[16:13:06] <crazyphil> kurushiyama: rather than write a bunch of different apps that go after the data in Mongo directly, I'm trying to export this collection to Kafka
[16:13:13] <crazyphil> since most of my apps already grab stuff from Kafka
[16:13:38] <crazyphil> it's Apache Kafka
[16:13:59] <crazyphil> it's basically a message broker that massively scales
[16:14:52] <kurushiyama> crazyphil I have done some voodoo with MQ, dealing with several hundred messages a second on a single instance ;)
[16:15:03] <crazyphil> I'm aware of MQ
[16:15:10] <crazyphil> and Redis
[16:15:38] <crazyphil> the Kafka structure just works out better since we also end up dumping a lot of stuff in Spark
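
crazyphil's plan (dump what is already in the collection, then keep forwarding new documents) could look roughly like this in Python, using the kafka-python package as an assumed producer library; the topic name, serialisation and capped "events" collection are illustrative, and the Camel route discussed further down would replace this kind of hand-rolled bridge.

    import json
    import pymongo
    from pymongo import MongoClient
    from kafka import KafkaProducer  # assumed dependency: kafka-python

    client = MongoClient("mongodb://localhost:27017")
    events = client["mydb"]["events"]  # assumed capped collection
    producer = KafkaProducer(bootstrap_servers="localhost:9092")

    def forward(doc):
        # Light touch-up before publishing; the topic name is illustrative.
        doc["_id"] = str(doc["_id"])
        producer.send("mongo-events", json.dumps(doc).encode("utf-8"))

    # 1. Batch phase: push everything already in the collection.
    #    (A real bridge would need to handle documents arriving between the
    #    two phases; this is only a sketch.)
    for doc in events.find():
        forward(doc)

    # 2. Tail phase: keep forwarding new documents as they arrive
    #    (requires a capped collection, as discussed earlier).
    cursor = events.find(cursor_type=pymongo.CursorType.TAILABLE_AWAIT)
    while cursor.alive:
        for doc in cursor:
            forward(doc)

    producer.flush()
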
[16:15:50] <kurushiyama> crazyphil What I wanted to say is that scaling needs tend to get overestimated nowadays. And successful complex systems usually evolve from successful simple systems.
[16:16:17] <crazyphil> well, we started simple
[16:16:43] <crazyphil> now we're getting ready to scale to several hundred million messages a day that need to end up in various distributed systems
[16:17:11] <kurushiyama> crazyphil Ah, spark involved, too. Hm, ok, let's get into that in a bit more detail. You have messages coming in, stored into MongoDB and you want to pass them into kafka, right?
[16:17:19] <crazyphil> correct
[16:17:24] <kurushiyama> Camel
[16:17:27] <kurushiyama> ?
[16:17:35] <crazyphil> and I do need to perform some light touch-up on some of the data in these documents
[16:18:07] <crazyphil> I don't think we're using Camel
[16:18:18] <crazyphil> I'm pretty sure we're using a NodeJS consumer
[16:19:34] <crazyphil> or are you saying use camel to pull from mongo
[16:19:43] <crazyphil> as I just found that doc set
[16:21:00] <kurushiyama> I'd create a producer endpoint on camel and have it forwarded to kafka directly. You should be able to easily plug your node thingy before the producer
[16:21:22] <kurushiyama> Integration via DB is so 90s ;)
[16:21:34] <crazyphil> no doubt
[16:21:55] <crazyphil> I'm in very uncharted territory for myself, so I default to building things out of Python
[16:22:14] <crazyphil> never even dawned on me that Apache would have built something like Camel
[16:22:20] <kurushiyama> crazyphil http://camel.apache.org/kafka.html
[16:23:03] <kurushiyama> Your node could call a REST interface, or send an email, or whatever to get it into camel, and then you can send it to kafka.
[16:23:44] <crazyphil> actually, I can do a batch import to grab everything I have up until now, and then fire up a persistent tail tracking route from Camel
[16:24:10] <kurushiyama> crazyphil Yup.
[16:24:12] <crazyphil> now what I find interesting is that in all my searches, Camel never once came up
[16:25:21] <kurushiyama> crazyphil It is sadly not a well-known project, despite its extreme usefulness. Servicemix is a distribution of Camel, MQ and Karaf. If done correctly, it scales massively.
[16:25:49] <crazyphil> Camel might be useful to me in other ways I've not yet envisioned
[16:25:59] <kurushiyama> crazyphil I can not go into details, but the enterprise backbone of at least one global telecommunications company runs on servicemix.
[16:27:15] <kurushiyama> crazyphil basically, you decouple your applications with it, and implement the business rules in relatively small, dedicated modules, which can be replaced without downtime.
[16:27:24] <crazyphil> so Karaf, as opposed to Docker or Rocket or CoreOS?
[16:27:43] <kurushiyama> crazyphil Karaf is incomparable to that.
[16:28:57] <kurushiyama> crazyphil it is an OSGI container. You can replace modules without message loss. If you have business-logic-1.0.1, and replace it with business-logic-1.0.2, karaf makes sure that the transition is flawless.
[16:29:23] <kurushiyama> crazyphil I still would use a maintenance window, but proper testing provided, it should be mere minutes.
[16:31:15] <kurushiyama> crazyphil karaf actually loads mq, camel, your camel routes and whatever you need. If done properly, you have your setup in a dedicated environment within karaf, making it extremely easy to deploy it multiple times.
[16:32:13] <kurushiyama> crazyphil If you have more questions, please PM me, since we surely do not want to derail #mongodb
[16:33:06] <crazyphil> ahh, I know why I've not run into this yet, it's all Java based
[16:33:29] <crazyphil> and what's being built is primarily NodeJS based, with a handful of Python and C++
[17:01:41] <edrocks> does anyone know why or have links about why mongodb switched back to spidermonkey from v8 in 3.2?
[17:03:30] <Derick> edrocks: some of the changes in v8 made it abort in certain situations, which is not good for embedding it into the DB
[17:03:34] <Derick> only one I remember :-/
[17:04:51] <edrocks> why didn't they give any info? seems like a pretty major change
[17:07:00] <Derick> i think there was also something to do with threading
[17:07:08] <Derick> I'm sure it was in the release notes though
[17:11:01] <kurushiyama> Darn, I did not even know...
[17:12:34] <Lonesoldier728> Hey, so I have one database with three collections: tags (for searching), items, and users. Is it more efficient to have each in its own database? Or at least the tags collection, since it doesn't interact with the other collections and stores everything it needs for a search?
[19:21:18] <Derick> dddh__: how nice! why? :-)
[19:22:02] <dddh__> Derick: they just want me to fill out some survey
[19:22:21] <dddh__> "Help us out by answering 10 questions about your cloud usage!
[19:22:22] <dddh__> (You might even win an Amazon gift card)"
[19:24:18] <dddh__> by the way I received this email after canceling audible membership :(
[19:33:16] <Derick> it says "might win"
[19:37:42] <dddh__> :D
[20:26:16] <FuzzySockets> I have a users collection that keeps an array of sub-documents tracking each login. How could I move the logins: [] array of sub-documents into its own collection keyed by the owner id?
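
No answer appears in the log, but one possible approach is an aggregation pipeline that unwinds the embedded array and writes the result to a new collection with $out; sketched here with pymongo, with field and collection names taken from the question or assumed.

    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")
    db = client["mydb"]

    # Unwind each user's embedded logins array into one document per login,
    # carry the owning user's _id along, and write the result to a new
    # "logins" collection with $out. The login sub-fields are assumptions.
    db.users.aggregate([
        {"$unwind": "$logins"},
        {"$project": {
            "_id": 0,
            "owner_id": "$_id",
            "at": "$logins.at",
            "ip": "$logins.ip",
        }},
        {"$out": "logins"},
    ])
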
[21:45:00] <neybar> is there any recommended best practice on where to create a user? I can create the user in the database I want them to have access to, or in another database (admin for example) and assign a role pointing at the correct db. Seems like the latter is easier to manage since you can see all the users in one place, but maybe not?
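
For reference, the "central" option neybar describes looks roughly like this via pymongo's generic command helper; user name, password and database names are placeholders. The user document lives in admin, while its role is scoped to the application database, so clients authenticate against admin.

    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")
    admin = client["admin"]

    # Create the user centrally in the admin database, but grant it a role
    # scoped to the application database. Names and password are placeholders.
    admin.command(
        "createUser",
        "app_user",
        pwd="change-me",
        roles=[{"role": "readWrite", "db": "appdb"}],
    )

    # Clients would then authenticate against admin, e.g.
    #   mongodb://app_user:change-me@localhost:27017/?authSource=admin
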