[04:24:26] <bz-> how come a simple command to count the number of rows (> db.products.find().toArray().length;
[04:24:27] <bz-> ) returns.... "Killed" mere seconds after the command is issued?
[05:02:55] <macwinne_> not sure.. i just do .find().count()
[05:03:44] <joannac> My suspicion would be OOM killer, from turning the whole collection of documents into an array
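For illustration, a minimal pymongo sketch of the same point (the products collection, database name and host are assumptions): asking the server to count stays cheap, while materialising the whole collection client-side is what invites the OOM killer.

    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")
    products = client["test"]["products"]

    # Pulling every document into memory just to count it can get the process killed:
    # total = len(list(products.find()))

    # Letting the server do the counting keeps the memory on the server:
    total = products.count_documents({})   # older pymongo versions: products.count()
    print(total)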
[07:21:16] <tylkas> Hello. Can I ask someone? In the repo https://repo.mongodb.org/yum/amazon/2013.03/mongodb-org/3.2/x86_64/, the files referenced in repomd.xml are not present in the listing.
[07:26:05] <mkozjak> is it normal behaviour to use mongodb just as a temporary document store and another system for persistent storage? i want to use elasticsearch for persistent storage, but have yet to find a library that syncs inserts and ignores deletes...
[09:43:46] <Ange7> I don't understand why, for an insert in mongo, I get this error: Uncaught exception 'MongoCursorTimeoutException' with message 'localhost:27017: Read timed out after reading 0 bytes, waited for 30.000000 seconds
[09:44:20] <Derick> it means that PHP waited 30 seconds for a reply from the server, but it didn't come
[09:44:27] <Derick> i.e. your query is taking too long
[09:45:14] <Derick> I would advise setting the cursor timeout (through the connection string) to something like 5 minutes (300 seconds), and using maxTimeMS as a query option to get better control over what the server does. A cursor timeout is client side, and will also abort the connection.
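The same idea sketched with pymongo rather than the PHP driver (option names differ between drivers; the 300-second value and the collection name here are assumptions): the connection-string timeout is purely client side, while maxTimeMS is enforced by the server.

    from pymongo import MongoClient

    # Client-side socket timeout of 5 minutes, set through the connection string.
    client = MongoClient("mongodb://localhost:27017/?socketTimeoutMS=300000")
    coll = client["test"]["mycoll"]

    # Server-side limit: the server itself aborts the query after 30 seconds.
    for doc in coll.find({"status": "active"}).max_time_ms(30000):
        print(doc)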
[09:45:32] <Ange7> Derick: ok but it's insert with j option
[09:45:52] <Derick> it's possible that that takes more than 30 seconds... what does your mongodb log file say?
[09:46:02] <Derick> (also, don't set the j option, it's not needed)
[09:46:19] <Ange7> I set the j option to make the insert async
[09:46:46] <Derick> That sentence makes no sense. How did you come to that conclusion?
[09:47:59] <Ange7> I read the PHP doc; I wanted my insert to be async, so I set the j option
[09:48:15] <Derick> sorry, that again makes no sense
[09:50:13] <Derick> where does it say this in the PHP docs? Cause if it really does, I'll fix that right away.
[09:51:47] <Ange7> When using unacknowledged writes (w=0) and the replica set has failed over, there will be no way for the driver to know about the change so it will continue and silently fail to write.
[10:17:51] <kurushiyama> Derick Just checked the docs. If j does not have an effect on WT (which I could imagine, given how WT works), it is not documented.
[10:18:27] <Derick> kurushiyama: I even thought it doesn't do anything in 3.2 at all anymore, but I need to check that
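A hedged pymongo sketch of what these write-concern options actually do (collection and database names are assumptions): j waits for the journal, so it makes a write more synchronous, and w=0 is the unacknowledged mode the quoted docs warn about.

    from pymongo import MongoClient
    from pymongo.write_concern import WriteConcern

    client = MongoClient("mongodb://localhost:27017")
    db = client["test"]

    # Default (w=1): wait for the primary to acknowledge the write.
    db["events"].insert_one({"kind": "default"})

    # j=True: additionally wait until the write is journaled - not async at all.
    journaled = db.get_collection("events", write_concern=WriteConcern(w=1, j=True))
    journaled.insert_one({"kind": "journaled"})

    # w=0: fire-and-forget; the driver does not wait, and failures are silent.
    unacked = db.get_collection("events", write_concern=WriteConcern(w=0))
    unacked.insert_one({"kind": "unacknowledged"})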
[10:20:21] <kurushiyama> Ange7 Ok, let us take 1.6k inserts/s for granted – that is quite a lot.
[10:21:51] <kurushiyama> Ange7 can you describe your setup a bit? Replset/sharded/standalone? RAM/CPU? SSDs or HDDs? RAID level?
[10:22:07] <Derick> and how an insert gets "created"
[10:22:08] <kurushiyama> Ange7 If sharded: what shard key?
[16:04:02] <crazyphil> are pymongo questions handled here or in another channel?
[16:04:34] <kurushiyama> crazyphil You may ask, but you may not get an answer ;)
[16:05:12] <crazyphil> I am wondering if the pymongo tailable cursor *can only* be used with capped collections, or if it would work with a regular collection?
[16:05:50] <crazyphil> (I am basically wanting to write a python program to pull all documents from a given collection and export them to another location - so I want to read from the beginning, and keep grabbing new records from the end as they arrive)
[16:08:50] <kurushiyama> crazyphil Tailable cursors in general can only be used with capped collections. This has something to do with how capped collections work on the storage level. Basically, only capped collections can guarantee that documents are returned in insertion order.
[16:09:24] <crazyphil> ok, this would explain why I see examples that use the oplog
[16:09:47] <kurushiyama> crazyphil Of course you can do a normal query, sorted by an insertion date and use a cursor to iterate.
[16:09:51] <crazyphil> what if I don't care about the order though?
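A rough pymongo sketch of the two approaches mentioned above (the capped collection "feed", the "created_at" field and the handle() callback are illustrative assumptions, not anything from the channel):

    import time
    from pymongo import MongoClient, ASCENDING
    from pymongo.cursor import CursorType

    client = MongoClient("mongodb://localhost:27017")
    db = client["test"]

    def handle(doc):
        print(doc)

    # 1) Tailable cursor - only works on a capped collection, e.g. one created with
    #    db.create_collection("feed", capped=True, size=2**30)
    cursor = db["feed"].find(cursor_type=CursorType.TAILABLE_AWAIT)
    while cursor.alive:
        for doc in cursor:
            handle(doc)
        time.sleep(1)   # nothing new yet; poll again

    # 2) Regular collection - sort by an insertion timestamp and remember the last one seen.
    last_seen = None
    query = {} if last_seen is None else {"created_at": {"$gt": last_seen}}
    for doc in db["items"].find(query).sort("created_at", ASCENDING):
        handle(doc)
        last_seen = doc["created_at"]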
[16:12:32] <kurushiyama> crazyphil Wait. What are you trying to _do_?
[16:13:06] <crazyphil> kurushiyama: rather than write a bunch of different apps that go after the data in Mongo directly, I'm trying to export this collection to Kafka
[16:13:13] <crazyphil> since most of my apps already grab stuff from Kafka
[16:15:38] <crazyphil> the Kafka structure just works out better since we also end up dumping a lot of stuff in Spark
[16:15:50] <kurushiyama> crazyphil What I wanted to say is that scaling needs tend to get overestimated nowadays. And successful complex systems usually evolve from successful simple systems.
[16:16:43] <crazyphil> now we're getting ready to scale to several hundred million messages a day that need to end up in various distributed systems
[16:17:11] <kurushiyama> crazyphil Ah, spark involved, too. Hm, ok, let's get into that in a bit more detail. You have messages coming in, stored into MongoDB and you want to pass them into kafka, right?
[16:17:35] <crazyphil> and I do need to perform some light touch-up on some of the data in these documents
[16:18:07] <crazyphil> I don't think we're using Camel
[16:18:18] <crazyphil> I'm pretty sure we're using a NodeJS consumer
[16:19:34] <crazyphil> or are you saying use camel to pull from mongo
[16:19:43] <crazyphil> as I just found that doc set
[16:21:00] <kurushiyama> I'd create a producer endpoint on camel and have it forwarded to kafka directly. You should be able to easily plug your node thingy before the producer
[16:21:22] <kurushiyama> Integration via DB is so 90s ;)
[16:23:03] <kurushiyama> Your node could call a REST interface, or send an email, or whatever to get it into camel, and then you can send it to kafka.
[16:23:44] <crazyphil> actually, I can do a batch import to grab everything I have up until now, and then fire up a persistent tail tracking route from Camel
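Not the Camel route being discussed, just a rough pymongo + kafka-python sketch of the same batch-then-tail idea; the topic name, the mydb.mycoll namespace and the kafka-python dependency are all assumptions.

    import json
    from kafka import KafkaProducer          # pip install kafka-python (assumed)
    from pymongo import MongoClient
    from pymongo.cursor import CursorType

    client = MongoClient("mongodb://localhost:27017")
    producer = KafkaProducer(
        bootstrap_servers="localhost:9092",
        value_serializer=lambda d: json.dumps(d, default=str).encode(),
    )

    # 1) Batch: everything that exists right now.
    for doc in client["mydb"]["mycoll"].find():
        producer.send("mycoll-events", doc)

    # 2) Tail: follow the oplog (a capped collection) for new inserts into that namespace.
    oplog = client["local"]["oplog.rs"]
    tail = oplog.find({"ns": "mydb.mycoll", "op": "i"},
                      cursor_type=CursorType.TAILABLE_AWAIT)
    for entry in tail:
        producer.send("mycoll-events", entry["o"])
    producer.flush()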
[16:24:12] <crazyphil> now what I find interesting is that in all my searches, Camel never once came up
[16:25:21] <kurushiyama> crazyphil It is sadly not a well-known project, despite its extreme usefulness. Servicemix is a distribution of Camel, MQ and Karaf. If done correctly, it scales massively.
[16:25:49] <crazyphil> Camel might be useful to me in other ways I've not yet envisioned
[16:25:59] <kurushiyama> crazyphil I can not go into details, but the enterprise backbone of at least one global telecommunications company runs on servicemix.
[16:27:15] <kurushiyama> crazyphil basically, you decouple your applications with it, and implement the business rules in relatively small, dedicated modules, which can be replaced without downtime.
[16:27:24] <crazyphil> so Karaf, as opposed to Docker or Rocket or CoreOS?
[16:27:43] <kurushiyama> crazyphil Karaf is incomparable to that.
[16:28:57] <kurushiyama> crazyphil it is an OSGI container. You can replace modules without message loss. If you have business-logic-1.0.1, and replace it with business-logic-1.0.2, karaf makes sure that the transition is flawless.
[16:29:23] <kurushiyama> crazyphil I still would use a maintenance window, but proper testing provided, it should be mere minutes.
[16:31:15] <kurushiyama> crazyphil karaf actually loads mq, camel, your camel routes and whatever you need. If done properly, you have your setup in a dedicated environment within karaf, making it extremely easy to deploy it multiple times.
[16:32:13] <kurushiyama> crazyphil If you have more questions, please PM me, since we surely do not want to derail #mongodb
[16:33:06] <crazyphil> ahh, I know why I've not run into this yet, it's all Java based
[16:33:29] <crazyphil> and what's being built is primarily NodeJS based, with a handful of Python and C++
[17:01:41] <edrocks> does anyone know why or have links about why mongodb switched back to spidermonkey from v8 in 3.2?
[17:03:30] <Derick> edrocks: some of the changes in v8 made it abort in certain situations, which is not good for embedding it into the DB
[17:04:51] <edrocks> why didn't they give any info. seems like a pretty major change
[17:07:00] <Derick> i think there was also something to do with threading
[17:07:08] <Derick> I'm sure it was in the release notes though
[17:11:01] <kurushiyama> Darn, I did not even know...
[17:12:34] <Lonesoldier728> Hey, so I have one database with three collections: tags for searching, items, and users. Is it more efficient to have them each in their own database? Or at least the search collection, since it does not interact with the other collections and stores everything it needs for a search?
[20:26:16] <FuzzySockets> I have a users table that keeps an array of subdocuments tracking each login. How could I move the logins: [] array into its own collection keyed by the owner id?
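One way that migration could look in pymongo - a sketch only; the logins field, the _id key and the collection names are assumptions:

    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")
    db = client["test"]

    for user in db["users"].find({"logins": {"$exists": True, "$ne": []}}):
        # Copy each embedded login out, keyed by the owning user's _id...
        docs = [dict(login, user_id=user["_id"]) for login in user["logins"]]
        db["logins"].insert_many(docs)
        # ...then drop the embedded array from the user document.
        db["users"].update_one({"_id": user["_id"]}, {"$unset": {"logins": ""}})

    db["logins"].create_index("user_id")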
[21:45:00] <neybar> is there any recommended best practice on where to create a user? I can create the user in the database I want them to have access to, or in another database (admin for example) and assign a role pointing at the correct db. Seems like the latter is easier to manage since you can see all the users in one place, but maybe not?
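A sketch of the "users in admin, roles pointing at the target db" pattern, shown via pymongo's command helper; the user name, password and database names are assumptions.

    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")

    # Created in the admin database, but only granted readWrite on myappdb.
    client["admin"].command(
        "createUser", "appUser",
        pwd="change-me",
        roles=[{"role": "readWrite", "db": "myappdb"}],
    )

    # Clients then authenticate against admin, e.g.
    # mongodb://appUser:change-me@localhost:27017/myappdb?authSource=admin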