[02:52:30] <ycon_> Hi all, I'm trying to run MongoDB for the F8 app. Not having much success - I keep getting ECONNREFUSED on the port: http://dpaste.com/0DXPRSE
[03:04:45] <joannac> can you connect manually with the mongo shell ycon_ ?
[03:14:46] <joannac> ycon_: okay, so pastebin me what you typed and what the result was
[03:16:45] <ycon_> joannac: pastebin under load. dpasted shell session http://dpaste.com/0PEKRGV
[03:16:58] <joannac> 2016-05-09T13:08:01.320+1000 I STORAGE [initandlisten] exception in initAndListen: 29 Data directory /data/db not found., terminating
[03:17:55] <ycon_> the instructions here really don't make much sense then, hey: https://github.com/fbsamples/f8app
[03:19:49] <ycon_> It still doesn't find /data/db when I mkdir -p data/db. It says permission denied if I mkdir -p /data/db (with the leading slash). Is that the issue? http://dpaste.com/2WTGMAM
[03:28:19] <ycon_> joannac: Any ideas on getting it running? the leading slash causes an issue http://dpaste.com/2WTGMAM
[03:30:29] <ycon_> right- it needs to be at the root dir (sudo mkdir -p /data/db)
[03:35:52] <ycon_> joannac: Now that I've created it at the root, it tells me: file: /data/db/mongod.lock errno:13 Permission denied. Is a mongod instance already running?, terminating 2016-05-09T13:27:27.035+1000 I CONTROL [initandlisten] dbexit: rc: 100
[07:02:51] <Adam0410> Hey guys, I am trying to store a JavaScript function in a Mongo database so I can retrieve and run it later (from within Node.js, not the mongo shell). Is there a way to do this?
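A minimal sketch of one common approach to Adam0410's question, using the Node.js driver: store the function's source as a string and reconstruct it when needed. The collection and field names ("tasks", "body") are made up for illustration, and rebuilding code read from the database should only ever be done with trusted data.

    // Sketch: persist a function's source as a string, rebuild it later in Node.js.
    // Collection/field names are illustrative only.
    const { MongoClient } = require('mongodb');

    async function demo() {
      const client = await MongoClient.connect('mongodb://localhost:27017');
      const coll = client.db('test').collection('tasks');

      // Store the function body as plain text.
      const add = (a, b) => a + b;
      await coll.insertOne({ name: 'add', body: add.toString() });

      // Later: read it back and reconstruct it. This evaluates arbitrary code
      // from the database, so only do it with data you control.
      const doc = await coll.findOne({ name: 'add' });
      const fn = eval('(' + doc.body + ')');
      console.log(fn(2, 3)); // 5

      await client.close();
    }

    demo().catch(console.error);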
[08:53:08] <jokke> i'm having a bit of trouble setting up sharding
[08:53:25] <jokke> if i add a shard i get this error message: "Can't use localhost as a shard since all shards need to communicate. Either use all shards and configdbs in localhost or all in actual IPs. host: ip-172-31-29-6.us-west-2.compute.internal:27018 isLocalHost:0"
[08:54:18] <jokke> i don't quite understand what's going on. the host i try to add is not localhost
[09:04:15] <kurushiyama> jokke: You _configure_ your sharded setup. Now, how should shard A reach a host called "localhost" when that denotes a totally different machine?
[09:33:31] <kurushiyama> gain_: Your modelling looks wrong to me. You are logging user logins with access tokens, I assume?
[09:35:30] <kurushiyama> jokke: Well, please check the configs of the other servers. Please also paste the sh.addShard() commands you used. The shards and the config server should all listen on something != 127.0.0.1
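For context, a hedged sketch of what kurushiyama is asking for: the shard is added through a hostname every cluster member can resolve, and none of the members should be bound only to 127.0.0.1. The hostname is the one from jokke's error message; everything else is an assumption about the setup.

    // Run against mongos: add the shard by a resolvable hostname, not "localhost".
    // If the shard is a replica set, prefix it with the set name: "rs0/host:27018".
    sh.addShard("ip-172-31-29-6.us-west-2.compute.internal:27018")
    sh.status()   // verify the shard appears

    // On each shard / config server, make sure mongod does not listen only on
    // 127.0.0.1, e.g. in mongod.conf:
    //   net:
    //     bindIp: 0.0.0.0   # or the host's actual IP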
[09:38:16] <gain_> kurushiyama: I'm a noob, so if you say that it's wrong I trust you... where can I find some docs about NoSQL data modeling and best practices?
[09:39:11] <kurushiyama> gain_: The thing is this: You impose a theoretical limit on the number of logins with that model, since BSON docs are limited to 16MB
[09:40:07] <kurushiyama> gain_: In your case, I'd do {username:"me",date:someISODate, token:"abcd"}
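A small sketch of that one-document-per-login model, as opposed to pushing every login into an array inside the user document (which would eventually run into the 16MB BSON limit mentioned above). The collection name and the index are assumptions.

    // One small document per login event instead of an ever-growing embedded array.
    // Collection name "logins" and the index are assumptions for illustration.
    db.logins.insertOne({ username: "me", date: new Date(), token: "abcd" })

    // The usual queries stay cheap with an index on the fields you filter/sort by:
    db.logins.createIndex({ username: 1, date: -1 })
    db.logins.find({ username: "me" }).sort({ date: -1 }).limit(10)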
[09:40:33] <jokke> i assume if `mongo ip-172-31-29-6.us-west-2.compute.internal:27018` works, DNS and the ports are configured properly
[09:41:26] <kurushiyama> jokke: each single node is required to be able to access that host. So that would be the config server and mongos, if I recall your setup correctly.
[09:41:37] <gain_> kurushiyama: so every document in a collection is limited to 16MB?
[10:15:17] <jokke> i didn't think that's necessary
[10:15:38] <jokke> i thought it's mongos <-> configsvr <-> shards
[10:16:48] <kurushiyama> jokke: Rule of thumb: each member of a cluster needs to be able to resolve every other member + most likely access the corresponding ports.
[10:18:14] <jokke> then i don't quite understand why a mongos should run on the application server. I don't see why the application server should be accessible by the db host
[10:23:08] <kurushiyama> jokke: It does not necessarily have to access it: but it needs to be able to resolve it.
[10:23:43] <kurushiyama> jokke: Since you are experiencing problems, you should make sure that everything works as expected first before getting smart on it ;P
[12:28:03] <zerOnepal> Hi there, what is the recommended way to delete a huge amount of mongo records? I need to delete about 800GB of data on a production server
[12:28:54] <kurushiyama> zerOnepal: documents in a collection, a whole collection, whole databases or complete instances?
[12:29:36] <zerOnepal> it's a single collection mostly, where I have indexed the created_at field
[12:30:07] <kurushiyama> zerOnepal: drop the collection and recreate the indices, if needed.
[12:30:54] <zerOnepal> currently I am using something like this to fetch a record I want to delete, then delete it in a loop: db.collection.findOne({created_at: {$lt: new Date('2015/11/30 11:59:59')}})
[12:31:41] <kurushiyama> zerOnepal: As for dropping the collection: http://bfy.tw/5fSm
[12:31:42] <zerOnepal> but I wish there were some batch deletion mechanism that supports a query and doesn't hamper the production traffic
[12:32:06] <kurushiyama> zerOnepal: it is called bulk operations. And every operation affects the others.
[12:32:23] <zerOnepal> I can't drop the whole collections, I need to keep last 8 months records intact
[12:32:42] <zerOnepal> and remove all the records older than 8 months
[12:33:17] <zerOnepal> and that collection is about 800GB :(
[12:34:49] <kurushiyama> zerOnepal: That is why I asked.
[12:35:04] <kurushiyama> zerOnepal: use a TTL index, maybe
[12:36:37] <kurushiyama> zerOnepal: Depends on whether you want to delete the old documents once or whether you want to always keep data for 8 months only
[12:37:59] <kurushiyama> zerOnepal: As for the removal: db.yourcoll.drop({created_at:{$lt: ISODateEigthMonthAgo }})
[12:41:09] <zerOnepal> isn't this db.yourcoll.drop({created_at:{$lt: ISODateEigthMonthAgo }}) expensive for an 800 GB collection?
[12:42:10] <kurushiyama> zerOnepal: Again, you can not have your cake and eat it. If your hardware is not dimensioned for your use cases, there is something seriously wrong with your setup.
[12:42:22] <kurushiyama> cheeser: Sorry, you are right. It is remove, of course
[12:44:52] <zerOnepal> yes, kurushiyama, the data seems to have accumulated over the last 5 years and has never been archived...
[12:46:11] <kurushiyama> zerOnepal: There are 3 ways to reduce the impact of such an operation: Add a random wait while iterating over the docs, use bulk operations and execute them after some waiting or do it during low load times.
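A hedged mongo-shell sketch of the "delete in batches with a pause" idea described above; the batch size, the sleep interval, and the collection name are arbitrary placeholders to tune against the actual production load.

    // Remove old documents in small batches, pausing between batches so the
    // production workload gets breathing room. All numbers are placeholders.
    var cutoff = new Date(Date.now() - 1000 * 60 * 60 * 24 * 30 * 8); // ~8 months
    var ids;
    do {
        ids = db.yourcoll.find({ created_at: { $lt: cutoff } }, { _id: 1 })
                         .limit(1000)
                         .toArray()
                         .map(function (d) { return d._id; });
        if (ids.length > 0) {
            db.yourcoll.remove({ _id: { $in: ids } });
            sleep(500);   // mongo shell helper: wait 500 ms between batches
        }
    } while (ids.length > 0);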
[12:46:50] <kurushiyama> zerOnepal: If you only ever want to keep the last 8 months, I'd guess that using a TTL index is the best option.
[12:46:59] <zerOnepal> yes, I am watching for the lowest-impact time window
[12:47:39] <zerOnepal> but would that be applicable to very old data? I have data going back 5 years
[12:47:41] <kurushiyama> zerOnepal: By using a TTL index, you basically make sure that your collection will never accumulate more than 8 months + 1 minute of data
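A sketch of the TTL index option; expireAfterSeconds below is roughly eight months (an approximation using 30-day months), and note that once the index exists the TTL monitor will also start purging the five years of existing old documents, so the first pass can be heavy.

    // TTL index: the server itself deletes documents once created_at is older
    // than expireAfterSeconds. Roughly 8 months, expressed in seconds.
    db.yourcoll.createIndex(
        { created_at: 1 },
        { expireAfterSeconds: 60 * 60 * 24 * 30 * 8 }
    )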
[12:48:00] <kurushiyama> zerOnepal: How should I know? Not my use case...
[12:54:49] <kurushiyama> cheeser: Which reminds me of a question I have: Would it make sense to have a "TTL deletion window" similar to the balancing window to reduce the impact of TTL deletions in case there are a lot of deletions at a single point in time? Not that I personally had any noticeable impact of TTL deletions, just curious. I guess basically the question is "How does the TTL removal work internally?"
[13:05:35] <cheeser> kurushiyama: i can imagine where that would be useful, sure.
[13:06:09] <kurushiyama> cheeser: Maybe I should write a feature request and see whether people like it.
[13:06:33] <cheeser> can't hurt. you'll likely be told to implement that in your app, though.
[13:07:07] <cheeser> adding new knobs to twiddle has a high bar.
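For what it is worth, something close to a "TTL deletion window" can be approximated from the ops side by toggling the TTL monitor around peak hours. This assumes the ttlMonitorEnabled server parameter; deletions simply pile up until the monitor is switched back on.

    // Rough stand-in for a TTL deletion window: disable the TTL monitor during
    // peak hours (e.g. from a cron-driven script) and re-enable it afterwards.
    db.adminCommand({ setParameter: 1, ttlMonitorEnabled: false })   // before peak
    db.adminCommand({ setParameter: 1, ttlMonitorEnabled: true })    // after peak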
[14:10:28] <kurushiyama> hemangpatel: I am pretty sure that I told you where to look for the problem, and even gave you a hint what to look for...
[15:48:58] <bros> I know Mongoose sucks, but can anybody here answer a question related to its Schema class?
[15:49:53] <kurushiyama> bros: Aside from "Do not use it"? I am afraid I can not. If it is about modelling in general, I may.
[16:25:05] <emacsnw> suppose I have a collection of users (email, status, ...). with the index {email: 1, status: 1}, the query db.users.find({email: "foo@bar.org", status: {$in: [0, 1]}}) scans the same number of documents as db.users.find({email: "foo@bar.org"}) does.
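One way to see what emacsnw describes is to compare the two plans with explain(); a quick sketch, assuming the {email: 1, status: 1} index is already in place - the interesting numbers are keysExamined and docsExamined.

    // Compare the work done by the two queries.
    db.users.createIndex({ email: 1, status: 1 })

    db.users.find({ email: "foo@bar.org", status: { $in: [0, 1] } })
            .explain("executionStats").executionStats

    db.users.find({ email: "foo@bar.org" })
            .explain("executionStats").executionStats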
[16:31:50] <landonn> cheeser: did you just kick me from ##java ? i said it was the wrong channel, no need to do that
[16:33:19] <landonn> i don't like how a mongodb employee kicks people who criticize mongodb in a Java community channel anyway. what an asshole.
[16:36:28] <StephenLynx> i don't think hes an employee though
[16:36:57] <StephenLynx> I also don't think what people do on other channels is pertinent to this channel.
[16:38:17] <kurushiyama> landonn: You make a lot of assumptions and conclusions based on them. And then, calling somebody an asshole based on mere assumptions is not exactly what I would call polite, reasonable or justified.
[16:39:14] <landonn> me neither, but it's true as far as i can tell
[16:39:15] <kurushiyama> Nor helpful, for that matter.
[16:39:27] <landonn> irc doesn't have to be helpful. a public company does.
[16:39:42] <StephenLynx> again, I am pretty sure he isn't employed by mongodb
[16:39:44] <kurushiyama> landonn: Well, good luck with that attitude.
[16:40:07] <StephenLynx> and even if he was, this is not an official channel.
[16:40:37] <StephenLynx> so you would have to take it somewhere that the company actually handles.
[16:40:41] <StephenLynx> like their customer support.
[16:41:13] <landonn> that's beside the point. an employee shouldn't do that.
[17:06:11] <BrianBoyko> Hello. I'm getting a very weird error from Mongo. It works perfectly from one view of my front-end, but not another, despite the code being almost identical. :( I get AssertionError: { [MongoError: key $end must not start with '$'] -- problem is, I'm not using $end anywhere that I can see.
[17:08:00] <xialvjun> why does mongodb save its _id as an ObjectID rather than a string?
[17:08:35] <xialvjun> I have a foreign key bid referencing another collection's _id... I don't know whether I should save the bid as a normal string or an ObjectID
[17:25:09] <silviolucenajuni> xialvjun ObjectID is smaller than a String. There is a question about this on StackOverflow: http://stackoverflow.com/questions/18680462/mongodb-benefit-of-using-objectid-vs-a-string-containing-an-id
[17:37:03] <kurushiyama> xialvjun: Just to let you know that you do not _have_ to use ObjectId. For a lot of applications, a more natural _id might well be useful.
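A small illustration of the practical side of xialvjun's question: if the referenced _id is an ObjectId, the reference field ("bid" in their description) should be stored as an ObjectId too, otherwise equality lookups will not match. Collection names below are made up.

    // If the referenced collection uses ObjectId _ids, keep the reference as an
    // ObjectId as well -- the same value stored as a hex string will not match.
    var res = db.bcoll.insertOne({ name: "parent" })
    db.acoll.insertOne({ title: "child", bid: res.insertedId })

    db.acoll.find({ bid: res.insertedId })       // matches
    db.acoll.find({ bid: res.insertedId.str })   // hex string: no match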
[17:42:36] <cheeser> for the record, i didn't kick anyone.
[18:29:40] <skullcrasher> would you recommend mongodb also for tasks that could be solved "more easily" with an rdbms, like user management? Reason I'm asking is I could do the user management in mysql and keep the rest of the data in mongodb, but all in one db would be nice maybe
[18:30:51] <kurushiyama> skullcrasher: a) User management is at least as easy in MongoDB as it is in MySQL b) even if it wasn't, dealing with two data sources creates an overhead that negates any advantage, even a merely perceived one.
[18:31:26] <skullcrasher> kurushiyama, a) not for me (coming from mysql :P) b) totally agreed
[18:32:35] <skullcrasher> so basically going "the hard way", and looking how to transfer such stuff to mongodb
[18:32:58] <kurushiyama> skullcrasher: If user management is hard for you in MongoDB, I would not suggest implementing something security-related until it is easy for you.
[18:33:35] <skullcrasher> kurushiyama, hmm difficult as the rest needs a user to work :D
[18:33:35] <kurushiyama> skullcrasher: No offense, but it is _extremely_ easy in MongoDB, and if it is hard, you probably do not know enough, yet.
[18:33:59] <skullcrasher> kurushiyama, well I know about embedding etc, just not sure how to set this all up properly.
[18:34:21] <skullcrasher> without having too much duplicate/redundant information
[18:34:42] <kurushiyama> skullcrasher: For starters: Embedding should only be used for limited sized subdocs or arrays of subdocs.
[18:36:17] <skullcrasher> kurushiyama, so redundancy is ok in many parts I think, just curious about updating this then.
[18:36:39] <skullcrasher> if e.g. the user's address is saved in 4 different locations and he updates it.
[18:37:03] <kurushiyama> skullcrasher: Here is what I usually suggest: First, get your use cases / user stories right. Next, derive the questions you have on your data from them. Then, model accordingly, so that your questions get _answered_ in the most efficient way.
[18:37:13] <skullcrasher> although a heavier update wouldn't be that much of a problem then, as it isn't changed that often, right?
[18:37:24] <kurushiyama> skullcrasher: Let us say you have a collection of failed logins
[18:37:37] <kurushiyama> skullcrasher: And let us say a user is mutable
[18:37:56] <kurushiyama> skullcrasher: Basically, you are on the right track already
[18:38:38] <skullcrasher> kurushiyama, well I did the mongodb java course (was quite good, but missed many things already :P)
[18:38:38] <kurushiyama> skullcrasher: because you correctly stated that a user changes his username probably... what... every half year?
[18:38:48] <skullcrasher> just the schema modelling is still hard somehow
[18:39:27] <kurushiyama> skullcrasher: Well, Here is what I suggest: Forget JPA. Forget JDO, and for god's sake, forget Spring Data.
[18:39:28] <skullcrasher> the "technical" stuff like sharding, queries etc on an existing data structure is ok to do
[18:40:04] <skullcrasher> but creating good new ones is hard :P
[18:40:38] <kurushiyama> skullcrasher: Presumably, you are accustomed to "Identify entity, identify properties, identify relations"
[18:41:09] <kurushiyama> skullcrasher: With NoSQL in general, and MongoDB in particular, you do it the other way around.
[18:41:26] <kurushiyama> skullcrasher: use case => questions to data => model
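To make the "use case => questions to data => model" direction concrete, a hedged sketch of the failed-logins example from above; the collection name, fields, and the 24-hour question are my own assumptions.

    // Question: "show the failed logins for a user in the last 24 hours".
    // Model: one small document per failed attempt, referencing the user by the
    // immutable _id, so renaming the user never touches this collection.
    db.failed_logins.insertOne({
        userId: ObjectId("572a1234567890abcdef1234"),   // made-up example id
        at: new Date(),
        ip: "203.0.113.7"
    })
    db.failed_logins.createIndex({ userId: 1, at: -1 })

    db.failed_logins.find({
        userId: ObjectId("572a1234567890abcdef1234"),
        at: { $gt: new Date(Date.now() - 1000 * 60 * 60 * 24) }
    })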
[18:43:10] <skullcrasher> kurushiyama, ok. I think I should find more to read about data modelling until asking new questions ;P
[18:43:26] <jeroentrappers> I have a small question about the mongodb journal prealloc files. We have a deployment on AWS using 3 volumes, one for the rootFS, one for the data and one for the logs. It seems that the journal prealloc files /var/lib/mongodb/journal are allocated on the rootFS volume, while the actual data and journal reside on /data (data volume)
[18:43:29] <kurushiyama> skullcrasher: Nah, it is ok. Better ask first than later
[18:43:37] <jeroentrappers> It seems illogical to prealloc on a different volume
[18:43:46] <jeroentrappers> is there some way to configure this?
[18:44:02] <kurushiyama> jeroentrappers: setting appropriate mount points comes to mind
[18:44:59] <jeroentrappers> yes, sure... I get it.
[18:46:34] <jeroentrappers> Would it not be more logical for mongodb to prealloc the journal files under the database directory as well?
[18:47:11] <jeroentrappers> we just didn't take the prealloc files into consideration when choosing mount points for our data volume. Obviously
[18:47:25] <hdkiller> i am facing some scaling issues and i would like to ask what route i should choose to be able to handle more connections/requests per sec on a 'relatively small' dataset
[18:47:58] <hdkiller> do i need to set up replica sets? or is there a decent up-to-date guide that points me toward the proper way of doing this?
[18:48:27] <jeroentrappers> hdkiller: can you scale vertically first?
[18:48:36] <jeroentrappers> faster cpu / memory / ssd
[18:51:22] <jeroentrappers> yes, sure... But I would advise setting up a replica set, and then applying sharding as well
[18:52:00] <hdkiller> what routes the operations? does it need to be done at the app level?
[18:52:10] <kurushiyama> I am with jeroentrappers here. If you have enough data for a sharded cluster, it is probably worth thinking about availability.
[18:52:27] <hdkiller> i need high availability of course, failover, etc
[18:52:49] <kurushiyama> hdkiller: replicated shards, then
[18:52:53] <jeroentrappers> hdkiller -> read up on replica sets for availability, and apply sharding for more performance, better distribution of load
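On the application side a replica set mostly shows up in the connection string; a hedged Node.js sketch with made-up hostnames and replica set name, spreading reads to secondaries where that is acceptable.

    // Hypothetical hosts and replica set name; adjust to the real deployment.
    const { MongoClient } = require('mongodb');

    const uri = 'mongodb://db1.example.com:27017,db2.example.com:27017,' +
                'db3.example.com:27017/app?replicaSet=rs0&readPreference=secondaryPreferred';

    MongoClient.connect(uri)
        .then(client => client.db('app').collection('things').findOne({})
            .then(doc => {
                // The driver handles failover: if the primary goes down, writes
                // resume once a new primary has been elected.
                console.log(doc);
                return client.close();
            }))
        .catch(console.error);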
[18:53:20] <kurushiyama> hdkiller: How urgent is it?
[18:55:09] <kurushiyama> Or https://university.mongodb.com/certified_professional_finder/
[18:55:30] <Derick> oh, I didn't know we had that list
[18:55:32] <kurushiyama> But with that pressure, I'd probably rather go for Derick's suggestion.
[18:55:57] <kurushiyama> Derick: Well, I was "invited" to have me listed there, so I happen to know ;)
[18:55:59] <jeroentrappers> I was still thinking about my prealloc files, and instead of changing the mount points, I could just symlink /var/lib/mongodb/journal onto the /data volume
[18:56:42] <jeroentrappers> that should also fix my volumes / data distribution problem, one per node.
[18:57:49] <uatec> hey, it's been a while since i wrote some upsert code
[18:57:59] <Derick> jeroentrappers: symlinks work - it's what I use too
[18:58:20] <uatec> my document has a bunch of children fields, and one of them is an array. When I do my upsert, it merges the old and new arrays.
[18:58:33] <uatec> However, the old and new are both the same, so i'm getting loads of duplicates. :| What can I do about this?
[19:07:03] <jeroentrappers> uatec: in your merge logic, treat the array as a set https://lodash.com/docs#uniq
[19:22:09] <uatec> ahh, thanks for the pointer jeroentrappers, i'll check that out
[19:24:52] <uatec> although, jeroentrappers, i'm not sure i can see how to specify my own merge logic
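If the merging happens on the database side (i.e. the upsert uses $push), one option that avoids custom merge logic altogether is $addToSet with $each, which only appends values that are not already in the array. A hedged sketch with made-up names; note that $addToSet compares whole values, so subdocuments must match field-for-field.

    // Re-running the same upsert does not create duplicates, because $addToSet
    // only appends values not already present. Names are illustrative.
    db.docs.updateOne(
        { _id: "doc-1" },
        {
            $set: { title: "example" },
            $addToSet: { tags: { $each: ["red", "green", "blue"] } }
        },
        { upsert: true }
    )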
[21:10:30] <Gloomy> Hi :) Have a small question: when I do find({field: 'string'}), it also finds all documents where 'string' is only a part of the field (e.g. field: 'stringy', field: 'superstring'). How do I avoid this?