[00:08:51] <bloudermilk> Hey all, I'm currently faced with removing 100M+ documents from several collections in a database with ~300M documents in total. I have a list of IDs of all the documents to be removed, and the goal is to remove them as quickly as possible. My current approach is to send batches of 5k to be deleted via { _id: { $in: [] } }, but this turned out to be quite slow (taking upwards of 22s in some cases)
[00:09:41] <bloudermilk> I just read about the Bulk API and I'm wondering if that might be a better solution. If so, I'm curious what the most efficient way to make use of it is. In other words, send several batches as mentioned above, or send batches of individual remove()s using the Bulk API
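A rough sketch of the unordered Bulk API approach in the mongo shell (2.6+); the collection name "mycoll", the ids array, and the 5000 batch size are placeholders to tune:

    var bulk = db.mycoll.initializeUnorderedBulkOp();
    var count = 0;
    ids.forEach(function (id) {
        bulk.find({ _id: id }).removeOne();
        if (++count % 5000 === 0) {            // flush periodically; a bulk op can't be reused after execute()
            bulk.execute();
            bulk = db.mycoll.initializeUnorderedBulkOp();
        }
    });
    if (count % 5000 !== 0) bulk.execute();    // flush the remainder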
[00:52:48] <DragonPunch> so is there not a way to connect to mongodb remotely
[00:53:01] <DragonPunch> cuz i dont want to use mysql
[00:53:24] <zivix> You can't connect remotely if mongod only listens on localhost. You need it to listen on 0.0.0.0 (or the server's public interface)
[00:58:06] <zivix> The built-in defaults bind to a public interface, so if you have something in your config file that specifies the bind address, try taking it out and restarting.
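For reference, a minimal mongod config sketch that listens on all interfaces (YAML format used by 2.6+; older configs use a plain "bind_ip = 0.0.0.0" line instead). If you open this up, make sure auth and/or a firewall limit who can actually connect:

    # /etc/mongod.conf (sketch)
    net:
      bindIp: 0.0.0.0   # listen on all interfaces so remote clients can reach mongod
      port: 27017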
[02:33:13] <morenoh149> _blizzy_: there's also #mongoosejs
[02:47:15] <boutell> hi. I just published a utility for piping entire mongodb databases through the shell, without using a giant folder of files like mongodump would. Feedback welcome: https://www.npmjs.com/package/mongo-dump-stream
[02:47:37] <boutell> it’s written in node, so it’s not as fast as mongodump/mongorestore, but as long as it outruns your network, it doesn’t really matter that much
[04:08:12] <jsfull> hi guys, i have a question regarding schema design for time series (daily basis) - should I 1) create one document and update it each day with the next day's results OR 2) create a separate document for each of the days and have them findable by a common term that links the particular time series data
[04:08:49] <zivix> jsfull: As a rule of thumb, never create a schema such that one document will have unbounded growth.
[04:09:10] <zivix> Eventually it will get too big and you'll lose data.
[04:09:38] <jsfull> ahk gotcha so #2 essentially, thanks zivix :-) one sec, pasting an example, would be great to get your confirmation - please give me a few mins
[04:15:45] <jsfull> so essentially i want to keep track of a particular keywords top 3 results on google and wondering if this way of storing the documents in mongodb is a good idea
[04:17:01] <jsfull> what i will be doing is querying this data as JSON in a very similar shape and using highcharts to display the ranking changes in a multi-line graph, something like on the following page: https://www.serpwoo.com/members/main/intro
[04:18:10] <zivix> Seems fine to me. Are createdon and date redundant?
[04:19:09] <jsfull> ya the ones i mentioned in the example are dummy but both will be the same essentially so i'll skip the createdon part
[04:19:20] <jsfull> because it'll record the same dates
[04:21:21] <zivix> Not sure which driver you're using but mongo can store a datetime object, so probably a good idea to use that. Makes it easier to query data later on.
[04:23:38] <jsfull> oh right true, i'm using node with an ORM called waterline, i do remember that it allows datetime, will use that, thanks again :D
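A sketch of the one-document-per-day approach zivix suggested, written for the mongo shell; the collection and field names (rankings, keyword, date, results) are assumptions for illustration:

    // One document per keyword per day, with a real datetime for easy querying.
    db.rankings.insert({
        keyword: "mongodb hosting",
        date: ISODate("2015-03-01T00:00:00Z"),
        results: [
            { position: 1, url: "http://example.com/a" },
            { position: 2, url: "http://example.com/b" },
            { position: 3, url: "http://example.com/c" }
        ]
    });
    // Pull one keyword's history, sorted by day, for the chart:
    db.rankings.find({ keyword: "mongodb hosting" }).sort({ date: 1 });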
[05:49:13] <afroradiohead> Okay, can someone help me understand this statement:
[05:49:41] <afroradiohead> "Mongo doesn't handle relational data as well as an RDBMS"
[05:50:09] <afroradiohead> I hear people saying it, but I can't find out why
[05:50:58] <afroradiohead> more importantly, examples of how an RDBMS would handle relational data vs a NoSQL implementation
[05:58:20] <kaushikdr> hello.. how do I retrieve a section of a large video files stored in gridfs just to show the thumbnail of the file?
[05:59:13] <afroradiohead> NoOutlet: ahh, that's good to know
[06:00:22] <afroradiohead> how about mongodb then, does it have a significant disadvantage vs a sql implementation?
[06:00:39] <NoOutlet> MongoDB is not as good as an RDBMS with relational data because RDBMSes support JOINs, which let you do something along the lines of "get all of the friends of all of the patients of these three doctors" (if you had that information).
[06:02:32] <NoOutlet> So maybe you can't store all of that data in each "doctor" document because the document would grow too large, so you'd have to store _ids for patient documents and use separate queries to get those documents, which themselves would store _ids for friend documents.
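A sketch of the "application-side join" NoOutlet is describing, in the mongo shell; the collection and field names (doctors, patients, people, patientIds, friendIds) and the three doctor ids are placeholders:

    var doctors = db.doctors.find({ _id: { $in: [docId1, docId2, docId3] } }).toArray();
    var patientIds = [];
    doctors.forEach(function (d) { patientIds = patientIds.concat(d.patientIds); });

    var patients = db.patients.find({ _id: { $in: patientIds } }).toArray();
    var friendIds = [];
    patients.forEach(function (p) { friendIds = friendIds.concat(p.friendIds); });

    // Each "hop" in the relationship is another round trip; an RDBMS does this in one JOIN.
    var friends = db.people.find({ _id: { $in: friendIds } }).toArray();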
[06:02:34] <afroradiohead> ahh, complex joins like that one
[06:03:15] <NoOutlet> Now, if you're asking about noSQL, something like Neo4J is good at relational stuff like that.
[06:04:20] <afroradiohead> ok i see what you're saying
[06:04:49] <afroradiohead> That sucks.... but then I'm thinking, is a complex join like that necessary?
[06:05:00] <NoOutlet> It depends on your application.
[06:05:21] <NoOutlet> In a social network, joins way more complex would be useful.
[06:05:37] <afroradiohead> but the specific sample you gave me, would you actually get "all" friends?
[06:05:56] <afroradiohead> of "all" patients of those three doctors?
[06:06:28] <afroradiohead> I guess now I'm more-so asking towards your experience with building applications
[06:06:30] <NoOutlet> The example I gave was nonsensical. I didn't put too much effort into giving a realistic example.
[06:07:10] <NoOutlet> But, if you did queries in these databases, then yes you would get all of the friends of all of the patients of the three doctors.
[06:07:40] <NoOutlet> Whether that's what you need or not is impossible to tell.
[06:08:00] <NoOutlet> What database would store patient-doctor relationships as well as friendships?
[06:08:50] <afroradiohead> true that, but I'm curious whether a query like that would be made, and if that list of all the friends would be passed to the frontend
[06:08:59] <afroradiohead> I guess here's the thing:
[06:10:05] <afroradiohead> the idea of placing a limit is making me think of a scenario where relational data can be handled perfectly well in a NoSQL database
[06:10:41] <afroradiohead> again it's just a gut feeling, I personally don't have enough experience with mongo
[06:11:46] <afroradiohead> but it's because I'm the type of person to constantly ask why until I run out of them... the sample you did give me was helpful though
[06:12:33] <afroradiohead> but thank you NoOutlet for explaining that to me
[06:12:36] <NoOutlet> There are situations (usually simple relationships) where MongoDB will outperform relational databases.
[06:16:54] <NoOutlet> So I can simplify that relationship, but I lose information in the simplification and more importantly, I can't store all of that data embedded in the me document because the document would grow too large.
[06:18:16] <NoOutlet> So I'd have a "people" collection. I'm going to store my mother in an object, father in an object, and any siblings in an array of objects.
[06:19:05] <NoOutlet> But am I going to store my mother's mother in my mother document? As well as a separate document for my mother? And what about the mother document within my mother subdocument?
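A sketch of the trade-off NoOutlet is walking through, in the mongo shell; the collection and field names (people, motherId, fatherId, siblingIds) are made up for illustration:

    // Embedding: the document grows without bound as the family tree deepens.
    // { _id: 1, name: "Me", mother: { name: "Mom", mother: { name: "Grandma", ... } } }

    // Referencing: every person is its own document and relationships are _ids,
    // which keeps documents small but costs an extra query per "hop".
    db.people.insert({ _id: 1, name: "Me", motherId: 2, fatherId: 3, siblingIds: [4] });
    db.people.insert({ _id: 2, name: "Mom", motherId: 5 });
    db.people.find({ _id: { $in: [2, 3, 4] } });   // fetch my immediate family
    db.people.find({ _id: 5 });                    // another query to reach grandma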
[06:25:04] <NoOutlet> Well hopefully it illustrates why MongoDB is bad at relational data.
[06:25:54] <NoOutlet> Though, for the record an RDBMS would not be ideal for this application either really. This is a perfect use case for a graph database.
[06:29:28] <NoOutlet> This channel is not frequented by many from MongoDB.
[06:29:51] <NoOutlet> It's mainly just enthusiastic users of MongoDB.
[06:30:01] <dragunov11> getting "Error: 54f29ce1ed9061a5a3afbe42 does not exist", during uploads and retrieval from gridfs, mongo 2.4. anyone know why?
[06:30:18] <pajooh> so enthusiastic users, please help!!!!!
[06:30:36] <NoOutlet> And it's virtually always >95% idlers. Especially at 1:30AM EST.
[06:47:41] <afroradiohead> I know what you mean now
[06:47:50] <afroradiohead> "It depends on how the application will access the data"
[06:48:42] <afroradiohead> (I was taking a dump while thinking about the whole Doctor's Patient's Friend thing and then lying down - I was like... what type of application would be giving ALL friends, that's dumb)
[06:48:51] <afroradiohead> but if it's split off in segments
[06:49:40] <afroradiohead> if the application is asking for data in small segments... then essentially I won't run into this issue of accessing relational data being slow
[06:49:54] <afroradiohead> hence this whole (limit) gut-feeling I had
[06:50:22] <afroradiohead> now... it's just me testing that theory
[08:43:46] <me2resh> I am trying to find documents within 1 kilometer of a certain point, is that the correct query http://paste2.org/aY3OwHIg ? it doesn't seem to get me the right results
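The paste isn't visible here, but for reference, a form of "within 1 kilometer" that works, assuming a 2dsphere index and GeoJSON points (the collection name, field name "loc", and coordinates are placeholders). A common gotcha: with legacy coordinate pairs and plain $near, $maxDistance is in radians rather than meters:

    // Assumes: db.places.ensureIndex({ loc: "2dsphere" })
    db.places.find({
        loc: {
            $nearSphere: {
                $geometry: { type: "Point", coordinates: [ -73.9857, 40.7484 ] },  // [lng, lat]
                $maxDistance: 1000   // meters
            }
        }
    });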
[08:48:33] <Waheedi> is there a way to remove an element with its indexes?
[09:02:32] <whowho> Can mongo run a query for a similar key?
[10:16:07] <Waheedi> if my secondary getIndexes is returning fewer indexes than my primary
[10:28:58] <tim_t> Mongo and morphia with Java question: I have a collection that is going to be heavy on reads and light on writes - a lot of permission checking etc. To cut down disk traffic I intend to hold the collection in memory and write to both the in-memory and on-disk collections on the rare occasions it's necessary. I can do updates etc to the on-disk DB easily, but how do I do an update query on the in-memory copy of objects?
[10:32:56] <tim_t> It looks like I would have to hold each document in a hashmap or something, update the on-disk document then reload the document into memory. that just seems awful so I am hoping to be able to run an update query directly on the in-memory document
[10:33:18] <tim_t> the problem is i do not know how to do this
[19:14:05] <Alphart> but I don't think it answers my question (could be wrong tho)
[19:21:26] <boutell> Alphart: I’m not a java person. In the node driver, which is of course async, the default used to be “safe: false,” which meant your callback was invoked as soon as the request was shot over to mongo but before mongo confirmed it had actually happened. This was a crazy default and they changed it to “safe: true”. I imagine a similar evolution may have occurred in the java driver but if you point me to the docs for the
[19:21:27] <boutell> “write” method you’re using I might be able to say more.
[19:22:07] <boutell> I’m used to java dealing with simultaneous things through threads rather than async, but maybe that’s not what you’re doing.
[19:22:51] <Alphart> Actually Callback are more used in javascript but I also use them sometimes in Java (would be better to use Future but I love the simplicity of callback)
[19:23:16] <boutell> Alphart: “safe: true” means you can expect to read the data that has been written if you make a query now… in a single-server mongo setup anyway. I don’t know how it would behave in a replica set, I don’t know when you are guaranteed to see the data you just wrote regardless of which node in the set you’re talking to.
[19:23:24] <Alphart> I think that's the "write method" you're talking about : http://api.mongodb.org/java/current/com/mongodb/WriteConcern.html
[19:24:44] <boutell> that doesn’t seem to be the actual write method, Alphart, but rather a supporting class of some kind; an interesting one, though, because the option exists not to acknowledge until replicated to all nodes in the set
[19:25:10] <boutell> I think a reported bug in our CMS is probably because we’re using a mongo-backed session store that probably doesn’t use that feature
[19:25:23] <kakashiA1> could anybody please explain to me what's the difference between statics and methods in mongoose?
[19:25:30] <boutell> maybe it’s a default we can set globally, which would make my day
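A hedged node-driver sketch of what setting the write concern globally could look like; the URI, database, and collection names are placeholders, and exact option spellings vary a bit between driver versions:

    // In the connection string, so it becomes the default for all writes:
    //   mongodb://host1,host2/mydb?replicaSet=rs0&w=majority

    // Or per operation:
    db.collection('sessions').insertOne(
        { userId: 42 },
        { w: 'majority' },
        function (err, result) {
            // callback fires only once a majority of the set has acknowledged (or on error)
        }
    );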
[19:25:49] <Alphart> I want to know, when I retrieve a collection from my software, whether the I/O operations are executed on the same thread or not
[19:25:53] <sflint> Alphart: a write concern of 2 will make sure writes are always acknowledged by at least two replica set members
[19:25:57] <Alphart> cause if they do, it will generate lag
[19:26:02] <boutell> kakashiA1: I recall it’s like static methods versus instance methods in java or any other traditional OOP language.
[19:28:07] <Alphart> sflint: ok thanks for the info, I'm actually trying to understand how the java driver performs I/O operations
[19:28:14] <Alphart> if it's on the same thread or not
[19:30:14] <kakashiA1> boutell: hmm... when do I use which? I mean, I would use static methods if they have nothing to do with the object
[20:12:56] <boutell> kakashiA1: a static method might be “go get me all the blue xuptubblers.” An instance method would be “percolate this xuptubbler.”
[20:13:34] <boutell> a static method might also be “calculate a valid xuptubbler fribble to be passed to any xuptubbler method that requires one"
[20:27:55] <kakashiA1> boutell: statics are model (class) related, like: search in this model, delete in this model, get from this model
[20:28:44] <kakashiA1> and methods are document (object) related, like: delete this value of this document, change that value of that document
[20:30:37] <kakashiA1> in other words you add a "statics" to a model if you want to define a global function, something that is model related
[20:31:15] <kakashiA1> and you define "methods" to give all your instances specific functionality, like changing a value
[20:36:38] <kakashiA1> boutell: in a statics definition, who is "this"?
[20:37:20] <kakashiA1> in a methods definition, its the document, for example: user = this
[20:37:39] <boutell> kakashiA1: probably the model
[20:38:00] <boutell> probably you can call the other statics from this
[20:38:18] <boutell> but this is javascript and async so “this” is constantly changing on you anyway and is generally annoying
[20:45:19] <kakashiA1> boutell: okay, and statics are always called by the model, so that makes sense :)
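A short mongoose sketch of the distinction being discussed; the schema, model, and function names are made up for illustration:

    var mongoose = require('mongoose');
    var userSchema = new mongoose.Schema({ name: String, lastLogin: Date });

    // In a static, `this` is the Model (the "class"), so it's for model-wide operations.
    userSchema.statics.findByName = function (name, cb) {
        return this.find({ name: name }, cb);
    };

    // In a method, `this` is the individual document.
    userSchema.methods.touchLogin = function (cb) {
        this.lastLogin = new Date();
        return this.save(cb);
    };

    // Statics and methods have to be attached before the model is compiled,
    // or the compiled model won't pick them up.
    var User = mongoose.model('User', userSchema);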
[20:56:54] <MatheusOl> hm... Funny how the "downloads" link is hidden on the MongoDB site now. Easier to Google than to find on the site.
[21:00:53] <MatheusOl> Ah, .org => good and easy, .com => showcase and useless
[21:10:08] <boutell> MatheusOl: I don’t know what you’re installing onto, but if it’s Linux or Mac, you’d be better off using your package manager or homebrew respectively
[21:10:38] <boutell> also mongodb maintains official repos for debian/ubuntu and centos/redhat
[21:11:01] <boutell> maybe you know this though, just musing
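For the record, roughly what that looks like (package and formula names as of the time; see the official install docs for the repo/key setup for your distro release):

    # Ubuntu/Debian, after adding MongoDB's official repository:
    sudo apt-get install mongodb-org

    # OS X with Homebrew:
    brew install mongodb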
[21:14:35] <MatheusOl> I do know, it just felt strange that the link wasn't obvious on the site
[21:14:50] <MatheusOl> But then I realized there is .org and .com
[21:19:17] <kakashiA1> boutell: do you know why you have to put the methods definition BEFORE the model definition?
[21:24:06] <MatheusOl> boutell: Thanks for your concern anyway... :)
[21:25:23] <boutell> kakashiA1: sorry no, you’d have to ask someone who really uses mongoose (:
[21:26:08] <boutell> btw, released this last night and updated today: lets you pipe an entire database over ssh, without making a giant mongodump folder. https://www.npmjs.com/package/mongo-dump-stream
[22:24:08] <obeardly> Hi all: has anyone here run into an issue where, after updates, MongoDB won't start on Ubuntu, with a log file error?
[23:39:49] <boutell> obeardly: no, but if you post all the details of the error, we may be able to help.