[08:11:04] <saira_123> Excuse me boys, can I ask a question? I want to know why it is said that HDFS performs much faster than MongoDB for both reads and writes? With the new WiredTiger, can't Mongo perform better than HDFS?
[08:25:29] <m3t4lukas> saira_123: because mongodb supports querying
[08:26:34] <m3t4lukas> saira_123: with HDFS you need to know exactly what you are looking for. HDFS is just a distributed file system, not a database. You can't compare apples with pears :P
[08:28:49] <m3t4lukas> saira_123: however, you could just as well ask why ext4 is not a distributed file system, and why ext4 performs better than HDFS. They are both file systems. Or NTFS, if you are looking at it from a DOS perspective.
[08:29:50] <m3t4lukas> saira_123: and why are you asking for boys? O.O
[08:39:16] <m3t4lukas> saira_123: not to forget that mongodb supports aggregation
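To make that concrete: a single server-side query plus aggregation, which a bare file system like HDFS has no notion of. A minimal pymongo sketch; the collection and field names are hypothetical:

```python
from pymongo import MongoClient

# Hypothetical "shop.orders" collection on a local mongod.
orders = MongoClient()["shop"]["orders"]

# A filtered query plus a server-side aggregation, operations a plain
# distributed file system like HDFS simply does not have.
pipeline = [
    {"$match": {"status": "paid"}},
    {"$group": {"_id": "$customer", "total": {"$sum": "$amount"}}},
]
for row in orders.aggregate(pipeline):
    print(row)
```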
[08:42:41] <m3t4lukas> Derrick, cheeser: do you think it will be possible in the future for end users to create their own aggregation stages or add functionality to an existing one? E.g. adding a stage for grouping dates by month etc.?
[08:46:35] <m3t4lukas> I know there is a projection for that one, so this might be a bad example. A better example would be a stage that calculates the number of months between two dates. Just as a simple example. I'm sure in the real world there are loads of cases where you might wish to calculate really complicated things. Maybe you could add a calculation node with GPU support :)
[08:47:27] <m3t4lukas> and let the calculation work like the shader pipeline in OpenGL 3.2+
[09:09:47] <m3t4lukas> Derrick, cheeser: how expensive is the projection stage? Will it make an impact to do it after skipping and limiting? Will it even be executed after skipping and limiting or are those stages always executed at the end no matter where I put them like it's the case with the query pipeline?
[09:19:18] <Derick> m3t4lukas: it helps if you spell my nick right ;-)
[09:19:32] <Derick> projection makes sense mostly in the way that it reduces data having to be moved
[09:19:50] <Derick> so it certainly makes sense after skipping and limiting, but why *wouldn't* you do it before the skip/limit?
[09:19:52] <m3t4lukas> sry for annoying today, but something came to mind when I just saw my BigDataMeta class. The date of creation already comes for free in case someone uses an ObjectId. But what about the date of deletion? I'm sure I don't need to explain to any of you the paradigms of Big Data and why you never actually delete stuff in Big Data. I created a class for that reason. But every time I query I need to specify explicitly that I only want to query objects that are not deleted (have no date of deletion in the field "meta.deleted", or the field does not exist, or the date is later than the current date). It would be really nice if you could build in that kind of meta data and make it a query parameter to query deleted docs, too. You could spin this further by adding admin tools for "really" deleting all docs that have been deleted for e.g. two or more years.
[09:20:18] <m3t4lukas> Derick: sry for spelling it wrong :o
[09:20:40] <m3t4lukas> Derick: because I only project it for renaming fields
[09:20:58] <Derick> then it's probably better to do after
[09:21:03] <m3t4lukas> Derick: I do it after a group by for making it a bit nicer
[09:21:04] <Derick> as you've less docs to operate on
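A sketch of the ordering Derick recommends, with hypothetical collection and field names: the rename-only $project is placed after $group, $sort, $skip and $limit, so it touches as few documents as possible:

```python
from pymongo import MongoClient

coll = MongoClient()["app"]["events"]  # hypothetical collection

pipeline = [
    {"$group": {"_id": "$category", "n": {"$sum": 1}}},
    {"$sort": {"n": -1}},
    {"$skip": 20},
    {"$limit": 10},
    # Rename-only projection, deliberately last: only 10 docs remain.
    {"$project": {"_id": 0, "category": "$_id", "count": "$n"}},
]
for row in coll.aggregate(pipeline):
    print(row)
```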
[10:13:17] <saira_123> m3t4lukas bundle of thanks dear, for your reply. Actually I am doing research to confirm mongo is much faster compared to HDFS, extending the research in this article: datasys.cs.iit.edu/events/ScienceCloud2013/p02.pdf
[10:34:25] <m3t4lukas> saira_123: the thing is that mongodb and hdfs have two different use cases. It is not necessary to compare since they are good for very different things. Hadoop is a data cruncher, not a database. You can use Hadoop along with mongodb.
[10:35:48] <saira_123> yes m3t4lukas, bundle of thanks again. Maybe I can extend this research by using mongo 3.0 and showing mongo is much better at dealing with IO load than before
[10:37:11] <m3t4lukas> saira_123: please understand that there is no "better" between these two. They do different stuff. If you want to evaluate then do so between databases. Or between data crunchers, depends on what you need.
[10:37:28] <m3t4lukas> saira_123: do you need a data cruncher or a database?
[10:38:21] <saira_123> m3t4lukas :) I need to extend this research. I am a student doing a thesis, and I have some hands-on practice with mongodb
[10:39:57] <saira_123> m3t4lukas which is better in performance: mongodb aggregation, mongo mapreduce, or mongo with the hadoop connector?
[10:40:44] <m3t4lukas> saira_123: if you are doing a thesis and you have hands-on practice with mongodb, I recommend comparing mongodb to couchdb. If you compare mongodb to hadoop you might look like an idiot if you have a professor who does both, or who reads the wikipedia pages of both products. You really can't compare mongodb to hadoop
[10:41:19] <aps> Hi all. Mongo primary crashed and I restarted it. It's now stuck in ROLLBACK with "rollback 2 exception 10278 dbclient error communicating with server: example.com". But I can connect to example.com from this instance perfectly fine. [example.com is the new primary member] in the replica-set.
[10:41:19] <aps> I'm clueless. What could be wrong here? :/
[10:41:19] <m3t4lukas> saira_123: that totally depends on the data and whether you need it in real time
[10:42:07] <m3t4lukas> aps: did you ping example.com from the crashed node? Maybe an error in the current wiring of the ds
[10:42:18] <saira_123> m3t4lukas no, I don't need it in real time. Maybe then I can do research on mongodb: how to improve write performance by using a better chunk size, disabling journaling, or disabling replication
[10:42:36] <aps> m3t4lukas: I did. I can even open it in mongo shell and run commands
[10:43:20] <m3t4lukas> aps: did you reconfigure the repset on the crashed machine?
[10:43:43] <aps> m3t4lukas: no, I just restarted the mongod service
[10:43:48] <saira_123> aps can you show the rs.config and rs.status here?
[10:44:16] <m3t4lukas> saira_123: for non-real-time work, hadoop is better suited. You can read data from mongodb using hadoop, then process it and write the results back for later querying by clients
[10:44:44] <m3t4lukas> aps: please don't do it! I just want to know whether you did it
[10:45:09] <saira_123> m3t4lukas: u mean hadoop "MapReduce" is better?
[10:45:12] <m3t4lukas> aps: if so you might need to use some advanced repset config commands in order to enforce the fix
[10:45:18] <aps> m3t4lukas: yeah yeah. That message was for saira_123 :)
[10:45:47] <m3t4lukas> saira_123: in your case it is. Note, however, that hadoop can't do it on its own. It needs some kind of database
[10:46:24] <m3t4lukas> aps: no, that message was for you
[10:47:10] <aps> m3t4lukas: I got that. "gimme a min" was for saira_123
[10:47:24] <m3t4lukas> aps: if you reconfigured the cluster on the crashed server before putting it back you will have a problem (which is solvable, but it won't resolve itself)
[10:51:18] <m3t4lukas> aps: rs.status() says it did connect
[10:51:38] <m3t4lukas> aps: the crashed machine is just not yet finished with rolling back
[10:51:54] <aps> So, what I understand is that the primary crashed and it had some writes that weren't replicated yet. Hence, it needs to ROLLBACK those before starting replication of new writes.
[10:52:09] <aps> But it has been like that for a long time now
[10:52:12] <m3t4lukas> aps: and they are all on the same config version so nothing went wrong there
[10:52:24] <aps> and the logs tell me it's not getting better. Let me put logs here
[10:53:27] <saira_123> aps what version of mongo are u running?
[10:54:34] <aps> saira_123: 3.2 and 3.0.8; one member isn't updated yet. I'm dealing with this issue that I filed - https://jira.mongodb.org/browse/SERVER-22000
[10:54:46] <aps> mongo keeps crashing every few hours
[10:54:53] <m3t4lukas> aps: I just see the uptime of all the servers is a bit strange. Did you just restart them all in the middle of the syncing process?
[10:56:19] <saira_123> m3t4lukas optime is fine, it's lagging only for the problematic node
[10:56:54] <m3t4lukas> aps: they all have the same uptime. Except for the crashed one, it is one millisecond (or second, I don't know off the top of my head) ahead.
[10:57:21] <saira_123> aps mongod logs for ip-1 will tell whats wrong there
[11:02:48] <aps> m3t4lukas: rs.status() on current primary. notice uptime - https://www.irccloud.com/pastebin/JOFYyR28/
[11:03:03] <saira_123> aps either server ip-3 is overloaded or the network is bad between ip-1 and ip-3
[11:03:35] <saira_123> how about ping -s (continuous ping) from server 1 to 3, and compare it with server 2 to 3?
[11:03:39] <m3t4lukas> saira_123: there are ways you can do that, yes. For some of your points you might wanna dig into the source code. Maybe an internship at 10gen would be great for writing a thesis about mongodb. You also might either decide on a storage engine or compare the existing ones. Don't know if there will still be only two by the time you actually do it.
[11:04:27] <m3t4lukas> aps: yeah, saira_123 is right. Did you skimp on hardware? Or is there a lot of other stuff running on the node?
[11:04:39] <saira_123> m3t4lukas an internship at 10gen, I wish. I am not in the US, and they don't take interns online. I already applied but got no reply. Bundle of thanks for your help
[11:04:57] <saira_123> m3t4lukas oh thanks at least i am right about something :D
[11:06:07] <aps> m3t4lukas: no, this is a proper setup. They always replicate just fine with replication lag < 2 sec
[11:06:08] <saira_123> m3t4lukas i bet its network issue
[11:06:34] <aps> All this was working fine until I upgraded to 3.2 with WiredTiger :(
[11:06:41] <m3t4lukas> saira_123: any time. Yeah, you could temporarily move somewhere near 10gen for an internship. Maybe you can get a scholarship or some other kind of financial help, maybe from a company you will work for.
[11:07:35] <Derick> I think we only do summer interns in New York actually
[11:08:42] <m3t4lukas> Derick: maybe you know companies that take on thesis students under contract and pay for everything needed (flat, public transport, food, etc.) during the internship
[11:09:08] <Derick> it's been a while since I've been a student
[11:09:32] <m3t4lukas> saira_123: I'm sure if you write some applications you will find a company willing to do such a thing
[11:09:37] <Derick> although I do think we pay our interns reasonable wages
[11:10:53] <saira_123> Derick does 10gen allow interns to work online?
[11:11:13] <Derick> no, they're all in the NYC office
[11:11:51] <m3t4lukas> aps: with a lag of two secs syncing might take a lot of time. So please don't worry about mongo. rs.status() says it's healthy. You should, however, take a look at your infrastructure.
[11:12:30] <m3t4lukas> saira_123: even if they allow remote internships, they would not make a lot of sense.
[11:13:59] <m3t4lukas> saira_123: You should evaluate what Derick just wrote about the wages. Then do research on how you can use this money to successfully do an internship.
[11:14:14] <aps> m3t4lukas: okay, thanks. These are all r3.xlarge instances on AWS. So I didn't think network is the bottleneck.
[11:15:03] <m3t4lukas> aps: the ping says networking IS the bottleneck. You should contact your provider (aka Amazon)
[11:16:35] <aps> m3t4lukas: running "ping ip-3" in ip-1 gives me time=1.30 ms and no packet loss. Shouldn't that be ok?
[11:31:20] <m3t4lukas> aps: yeah, that should be okay
[11:32:00] <m3t4lukas> aps: you should monitor other system stuff like memory and CPU load
[11:33:23] <m3t4lukas> aps: I use OpenStack, so I don't know much about AWS. Do they guarantee you resource availability? I ask since VMs can be scaled at runtime and can be given mins and maxes
[11:36:19] <saira_123> aps use continuous ping with larger packet size
[11:40:53] <aps> saira_123: ping -s 65507 ip-3 gives time=1.90 ms and 0% packet loss
[11:54:01] <saira_123> i have to go find a job thanks all
[12:00:38] <m3t4lukas> aps: with that amount of data, did you consider professional support by 10gen?
[12:01:35] <m3t4lukas> aps: they offer quite interesting stuff like checking your setup
[12:04:33] <aps> m3t4lukas: No. I've thought of moving away from mongo though. Too many issues every day. This is just one of the clusters. All of these are set up following the docs and production notes carefully, and this wasn't done overnight.
[12:06:35] <m3t4lukas> aps: I actually never had problems with mongo
[12:20:01] <m3t4lukas> besides that, they still don't have a Dlang driver. As soon as I find some time (we know how that goes) I will create one.
[12:22:02] <cheeser> i'm debating starting a Swift driver since it's on the schedule to learn this year.
[12:26:27] <m3t4lukas> cheeser: never done anything using swift
[12:27:26] <pamp> Anyone here using Windows Azure Server for Mongodb?
[12:27:49] <pamp> How can I prevent the timeouts in Windows?
[13:13:41] <jessu> now, I am very new to mongo. How do I create something like a table in mongo?
[13:14:06] <m3t4lukas> jessu: but then you should know how to do that
[13:14:30] <m3t4lukas> jessu: that is exactly what is covered there
[13:15:14] <m3t4lukas> jessu: if you completed the training you should have very good practice on creating collections and inserting and updating stuff
[13:15:52] <jessu> I have never gone to any training
[13:16:12] <jessu> m3t4lukas: please kindly give me the exact lines for using an email and password
[13:16:55] <m3t4lukas> jessu: sry, I never worked with node.js. That's why I told you about training
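For what it's worth, here is the same idea sketched in Python/pymongo rather than node.js (all names are hypothetical): there is no CREATE TABLE step in mongo, a collection springs into existence on the first insert.

```python
from pymongo import MongoClient

# Hypothetical database and collection names.
users = MongoClient()["myapp"]["users"]

# The "users" collection is created implicitly by this insert.
users.insert_one({
    "email": "alice@example.com",
    # Never store plaintext passwords; store a salted hash instead.
    "password_hash": "<bcrypt-or-similar hash goes here>",
})
```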
[13:46:05] <vagelis> Hello, I would like to ask: if you have input/output, I mean you receive a request with data that you have to validate and then you want to save it to mongodb, but you want to change the types of some fields, do you design 2 schemas?
[13:46:51] <vagelis> The only thing that comes to my head is that I want to change the mongodb _id to a reference id
[13:51:59] <vagelis> I mean that I will receive mongo ids in string format; I want to validate them first and then save them as a reference to the document that the id refers to
[13:52:43] <m3t4lukas> vagelis: depends on whether you can afford to stop your clients and update at once or whether you will need to do a rolling release
[13:53:17] <vagelis> i dont know anything about that :S
[13:53:29] <m3t4lukas> vagelis: it also depends on how well you implemented your schema. Ideally your schema is a single library
[13:54:07] <m3t4lukas> vagelis: you have to know, it is your application. And if you don't know you will have to ask the admins first
[13:55:39] <vagelis> I don't get it. Again: I will receive some data, and one field will be an ObjectId in string format. I want to validate it and save it as a reference inside the document, I mean I only want to change its type.
[13:56:15] <vagelis> Btw i will use mongoengine just to validate stuff!
[13:57:50] <m3t4lukas> vagelis: the thing is that you have two choices: for one, you can change the type of the field in the clients and on the server by running a small script. That has to happen all at once and will cause downtime of your service. This method is very clean and should always be done if downtime is affordable. If downtime is not an option, you will need to create a second field with the same content and make your clients handle the current field and the legacy field at the same time. Once the clients are able to do that you can update the database using a script.
[13:59:05] <m3t4lukas> vagelis: you can repeat the second method in order to rename the new field back to the legacy field's name, if you make sure the clients can handle it afterwards.
[14:00:14] <m3t4lukas> vagelis: changing the type of a field is always a bad idea in general. That's why you should really carefully consider the types you use during schema design
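A minimal sketch of the script step in the rolling method above, assuming hypothetical field names ("owner" as the legacy string field, "owner_ref" as the new typed field):

```python
from bson import ObjectId
from pymongo import MongoClient

coll = MongoClient()["app"]["docs"]  # hypothetical collection

# Backfill the new field from the legacy one; clients must already be
# reading both fields before this runs. BSON $type 2 means "string".
for doc in coll.find({"owner": {"$type": 2},
                      "owner_ref": {"$exists": False}}):
    coll.update_one({"_id": doc["_id"]},
                    {"$set": {"owner_ref": ObjectId(doc["owner"])}})
```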
[14:01:09] <vagelis> Ooooh, now I understand what you thought, you are talking about downtime :O OK, sorry I didn't make myself clear. I was talking about the process before saving to mongodb. I mean at the application level.
[14:02:57] <vagelis> Anyway, thanks for your help, but I realized I shouldn't have asked in the first place because: I will receive the data, validate it with mongoengine, and then save it to the db using pure pymongo, so I can just change the ids that I want to REFs and problem solved!
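What vagelis describes, sketched with a hypothetical "parent" field: convert the incoming string id at the application level, then insert the document with a real ObjectId via pymongo:

```python
from bson import ObjectId
from bson.errors import InvalidId
from pymongo import MongoClient

things = MongoClient()["app"]["things"]  # hypothetical collection

def save(payload):
    # Turn the validated string id into a real ObjectId reference
    # before it ever reaches the database.
    try:
        payload["parent"] = ObjectId(payload["parent"])
    except (InvalidId, TypeError, KeyError) as exc:
        raise ValueError("bad reference id") from exc
    things.insert_one(payload)
```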
[14:50:26] <vagelis> May I ask a mongoengine related question?
[14:57:02] <m3t4lukas> vagelis: you don't need to ask whether you may ask :P
[14:57:31] <vagelis> Well im not sure if people use mongoengine as i just started using it so :S
[14:58:17] <m3t4lukas> I don't use it, but I'm sure someone will know the answer
[14:58:37] <m3t4lukas> maybe it turns out to be a mongodb related question
[14:58:38] <vagelis> Ok so I have a list of dicts. These dicts might be type A or B. How am i supposed to define this field? Like: ListField(choices=(A, B)) ? Im just throwing out there what im thinking :S
[14:59:25] <vagelis> I think this is more appropriate: ListField(EmbeddedDocument, choices=(A, B))
[14:59:38] <vagelis> i really dont know i started using mongoengine yesterday.
[15:00:12] <vagelis> I dont even think that i can use choices like that but anyway i wanted to make u understand my question.
[15:00:14] <m3t4lukas> vagelis: as I said, I don't use mongoengine, since I don't use python. But what you are looking for is called polymorphism. Maybe that helps when you use google
[15:00:44] <vagelis> Ah ok, I have their documentation open and I saw that word somewhere, thanks.
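The polymorphic shape vagelis is after looks roughly like this in mongoengine; since nobody in the channel uses the library, treat the choices= restriction in particular as an assumption to verify against the mongoengine docs:

```python
from mongoengine import (Document, EmbeddedDocument,
                         GenericEmbeddedDocumentField, IntField,
                         ListField, StringField, connect)

connect("app")  # hypothetical database

class A(EmbeddedDocument):
    name = StringField()

class B(EmbeddedDocument):
    count = IntField()

class Container(Document):
    # A heterogeneous list: mongoengine stores a _cls marker per item
    # so it can rebuild the right class on load. choices= (assumed
    # here) would restrict items to A or B.
    items = ListField(GenericEmbeddedDocumentField(choices=[A, B]))

Container(items=[A(name="x"), B(count=3)]).save()
```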
[15:07:03] <fish_> so I just enabled internal authentication (and therefore general access control) pretty much as described here: https://docs.mongodb.org/manual/tutorial/enable-internal-authentication/#access-control
[15:07:15] <fish_> now I can login with the users I created just fine - but only from localhost
[15:08:10] <fish_> from any remote system I get "Error: Authentication failed."
[15:08:59] <fish_> it's a sharded cluster and I always connected to mongos (both for creating the initial admin user as well as a less privileged user)
[15:09:14] <fish_> it fails for both the admin and the less privileged user :/
[15:17:25] <m3t4lukas> fish_: this may be a bug. At least I don't know of a feature that lets one specify certain hosts a user can log in from.
[15:17:43] <fish_> m3t4lukas: I've created an admin user which should disable the localhostauthbypass if I understand it correctly
[15:19:05] <m3t4lukas> fish_: it means that option is no longer used. But maybe there is a bug and this parameter affects the behavior of db.auth()
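One mundane thing worth ruling out before suspecting a bug (an assumption, not something fish_ confirmed): a remote client must authenticate against the database the user was created in, so the connection string needs an explicit authSource. A pymongo sketch with hypothetical host and credentials:

```python
from pymongo import MongoClient

# authSource must name the database the user was actually created in
# (usually "admin" for cluster-wide users).
client = MongoClient(
    "mongodb://clusterAdmin:secret@mongos.example.com:27017/"
    "?authSource=admin")
print(client.admin.command("ping"))
```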
[20:22:23] <cheeser> java is one of our most popular languages actually.
[20:25:14] <m3t4lukas> cheeser: I store conversations in mongodb and I have a "state" field with an array of state objects. Each account participating in the chat has a state object. Every time a user reads the chat I want to be able to update the field "read" in the state object in the array where "account" equals the account I want to set the state for. If no state object for that account exists in that array, I want to upsert it and set the fields for the account the state object is associated with and the "read" state field. I also want the "read" field to be created and initialized with that value if the field does not exist in an existing state object for that account.
[20:45:49] <m3t4lukas> now with that document I want to be able to constantly upsert "state.read" for a specific "state.participant"
[20:47:56] <m3t4lukas> cheeser: I know that I could do it "by hand". But a _real_ upsert would be much nicer.
[21:13:53] <msx> hi everybody, a newcomer to Mongo here. I just hit this issue: https://jira.mongodb.org/browse/TOOLS-679 when trying to upgrade a MongoDB 2 mmapv1 deployment to Percona MongoDB 3 with WiredTiger. Does anybody know if there's any documented procedure on how to get this working? Or what resource should I check to start learning how to proceed?
[21:22:05] <cheeser> m3t4lukas: this might help https://docs.mongodb.org/manual/reference/operator/update/positional/
[21:36:28] <m3t4lukas> cheeser: yeah, I know of that. Does that do an upsert?
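The positional operator alone will not insert a missing array element, which is why the "by hand" pattern usually looks like this two-step sketch (collection and field names taken loosely from the description above):

```python
from datetime import datetime, timezone
from pymongo import MongoClient

convs = MongoClient()["chat"]["conversations"]  # hypothetical names

def mark_read(conv_id, participant):
    now = datetime.now(timezone.utc)
    # Step 1: positional update; matches only if a state object for
    # this participant already exists in the array.
    res = convs.update_one(
        {"_id": conv_id, "state.participant": participant},
        {"$set": {"state.$.read": now}})
    if res.matched_count == 0:
        # Step 2: no such element yet, so push a fresh state object.
        # The $ne guard prevents a duplicate if step 1 raced with us.
        convs.update_one(
            {"_id": conv_id, "state.participant": {"$ne": participant}},
            {"$push": {"state": {"participant": participant,
                                 "read": now}}})
```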
[22:54:20] <tsturzl> I'm having a really strange problem that I can't seem to correlate to any changes
[22:54:38] <tsturzl> My primary went down today, which sometimes happens from network issues with my provider
[22:54:52] <tsturzl> however the process actually crashed due to a seg fault
[22:54:59] <tsturzl> something I've never seen before
[22:56:11] <tsturzl> It dumped a "BACK TRACE". Is it possible something I'm doing is causing this, or is this a bug that I should report?