PMXBOT Log file Viewer


#mongodb logs for Tuesday the 12th of April, 2016

[03:08:00] <nictuku> hi. I have a single server with a database that I want to move to a server across the atlantic (hetzner.de == cheaper). I'm OK with some downtime, but I wouldn't want to wait for a mongodump transfer. What would you recommend? convert my servers to a replica set, then move data that way?
[03:44:17] <Boomtime> @nictuku: "I wouldn't want to wait for a mongodump transfer." -> you have to transfer the data somehow, sooner or later those bits have to traverse the atlantic - a mongodump could at least be compressed for transmission, with scp or similar
[04:14:08] <nictuku> Boomtime: but wouldn't a replication copy the data incrementally?
[06:40:15] <lyze> Hello, I'm encountering a problem with morphia, is this the right place to ask for help?
[06:48:57] <Boomtime> @nictuku: what do you mean 'incrementally'? no matter what you do the data has got to be transferred, i'm not sure where you think the efficiency is being improved - the only way to get the data to be sent faster, is to send _less_ - unless you delete half your data, the only other option is to compress it, ala scp etc
[06:55:59] <lyze> And if it is, I've tried creating a simple test class as an entity ( http://pastebin.com/i70SwS2p ) which I try to save into the database. However I get an "IndexOutOfBoundsException" on the "save" method: http://pastebin.com/r65xKGs3
[06:56:56] <lyze> ( Sorry, line 21-22 should be: datastore.save(new Test("blurgh")); )
[08:21:58] <mroman> If I have multiple mongos instances running, do I have to do the sh.addShard() on each of those?
[08:22:41] <mroman> or do they somehow discover each other/replicate data?
[08:30:17] <mroman> hm looks like they communicate through the config servers somehow.
[08:51:26] <mroman> also.. you can't do show dbs but you can create a user?
[08:52:20] <mroman> nvm.
[09:21:36] <mroman> but more importantly: is there a way to convert an existing standalone mongodb server to shards?
[09:22:52] <kurushiyama> mroman: Sure.
[09:23:50] <Keksike> Hey, I need some help updating my mongo on Ubuntu. I have 2.6.12 installed now. I run sudo apt update, then sudo apt upgrade mongodb-org, but it says the newest version is already installed
[09:23:57] <kurushiyama> mroman: Set up a config server replset. Fire up a mongos. Add the standalone via sh.addShard()
[09:24:14] <kurushiyama> Keksike: So what is your problem?
[09:27:29] <kurushiyama> mroman: Though it is a Very Bad Idea (tm) to have a shard backed by a standalone instead of a replset.
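A minimal sketch of the conversion kurushiyama outlines, run from a mongo shell connected to the mongos; the host names and the replica set name are hypothetical placeholders:

    // From a shell connected to the mongos (e.g. mongo --host mongos.example.com):
    // add the existing standalone as a shard...
    sh.addShard("standalone.example.com:27017")

    // ...or, better (see the warning above), convert it to a one-member
    // replica set first and add it by set name:
    sh.addShard("rs0/standalone.example.com:27017")

    // Check the cluster layout:
    sh.status()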
[09:29:01] <gta99_> Hi all, I'm currently trying to install Mongo 3.3 on Debian Jessie, but an apt-get update reports "W: GPG error: http://repo.mongodb.org jessie/mongodb-org/3.3 Release: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY BC711F9BA15703C6" any idea where to get the correct public key?
[09:30:09] <gta99_> I'm using saltstack to install the repository btw, if that helps :)
[09:30:18] <Keksike> kurushiyama: the problem is that although I have only 2.6.12 installed, it is saying that I have the newest version
[09:30:26] <Keksike> although the newest version is 3.2 or whatever
[09:31:55] <kurushiyama> Keksike: From the pov of your operating system that is correct
[09:32:25] <kurushiyama> Keksike: You need to add the repo for the new version.
[09:32:48] <Keksike> kurushiyama: I tried apt update before apt upgrade. Do I need to do something else to add the repo for the newer version?
[09:33:11] <kurushiyama> Keksike: It is NOT a command. You need to add another software repository...
[09:33:33] <Keksike> kurushiyama: oh, ok...
[09:34:18] <kurushiyama> Keksike: https://docs.mongodb.org/manual/tutorial/install-mongodb-on-ubuntu/#install-mongodb-community-edition is the result of googling "mongodb install ubuntu" ... ;P
[09:34:46] <Keksike> I only tried googling mongodb upgrade version on ubuntu
[09:35:00] <Keksike> im not too familiar with how these things work, so thanks for the help :)
[09:35:59] <kurushiyama> gta99_: Huh? Thought the dev release was only available for OSX...
[09:36:23] <kurushiyama> gta99_: Apparently, I was mistaken
[09:37:10] <gta99_> @kurushiyama: http://repo.mongodb.org/apt/debian/dists/jessie/mongodb-org/3.3/
[09:37:13] <Keksike> kurushiyama: do I need to somehow uninstall the previous version of my mongo, im on part 4 now of that walkthrough
[09:37:14] <gta99_> :)
[09:38:11] <kurushiyama> Keksike: no, apt should take care of that. Just make sure mongod is stopped.
[09:38:19] <Keksike> ok
[09:39:38] <kurushiyama> gta99_: I assume you have run "sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv EA312927"
[09:40:06] <kurushiyama> gta99_: Don't worry about the Ubuntu keyserver, it is just for the signing key.
[09:41:21] <gta99_> that's what I've tried, but it doesn't work
[09:41:27] <gta99_> again, it's a saltstack configuration
[09:41:33] <kurushiyama> gta99_: use --allow-unauthenticated, then
[09:41:49] <kurushiyama> gta99_: And maybe you want to file a doc ticket.
[09:42:16] <gta99_> mongodb-repo:
[09:42:16] <gta99_> pkgrepo.managed:
[09:42:17] <gta99_> - humanname: MongoDB.org Repo
[09:42:18] <gta99_> - name: deb http://repo.mongodb.org/apt/{{ salt['grains.get']('os') | lower() }} {{ salt['grains.get']('oscodename') }}/mongodb-org/3.3 main
[09:42:20] <gta99_> - file: /etc/apt/sources.list.d/mongodb-org.list
[09:42:22] <gta99_> - keyid: EA312927
[09:42:24] <gta99_> - keyserver: keyserver.ubuntu.com
[09:42:26] <gta99_> - refresh_db: False
[09:42:28] <gta99_> - watch_in:
[09:42:30] <gta99_> - cmd: apt-get-update
[09:42:32] <gta99_> that's what I'm doing
[09:43:34] <gta99_> ok, I'll raise a ticket
[09:43:40] <kurushiyama> gta99_: I have no clue about saltstack, AND PLEASE USE PASTEBIN
[09:43:47] <gta99_> sry, my bad
[09:45:23] <kurushiyama> gta99_: Another question is why to install a dev release with some sort of recipe ;)
[09:45:51] <gta99_> well, I want to install a version of mongodb >= 3.0
[09:45:58] <gta99_> and it should work on jessie
[09:46:10] <gta99_> it seems like 3.3 is the only one available
[09:46:39] <gta99_> I've tried 3.2 wheezy mongodb but it fails with a dpkg configuration error for mongodb-org-server
[09:47:05] <gta99_> and I'm assuming this is due to me using a wheezy package on jessie
[09:49:45] <Keksike> kurushiyama: ok, now I installed the 3.2.4 according to the rules you linked. But my mongodb-org-mongos, -server, -shell and -tools are still 2.6.12. How do I update them also?
[09:50:02] <kurushiyama> Keksike: Absolutely
[09:50:04] <Keksike> I thought they would act like dependencies and that updating the mongodb-org would update them too
[09:50:36] <Keksike> are there any rules for updating them?
[09:50:54] <Derick> they don't really depend on each other....
[09:51:00] <Derick> you can manually apt-get them
[09:51:11] <kurushiyama> Keksike: Do it manually, and then remove the old repo. just do an apt-get install mongodb-org-server and so on.
[09:51:29] <Keksike> ok
[09:51:40] <kurushiyama> Derick: Thought mongodb-org was a meta package exactly for that purpose o.O
[09:51:55] <gta99_> maybe try apt-get remove --purge mongo* && apt-get update && apt-get install mongodb-org
[09:52:46] <kurushiyama> gta99_: Uhm, I am not super-fit with Debian/Ubuntu, but wouldn't the purge option also purge the data?
[09:53:59] <Derick> kurushiyama: oh, that one *should* depend on the others. File a server bug ?
[09:54:03] <gta99_> true! if it's important data, always make a backup before upgrading!
[09:55:17] <kurushiyama> Derick: Me? For deb/ubuntu? Hell will freeze – and I want to see that!
[09:55:41] <kurushiyama> Derick: ;)
[09:57:18] <kurushiyama> Derick: Will first verify, though.
[09:57:51] <Keksike> now im getting this https://gist.github.com/Keksike/2555afb9a117d2461a8c7f59f4457612
[09:57:55] <kurushiyama> Keksike: What Ubuntu version do you exactly use?
[09:58:03] <Keksike> 14.04
[09:58:15] <kurushiyama> Keksike: line 13.
[09:58:24] <Keksike> what happened to my data/db
[09:58:31] <Keksike> o.O
[09:58:39] <kurushiyama> Keksike: No
[09:58:46] <kurushiyama> Keksike: Not data/db
[09:58:57] <kurushiyama> Keksike: /data/db – which is a file path
[09:59:08] <Keksike> yeah but umm
[09:59:14] <Keksike> if I only upgraded my mongo, it should already exist?
[09:59:25] <kurushiyama> Keksike: Not necessarily.
[09:59:52] <kurushiyama> Keksike: You might want to change the config file and point dbpath to /var/lib/mongodb
[10:00:27] <Keksike> mongod --dbpath /var/lib/mongodb
[10:00:27] <Keksike> ?
[10:00:38] <kurushiyama> Keksike: Does that look like a config file?
[10:00:46] <Keksike> nope
[10:00:59] <Keksike> but I thought you could do it that way too
[10:01:26] <kurushiyama> Keksike: Sure. Till the next reboot. Or restart of mongod.
[10:01:31] <Keksike> ah ok
[10:01:49] <Keksike> why didnt upgrading the mongo keep my old config file?
[10:02:04] <kurushiyama> Keksike: did you use the --purge option?
[10:02:11] <Keksike> nope
[10:02:52] <kurushiyama> Keksike: Well, good question then. TL;DR A bug in the package assembly manifest, most likely.
[10:04:34] <kurushiyama> Keksike: The longer story: Files need to be marked as configuration files during package creation. There are several ways to do that – and the side effects sometimes are complicated.
[10:05:12] <kurushiyama> Derick: off the top of your head, were there major changes in packaging done lately?
[10:05:13] <Keksike> ok
[10:05:33] <Keksike> hmm
[10:05:40] <Keksike> it seems that I already have a config file in /etc/
[10:05:52] <Keksike> and it has dbpath=/var/lib/mongodb
[10:06:00] <Keksike> I think my mongod just isnt using the config file
[10:07:35] <Keksike> cihanbebek@ci7:~$ sudo mongod --config /etc/mongodb.conf just doesnt do anything
[10:08:21] <kurushiyama> Keksike: ps ax | grep mongod
[10:08:33] <kurushiyama> Keksike: no news tends to be good news
[10:09:09] <kurushiyama> Keksike: And you _really_ should not start mongodb via sudo
[10:09:21] <kurushiyama> Keksike: PLEASE get your basic linux skills right.
[10:09:43] <kurushiyama> Keksike: Most likely, you have a permission problem now, since new files most likely were created as root
[10:09:59] <kurushiyama> Keksike: For various reasons, it is a _very_ bad idea to run a service as root.
[10:10:41] <kurushiyama> Keksike: That https://xkcd.com/149/ is not the way you should think of sudo
[10:10:47] <kurushiyama> ;)
[10:11:12] <Keksike> heh
[10:11:13] <Keksike> ok
[10:14:58] <kurushiyama> Keksike: You might want to stop mongod by doing a "sudo killall -TERM mongod" now, then, you want to run "sudo chown mongodb.mongodb -R /var/lib/mongodb" and then you want to start mongodb via the usual means
[10:18:34] <Keksike> yeah I found out the problem
[10:18:58] <Keksike> I was trying to run it with mongod but I should have been running it with sudo service mongod start
[10:19:54] <kurushiyama> Keksike: Yep. But you need to go through the procedure I gave you above to fix potential permission problems.
[10:25:14] <Keksike> ah okay
[10:25:17] <Keksike> thanks a lot man
[10:25:19] <Keksike> for your help
[10:26:10] <kurushiyama> Keksike: You are welcome.
[10:47:06] <SimpleName> If I configure a product with several images, should I design an images model with MongoDB, or use MySQL like this: img1, img2, ..., img10? You can upload at most 10 images to one product
[10:49:01] <mroman> How does auth work in sharded clusters with replica sets?
[10:49:29] <mroman> I connect to a mongos instance and create the first admin user.
[10:50:00] <mroman> is that user replicated to all shards/replica sets?
[10:50:11] <Derick> I *believe* credentials are always stored on the primary shard.
[10:50:24] <mroman> or what happens if someone connects directly to a shard
[10:51:00] <mroman> I may be dealing with an architecture where not all servers are really under my control.
[10:51:24] <Derick> "The admin database in a sharded environment will actually live on the config servers."
[10:51:27] <mroman> some companies want to form a common database.
[10:51:34] <Derick> http://serverfault.com/questions/570032/mongo-sharding-config-server-and-mongo-authentication
[10:54:05] <Derick> also: https://docs.mongodb.org/v3.0/tutorial/enable-internal-authentication/
[10:54:58] <kurushiyama> SimpleName: That is entirely your decision, isn't it?
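For what it's worth, a minimal sketch of the embedded approach SimpleName describes, with an images array inside the product document instead of fixed img1..img10 columns; the collection and field names are hypothetical:

    // One product document carrying its images as an array of subdocuments:
    db.products.insertOne({
        name: "Example product",
        images: [
            { url: "http://example.com/img1.jpg", order: 1 },
            { url: "http://example.com/img2.jpg", order: 2 }
        ]
    })

    // The "at most 10 images" rule can be enforced on update with $push + $slice,
    // which keeps only the first 10 array elements:
    db.products.updateOne(
        { name: "Example product" },
        { $push: { images: { $each: [ { url: "http://example.com/img3.jpg", order: 3 } ], $slice: 10 } } }
    )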
[10:55:34] <kurushiyama> @Derick You are correct with the primary shard thing in case the user is stored in the DB.
[10:55:53] <Derick> my sharding knowledge is a bit shaky
[10:57:25] <kurushiyama> mroman: Authorization with MongoDB's internal means (and that is the problem here) is not granular enough to guarantee separation of data
[10:57:47] <kurushiyama> mroman: Depending on the data and your jurisdiction, it might even be illegal to do so.
[10:59:04] <kurushiyama> @Derick Personally, in a sharded env, I tend to store the users in the admin database, though
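A minimal sketch of that approach, assuming the shell is connected to a mongos rather than to an individual shard; the user name, password and roles are placeholders. In a sharded cluster the admin database lives on the config servers (see the serverfault link above), so a user created this way is visible through every mongos, while a client connecting directly to a shard bypasses it unless internal authentication (keyFile) is also configured:

    // Run against a mongos, not a shard member:
    use admin
    db.createUser({
        user: "clusterAdmin",
        pwd: "changeme",
        roles: [
            { role: "userAdminAnyDatabase", db: "admin" },
            { role: "clusterAdmin", db: "admin" }
        ]
    })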
[10:59:56] <kurushiyama> mroman: I know for sure it would be in Germany, and most likely it would be in the other EU states, as well.
[11:00:22] <mroman> how so?
[11:04:09] <kurushiyama> mroman: There are limits defined in what you can do in the BDSG (Bundesdatenschutzgesetz – the German data privacy act). One of them is that you have to use all means necessary to make sure the data is safe. Putting the data sets of multiple companies into the same database is not exactly that. Furthermore, in case the data is compromised and multiple companies have potential access, it is very hard to find out who is responsible, making it likely that the provider of said database will be held liable. Each breach of the BDSG might be fined with €50k - €300k, or more, at the judge's discretion.
[11:05:07] <kurushiyama> mroman: And iirc, that is per incident. You better have your check book ready.
[11:06:28] <mroman> yeah but how would that be different from collecting all the data in one company?
[11:06:50] <mroman> those companies provide data from weather sensors, stuff like that
[11:07:04] <mroman> which they want to share
[11:07:07] <kurushiyama> mroman: Aside from that, when multiple companies share the same database, this is usually done for some integrative task. The industry learned the hard way that databases are a poor tool for integration, even within one company.
[11:07:25] <kurushiyama> mroman: With multiple companies, it is even more so.
[11:07:44] <mroman> so one of the ideas would be that each company hosts a shard for that data
[11:08:06] <kurushiyama> mroman: It does not sound better the more you explain what you want to do.
[11:08:09] <kurushiyama> mroman: ;)
[11:08:47] <mroman> well you could just rent some cloud space and each company uploads stuff to the same cloud space
[11:08:52] <kurushiyama> mroman: If it is non-personal data, set up a joint data center.
[11:09:25] <kurushiyama> mroman: And set up an API gateway to CRUD said data. Decouple as much as you can
[11:10:31] <kurushiyama> mroman: I even have a name for said joint data center: "WSOC - Weather StOrage Center" ;)
[11:11:50] <mroman> well we want to do data analysis on the whole thing :) @integrative task
[11:11:56] <kurushiyama> mroman: Nope
[11:12:05] <kurushiyama> mroman: That is no integration
[11:12:19] <mroman> anyway that's what we want to do with mongo.
[11:12:21] <kurushiyama> mroman: At best, that is aggregation. Guess why the framework is called that way.
[11:12:40] <kurushiyama> mroman: I get the picture. How many sensors?
[11:13:23] <mroman> probably a few dozen
[11:14:43] <kurushiyama> mroman: Uhm, we are not talking of creating random data, are we?
[11:15:36] <mroman> No. Random data could be generated way more cost efficiently
[11:16:21] <kurushiyama> mroman: Sorta. Iirc, some poker networks use IVs of pseudo-randomly selected sensors in a barometer network...
[11:17:37] <kurushiyama> mroman: Well, my suggestion still stands. Create WSOC and an API to CRUD the data.
[11:20:19] <mroman> Also I need to run mapreduce jobs of course.
[11:20:35] <mroman> for which I'll probably need an API anyway, yes.
[11:21:05] <kurushiyama> mroman: On demand or timed?
[11:22:05] <kurushiyama> mroman: Hint: more often than not, aggregations are better suited and faster ;)
[11:23:32] <mroman> timed.
[11:23:49] <mroman> or on demand... that wouldn't really matter much.
[11:23:57] <mroman> if by timed you mean: always at 12:30
[11:24:22] <kurushiyama> mroman: Yes, that was what I was referring to.
[11:25:43] <kurushiyama> mroman: Or like "every 5 mins"
[11:25:51] <mroman> No :)
[11:25:57] <mroman> it's not that real-time
[11:30:27] <kurushiyama> mroman: Well, sounds reasonable. I was a bit concerned about the throughput, but unless each sensor sends _a lot_ of data/s, even a USB thumb drive should be fast enough. I am not sure whether sharding is necessary from the beginning, then, since adding shards would be to keep the sweet spot for data storage costs. However, you can start out with a single shard, to be sure everything works as expected should the need arise to scale out.
[11:32:23] <mroman> sharding would be more to distribute mapreduce jobs
[11:33:12] <mroman> so you can distribute the 0.25TB data to shards and then let mapreduce jobs run
[11:33:40] <kurushiyama> mroman: Well, distributing the data comes at a cost, too.
[11:35:08] <kurushiyama> mroman: You should be careful there. I would first identify whether an m/r is actually needed. As a rule of thumb: If you can do it with the aggregation framework, it is usually the better idea.
[11:36:55] <kurushiyama> mroman: To cut a long story short: Create a test env and try to find the best way.
[11:37:33] <mroman> that's what I'm currently doing :D
[11:38:01] <kurushiyama> mroman: Have a deep look into the aggregation framework, then. A very deep look.
[11:39:42] <mroman> It has a limit of 10% RAM consumption? o_O
[11:40:16] <mroman> that's probably not good if we end up aggregating a few million entries
[11:41:43] <kurushiyama> mroman: You are assuming that you load all entries into RAM?
[11:41:50] <kurushiyama> mroman: Think again.
[11:41:56] <kurushiyama> mroman: ;)
[11:42:44] <kurushiyama> mroman: Aside from an early $match: you can actually use disk space, as m/r does by default. But the aggregation framework offers _really_ nifty functions.
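A minimal sketch of both points, an early $match to shrink the working set and the allowDiskUse option for stages that would otherwise hit the memory limit; the collection and field names are hypothetical:

    db.readings.aggregate(
        [
            // Filter as early as possible, ideally on an indexed field:
            { $match: { sensorId: "s-42", ts: { $gte: ISODate("2016-04-01") } } },
            { $group: { _id: "$sensorId", avgTemp: { $avg: "$temp" } } }
        ],
        // Let memory-hungry stages ($group, $sort) spill to disk instead of failing:
        { allowDiskUse: true }
    )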
[11:44:10] <Mumla> hey there!
[11:44:39] <kurushiyama> Mumla: Hoi!
[11:45:08] <Mumla> I got a crazy problem which is summarized here: http://stackoverflow.com/questions/36499924/cant-start-mongodb-as-a-service (I hope you don't hate me for posting such a link here...). I'm quite desperate here at my work :(
[11:46:05] <Mumla> can you help me with this problem? that would be aaaaawsome!
[11:48:53] <kurushiyama> Mumla: And you should be. With all the trial and error, your install most likely is pretty screwed.
[11:49:31] <Mumla> I can do a reinstall when this would be a part of a soultion
[11:49:35] <Mumla> *solution
[11:49:43] <Mumla> i even did this one time ;D
[11:50:59] <kurushiyama> Mumla: On several conditions. a) Delete the question on SO. It belongs to dba.stackexchange.com b) Reask it there c) Add OS, its version and the mongodb version d) Success not guaranteed.
[11:52:54] <Mumla> @kurushiyama could be a start, yes
[11:53:33] <kurushiyama> Oh, and e) Take notes and answer your own question when we fixed it ;)
[11:53:43] <kurushiyama> Mumla: The latter is optional.
[11:53:59] <Mumla> ;)
[11:54:54] <Mumla> when we - so you mean you come back to my dba-question ;)? i'd also like to stay in touch with this Robert Udah...
[11:55:37] <kurushiyama> We will do it here. I need to ask questions. Inform him that you are re-asking on dba. He will get the message anyway.
[11:57:03] <Mumla> uhm, okay, after our talk here i'll do this :)
[11:57:15] <Mumla> so, what would you like to know :)?
[11:57:19] <Mumla> reinstall first?
[11:57:24] <kurushiyama> No
[11:57:34] <kurushiyama> Mumla: OS, version and MongoDB version?
[11:59:01] <Mumla> (how can I mention you directly here?)
[11:59:12] <Mumla> Ubuntu 14.04.4 LTS
[11:59:31] <kurushiyama> Mumla: Depends on your irc client. Kuru and <tab> should do it.
[11:59:51] <kurushiyama> Mumla: Ergs. Ok. MongoDB version? (just to be complete)
[12:00:16] <Mumla> I was just looking for it, here it is: "CONTROL [initandlisten] db version v3.2.4"
[12:00:40] <kurushiyama> Mumla: Ok.
[12:00:42] <Mumla> (webclient. I know, it's a shame. was looking for a fast login...)
[12:01:03] <kurushiyama> Mumla: We can talk about that later. Wait a sec
[12:01:21] <kurushiyama> You should have a PM
[12:01:21] <Mumla> ok, no hurry
[12:21:52] <mroman> kurushiyama: are lots of small entries a problem for mongo btw?
[12:22:28] <kurushiyama> mroman: Nope, never was. There was some padding with MMAP
[12:22:37] <kurushiyama> mroman: But not really severe.
[12:22:49] <kurushiyama> mroman: with wT, it is even less of a prob
[12:28:44] <mroman> has anybody run into the "oh crap... I need more than 16MB" issue?
[12:29:12] <kurushiyama> mroman: http://blog.mahlberg.io/blog/2015/11/05/data-modelling-for-mongodb/
[12:32:22] <mroman> that article is a little bit short on actual information :).
[12:33:11] <mroman> embedding might be useful for m/r though
[12:33:26] <mroman> if you split your data to avoid embedding you can't just map over it anymore?
[12:33:39] <kurushiyama> mroman: Is it? ;) And no, embedding actually does more harm than good for m/r, generally speaking.
[12:35:51] <kurushiyama> mroman: Uhm. Aside from the fact that map/reduce has little to do with actual maps of the source data, the 16MB size limit is hardcoded. So regardless, you need to adapt your models.
[12:36:28] <mroman> yeah but assume I have a data set with a few rows.
[12:36:50] <kurushiyama> mroman: Read the #Conclusion of that blog post ;)
[12:36:53] <mroman> I can store them as {"dataset":[{row1},{row2},{row3}]} something like that
[12:37:05] <mroman> or I can store each row as a document
[12:37:06] <kurushiyama> mroman: Sure. until you hit the size limit.
[12:37:29] <mroman> now let's say I need to do some analysis on all rows together
[12:37:53] <kurushiyama> mroman: Uhm, that is why you use m/r and/or aggregations...
[12:38:02] <mroman> well yeah
[12:38:14] <mroman> but I can either have the document as a whole as an input to map
[12:38:21] <kurushiyama> mroman: Regardless of what you want and how you put it: the 16MB size limit is given. Model your data accordingly ;)
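A minimal sketch of the non-embedded alternative, with one document per row referencing its data set, so no single document grows towards the 16MB cap; the collection and field names are hypothetical:

    db.dataset_rows.insertMany([
        { dataset: "run-2016-04-12", row: 1, value: 20.5 },
        { dataset: "run-2016-04-12", row: 2, value: 21.0 }
    ])

    // All rows of one data set, in order:
    db.dataset_rows.find({ dataset: "run-2016-04-12" }).sort({ row: 1 })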
[12:38:43] <mroman> If I need to know what's in row2 to analyze row1
[12:38:52] <mroman> it'd be convenient if I could map over them as a whole entity
[12:40:57] <kurushiyama> mroman: Dependent analysis? In m/r jobs? Good luck.
[12:43:28] <mroman> well all the guys are screaming for distributed and m/r stuff...
[12:43:55] <mroman> but I think that a lot of what we want to do might not be well suited for it actually
[12:43:59] <kurushiyama> mroman: Well, it _really_ depends on your use cases.
[12:44:35] <mroman> we don't really know. We just have a shitload of data and we're looking for stuff in it.
[12:44:39] <kurushiyama> Storing and analyzing time series data? MongoDB was developed for storing and analyzing clickstreams. Go figure ;)
[14:03:06] <mroman> Can you aggregate to a temporary collection and then use find or stuff like that?
[14:03:16] <kurushiyama> mroman: Sure
[14:03:45] <mroman> I tried db.foo.aggregate().find :D
[14:03:48] <mroman> but that obviously is wrong
[14:04:05] <Derick> db.foo.aggregate()
[14:04:06] <Derick> no find
[14:04:21] <Derick> you can use $out into a collection, and then use find() on that one
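A minimal sketch of the $out approach Derick describes; the collection names are hypothetical:

    // $out must be the last stage; it (re)creates the target collection
    // with the aggregation result:
    db.readings.aggregate([
        { $match: { sensorId: "s-42" } },
        { $group: { _id: "$day", avgTemp: { $avg: "$temp" } } },
        { $out: "daily_averages" }
    ])

    // ...which can then be queried like any other collection:
    db.daily_averages.find({ avgTemp: { $gt: 20 } })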
[14:04:51] <mroman> yeah but then I have persistent temporary collections hanging around :(
[14:04:54] <kurushiyama> mroman: If you need to find stuff in your aggregation, use an additional $match
[14:05:25] <mroman> hm $sample should do
[14:05:26] <crodjer> In MongoDB `db.serverStatus().opcounters` does query mean `find` and `findOne`?
[14:10:57] <mroman> I'm too SQL damaged :(
[14:11:32] <kurushiyama> mroman: No worries, sooner or later you'll be cured ;)
[14:11:39] <mroman> ok so there's this {"rows":[{"row":1},{"row":2},...,{"row":10}]}
[14:12:16] <kurushiyama> mroman: If this is going to be lengthy, better use pastebin or sth.
[14:12:21] <mroman> ok
[14:13:43] <mroman> http://codepad.org/lxUNtkrW
[14:14:07] <mroman> I basically want to pick the document with the highest "row" number
[14:15:52] <mroman> (i.e. SELECT * FROM ... WHERE row = SELECT MAX(row) FROM ...)
[14:31:45] <mroman> how would I even just select all the hop values?
[14:35:45] <kurushiyama> Sorry, was away for a sec
[14:35:55] <mroman> eh
[14:35:56] <mroman> row values
[14:37:18] <kurushiyama> mroman: You'd $unwind them, sort them by "rows.row" descending, and use $first in a group stage, for example.
[14:43:29] <kurushiyama> mroman: > db.rows.aggregate({$unwind:"$rows"},{$sort:{"rows.row":-1}},{$group:{_id:"$id",maxRow:{$first:"$rows.row"}}})
[14:47:24] <mroman> that gives me "_id": null
[14:47:42] <mroman> shouldn't it be $_id?
[14:49:04] <mroman> "exceeded memory limit"
[14:49:05] <kurushiyama> mroman: I used the data you gave me. Mom
[14:49:06] <mroman> damn :D
[14:50:30] <Derick> hmm, you get a memory limit error. How many documents are you storing?
[14:50:39] <kurushiyama> mroman: http://hastebin.com/okudakujul.sm
[14:50:39] <mroman> around 6000
[14:50:51] <mroman> but allowDiskUse fixes it of course
[14:51:00] <mroman> well 6000 before unwinding
[14:51:03] <kurushiyama> well, with a lot of embedded docs... ;)
[14:51:05] <mroman> after unwinding there are probably um....
[14:51:15] <mroman> 300k documents probably
[14:51:36] <kurushiyama> mroman: Told you it is a pita with embedded docs
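A corrected sketch of the pipeline above: the group key was presumably meant to be "$_id" (hence the "_id": null result), and allowDiskUse handles the memory limit. Grouping by "$_id" yields the highest row per document; using _id: null instead would collapse everything to the single global maximum:

    db.rows.aggregate(
        [
            { $unwind: "$rows" },
            { $sort: { "rows.row": -1 } },
            { $group: { _id: "$_id", maxRow: { $first: "$rows.row" } } }
        ],
        { allowDiskUse: true }
    )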
[14:55:28] <mroman> the order of things in the array determines the order of execution right?
[14:55:40] <mroman> so I can project, then group, then project again, then group again?
[14:56:06] <mroman> but 6k documents is nothing.
[14:56:48] <mroman> that's like not even 1% of the data.
[14:57:13] <mroman> (also the average document size is about 5k)
[14:58:01] <mroman> (5KB)
[15:02:24] <kurushiyama> mroman: Aye
[15:02:48] <kurushiyama> Think of it as a command pipe chain in Gnulix
[15:11:28] <basldex> hi
[15:11:41] <basldex> I'm getting a backtrace (=> segfault?) after changing bind_ip
[15:11:49] <basldex> only deinstall with purge fixes that issue
[15:11:52] <basldex> any known problems on that?
[15:12:20] <basldex> oh, it's signal 6, aborted, apparently
[15:12:40] <basldex> tried to make it listen to 0.0.0.0 now it says it does by default (apparently it doesn't, netstat shows bind to 127.0.0.1)
[15:13:02] <kurushiyama> basldex: Simply comment out bind_ip
[15:13:18] <basldex> abort stays btw. even after changing it back to 127.0.0.1
[15:13:51] <kurushiyama> basldex: And no, not that I know of (which does not say too much, though). OS, version, MongoDB version?
[15:16:06] <basldex> same problem after only commenting out
[15:16:17] <basldex> kurushiyama: 2.4.9 on ubuntu 14.04 lts
[15:16:40] <kurushiyama> basldex: You do some sort of experimental archeology?
[15:17:19] <basldex> here's the backtrace: https://gist.github.com/anonymous/4f2c2404225aea0f07399793c50c3ab2
[15:18:00] <basldex> I won't mess with my package system on production environments just to get an update, sorry
[15:18:11] <kurushiyama> basldex: You dont get it.
[15:18:28] <basldex> apparently
[15:18:33] <kurushiyama> basldex: If you do not update until 2.6 EOL, you are stuck
[15:19:08] <basldex> which is in october
[15:19:20] <kurushiyama> basldex: Yes. so you better plan now. ;)
[15:19:21] <Derick> 2.4 has already been EOL'ed
[15:19:41] <basldex> arr ok
[15:19:42] <kurushiyama> May well be that some lib was updated...
[15:20:14] <basldex> still that's a somehow weird strategy
[15:20:35] <kurushiyama> My bet goes on libboost fs.
[15:20:50] <basldex> as I said. I really don't want to mess around with my production machines' package managing
[15:21:04] <basldex> so 2.4 is like "works if you're lucky, otherwise gtfo"?
[15:21:14] <Derick> kurushiyama: I thought we compiled that in statically - but could be wrong. If it's an *Ubuntu* packaged mongodb, then, I think you're out of luck anyway
[15:21:24] <basldex> ubuntu, yes
[15:21:58] <Derick> they might do weird things too
[15:22:59] <basldex> ok, guess then there's no way to get around installing from different sources
[15:23:10] <kurushiyama> Derick: There is a reason why Ubuntu scares me off like the devil from holy water.
[15:23:29] <Derick> kurushiyama: it's good as a desktop OS, but they make interesting packaging choices
[15:24:21] <kurushiyama> basldex: You almost always want to use MongoDB Inc's official packages, never failed me, though some of them could be reworked (and yes, I am in the process of doing so) ;=
[15:24:57] <basldex> it is an interesting packaging choice indeed to break important server software in your long term release
[15:25:08] <Derick> LTS is IMO a silly concept anyway.
[15:26:15] <basldex> maybe homeopathic only
[15:26:22] <kurushiyama> Derick: As a desktop OS I find it to be too bloated, and as a server OS? Yeah... "interesting" is probably the politest way to describe it. For example the "interesting" choices regarding the proc fs
[15:26:51] <Derick> kurushiyama: I run Debian unstable with xfce. Works For Me™
[15:27:24] <kurushiyama> Derick: OSX and elementaryOS as desktop, CentOS and RHEL as server.
[15:27:34] <basldex> ok 3.0.11 is fine now?
[15:28:06] <kurushiyama> basldex: It is
[15:28:15] <Derick> but it's better to stay with the latest
[15:28:21] <Derick> we fix bugs and stuff
[15:28:22] <kurushiyama> basldex: But you need to take an intermediate step
[15:28:26] <Derick> and make it faster
[15:29:25] <kurushiyama> Derick: Erm. I do not want to do finger pointing, but there were _extremely_ nasty bugs in wt integration when 3.0 was released. tbh, since then, I always stay a minor behind.
[15:29:45] <kurushiyama> basldex: No worries, those problems are fixed now.
[15:30:07] <Derick> kurushiyama: yeah, premature does happen.
[15:31:41] <kurushiyama> basldex: And by any chance, unless you have _very_ good reasons not to, you might want to migrate to wiredTiger.
[16:03:00] <bros> Does anybody know if there are plans to add a query/$where to $lookup on aggregation?
[16:07:32] <cheeser> you mean $match?
[16:32:00] <diegoaguilar> bros, I think $match does what u are looking for
[16:32:10] <bros> I need the join of $lookup
[16:33:35] <kurushiyama> bros: More often than not, this is an indication of a flawed data model. Just saying. I _never_ missed JOINs
[16:33:46] <bros> kurushiyama: I have extremely relational data
[16:33:50] <bros> and absolutely should be on Postgres
[16:34:03] <kurushiyama> bros: Not necessarily.
[16:34:19] <bros> I have a collection "locations" with an id, a name, and a warehouse_id
[16:34:27] <bros> then I have "location_items" with a location_id and an item_Id
[16:34:32] <bros> Is that flawed?
[16:35:00] <kurushiyama> I just do not see where we need a JOIN, yet
[16:35:14] <bros> I need to know what locations exist and which items belong to which location.
[16:35:32] <kurushiyama> bros: I'd split that in two questions
[16:36:06] <kurushiyama> bros: Even more so since it is unlikely that you want to display ALL items of ALL locations SIMULTANEOUSLY.
[16:36:20] <bros> kurushiyama: It is for cache rebuilding
[16:36:24] <bros> I have it split into two queries
[16:36:29] <bros> but then it takes 3s to map them together in the API server
[16:37:04] <kurushiyama> bros: What is the use case to have _all_ items of _all_ locations gathered together?
[16:37:36] <bros> kurushiyama: Rebuilding a cache.
[16:38:32] <kurushiyama> bros: Sounds to me like a readthough cache would be more useful to you. But setting that aside: What sort of cache? What are you caching and why?
[16:38:47] <bros> Readthrough cache?
[16:39:14] <bros> My clients stores this cache client-side. It's a JSON blob. I store it in redis.
[16:39:26] <bros> I have a real-time warehouse management app that requires this data.
[16:40:14] <kurushiyama> bros: Set aside the language, but this pretty much explains it: https://github.com/golang/groupcache
[16:41:31] <kurushiyama> bros: And here is what I do not get. Basically, you tend to query either an item and want to find out where to get it. Or, you have a location and want to know what is stored there.
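A minimal sketch of the $lookup bros is after, using the locations / location_items collections described above; the warehouse filter is hypothetical. As of 3.2, $lookup only supports a single equality match on localField/foreignField, which is why there is no place to put an extra query/$where on the joined side:

    db.locations.aggregate([
        { $match: { warehouse_id: 7 } },          // hypothetical filter
        { $lookup: {
            from: "location_items",
            localField: "_id",
            foreignField: "location_id",
            as: "items"                           // each location gets its matching location_items embedded
        } }
    ])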
[16:42:57] <jr3> is there a way to sum the number of elements in an array from all docs?
[16:43:30] <kurushiyama> jr3: like you have 10 items in doc A and 20 items in doc B, the answer should be 30?
[16:43:38] <jr3> {array:[1,2]}, {array:[3,4,5]} -> 5
[16:43:41] <jr3> yes
[16:43:49] <kurushiyama> jr3: Mom
[16:44:22] <jr3> Ask my mom to tediously count them all?
[16:45:21] <kurushiyama> jr3: Moment, please
[16:46:04] <jr3> Would it be better to just have another field arrayCount, keep it in sync, then index it
[16:47:27] <kurushiyama> jr3: db.yourColl.aggregate({$project:{_id:1,count:{$size:"$yourArray"}}},{$group:{_id:"numberOfItems",count:{$sum:"$count"}}})
[16:48:31] <kurushiyama> oh, you can set _id:0 in the project stage,
[16:52:48] <kurushiyama> My pleasure
[18:28:39] <dddh_> https://youtu.be/Mhllo1xQer8
[18:35:39] <kurushiyama> dddh_: Beautiful, robust, secure, portable and scalable? Yes, that is funny!
[18:37:08] <dddh_> python > java
[18:38:10] <kurushiyama> Javatar and .not – .not bad
[18:39:59] <kurushiyama> dddh_: We could argue about that, but this is #mongodb and not #languagebash
[18:40:10] <dddh_> :)
[18:41:27] <dddh_> anyway in my case python is in java vm
[18:44:38] <kurushiyama> dddh_: I run native code ;P
[18:50:59] <dddh_> :)
[19:07:37] <bros> kurushiyama: no. i need to dump an entire collection, filtered by account_id
[19:11:11] <kurushiyama> bros: Would need detailed structure and expected results. But I am preparing dinner, so maybe tomorrow?
[19:44:12] <shlant> hi all. will I run into problems dumping a 3.2 db with a 3.0 client and then importing to a 3.2 db?
[19:44:29] <shlant> as in, does the client doing the dump affect the dump itself?
[19:44:58] <shlant> and if so, is that also true the other way around? 3.0 db dumped with 3.2 client and imported into 3.0 db
[20:14:38] <mkjgore> hey folks, just did a restore of our mongo cluster and I've noticed that the machines just don't perform as they used to. Even when sitting "idle" (our app isn't accessing the cluster as far as I can tell) there are times when iotop is showing almost constant reads across all the rep sets (sometimes in the Mb for a few seconds)
[20:14:47] <mkjgore> has anyone else seen this when restoring from mongo cloud?
[20:22:00] <crazyphil> kurushiyama: you weren't kidding about trying to export and import the monster setup I have running right now, it's going on 4 days and the import is only at 34%
[20:28:24] <StephenLynx> daium
[20:28:27] <StephenLynx> how big?
[20:35:24] <kurushiyama> crazyphil: I could say "Told ya so!", but that would help neither of us...
[20:36:47] <kurushiyama> mkjgore: Across all replsets? Hmm, syncs?
[20:38:07] <kurushiyama> annoymouse: Shouldn't it read anonymouse?
[20:38:38] <annoymouse> kurushiyama: That's someone else. I'm annoymouse :P
[20:40:25] <kurushiyama> annoymouse: Reminds me of https://youtu.be/UE7WrEidvyY
[20:45:55] <kurushiyama> crazyphil: Thinking of it, it might be easier/faster to stop and build a temporary replset?