[06:32:55] <duncan_donuts> Pymongo v3… is it just me or is anyone else getting annoyed because the ObjectId is no longer manipulated back into the dict?
[06:53:12] <Nikesh> Can anyone help me install MongoDB on Ubuntu 15.04? The apt-get install seems to want to use 'upstart', but apparently Ubuntu now uses systemd
[06:53:24] <Nikesh> initctl: Unable to connect to Upstart: Failed to connect to socket /com/ubuntu/upstart: Connection refused
[06:58:23] <Nikesh> I am leaving for now but will check the logs if anyone has some insight.
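(For anyone hitting the same upstart error on a systemd-based Ubuntu: a minimal unit file along these lines is usually enough. The paths and service user below are assumptions based on the standard mongodb-org package layout; adjust to your install.)

    # /etc/systemd/system/mongod.service  (hypothetical)
    [Unit]
    Description=MongoDB database server
    After=network.target

    [Service]
    User=mongodb
    ExecStart=/usr/bin/mongod --quiet --config /etc/mongod.conf
    Restart=on-failure

    [Install]
    WantedBy=multi-user.target

    # then:
    sudo systemctl enable mongod
    sudo systemctl start mongod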
[07:54:15] <devdvd> I am working on learning about mongo performance tuning and scaling and was wondering what the effect is of my mappedWithJournal value being higher than my available memory. Does this mean it's swapping out to the hard drive?
[07:54:50] <devdvd> the mapped value is less than physical memory
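(Context: with MMAPv1, mappedWithJournal is virtual address space, not resident RAM; journaling maps the data files a second time, so it is roughly double the mapped value and can exceed physical memory without by itself implying swapping. The figures can be inspected from the shell:)

    // mongo shell: MMAPv1 memory figures
    db.serverStatus().mem
    // { bits: 64, resident: ..., virtual: ..., mapped: ..., mappedWithJournal: ... }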
[08:50:44] <watmm> I have a mongodump cron job set up and recently it's started returning "warning: Failed to connect to x.x.x.x:27017, reason: errno:111 Connection refused", but i can't see anything in mongo's logs and (to my knowledge) the conf wasn't changed. Any idea where to look?
[08:54:23] <devdvd> watmm: is the mongodump being run on the same box as the mongo instance?
[09:34:28] <devdvd> watmm: then the first thing i suggest is trying to establish a connection from the dump box to the mongo instance using something like telnet or nc
[09:35:30] <devdvd> connection refused usually means the target server (in this case mongo) is telling the source (in this case the box running the dump) to piss off
[09:36:19] <devdvd> or even better, try installing the mongo client on the box doing the dump and try to log in to your mongo instance with it
[09:37:05] <devdvd> while doing this i'd suggest doing a tcpdump on the mongo box filtering by source IP of the dump server
[09:37:15] <devdvd> this will tell you if traffic is even getting to the server
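(A rough sketch of those checks; x.x.x.x is the mongo host from the error above, and <dump-box-ip> is a placeholder for the machine running mongodump:)

    # from the box running the dump: can we reach the port at all?
    nc -vz x.x.x.x 27017          # or: telnet x.x.x.x 27017
    mongo --host x.x.x.x          # mongo shell, if installed there

    # on the mongo box: is the traffic even arriving?
    tcpdump -nn -i any 'tcp port 27017 and src host <dump-box-ip>'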
[10:09:42] <bigsky> what is the difference between mongo and mongo localhost ?
[10:10:26] <Derick> nothing? I might not fully understand the question
[10:12:46] <joannac> "mongo localhost" connects to the mongod on 127.0.0.1, port 27017, to the database "localhost"
[10:12:49] <bigsky> Derick: when i connect to the mongodb service running on my local machine using mongo localhost, i cannot find the collections that show up when i connect with just mongo
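(Illustrating joannac's point: the shell's positional argument is a database address, so the same collections are only visible if you land in the same database.)

    mongo                  # 127.0.0.1:27017, database "test"
    mongo localhost        # 127.0.0.1:27017, database named "localhost"
    mongo localhost/test   # host "localhost", database "test" -- same as plain "mongo"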
[10:23:15] <Derick> sorry - don't know anything about that Garito
[10:23:30] <Garito> @Derick perhaps it's not related to docker
[10:23:40] <Garito> the question is how to automatically start a replica set
[10:23:44] <joannac> Derick: do you have a replset handy? apparently you can connect to a replset now. http://docs.mongodb.org/manual/reference/program/mongo/#cmdoption--host
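(Per the page joannac linked, the shell can target a replica set directly; the set name and hostnames below are placeholders:)

    mongo --host rs0/host1.example.com:27017,host2.example.com:27017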
[14:12:29] <fxmulder> joannac: I rsynced one replica to another
[14:13:01] <fxmulder> bringing both up seems to have worked, was just unsure since the primary wasn't coming up
[14:17:38] <pamp> is it normal that when we separate index and data onto different drives, read performance decreases a lot?
[14:19:22] <pamp> I separated the data, index, journal and logs onto different drives.. the write performance increased a lot, but the read performance decreased badly
[14:20:27] <pamp> should I not separate index and data?
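(For reference, that kind of split is typically done with a mongod.conf fragment along these lines, assuming the WiredTiger engine; the resulting per-database and index directories are then symlinked or mounted onto separate drives. Paths are made up.)

    storage:
      dbPath: /data/db                 # data files; mount on its own drive
      directoryPerDB: true
      wiredTiger:
        engineConfig:
          directoryForIndexes: true    # indexes land in an "index" subdirectory
    systemLog:
      destination: file
      path: /logs/mongod.log           # logs on yet another drive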
[15:10:09] <mrmccrac> snappy compression in mongo 3.0 — is the compression of one document independent of compression in a different doc? or will docs in the same collection benefit from other docs?
[15:18:15] <GothAlice> mrmccrac: AFAIK the compression is performed against chunks of the on-disk stripes; it's not document-aware.
[15:18:28] <GothAlice> (And also not collection aware.)
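(The default block compressor is snappy, set engine-wide via storage.wiredTiger.collectionConfig.blockCompressor; it can also be overridden per collection at creation time. A hypothetical shell example, with a made-up collection name:)

    // create a collection using zlib instead of the default snappy
    db.createCollection("events", {
      storageEngine: { wiredTiger: { configString: "block_compressor=zlib" } }
    })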
[15:19:19] <GothAlice> However, as a word of warning, WiredTiger has some serious outstanding issues that affect performance, reliability, and data safety.
[15:20:23] <mrmccrac> ive seen a few peculiar things i guess :)
[15:20:36] <GothAlice> https://jira.mongodb.org/browse/SERVER-17424 https://jira.mongodb.org/browse/SERVER-17456 https://jira.mongodb.org/browse/SERVER-16311 (See also: https://jira.mongodb.org/browse/SERVER-17386)
[15:20:44] <GothAlice> These are just the ones that affect me. ;)
[15:21:11] <GothAlice> Under nominal load I can crash a primary every 15-30 seconds. :D
[15:22:00] <mrmccrac> we were crashing it until we added --wiredTigerEngineConfigString=hazard_max=10000
[15:26:34] <GothAlice> The OOM-killer, at one point, even chose to kill a kernel process that panicked the entire VM. That was fun to see. (I enjoy watching things explode and burn too much, I think. ;)
[15:35:53] <boutell> Hi. I am puzzling over the authSource option, which specifies “the source of the authentication credentials.” Apparently this allows the client to say where to verify the password it’s providing… does that make any sense from a security standpoint? Why would the server allow me to say “hey, verify my password against this database of *my* choosing”? That’s why I am pretty sure I have misunderstood what it does.
[15:40:13] <GothAlice> boutell: Users are tied to databases.
[15:40:31] <GothAlice> boutell: This has the effect that a user can be granted permission on a different database than the one storing the user credentials.
[15:41:13] <GothAlice> boutell: For example, a global "admin" user with permission to read anything. To connect to a random database you'll need to specify that the account credentials actually come from the "admin" database, or it won't find the user.
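(A sketch of the setup GothAlice describes; the user name, password, and databases are made up. The user document lives in "admin", but its roles point at other databases:)

    // in the mongo shell
    use admin
    db.createUser({
      user: "reporting",
      pwd: "s3cret",
      roles: [
        { role: "read", db: "production" },
        { role: "read", db: "analytics" }
      ]
    })

    // connecting then means naming the database that stores the credentials:
    //   mongo production -u reporting -p s3cret --authenticationDatabase admin
    //   mongodb://reporting:s3cret@db.example.com/production?authSource=admin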
[15:47:13] <boutell> GothAlice: okay, but apparently I can specify any database, such as one I have ReadWrite access to, and store credentials saying I can access some other database?
[15:50:52] <GothAlice> If you have UserAdmin on that database, then you can manage users there.
[16:26:07] <boutell> GothAlice: but isn’t this crazypants? It means there’s this permission I can grant on database A which is in reality a ticket to blanket permission on databases B-Z.
[16:26:22] <boutell> it does sound like ReadWrite does not imply UserAdmin, which is good of course.
[16:26:37] <GothAlice> Well, thus you should really carefully control who gets UserAdmin. ;)
[16:27:09] <boutell> yes… I’m wondering why there isn’t simply a single admin database, period. Maybe if I understood when this would ever be a good idea beyond the admin database...
[16:27:45] <GothAlice> For the most part my database users are isolated to their databases, i.e. the only account in the "admin" database is "admin".
[16:28:38] <boutell> is there some kind of perf win with that or something? As opposed to all users being in the admin database?
[16:28:48] <GothAlice> For example, I have a "rita" user that has permissions on "production" and "staging" databases. The "rita" user is in the production database.
[16:29:20] <GothAlice> Well, for one, if the user is local to the database you don't need to specify the authenticationDatabase when connecting.
[16:30:11] <boutell> I do recall encountering that annoyance.
[16:30:47] <boutell> although, if mongo were designed to only use the admin database for users, you wouldn’t have to specify. The point about nuking users and the database with a single command makes sense I guess.
[16:31:25] <GothAlice> Or you can functionally group your users.
[16:31:39] <GothAlice> Create a database called "shared" for your "shared hosting" clients, etc.
[16:31:49] <GothAlice> (Store the users in "shared", with permissions on their own DBs.)
[16:32:02] <GothAlice> The flexibility means that even use cases I can't imagine right now can be covered.
[16:32:47] <GothAlice> (It also means there are no special cases. Users are users, users live in a DB, users have permissions on DBs. No magic about the "admin" database, etc.)
[16:34:54] <pamp> is it possible to have a cluster where the shards (replica sets) are in linux environments and the router / config servers are in windows environments?
[16:38:16] <GothAlice> pamp: Yes, mixing platforms shouldn't matter in that scenario. It shouldn't really matter in any scenario, but having differing IO performance on data-carrying replicas can exhibit some really strange issues.
[16:39:19] <GothAlice> (Like replica secondaries falling far enough behind to re-sync.)
[16:39:25] <GothAlice> In your scenario, though, that won't be an issue.
[16:42:17] <pamp> all shard servers will be in a linux environment
[16:42:50] <pamp> only the configs and router will be in a windows environment
[16:46:14] <boutell> is anybody keeping a terabyte of data that doesn’t shard easily in MongoDB? Do you feel it makes sense at that size or would you use something else in that scenario?
[16:47:17] <StephenLynx> why wouldn't it be easily sharded?
[16:51:19] <GothAlice> boutell: All data is shard-able, even if it just gets allocated to a given shard by hash (random, even distribution) instead of something more "intelligent".
[16:51:30] <GothAlice> With the exception to the above being capped collections.
[16:51:37] <GothAlice> Those can't be sharded for performance reasons.
[16:52:17] <GothAlice> boutell: Sharding makes sense any time the amount of data you have exceeds the amount of RAM you have. Sharding is the only real way to efficiently reduce the per-host data size… by adding more servers.
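(What hash-based allocation looks like in practice; the database and collection names are hypothetical:)

    // mongo shell, connected via mongos
    sh.enableSharding("mydb")
    use mydb
    db.events.createIndex({ _id: "hashed" })
    sh.shardCollection("mydb.events", { _id: "hashed" })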
[16:52:27] <boutell> StephenLynx: that’s an entirely fair point, it’s a pretty arbitrary insistence on my part.
[16:53:02] <StephenLynx> usually arbitrary decisions lead to bad practical consequences.
[16:53:26] <boutell> GothAlice: OK. So on a per-machine basis though, it really isn’t a good idea to go beyond RAM size, while MySQL has been optimized somewhat for that case. (I’m not trying to start a flamewar here, I’m looking at pros and cons and use cases)
[16:53:35] <boutell> (I use MongoDB pretty much exclusively right now & love it)
[16:54:54] <boutell> (I assume that swap partitions don’t work around that especially well?)
[16:54:54] <StephenLynx> cases where you want to use mongo: lots of reads and writes, constant horizontal scaling, no need for joins on data.
[16:55:45] <boutell> that makes sense and it’s a pretty conservative and sane way of defining it StephenLynx. Although in practice, I do oodles of joins with mongo, because even when I have to implement my own joins, it’s still vastly better for the kind of data structures I’ve got.
[16:55:49] <StephenLynx> where you don't want to use mongo: need for relational integrity and joins, graph relations
[16:56:28] <boutell> interestingly, in our CMS work, both graph relationships and joins are huge, although we don’t need strict integrity - it’s not hard in practice to just not fetch things that ain’t there no mo
[16:57:10] <StephenLynx> it's not that it's hard to do, it just isn't optimized at all.
[16:57:31] <StephenLynx> a relational db is built from the ground up to perform the joins before sending the data to your application.
[16:58:14] <StephenLynx> with a graph db, the same goes for traversing recursive relations.
[16:58:28] <boutell> StephenLynx: yes, I understand. In practice, for a CMS a single VPS can typically serve even a large and substantially popular site, like a major college for instance. And that means the latency between you and mongo is really, really small, which cuts down on the pain of making multiple queries to handle the joins and relations.
[16:59:28] <boutell> mongo *could* support joins though. It’s not as if mongo is conservative about adding support for features that aren’t traditionally considered “lean and mean and NoSQL”
[16:59:28] <StephenLynx> if you are just going to justify it by comparing the hardware you got and the load you have to handle, then any problem could be justified by just throwing more resources until the problem is small enough to be handled.
[16:59:57] <StephenLynx> I am not saying it can't be done, I am saying it could be done better.
[17:00:25] <StephenLynx> the whole point of picking a tech for a project is handling increasing requirements.
[17:01:05] <StephenLynx> so today you have to deal with a single campus. then you need to integrate an international network of campuses and your server bites the dust.
[17:01:06] <boutell> mmm, not every project is intended for “webscale” or will ever be “webscale” or has to be based on that open-ended worry
[17:01:08] <GothAlice> boutell: In the CMS example, there are data design considerations that allow you to not need joins.
[17:01:29] <StephenLynx> then just pick a name from a hat and roll with it.
[17:01:37] <boutell> I would actually say that the vast majority of projects will never be webscale and shouldn’t be twisted into that direction
[17:02:10] <StephenLynx> don't ask for pros and cons if you are just going to say the project is too small for said pros and cons to matter.
[17:02:50] <boutell> StephenLynx: I should clarify. I asked the original question out of curiosity. I am not saying it is in any way relevant to our CMS work. I brought that up later in the conversation because we were talking about joins.
[17:02:51] <GothAlice> https://gist.github.com/amcgregor/901c6d5031ed4727dd2f#file-taxonomy-py-L23-L35 < my CMS stores a parent reference, parents list of references, name, coalesced path, and sort order for each "Asset" in the CMS tree. Querying for /foo/bar/baz/diz will perform a single query looking for a path $in ['/foo', '/foo/bar', '/foo/bar/baz', '/foo/bar/baz/diz'] sorted by -path, taking the first result. This gives the "deepest" matching document from the tree.
[17:03:15] <boutell> GothAlice: yeah, we do pretty much the same thing.
[17:03:34] <boutell> my 1TB question <> anything to do with the CMS
[17:03:47] <GothAlice> boutell: Extra special trick: the ACLs of all parents are cached in the children, allowing us to avoid a second lookup to get the ACLs of the parents for security validation.
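(Roughly what that query looks like in the shell, against a hypothetical "assets" collection:)

    var path = "/foo/bar/baz/diz";
    var prefixes = [];
    var acc = "";
    path.split("/").slice(1).forEach(function (part) {
      acc += "/" + part;
      prefixes.push(acc);   // ["/foo", "/foo/bar", "/foo/bar/baz", "/foo/bar/baz/diz"]
    });
    // deepest existing asset along the requested path, in a single query
    db.assets.find({ path: { $in: prefixes } }).sort({ path: -1 }).limit(1)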
[17:04:04] <StephenLynx> which you just said isn't easily sharded because you don't want to shard.
[17:04:46] <boutell> StephenLynx: no, I said I was curious about what would happen if it wasn’t easily sharded, it was a hypothetical. I don’t actually have a webscale problem to solve. I just wanted to further my understanding of what Mongo does and doesn’t claim to do, for some future project, maybe. <— likes learning things
[17:05:23] <boutell> I then got onto the subject of joins in a way that generated confusion.
[17:09:44] <boutell> . o O I don’t work for a startup. If you work for a startup, considerations are probably very different. If you don’t have millions of users, you’ve failed, straight up
[17:10:22] <boutell> . o O so yes, you have to pick tools entirely on scalability first, convenience second
[17:24:28] <RWOverdijk> Hello :) I've been able to get mongo to retrieve records near a specific location (geospatial). Now, I'd like to first sort records by a date field, and then sort that result based on distance... In SQL terms, `order by start_date, distance`. I don't know where to get started.
[17:27:09] <StephenLynx> are you using find to get a cursor?
[17:28:54] <RWOverdijk> StephenLynx, Forgive me my ignorance but, what's a cursor? I'm currently using collection.db.command({geoNear : "event", near: [] ... to find the records.
[17:29:26] <StephenLynx> command? I have seen that before.
[17:30:41] <RWOverdijk> I really don't like being a noob again.
[17:31:00] <StephenLynx> let me check geo queries, never used them
[17:31:21] <RWOverdijk> All I can find is near, within etc.
[17:32:29] <StephenLynx> http://mongodb.github.io/node-mongodb-native/2.0/api/Collection.html#geoNear here
[17:32:46] <StephenLynx> you can run the command on the variable representing the collection
[17:34:12] <StephenLynx> so you run geoNear first; according to the driver documentation it will yield the result you are already obtaining in the second parameter of the callback.
[17:34:48] <StephenLynx> either the result will be an array or a cursor.
[17:35:01] <RWOverdijk> And by "the command" you mean, sort?
[17:35:29] <RWOverdijk> Because I'm looking for a way to do it the other way around
[17:35:36] <StephenLynx> db.collection('collectionName') returns a variable that represents the collection. on this variable you can run functions like find, aggregate
[17:36:10] <StephenLynx> so you don't need to call command and then specify which command you wish to run, you just run the function that represents the command.
[17:36:11] <RWOverdijk> Does that hold all data, or will it not hold anything until I execute it?
[17:36:21] <RWOverdijk> Because it's a large database :p
[17:36:31] <StephenLynx> that's what cursors are for.
[17:36:40] <StephenLynx> they don't hold the documents, but references to them.
[17:39:58] <RWOverdijk> geoNear sorts documents by distance. If you also include a sort() for the query, sort() re-orders the matching documents, effectively overriding the sort operation already performed by geoNear.
[17:40:10] <StephenLynx> you can also use the $geoNear aggregation pipeline
[17:40:21] <StephenLynx> and sort in the aggregation
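(A sketch of that pipeline against the "event" collection mentioned earlier; it assumes a 2dsphere index on a "location" field plus a "start_date" field, and the coordinates are placeholders:)

    db.event.aggregate([
      { $geoNear: {
          near: { type: "Point", coordinates: [ -73.99, 40.73 ] },
          distanceField: "distance",     // added to each matching document
          maxDistance: 5000,             // metres
          spherical: true,
          num: 1000                      // $geoNear caps results (default 100)
      } },
      { $sort: { start_date: 1, distance: 1 } },   // date first, then distance
      { $limit: 50 }
    ])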
[17:40:28] <RWOverdijk> I should really read up on the terminology
[17:40:53] <RWOverdijk> An aggregate is a collection of x.
[17:41:07] <RWOverdijk> So what's x in this sense?
[17:41:52] <StephenLynx> an aggregation is a series of operations performed at once in the database before returning to your application; from what I understand it will return an array.
[17:42:06] <StephenLynx> it includes some complex operations you can't do with plain commands
[17:43:13] <StephenLynx> there is nothing else that will perform those operations.
[17:43:19] <GothAlice> Exactly. If you can get away with using standard queries, it's generally recommended to do so.
[17:43:38] <RWOverdijk> I wish I could apply distance sort as a secondary sort.
[17:43:40] <GothAlice> However, sometimes a proper data processing pipeline, with the extra overhead, is either acceptable or required to answer a given query, as StephenLynx says.
[17:50:59] <RWOverdijk> I'm using mongo for media indexing
[17:51:10] <StephenLynx> yeah, you might be better off using maria for the data you will join.
[17:51:19] <RWOverdijk> So mongo's being used anyway, but I really would love it if the geospatial stuff would work :p
[17:52:00] <RWOverdijk> Oh so, I could reverse this possibly
[17:52:14] <RWOverdijk> I could first fetch records within a certain range, and sort those by date
[17:52:24] <RWOverdijk> But that would mess up the distance sorting
[17:54:21] <RWOverdijk> Final question... aggregates and cursors run over already filtered results, right?
[17:54:43] <StephenLynx> if you filter the results, yes.
[17:54:44] <RWOverdijk> So out of the three mil I have, it'll first filter based on the conditions I gave it, and then allow me to fiddle with that data?
[18:06:15] <StephenLynx> when you get a cursor you have to perform operations to get the data. if you just want them all at once, there is a function toArray that is synchronous if I am not mistaken.
[18:06:20] <RWOverdijk> I shouldn't drink whiskey and ask questions at the same time, I make bad jokes. Sorry :p
[18:08:02] <RWOverdijk> I want something that's soon, and close to show up first. Then, something that's soon but a bit further away to come second, and nicely balanced with other records.
[18:08:06] <StephenLynx> so if I needed a series of documents I just used aggregate so I wouldn't have to perform multiple operations.
[18:08:56] <RWOverdijk> But it's good. I can look up the actual performance loss
[18:09:02] <RWOverdijk> Figure out if it's worth it
[18:09:12] <RWOverdijk> Because aggregates also work on already sorted data
[18:09:31] <RWOverdijk> So if I use near, I can limit, and sort the matching records
[18:09:37] <RWOverdijk> Which I obviously limit to "x"
[18:10:01] <RWOverdijk> And my logic will only apply to a subset of the 3 mil records
[18:11:06] <StephenLynx> yeah, but again, most stuff can be done with cursors.
[18:11:24] <StephenLynx> keep that in mind if you use aggregate and get a bottleneck
[20:46:30] <teen> hi guys i am a total noob here... i have a Channel model that has many Broadcasts - I'm trying to query the most recent Broadcast for each Channel. The problem is the Channel table doesn't have a ref to the Broadcast- but the Broadcast does have a ref to the Channel. Can I still query the Channel document and populate its Broadcasts? pls halp
[20:47:46] <teen> this would be really simple in sql : (
[20:48:08] <GothAlice> teen: It's simple in MongoDB, too, and pretending you couldn't use joins in SQL, the solution is the same, too.
[20:48:33] <StephenLynx> what if you use aggregation: first you sort by time of broadcast, then you group using the channel as the _id and take the first broadcast in each group?
[20:48:47] <StephenLynx> so you will get the last broadcast of each channel
[20:48:59] <GothAlice> Indeed, that would be the optimum approach, StephenLynx.
[20:49:32] <teen> yea but don't i need a for loop then for each channel ?
[20:50:40] <teen> i want the result to look like [{name: 'channel A', last_broadcast: {name: 'broadcast x'}}, {name: 'channel B', last_broadcast: {name: 'broadcast Y'}}]
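(One way to express that as an aggregation; the collection and field names are guesses at the schema:)

    db.broadcasts.aggregate([
      { $sort: { created_at: -1 } },                // newest broadcasts first
      { $group: {
          _id: "$channel_id",                       // one group per channel
          last_broadcast: { $first: "$$ROOT" }      // first doc in each group = newest
      } }
    ])

The channel names would then come from the channel documents themselves (or from fields copied onto the broadcasts), since no server-side join happens here.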
[22:52:49] <unholycrab> trying to decide if i need to provision my mongodb instance with enough memory to store ALL indexes in memory ALL at the same time
[22:52:56] <unholycrab> or if i should provision something close to that
[22:53:15] <cheeser> it's best if you can keep all the indexes in memory
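(A rough way to see how much memory that actually means, from the shell; the collection name is a placeholder:)

    db.stats().indexSize               // total index size for the current database, in bytes
    db.mycollection.totalIndexSize()   // per collection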