[00:08:34] <Synt4x`> trying to use new ISODate in PyMongo and it's not working: games = db.games.find({"date" : { '$gt' : new ISODate("2007-08-01T00:00:00Z")}})
[00:08:45] <Synt4x`> how do I deal w/ISODate's outside of the mongodb shell?
[00:13:38] <GothAlice> Synt4x`: Note that timezone support in Python is pretty wonky. I recommend using pytz or another (properly) timezone-aware support lib, if you need to work outside UTC.
[00:14:05] <Synt4x`> ok have never heard of pytz, will definitely check it out. ty
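A minimal sketch of the PyMongo equivalent, assuming a local "test" database with a "games" collection: the shell's ISODate becomes a timezone-aware datetime (here via pytz, as recommended above).

```python
from datetime import datetime

import pytz
from pymongo import MongoClient

client = MongoClient()  # assumes a mongod on localhost:27017
db = client.test

# ISODate("2007-08-01T00:00:00Z") in the shell maps to a UTC-aware datetime:
cutoff = datetime(2007, 8, 1, 0, 0, 0, tzinfo=pytz.utc)

games = db.games.find({"date": {"$gt": cutoff}})
for game in games:
    print(game["date"])
```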
[00:18:57] <daidoji> quick question, I noticed that if I just read from Mongo I can get read speeds up to the level my hard drive (where the mongo files are stored) supports
[00:19:04] <daidoji> but then if I write to the DB even once
[00:21:39] <GothAlice> That's kinda nuts; I never see drops like that. After the drop, could you run "db.currentOp(true)" in a shell and pastebin the result?
[00:21:42] <daidoji> I can distribute the task, but I was wondering if there weren't some faster way to tell mongo (I'm just reading for this set of queries)
[00:40:40] <GothAlice> daidoji: https://github.com/bravecollective/forums/blob/develop/brave/forums/component/thread/model.py#L58 — might want to read through this at some point to introduce yourself to some of MongoEngine's more useful internals. :)
[00:41:55] <GothAlice> (get_comment notably bails out of MongoEngine to do some raw querying, then wraps everything back in MongoEngine when returning the result. update_comment mimics MongoEngine's .update() syntax to update a sub-document in a list.)
[00:43:43] <GothAlice> daidoji: What was the result of the explain, though?
[00:45:34] <daidoji> updated with the explain plan
[00:47:27] <daidoji> GothAlice: I'll keep the update_comment stuff in mind
[00:48:23] <GothAlice> That explain is interesting.
[00:49:32] <GothAlice> First, that query is slower than every query in my codebase at work by about 2-4x. (Our statistics aggregation queries run about 60 millis to process four weeks of data.)
[00:51:16] <daidoji> GothAlice: I mean this is just my local dev machine. The mongo files are located on an external usb3.0 hard drive
[00:51:26] <GothAlice> But, of the ways MongoDB could have processed that query ("AllPlans" of 466 scanned objects vs. 159!) it could have been much worse.
[00:51:51] <daidoji> GothAlice: but it's literally 120M/s when I first start mongo and then 8M/s reads once I start writing and then 8M/s after that
[00:52:07] <GothAlice> Please get that data off of non-local buses if you're worried about performance issues and measurement?
[00:52:43] <GothAlice> For all I know that could be the USB controller's way of handling eager caching, or the operating system's.
[00:53:30] <GothAlice> I.e. you can massively eagerly read ahead after opening a file as long as you only ever read from it, but as soon as you write you can't do that optimization any more, since data can change any time. (A naive approach, but I've seen USB do worse.)
[00:53:32] <daidoji> GothAlice: well I don't care too much. This issue's more an annoyance for me
[00:54:31] <GothAlice> Internal buses tend to do less insane "optimization" of resources, giving the operating system much better control over what's going on.
[00:54:39] <annakarin> hi again. another question: how do I get the _id for c in [ _id: "", a: "", b: [ _id: "", c: [ _id: "", d: ""] ] ? I've used .find() to get the document, but can't get the _id in the last array
[00:55:11] <GothAlice> annakarin: You just broke my brain with that nesting. DX
[01:15:35] <joannac> GothAlice: but they can have more than 1
[01:16:05] <GothAlice> ^ That's probably how I'd model that. Assuming currency short-forms are unique; if not, swap RES for "544845087ee31f000069a949".
[01:16:27] <joannac> GothAlice: you would use a value as a key? :o
[01:17:44] <joannac> annakarin: I asked you 3 questions, and you gave me a single "yes". Is the answer "yes" to all 3?
[01:18:08] <GothAlice> joannac: I have magic fingers when it comes to $exists usage. ;^)
[01:18:48] <GothAlice> joannac: It's still insane, of course.
[01:18:59] <GothAlice> (I never do that, that deeply nested.)
[01:19:27] <annakarin> joannac: each user-file can contain data about multiple other accounts, each of those accounts' currencies-file can contain multiple currencies, and each of those currencies can have multiple taxRates
[01:19:47] <joannac> annakarin: then your query can't be done
[01:20:05] <joannac> $elemMatch only goes one array deep
[01:20:25] <GothAlice> ^ annakarin: You're really going to have to split that data out a bit.
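A PyMongo sketch of the one-level limit joannac describes, against a hypothetical shape like annakarin's (accounts containing currencies containing taxRates); field names are illustrative:

```python
# Hypothetical document shape, loosely following the discussion above:
# {"accounts": [{"owner": ..., "currencies": [{"name": "RES", "taxRates": [...]}]}]}
from pymongo import MongoClient

db = MongoClient().test

# $elemMatch can match criteria against elements of ONE array level:
db.users.find({"accounts": {"$elemMatch": {"owner": "alice"}}})

# But no operator reaches into an array nested inside another array;
# a dotted filter like this still selects whole documents, so you cannot
# project out just the _id of the matching inner taxRates element:
db.users.find({"accounts.currencies.taxRates.rate": 0.02})
```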
[01:21:05] <annakarin> GothAlice: do you have any suggestions ? I'm new to MongoDB and databases as a whole
[01:21:43] <joannac> you could aggregate it out, but I would recommend a different schema
[01:22:08] <GothAlice> annakarin: You seem to have gone to the extreme, of trying to put literally everything in a single document in a single collection. That way lies madness. (My forums store comments inside threads, sure, but "forums" are separate, as are "categories" for those forums.)
[01:23:47] <annakarin> GothAlice: there won't be that much data in each of the arrays; each user might have 20-100 accounts on DIVIDEND_PATHWAYS, but each of those accounts won't have that many currencies, maybe 1-3, and only 1-3 taxRates
[01:24:12] <GothAlice> Are the taxRates the same for each instance of the same currency? I.e. does RES always have a rate of 0.02?
[01:24:29] <annakarin> but if it's un-queryable then I probably have to change it
[01:24:46] <annakarin> GothAlice: it can have multiple
[01:25:02] <GothAlice> Yes, but is it always the same multiple for a given currency?
[01:30:10] <stefandxm> ive been on a wild goose chase for 24+hours
[01:30:21] <joannac> annakarin: before taking GothAlice's advice (which is good), I would think about what kind of query patterns you want. You want to structure your data so it's easy to make the queries you want.
[01:31:35] <stefandxm> GothAlice: but luckily I am escaping now, just convinced a friend of mine to go with me to Stockholm for the weekend. ie I'll have him solve it in the car ;-)
[01:32:00] <GothAlice> … and then forget it after the parties, eh? ;)
[01:32:04] <stefandxm> but really iam just kidding myself. still stuck ;(
[01:32:55] <stefandxm> so; no warnings.. still UB all over it
[01:33:00] <annakarin> joannac: I probably have to think a lot about it, it's the first time I try to create a database. but, is it really impossible to find the _id of the taxRates-file ? if I could make it work with my current solution, I could test some other stuff before redesigning
[01:33:40] <GothAlice> stefandxm: GCC or LLVM? ('Cause LLVM static analysis is awesome sauce and can detect potential logic errors that would only be symptomatic at runtime. Including threading issues.)
[01:34:43] <GothAlice> I might try the static analyzer. It can work wonders, but if you do low-level ASM it can be a bit touchy. (I had to give up on LLVM for my kernel project, sadly.)
[01:34:43] <stefandxm> maybe even clang can give me an (obvious) warning
[01:35:06] <joannac> annakarin: find, yes. return just that _id? no
[01:37:57] <GothAlice> Not with an appropriate editor. IRC is wearing out my fingers. XD
[01:46:56] <annakarin> joannac & GothAlice: I will re-structure the whole database. I've learnt a lot from trying to build my first attempt. Thnx for all the help !
[02:01:19] <darkblue_b> " MongoDB is provided completely new library for driver called MongoDB's Meta Driver. So I have added support of that driver. Now compile time option is available to use legacy and Meta driver."
[02:04:58] <annakarin> jaraco: I thought of something: can I name a collection dynamically ? like, store all user data of account_id in its own collection ?
[02:05:28] <GothAlice> annakarin: That is an approach some take, yes.
[02:05:47] <GothAlice> There are limits on the number of collections in a given database, however.
[02:05:54] <GothAlice> (But you can adjust this, I believe.)
[02:07:34] <GothAlice> (_id is always indexed; working out the math for that is left as an exercise for the reader.)
[02:07:47] <GothAlice> "Number of Namespaces: A 16 megabyte namespace file can support approximately 24,000 namespaces. Each collection and index is a namespace."
[02:08:35] <annakarin> GothAlice: you don't happen to know how to name a collection after an already defined variable ? if I want to name it after var TAX_BLOB[2].account_id
[02:08:56] <GothAlice> stefandxm: A collection is a namespace. An index is also a namespace. Every collection has at least one index, on _id.
[02:09:03] <stefandxm> GothAlice: just doesnt make much sense =)
[02:09:28] <stefandxm> i mean, putting size on a btree
[02:09:37] <GothAlice> stefandxm: That segment of the documentation is actually quite explicit, and gives you every single bit of information you need to know about that limit, how to adjust it, and what it affects. ;)
[02:09:54] <stefandxm> it's very implicit imo, and it's most hopefully wrong.
[02:10:31] <joannac> stefandxm: how would you make it more explicit?
[02:11:08] <joannac> I can assure you that section is 100% correct the way GothAlice explained it
[02:11:24] <GothAlice> stefandxm: 16 MB is the default size, adjusted with the nsSize option, they can't be bigger than 2047 MB, a 16 MB namespace file holds ~24K namespaces by virtue of each namespace taking 628 bytes. Each sentence of that doc says one of these things. ;)
[02:11:41] <joannac> stefandxm: why? it's not exactly 24000 namespaces
[02:11:52] <GothAlice> It literally couldn't be more explicit. XD
[02:12:31] <stefandxm> i don't see what the _id was all about?
[02:12:40] <stefandxm> the namespace is only an index tree, no?
[02:12:41] <joannac> every collection has an index on _id
[02:13:05] <GothAlice> stefandxm: As each collection takes a namespace, and each index takes a namespace, and each collection always has at least one index, the maximum number of collections is one half the number of namespaces if no other indexes are defined.
[02:13:08] <joannac> so for every collection you create, you use up one namespace for the collection, and one namespace for the index
[02:13:15] <joannac> dammit GothAlice you type faster than me
[02:13:29] <GothAlice> 3,428,810 namespaces is the absolute maximum at the maximum size, thus 1,714,405 collections maximum.
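For reference, the arithmetic GothAlice leaves as an exercise, sketched in Python. The 628-byte figure is the one quoted above; the documented ~24,000 is lower than the raw division presumably because the namespace file is used as a hash table and cannot be filled completely.

```python
NAMESPACE_ENTRY = 628                 # bytes per namespace, per the figure above

default_ns_file = 16 * 1024 * 1024    # default 16 MB namespace file
print(default_ns_file // NAMESPACE_ENTRY)       # 26715 raw slots; docs quote ~24,000 usable

# Every collection carries at least its _id index, and each index is also a
# namespace, so the collection ceiling is roughly half the namespace count:
print(default_ns_file // NAMESPACE_ENTRY // 2)  # ~13,000 collections at 16 MB
```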
[02:47:18] <GothAlice> So your collection name could be "user.objectid" (where objectid is an actual ObjectId. E.g. from your "members"/"accounts" overview collection.)
[02:47:37] <GothAlice> (Well, the string representation of the ObjectId…)
[02:50:22] <GothAlice> annakarin: Seriously, though. Meditate hard on this design.
[02:51:04] <joannac> for example, if you ever need to find more than one person's accounts, then this design is not so efficient
[02:51:30] <GothAlice> The hardest part of MongoDB is thinking like a… mongol? Mongoista? Huh. ;P
[02:54:11] <GothAlice> annakarin: And as joannac mentioned earlier, it really matters how you plan on *using* the data, not on what you think the "best design" for it is. Pre-aggregating statistics opened my mind quite a bit on this subject.
[03:02:35] <stefandxm> GothAlice: thats so not mongodb specific :)
[03:08:38] <annakarin> GothAlice: I'm prototyping right now and testing an app-idea, will think a lot about it as my app evolves.
[03:08:59] <annakarin> GothAlice: but returning to the first question, how do I get the _id of a sub-array ? http://pastebin.com/fynLkLGp
[03:10:16] <GothAlice> You don't, you get the array, a slice of the array, or a specific element from the array that $elemMatch-es a query.
[03:10:39] <joannac> scratch that, what GothAlice said
[03:10:43] <GothAlice> joannac: I went for a conservative definition of "get".
[03:12:48] <stefandxm> GothAlice: are you hired to sit here? are you > 1 persona?
[03:13:09] <GothAlice> annakarin: Every query you ever run through find() will return top-level documents from a collection. You can select a subset of fields to return and use $slice to get a specific element or range of elements by index, $elemMatch to filter down to one specific element by other criteria. (Map/reduce and aggregation do things differently by default.)
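A PyMongo sketch of the projection operators mentioned, against a hypothetical collection of thread documents with a "comments" array:

```python
from pymongo import MongoClient

db = MongoClient().test

# $slice in a projection: return only a range of array elements by index.
doc = db.threads.find_one({}, {"comments": {"$slice": [10, 5]}})  # elements 10..14

# $elemMatch in a projection: return only the first element matching criteria.
doc = db.threads.find_one(
    {"comments.author": "alice"},
    {"comments": {"$elemMatch": {"author": "alice"}}},
)
```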
[03:13:28] <stefandxm> annakarin: are you scouting with automatic weapons?
[03:13:36] <darkblue_b> .. sees BSON def for the first time
[03:13:45] <joannac> stefandxm: no one is hired to sit here. this channel is completely volunteers
[03:13:50] <GothAlice> darkblue_b: It's neat, if you can read BNF. ;)
[03:14:09] <GothAlice> stefandxm: And no, I'm just an AI. Nothing to see here, move along.
[03:14:12] <stefandxm> joannac: ok. so noone here is hired by mongodb :-)
[03:24:19] <stefandxm> but then again. you are AI so who cares ;)
[03:25:12] <stefandxm> i've already stated that the C++ Doc / tutorial is a joke
[03:27:05] <GothAlice> (Which is what I told you the last time I gave you that link. ;) Looking over it a bit, I see that you need to initialize the driver with an Options instance—most of the meat is in the Options class doc for that—then construct a ScopedDbConnection, cleaning up with a call to ::done, then ::kill when finished with it.
[03:27:11] <GothAlice> Less of a joke considering I just learned C++ using it.
[03:27:55] <darkblue_b> .. "declare this is a config db of a cluster; default port is 27019"
[03:28:04] <darkblue_b> .. "declare this is a shard db of a cluster; default port is 27018"
[03:28:15] <darkblue_b> so.. what might port 27017 be ?
[03:28:30] <Boomtime> standalone mongodb, or mongos
[03:31:24] <GothAlice> darkblue_b: But it's so fun, and I have tasty treats! ;^P
[03:31:46] <stefandxm> GothAlice: now lets introduce RAII shall we? :)
[03:35:58] <darkblue_b> I will stick with a single node now.. first time through...
[03:36:43] <GothAlice> darkblue_b: Just debugging the changes to my automation script and have the gist open on a blank entry, ready to go. Should take a few minutes, and will demonstrate how one could set everything up. :)
[03:38:07] <stefandxm> GothAlice: and while we are at it.. lets ask why we have an endless Or() ;)
[03:45:21] <darkblue_b> why would /etc/mongod.conf say dbpath=/var/lib/mongodb .. but starting with mongod says ERROR: dbpath (/data/db) does not exist.
[03:45:38] <GothAlice> darkblue_b: Did you tell mongod to use that configfile?
[03:51:04] <darkblue_b> next is .. /var/log/mongodb/mongod.log is owned by a user 'mongodb' .. but I started mongod as me.. best practice for a single node?
[03:51:48] <GothAlice> darkblue_b: Run mongod as a system service using your distro's native startup system. I.e. init.d, rc.d, etc.
[03:52:06] <GothAlice> It'll run as its own user (a security precaution) and start automatically.
[03:58:09] <darkblue_b> mongo-c-driver seems to be working...
[03:58:38] <GothAlice> Ugh; without some of the goodies I get to use at work, this automation is substantially harder. (I.e. conveniently waiting for a port to open or close, tailing a file until a pattern is matched, etc.)
[03:59:28] <stefandxm> GothAlice: maybe you should ask one of them bl0kes not wanting to register their own nickname?
[04:00:07] <darkblue_b> GothAlice: how can I help?
[04:29:00] <GothAlice> ^ Spawns an authenticated sharded replica set cluster, locally. Remotely follows the same principles.
[04:29:13] <GothAlice> Also differentiates between first and subsequent (fast) startup.
[04:31:54] <GothAlice> Something wonky with the replica set auth at the moment; just noticed. "note: no users configured in admin.system.users, allowing localhost access" but it doesn't. lul
[04:35:03] <GothAlice> Also, awesome BASH trick: port="${1:-27017}" — default values for positional arguments to functions.
[04:46:31] <darkblue_b> oh you are *so* confused.. magic has to do with spirits and living things.. not the inside of computers ;-)
[04:47:59] <GothAlice> Magic involving somatic components is the only thing keeping my personal machines together. ;) ("Half-level fireball: a hand grenade. Half-level haste spell: a good pair of shoes and the phrase, 'Feet don't fail me now!'")
[04:48:16] <GothAlice> (somatic components = blood, sweat, and tears)
[04:48:58] <darkblue_b> is this script hardwired to a certain number of nodes on the localhost ?
[04:49:08] <darkblue_b> both, certain number, and, localhost ?
[04:49:19] <GothAlice> Yes, but adjust all instances of "1 2 3" with any number of elements you wish in the loops.
[04:49:36] <GothAlice> You could also wrap the rs0/rs1 stuff in a for loop, too.
[04:49:55] <GothAlice> darkblue_b: Also the construct() code.
[04:50:10] <darkblue_b> I m genuinely eager to try this.. its been a long day so so I will simply note what you have said and then return to it fresh
[04:50:30] <GothAlice> My exocortex automation (which this is based on… roughly at this point) executes these commands over parallel SSH.
[04:51:09] <GothAlice> This version is more based on the test suite preface (thus the localhost everything).
[04:51:52] <darkblue_b> suits me very well, as a start !
[04:52:18] <GothAlice> (And most of the informational "echo" commands are actually ebegin/eend macros stolen from Gentoo init scripts with proper error trapping, etc., etc. ;)
[04:53:13] <darkblue_b> lots of sfwr setups have logging trix
[04:53:52] <darkblue_b> one variation of that is to make the logs go to a write-only, secure pipe, that lands in the "security area"
[04:53:53] <GothAlice> darkblue_b: That's how I get my machines to start up to all services running in < 2s for most hosts in the cluster. And distcc distributed compiling of packages (with binary package distribution to the nodes) is gob-smacking. (Linux kernel compiles in 54 seconds with -j64; it takes Ubuntu longer to download and extract the binary ;)
[04:54:19] <GothAlice> That being Gentoo, not secure logging. ;)
[04:58:11] <darkblue_b> I went with a package mongo 2.6 btw, and will go fwd with that
[04:58:16] <GothAlice> darkblue_b: Heh; I couldn't imagine adding "retrying" to my package distribution automation. Something explodes, it e-mails me and I manually un-fsck. Anything else is too risky for me. ;)
[04:58:56] <GothAlice> darkblue_b: Some of my optimizations basically require compilation. I have specific PIC/APIC needs relating to pre-calculation of dynamic linking.
[04:59:09] <darkblue_b> the devs and the packagers seem to be two different crowds, in my world
[05:00:00] <GothAlice> I, sadly, basically have to do everything from the hardware up. XD
[05:00:16] <darkblue_b> anyway, this is super splendid.. I am going to quit while I am ahead right now.. but more to come.. very much appreciate todays chats
[05:00:30] <GothAlice> No worries. I hope that script is educational! :)
[05:38:57] <GothAlice> Substantially updated the script; numbers of replicas / shards is now somewhat more easily adjustable. (Still some hardcoded strings, but I'll tackle those some other time.) Stopped it from emitting the password to the log, also now using proper math to calculate port numbers, also verified correct auth behaviour.
[05:44:32] <Streemo> is it a bad idea to have a document in Shard A that references another document that is in Shard B? In general, is it OK to refer to documents that are in different shards?
[05:45:17] <Streemo> by "a document" i mean several, or most documents may inevitably have to refer to other docs that are inevitably in different shards
[05:45:20] <GothAlice> It'll have performance implications if you have a driver that eagerly attempts to load references…
[05:47:56] <Streemo> so your interface to mongodb in <your fav lang>
[05:50:48] <GothAlice> For further reading: http://grokbase.com/t/gg/mongodb-user/122fty6hk7/how-does-mongodb-dbref-work and http://docs.mongodb.org/manual/reference/database-references/#driver-support-for-dbrefs (note the "dereference" references.)
[05:51:42] <GothAlice> http://docs.mongodb.org/manual/reference/database-references/#database-references (and point two from this)
[05:52:20] <GothAlice> (MongoEngine on top of pymongo does auto-deref top-level DBRefs by default.)
[05:55:39] <Streemo> so if i understand correctly, the reference is an id of another doc in some field of my original doc, and calling dereference on that particular reference will return to me the document that the reference refers to?
[05:56:33] <GothAlice> Streemo: From that last link there are two approaches: if the field you are storing the foreign ID in always references the same collection, you don't need DBRefs; you dereference manually. (I.e. findOne() on that ID.)
[05:57:11] <GothAlice> The second approach is to use a DBRef, which is more versatile. It's roughly equivalent to storing {_id: …, collection: …}, and can be dereferenced automatically (or with a dereference() helper.)
[05:58:12] <GothAlice> I tend to avoid DBRefs. They add extra space, and my documents are uniform enough that I can guarantee "owner" (as an example) always references db.Users.
[05:59:50] <Streemo> yeah it seems like they arent needed if i can ensure that a doc always belongs to collection A
[06:00:08] <GothAlice> (At the expense of slightly more work to use the reference.)
[06:00:45] <Streemo> but calling findone isnt too bad
[06:01:25] <GothAlice> (At work we do fancier things; we basically rolled a caching DBRef that also stores arbitrary fields from the foreign document with the reference, for querying purposes. That has the expense of needing to update() the references if a cached field changes.)
[06:02:07] <GothAlice> That last can save needing to do any findOne()… if the value is cached. ;)
[06:02:38] <Streemo> which is desirable, because find one has to traverse the list
[06:02:58] <GothAlice> Well, _id b-tree index, but yeah.
[06:04:19] <Streemo> if i need to reference docs in different collections i could just group them collection-wise in different fields of the original doc, but id still have to use findone
[06:05:29] <GothAlice> … yes, I think. MongoDB doesn't JOIN, so any reference will have to be queried for client-side, adding extra roundtrips.
[06:05:58] <Streemo> unless i send the client the data set and have them query it in the browser
[06:05:58] <GothAlice> (Thus the foreign field caching craziness.)
[06:06:22] <GothAlice> That'd be one approach, I guess. Would you really *want* to do that, though?
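A PyMongo sketch of the two referencing approaches discussed above; collection and field names are illustrative, and the documents are assumed to exist:

```python
from bson.dbref import DBRef
from pymongo import MongoClient

client = MongoClient()
db = client.test

# Approach 1: a plain ObjectId field when the target collection is fixed.
# Dereferencing is a manual findOne on the known collection:
book = db.books.find_one({"title": "Dune"})
publisher = db.publishers.find_one({"_id": book["publisher_id"]})

# Approach 2: a DBRef, which also records the collection (and optionally
# the database), and can be dereferenced generically:
db.books.update_one(
    {"_id": book["_id"]},
    {"$set": {"publisher_ref": DBRef("publishers", publisher["_id"])}},
)
book = db.books.find_one({"_id": book["_id"]})
publisher = db.dereference(book["publisher_ref"])
```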
[06:13:06] <Streemo> bit too much recursion in that video
[06:13:44] <GothAlice> It's one of the best (and fastest) overviews I've ever seen. I <3 vihart. (Her logarithms video I basically beat my friends over the head with.)
[06:15:49] <Streemo> funny how most analysis courses start with epsilon and delta. if i had to teach, i'd say: Students, watch this video, class dismissed for the first day of class
[06:15:51] <GothAlice> https://www.youtube.com/watch?v=lA6hE7NFIK0 < this one is all about how some infinities are bigger than others, which was more specifically the one I was looking for.
[06:16:51] <Streemo> ah i loved the end of her video
[06:23:03] <GothAlice> How many high school formulae do you recall that involve 2π? Basically all of them, and the one that used π straight can be derived differently to avoid using 1/2 tau.
[06:23:14] <GothAlice> Streemo: vihart gets excited about things. ^_^
[06:23:26] <Streemo> its so funny though because she read my mind
[06:24:42] <GothAlice> Streemo: The pi vs. tau thing also has an effect on teaching maths. How far around a circle are you travelling (in percentage) when I say 0.5π radians?
[06:26:22] <Streemo> but this is done all the time when teaching radians versus degrees, no?
[06:26:29] <GothAlice> Basically, all of the formulae were derived on a shaky premise, that the radius was most important. And almost all formulae have glue to correct that mistake.
[06:27:16] <GothAlice> Streemo: You could teach tau radians a few years earlier due to the simplification. ;^P
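The radians point, spelled out as a small aside (using the usual definition τ = 2π):

```latex
C = 2\pi r = \tau r,
\qquad
\text{fraction of a full turn for } \theta \text{ radians} = \frac{\theta}{\tau},
\qquad
\frac{0.5\pi}{\tau} = \frac{0.5\pi}{2\pi} = \frac{1}{4}.
```

So 0.5π radians is a quarter turn; expressed in τ, 0.25τ radians reads off directly as 25%.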
[06:27:30] <Streemo> but i sure like using the radius when doing integrals in non cartesian coordinates , i wouldnt ever use the diameter there
[06:47:46] <GothAlice> Also yeah, I flipped the radius/diameter thing in all of that 'cause sleep. Is needed. So I'll go do that now. Have a great night everybody!
[09:44:29] <remonvv> Hi all. Did anyone tackle the huge performance drops every time mongod flushes to disk? We're getting stuff like this and during that the performance is reduced by 80% or so : [DataFileSync] flushing mmaps took 37586ms for 46 files
[09:44:56] <remonvv> As in, are there settings we can fiddle with or will TokuMX help with making this more constant?
[11:05:20] <calmbird> Hi :) Do you know any good way to lock a mongodb document against concurrent reads? I have a balancer and a few node.js servers, and I don't want a second process to read a document until the first one finishes operating on it.
[11:19:14] <calmbird> Should I use data server between mongo and my app servers, etc? hmm
[12:26:42] <tscanausa> What is the fastest way to delete data?
[12:27:53] <Depado> Hit your hard drive with a hammer ?
[12:28:28] <tscanausa> thanks, I will add them to my list of options
[12:55:20] <annakarin> hi, I'm still trying to learn MongoDB. my documents contain arrays within arrays, and I'm trying to figure out how to make them searchable. has anyone ever used an _id-naming system like this: http://pastebin.com/A0hKdvub ?
[13:04:51] <docdoak> any idea why, when I use mongoimport on a tsv file, the first non-header row's first item comes in as "\"Washington, D.C.\"" (including the quotations) instead of just "Washington, D.C."? Is this a utf thing?
[13:12:06] <calmbird> Do you know any good way to lock a mongodb document against concurrent reads? I have a balancer and a few node.js servers, and I don't want a second process to read a document until the first one finishes operating on it.
[14:23:03] <annakarin> hi I asked this yesterday too: if I want to .find() a document that contains 4 elements, a: "dog", b:"cat", c:"horse", and d:"wolf", how do I write that query ? flock.find({ $elemMatch: { a: "dog", b:"cat", c:"horse", d:"wolf"}}) ?
[14:23:28] <annakarin> have tried to get it to work for like hours
[14:24:24] <annakarin> now, it returns all documents with a: "dog", and all documents with b:"cat", but I want the document that has all those 4 animals in it
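For reference: a filter with several top-level fields is already an implicit AND, and $elemMatch only applies to elements of an array field. A PyMongo sketch, assuming the animals are top-level fields as written:

```python
from pymongo import MongoClient

db = MongoClient().test

# Top-level fields: this matches only documents carrying ALL four values
# (an implicit AND), which is what the question asks for:
db.flock.find({"a": "dog", "b": "cat", "c": "horse", "d": "wolf"})

# $elemMatch is only meaningful against an array field, e.g. if the
# animals lived in a hypothetical "animals" array of sub-documents:
db.flock.find({"animals": {"$elemMatch": {"a": "dog", "b": "cat"}}})
```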
[14:29:04] <doug1> Is there a way to always allow admin to connect from localhost without auth?
[14:36:40] <niczak> Quick question, if I have a 2.4 Mongo instance on one server and I want to start replicating to a 2.6 instance on another server, is that going to cause major issues? Is it even possible?
[14:37:09] <tscanausa> It should not be an issue. but I would update to 2.6 as quickly as possible
[14:37:31] <niczak> Yeah, we are actually migrating away from the server running 2.4 so we just want to replicate over to the 2.6 and then shut down the server running the 2.4 instance.
[15:08:41] <Nomikos> I'm getting the "not master and slaveOk=false" error on what I /thought/ was the master, I have not changed any configuration or restarted things.. what could cause this?
[15:09:17] <Nomikos> there are 2 other mongodb servers running, and the webapp seems to be doing ok (so far), but I have no idea where to look for what went wrong
[15:12:42] <johanwhy> beginners-question: if my database has a lot of similar documents, and I want to find the document that matches 4 elements exactly, should I use .find({ a: "", b: "", c: ""}) or .find({$elemMatch: {a:"", b:"", c:""}}) ? have spent 3 hours on this
[15:13:30] <johanwhy> .find({ a: "", b: "", c: ""}) works in robomongo, but not in mongoose
[15:36:49] <Nomikos> what would make a mongodb master turn into "not master" ?
[15:41:26] <cheeser> ping timeouts with secondaries making the primary think the network has split
[15:42:29] <Nomikos> cheeser: ping seems to work fine atm, when I restart mongodb it remains secondary, does it rewrite its config file when network issues happen?
[15:42:53] <cheeser> there's nothing about primary in the config file
[15:43:05] <GothAlice> Nomikos: If you have only two shards, no election is winnable.
[15:43:14] <GothAlice> Nomikos: You should add an arbiter, I believe.
[15:43:42] <cheeser> s/shards/replica set members/
[15:44:24] <GothAlice> Ah, sorry, read "2 other" as "2". XD
[15:47:29] <VeeWee> Is there some cli tool I could already use to run this functionality: https://jira.mongodb.org/browse/TOOLS-121
[15:49:45] <Nomikos> cheeser: this is from the second mongodb server: http://pastebin.com/raw.php?i=HUu2w6j4
[15:50:52] <Nomikos> so does that mean there was a network issue and it decided to elect itself king?
[15:53:18] <GothAlice> Nomikos: In the event that the connection to the primary is lost the remaining nodes will re-negotiate a new primary. When the connection to the original primary is restored it'll call for an election, since what *it* saw was losing its connection to its secondaries; the nodes will then talk amongst themselves to identify who has the latest data, a winner is picked, and things resume.
[15:54:13] <GothAlice> Nomikos: This infrequently happens to my own data (maybe once a month) during network maintenance, usually. Sometimes my rack provider just likes power cycling things. ;)
[15:54:33] <Nomikos> GothAlice: okay. it's.. just that it happened right after I did some manual inserting/duplicating, and the last time they went down that was because of an invalid mongo ID value being replicated
[15:54:39] <GothAlice> (So this is a situation you should expect and handle.)
[15:54:59] <GothAlice> Invalid ObjectIds would be a problem, yeah.
[15:55:02] <Nomikos> I couldn't find anything pointing to that this time
[15:55:05] <cheeser> i don't think an election is called when the split off nodes reconnect
[15:56:02] <GothAlice> cheeser: I've had cases where the isolated primary still had fresher data than the newly elected primary after the split; I've explicitly noticed the old primary crying foul and being re-elected.
[15:56:55] <Nomikos> hrm.. there's some iffy spikes in the statistics graphs >.<
[15:59:00] <cheeser> GothAlice: hrm. i'd expect a rollback in that case.
[15:59:16] <cheeser> but perhaps it can detect that no writes happened in the interim...
[15:59:35] <GothAlice> cheeser: That was certainly the case for the dataset on that cluster. Very read-heavy.
[16:02:15] <GothAlice> cheeser: That behaviour saved me having to update my DNS. (primary.db.example.com was only not the primary for about a minute. ;)
[16:22:35] <VeeWee> Is there an easy way to upsert documents from one database to another? Need to restore some data that has been removed from a database.
[16:26:59] <Nomikos> cheeser, GothAlice: thanks for the pointers!
[16:27:59] <GothAlice> ianp: No, on my production cluster both mongostat and mongotop spin up and start producing output in a few seconds.
[16:28:27] <ianp> Yeah, that instance is under super heavy load. I think that's why. thanks
[16:29:02] <annakarin> I solved it with mongoose query.and, http://mongoosejs.com/docs/api.html#query_Query-and
[16:34:30] <chasep_work> can you have two arbiters in a replica set?
[16:40:38] <chasep_work> I don't know... I might not.... we're doing some migrations and maintenance, and I thought I had worked out a scheme where it made sense, but, now it doesn't
[16:40:46] <chasep_work> so I'm not sure what I was thinking
[16:43:27] <chasep_work> basically, we have a VM cluster... it contains three servers which each host a member of the replica set.... as part of the maintenance, we're migrating some of the VMs (basically, half of any HA pairs) to a VM host outside of the cluster.... mongo is the only instance with three members, but, I think the best way to handle it is treat it like a geographically redundant replica set
[16:43:57] <chasep_work> even though, technically, they are on the same network and about a foot away from each other
[16:44:43] <GothAlice> chasep_work: Yup. That's actually how I handle live offsite streaming backups. (It's a secondary, off-site.) Would also work for migration—spin up new, let it settle, stop the old.
[16:46:27] <chasep_work> well, the cluster is hosted on a fancy VRTX, nice, all-in-one chassis. We need to do some firmware upgrades, so the entire system is coming down... hence why we are "migrating" stuff to another VM hosted outside the chassis and cluster.... technically, I can move things over to the non clustered host, and just let it run there, and then move things back next week
[16:46:55] <chasep_work> but, I'd rather it just work like an HA system......
[16:47:41] <chasep_work> after everything is stable again, and we're confident we aren't going to need to do anything that requires bringing the entire chassis offline, then I'll move what's needed back over to the cluster
[16:49:21] <chasep_work> in fact, I'll probably clone one of them to be the priority 0 member on the non clustered host, then turn one of the members in the cluster into an arbiter.
[16:59:51] <chasep_work> Question... if I have 2 data and 1 arbiter at site A, and 2 data at site B..... and site A goes down.... how does site B elect a primary?
[17:02:41] <chasep_work> ok, nevermind... I misunderstood how the priority 0 members worked..... what I'm looking to do is, basically, have something that acts as a "hot standby"... only if all other members in the set are down, will it become the primary
[17:04:28] <GothAlice> chasep_work: AFAIK, set the priority of the main cluster to be, say, 10, and have the priority of the hot standby as 5. As long as priority 10 members still live, the hot standby will remain a secondary.
[17:05:51] <GothAlice> Also waitaminute; you're running your primary "cluster" on a single physical chassis? I hope I heard that wrong… T_T
[17:06:57] <chasep_work> single chassis, multiple blades, giving two physical hosts and all that fun stuff... first part in getting an offsite DR setup in place.
[17:09:26] <GothAlice> chasep_work: Good, at least that situation will improve a bit. ;) Single points of failure are scary.
[17:10:17] <chasep_work> okay, forget the geographic redundancy, and hot standby and all that. Best idea, keep the three data members on the clustered hosts. Spin up two new instances on the non clustered host - then, when the three members on the clustered host go down, one of the two on the non clustered host will become primary
[17:10:35] <chanced> hey fellas, is it possible to take records and aggregate two fields into two records?
[17:11:02] <darkblue_b> chanced: can you restate that ?
[17:13:31] <GothAlice> Unfortunately not really. Could you pastebin an example document and an example search to auto-complete on that would match the example?
[17:14:20] <chanced> without context it wouldn't make a whole lot of sense
[17:14:28] <chanced> i'm building a real estate platform
[17:14:28] <GothAlice> Right now there is even less context. ;)
[17:14:40] <chasep_work> no, that won't work, because it needs a 3 member majority.... I'm thinking the only way to do this is just bring up a 3 member set on the non-clustered host, and shutdown the ones on the clustered host
[17:15:00] <chasep_work> I can move them over one at a time to prevent interruption
[17:15:18] <GothAlice> chanced: Unfortunately I can't help querying something I can't see.
[17:15:28] <chanced> what i want is to have a list that contains the MLS (basically the props unique id), the address, neighborhood, etc
[17:16:06] <GothAlice> chasep_work: Why not have three nodes all running on the foreign host all with a lower priority? (Or one data host and two lower-priority arbiters, maybe?)
[17:17:02] <chasep_work> GothAlice: you need an odd number of VOTERS, correct?
[17:17:37] <GothAlice> chasep_work: I believe so, yes.
[17:18:26] <chanced> GothAlice: here's a record: https://gist.github.com/anonymous/e6700dc7422ff11a6230
[17:19:01] <GothAlice> chanced: Perfect. What fields are you trying to auto-complete on, and in what form do you want the results?
[17:19:10] <chasep_work> GothAlice: so, how can i distribute 3 or more data members across 2 physical "locations" so that there is an odd number of voters, and still ensure that, if one "location" goes totally offline, the other location can elect a primary?
[17:19:47] <chanced> i essentially want the street address, mls, neighborhood (not present in this one because theres horrible data integrity)
[17:20:04] <chasep_work> actually, it only needs to be 2 or more data members
[17:24:32] <chanced> darkblue_b: i'm sorry to hear that :)
[17:24:34] <GothAlice> chasep_work: You could set the priority of the node you want to become "primary" (on the backup host) at, say, 5 (< the 10 of the main host, so if any of those live they take priority), and set the priority of a "secondary" process on the backup host to zero. (It'll never win. Optionally also set it hidden so it won't be queried.) Finally, have one arbiter on that host, too.
[17:24:59] <GothAlice> chasep_work: Seems that "odd number" issue has changed since the last time I looked at it. Couldn't find "odd" in the docs anywhere.
[17:25:19] <chasep_work> well, if you have an even number of voters, then you could end up with a split vote for the primary
[17:25:33] <GothAlice> chanced: Often the act of clarifying a question can provide an answer. :) No worries.
[17:26:12] <chanced> darkblue_b: i'm about to finish up a local broker's site i built. I've considered hustling it elsewhere but dealing with RETS was a nightmare
[17:26:27] <GothAlice> chasep_work: Not in this setup, you can't.
[17:26:42] <darkblue_b> surprise! yeah I used a few libs
[17:27:14] <chasep_work> priority 0 members can vote, but, never be voted for... right
[17:27:24] <chanced> and none of the libs i could find would work with it
[17:27:33] <chasep_work> so, really, it's not a matter of how many VOTERS there are, but, how many can be voted for
[17:27:34] <GothAlice> chasep_work: There would be three priority 10 replicas. As long as one of them is alive, one of them will be primary. There would additionally be a priority 5 (backup primary) replica, a priority 0 (backup secondary) replica, and an arbiter.
[17:28:13] <chasep_work> and, I really dont care about the "hot standby" so the priorties (except for the priority 0) don't really matter
[17:28:23] <darkblue_b> chanced - on the same page again... rets-version=rets/1.5
[17:28:43] <GothAlice> So worst-case you have one priority 10 primary, a priority 5 secondary, a priority 0 secondary, and an arbiter that will feel foolish. (Only the priority 10 boxen can win.)
[17:29:42] <GothAlice> chasep_work: You'll need to ensure the hot standby can't naturally become the primary during normal operation, thus the higher priority on the main cluster.
[17:30:43] <GothAlice> This will matter for performance reasons if the backup boxen is higher-latency (i.e. offsite or in a different rack, etc.)
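A sketch of the member configuration GothAlice describes, as a document passed to replSetReconfig via PyMongo; hostnames, ports, and _id values are invented, and the version field must be bumped past the running config's:

```python
from pymongo import MongoClient

# Hypothetical layout: three priority-10 members on the main chassis, plus a
# priority-5 hot standby, a priority-0 hidden secondary, and an arbiter on
# the backup host. Only the priority-10 members can win while any survive.
config = {
    "_id": "rs0",
    "version": 2,  # must be one greater than the current config's version
    "members": [
        {"_id": 0, "host": "main1.example.com:27017", "priority": 10},
        {"_id": 1, "host": "main2.example.com:27017", "priority": 10},
        {"_id": 2, "host": "main3.example.com:27017", "priority": 10},
        {"_id": 3, "host": "backup.example.com:27017", "priority": 5},
        {"_id": 4, "host": "backup.example.com:27018", "priority": 0, "hidden": True},
        {"_id": 5, "host": "backup.example.com:27019", "arbiterOnly": True},
    ],
}

MongoClient("main1.example.com").admin.command("replSetReconfig", config)
```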
[17:31:04] <chasep_work> well, like I said, we're on the same network, so, the hot standby idea isn't really necessary and more trouble than it's worth...
[17:31:58] <GothAlice> chasep_work: True. :) Seems we've rolled back around to simply having a physically diverse (i.e. multi-chassis) replica set.
[17:33:34] <chasep_work> so, i can leave the three data members AS-IS on the clustered host, then spin up three (2 data and one arbiter) on the non clustered host
[17:36:36] <GothAlice> chanced: … why are you storing 3 as the string "***" on "baths"? XD Some of this data will make efficient querying difficult. There's also a nice querying shortcut I use on housing data these days: sum the total number of rooms (a kitchen being 1 if full, 0.5 if one of those apartment wall affairs, bathrooms are 0.5). Makes certain types of sorting easier, if not wholly accurate. For example, I live in a 4.5 apartment. :)
[17:36:50] <chanced> GothAlice: oh, that's not my doing
[17:37:24] <chanced> here's a source doc: https://gist.github.com/anonymous/cff86c263d761dcc2a0d
[17:37:33] <chasep_work> GothAlice: back to the odd number thing. It's an odd number of VOTERS and/or an odd number of CANDIDATES.
[17:37:41] <chanced> the ***s are fields that are hidden
[17:37:54] <chasep_work> either (or both) ensure that a primary can be elected
[17:38:30] <GothAlice> chasep_work: Again, I couldn't find a single reference to the word "odd" anywhere in the replication documentation.
[17:38:37] <chanced> GothAlice: but thank you for pointing that out. I guess they locked down another field from my view
[17:41:15] <chasep_work> GothAlice: from the section on Replica Set Arbiter "Only add an arbiter to sets with even numbers of members. If you add an arbiter to a set with an odd number of members, the set may suffer from tied elections. "
[17:41:17] <GothAlice> chasep_work: You *seem* to need an odd number of votes. It just makes sense, but during failure scenarios you can't guarantee that at all.
[17:41:47] <chasep_work> I guess, in the event of a tied election, it'll just keep voting until someone wins
[17:41:52] <chasep_work> which will eventually happen
[17:41:55] <GothAlice> chasep_work: A ha, third level deep on that page. Good to know.
[17:42:46] <GothAlice> chasep_work: Your setup has the side-effect of needing a minimum of three votes on the backup host for a "total failure of the first machine" situation.
[17:43:45] <chasep_work> yes... at least, to prevent a possible tied election... not seeing anything that says what happens on a tie
[17:43:51] <chasep_work> if it just goes "oh well, we give up"
[17:43:56] <chasep_work> or keeps voting until there is a winner
[17:44:16] <GothAlice> So, a simple solution would be to have three nodes and an arbiter on the primary chassis, and two nodes and one arbiter on the backup.
[17:44:34] <GothAlice> Fully operational has an odd number of votes, fully failed on the primary chassis also has an odd number.
[17:45:31] <chasep_work> don't need the arbiter on the primary..... 3 data primary; 2 data, 1 arbiter secondary..... all up, there are 5 candidates, primary fails, there are 3 voters... secondary fails, there are 3 candidates
[17:46:17] <chasep_work> arbiter on the primary doesn't HURT, but, doesn't help either
[17:47:41] <chasep_work> so, there is NO way to prevent tied votes
[17:48:03] <chasep_work> three scenarios - primary and secondary are up, primary is up, secondary is up
[17:48:13] <chasep_work> you can only prevent possible ties in 2 of the 3
[17:48:28] <GothAlice> You architect around it so that your "100% OK" setup has an odd number of votes, and your "degraded" setup has an odd number votes, too. Half-way through a failure you can't really control how many hosts are reachable, and thus how many votes there are.
[17:49:52] <chasep_work> our secondary is temporary, to provide support while the primary is down.... so, a scenario where the primary is up and the secondary is down won't happen (until we make it happen, at which point we can adjust members on the primary)
[17:51:27] <GothAlice> Another thing I can't find in the docs is an exact description of what happens to votes during a failure; is there any voting at all if there is a clearly more "fresh" secondary to elect?
[17:52:09] <chasep_work> I can't find anything that says what happens if there is a tie - do they just keep voting (my assumption) or do they give up and go home
[17:52:53] <chasep_work> I'm pretty sure I read (and if not, I've seen) that on a failure (whether secondary or primary) a new primary is elected
[17:53:00] <GothAlice> chasep_work: From http://docs.mongodb.org/manual/core/replica-set-elections/ — "If a majority of the replica set is inaccessible or unavailable, the replica set cannot accept writes and all remaining members become read-only."
[17:53:17] <GothAlice> chasep_work: You may need the same number of nodes on both chassis.
[17:54:28] <chasep_work> you know what, this is a temporary thing.... I'm just going to move the 3 members over, one at a time, and let it run off the secondary
[17:54:36] <chasep_work> then next week, I'll move them back
[17:55:14] <GothAlice> XD Simple is often best, but really, you should try to avoid having a single point of failure. (One physical chassis like that is risky, as you're noticing by having to migrate things off.)
[17:55:45] <chasep_work> GothAlice: well, its a special circumstance... normally only one host in the chasis would need to be down at a time
[17:56:24] <chasep_work> but, there have been some issues, and, this is actually the last attempt to fix them, before we tell dell to find us something different
[17:56:40] <chasep_work> and, like I said, we will have off site DR location
[17:56:57] <chasep_work> and, in that scenario, things are fine. primary and secondary NEVER run at the same time
[17:57:38] <chasep_work> so, secondary can be an exact duplicate of primary (also, we're just using this as session storage, so, if we failover to secondary that isn't in sync, worst that happens is people have to log in again)
[17:58:15] <GothAlice> As I thought, BTW, candidates are ordered by optime and can't become primary unless they are > everyone else, eliminating the reelection issue on most failures.
[17:59:26] <chasep_work> still, a 3 member replica set, primary fails.... i don't know of any way to prevent the possibility of a tie when electing a new primary
[18:00:21] <GothAlice> chasep_work: It avoids it pretty well; one of the secondaries will have a optime that's ahead of the other.
[18:00:58] <GothAlice> (Even if it's only ahead by a few millis.)
[18:01:47] <chasep_work> well, I'm going to move a secondary first tomorrow, then the other secondary, and, finally, if the remaining server is still primary, I'll have it step down
[18:02:05] <chasep_work> so, that should make things even safer
[18:02:34] <chasep_work> we've definitely had some instances where 1 member went offline though, and there were never any issues... so, it must be doing something right
[18:04:50] <GothAlice> I've been running MongoDB in a replica set for… many years now. (Part of the reason my memory is flaky on this stuff is that I set it up so long ago and it just keeps on truckin'.) Never had an election issue despite a half dozen failures (unexpected dom0 maintenance, etc.) in that time.
[18:05:38] <GothAlice> ^ One of the secondaries. ;)
[18:07:22] <chasep_work> We've only been using it for about a month or two.... There are issues sometimes, especially if there is a server restart (even more so if it's unexpected) where mongo won't start. I've found the easiest solution is just to clear out /var/lib/mongodb/, start up one member and rs.initiate(), then start the others and rs.add. When we're at that point, other things aren't working anyway, so, no big deal to lose the data (like I said, just session data anyway)
[18:08:33] <GothAlice> chasep_work: Yeah, my DB hosts don't use permanent storage, it's all ephemeral. Hosts don't survive reboots, they destroy themselves and the management process spins up a clean node to add to the set.
[18:09:27] <chasep_work> GothAlice: probably will look into something like that at some point.... but, it's not that urgent right now...
[18:10:21] <GothAlice> ^_^ Automation becomes important when you get up to two dozen nodes and 24TiB of data…
[18:13:41] <ianp> we run a replicaset with only 2 members in production. Does this make any sense?
[18:14:03] <ianp> (I am reading http://docs.mongodb.org/manual/core/replica-set-architectures/ )
[18:14:04] <GothAlice> ianp: It gives you a backup and a way to offload reads from the primary, but it won't be fault tolerant.
[18:14:44] <ianp> Does it automagically balance read-only requests?
[18:14:57] <GothAlice> ianp: That's up to how you configure your client driver (i.e. pymongo).
[18:14:59] <ianp> if we configure the client to point to both? (using grails/java)
[18:15:05] <GothAlice> You have to inform it of the existence of a cluster.
[18:15:31] <ianp> Yea we configure the connection as a 'replica set' to the client driver and pass all the hostnames. but I guess I need to read up on its behavior, it may just have failover.
[18:16:06] <GothAlice> ianp: Well, each query you run can have a read_preference indicating if you're OK if the data comes from a secondary. That'll determine which connection the driver uses for the query.
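A PyMongo sketch of what that looks like (the Grails/Java driver exposes the same concept under similar names); hostnames and the set name are placeholders:

```python
from pymongo import MongoClient, ReadPreference

# Tell the driver about the set; it discovers members and routes queries.
client = MongoClient(
    "mongodb://host1.example.com,host2.example.com/?replicaSet=rs0"
)
db = client.test

# Reads that are acceptable coming from a secondary:
coll = db.get_collection(
    "events", read_preference=ReadPreference.SECONDARY_PREFERRED
)
recent = list(coll.find().limit(10))
```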
[18:17:01] <chasep_work> GothAlice: that's weird, in my opinion, that it just makes it read only. You'd think there would at least be a setting where you could tell it "until there are X members, keep using the one with the longest optime as primary"
[18:18:38] <GothAlice> chasep_work: In the situation where you have two nodes that become partitioned, neither one is able to determine if it has the latest data out of a majority of the nodes, so they panic and go read-only. (Exactly 1/2 a true majority does not make.)
[18:19:14] <GothAlice> The behaviour would be the same in a 3 node setup where all nodes become partitioned.
[18:19:18] <chasep_work> GothAlice: and I understand that and don't have an issue with it being the default behavior... it'd just be nice if you could override it
[18:19:57] <chasep_work> Availability is more important than consistency, so, if data is out of sync, I can live with it, just keep storing stuff for me
[18:20:23] <GothAlice> chasep_work: And you'd potentially lose data randomly during cluster recovery. ;)
[18:20:57] <chasep_work> again, something I'm willing to live with/work around
[18:21:16] <chasep_work> as I said, I wouldn't want that to be the default setting, but, it would be nice if it were an option
[18:27:02] <doug1> Having trouble adding a user to a db...
[18:32:32] <doug1> GothAlice: this isn't the admin user, it's an app user, and the issue here is how to call it
[18:32:35] <GothAlice> doug1: If you haven't set up users, but have enabled authentication, you can use a "localhost" exception to insert the user without having to authenticate (against accounts which don't exist yet) first.
[18:33:29] <doug1> GothAlice: ok, but what database am I connecting to? I thought the localhost exception went away as soon as you added the admin user?
[18:33:33] <GothAlice> Ah, in that circumstance yeah, you'll have to authenticate. You could add "-u admin -p foo --authenticationDatabase admin" to the "| mongo" on that line and change the /admin db reference to your target db.
[19:08:50] <shoerain> for those that use mongoose/javascript, can I filter Model.find({}...) with nested attributes? i.e. http://sprunge.us/cEfZ
[19:11:08] <GothAlice> shoerain: Aye. Also {"topic.included": true} may work.
[19:11:15] <niczak> So this is interesting... I have two replica set members, one of which is set to priority = 0, hidden = true because I want to sync all the data from one set to that one. Problem is that nothing seems to be syncing but when I run rs.status() the optimeDate values are equal. When I login to the newly provisioned server that should be getting the data sync'd over I don't see any collections. Thoughts?
[19:27:40] <shoerain> Well, I got the model wrong. it's actually a list of topics... Is this the valid way to do it? Campaign.count({topics: [{ included : true }]}, function (err, count) { ... });
[19:28:11] <shoerain> I should be getting >0, but I get 0 regardless of the value of included, so I guess something else is off.
[19:29:06] <GothAlice> shoerain: Try my suggestion of {"topic.included": true}
[19:29:28] <GothAlice> Might require an $elemMatch, though.
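The difference between the two forms shoerain is mixing, sketched in PyMongo; the "topics" array of sub-documents and the "name" field are assumptions for illustration:

```python
from pymongo import MongoClient

db = MongoClient().test

# Dot notation: matches if ANY element of the array satisfies the clause.
# With several clauses, each may be satisfied by a DIFFERENT element:
db.campaigns.count_documents({"topics.included": True})

# $elemMatch: requires a SINGLE array element to satisfy all clauses at once:
db.campaigns.count_documents(
    {"topics": {"$elemMatch": {"included": True, "name": "sports"}}}
)
```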
[19:30:16] <Streemo> so i was reading the manual, and I came across the "publisher/books" embedding vs. referencing example, and i can either give every book a publisher_id property, or give every publisher a books_id property which will be an array. What if i want to be able to do something like access a book's publisher from the book AND access a publisher's books from the publisher? Would I just have to have both properties on both documents?
[19:31:40] <GothAlice> Streemo: Separate the queries, or normalize (combine) the data as you suggest.
[19:32:21] <Streemo> what do you mean by option 1?
[19:32:37] <GothAlice> Normalize? Denormalize? Ugh, binary words. Streemo: I go for the separate queries approach so as to reduce my need to spread additional indexes everywhere.
[19:33:24] <GothAlice> "from the book", "access a book's publisher", "publisher's books"
[19:34:12] <GothAlice> That'd translate to a findOne() on db.books, findOne() on db.publishers, and a find() on db.books again. Note that to get the listing of all books for the publisher you can skip the middle part, but likely you want other information about the publisher at the same time.
[19:34:13] <Streemo> but you can only access a book's publisher if i have something like {book: blahblahblah publisher_id:blahblahblah'sPublisher}
[19:39:28] <GothAlice> Streemo: In my Very Large Dataset™ relationships are stored in a very complex way: all documents have a parent ref, parents list of refs, path coalesced string, and a children list of refs.
[19:40:58] <GothAlice> This lets me query for all ancestors (findOne on the record I wish to inspect, pull out the parents list, find() on that list), all descendants (all records whose parents list includes the record I'm inspecting), preserve order of children (findOne on the parent, read out the children list), etc., etc.
[19:41:21] <GothAlice> And makes /foo/bar/baz lookups fast (the path string).
[19:42:46] <GothAlice> Streemo: Before I migrated this from Postgres to MongoDB four years ago, the structure was worse. Adjacency list (parent ref) plus nested set (left/right integers), where left and right are calculated by walking around the outside edge of the tree counter-clockwise. (You could count how many child elements a record has by subtracting the left from the right and dividing by two. ;)
[19:48:23] <GothAlice> Complicated, yes. Complex, less so. But it is very, very query-able because of the apparent duplication of data.
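A sketch of the document shape GothAlice describes and the queries it enables, in PyMongo; field names follow her description, values are invented:

```python
from pymongo import MongoClient

db = MongoClient().test

# Each node stores several redundant views of its position in the tree:
db.nodes.insert_one({
    "_id": "baz",
    "parent": "bar",             # immediate parent ref
    "parents": ["foo", "bar"],   # full ancestor list, in order
    "path": "/foo/bar/baz",      # coalesced path string for direct lookups
    "children": ["quux"],        # ordered list of child refs
})

# All ancestors: read the parents list, then fetch them in one query.
doc = db.nodes.find_one({"_id": "baz"})
ancestors = db.nodes.find({"_id": {"$in": doc["parents"]}})

# All descendants: every node whose ancestor list contains this node.
descendants = db.nodes.find({"parents": "baz"})

# Fast path lookup:
db.nodes.find_one({"path": "/foo/bar/baz"})
```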
[19:51:40] <Streemo> Alice, in your example why couldn't I just do db.books.find({publisher: ThePublisherIAmInterestedIn}). I mean I see that what you have for ThePubIAmIntIn is a query that will return said publisher, but it seems unnecessary
[19:52:08] <Streemo> because presumably i know the publisher i want to find
[19:52:18] <GothAlice> Streemo: You weren't clear on what your initial value was. I read it as having a known book by the desired publisher and not already knowing which publisher it is.
[19:53:12] <Streemo> no initial conditions ... you provided general solution. fair enough!
[19:58:31] <GothAlice> You are retrieving a "scalar value" called "publisher". I.e. http://stackoverflow.com/questions/4752654/select-a-scalar-value-from-a-table for an SQL-world example.
[19:59:19] <GothAlice> The trick to .scalar() in MongoEngine is that you get a queryset/cursor back. So you can continue to filter, get the count, limit/skip, etc. .get() does a findOne, though.
[19:59:48] <Streemo> reminds me a bit of django's orm
[20:00:14] <GothAlice> Streemo: It's heavily inspired. Most declarative schema systems share their behaviour pretty closely, though.
[20:00:49] <Streemo> but i had to ditch python entirely unfortunately
[20:00:52] <GothAlice> Django's query planner does hideous, hideous things.
[20:00:59] <gansbrest> hi. I started 3 mongo boxes and went ahead to initialize the cluster, after doing rs.initiate() and then rs.conf() I got a short hostname in the host field, which could not be resolved by other hosts
[20:01:18] <gansbrest> is there a way make it take full hostname ?
[20:01:47] <GothAlice> gansbrest: At work we use a carefully managed /etc/hosts file to map short names. Ensure that `hostname -f` results in a full publicly available DNS name on each of the boxen.
[20:02:15] <gansbrest> hostname -f works as expected
[20:02:28] <gansbrest> returns fully qualified name
[20:02:41] <GothAlice> What was your rs.conf() line to add the boxen to the set?
[20:03:06] <gansbrest> I just added 2 other boxes like this
[20:08:55] <GothAlice> "It's like" and "it is" is where I'd like you to check.
[20:09:31] <GothAlice> The shortening of the hostname would indicate to me that the first entry after the external IP and/or the first entry after 127.0.0.1 need to be updated to the FQDN for the host.
[20:10:32] <GothAlice> 10.208.x.x primary.db.example.ca \n 10.208.x.x data01.db.example.ca # from mine
[20:11:51] <GothAlice> I also note that I don't even include short names, which may be affecting things. (In production the actual names are kinda hideous and long, being machine generated.)
[20:12:20] <gansbrest> well, I guess it just takes whatever hostname returns
[20:12:30] <gansbrest> hostname command returns short by default
[20:19:41] <GothAlice> Heh; you may be able to inadvertently resolve your issue by adding a search domain to /etc/resolv.conf (i.e. to add the missing dnsdomainname), but getting the full names into the configuration would be optimal. :/
[20:20:30] <gansbrest> yes, I just did reconfig and now I have full names
[20:21:23] <gansbrest> yep, now everything seem to be fine and other nodes can connect
[20:22:37] <GothAlice> Aaaah; I didn't grok that you actually used the short name initially. You gave me "rs.add("mongodb1.example.net")" so I assumed incorrectly.
[20:29:44] <gansbrest> now I need to figure out how to import data from one set to another one
[20:31:08] <clayzermk1> Hello everyone! I would like to use UUIDs on a field (not _id) in my documents. 1) UUID() seems to generate a subtype 3 (old UUID) instead of a subtype 4 - why? 2) Is there any real advantage to using a BSON UUID over a string? Cheers!
[20:31:54] <GothAlice> clayzermk1: Strings in BSON, AFAIK, are utf-8. The binary representation of a UUID is not utf-8 safe.
[20:32:13] <GothAlice> Storing a UUID in hex is inefficient.
[20:32:49] <GothAlice> (Technically a UUID is just a 128-bit number.)
[20:33:25] <clayzermk1> I get the inefficiency bit (in this case I don't really care about that small bit of data - for now anyway ;) )
[20:33:45] <clayzermk1> I'm mostly curious about index performance on the strings vs BSON UUIDs
[20:37:35] <GothAlice> clayzermk1: Heh. I'm retentive about some things, i.e. I optimize space by using single-character attributes on my documents. If you do store the UUID as a canonical-form string, you're going to want to use a hashed index on it since it's unlikely you'll be doing anything other than equality checks against a UUID.
[20:38:34] <GothAlice> That'll need some benchmarking of real-world queries to properly identify, though.
[20:39:15] <clayzermk1> Awesome, thank you! Yep, equality checks only. Still curious about the UUID function returning type 3 instead of 4.
[20:40:03] <clayzermk1> I read there are byte order compatibility issues with type 3
[20:40:42] <clayzermk1> Maybe I'm stressing out about nothing. Anyway, thank you :)
[20:40:45] <GothAlice> Yeah, that's kinda weird. Since I'm using a client driver that isn't JS, I can pick which UUID type I want and the driver just BinData(3, "…")s the base64 result.
[20:41:13] <clayzermk1> sorry, UUID() is in the shell
[20:41:27] <clayzermk1> but goes against the BSON spec recommendation of using type 4
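A PyMongo sketch of storing a native subtype-4 UUID and indexing it for the equality-only lookups discussed; the uuidRepresentation option is how recent PyMongo versions select the standard subtype:

```python
import uuid

from pymongo import HASHED, MongoClient

# "standard" encodes uuid.UUID values as BSON binary subtype 4, avoiding
# the legacy subtype-3 byte-order issues mentioned above.
client = MongoClient(uuidRepresentation="standard")
db = client.test

u = uuid.uuid4()
db.widgets.insert_one({"uuid": u, "name": "example"})

# Equality-only lookups suit a hashed index:
db.widgets.create_index([("uuid", HASHED)])
doc = db.widgets.find_one({"uuid": u})
```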
[20:46:00] <gansbrest> guys, what's the best strategy for me to move data from our stage cluster to prod
[20:46:24] <GothAlice> gansbrest: I mongodump/mongorestore.
[20:50:39] <GothAlice> With the True Power of UNIX™ you can combine mongodump, xz (compression), netcat (or an SSH connection), with unxz and mongorestore on the other side, streaming the data rather than exporting in one stage, transferring, then restoring.
[20:50:54] <GothAlice> (mongodump -o - # output to standard out so you can pipe it along)
[20:51:17] <gansbrest> do you have full command? )
[20:54:50] <Streemo> what's up with the mongodb manual repeating whole sentences and images several times
[20:55:28] <Streemo> something i noticed in the first 4 chapters
[20:58:02] <GothAlice> gansbrest: Huh, turns out I'm wrong. mongodump supports output to stdout, mongorestore doesn't support input from stdin. Why would you even *have* a stdout option if you can't use it that way? (It'd be something like this, if it would have worked: mongodump -d somedb -o - | ssh -C mongorestore -d somedb -)
[21:11:43] <gansbrest> one other general question - do I need to send writes to the primary from the application, or I can send to any and it will transfer write internally to the primary?
[21:15:00] <Boomtime> gansbrest: writes can only go to a primary, however the driver takes care of this automatically so you don't "send" to any particular host
[21:15:53] <gansbrest> by driver you mean app library, or part of the code on each mongod receiving request?
[21:16:27] <Boomtime> yes, usually. what language are you using?
[21:16:42] <gansbrest> we will be using it from node
[21:22:08] <Boomtime> the driver will do it for you
[21:22:23] <Boomtime> that does not mean you get a free ride
[21:22:39] <Boomtime> if the primary dies, you'll get failed commands for a few seconds until a new primary steps up
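A sketch of riding out the few-seconds window Boomtime describes, here in PyMongo (the channel's recurring Python driver) since the same idea applies to the node driver; the retry policy is illustrative:

```python
import time

from pymongo import MongoClient
from pymongo.errors import AutoReconnect

client = MongoClient("mongodb://host1,host2,host3/?replicaSet=rs0")
db = client.test

def insert_with_retry(doc, attempts=5, delay=0.5):
    """Retry a write across a primary step-down; illustrative policy only."""
    for attempt in range(attempts):
        try:
            return db.events.insert_one(doc)
        except AutoReconnect:
            # No primary yet; wait for the election to finish and retry.
            time.sleep(delay * (attempt + 1))
    raise RuntimeError("no primary elected within the retry window")
```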
[21:34:25] <gansbrest> Boomtime: thanks. I'm storing mongo data on EBS, is it safe to snapshot it for backups?
[21:38:14] <jeffmcnd> Hello everyone, could someone send me a link explaining mongodb aliases? I don't understand their function. Are they used when you want to name a db the same as another but also want to differentiate?
[22:35:45] <GothAlice> Boomtime: That's one of the big things I love about MongoDB: it seems to have a pretty excellent separation of concerns. (BSON and ObjectId generation/encoding driver-side, etc., etc.) These things make everything simpler. :)
[23:05:13] <hejbacon> does anyone know if I can use the $and condition in mongoose ? example: people.findOneAndUpdate({ $and: [{ occupation: 'musician', name: 'jason borne' }]}, { $set: { language: 'swedish' }}, { upsert: true }, callback), so far I've only been able to use the query.and() way, but can't combine that with .findOneAndUpdate()
[23:21:30] <joannac> hejbacon: why do you need $and for that?
[23:22:07] <hejbacon> joannac: if I don't use .and(), it returns documents without an exact match
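For the record, multiple clauses in one filter document already combine as an implicit AND, so the $and wrapper (with a single element, as written above) adds nothing. A PyMongo sketch of the equivalent call:

```python
from pymongo import MongoClient, ReturnDocument

db = MongoClient().test

# Multiple top-level clauses in one filter document are ANDed implicitly:
doc = db.people.find_one_and_update(
    {"occupation": "musician", "name": "jason borne"},
    {"$set": {"language": "swedish"}},
    upsert=True,
    return_document=ReturnDocument.AFTER,
)
```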