[01:38:18] <ailaG> Hi, I'm new to Mongo, learning Meteor. I want to update an entire collection from a JSON file periodically (replace the whole thing). Where do I start reading?
[02:04:14] <Jonno_FTW> joannac: nvm I got it using the $elemMatch operator
[02:04:41] <Boomtime> ailaG: then a series of .update commands would never delete documents
[02:04:52] <ailaG> It's a list of lectures in an event, exported from the CMS that manages them. I'm working on a meteor project that displays them on alternative platforms.
[02:04:57] <Boomtime> ailaG: you can instead drop the collection first, then just do a series of .insert
[02:05:08] <ailaG> I just ran .update({}, {foo: 1}) and the previous data was deleted...
[02:06:01] <Boomtime> ailaG: do you only have one document?
[02:06:52] <Boomtime> .update({},... <- this predicate means "anything" - i.e "update anything"
[02:07:23] <Boomtime> if you only have one document then the match is fine, but if you have two documents.. i wonder which one you'll update?
[02:10:11] <ailaG> Boomtime: No, it was a test…. Gotcha
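For the "replace the whole collection" workflow Boomtime describes, a minimal mongo-shell sketch might look like the following (the "lectures" collection and documents are placeholders for the CMS export). It also explains the earlier surprise: .update({}, {foo: 1}) with a plain replacement document overwrites the matched document wholesale.

```javascript
// Hypothetical "lectures" collection, refreshed from a JSON export.
var exported = [
    { title: "Opening keynote", room: "A" },
    { title: "Intro to Meteor", room: "B" }
];

db.lectures.drop();                    // throw away the old contents (and its indexes)
exported.forEach(function (doc) {
    db.lectures.insert(doc);           // re-insert everything from the fresh export
});
```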
[02:30:30] <ailaG> Goodnight everyone, thanks for the help.
[02:40:35] <freeone3000> I'm getting the error "replSet not trying to sync from 54.153.95.17:27018, it is vetoed for 218 more seconds". Why would it be vetoed? It's my primary.
[03:00:02] <joannac> freeone3000: oplog insufficient? look further up in the logs
[03:01:41] <freeone3000> Ah. Connection refused. Which is odd.
[03:17:25] <freeone3000> joannac: Okay, connection is not refused, error log no longer prints that, but my optime on the secondary is still not increasing. What gives?
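A couple of standard shell helpers for narrowing down this kind of lag (not specific to freeone3000's setup, just the usual checks for oplog window and secondary optimes):

```javascript
// Run from a shell connected to the replica set (e.g. the primary).
rs.status().members.forEach(function (m) {
    print(m.name + "  " + m.stateStr + "  optime: " + tojson(m.optime));
});
rs.printReplicationInfo();        // oplog size and the time window it covers
rs.printSlaveReplicationInfo();   // how far each secondary's optime lags behind the primary
```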
[03:59:41] <tylerdmace> Can anyone with mongoose.js experience point me the way on how to change the settings of an existing connection (host, port, user, and pass)
[03:59:50] <tylerdmace> without having to create a whole new connection
[04:00:00] <tylerdmace> is it possible to change those settings and reconnect somehow
[04:06:05] <jaitaiwan> tylerdmace: no matter what it would always be a new connection. What are you trying to achieve?
[04:08:35] <tylerdmace> well I have some express.js middleware that, on particular routes, checks a mongoose connection manager I wrote for an existing database connection associated with the user. If it doesn't find a connection, it creates one. I need to add one more feature where it checks a setting for a new host, port, user, or password, and if any of that information has changed, we need to update the current connection to use those settings
[04:08:50] <tylerdmace> but if a new connection is required, I'll just tackle it that way :) thank you
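A hedged sketch of the "just make a new connection" route jaitaiwan points to: mongoose does not expose a way to swap credentials on a live connection, so the manager would close the old one and open a fresh one. The settings object and its field names are assumptions for illustration.

```javascript
var mongoose = require('mongoose');

// settings = { host, port, db, user, pass } -- shape assumed for illustration
function replaceConnection(oldConn, settings, done) {
    oldConn.close(function (err) {
        if (err) { return done(err); }
        var conn = mongoose.createConnection(
            'mongodb://' + settings.host + ':' + settings.port + '/' + settings.db,
            { user: settings.user, pass: settings.pass }
        );
        conn.once('open', function () { done(null, conn); });
        conn.on('error', done);
    });
}
```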
[07:50:00] <mskalick> "Run db.upgradeCheckAllDBs() to find current keys that violate this limit and correct as appropriate. Preferably, run the test before upgrading; i.e. connect the 2.6 mongo shell to your MongoDB 2.4 database and run the method."
[07:50:28] <mosquito87> I have two addresses ... one pickup and one handover address. Therefore I need two chained "$geoNear"
[07:50:45] <mosquito87> result is an error: Can't canonicalize query: BadValue Too many geoNear expressions
[07:50:47] <mskalick> Is it a problem to run upgradeCheckAllDBs() after upgrading to 2.6? (with mongod 2.6 running)
[07:51:49] <joannac> mskalick: sure, I guess. for what purpose?
[07:52:41] <joannac> i presume you mean "after upgrade"
[07:53:06] <Boomtime> mosquito87: again, what result are you expecting? how can you order documents by nearest to two disparate points? in what order would you place points along the route?
[07:53:48] <mosquito87> I have a document with two addresses. Results should be "near" address1 and "near" address2
[07:55:36] <mosquito87> As an example: Document1 with address1: [8.3314179, 49.5112888], address2: [20.3314179, 49.5112888]. Now I want to be able to find all documents with address1 near [8.33..., 49.511...] AND address2 near [20.331..., 49.511...].
[07:58:52] <Boomtime> mosquito87: i don't think you can do this, you need to perform two wholly independent geoqueries and merge their result-set, the geonear operator can only be used once in a query, or a pipeline
[07:59:30] <mosquito87> Yep. I read that. But I expect a bad performance in future ... Is there any "workaround"?
[08:00:18] <mosquito87> Let's say I have a million docs where "address1" fits. Now I check if "address2" fits, too. And the result is that only 2 docs fit. Can't imagine a good performance doing it like that.
[08:00:48] <Boomtime> perhaps you should define what it is you actually want to achieve
[08:01:15] <Boomtime> it sounds like what you want is not "nearest to these two points" but "points inside this polygon"
[08:01:15] <mosquito87> "fits" = near, so in the radius the user has specified
[08:01:38] <Boomtime> what if the radius is less than the distance between those two points?
[08:01:47] <mosquito87> first address is a pickup address. Second address is a handover address. Both have to be near the addresses of another user.
[08:02:24] <mskalick> joannac: yes, after upgrade. I asked because in "Compatibility Changes in MongoDB 2.6" there is "Preferably, run the test before upgrading" but in other documents there are instructions to first upgrade to the newer version...
[08:02:35] <mosquito87> let's assume my pickup address is Germany, handover address is USA. If your pickup address is Germany too, but handover address is not USA, but Spain, then the result should be "false"
[08:02:40] <mosquito87> this is what I mean by "fit"
[08:03:07] <Boomtime> mosquito87: what radius would you use for that query?
[08:03:16] <mosquito87> The user can set his radius
[08:03:35] <mosquito87> of course the real address isn't just "Germany", but a real address, with street, zip code, etc
[08:03:41] <mosquito87> so I have latitude and longitude of this address
[08:03:50] <Boomtime> whatever, is your example even remotely possible or not?
[08:04:20] <joannac> mskalick: source? none of our docs should be telling you to upgrade without due diligence; if they do I'll make sure they get fixed
[08:04:23] <mosquito87> what do you mean by remotely?
[08:04:30] <Boomtime> i don't think you've thought through your requirement - i conjecture that your requirement is absurd, so i want you to give an actual working example
[08:06:15] <mosquito87> User 1: Pickup address is Germany, Handover address is USA. User 2: Pickup address is Germany, Handover address is USA. User 3: Pickup address is Spain, Handover address France. User 4: Pickup address is Japan, Handover address China. Now User 1 wants all addresses which fit to his pickup AND handover address.
[08:06:24] <mosquito87> The result should be the address of user 2.
[08:06:35] <mosquito87> As only user 2 has the "same" addresses.
[08:07:51] <Boomtime> excellent, it is not a single point the user resides at, you have two separate addresses and two separate matches
[08:08:07] <mosquito87> so two "geoNear" is wrong for that use case?
[08:08:27] <Boomtime> i understand, and your use case is valid
[08:08:58] <Boomtime> but it can't be done... easily... if at all
[08:09:24] <mosquito87> I could first check the pickup address. Then check the handover address.
[08:09:37] <mosquito87> But I think this will result in a terrible performance when having millions of addresses.
[08:10:17] <morenoh149> how should you model a two-way relationship between documents? is there a good way?
[08:10:19] <Boomtime> yes, it is an interesting problem, i may have to ponder it for a while
[08:10:26] <mskalick> joannac: I probably wrongly understood "Package Upgrades¶ If you installed MongoDB from the MongoDB apt or yum repositories, upgrade to 2.6 using the package manager."
[08:10:27] <mosquito87> User 1: Pickup address is Germany, Handover address is USA. User 2: Pickup address is Germany, Handover address is USA. User 3: Pickup address is Germany, Handover address France. User 4: Pickup address is Germany, Handover address China.
[08:10:50] <mosquito87> If I check for the pickup address first, then I will get user 2, user 3 and user 4. Then check handover address. User 2 will be left.
[08:11:05] <mskalick> joannac: So install 2.6 shell, run db.upgradeCheckAllDBs(), fix and install mongod 2.6?
[08:11:43] <mskalick> joannac: different order is unsafe... right?
[08:13:08] <zivix> morenoh149: If you just need a 1:1, create it in one of the documents and query against that when looking at the other type. So e.g. A->B and B. When looking at B query for A where ->B
[08:14:07] <zivix> If you need many-to-many you probably want a join collection that points to A and B. Not sure what your data looks like, though.
[08:14:58] <zivix> mosquito87: when you're querying for proximity you can discard large groups of addresses, right? Can you assign them a score of some kind and use that as a reference?
[08:15:41] <zivix> So consider: I query for address A, based on proximity I can calculate a score (distance, for example). maybe I only am interested in records where score < 50
[08:15:53] <morenoh149> are there any examples of join collections?
[08:15:59] <Boomtime> mosquito87: sorry, i do not see an easy solution, certainly i don't think you can construct a single query to do it
[08:16:13] <mosquito87> @zivix: Could you give an example?
[08:16:22] <mosquito87> @Boomtime: Do you share my concerns about performance?
[08:16:36] <joannac> mskalick: well, the check is to fix problems before you upgrade to 2.6. If you've already upgraded, it's less useful.
[08:17:21] <Boomtime> mosquito87: i can only assume you mean performance on the client though i think you can ensure it won't be too bad by limiting your result-sets and capping radius
[08:18:14] <mskalick> joannac: I thought only the upgrade itself (=starting mongod), without any usage, inserts, ... ?
[08:18:18] <zivix> Let's say you calculate distance in km when you run the first query. Only look at the 50 closest addresses. So your set limit is 50. You can run the second query on another set of 50 and see if there is any overlap.
[08:18:51] <zivix> So if you have individual homes it's going to miss but if you have warehouses or distribution centers you'll probably hit frequently.
[08:19:00] <mosquito87> Address1 can be very close ... but this doesnt mean address2 is close as well
[08:19:11] <zivix> You still want to use it if address 2 is not close?
[08:19:41] <mosquito87> but I can have millions of documents where address1 is very close. But address2 is very far away from the address2 I want to compare to
[08:20:35] <zivix> How do you know which address 2 you want to compare to if not based on distance?
[08:21:11] <mosquito87> my user has address1 and address2. I want to find all other users, where address1 is near to address1 of user and address2 is near to address2 of user
[08:22:06] <zivix> I agree that it would be straightforward if you could use two geoNear queries but in absence of that is there another way to score / query for proximity?
[08:22:21] <mosquito87> how would such a score look like?
[08:22:45] <zivix> If you have lat/long stored for example you could create a score based on the difference between address 1 and address 1 vs address 2 and address 2
[08:23:24] <mosquito87> but I have millions of addresses ... so I want to compare both addresses of the user to million other addresses
[08:24:26] <mosquito87> User 1: Pickup address is Germany, Handover address is USA. User 2: Pickup address is Germany, Handover address is USA. User 3: Pickup address is Germany, Handover address France. User 4: Pickup address is Germany, Handover address China.
[08:24:35] <mosquito87> How would the scores look like?
[08:27:28] <zivix> So in germany to germany. Let's say you're looking at Regensburg and Frankfurt. Regensburg is 51 12 and Frankfurt is 50 9
[08:27:30] <joannac> mskalick: okay, you can run the check now, I guess.
[08:28:21] <zivix> You could do some fancy math for triangles but for simple stuff, abs(51-50) is 1 and abs(12-9) is 3. So your distance score is 4
[08:28:48] <zivix> Repeat for address 2. Let's say score is 3.
[08:29:02] <zivix> Total score is 7, which is overall proximity for both addresses.
[08:29:13] <zivix> Then you sort by lowest score to see who's close by
[08:31:30] <zivix> I think geoNear does some fancy things to use triangle math on a spherical surface so you get actual distance but you might not actually need that level of precision.
[08:32:33] <mosquito87> Problem is that the user can define the precision by defining a radius
[08:34:26] <mosquito87> So I can still just do two near queries
[08:34:32] <mosquito87> first for address1, then for address2
[08:34:58] <zivix> Erm. You can't do that with inline javascript and compute score on the server?
[08:35:07] <zivix> That would save you having to pull 2 giant resultsets down.
[08:36:18] <mosquito87> It's a node.js server ... so I guess I could.
[08:36:23] <zivix> You still have to iterate your entire dataset but I don't think you can escape that unless you do some kind of pre-processing to build an index.
[08:36:27] <mskalick> joannac: Or could it work to define functions from src/mongo/shell/upgrade_check.js in mongo 2.4 shell and run them there?
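For reference, the workflow the docs quote at the top of this thread amounts to roughly the following (host and port are placeholders); whether sourcing src/mongo/shell/upgrade_check.js into a 2.4 shell behaves identically isn't confirmed here.

```javascript
// From a 2.6 mongo shell, connected to the still-running 2.4 mongod:
//   mongo host.example.com:27017/admin
db.upgradeCheckAllDBs();   // reports index keys and documents that violate the 2.6 limits
// fix whatever it flags, then upgrade the mongod binaries to 2.6
```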
[08:36:56] <zivix> You can do an eval as part of your query
[08:37:06] <mosquito87> But again if I have 1 million documents where address 1 is very near ... And only 1 document where address 2 is very near as well ... I still would have to iterate through 1 million docs (where address 1 is very near)
[08:37:56] <zivix> Hm... so for that you might want to use the aggregation piece. Or change the way you query it.
[08:38:34] <zivix> For example, if geoNear can use an index you can scout your potential matches before you run the aggregate.
[08:38:46] <zivix> And potentially say "Yeah there's no match for address 2 so don't bother."
[08:39:39] <zivix> You might still run the aggregate and find that there's no overlap between near address 1 and near address 2, but I think you have to bite the bullet somewhere and run that operation.
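Since only one $near/$geoNear is allowed per query, one possible way (not raised in the discussion above, so treat it as a sketch to verify) to express "pickup near point A AND handover near point B, within the user's radius" is a pair of $geoWithin/$centerSphere clauses, which are plain predicates rather than geoNear expressions and do not sort by distance. Collection and field names are made up; the coordinates are the ones from the example above.

```javascript
// radiusKm is the user-defined radius; $centerSphere takes radians, so divide by
// the Earth's radius (~6378.1 km). Assumes coordinates stored as [lng, lat] pairs.
var radiusKm = 50;
db.offers.find({
    pickup:   { $geoWithin: { $centerSphere: [ [  8.3314179, 49.5112888 ], radiusKm / 6378.1 ] } },
    handover: { $geoWithin: { $centerSphere: [ [ 20.3314179, 49.5112888 ], radiusKm / 6378.1 ] } }
});
```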
[09:12:14] <stiffler> hello, I'm quite new to mongo, but I'm just wondering, how deep can I go in documents? for example, how many arrays can I have in an array?
[12:05:58] <pamp> is it possible to restore the admin db from one server to another, when the second server also has an admin db?
[12:06:18] <pamp> i need to merge the older and new admin database
[12:14:14] <eirikr> i think it's ok, mongorestore operations are inserts, so if you restore a db into an existing db all new documents will be inserted
[12:21:04] <StephenLynx> heey, mongo update on ubuntu repos
[12:22:22] <pamp> I already restored all the dbs successfully, but when restoring the admin db I get this error
[12:22:24] <pamp> Error creating index admin.system.version: 13 err: "not authorized to create index on admin.system.version"
[12:25:56] <panshulG> Hello people.... I am using findAndModify... and if my query returns multiple documents... will all the returned documents be updated?
[13:09:31] <eirikr> @pamp : ok, I understand. does the user you want to restore the db with have the "restore" role? it's important.
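If the missing "restore" role eirikr mentions is the culprit, granting it would look roughly like this (run as a user administrator; "backupUser" is a placeholder name):

```javascript
var adminDb = db.getSiblingDB("admin");
adminDb.grantRolesToUser("backupUser", [ { role: "restore", db: "admin" } ]);
// then run mongorestore authenticating as backupUser
```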
[13:11:38] <stiffler> hi, I have a problem with returning the results of db.collection.find() from a function. It says it is undefined but console.log says something different
[13:22:51] <stiffler> so could you help me to solve this problem?
[13:23:03] <stiffler> i would like to return it after it has been fetched
[13:27:54] <iksik> stiffler: You have two choices here - none of them is returning anything from inside of the nested function the way You would like to... the first solution is to use a simple callback for getTimes (like: getTimes: function ( lineNr, stopNr, dirNr, callback ) - where the 'callback' function can be called instead of your current 'return', like callback(body.stops[i].timetable)). The second solution is
[13:27:54] <iksik> to use promises - but You just need to read a bit about them to understand how they work (also it's a bit off topic for this channel)
[13:29:37] <StephenLynx> are you using node or io?
[13:37:55] <iksik> yea, for now... but You need to improve your code to handle all scenarios (your if conditions)... You need to be sure that callback is ALWAYS triggered
[13:38:36] <stiffler> basically you mean error validation?
[13:39:57] <iksik> look, with this example: Stops.getTimes(lineNr, stopNr, dirNr, function(body) { .......this code will never execute if You wont fire callback inside getTimes..... })
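A runnable sketch of the callback shape iksik is describing; Stops.getTimes and body.stops[i].timetable come from the pasted code, while the setTimeout and the error-first convention are assumptions standing in for whatever async fetch the real implementation does.

```javascript
var Stops = {
    getTimes: function (lineNr, stopNr, dirNr, callback) {
        setTimeout(function () {                              // placeholder for the real async fetch
            var body = { stops: [ { nr: stopNr, timetable: ["12:05", "12:35"] } ] };
            for (var i = 0; i < body.stops.length; i++) {
                if (body.stops[i].nr === stopNr) {
                    return callback(null, body.stops[i].timetable);   // instead of `return ...`
                }
            }
            callback(new Error("stop not found"));            // make sure the callback ALWAYS fires
        }, 10);
    }
};

Stops.getTimes(1, 42, 0, function (err, timetable) {
    if (err) { return console.error(err); }
    console.log(timetable);   // the result is only usable here, inside the callback
});
```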
[13:55:20] <StephenLynx> MVC is the user loading always the same HTML and javascript, then contacting the server to load data and the javascript manipulates DOM to present the loaded data.
[13:55:34] <StephenLynx> look at my other project, lynxchan, it works like that.
[13:56:00] <stiffler> always the same view? what if your page has different views on each subpage
[13:56:14] <GothAlice> Uhm, well, no, MVC itself has very little to do with the mechanism of content delivery. It doesn't technically require a single loaded presentation layer, that's just one option.
[13:56:15] <stiffler> like contact gallery, about us, catalogue
[13:56:50] <StephenLynx> stiffler you use routing on the server that provides the static files. Like lynxchan does.
[13:57:07] <StephenLynx> GothAlice is more about the controller having knowledge and manipulating objects of the view.
[13:57:29] <stiffler> StephenLynx: I'm gonna take a look at it later
[14:00:40] <GothAlice> StephenLynx: WebCore 2 does MVC via: URL -> dispatch -> controller -> returned value -> view registry lookup -> view, with the controller interfacing with the model, and views usually "rendering" something about a model. I.e. the "view" for a "file-like object" is to stream the file (with proper caching, range matching, etc.) A controller can thus just return one, and the view will fire up and Do The Right Thing™.
[14:01:27] <GothAlice> (The view for a model object when the request has the XHR header is to return the public JSON-serialized version of that model object, as another example.)
[14:03:05] <StephenLynx> it has separate controllers that take in the content from the main controllers and render an appropriate view?
[14:03:55] <StephenLynx> lunch, will read when I get back
[14:03:59] <GothAlice> That's possibly one way to look at it, but that's not quite right. The registered views get the request context, yes, but aren't supposed to _do_ things, only transform the value returned by the controller in some way for consumption over HTTP.
[14:10:33] <GothAlice> StephenLynx: https://github.com/marrow/WebCore/blob/rewrite/example/controller.py?ts=4#L23-L28 < example controllers, highlight demonstrates that "endpoints" (not just callable functions) are perfectly valid, with the static() factory creating a callable endpoint that loads and returns file objects from disk.
[14:10:38] <GothAlice> StephenLynx: https://github.com/marrow/WebCore/blob/rewrite/web/ext/base/handler.py?ts=4#L30-L58 is the view to handle those file-like objects. https://github.com/marrow/WebCore/blob/rewrite/web/ext/template/handler.py?ts=4#L11 is the view to handle 2- or 3-tuple (template, data) rendering.
[14:20:45] <hayer> How can I get the name of all fields in a collection? Like document1[field1, field2] document2[field1, field3, fieldY] -- should return field1, field2, field3, fieldY ..
[14:20:55] <GothAlice> hayer: You'll have to use map/reduce for that.
[14:21:33] <hayer> GothAlice: Ah, okay. Just to be sure that I made myself clear: I want the names of the fields, not the values.
[14:21:56] <GothAlice> In your map, emit the field names from each document (may require recursion if you nest fields!) and reduce to the unique set of them.
[14:32:08] <hayer> GothAlice: Thanks! That was actually quite simple after you told me what "attack vector" to use @ problem.
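The map/reduce GothAlice describes, sketched for top-level field names only (recursing into nested documents is omitted; "mycollection" is a placeholder):

```javascript
var result = db.mycollection.mapReduce(
    function () {                              // map: emit every field name in this document
        for (var key in this) { emit(key, null); }
    },
    function (key, values) { return null; },   // reduce: only the unique keys matter
    { out: { inline: 1 } }
);
result.results.forEach(function (r) { print(r._id); });   // field1, field2, field3, fieldY, ...
```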
[14:45:06] <jerome-> is there an official way to create a database ?
[14:50:55] <blaubarschbube> could somebody explain to me why this -> http://pastebin.com/8N7pwh8X works for one node but does not for the other one? in puppet-dashboard it reads: Could not retrieve catalog from remote server: Error 400 on SERVER: Could not find class default_packages for inventory.hamburg.contentfleet.com on node ...
[14:54:32] <GothAlice> Creating indexes on a collection that didn't exist before would also do it. Technically what's happening is the database is created when the first namespace is allocated. (Inserting a record creates an _id namespace, adding an index would create a namespace for that index.)
[14:55:49] <jerome-> well, I will try to do without the tmp collection in mongohub
[14:56:41] <jerome-> (I mean accept that the database can't be created empty, and keep around a virtual database until mongodb creates it)
[14:59:20] <StephenLynx> why do you need to have an empty db in the first place?
[15:00:11] <jerome-> because the usual workflow in mongohub is to create a database before creating collection
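Illustrating GothAlice's point that a database only comes into existence when its first namespace is allocated ("newdb" and "things" are placeholder names):

```javascript
var newdb = db.getSiblingDB("newdb");        // nothing exists on disk yet
newdb.things.insert({ placeholder: true });  // first insert allocates a namespace -> db now exists
// building an index on an empty collection would also do it:
newdb.things.ensureIndex({ name: 1 });
```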
[15:11:54] <GothAlice> It's not a very good tool. For years resizing the window would exhibit "we've attached these UX widgets to the wrong side of the window" silliness, even.
[15:12:26] <jerome-> GothAlice try another fork...
[15:12:41] <GothAlice> Or just use an interactive shell like the gods intended. ;)
[15:15:40] <StephenLynx> ctrl+alt+T > mongo is all you actually need.
[15:16:58] <pamp> hi, should I put the bin folder and the data on different hd's? or is it irrelevant?
[15:17:36] <pamp> I have data and logs on different drives
[15:17:39] <GothAlice> pamp: For the most part, it won't matter. The binaries will get loaded once on startup.
[15:18:18] <GothAlice> Having logs separate is not just a good idea. (I warehouse logs on a separate set of servers, even, to protect against log loss on catastrophic machine failure.)
[15:19:42] <pamp> at this time i only have a single machine, but in the future I will use sharding
[15:20:10] <pamp> but always with data, logs and backups on different drives
[15:20:54] <GothAlice> "Do you have backups?" means "I can't fix this." Having a single machine is a bad idea from a data safety perspective, even ignoring high-availability. (Two replicas will give you safety, three gives you reliability.)
[15:22:02] <GothAlice> Sharding itself won't increase the safety of your data. In fact, because statistics multiply, each additional shard you add will roughly halve your mean time between failure.
[15:29:13] <pamp> GothAlice: this machine is in the azure cloud
[15:31:44] <pamp> they ensure data security, i think
[15:32:16] <pamp> but yes, i will use replication in the future
[15:37:45] <GothAlice> The issue is less about security and more about your data going *pif*. ;)
[15:46:29] <boutell> Hi. Does mongodump support URIs for connection to databases? It looks like it only supports all the old school --user, --password junk?
[15:47:16] <roadrunneratwast> what do people use to model relations between mongodb Schemas? ERD Entity Relationship Diagrams?
[15:47:37] <roadrunneratwast> Has anyone ever created a taxonomy or ontology in Mongo? Examples?
[15:47:54] <GothAlice> boutell: URIs with combined username/passwords in them can be tricky to parse due to the potential for a duplicated colon.
[15:48:34] <GothAlice> roadrunneratwast: Yes, though generally storing graphs is better done in a dedicated graph database, since MongoDB has no concept of joins, and any branch traversal would effectively require multiple roundtrips—not very efficient.
[15:49:29] <GothAlice> roadrunneratwast: https://gist.github.com/amcgregor/4361bbd8f16d80a44387 is my taxonomy model mix-in for the MongoEngine ODM.
[15:49:58] <GothAlice> (Stores immediate parent, list of all parents, coalesced path, and numeric order and has all the management methods needed to maintain that structure.)
[15:50:09] <GothAlice> (Following jQuery's DOM manipulation API.)
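A rough sketch of the per-document structure GothAlice describes (the field names here are guesses; the real mix-in is in the linked gist):

```javascript
var taxonomyNode = {
    _id:     ObjectId(),
    parent:  ObjectId(),                    // immediate parent
    parents: [ ObjectId(), ObjectId() ],    // every ancestor, in order
    path:    "/events/2015/lectures",       // coalesced (materialized) path
    order:   3                              // numeric position among siblings
};
```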
[15:51:59] <StephenLynx> roadrunneratwast they can use several tools that mimic relations, including field references, but in practice they just perform additional queries.
[15:52:21] <GothAlice> roadrunneratwast: See also: http://www.javaworld.com/article/2088406/enterprise-java/how-to-screw-up-your-mongodb-schema-design.html
[15:52:57] <boutell> GothAlice: that sounds painful, but the mongo client supports URIs, why wouldn’t mongodump have the same options? Are URIs deprecated now?
[15:53:27] <StephenLynx> it is possible to use these fake relations if your application won't need to perform multiple queries because of them.
[15:53:48] <GothAlice> boutell: No, it's just that the command-line tools are discrete tools, not exposed API. They have no particular requirement to behave the same way, and don't mostly for historical reasons, as lame as that can be.
[15:54:28] <StephenLynx> lets say you have an entity with a list of something. because of reasons, having a subarray with this does not work very well. so you make a separate collection that holds these objects, each object containing a field that indicates the parent entity.
[15:54:45] <StephenLynx> and you can gather all you need from this collection with a single query.
[15:54:54] <boutell> GothAlice: I’m whinging because we standardized on configuring URIs, which seemed to be the modern thing, and one prefers not to start parsing URIs with bash just to write a dump/restore script
[15:55:03] <GothAlice> By storing so many different references, my taxonomy model is able to answer many different types of queries without recursion. I.e. all parents of document X, all descendants of document Y, siblings of document Z in the correct order, etc.
[15:55:54] <StephenLynx> but lets say you would have to perform one query for each parent object, or one query for each object, or have to assemble data from both in your application.
[15:55:56] <GothAlice> My favourite: give me the deepest document matching a given path.
[15:56:15] <StephenLynx> that's when you should avoid using these fake relations.
[15:56:24] <boutell> yes, materialized path + rank + depth covers a lot of ground.
[15:57:10] <GothAlice> boutell: Formerly this structure was a combined nested set + adjacency list. But the overhead of updating left/right references (potentially touching every document in the taxonomy…) became too extreme.
[15:58:16] <boutell> GothAlice: never mind the race conditions.
[15:58:32] <GothAlice> Those are mostly resolved naturally by atomic increments.
[15:58:46] <GothAlice> I.e. it won't matter if two operations interleave, the end result will be the same regardless of the order of operations.
[15:58:47] <boutell> I used a nested set model in a SQL driven CMS site once. One day the page tree came unmoored from Earth’s orbit and floated into the sun.
[15:59:10] <boutell> after that we stopped trusting the library we were using and added beaucoup locking logic
[15:59:36] <GothAlice> Heh. Yeah, I needed a "defragmenter" that would run the adjacency list and rebuild the nested set data. Sometimes things got a little stuck…
[15:59:41] <boutell> but, in the next generation of our CMS, we used the materialized path model where this is not a thing. The worst case is two people insert a page simultaneously, and get the same rank among their peers, which does nothing terrible.
[15:59:58] <boutell> I recall writing that rescue task too
[16:00:30] <boutell> I do, however, still wonder if there’s a way to address that issue of two peers getting the same rank without locks.
[16:00:53] <boutell> you mentioned increments. It occurs to me that the parent page could have a nextChildRank property
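A hedged sketch of boutell's nextChildRank idea, leaning on the atomic $inc GothAlice mentions; "pages" and the field names are hypothetical.

```javascript
var parentId = ObjectId();                         // placeholder parent page
db.pages.insert({ _id: parentId, nextChildRank: 0 });

// atomically claim the next rank, even if two inserts race
var parent = db.pages.findAndModify({
    query:  { _id: parentId },
    update: { $inc: { nextChildRank: 1 } },
    new:    false                                  // return the pre-increment value
});
db.pages.insert({ parent: parentId, title: "New page", rank: parent.nextChildRank });
```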
[16:01:11] <GothAlice> My CMS model, v1: https://gist.github.com/amcgregor/ee96bbaf2ef023aa235f#file-contentment-v0-py-L110-L114 (https://gist.github.com/amcgregor/ee96bbaf2ef023aa235f#file-contentment-v0-py-L172-L267 being the attachment code—I couldn't think of a better name for a function opening/closing holes in the left/right structure than "stargate")
[16:01:15] <StephenLynx> hey, is it possible to only project a field if another field is set to false?
[16:01:59] <GothAlice> boutell: And that taxonomy I linked first is part of the v3 model factored out like it should be. ;)
[16:02:18] <StephenLynx> hm, that's unfortunate. because I have this field on posts on my forum that dictates if the post is anonymous. what I need is to not project the name if this field is set to true.
[16:02:25] <StephenLynx> currently I do this with application code.
[16:02:34] <GothAlice> StephenLynx: Ah, you have data security auditing requirements.
[16:03:26] <GothAlice> I keep forgetting about things only added in 2.6… clearly I've been using MongoDB for too long. ;)
[16:07:33] <GothAlice> boutell: Hope I didn't overload you, there. ^_^
[16:08:35] <hashpuppy> i'm setting up a new replica set. 2 databases + 1 arbiter. i've configured the replica set. i now see mongodb1 as primary and mongodb2 as secondary. but mongoarbiter is status UNKNOWN (and after restarting stuck at status DOWN) with "still initializing". When i log into that server and rs.status() i see this message "loading local.system.replset config (LOADINGCONFIG)" w/ startupStatus 1
[16:09:01] <hashpuppy> what did i do wrong? or how can i get the arbiter running in that replica set
[16:09:47] <GothAlice> hashpuppy: Might be simple enough to nuke your existing arbiter completely (and remove it from the set on the primary), then re-add it using http://docs.mongodb.org/manual/tutorial/add-replica-set-arbiter/
[16:14:16] <StephenLynx> "The argument can be any valid expression as long as it resolves to $$DESCEND, $$PRUNE, or $$KEEP system variables. For more information on expressions, see Expressions."
[16:14:27] <StephenLynx> then you read these variables
[16:14:36] <StephenLynx> " $redact returns the fields"
[16:14:42] <StephenLynx> " $redact excludes all fields"
[16:14:47] <StephenLynx> " $redact returns or keeps all fields"
[16:14:59] <GothAlice> One of the examples demonstrates redacting individual members of a list of sub-documents, the next demonstrates nuking subsets of fields in general, and the see more link is a complete tutorial on field-level redaction.
[16:19:22] <StephenLynx> I still can't find an example where you can omit a top level field on a document.
[16:19:33] <hashpuppy> GothAlice: that didn't seem to work
[16:21:40] <GothAlice> StephenLynx: A simpler approach may be an expression in your projection.
[16:22:01] <StephenLynx> I agree, and I was thinking about that.
[16:36:38] <StephenLynx> I really don't think I can use an expression on the project block for what I want. it just projects the field AS the result of the expression.
[16:36:57] <GothAlice> Hmm. You need the result of the expression to be zero or one…
[16:37:34] <GothAlice> Or… project the value of the author field as the contents of the author field if expr is true, otherwise null.
[16:41:32] <StephenLynx> it just projects the boolean resulting from $eq
[16:41:39] <StephenLynx> so no, being true or false does not suffice.
[16:42:40] <pamp> GothAlice: note that ManagedObjects is an array, and also props
[16:43:09] <pamp> "errmsg" : "cannot use the part (ManagedObjects of ManagedObjects.props) to traverse the element
[16:51:29] <StephenLynx> so far it seems I'm fucked :^)
[16:51:54] <GothAlice> You're not. Use $cond, true value is the value of the field, false value is null.
[16:53:35] <StephenLynx> It seems that list of query and projection operators will not suffice as reference.
[16:56:31] <StephenLynx> yup, that was exactly what I needed. thanks m8
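The $cond approach that resolved this, sketched with assumed field names (an "anonymous" flag and a "name" field on each post):

```javascript
db.posts.aggregate([
    { $project: {
        message: 1,
        name: { $cond: [ { $eq: [ "$anonymous", true ] }, null, "$name" ] }   // hide name when anonymous
    } }
]);
```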
[17:03:21] <roadrunneratwast> StephenLynx: thanx 4 the lynx
[17:25:18] <pamp> is it possible in the {"$project"} stage to generate a new _id??
[17:26:02] <pamp> I am creating a new collection from an aggregation
[17:48:21] <carlosf> hi there, can you help me out?
[17:50:35] <carlosf> is there any recent benchmark between OrientDB and MongoDB? what are the pros and cons of OrientDB vs MongoDB?
[17:50:59] <StephenLynx> don't know about orientdb
[17:51:04] <codetroth> You know it drives me nuts. I am looking at a MongoDB statement and clearly see fields of _id, from, and mytype.
[17:51:18] <codetroth> I can get data from every value except "from", which comes back as undefined
[17:51:22] <StephenLynx> but mongo is known for performance and being easy to include additional servers in a cluster.
[17:52:13] <codetroth> is the word from a reserved word in Mongo at all?
[17:52:15] <boutell> carlosf: MongoDB is widely used and supported. If nobody here has heard of / used OrientDB, and you don’t feel up to evaluating them yourself, you probably want to use MongoDB, and get back to worrying about your actual project.
[17:52:35] <StephenLynx> codetroth print a find() and show me
[17:52:39] <boutell> codetroth: I don’t think there are any magic words in mongo that don’t start with a $
[17:53:59] <StephenLynx> orient seems to be useful for graphs
[17:54:02] <boutell> carlosf: to be slightly more helpful… skimming the homepage tells me orientdb is a graph database. If your data is actually a connected graph, that could be a win for you, maybe. It’s not the only graph database.
[18:17:16] <StephenLynx> I avoid it like the plague.
[18:17:35] <codetroth> First time really working in depth with node or mongo and I am beginning to feel the same way
[18:17:41] <codetroth> This project will be the last time I use it
[18:18:02] <codetroth> I already dumped the twilio library in favor of using their rest api directly
[18:20:12] <StephenLynx> by default I don't add a dependency to a project.
[18:20:44] <StephenLynx> my first question is "why do I need it " instead of "why would I not use it"
[18:21:14] <StephenLynx> dependencies I have used so far in node/io: db drivers, bcrypt, nodemailer
[18:21:24] <StephenLynx> they do ONE thing and do it well.
[19:44:21] <nobody18188181> how can i change mongod settings without restarting the service?
[19:51:30] <nobody18188181> or, is there a way to force mongo to reload the config file?
[19:56:45] <nobody18188181> doesnt look like it: https://github.com/mongodb/mongo/blob/82daa0725d7f26bd7ccaf7e4280932ad548f549c/src/mongo/util/signal_handlers.cpp
[20:20:47] <tera_> Having trouble with a scenario. I have a document (employee) that has an array of objects (salary) that contain historical values for their salary. I want to select the employee and their current salary (just one item in that array). I used $elemMatch on the projection and successfully get just the current salary but I lose all other fields on the document unless I go and explicitly set them to 1. According to an error it is not possible to remove items
[20:20:47] <tera_> with the positional notation {"salaries.$": 0}. So it seems the only way to get the rest of my document is to know all the possible fields and set them to on?
[20:26:24] <StephenLynx> if you use projection to say "I want to project this", mongo does not project anything else that you don't tell it to project too.
[20:26:58] <GothAlice> StephenLynx: Unless one is using Mongoose, then all hell can break loose. (It'll automatically include sub-documents…)
[20:36:30] <tera_> I figured that was the case with projection but is there an equivalent projection like {"salaries.$": 0} where it will omit non-matches and include the rest of the document? Enumerating all possible fields on the document would be tedious at best.
[20:36:46] <tera_> Or perhaps a "work around" other than turning all the other fields on
[20:37:14] <StephenLynx> you can tell mongo not to project stuff, yes
[20:37:26] <StephenLynx> and then it will project everything you don't tell it about.
[20:37:59] <tera_> Yep thats why I tried the {"salaries.$": 0} but I get an error "Cannot exclude array elements with the positional operator (currently unsupported)."
[20:38:23] <StephenLynx> I suspect you might want to make salaries a separate collection.
[20:39:51] <tera_> I'm not sure what the structure of that would look like. The current array of salaries contains the fromdate, todate, and salary amount
[20:40:22] <StephenLynx> first of all you would have to put a field to identify the person it belongs to.
[20:40:31] <StephenLynx> then the data it already contains.
[20:42:26] <tera_> Yea but I'm back to the same problem. I'm thinking perhaps put all the historical in a salaries collection and keep the current with the actual employee document
[20:42:47] <tera_> Then there is no reason to project on an array with the employees document
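A sketch of the split tera_ lands on (collection and field names assumed): the current salary stays on the employee document, the history gets its own collection keyed by employee, so no positional projection is needed.

```javascript
var empId = ObjectId();   // placeholder employee _id

db.employees.findOne({ _id: empId });                          // whole document, incl. current salary
db.salaries.find({ employee: empId }).sort({ fromdate: -1 });  // full history only when it's needed
```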
[20:45:08] <lost1nfound> hey guys, hopefully this is the right place and if not im sorry, but, im attempting to upgrade my 2.4.12 development instances to 2.6.8, and when I connect to it with a 2.6 client and run the db.upgradeCheckAllDBs() command, I get lots of "DollarPrefixedFieldName: $gt is not valid for storage." type of errors. not sure if its something im misunderstanding or if our data is just messed up from erroneous
[20:46:22] <lost1nfound> the documents contain data like: "deviceeventid" : { "$gt" : 27141 } }
[20:48:41] <StephenLynx> yeah, that is an operator.
[20:51:06] <lost1nfound> ah i see what theyre doing... so its a logging table where they're logging the original query along with results, and the $gt was a query condition from the original query. guess this is just a design problem i have to fix and rename all those fields... on that note, anyone know how i could find all fields named "x" and rename them? :)
[20:51:38] <StephenLynx> there is an operator that checks if a field exists
[20:57:01] <lost1nfound> while im here...;) this security vulnerability that 2.4.13/2.6.8 fixes, is it exploitable if the server isnt exposed anywhere? like could someone insert something into a web form that allows code execution? or just if you send a raw malformed BSON doc to the server?
[20:59:40] <flusive> hi, i have mongod and i don't know what happened, but the VIRT parameter in htop is red and equal to 105GB. what does it mean?
[21:02:08] <flusive> and additionally the VIRT value is still increasing, could someone explain what that means?
[21:06:27] <GothAlice> lost1nfound: In situations where I need to store $ in field names, I store _S_ instead.
[21:06:39] <GothAlice> Saves much hair-pulling later, as you're noticing. ;)
[21:08:26] <lost1nfound> GothAlice: good idea, thanks. would be simple to translate it back and forth in the code
[21:11:20] <GothAlice> flusive: There are generally two metrics for memory usage of an application. RSS (resident set size) and VSZ (virtual size). RSS is the actual amount of physical memory allocated to the application. VSZ (or VIRT in your listing) is how much has been "promised" to your application. The app sees all of that RAM, but as it accesses chunks of it the kernel "pages" the data in and out of physical RAM on behalf of the app.
[21:12:26] <GothAlice> flusive: MongoDB makes extensive use of a feature called "memory mapped files", i.e. asking the kernel to treat a file on-disk as if it were RAM. In this case the VSZ will always be much larger than the RSS as the kernel is pretty smart about only loading from disk what it needs to complete a request (like "write X bytes to position Y of this file" only needs to load the chunks around position Y.)
[21:13:40] <GothAlice> Basically, learn to love the VSZ. It's RSS that indicates danger. :)
[21:15:43] <GothAlice> flusive: As an aside, for the most part, you *want* htop to be running on the redline for RAM usage. This means RAM is being fully utilized, which is the most efficient. I also ballpark "load average". The optimum load average is roughly 1 for every core you have. 8 cores = optimum LA of 8. (This would indicate 100% usage, and no wasted time. Higher numbers than the number of cores indicate time wasted waiting.)
[21:17:14] <flusive> GothAlice, I have 64GB ram where 2GB is used, the rest is cached
[21:17:28] <flusive> I have 12 cores and load is only 1 so one core is used
[21:17:31] <GothAlice> That's glorious. Disk caches make things fast.
[21:17:53] <GothAlice> And yeah, that server has been over-allocated CPU. (It's wasting cycles… and power… by being idle.)
[21:18:47] <flusive> so GothAlice, when will the VIRT parameter decrease?
[21:20:11] <GothAlice> But VIRT/VSZ is pretty meaningless. It's simply the size of the "virtual memory" given to the app. That virtual memory might not be in RAM (swap), might be in RAM (hot data), might be on disk (memory mapped files, cold), or might even be shared between processes (FIFO, shmem, etc.)
[21:20:17] <flusive> hmm I don't understand, so generally mongo uses that VIRT (which is disk space?) to increase performance?
[21:22:42] <GothAlice> MongoDB's virtual memory would include the code and data of the mongod binary (memory mapped), the execution stack, all malloc'd (explicitly allocated) memory areas, all socket shared buffers (these are shared with the kernel, and the kernel shares with the network device itself via DMA), and all memory mapped files (on-disk data files), amongst other goodies.
[21:24:31] <flusive> so It's normal? and everything is ok?
[21:24:44] <GothAlice> flusive: MongoDB uses memory mapped files this way because it greatly simplifies MongoDB's own code. Everything is normal, keep calm and carry on. :)
[21:25:36] <flusive> but why do these params still increase?
[21:27:20] <GothAlice> Because MongoDB gets new connections and needs to allocate memory. Or for any of a bajillion other reasons. The key point to take away from this: VSZ/VIRT _isn't real_. It's completely fake. It's the kernel lying to the application and saying, "Yeah, there's 100GB of RAM. Sure. We'll go with that." The kernel then acts as a proxy between the application's fake RAM and real RAM. (The RSS measurement.)
[21:28:16] <GothAlice> (This, BTW, is why "page faults" are a thing. A page fault is an app asking for a chunk of it's virtual RAM that doesn't exist, and the kernel doesn't know how to handle the bad request, so the app goes *boom*.)
[21:31:03] <joannac> but GothAlice is right about everything else I've read in backscroll so far. so listen to her
[21:31:18] <joannac> and stop worrying about your virtual memory
[21:32:02] <flusive> I'm worried because I had sharding but the disk space was full, so I bought a new dedicated server with much more space, and now I don't have a mongo shard but only one mongod instance
[21:32:58] <flusive> I still don't understand how mongo stores data. I did mongodump on the old shard and my dump is 110GB, but my whole sharded cluster is around 400GB :/
[21:39:30] <flusive> joannac, if you have a few minutes could you explain to me how much data I have, why so much space is used, and what I should do in the future to solve this
[21:41:45] <joannac> you have 200gb of data and 80gb of indexes
[21:42:08] <joannac> making 280gb of "stuff" and 400gb of storage
[21:42:49] <flusive> so why, when I did mongodump, does my dump weigh 110GB?
[21:43:03] <joannac> it's probably a combination of fragmentation and orphaned documents
[21:43:15] <flusive> so generally 120GB is empty storage?
[21:47:49] <arduinoob> If I have a mongod reading and writing to the database on server A, but the actual database files are stored on a shared filesystem, can I use mongo client to read the database on a different server?
[21:48:48] <arduinoob> or can I have two mongod pointing to the same dbpath at the same time, I'm assuming that's a bad idea
[21:48:55] <GothAlice> flusive: Several reasons one might have more data than expected. There is per-document overhead called "head room" where records can grow into if you update them, so if you never update, that's wasted room. Fields take up room, too, per-document, so longer field names will use more space the more they are used. For this reason I use one- or two-character field names.
[21:49:03] <flusive> joannac, or can you explain to me how I can predict how my fileSize will increase?
[21:49:18] <joannac> arduinoob: the second, bad idea. the first, sure
[21:49:49] <joannac> arduinoob: if you have a mongod running on A.example.com:27017, as long as server B can reach A.example.com, you can connect with the mongo shell
[21:50:34] <GothAlice> flusive: Lastly, if you have many indexes, your indexes can rapidly grow your dataset size. Certain types of pre-aggregation can optimize both the headroom issue and the index issue. http://www.devsmash.com/blog/mongodb-ad-hoc-analytics-aggregation-framework covers the effects (in terms of DB storage and query performance stats) of different ways of modelling data.
[21:51:12] <arduinoob> joannac: I see, so I'll have to bind to the public interface and allow remote connections to 27017
[21:52:10] <GothAlice> flusive: FTA, the naive approach requires 133MB of data stored in 166MB of disk space, another requires 600MB on disk to store 122MB of data, another requires 55MB to store 39MB of data. Note that all of these are *the same data*!
[21:54:10] <flusive> GothAlice, so it could be a problem with the database structure?
[21:54:40] <GothAlice> flusive: Certainly; some structures are more efficient for storage than others.
[21:56:26] <flusive> GothAlice, thx I will think about it
[22:02:34] <flusive> so I need to compact each collection separately?
[22:03:26] <joannac> wait, why do you have a different number of collections on each shard?
[22:04:04] <flusive> joannac, I don't know? This sharding was done automatically
[22:04:26] <lost1nfound> flusive: You can script it with a .forEach. I can share my script with you if you need
[22:04:49] <flusive> lost1nfound, would be great, thx a lot
[22:06:16] <lost1nfound> flusive: http://pastebin.com/raw.php?i=DNvsL88a :) that's mine. not sure which mongo version you're on, but the "usePowerOf2Sizes" isn't relevant in 2.6 anymore, it just automatically does that. so you can take out the ", usePowerOf2Sizes: true" part.
[22:06:27] <joannac> flusive: in that case I'm dubious about whether your 110gb mongodump actually has all your data
[22:06:59] <lost1nfound> flusive: also note that I pass in "paddingFactor: 1.1" which leaves 10% extra room for records to grow
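In the same spirit as the pastebin above (not a copy of it), a compact-everything loop might look like this; paddingFactor leaves ~10% room for documents to grow into, and compact blocks the database while it runs, so it belongs in a maintenance window.

```javascript
db.getCollectionNames().forEach(function (name) {
    if (name.indexOf("system.") === 0) { return; }   // skip system collections
    printjson(db.runCommand({ compact: name, paddingFactor: 1.1 }));
});
```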
[22:07:20] <flusive> joannac, I did that using the router, not the shards
[22:08:41] <joannac> flusive: and? the router will only give you data it knows about
[22:09:06] <flusive> lost1nfound, ok thx, how much time does it take and how often do you use that script?
[22:09:36] <flusive> joannac, all writes and reads are done using the router and the data was always good, so why could the dump be bad?
[22:10:22] <joannac> because the number of collections on each shard is different
[22:11:54] <lost1nfound> flusive: did it for the first time a couple months ago, ran it again this month; ill probably make it part of monthly regularly-scheduled maintenance. but, see, we had been stuck on 2.2, now on 2.4, and more recent versions of mongo seem to be much more efficient at keeping fragmentation low. so unclear how often it's "necessary" but ill probably do it monthly
[22:12:21] <flusive> maybe because the chunks are divided differently? I don't know, I'm not a specialist, but I used that data for generating charts and everything was good. so is it possible that if I read the data normally all is ok, but when I used mongodump the data is not complete?
[22:13:23] <joannac> flusive: if you have all the data you need, then cool. but your comparison isn't valid anymore
[22:15:00] <flusive> and now it's too late to think about it :) lost1nfound thx for that script, I will test it tomorrow, joannac and GothAlice also thx :)
[22:15:12] <GothAlice> flusive: On our dataset at work (maybe a few million documents total) we've never had to compact. OTOH, our delete operation counter currently reads: 8. (For the two year lifespan of the project.)
[22:16:09] <GothAlice> flusive: On my dataset at home (trillions of documents), it's actually grown beyond the level where I can even compact it. (Dataset size * 1.2 > free space. Happens when your dataset is 26 TiB in size…)
[22:17:38] <flusive> GothAlice, I had 4 shards with 120GB of disk space; now I bought a server with 4TB of disk space
[22:18:19] <GothAlice> flusive: Our production DB hosts at work don't have permanent storage. >:D
[22:19:04] <lost1nfound> GothAlice: yeah, we were doing a lot of deletes and updates. (probably architecturally-incorrectly) we're using a couple of collections as queues that push a good amount of messages. so we definitely have seen some frag problems there
[22:19:23] <GothAlice> lost1nfound: I hope you're using a capped collection for those queues.
[22:19:30] <GothAlice> Then your fragmentation on those collections will be literally zero.
[22:19:44] <GothAlice> (Also, no per-document padding on those.)
[22:20:13] <flusive> GothAlice, at my work I have the problem that I have to buy a new server and the cloud is too expensive for us :(
[22:20:40] <flusive> so that's the reason why I'm interested in where my space went :D
[22:21:09] <lost1nfound> GothAlice: I've looked into that, but, the problem there is it's hard to anticipate my max queue length, and I'd hate to lose messages if we get full. so id either have to massively-overallocate the size, or guard against filling it up in the app which isnt ideal. ive sorta been thinking mongo isnt the best choice for the queue specifically, and that maybe we should be using a traditional pubsub service
[22:21:10] <GothAlice> flusive: If I hosted my personal dataset "in the cloud" it'd cost me $500,000/month. ¬_¬ Even using Amazon EC2, the cost would be enough that I could afford to instead buy three RAID arrays and still have enough money left over to buy a replacement drive for every drive in every array every month…
[22:21:15] <lost1nfound> but thats something ive looked into
[22:22:08] <GothAlice> lost1nfound: For the Batchelor Facebook game we allocated a single 8GB capped collection to store, without loss, all of the activity for one million simultaneous active games. You use http://bsonspec.org/ to do some napkin-calculations on projected storage churn, then allocate the capped collection appropriately.
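Creating a capped collection sized from that kind of napkin calculation is a one-liner; "activity" and the 8 GB figure simply echo the example above.

```javascript
db.createCollection("activity", { capped: true, size: 8 * 1024 * 1024 * 1024 });
// oldest entries age out automatically once the collection is full, so consumers
// never delete processed documents and the collection never fragments
```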
[22:23:00] <GothAlice> flusive: So I bought the arrays. Paid for themselves in under three months vs. "cloud" hosting it.
[22:23:45] <flusive> and where do you keep those arrays?
[22:24:09] <GothAlice> Ah, spare room in my apartment. Free electricity covered in my rental agreement. ;)
[22:24:10] <lost1nfound> GothAlice: oh wow, thanks so much for that, that might just be our solution. that gives me inspiration to give it a try. so much simpler architecturally if we can just keep it in mongo. we could have monitoring/alerts when it starts filling up and surely have time to respond
[22:24:59] <GothAlice> lost1nfound: Yup! Note that my message bus relates to background distributed task execution, and I use a real collection (not capped) to store the task data itself. Everything in the message bus can be re-constructed from nothing if needed.
[22:25:27] <GothAlice> Thinh: Yeah, the three Drobo 8-something-i arrays go nicely with the three 1U Dell boxen. :3
[22:26:02] <lost1nfound> GothAlice: I see, I see, makes sense. Yeah, we could keep some kind of "queue log" we could reconstruct from, and then rotate it out periodically.
[22:27:51] <GothAlice> flusive: The drobos have their own filesystem based on distributing smaller stripes amongst drives, each formatted ext3, I believe. However I'm technically running ZFS on top of iSCSI on top of that, to allow me to snapshot and export those snapshots to my desktop for offsite backup, amongst other goodies.
[22:28:30] <GothAlice> At work we use moosefs for our distributed /home folders.
[22:29:13] <flusive> and for mongo do you use zfs?
[22:29:43] <GothAlice> (With certain things turned off, like inode compression…)
[22:30:59] <flusive> GothAlice, sounds good :) but it will be hard to explain it to my boss :)
[22:31:41] <GothAlice> flusive: Yeah. Also note that my at-home dataset has been growing and in development since 2001 or so. (The earliest data in it originates from September 2000.) It's a bit of an organic mess. ;)
[22:33:44] <GothAlice> And the project is called "Exocortex".
[22:34:37] <flusive> Computer Graphics and Simulation Software?
[22:36:13] <GothAlice> flusive: Nope, a permanent archive of every bit of digital information I have ever touched since the system went operational. Transparent HTTP proxy, ARP MITM listening to all traffic on my home network, etc. Much of it gets filtered out, of course. It's primarily organized as a metadata filesystem with heavy use of arbitrary string tags (and key/value tags) with a neural network linking tags together. (Synonyms, antonyms, etc.) NLP to
[22:36:13] <GothAlice> automatically determine tags from content as it arrives.