pmxbot IRC Log Viewer

[01:38:18] <ailaG> Hi, I'm new to Mongo, learning Meteor. I want to update an entire collection from a JSON file periodically (replace the whole thing). Where do I start reading?

[01:38:45] <joannac> mongoimport

[01:39:18] <ailaG> joannac Thanks. Came across that, wasn't sure till now :)

[01:47:59] <ailaG> Will mongoimport work somehow after I deploy to meteor.com?

[01:48:52] <ailaG> Oh and how scalable is putting my app on meteor.com? It's not huge but it's important and it'll get traffic

[01:49:04] <ailaG> Scratch last one, we're not in #meteor

[01:49:12] <ailaG> It's late and I'm tired :)

[01:55:00] <Jonno_FTW> hi, I have data with a value that is a list of object, how do I query by those inner object?

[02:01:05] <joannac> Jonno_FTW: pastebin a sample document and what you want to find

[02:02:42] <ailaG> What if I just use .update({}, …) instead of mongoimport? Will that work just as well?

[02:03:37] <Boomtime> ailaG: are documents ever deleted? or only updated/inserted?

[02:03:48] <ailaG> Boomtime: They may be deleted

[02:04:14] <Jonno_FTW> joannac: nvm I got it using the $elemMatch operator

[02:04:41] <Boomtime> ailaG: then a series of .update commands would never delete documents

[02:04:52] <ailaG> It's a list of lectures in an event, exported from the CMS that manages them. I'm working on a meteor project that displays them on alternative platforms.

[02:04:57] <Boomtime> ailaG: you can instead drop the collection first, then just do a series of .insert

[02:05:08] <ailaG> I just ran .update({}, {foo: 1}) and the previous data was deleted...

[02:06:01] <Boomtime> ailaG: do you only have one document?

[02:06:52] <Boomtime> .update({},... <- this predicate means "anything" - i.e "update anything"

[02:07:23] <Boomtime> if you only have one document then the match is fine, but if you have two documents.. i wonder which one you'll update?

[02:10:11] <ailaG> Boomtime: No, it was a test…. Gotcha

[02:30:30] <ailaG> Goodnight everyone, thanks for the help.

[02:40:35] <freeone3000> I'm getting the error "replSet not trying to sync from 54.153.95.17:27018, it is vetoed for 218 more seconds". Why would it be vetod? It's my primary.

[03:00:02] <joannac> freeone3000: oplog insufficient? look further up in the logs

[03:01:41] <freeone3000> Ah. Connection refused. Which is odd.

[03:17:25] <freeone3000> joannac: Okay, connection is not refused, error log no longer prints that, but my optime on the secondary is still not increasing. What gives?

[03:59:41] <tylerdmace> Can anyone with mongoose.js experience point me the way on how to change the settings of an existing connection (host, port, user, and pass)

[03:59:50] <tylerdmace> without having to create a whole new connection

[04:00:00] <tylerdmace> is it possible to change those settings and reconnect somehow

[04:06:05] <jaitaiwan> tylerdmace: no matter what it would always be a new connection. What are you trying to achieve?

[04:08:35] <tylerdmace> well I have some express.js middleware that on particular routes, it checks a mongoose connection manager I wrote for an existing database connection associated with the user. If it doesn't find a connection, it creates one. I need to add one more feature in where it checks a setting for a new host, port, user, or password and if any of that information has changed, we need to update the current connection to use those settings

[04:08:50] <tylerdmace> but if a new connection is required, I'll just tackle it that way :) thank you

[06:26:03] <jhonnys> hi

[06:28:13] <joannac> hi

[06:32:21] <Boomtime> hi

[06:44:50] <jaitaiwan> hi

[07:43:42] <mosquito87> Hey. Does someone knows how to use $geoNear with multiple (two) locations?

[07:48:22] <zivix> Use an $and query?

[07:48:29] <zivix> http://docs.mongodb.org/manual/reference/operator/query/and/

[07:49:49] <Boomtime> mosquito87: what result are you expecting?

[07:50:00] <mskalick> Hi,

[07:50:00] <mskalick> "Run db.upgradeCheckAllDBs() to find current keys that violate this limit and correct as appropriate. Preferably, run the test before upgrading; i.e. connect the 2.6 mongo shell to your MongoDB 2.4 database and run the method."

[07:50:28] <mosquito87> I have two addresses ... one pickup and one handover address. Therefore I need two chained "$geoNear"

[07:50:45] <mosquito87> result is an error: Can't canonicalize query: BadValue Too many geoNear expressions

[07:50:47] <mskalick> Is it problem to run upgradeCheckAllDBs() alter upgrade to 2.6? (with running mongod 2.6)

[07:51:49] <joannac> mskalick: sure, I guess. for what purpose?

[07:52:41] <joannac> i presume you mean "after upgrade"

[07:53:06] <Boomtime> mosquito87: again, what result are you expecting? how can you order documents by nearest to two disparate points? in what order would you place points along the route?

[07:53:48] <mosquito87> I have an document with two addresses. Results should be "near" address1 and "near" address2

[07:55:28] <Boomtime> mosquito87: aggregation

[07:55:36] <mosquito87> As an example: Document1 with address1: [8.3314179, 49.5112888], address2: [20.3314179, 49.5112888]. Now I want to be able to find all documents with address1 near [8.33..., 49.511...] AND address2 near [20.331..., 49.511...].

[07:58:52] <Boomtime> mosquito87: i don't think you can do this, you need to perform two wholly independent geoqueries and merge their result-set, the geonear operator can only be used once in a query, or a pipeline

[07:58:57] <Boomtime> http://docs.mongodb.org/manual/reference/operator/aggregation/geoNear/#behavior

[07:59:30] <mosquito87> Yep. I read that. But I expect a bad performance in future ... Is there any "workaround"?

[08:00:18] <mosquito87> Let's say I have a million docs where "address1" fits. Now I check if "address2" fits, too. And the result is that only 2 docs fit. Can't imagine a good performance doing it like that.

[08:00:45] <Boomtime> "fits"

[08:00:48] <Boomtime> perhaps you should define what it is you actually want to achieve

[08:01:15] <Boomtime> it sounds like what you want is not "nearest to these two points" but "points inside this polygon"

[08:01:15] <mosquito87> "fits" = near, so in the radius the user has specified

[08:01:38] <Boomtime> what if the radius is less than the distance between those two points?

[08:01:47] <mosquito87> first address is a pickup address. Second address is a handover address. Both have to be near the addresses of another user.

[08:02:24] <mskalick> joanac: yes, after upgrade. I asked, because in "Compativility Changes in MongoDB2.6" there is "Preferably, run the test before upgrading" but in other documents there are instructions to first upgrade to newer version...

[08:02:35] <mosquito87> let's assume my pickup address is Germany, handover address is USA. If your pickup address is Germany too, but handover address is not USA, but Spain, then the result should be "false"

[08:02:40] <mosquito87> this is what I mean by "fit"

[08:03:07] <Boomtime> mosquito87: what radius would you use for that query?

[08:03:16] <mosquito87> The user can set his radius

[08:03:21] <Boomtime> so tell me

[08:03:30] <Boomtime> pretend you are the user

[08:03:35] <mosquito87> of course the real address isn't just "Germany", but a real address, with street, zip code, etc

[08:03:41] <mosquito87> so I have latitude and longtitude of this address

[08:03:50] <Boomtime> whatever, is your example even remotely possible or not?

[08:04:20] <joannac> mskalick: source? none of our docs should be telling you to upgrade without due dilligence; if they do I'll make sure they get fixed

[08:04:23] <mosquito87> what do you mean by remotely?

[08:04:30] <Boomtime> i don't think you've thought through your requirement - i conjecture that your requirement is absurd, so i want you to give an actual working example

[08:06:15] <mosquito87> User 1: Pickup address is Germany, Handover address is USA. User 2: Pickup address is Germany, Handover address is USA. User 3: Pickup address is Spain, Handover address France. User 4: Pickup address is Japan, Handover address China. Now User 1 wants all addresses which fit to his pickup AND handover address.

[08:06:24] <mosquito87> The result should be the address of user 2.

[08:06:35] <mosquito87> As only user 2 has the "same" addresses.

[08:07:51] <Boomtime> excellent, it is not a single point the user resides at, you have two seperate addresses and two seperate matches

[08:08:00] <mosquito87> yes

[08:08:07] <mosquito87> so two "geoNear" is wrong for that use case?

[08:08:27] <Boomtime> i understand, and your use case is valid

[08:08:58] <Boomtime> but it can't be done... easily... if at all

[08:09:24] <mosquito87> I could first check the pickup address. Then check the handover address.

[08:09:37] <mosquito87> But I think this will result in a terrible performance when having millions of addresses.

[08:10:17] <morenoh149> how should you model a two-way relationship between documents? is there a good way?

[08:10:19] <Boomtime> yes, it is an interesting problem, i may have to ponder it for a while

[08:10:26] <mskalick> joannac: I probably wrongly understood "Package Upgrades¶ If you installed MongoDB from the MongoDB apt or yum repositories, upgrade to 2.6 using the package manager."

[08:10:27] <mosquito87> User 1: Pickup address is Germany, Handover address is USA. User 2: Pickup address is Germany, Handover address is USA. User 3: Pickup address is Germany, Handover address France. User 4: Pickup address is Germany, Handover address China.

[08:10:50] <mosquito87> If I check for the pickup address first, then I will get user 2, user 3 and user 4. Then check handover address. User 2 will be left.

[08:11:05] <mskalick> joannac: So install 2.6 shell, run db.upgradeCheckAllDBs(), fix and install mongod 2.6?

[08:11:43] <mskalick> joannac: different order is unsafe... right?

[08:13:08] <zivix> morenoh149: If you just need a 1:1, create it in one of the documents and query against that when looking at the other type. So e.g. A->B and B. When looking at B query for A where ->B

[08:14:07] <zivix> If you need many-to-many you probably want a join collection that points to A and B. Not sure what your data looks like, though.

[08:14:58] <zivix> mosquito87: when you're querying for proximity you can discard large groups of addresses, right? Can you assign them a score of some kind and use that as a reference?

[08:15:41] <zivix> So consider: I query for address A, based on proximity I can calculate a score (distance, for example). maybe I only am interested in records where score < 50

[08:15:53] <morenoh149> are there any examples of join collections?

[08:15:59] <Boomtime> mosquito87: sorry, i do not see an easy solution, certainly i don't think you can construct a single query to do it

[08:16:13] <mosquito87> @zivix: Could you give an example?

[08:16:22] <mosquito87> @Boomtime: Do you share my concerns about performance?

[08:16:36] <joannac> mskalick: well, the check is to fix problems before you upgrade to 2.6. If you've already upgraded, it's less useful.

[08:16:56] <zivix> morenoh149: http://docs.mongodb.org/master/MongoDB-data-models-guide.pdf

[08:17:21] <Boomtime> mosquito87: i can only assume you mean performance on the client though i think you can ensure it won't be too bad by limiting your result-sets and capping radius

[08:17:34] <zivix> Right.

[08:18:14] <mskalick> joannac: I thought only upgrade (=start mongod), without usage, inserts, ... ?

[08:18:18] <zivix> Let's say you calculate distance in km when you run the first query. Only look at the 50 closest addresses. So your set limit is 50. You can run the second query on another set of 50 and see if there is any overlap.

[08:18:51] <zivix> So if you have individual homes it's going to miss but if you have warehouses or distribution centers you'll probably hit frequently.

[08:19:00] <mosquito87> Address1 can be very close ... but this doesnt mean address2 is close as well

[08:19:11] <zivix> You still want to use it if address 2 is not close?

[08:19:18] <mosquito87> no

[08:19:41] <mosquito87> but I can have millions of documents where address1 is very close. But address2 is very far away from the address2 I want to compare to

[08:20:35] <zivix> How do you know which address 2 you want to compare to if not based on distance?

[08:21:11] <mosquito87> my user has address1 and address2. I want to find all other users, where address1 is near to address1 of user and address2 is near to address2 of user

[08:21:28] <zivix> Right.

[08:21:35] <zivix> So you can score both of those independently, right?

[08:21:42] <mosquito87> yes

[08:22:06] <zivix> I agree that it would be straightforward if you could use two geoNear queries but in absence of that is there another way to score / query for proximity?

[08:22:21] <mosquito87> how would such a score look like?

[08:22:45] <zivix> If you have lat/long stored for example you could create a score based on the difference between address 1 and address 1 vs address 2 and address 2

[08:23:24] <mosquito87> but I have millions of addresses ... so I want to compare both addresses of the user to million other addresses

[08:23:27] <mosquito87> as I have many users

[08:24:05] <zivix> Right

[08:24:26] <mosquito87> User 1: Pickup address is Germany, Handover address is USA. User 2: Pickup address is Germany, Handover address is USA. User 3: Pickup address is Germany, Handover address France. User 4: Pickup address is Germany, Handover address China.

[08:24:30] <mosquito87> That as an example.

[08:24:35] <mosquito87> How would the scores look like?

[08:27:28] <zivix> So in germany to germany. Let's say you're looking at Regensburg and Frankfurt. Regensburg is 51 12 and Frankfurt is 50 9

[08:27:30] <joannac> mskalick: okay, you can run the check now, I guess.

[08:28:21] <zivix> You could do some fancy math for triangles but for simple stuff, abs(51-50) is 1 and abs(12-9) is 3. So your distance score is 4

[08:28:48] <zivix> Repeat for address 2. Let's say score is 3.

[08:29:02] <zivix> Total score is 7, which is overall proximity for both addresses.

[08:29:13] <zivix> Then you sort by lowest score to see who's close by

[08:31:30] <zivix> I think geoNear does some fancy things to use triangle math on a spherical surface so you get actual distance but you might not actually need that level of precision.

[08:32:33] <mosquito87> Problem is that the user can define the precision by defining a radius

[08:32:43] <mosquito87> Could be 10, 100, 500km

[08:33:26] <zivix> That just sets a score threshold

[08:33:58] <markand> hi

[08:33:58] <zivix> So for 500km that's like... 4-5 degrees lat/long

[08:34:01] <mosquito87> Problem with this idea is that I still have to run through all addresses

[08:34:03] <mosquito87> and calculate

[08:34:09] <zivix> You can just discard over a certain distance

[08:34:15] <markand> anyone successfully compiled mongo-c-driver for MinGW ?

[08:34:15] <zivix> True.

[08:34:26] <mosquito87> So I can still just do two near queries

[08:34:32] <mosquito87> first for address1, then for address2

[08:34:58] <zivix> Erm. You can't do that with inline javascript and compute score on the server?

[08:35:07] <zivix> That would save you having to pull 2 giant resultsets down.

[08:36:18] <mosquito87> It's a node.js server ... so I guess I could.

[08:36:23] <zivix> You still have to iterate your entire dataset but I don't think you can escape that unless you do some kind of pre-processing to build an index.

[08:36:27] <mskalick> joannac: Or could it work to define functions from src/mongo/shell/upgrade_check.js in mongo 2.4 shell and run them there?

[08:36:46] <zivix> I meant javascript in mongodb.

[08:36:56] <zivix> You can do an eval as part of your query

[08:37:06] <mosquito87> But again if I have 1 million documents where address 1 is very near ... And only 1 document where address 2 is very near as well ... I still would have to iterate through 1 million docs (where address 1 is very near)

[08:37:56] <zivix> Hm... so for that you might want to use the aggregation piece. Or change the way you query it.

[08:38:34] <zivix> For example, if geoNear can use an index you can scout your potential matches before you run the aggregate.

[08:38:46] <zivix> And potentially say "Yeah there's no match for address 2 so don't bother."

[08:39:39] <zivix> You might still run the aggregate and find that there's no overlap between near address 1 and near address 2, but I think you have to bite the bullet somewhere and run that operation.

[08:39:48] <mosquito87> ok

[08:49:01] <mosquito87> Thanks guys. Especially to zivix and Boomtime. Will have to think about the possible solutions.

[08:49:14] <zivix> Good luck! :)

[09:02:44] <markand> nobody knows for mingw ?

[09:12:14] <stiffler> hello, Im quite new with mongo, but Im just wondering, how deep can I go in documents, for example how many arrays can I have in an array?

[09:12:41] <stiffler> is any limit ?

[09:16:10] <zivix> 100 levels, I think

[09:16:11] <zivix> http://docs.mongodb.org/manual/reference/limits/

[09:17:32] <stiffler> ok thanks

[09:17:42] <stiffler> so Im on third nest so far :)

[09:18:45] <zivix> Plenty of room to grow. ;)

[09:50:37] <guest9998> hello. im using elemmatch to match a sub document in an array. is it possible to only return results which has two matches?

[10:43:12] <markand> nobody knows for mingw ?

[11:01:30] <robopuff> Hi guys, can you help me with this one: http://stackoverflow.com/questions/28740593/mongodb-conditional-aggregate

[11:29:08] <pamp> hi

[11:29:22] <pamp> .\mongorestore.exe --username user --password pwd --db pamp 'C:\Program Files\MongoDB 2.6 Enterprise\bin\dump\pamp'

[11:29:37] <pamp> why i get authentication error inthis operation

[11:30:28] <pamp> this user and password is admin with unrestricted access

[11:30:55] <pamp> my mongod instance is running

[11:35:45] <eirikr> hi you forgot --authenticationDatabase <databasename>

[12:04:28] <pamp> @eirikr thanks

[12:05:58] <pamp> its possible restore the admin db from one server to another, but this second server also has an admin db

[12:06:18] <pamp> i need to merge the older and new admin database

[12:14:14] <eirikr> i think its ok, mongorestore operations are inserts, so if you restore db in a exist db all new documents will inserts

[12:21:04] <StephenLynx> heey, mongo update on ubuntu repos

[12:22:22] <pamp> i already restore all dbs with success, but when restore admin db i get this error

[12:22:24] <pamp> Error creating index admin.system.version: 13 err: "not authorized to create index on admin.system.version"

[12:25:56] <panshulG> Hello people.... I am using findAndModify... and if my query returns multiple documents... will all the returned documents be updated?

[12:33:03] <StephenLynx> 2.6.8 update is out

[13:09:31] <eirikr> @pamp : ok i understand, your user with you want to restore db have "restore" role ? its important.

[13:11:38] <stiffler> hi, I have a problem with returning results of db.collection.find() by function. It says that is undefined but console.log says smth different

[13:11:50] <stiffler> http://pastebin.com/ZA85vyHM

[13:11:52] <stiffler> here is my function

[13:11:59] <stiffler> but return doesnt work

[13:17:49] <iksik> stiffler: well You are not actually returning anything from getTimes function

[13:19:58] <stiffler> what about return body.stops[i].timetable;

[13:20:00] <stiffler> ?

[13:20:32] <iksik> ;]

[13:20:39] <iksik> it returns from exec callback

[13:22:43] <stiffler> oh

[13:22:51] <stiffler> so could you help me to solve this problem?

[13:23:03] <stiffler> i would like to return it after will be fetched

[13:27:54] <iksik> stiffler: You have two choices here - non of them is returning anything from inside of nested function the way You would like to... first solution is to use simple callback for getTimes (like: getTimes: function ( lineNr, stopNr, dirNr, callback ) - where 'callback' function can be called instead your current 'return', like callback(body.stops[i].timetable. Second solution is

[13:27:54] <iksik> to use promises - but You just need to read a bit about them to understand how they work (also it's a bit off topic for this channel)

[13:29:37] <StephenLynx> are you using node or io?

[13:29:41] <stiffler> node

[13:29:46] <stiffler> and that is my problem

[13:29:48] <StephenLynx> i have experience with it, tell me what you need

[13:29:57] <stiffler> i have problem with understanding callback and async functions

[13:30:01] <StephenLynx> oh this

[13:30:15] <StephenLynx> its quite simple. a callback is just a variable that holds a function

[13:30:19] <stiffler> iksik: how callback function shoul look like?

[13:30:25] <StephenLynx> so you can pass this variable around

[13:30:35] <StephenLynx> and just execute it when you are done with your async work.

[13:31:02] <stiffler> so callback should looks like: function callback(value) {return value } ?

[13:31:06] <iksik> stiffler: it's just a function, so it can look however You like

[13:31:30] <StephenLynx> no, you can't work with return. a callback function must pass its values using its parameters.

[13:32:17] <stiffler> http://pastebin.com/GKEzTfcL

[13:32:19] <StephenLynx> you call a function like this lol.doAsync(myValue, function done(error,response){});

[13:32:24] <stiffler> i have tried this but it does not work

[13:32:45] <StephenLynx> dAsync will do a bunch of stuff and pass this second parameter that holds the callback around

[13:32:56] <iksik> stiffler: and how are You calling getTimes?

[13:33:10] <StephenLynx> when doAsync its done it will just call callback(null,'ok');

[13:33:25] <iksik> also, think what happens when if from line 6 is false? ;)

[13:33:27] <StephenLynx> and then the done function will be executed

[13:33:33] <stiffler> Stops.getTimes(lineNr,stopNr,dirNr)

[13:33:51] <iksik> stiffler: you are missing 4'th argument - a callback function ;]

[13:34:02] <stiffler> o

[13:34:10] <stiffler> so what to place there? just callback

[13:34:15] <stiffler> of callback(){} ?

[13:34:22] <iksik> ;D

[13:34:26] <stiffler> *or

[13:34:37] <iksik> Stops.getTimes(lineNr, stopNr, dirNr, function(body) { console.log(body); })

[13:35:09] <stiffler> i have no idea what it will do

[13:35:12] <stiffler> but im gone try it

[13:35:13] <stiffler> waity

[13:36:20] <stiffler> lol

[13:36:22] <stiffler> it works

[13:36:24] <StephenLynx> http://pastebin.com/VPGaPKa7

[13:37:12] <stiffler> StephenLynx: thanks

[13:37:14] <StephenLynx> np

[13:37:16] <stiffler> iksik: and you of'course

[13:37:18] <stiffler> it works

[13:37:28] <stiffler> but i have to thing about this

[13:37:33] <stiffler> coz it something new for me

[13:37:41] <stiffler> i was using php so far

[13:37:48] <stiffler> and I decided to try node

[13:37:55] <iksik> yea, for now... but You need to improve your code to handle all scenarios (your if conditions)... You need to be sure that callback is ALWAYS triggered

[13:38:36] <stiffler> basicly you mean error validation ?

[13:38:39] <stiffler> *errors

[13:38:54] <stiffler> if error retry ?

[13:39:04] <StephenLynx> if thats what you wish

[13:39:17] <StephenLynx> usually I just spit "error" to the user and log it.

[13:39:19] <stiffler> StephenLynx: Im not sure but that how does it look like

[13:39:22] <StephenLynx> want to read my code?

[13:39:29] <stiffler> StephenLynx: sure

[13:39:33] <stiffler> im at work right now

[13:39:38] <stiffler> they will pay me for this

[13:39:39] <stiffler> :)

[13:39:57] <iksik> look, with this example: Stops.getTimes(lineNr, stopNr, dirNr, function(body) { .......this code will never execute if You wont fire callback inside getTimes..... })

[13:40:06] <StephenLynx> https://gitlab.com/mrseth/bck_lynxhub

[13:40:17] <StephenLynx> I follow these guidelines: https://github.com/felixge/node-style-guide

[13:40:29] <flusive> what is the best solution to copy big database from remote mongos to local mongod?

[13:41:14] <StephenLynx> the most important thing about callbacks is keeping in mind it is just a variable that holds a function.

[13:41:33] <stiffler> StephenLynx: your code will be my boilerplate

[13:41:33] <StephenLynx> you don't care about the asynchronous execution.

[13:41:40] <iksik> wow, new gitlab looks a whole lot different :D

[13:41:51] <StephenLynx> the function you just called is that must care about async.

[13:42:00] <stiffler> ok

[13:42:06] <StephenLynx> you just provide a function as one of the parameters.

[13:42:32] <stiffler> ok ill have to remember it, and change a lots of lines in my projects

[13:42:35] <StephenLynx> that will be executed when the function is done

[13:42:56] <StephenLynx> yeah, I did HUGE refactors on my project too when I started following these guidelines.

[13:43:24] <StephenLynx> un-nesting stuff on callbacks

[13:43:35] <stiffler> ill have to as well

[13:43:39] <StephenLynx> trimming functions

[13:43:42] <stiffler> StephenLynx: iksik thanks for help

[13:43:47] <stiffler> i will be back i feel it ;)

[13:43:53] <stiffler> but I have to go now

[13:43:55] <stiffler> :)

[13:44:50] <robopuff> hey guys, can you help me with this task: http://stackoverflow.com/questions/28740593/mongodb-conditional-aggregate

[13:48:43] <stiffler> oh one more thing

[13:48:58] <stiffler> I cant do console.log in callback

[13:49:02] <stiffler> but how to pass it to view

[13:49:04] <stiffler> then

[13:50:32] <StephenLynx> view?

[13:50:42] <StephenLynx> and you can use console.log anywhere.

[13:51:06] <stiffler> im using express

[13:51:10] <StephenLynx> ugh

[13:51:17] <StephenLynx> I avoid it like the plague.

[13:51:20] <stiffler> so I need to do render function in this combat

[13:51:25] <stiffler> callback

[13:51:27] <StephenLynx> no idea, man.

[13:51:32] <stiffler> ah ok

[13:51:35] <stiffler> but why is so bad?

[13:51:49] <StephenLynx> first it eats about 20% of your performance

[13:52:14] <StephenLynx> second it throws vanilla workflow out of the window

[13:52:25] <StephenLynx> third it adds zero functionality

[13:52:58] <StephenLynx> the problem about n 2 is that the regular knowledge is useless when you use something like this.

[13:53:00] <stiffler> so basicly you prefer to use pure node?

[13:53:04] <StephenLynx> yes.

[13:53:21] <StephenLynx> not to mention a fourth point that is breaking MVC architecture.

[13:53:29] <stiffler> is breaking?:>

[13:53:40] <StephenLynx> ofcourse, you are generating the view on the controller.

[13:53:42] <stiffler> I thought that mvc there is pos

[13:53:59] <stiffler> yes I do, as always in mvc

[13:54:08] <StephenLynx> that is not MVC.

[13:54:40] <stiffler> why Im used to manage view in controller

[13:54:42] <StephenLynx> MVC is having always the same view that loads data and then dinamically presents the loaded data.

[13:54:46] <stiffler> pass from model to view

[13:55:20] <StephenLynx> MVC is the user loading always the same HTML and javascript, then contacting the server to load data and the javascript manipulates DOM to present the loaded data.

[13:55:34] <StephenLynx> look at my other project, lynxchan, it works like that.

[13:56:00] <stiffler> always the same view ? what if you page would have different views on each subpage

[13:56:14] <GothAlice> Uhm, well, no, MVC itself has very little to do with the mechanism of content delivery. It doesn't technically require a single loaded presentation layer, that's just one option.

[13:56:15] <stiffler> like contact gallery, about us, catalogue

[13:56:50] <StephenLynx> stiffler you use routing on the server that provides the static files. Like lynxchan does.

[13:57:07] <StephenLynx> GothAlice is more about the controller having knowledge and manipulating objects of the view.

[13:57:29] <stiffler> StephenLynx: Im gona take a look on it later

[13:57:38] <stiffler> but my time is over today

[13:57:39] <stiffler> i gtg

[13:57:40] <StephenLynx> it does it in the boot.js file.

[13:57:47] <stiffler> thanks for help once again

[13:57:48] <stiffler> bye

[13:57:50] <StephenLynx> np

[14:00:40] <GothAlice> StephenLynx: WebCore 2 does MVC via: URL -> dispatch -> controller -> returned value -> view registry lookup -> view, with the controller interfacing with the model, and views usually "rendering" something about a model. I.e. the "view" for a "file-like object" is to stream the file (with proper caching, range matching, etc.) A controller can thus just return one, and the view will fire up and Do The Right Thing™.

[14:01:27] <GothAlice> (The view for a model object when the request has the XHR header is to return the public JSON-serialized version of that model object, as another example.)

[14:03:05] <StephenLynx> it has separate controllers that takes in the content from the main controllers and renders an appropriate view?

[14:03:55] <StephenLynx> lunch, will read when I get back

[14:03:59] <GothAlice> That's possibly one way to look at it, but that's not quite right. The registered views get the request context, yes, but aren't supposed to _do_ things, only transform the value returned by the controller in some way for consumption over HTTP.

[14:10:33] <GothAlice> StephenLynx: https://github.com/marrow/WebCore/blob/rewrite/example/controller.py?ts=4#L23-L28 < example controllers, highlight demonstrates that "endpoints" (not just callable functions) are perfectly valid, with the static() factory creating a callable endpoint that loads and returns file objects from disk.

[14:10:38] <GothAlice> StephenLynx: https://github.com/marrow/WebCore/blob/rewrite/web/ext/base/handler.py?ts=4#L30-L58 is the view to handle those file-like objects. https://github.com/marrow/WebCore/blob/rewrite/web/ext/template/handler.py?ts=4#L11 is the view to handle 2- or 3-tuple (template, data) rendering.

[14:20:45] <hayer> How can I get the name of all fields in a collection? Like document1[field1, field2] document2[field1, field3, fieldY] -- should return field1, field2, field3, fieldY ..

[14:20:55] <GothAlice> hayer: You'll have to use map/reduce for that.

[14:21:33] <hayer> GothAlice: Ah, okey. Just to be sure that I made my self clear. I want the name of the fields, not the values.

[14:21:56] <GothAlice> In your map, emit the field names from each document (may require recursion if you nest fields!) and reduce to the unique set of them.

[14:32:08] <hayer> GothAlice: Thanks! That was actually quite simple after you told me what "attack vector" to use @ problem.

[14:45:06] <jerome-> is there an official way to create a database ?

[14:45:17] <jerome-> (in a mongodb server)

[14:45:31] <jerome-> (without creating a collection in it)

[14:49:59] <markand> nobody knows for mingw ?

[14:50:55] <blaubarschbube> could somebody explain to me why this -> http://pastebin.com/8N7pwh8X works for one node but does not for the other one? in pupped-dashboard it reads: Could not retrieve catalog from remote server: Error 400 on SERVER: Could not find class default_packages for inventory.hamburg.contentfleet.com on node ...

[14:51:17] <blaubarschbube> oops..

[14:52:45] <GothAlice> jerome-: Create a collection in it is the official way.

[14:53:08] <jerome-> I wonder how robomongo is able to do it...

[14:53:52] <GothAlice> db.keepalive.insert({}); db.keepalive.drop() # Simple enough.

[14:54:32] <GothAlice> Creating indexes on a collection that didn't exist before would also do it. Technically what's happening is the database is created when the first namespace is allocated. (Inserting a record creates an _id namespace, adding an index would create a namespace for that index.)

[14:55:49] <jerome-> well, I will try to do without the tmp collection in mongohub

[14:56:41] <jerome-> (I mean accept that the database can't be created empty, and keep around a virtual database until mongodb creates it)

[14:59:20] <StephenLynx> why do you need to have an empty db in the first place?

[15:00:11] <jerome-> because the usual workflow in mongohub is to create a database before creating collection

[15:01:04] <StephenLynx> mongohub?

[15:01:10] <StephenLynx> what is that?

[15:01:39] <jerome-> just a tool on os x

[15:01:46] <StephenLynx> it sounds bad.

[15:01:53] <StephenLynx> have you tried not using it?

[15:01:54] <jerome-> ok

[15:02:08] <jerome-> thanks

[15:02:12] <jerome-> I'm just trying to do it

[15:11:54] <GothAlice> It's not a very good tool. For years resizing the window would exhibit "we've attached these UX widgets to the wrong side of the window" silliness, even.

[15:12:26] <jerome-> GothAlice try another fork...

[15:12:41] <GothAlice> Or just use an interactive shell like the gods intended. ;)

[15:15:21] <StephenLynx> amen

[15:15:40] <StephenLynx> ctrl+alt+T > mongo is all you actually need.

[15:16:58] <pamp> hi, should I put the bin folder and data in different hd's? or is irrelevant ?

[15:17:36] <pamp> i've data and logs in different drives

[15:17:39] <GothAlice> pamp: For the most part, it won't matter. The binaries will get loaded once on startup.

[15:18:18] <GothAlice> Having logs separate is not just a good idea. (I warehouse logs on a separate set of servers, even, to protect against log loss on catastrophic machine failure.)

[15:18:34] <pamp> ok thanks

[15:19:42] <pamp> at this time i only have a single machine, but in the future I will use sharding

[15:20:10] <pamp> but always with data logs and backups in defferent drives

[15:20:54] <GothAlice> "Do you have backups?" means "I can't fix this." Having a single machine is a bad idea from a data safety perspective, even ignoring high-availability. (Two replicas will give you safety, three gives you reliability.)

[15:22:02] <GothAlice> Sharding itself won't increase the safety of your data. In fact, because statistics multiply, each additional shard you add will roughly halve your mean time between failure.

[15:29:13] <pamp> GothAlice: this machine is in the azure cloud

[15:29:21] <GothAlice> So?

[15:29:29] <GothAlice> Cloud != magic.

[15:31:44] <pamp> they ensure data security, i think

[15:32:16] <pamp> but yes, i will use replication in the future

[15:37:45] <GothAlice> The issue is less about security and more about your data going *pif*. ;)

[15:46:29] <boutell> Hi. Does mongodump support URIs for connection to databases? It looks like it only supports all the old school —user, —password junk?

[15:47:16] <roadrunneratwast> what do people use to model relations between mongodb Schemas? ERD Entity Relationship Diagrams?

[15:47:37] <roadrunneratwast> Has anyone ever created a taxonomy or ontology in Mongo? Examples?

[15:47:54] <GothAlice> boutell: URIs with combined username/passwords in them can be tricky to parse due to the potential for a duplicated colon.

[15:48:34] <GothAlice> roadrunneratwast: Yes, though generally storing graphs is better done in a dedicated graph database, since MongoDB has no concept of joins, and any branch traversal would effectively require multiple roundtrips—not very efficient.

[15:49:29] <GothAlice> roadrunneratwast: https://gist.github.com/amcgregor/4361bbd8f16d80a44387 is my taxonomy model mix-in for the MongoEngine ODM.

[15:49:58] <GothAlice> (Stores immediate parent, list of all parents, coalesced path, and numeric order and has all the management methods needed to maintain that structure.)

[15:50:09] <GothAlice> (Following jQuery's DOM manipulation API.)

[15:51:59] <StephenLynx> roadrunneratwast they can use several tools that mimics relations. including field references, but in pratice they just perform aditional queries.

[15:52:21] <GothAlice> roadrunneratwast: See also: http://www.javaworld.com/article/2088406/enterprise-java/how-to-screw-up-your-mongodb-schema-design.html

[15:52:54] <StephenLynx> said that

[15:52:57] <boutell> GothAlice: that sounds painful, but the mongo client supports URIs, why wouldn’t mongodump have the same options? Are URIs deprecated now?

[15:53:27] <StephenLynx> it is possible to use these fake relations if your application won't need to perform multiple queries because of them.

[15:53:48] <GothAlice> boutell: No, it's just that the command-line tools are discrete tools, not exposed API. They have no particular requirement to behave the same way, and don't mostly for historical reasons, as lame as that can be.

[15:54:28] <StephenLynx> lets say you have an entity with a list of something. because of reasons, having a subarray with this is does not work very well. so you make a separate collection that holds these objects, each object containing a field that indicates the parent entity.

[15:54:45] <StephenLynx> and you can gather all you need from this collection with a single query.

[15:54:54] <boutell> GothAlice: I’m whinging because we standardized on configuring URIs, which seemed to be the modern thing, and one prefers not to start parsing URIs with bash just to write a dump/restore script

[15:55:03] <GothAlice> By storing so many different references, my taxonomy model is able to answer many different types of queries without recursion. I.e. all parents of document X, all descendants of document Y, siblings of document Z in the correct order, etc.

[15:55:54] <StephenLynx> but lets say you would have to perform one query for each parent object, or one query for each object, or have to assemble data from both in your application.

[15:55:56] <GothAlice> My favourite: give me the deepest document matching a given path.

[15:56:15] <StephenLynx> that's when you should avoid using these fake relations.

[15:56:24] <boutell> yes, materialized path + rank + depth covers a lot of ground.

[15:57:10] <GothAlice> boutell: Formerly this structure was a combined nested set + adjacency list. But the overhead of updating left/right references (potentially touching every document in the taxonomy…) became too extreme.

[15:58:16] <boutell> GothAlice: never mind the race conditions.

[15:58:21] <GothAlice> Yeeeah.

[15:58:24] <boutell> *brr*

[15:58:32] <GothAlice> Those are mostly resolved naturally by atomic increments.

[15:58:46] <GothAlice> I.e. it won't matter if two operations interleave, the end result will be the same regardless of the order of operations.

[15:58:47] <boutell> I used a nested set model in a SQL driven CMS site once. One day the page tree came unmoored from Earth’s orbit and floated into the sun.

[15:59:10] <boutell> after that we stopped trusting the library we were using and added beacoup locking logic

[15:59:36] <GothAlice> Heh. Yeah, I needed a "defragmenter" that would run the adjacency list and rebuild the nested set data. Sometimes things got a little stuck…

[15:59:41] <boutell> but, in the next generation of our CMS, we used the materialized path model where this is not a thing. The worst case is two people insert a page simultaneously, and get the same rank among their peers, which does nothing terrible.

[15:59:58] <boutell> I recall writing that rescue task too

[16:00:30] <boutell> I do, however, still wonder if there’s a way to address that issue of two peers getting the same rank without locks.

[16:00:53] <boutell> you mentioned increments. It occurs to me that the parent page could have a nextChildRank property

[16:00:55] <boutell> and use $inc

[16:01:11] <GothAlice> My CMS model, v1: https://gist.github.com/amcgregor/ee96bbaf2ef023aa235f#file-contentment-v0-py-L110-L114 (https://gist.github.com/amcgregor/ee96bbaf2ef023aa235f#file-contentment-v0-py-L172-L267 being the attachment code—I couldn't think of a better name for a function opening/closing holes in the left/right structure than "stargate")

[16:01:15] <StephenLynx> hey, is it possible to only project a field if another field is set to false?

[16:01:40] <GothAlice> Contentment v2 model: https://github.com/marrow/contentment/blob/develop/web/extras/contentment/components/asset/model/__init__.py#L58-L63 (https://github.com/marrow/contentment/blob/develop/web/extras/contentment/components/asset/model/__init__.py#L153-L182 attachment code)

[16:01:41] <GothAlice> StephenLynx: Not AFIK.

[16:01:59] <GothAlice> boutell: And that taxonomy I linked first is part of the v3 model factored out like it should be. ;)

[16:02:18] <StephenLynx> hm, thats unfortunate. because I have this field on posts on my forum that dictates if the post is anonymous. what I have is to not project the name if this field is set to true.

[16:02:25] <StephenLynx> currently I do this with application code.

[16:02:34] <GothAlice> StephenLynx: Ah, you have data security auditing requirements.

[16:02:54] <GothAlice> StephenLynx: http://docs.mongodb.org/manual/reference/operator/aggregation/redact/

[16:03:26] <GothAlice> I keep forgetting about things only added in 2.6… clearly I've been using MongoDB for too long. ;)

[16:07:33] <GothAlice> boutell: Hope I didn't overload you, there. ^_^

[16:08:35] <hashpuppy> i'm setting up a new replica set. 2 databases + 1 arbiter. i've configured the replica set. i now see mongodb1 as primary and mongodb2 as secondary. but mongoarbiter is status UNKOWN (and after restarting stuck at status DOWN) with "still initializing". When i log into that server and rs.status() i see this message ""loading local.system.replset config (LOADINGCONFIG)" w/ startupStatus 1

[16:09:01] <hashpuppy> what did i do wrong? or how can i get the arbiter running in that replica set

[16:09:47] <GothAlice> hashpuppy: Might be simple enough to nuke your existing arbiter completely (and remove it from the set on the primary), then re-add it using http://docs.mongodb.org/manual/tutorial/add-replica-set-arbiter/

[16:09:57] <GothAlice> s/re-add/re-create

[16:11:17] <boutell> GothAlice: interested, just have to focus on something else for a few mins

[16:12:21] <StephenLynx> it seems redact can only remove documents?

[16:13:20] <GothAlice> No?

[16:14:16] <StephenLynx> "The argument can be any valid expression as long as it resolves to $$DESCEND, $$PRUNE, or $$KEEP system variables. For more information on expressions, see Expressions."

[16:14:27] <StephenLynx> then you read these variables

[16:14:36] <StephenLynx> " $redact returns the fields"

[16:14:42] <StephenLynx> " $redact excludes all fields"

[16:14:47] <StephenLynx> " $redact returns or keeps all fields"

[16:14:59] <GothAlice> One of the examples demonstrates redacting individual members of a list of sub-documents, the next demonstrates nuking subsets of fields in general, and the see more link is a complete tutorial on field-level redaction.

[16:19:22] <StephenLynx> I still can't find an example where you can omit a top level field on a document.

[16:19:33] <hashpuppy> GothAlice: that didn't seem to work

[16:21:40] <GothAlice> StephenLynx: A simpler approach may be an expression in your projection.

[16:22:01] <StephenLynx> I agree, and I was thinking about that.

[16:22:31] <StephenLynx> will try to use $cond

[16:22:36] <GothAlice> {$project: {author: {$cmp: ['$anonymous', true]}}}

[16:23:09] <hashpuppy> is there a way to view the replSet it was configured with

[16:23:22] <hashpuppy> just want to make sure it's actually picking it up

[16:23:28] <StephenLynx> $cmp, haven't heard about that operator, gonna look into it.

[16:23:56] <GothAlice> StephenLynx: $cmp might not be right in this situation.

[16:24:13] <GothAlice> $eq is likely a better choice

[16:27:48] <StephenLynx> {$project:{_id:0,name:{$eq:['$project',true]}}} is returning me if project is true or false.

[16:28:29] <GothAlice> You have a field called "project" in that document?

[16:28:39] <StephenLynx> just for test purposes.

[16:28:50] <StephenLynx> on my project the field is called "anon"

[16:28:56] <StephenLynx> if anon is true I don't project "name"

[16:30:36] <StephenLynx> the project block I pasted returns me the result of $eq

[16:30:50] <StephenLynx> wait

[16:30:58] <StephenLynx> I think I might have to use $cmp

[16:31:07] <StephenLynx> $eq will compare the field itself, in this case name.

[16:32:52] <pamp> what is the best strategy to rename the name of an attribute, here is an example of a document http://dpaste.com/1KPEKH9

[16:33:11] <pamp> i need to rename the array "props" to "P"

[16:33:33] <roadrunneratwast> GothAlice: thanks for the link

[16:33:58] <pamp> i used this method http://dpaste.com/2PK9YJ0

[16:34:14] <GothAlice> pamp: db.foo.update({'orig_dest': {'$exists': True}}, {'$rename': {'orig_dest': 'de'}}, {multi: True})

[16:34:28] <pamp> but this is taking longer than 24 hours

[16:34:37] <GothAlice> Yeah, that approach is kinda nuts.

[16:34:41] <GothAlice> Just $rename the field…

[16:35:26] <pamp> hmm i will try that

[16:36:38] <StephenLynx> I really don't think I can use an expression on the project block for what I want. it just projects the field AS the result of the expression.

[16:36:57] <GothAlice> Hmm. You need the result of the expression to be zero or one…

[16:37:34] <GothAlice> Or… project the value of the author field as the contents of the author field if expr is true, otherwise null.

[16:41:32] <StephenLynx> it just projects the boolean resulting from $eq

[16:41:39] <StephenLynx> so no, being true or false does not suffice.

[16:42:40] <pamp> GotAlice: note that ManagedObjects is an array, and also props

[16:43:09] <pamp> "errmsg" : "cannot use the part (ManagedObjects of ManagedObjects.props) to traverse the element

[16:51:29] <StephenLynx> so far it seems I'm fucked :^)

[16:51:54] <GothAlice> You're not. Use $cond, true value is the value of the field, false value is null.

[16:53:16] <StephenLynx> ah, $cond

[16:53:35] <StephenLynx> It seems that list of query and projection operators will not suffice as reference.

[16:56:31] <StephenLynx> yup, that was exactly what I needed. thanks m8

[17:03:21] <roadrunneratwast> StephenLynx: thanx 4 the lynx

[17:25:18] <pamp> is possible in the {"$project"} generate a new _id??

[17:26:02] <pamp> I am creating a new collection from an aggregation

[17:48:21] <carlosf> hi there, can you help me out?

[17:50:35] <carlosf> is there any recent benchmarks between OrientDB and MongoDB? what are the pros and cons of OrienteDB vs MongoDB?

[17:50:59] <StephenLynx> don't know about orientedb

[17:51:04] <codetroth> You know it drives me nuts. I am looking at a MongoDB statement and clearly see fields of _id, from, and mytype.

[17:51:18] <codetroth> I can get data from every value except from which comes back as undefined

[17:51:22] <StephenLynx> but mongo is known for performance and being easy to include additional servers in a cluster.

[17:52:13] <codetroth> is the word from a reserved word in Mongo at all?

[17:52:15] <boutell> carlosf: MongoDB is widely used and supported. If nobody here has heard of / used OrienteDB, and you don’t feel up to evaluating them yourself, you probably want to use MongoDB, and get back to worrying about your actual project.

[17:52:35] <StephenLynx> codetroth print a find() and show me

[17:52:39] <boutell> codetroth: I don’t think there are any magic words in mongo that don’t start with a $

[17:53:59] <StephenLynx> orient seems to be useful for graphs

[17:54:01] <codetroth> Getting to together

[17:54:02] <boutell> carlosf: to be slightly more helpful… skimming the homepage tells me orientdb is a graph database. If your data is actually a connected graph, that could be a win for you, maybe. It’s not the only graph database.

[17:54:44] <codetroth> http://pastebin.com/XupQhQfi

[17:54:55] <codetroth> That is the output from within my program and from the mongo console

[17:55:11] <StephenLynx> orient does not seems to be FOSS

[17:55:33] <StephenLynx> at least it is open source

[17:56:37] <StephenLynx> codetroth ok, everything seens to be alright with mytype field.

[17:56:46] <StephenLynx> what is your issue with it?

[17:57:01] <codetroth> the from field

[17:57:15] <codetroth> When I run console.log(results[0].from it is undefined

[17:57:34] <StephenLynx> it doesn't seem so, from your pastebin

[17:57:39] <codetroth> I know

[17:57:42] <StephenLynx> so

[17:58:01] <StephenLynx> I mean, in the first paste, the one you say it is from your application code

[17:58:06] <StephenLynx> it has "from"

[17:58:11] <codetroth> Yep

[17:58:20] <StephenLynx> so when is it undefined?

[17:58:21] <codetroth> Thats when I do a full dump of my database que results

[17:58:27] <StephenLynx> ok

[17:58:27] <codetroth> when I just try to call from on its own

[17:58:33] <StephenLynx> show me your code.

[18:00:29] <codetroth> http://pastebin.com/JVedGQTX

[18:00:31] <codetroth> There

[18:00:39] <codetroth> I also included the output when I run that

[18:02:04] <StephenLynx> I believe it is happening because find in node return a list of pointers, not objects.

[18:02:25] <StephenLynx> the tostring function of this pointer prints the object

[18:02:51] <StephenLynx> but when you try to access from, it looks for the from property of the pointer, not the object the pointer is pointing to.

[18:03:04] <StephenLynx> but I don't use find, so I might be wrong.

[18:03:48] <codetroth> I am going to try changing the field name and see if that helps

[18:04:03] <StephenLynx> it won't.

[18:04:43] <StephenLynx> if that was an object, you would have the value of the from field with that code.

[18:04:50] <StephenLynx> try using aggregate.

[18:04:59] <StephenLynx> it returns an actual array.

[18:06:15] <StephenLynx> http://mongodb.github.io/node-mongodb-native/markdown-docs/queries.html#cursors

[18:06:21] <StephenLynx> cursor is the right term.

[18:06:26] <StephenLynx> what you have there is a cursor.

[18:06:38] <codetroth> Ok and I was just checking my code and I have a portion that handles phone calls which works fine having a from field

[18:06:41] <codetroth> It sjust this one place

[18:06:43] <StephenLynx> make an experiment and print the property count.

[18:06:57] <StephenLynx> it should print the function code.

[18:07:18] <StephenLynx> when you handle phone calls, do you use find or other function to retrieve data

[18:08:00] <codetroth> I will check in a moment trying your solution really quick

[18:15:55] <codetroth> found the problem

[18:16:24] <StephenLynx> what was it?

[18:16:46] <codetroth> I am using Mongoose for my DB interactions and I have a Schema defined. I had the wrong name in the Schema

[18:16:54] <StephenLynx> lol mongoose

[18:17:16] <StephenLynx> I avoid it like the plague.

[18:17:35] <codetroth> First time really work indepth with node or mongo and I am begining to feel the same way

[18:17:41] <codetroth> This project will be the last time I use it

[18:18:02] <codetroth> I already dumped the twilio library in favor of using their rest api directly

[18:20:12] <StephenLynx> by default I don't add a dependency to a project.

[18:20:44] <StephenLynx> my first question is "why do I need it " instead of "why would I not use it"

[18:21:14] <StephenLynx> dependencies I have used so far in node/io: db drivers, bcrypt, nodemailer

[18:21:24] <StephenLynx> they do ONE thing and do it well.

[19:44:21] <nobody18188181> how can i change mongod settings without restarting the service?

[19:51:30] <nobody18188181> or, is there a way to force mongo to reload the config file?

[19:56:45] <nobody18188181> doesnt look like it: https://github.com/mongodb/mongo/blob/82daa0725d7f26bd7ccaf7e4280932ad548f549c/src/mongo/util/signal_handlers.cpp

[20:20:47] <tera_> Having trouble with a scenario. I have a document (employee) that has an array of objects (salary) that contain historical values for their salary. I want to select the employee and their current salary (just one item in that array). I used $elemMatch on the projection and successfully get just the current salary but I lose all other fields on the document unless I go and explicitly set them to 1. According to an error it is not possible to remove items

[20:20:47] <tera_> with the positional notation {"salaries.$": 0}. So it seems the only way to get the rest of my document is to know all the possible fields and set them to on?

[20:25:57] <StephenLynx> yes.

[20:26:24] <StephenLynx> if you use projection to say "I want to project this", mongo does not projects anything else that you don't tell it to project too.

[20:26:58] <GothAlice> StephenLynx: Unless one is using Mongoose, then all hell can break loose. (It'll automatically include sub-documents…)

[20:27:21] <StephenLynx> lol mongoose

[20:34:42] <tera_> heh

[20:36:30] <tera_> I figured that was the case with projection but is there an equivalent projection like {"salaries.$": 0} where it will omit non-matches and include the rest of the document? Enumerating all possible fields on the document would be tedious at best.

[20:36:46] <tera_> Or perhaps a "work around" other then turning all the other fields on

[20:36:52] <StephenLynx> don't think so.

[20:37:01] <StephenLynx> oh

[20:37:14] <StephenLynx> you can tell mongo to don't project stuff, yes

[20:37:26] <StephenLynx> and then it will project everything you don't tell him about.

[20:37:59] <tera_> Yep thats why I tried the {"salaries.$": 0} but I get an error "Cannot exclude array elements with the positional operator (currently unsupported)."

[20:38:08] <StephenLynx> :^)

[20:38:13] <StephenLynx> the joys of subarrays

[20:38:23] <StephenLynx> I suspect you might want to make salaries a separate collection.

[20:39:51] <tera_> I'm not sure what the structure of that would look like. The current array of salaries contains the fromdate, todate, and salary amount

[20:40:22] <StephenLynx> first of all you would have to put a field to identify the person it belongs to.

[20:40:31] <StephenLynx> then the data it already contains.

[20:42:26] <tera_> Yea but I'm back to the same problem. I'm thinking perhaps put all the historical in a salaries collection and keep the current with the actual employee document

[20:42:47] <tera_> Then there is no reason to project on an array with the employees document

[20:45:08] <lost1nfound> hey guys, hopefully this is the right place and if not im sorry, but, im attempting to upgrade my 2.4.12 development instances to 2.6.8, and when I connect to it with a 2.6 client and run the db.upgradeCheckAllDBs() command, I get lots of "DollarPrefixedFieldName: $gt is not valid for storage." type of errors. not sure if its something im misunderstanding or if our data is just messed up from errenuou

[20:45:14] <lost1nfound> s inserts

[20:46:22] <lost1nfound> the documents contain data like: "deviceeventid" : { "$gt" : 27141 } }

[20:48:41] <StephenLynx> yeah, that is an operator.

[20:51:06] <lost1nfound> ah i see what theyre doing... so its a logging table where they're logging the original query along with results, and the $gt was a query condition from the original query. guess this is just a design problem i have to fix and rename all those fields... on that note, anyone know how i could find all fields named "x" and rename them? :)

[20:51:38] <StephenLynx> theres is an operator that checks if a field exists

[20:51:45] <StephenLynx> i guess its $exists

[20:52:51] <lost1nfound> oh ok cool thanks! so I can just see if it exists, create a properly-named field with the data, then delete the $gt field

[20:53:28] <lost1nfound> oh or $rename :)

[20:57:01] <lost1nfound> while im here...;) this security vulnerability that 2.4.13/2.6.8 fixes, is it exploitable if the server isnt exposed anywhere? like could someone insert something into a web form that allows code execution? or just if you send a raw malformed BSON doc to the server?

[20:59:40] <flusive> hi, i have mongod and i don't know what happen but VIRT parametr on htop is red and is equal 105GB what does it mean?

[21:02:08] <flusive> and additionally VIRT value still increase, could someone explain what does mean?

[21:06:27] <GothAlice> lost1nfound: In situations where I need to store $ in field names, I store _S_ instead.

[21:06:39] <GothAlice> Saves much hair-pulling later, as you're noticing. ;)

[21:08:26] <lost1nfound> GothAlice: good idea, thanks. would be simple to translate it back and forth in the code

[21:11:20] <GothAlice> flusive: There are generally two metrics for memory usage of an application. RSS (resident set size) and VSZ (virtual size). RSS is the actual amount of physical memory allocated to the application. VSZ (or VIRT in your listing) is how much has been "promised" to your application. The app sees all of that RAM, but as it accesses chunks of it the kernel "pages" the data in and out of physical RAM on behalf of the app.

[21:12:26] <GothAlice> flusive: MongoDB makes extensive use of a feature called "memory mapped files", i.e. asking the kernel to treat a file on-disk as if it were RAM. In this case the VSZ will always be much larger than the RSS as the kernel is pretty smart about only loading from disk what it needs to complete a request (like "write X bytes to position Y of this file" only needs to load the chunks around position Y.)

[21:13:40] <GothAlice> Basically, learn to love the VSZ. It's RSS that indicates danger. :)

[21:15:43] <GothAlice> flusive: As an aside, for the most part, you *want* htop to be running on the redline for RAM usage. This means RAM is being fully utilized, which is the most efficient. I also ballpark "load average". The optimum load average is roughly 1 for every core you have. 8 cores = optimum LA of 8. (This would indicate 100% usage, and no wasted time. Higher numbers than the number of cores indicate time wasted waiting.)

[21:17:14] <flusive> GothAlice, I have 64GB ram where 2GB its used, rest is cached

[21:17:28] <flusive> I have 12 cores and load is only 1 so one core is used

[21:17:31] <GothAlice> That's glorious. Disk caches make things fast.

[21:17:53] <GothAlice> And yeah, that server has been over-allocated CPU. (It's wasting cycles… and power… by being idle.)

[21:18:47] <flusive> so GothAlice when VIRT parameter will be decrease?

[21:19:01] <GothAlice> Well, sometimes never.

[21:20:11] <GothAlice> But VIRT/VSZ is pretty meaningless. It's simply the size of the "virtual memory" given to the app. That virtual memory might not be in RAM (swap), might be in RAM (hot data), might be on disk (memory mapped files, cold), or might even be shared between processes (FIFO, shmem, etc.)

[21:20:17] <flusive> hmm I don't understand, generally mongo used that VIRT (which is disk space?) for increase performance?

[21:22:42] <GothAlice> MongoDB's virtual memory would include the code and data of the mongod binary (memory mapped), the execution stack, all malloc'd (explicitly allocated) memory areas, all socket shared buffers (these are shared with the kernel, and the kernel shares with the network device itself via DMA), and all memory mapped files (on-disk data files), amongst other goodies.

[21:24:31] <flusive> so It's normal? and everything is ok?

[21:24:44] <GothAlice> flusive: MongoDB uses memory mapped files this way because it greatly simplifies MongoDB's own code. Everything is normal, keep calm and carry on. :)

[21:25:36] <flusive> but why this params still increase?

[21:27:20] <GothAlice> Because MongoDB gets new connections and needs to allocate memory. Or for any of a bajillion other reasons. The key point to take away from this: VSZ/VIRT _isn't real_. It's completely fake. It's the kernel lying to the application and saying, "Yeah, there's 100GB of RAM. Sure. We'll go with that." The kernel then acts as a proxy between the application's fake RAM and real RAM. (The RSZ measurement.)

[21:28:16] <GothAlice> (This, BTW, is why "page faults" are a thing. A page fault is an app asking for a chunk of it's virtual RAM that doesn't exist, and the kernel doesn't know how to handle the bad request, so the app goes *boom*.)

[21:29:24] <joannac> the app does not explode :p

[21:30:09] <flusive> :P

[21:31:03] <joannac> but GothAlice is right about everything else I've read in backscroll so far. so listen to her

[21:31:18] <joannac> and stop worrying about your virtual memory

[21:32:02] <flusive> I'm afraid because I had sharding but disk space was full so I bought new dedicated server with much more space and now I have not mongo shard but only one mongod instance

[21:32:58] <flusive> I still don't understand how mongo store data I did mongodump on old shard and my dump has 110GB but my whole sharding has around 400GB :/

[21:33:16] <joannac> fragmentation?

[21:33:17] <flusive> but if everything is ok so good to know :)

[21:33:32] <flusive> joannac, maybe but why so big?

[21:34:07] <flusive> i haven't updates on collections only add new objects

[21:34:23] <flusive> and writes are realised second by second generally

[21:35:09] <joannac> multiple databases?

[21:35:18] <flusive> no only one database

[21:35:22] <joannac> i dunno, run db.stats()

[21:36:41] <flusive> ok one sec

[21:38:25] <flusive> http://pastie.org/9985874

[21:39:30] <flusive> joannac, if you have few minutes could you explain me how much data i have, why so big space is used and what should I will do in the future to solved this

[21:41:45] <joannac> you have 200gb of data and 80gb of indexes

[21:42:08] <joannac> making 280gb of "stuff" and 400gb of storage

[21:42:49] <flusive> so why when i did mongodump my dump weight 110GB?

[21:43:03] <joannac> it's probably a combination of fragmentation and orphaned documents

[21:43:15] <flusive> so generally 120GB is empty store?

[21:43:18] <flusive> storage

[21:43:45] <flusive> what is the reason? its problem with inserts second by second?

[21:44:31] <joannac> http://docs.mongodb.org/manual/faq/storage/#why-are-the-files-in-my-data-directory-larger-than-the-data-in-my-database

[21:46:09] <flusive> i red it hundrets time, I should do sometimes repairdatabasE?

[21:46:53] <flusive> read

[21:47:49] <arduinoob> If I have a mongod reading and writing to the database on server A, but the actual database files are stored on a shared filesystem, can I use mongo client to read the database on a different server?

[21:48:48] <arduinoob> or can I have two mongod pointint to the same dbpath at the same time, I'm assuming that's a bad idea

[21:48:55] <GothAlice> flusive: Several reasons one might have more data than expected. There is per-document overhead called "head room" where records can grow into if you update them, so if you never update, that's wasted room. Fields take up room, too, per-document, so longer field names will use more space the more they are used. For this reason I use one- or two-character field names.

[21:49:03] <flusive> joannac, or explain me how can I predict how my fileSize will be increasing?

[21:49:18] <joannac> arduinoob: the second, bad idea. the first, sure

[21:49:49] <joannac> arduinoob: you you have a mongod running on A.example.com:27017, as long as server B can reach A.example.com, you can connect with the mongo shell

[21:50:34] <GothAlice> flusive: Lastly, if you have many indexes, your indexes can rapidly grow your dataset size. Certain types of pre-aggregation can optimize both the headroom issue and the index issue. http://www.devsmash.com/blog/mongodb-ad-hoc-analytics-aggregation-framework covers the effects (in terms of DB storage and query performance stats) of different ways of modelling data.

[21:51:12] <arduinoob> joannac: I see, so I'll have to bind to the public interface and allow remote connections to 27017

[21:51:20] <joannac> yes

[21:52:10] <GothAlice> flusive: FTA, the naive approach requires 133MB of data stored in 166MB of disk space, another requires 600MB on disk to store 122MB of data, another requires 55MB to store 39MB of data. Note that all of these are *the same data*!

[21:54:10] <flusive> GothAlice, so It could be problem with database structure?

[21:54:40] <GothAlice> flusive: Certainly; some structures are more efficient for storage than others.

[21:56:26] <flusive> GothAlice, thx I will think about it

[21:56:42] <flusive> thx for that link

[21:56:59] <GothAlice> It never hurts to help. :)

[21:59:33] <flusive> :)

[22:00:11] <flusive> I have to do some test with different structures and I will show how it looks, thx one more :)

[22:00:54] <flusive> One more question, to defragmentation I have to use repairDatabase?

[22:01:02] <lost1nfound> someone should update topic for .8/.13 ;)

[22:01:05] <GothAlice> flusive: Compact, not repair.

[22:01:20] <GothAlice> flusive: http://docs.mongodb.org/manual/reference/command/compact/

[22:02:34] <flusive> so I need to compact each collection separately?

[22:03:26] <joannac> wait, why do you have a different number of collections on each shard?

[22:04:04] <flusive> joannac, I don't know? This sharding was realised automaticly

[22:04:26] <lost1nfound> flusive: You can script it with a .forEach. I can share my script with you if you need

[22:04:49] <flusive> lost1nfound, would be great, thx a lot

[22:06:16] <lost1nfound> flusive: http://pastebin.com/raw.php?i=DNvsL88a :) that's mine. not sure which mongo version you're on, but the "usePowerOf2Sizes" isn't relevant in 2.6 anymore, it just automatically does that. so you can take out the ", usePowerOf2Sizes: true" part.

[22:06:27] <joannac> flusive: in that case I'm dubious about whether your 110gb mongodump actually has all your data

[22:06:59] <lost1nfound> flusive: also note that I pass in "paddingFactor: 1.1" which leaves 10% extra room for records to grow

[22:07:04] <lost1nfound> but thats optional too

[22:07:20] <flusive> joannac, I did that using router not shards

[22:08:41] <joannac> flusive: and? the router will only give you data it knows about

[22:09:06] <flusive> lost1nfound, ok thx, how many time and how often do you use that script?

[22:09:36] <flusive> joannac, each writes and reads are realised using router and every datas was always good so why dump could be bad?

[22:10:22] <joannac> because the number of collections on each shard is different

[22:11:54] <lost1nfound> flusive: did it for the first time a couple months ago, ran it again this month; ill probably make it part of monthly regularly-scheduled maintenance. but, see, we had been stuck on 2.2, now on 2.4, and more recent versions of mongo seem to be much more efficient at keeping fragmentation low. so unclear how often it's "necessary" but ill probably do it monthly

[22:12:21] <flusive> mayby because chunks are devided different? I don't know I'm not specialist but I used that data for generating charts and everything was good, so it's possible if i read data normally all is ok but when I used mongodump data are not completly?

[22:13:23] <joannac> flusive: if you have all the data you need, then cool. but your comparison isn't valid anymore

[22:14:08] <flusive> ehm It's strange for me :/

[22:15:00] <flusive> and now Is to late to thinking about it :) lost1nfound thx for that script I will test it tommorow, joannac and GothAlice also thx :)

[22:15:12] <GothAlice> flusive: On our dataset at work (maybe a few million documents total) we've never had to compact. OTOH, our delete operation counter currently reads: 8. (For the two year lifespan of the proejct.)

[22:16:09] <GothAlice> flusive: On my dataset at home (trillions of documents), it's actually grown beyond the level where I can even compact it. (Dataset size * 1.2 > free space. Happens when your dataset is 26 TiB in size…)

[22:17:32] <joannac> get more disks!

[22:17:38] <flusive> GothAlice, I had 4 shard with 120GB of disk space now I buy server with 4TB disk space

[22:18:19] <GothAlice> flusive: Our production DB hosts at work don't have permanent storage. >:D

[22:19:04] <lost1nfound> GothAlice: yeah, we were doing a lot of deletes and updates. (probably architecturally-incorrectly) we're using a couple of collections as queues that push a good amount of messages. so we definitely have seen some frag problems there

[22:19:23] <GothAlice> lost1nfound: I hope you're using a capped collection for those queues.

[22:19:30] <GothAlice> Then your fragmentation on those collections will be literally zero.

[22:19:44] <GothAlice> (Also, no per-document padding on those.)

[22:20:13] <flusive> GothAlice, at my work I had the problem when I have to buy new server and cloud is to expensive for us :(

[22:20:40] <flusive> so its a reason why i'm interesting where is my space :D

[22:21:09] <lost1nfound> GothAlice: I've looked into that, but, the problem there is it's hard to anticipate my max queue length, and I'd hate to lose messages if we get full. so id either have to massively-overallocate the size, or guard against filling it up in the app which isnt ideal. ive sorta been thinking mongo isnt the best choice for the queue specifically, and that maybe we should be using a traditional pubsub servi

[22:21:10] <GothAlice> flusive: If I hosted my personal dataset "in the cloud" it'd cost me $500,000/month. ¬_¬ Even using Amazon EC2, the cost would be enough that I could afford to instead buy three RAID arrays and still have enough money left over to buy a replacement drive for every drive in every array every month…

[22:21:15] <lost1nfound> ce. but thats something ive looked into

[22:22:08] <GothAlice> lost1nfound: For the Batchelor Facebook game we allocated a single 8GB capped collection to store, without loss, all of the activity for one million simultaneous active games. You use http://bsonspec.org/ to do some napkin-calculations on projected storage churn, then allocate the capped collection appropriately.

[22:23:00] <GothAlice> flusive: So I bought the arrays. Paid for themselves in under three months vs. "cloud" hosting it.

[22:23:45] <flusive> and when do you have that array?

[22:23:55] <GothAlice> flusive: When?

[22:23:55] <flusive> at home? at work building?

[22:23:59] <flusive> where

[22:24:00] <flusive> sory

[22:24:09] <GothAlice> Ah, spare room in my apartment. Free electricity covered in my rental agreement. ;)

[22:24:10] <lost1nfound> GothAlice: oh wow, thanks so much for that, that might just be our solution. that gives me inspiration to give it a try. so much simpler architecturally if we can just keep it in mongo. we could have monitoring/alerts when it starts filling up and surely have time to respond

[22:24:37] <Thinh> GothAlice: LOL

[22:24:59] <GothAlice> lost1nfound: Yup! Note that my message bus relates to background distributed task execution, and I use a real collection (not capped) to store the task data itself. Everything in the message bus can be re-constructed from nothing if needed.

[22:25:27] <GothAlice> Thinh: Yeah, the three Drobo 8-something-i arrays go nicely with the three 1U Dell boxen. :3

[22:25:44] <Thinh> Hows the SLA? :))

[22:25:50] <flusive> :D

[22:26:02] <lost1nfound> GothAlice: I see, I see, makes sense. Yeah, we could keep some kind of "queue log" we could reconstruct from, and then rotate it out periodically.

[22:26:07] <GothAlice> Thinh: http://cl.ly/image/071I2D0B1T2a

[22:26:18] <flusive> GothAlice, and do you use some parallel FS?

[22:26:24] <flusive> distributed FS?

[22:26:31] <flusive> or which FS do you use :)

[22:26:44] <Thinh> GothAlice: Hahaha that's awesome

[22:27:20] <flusive> gr8 uptime :)

[22:27:51] <GothAlice> flusive: The drobos have their own filesystem based on distributing smaller stripes amongst drives, each formatted ext3, I believe. However I'm technically running ZFS on top of iSCSI on top of that, to allow me to snapshot and export those snapshots to my desktop for offsite backup, amongst other goodies.

[22:28:30] <GothAlice> At work we use moosefs for our distributed /home folders.

[22:29:13] <flusive> and for mongo do you use zfs?

[22:29:17] <GothAlice> Aye.

[22:29:43] <GothAlice> (With certain things turned off, like inode compression…)

[22:30:59] <flusive> GothAlice, sounds good :) but it will hard to explain it for my boss :)

[22:31:41] <GothAlice> flusive: Yeah. Also note that my at-home dataset has been growing and in development since 2001 or so. (The earliest data in it originates from September 2000.) It's a bit of an organic mess. ;)

[22:33:20] <flusive> How big is that dataset?

[22:33:26] <GothAlice> 26 TiB.

[22:33:44] <GothAlice> And the project is called "Exocortex".

[22:34:37] <flusive> Computer Graphics and Simulation Software?

[22:36:13] <GothAlice> flusive: Nope, a permanent archive of every bit of digital information I have ever touched since the system went operational. Transparent HTTP proxy, ARP MITM listening to all traffic on my home network, etc. Much of it gets filtered out, of course. It's primarily organized as a metadata filesystem with heavy use of arbitrary string tags (and key/value tags) with a neural network linking tags together. (Synonyms, antonyms, etc.) NLP to

[22:36:13] <GothAlice> automatically determine tags from content as it arrives.

[22:37:32] <flusive> interesting :)

[22:38:12] <flusive> ok I have to go to sleep :( thanks for your help one more time :)

[22:38:13] <GothAlice> Exocortex is effectively my plan for uploading. (~2029 is my estimate.) :D

[22:38:20] <GothAlice> No worries. Have a great one!

[22:40:31] <flusive> ok buy :) I will be here also tommorow I think :)

Log file Viewer

Help | Karma | Search:

#mongodb logs for Thursday the 26th of February, 2015