[03:54:39] <harly> Question: i'm currently waiting on a new member in an RS to complete the initial sync. There's 2.5TB of data. It appears that all the files have been synced and it is now building indexes. How can I get an idea of when it will finish? db.currentOp() only shows the current index task's %, not how many have been done or are remaining...
[04:06:06] <Boomtime> @harly: sadly, not really - https://jira.mongodb.org/browse/SERVER-9115
[04:07:09] <harly> thanks @Boomtime. I'd stumbled across that ticket but assumed it was merely a high-level request, like what would appear in rs.status()
[04:07:33] <harly> trying to work out lower level ways to get at it. I'm also unfortunately (for me) quite new to MongoDB. :)
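For the record, db.currentOp() does expose a per-build progress counter, even though there's no count of remaining builds (that's what SERVER-9115 tracks). A rough shell sketch that surfaces it; the msg/progress field names are what 3.x index builds report, and the regex is an assumption:

```js
// list in-progress index builds with whatever progress the server reports
db.currentOp(true).inprog
  .filter(function (op) { return op.msg && /Index Build/.test(op.msg); })
  .forEach(function (op) {
    if (op.progress) {
      print(op.msg, "-", op.progress.done + "/" + op.progress.total);
    } else {
      print(op.msg);
    }
  });
```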
[04:09:13] <harly> this task I've taken on has been running for almost 24 hours. I also stumbled across a suggestion ( https://www.kchodorow.com/blog/2012/06/26/replica-set-internals-part-v-initial-sync/ ) that it's faster to pause an existing member, sync the files to the new one, and then have it resume. no index rebuilding needed.
[04:12:14] <Boomtime> it will often be faster to clone a new member at the filesystem level than to use initial sync - you lose some of the beneficial side-effects, such as compaction, but you can gain a lot of time
[04:12:46] <harly> Time might be more important in this instance. When you say compaction do you mean the benefit of a freshly built index?
[04:15:01] <Boomtime> index effect is only slight at the end of the day - compaction of the data files can be significant though
[04:17:40] <harly> ok. well i can check how much that benefited when this one finishes. we'll have some tens more to do afterwards though, at which point, if the compaction isn't significant, speed will win :)
[04:18:27] <harly> looks like compacting can be set up to run continuously, if a member can afford to go into recovery mode.
[05:33:43] <cart_man> Hi everyone... I can not seem to find a way to UPDATE an entry in MongoDB. I am using Mongoose!
[07:17:03] <Derick> Zelest_: for now, yes, new MongoDB\BSON\UTCDateTime(time()*1000), but we'll fix that soon enough
[07:26:04] <Zelest> Derick, Ah, thanks for the reply :)
[08:35:50] <granite> we have a list of integer values for a field, say ids. My team lead wants to store it as a csv string, but I prefer a list, which seems more intuitive.
[08:37:14] <Zelest> define "a list"? how do you separate each item in the list?
[08:38:18] <Zelest> mostly for the benefit of being able to query it easier
[08:38:24] <granite> it's a collection of some location code
[08:39:04] <granite> the reason from my team lead is that storing it as csv avoids the need to parse json (by mongodb)
[08:39:21] <granite> so that's a string split versus json parsing
[08:39:52] <granite> he suggests that storing as csv is better performance-wise
[08:41:25] <granite> But I think since the nature of this field being a "collection of same type", it would suit more as an array (or list)
[08:41:25] <kurushiyama> Forgive my French, but his argument is bullcrap.
[08:42:28] <kurushiyama> When the BSON is parsed into your language's native types, you'd still have raw CSV values in that field
[08:46:50] <selckin> is there a generic way to copy dbobjects in java ?
[08:47:07] <selckin> using jongo and it returns some lazy bson version (its own impl) and won't let me modify it
[08:47:33] <kurushiyama> granite: Well, of course it is not a primitive data type. But sooner or later you'd need to parse the CSV anyway. Plus: how the heck is he going to query/aggregate those ids?
[08:47:41] <granite> we retrieve it as a bson document, much like a map
[08:48:19] <granite> We just use MongoDB's native Java API
[08:49:02] <kurushiyama> Ok. Then, again: how is he going to query this field? what does he want to do with the field?
[08:49:25] <granite> my lead's point is: when we retrieve a Document from mongodb, it needs to parse the "ids" array.
[08:49:49] <granite> But if we store it as a csv string, then it just returns it. And we'll split the string
[08:50:12] <granite> which is json array parsing (by mongodb), versus a String split (by application code)
[08:50:42] <granite> he thinks we don't need to query on that field
[08:52:22] <granite> So it's a preference between intuitive data structure versus performance (if csv really provides performance gain)
[08:53:31] <kurushiyama> Wait a sec? He _really_ thinks that parsing into a non-primitive data type is done faster in Java than in C++?
[08:54:49] <granite> kurushiyama the problem is, it would be hard to measure how much time mongodb spends parsing the array in the document
[08:55:07] <granite> so I can't just go and make a measure
[08:56:45] <kurushiyama> Right. Because we are talking about memory-mapped files and C++ parsing of a []byte into an array. Ok, and you need the ids array (or string) as a CSV later down the road?
[08:57:47] <kurushiyama> Well, that is easy to measure.
[08:59:26] <granite> that's another mess, our interface for other teams provides a string, and it's the client's job to parse it. I would like to change this later to provide a list of integers, for the convenience of the client.
[09:00:05] <kurushiyama> Create a list of 1M strings. Parse them into a Java Array. The time it takes is the net overhead for 1M operations. Because iirc, bson arrays are returned as Java Arrays.
[12:17:43] <kurushiyama> granite: I was able to measure the difference in a naive way. The difference between array and string was between 4 and 11 µs on average over 1M docs each.
[12:19:47] <kurushiyama> granite: Without parsing it into the data you need.
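kurushiyama's queryability point is easy to demonstrate in the shell (collection name and values here are invented): with a real array the server can match individual ids and use a multikey index; with a CSV string it cannot.

```js
// ids stored as an array, e.g. [12, 42, 97]
db.places.createIndex({ ids: 1 });   // multikey index, one entry per element
db.places.find({ ids: 42 });         // matches any element, uses the index
db.places.aggregate([                // per-id aggregation works directly
  { $unwind: "$ids" },
  { $group: { _id: "$ids", n: { $sum: 1 } } }
]);

// ids stored as "12,42,97": matching a single id needs something like the
// regex below, which cannot use the index and re-scans the string per document
db.places.find({ ids: /(^|,)42(,|$)/ });
```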
[12:31:17] <StephenLynx> the new gridfs api from the node.js driver is pretty neat.
[12:31:25] <StephenLynx> even though it did away with a few features.
[12:37:44] <StephenLynx> such as being able to overwrite a file and delete files by name
[12:38:14] <StephenLynx> but given how much easier it became to stream, I think it was a positive change overall
[12:38:48] <StephenLynx> especially streaming specific ranges
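A minimal sketch of the range streaming StephenLynx mentions, against the 2.x driver's GridFSBucket (connection string and file names are made up; start/end are byte offsets, end exclusive):

```js
var mongodb = require('mongodb');
var fs = require('fs');

mongodb.MongoClient.connect('mongodb://localhost:27017/test', function (err, db) {
  if (err) throw err;
  var bucket = new mongodb.GridFSBucket(db);
  // stream only the first 64 KB of the stored file
  bucket.openDownloadStreamByName('video.mp4', { start: 0, end: 65536 })
    .pipe(fs.createWriteStream('./preview.bin'))
    .on('finish', function () { db.close(); });
});
```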
[13:44:27] <kba> Anyway, I first tried with find(...).toArray(), but that ran out of memory really quickly
[13:44:54] <kba> I'm now trying with find(...).each(), thinking that wouldn't try to load everything into memory at once, and it does get further than the other one, but also crashes
[13:45:01] <scmp> how complex is the transformation?
[13:45:18] <scmp> just field mapping or any content changes?
[13:50:56] <kurushiyama> Well, great. Have profiled a benchmark for granite; it should actually prove that reading an array is faster than a string. Now he is gone.
[13:51:32] <cheeser> that'll teach you to go the extra mile. there's never someone waiting at the end.
[13:54:07] <StephenLynx> oh yeah, mongodb implemented actual array for the lulz
[13:54:13] <kba> StephenLynx: the 2nd example from http://mongodb.github.io/node-mongodb-native/2.1/api/Cursor.html#next seems a lot easier to implement
[13:54:21] <kba> is there a particular reason why you suggested the recursive method?
[13:55:56] <StephenLynx> if I were you, I would take a long, hard look at promises before using them.
[13:56:01] <StephenLynx> especially regarding their performance.
[13:56:18] <kurushiyama> Turned out that the time spent reading documents with an actual array is consistently lower than with a string of the same elements. However, depending on how the raw documents are dealt with, it can actually happen that parsing the csv string and building the required data structure by hand from the raw values is faster than bson unmarshalling into the data structure.
[13:56:31] <kba> I'll just use the first approach -- I assumed the 2nd was sync
[13:56:49] <StephenLynx> you never use sync code for large tasks.
[13:57:01] <StephenLynx> you are pretty much locking down the cpu core.
[13:57:24] <StephenLynx> what you can use sync code for is reading a 50-byte settings file or something
[13:59:12] <kba> if you're making a sync read operation, the process will yield and sleep until the scheduler wakes it up again
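The recursive cursor.next() pattern from the docs page kba linked, sketched against a hypothetical collection: each document is fetched and processed one at a time, so memory stays flat regardless of result size (the recursion is safe because cursor.next is async and doesn't grow the stack).

```js
var MongoClient = require('mongodb').MongoClient;

MongoClient.connect('mongodb://localhost:27017/test', function (err, db) {
  if (err) throw err;
  var cursor = db.collection('bigCollection').find({});

  (function step() {
    cursor.next(function (err, doc) {
      if (err) throw err;
      if (doc === null) return db.close(); // cursor exhausted, we're done
      transform(doc);                      // hypothetical per-document work
      step();                              // pull the next document
    });
  })();

  function transform(doc) { /* per-document transformation goes here */ }
});
```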
[13:59:18] <noobandnoober> i'm making a messaging app. in terms of performance, is it better to insert a document for each chat, or one for each message? (with a chat ID inside)
[14:11:57] <cheeser> needs better tooling and a proper build/dep story
[14:13:16] <kurushiyama> cheeser: Well, I use submodules for vendoring, works on the commit level ;) as for the builds: you have to get used to it. But it works great.
[14:13:35] <cheeser> i wrote a makefile for the project i was on :D
[14:18:29] <kurushiyama> cheeser: Well, I did that, too. Until I noticed that it was basically just me using the given tools wrongly.
[14:21:01] <cheeser> we had a bash script parsing a Godeps that ran git to fetch things blah blah blah
[14:50:20] <WhereIsMySpoon> How do I do something in mongodb like this sql statement: "select count(*), name from MyTable where type="someType" and time < '2016-05-20' and time > '2016-05-19' group by name" ? assuming I have name and time and type as fields in my mongo object too.
[14:54:13] <scmp> "aggregating group operators are unary"? in a simple $group with _id and $push... i don't get it
[14:55:13] <scmp> also, it's not possible to use $push/$each/$sort in a $project stage?
[14:56:20] <oky> scmp: are you doing a $push inside the $_id?
[14:56:54] <scmp> no, i already assumed that is not possible since it's a simple expression
[14:58:12] <WhereIsMySpoon> also i need a having count(*) > 10 in that too
[14:58:41] <cart_man> StephenLynx: Why should I not use Mongoose? What else is there anyway ?
[14:59:09] <cart_man> StephenLynx: I Must say... Mongoose has been quite an abstract learning curve!
[14:59:29] <oky> WhereIsMySpoon: take a look at aggregation framework. the operations you mentioned are part of it
[14:59:35] <WhereIsMySpoon> yea..im just about to paste what i have
[14:59:39] <WhereIsMySpoon> its not -quite- working
[15:00:09] <WhereIsMySpoon> oky: https://gist.github.com/anonymous/68b2f9787dbd96367c55cf7056528944 this is what I have, i just need to add the having count bit on and i cant figure out how to get that in there
[15:01:03] <WhereIsMySpoon> i tried putting a total:{$gt:10} in the cond bit but that didnt work
[15:01:23] <cart_man> Does anybody know how I can remove entries from a DB using Mongoose?
[15:02:05] <oky> scmp: i think it is saying that $group should not have two keys inside it
[15:02:20] <scmp> oky: the plan was to use group_id in the second $group
[15:02:35] <scmp> the first $group gives the error
[15:02:36] <oky> scmp: nvm. it'll take me a few moments
[15:05:29] <oky> scmp: does it work if you change the array in $push to a dict?
[15:06:48] <oky> WhereIsMySpoon: you can add a new pipeline stage with $match operator
[15:08:22] <oky> WhereIsMySpoon: it looks like you are using .group() on a collection. i'm unfamiliar with how to add a 'HAVING' clause to that (possible in the finalize), but i know you can do it reasonably easily with aggregation framework
[15:08:48] <WhereIsMySpoon> alright, yea, im converting to aggregate now tbh
[15:08:52] <WhereIsMySpoon> https://docs.mongodb.com/v3.0/reference/sql-aggregation-comparison/ this is so nice
[15:08:58] <WhereIsMySpoon> i wish all the operations had stuff like this
[15:09:06] <WhereIsMySpoon> or maybe my google fu is weak
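WhereIsMySpoon's SQL maps onto a pipeline like this sketch (field names taken from the SQL; the ISODate bounds are placeholders), with the HAVING clause becoming a second $match after the $group, as oky suggests:

```js
db.MyTable.aggregate([
  // WHERE type = 'someType' AND time between the two dates
  { $match: { type: "someType",
              time: { $gt: ISODate("2016-05-19"), $lt: ISODate("2016-05-20") } } },
  // GROUP BY name, with COUNT(*)
  { $group: { _id: "$name", total: { $sum: 1 } } },
  // HAVING COUNT(*) > 10
  { $match: { total: { $gt: 10 } } }
])
```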
[15:09:19] <scmp> oky: 1 sec, let me get a better example with what i want to accomplish, had the question here earlier
[15:11:07] <StephenLynx> cart_man, because it doesn't handle ids and dates properly
[15:19:57] <cart_man> StephenLynx: Holy shit ive been having issues the entire day getting _id's ...........FUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUK
[15:20:07] <cart_man> Can not believe that it could be faulty libs
[15:21:30] <scmp> oky: https://paste.debian.net/hidden/1a8dd5b9/ _id for the $group needs to be an array of two strings sorted. this gives me "invalid operator '$each'" now
[15:22:39] <kurushiyama> scmp: Um, you can not nest stage operators.
[15:23:57] <kurushiyama> scmp: Think of the aggregation as a UNIX pipeline
[15:24:24] <WhereIsMySpoon> er...eh? mongodb is complaining about needing to use the disk to do an aggregate on only 150k documents?
[15:24:35] <kurushiyama> The output of the current _stage_ is the input for the next.
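One way around scmp's problem under that pipeline model (field names invented, and assuming a server recent enough, 3.2+, to allow array literals in expressions): build the sorted two-element pair in its own $project stage, then let the next stage group on it, so no stage operators are nested.

```js
db.messages.aggregate([
  // compute the sorted pair in a dedicated stage...
  { $project: {
      pair: { $cond: [ { $lte: [ "$from", "$to" ] },
                       [ "$from", "$to" ],
                       [ "$to", "$from" ] ] }
  } },
  // ...so the following $group can use it directly as the _id
  { $group: { _id: "$pair", count: { $sum: 1 } } }
])
```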
[15:24:37] <StephenLynx> how large are the documents?
[15:28:02] <StephenLynx> that is really odd, WhereIsMySpoon
[15:28:15] <WhereIsMySpoon> its analogous to select count(*), myId from MyTable where type = 'someType' and time > 'thedateicba' and time < 'thedateicba' group by myId having count(*) > 10
[15:31:52] <kurushiyama> WhereIsMySpoon: May well be, did not look into it. Best approach for getting help for an aggregation is always to give 3 or 4 sample docs, what you did so far and a document demonstrating the expected output.
[15:44:55] <scmp> StephenLynx, oky, kurushiyama thx for the help
[16:10:21] <deshymers> Hello, I am trying to chain several $group stages using the aggregation framework. However I am having trouble passing the previous $group results down the pipeline. Here is my query, https://gist.github.com/deshymers/809e3a7783299517caa37a9feec00ba6 and here is the error I am getting, "exception: the group aggregate field 'activities' must be defined as an expression inside an object"
[16:11:31] <deshymers> I need to group by the object field and a time interval. I then need to group those results by the same interval, and then group those results by the subject field
[16:13:10] <deshymers> The reason I am using $push in the initial $group is that I need to pull the results out based on the isStatus field, and I do this outside after I get my results back from the server
[16:23:58] <deshymers> oops I see my error in the query above
[16:24:14] <deshymers> had the 'activities' outside the $group
[16:34:05] <kurushiyama> deshymers: Was about to say it ;)
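For anyone hitting the same error, the shape of the mistake and the fix, sketched with the field names from the discussion:

```js
// broken: the accumulator sits outside the $group object, giving the stage two keys
{ $group: { _id: "$subject" }, activities: { $push: "$$ROOT" } }

// fixed: every accumulator must be an expression inside the $group object
{ $group: { _id: "$subject", activities: { $push: "$$ROOT" } } }
```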
[18:26:25] <kurushiyama> alexi5: Nope. DBA of one.
[18:28:20] <alexi5> I am building a charging portal for a WiFi gateway. the gateway will redirect the user to the web application. the web application provides a list of wifi packages. the user selects a package and is forwarded to a payment gateway (paypal, etc.). the user pays and the application gets the payment result. if the result is successful, it provisions the gateway with the internet package
[18:29:01] <kurushiyama> alexi5: I would surely and absolutely not use MongoDB for that.
[18:31:46] <alexi5> the reason why I thought of mongodb was the data model required. For each package i will create pricing history as well as provisioning setting history associated with the package.
[18:32:33] <alexi5> for each user purchase, keep track of users that use the login credentials, if the package allows for more than one user
[18:32:54] <alexi5> and ensure that user does not go over maximum allowed for package
[18:34:30] <kurushiyama> alexi5: Ok, you should first get your user stories right. One by one. Then, prioritize from the most common to the least common user story, and from there derive the questions your data needs to answer. Then model accordingly.
[18:36:05] <alexi5> it would be nice to see the pros and cons of an existing application using mongodb (not including the one you are managing :D )
[18:36:24] <kurushiyama> alexi5: Modelling in MongoDB works basically the other way around than with SQL. SQL: Entity-Relationship diagram => model => bang your head against the wall figuring out the LEFT OUTER and other JOINs to get your questions answered.
[18:36:55] <alexi5> ok. I guess I have to refactor my way of thinking for mongodb
[18:36:56] <kurushiyama> MongoDB: Find the questions => Model them optimized for the most common use cases
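To make that question-first approach concrete for alexi5's packages, a model sketch (every field name here is invented) that embeds the histories so the most common reads hit a single document:

```js
// one document per wifi package; history entries are appended in place
{
  _id: ObjectId(),
  name: "1-day unlimited",
  maxConcurrentUsers: 3,
  pricingHistory: [
    { price: 5.00, currency: "USD", effectiveFrom: ISODate("2016-01-01") },
    { price: 4.50, currency: "USD", effectiveFrom: ISODate("2016-04-01") }
  ],
  provisioningHistory: [
    { bandwidthKbps: 2048, effectiveFrom: ISODate("2016-01-01") }
  ]
}
```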
[20:38:38] <saml_> how do you get distinct count of an indexed field?
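Two common shell approaches to saml_'s question (collection and field names invented; the first needs the distinct values to fit within the 16MB response limit):

```js
// distinct() returns an array, so its length is the distinct count
db.events.distinct("userId").length

// aggregation: group on the field, then count the groups
db.events.aggregate([
  { $group: { _id: "$userId" } },
  { $group: { _id: null, distinctCount: { $sum: 1 } } }
])
```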
[23:24:32] <uuanton> hmm i'm not sure which one i should run to test it better
[23:25:22] <Boomtime> ok, if you can't be consistent with your test, then you don't get consistent answers
[23:26:01] <uuanton> ok one sec i will find a better query
[23:26:24] <Boomtime> or, you know, any query - an empty predicate isn't a query, it's just a disk read
[23:29:11] <zsoc> Soo... as I have just learned... /apparently/ one cannot use an _id of a document as a reference in another document.. I'm getting "Mod on _id not allowed". But... I know reference fields work, so what am I doing wrong? Do I have to cast the ObjectId .toString or something first or... am I just misunderstanding how references work?
[23:32:48] <zsoc> Basically I'm trying to save the _id of a 'User' document in the user_id field of an "Account" document. This sort of makes sense to me but maybe I'm missing something.
[23:33:39] <Boomtime> that sounds like something you should be able to do.. but "Mod on _id" is not the right result - that sounds like it's trying to change the _id of an existing document
[23:33:56] <zsoc> I'm using update() with an upsert
[23:34:09] <zsoc> what it's complaining about is a new document tho, not an update
[23:36:37] <zsoc> maybe there's some... var name confusion... there are two objects, one representing the model and the other representing the document, and the only difference is the string case... but my IDE thinks it's fine (it's js ES6 so it should be...) but let me try... not doing that heh
[23:42:35] <zsoc> Nope. Same thing. Hmmmm. Ok well, I'm assuming the problem is mongoose related.. because that's every problem.
[23:51:30] <zsoc> The application i'm receiving the document from has a literal _id already set
[23:51:43] <zsoc> isn't that interesting. what a bad idea
[23:52:07] <zsoc> Is there any way to massage or suggest an _id? heh... i guess I'll set it to api_id or something and then delete the _id before trying the upsert
[23:53:44] <zsoc> oh... I can use {ObjectId} and convert the existing string to an ObjectId? How quaint.
[23:59:20] <zsoc> Object passed in must be a string of 12 bytes or 24 hex characters. It's 24 non-hex characters. This API is literally just anti-mongodb lol... alrighty then
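For the record, the usual shape of zsoc's failure and the workaround he describes, sketched with invented Mongoose model and field names: keep the foreign system's id out of _id, because an update that tries to change an existing document's _id fails with "Mod on _id not allowed".

```js
// fails when the upsert matches an existing document: the update touches _id
Account.update(
  { user_id: user._id },
  { $set: { _id: externalId, balance: 0 } },
  { upsert: true }
);

// works: store the external identifier in its own field and let MongoDB own _id
Account.update(
  { api_id: externalId },
  { $set: { user_id: user._id, balance: 0 } },
  { upsert: true }
);
```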