#mongodb logs for Wednesday the 25th of May, 2016

[03:54:39] <harly> Question: i'm currently waiting on a new member in an RS to complete the initial sync. There's 2.5TB of data. It appears that all the files have been synced and it is now building indexes. How can I get an idea of when it will finish? db.currentOp() only shows the current index task %, not how many have been done or are remaining...
[04:06:06] <Boomtime> @harly: sadly, not really - https://jira.mongodb.org/browse/SERVER-9115
[04:07:09] <harly> thanks @Boomtime. I'd stumbled across that ticket but assumed it was merely a high-level request, like what would appear in rs.status()
[04:07:33] <harly> trying to work out lower level ways to get at it. I'm also unfortunately (for me) quite new to MD. :)
[04:09:13] <harly> this task I've taken on has been running for almost 24 hours. I also stumbled across a suggestion ( https://www.kchodorow.com/blog/2012/06/26/replica-set-internals-part-v-initial-sync/ ) that it's faster to pause an existing member, sync the files to the new one, and then have it resume. no index rebuilding needed.
[04:12:14] <Boomtime> it will often be faster to clone a new member at the filesystem level than to use initial sync - you lose some of the beneficial side-effects, such as compaction, but you can gain a lot of time
[04:12:46] <harly> Time might be more important in this instance. When you say compaction do you mean the benefit of a freshly built index?
[04:15:01] <Boomtime> index effect is only slight at the end of the day - compaction of the data files can be significant though
[04:17:40] <harly> ok. well i can check how much that benefited when this one finishes. we'll have some tens more to do afterwards though, at which point, if the compaction isn't significant, speed will win :)
[04:18:27] <harly> looks like compacting can be set up to run continuously, if a member can afford to go into recovery mode.
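There is still no "indexes done / remaining" counter, but the per-index progress can be pulled out of db.currentOp(). A minimal shell sketch, assuming the msg/progress fields reported by 3.0-era servers:

```
// list in-progress index builds with their reported progress ({done, total})
db.currentOp(true).inprog
  .filter(function (op) { return /Index Build/.test(op.msg || ''); })
  .forEach(function (op) {
    printjson({ ns: op.ns, msg: op.msg, progress: op.progress });
  });
```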
[05:33:43] <cart_man> Hi everyone... I can not seem to find a way to UPDATE an entry in MongoDB. I AM Using Mongoose!
[07:17:03] <Derick> Zelest_: for now, yes, new MongoDB\BSON\UTCDateTime(time()*1000), but we'll fix that soon enough
[07:26:04] <Zelest> Derick, Ah, thanks for the reply :)
[07:27:05] <Derick> plane to catch now though
[08:35:50] <granite> we have a list of integer values for a field, say ids. My team lead wants to store it as a csv string, but I prefer a list, which seems more intuitive.
[08:37:14] <Zelest> define "a list"? how do you separate each item in the list?
[08:38:18] <Zelest> mostly for the benefit of being able to query it easier
[08:38:24] <granite> it's a collection of some location code
[08:39:04] <granite> the reason from my team lead is that storing as csv avoids the need to parse json (by mongodb)
[08:39:21] <granite> so that's a string split versus json parsing
[08:39:52] <granite> he suggests storing as csv is better performance-wise
[08:41:25] <granite> But I think since the nature of this field being a "collection of same type", it would suit more as an array (or list)
[08:41:40] <kurushiyama> Forgive my french, but his arguing is bullcrap.
[08:42:28] <kurushiyama> When parsing BSON into your native language type, you'd have raw CSV values in your field
[08:42:46] <granite> that's right
[08:42:47] <kurushiyama> Then, you'd need to parse the CSV values
[08:43:05] <kurushiyama> So two parsers involved instead of one.
[08:43:25] <kurushiyama> And, as I might add, they need to run one after the other.
[08:43:44] <granite> yes, so his point is: mongodb doesn't need to parse the array if we store it as a csv string
[08:44:11] <kurushiyama> What "parsing of array"? An array is as much a data type as a string is.
[08:44:43] <granite> { _id: 0001, ids: [1, 2, 3] }
[08:44:43] <kurushiyama> We are talking of nanoseconds in difference, maybe
[08:44:58] <kurushiyama> granite: So, where is the parsing involved?
[08:45:14] <granite> and he suggests {_id: 0001, ids: "1, 2, 3"}
[08:46:02] <kurushiyama> Side question: Are we talking of relations here?
[08:46:04] <granite> I guess he thinks that mongodb needs to somehow parse ids: [1, 2, 3]
[08:46:22] <granite> no relation
[08:46:50] <selckin> is there a generic way to copy dbobjects in java ?
[08:47:07] <selckin> using jongo and it returns some lazy bson version (its own impl) and won't let me modify it
[08:47:33] <kurushiyama> granite: Well, of course it is not a primitive data type. But sooner or later you'd need to parse the CSV anyway. Plus: how the heck is he going to query/aggregate those ids?
[08:47:41] <granite> we retrieve it as a bson document, much like a map
[08:47:42] <kurushiyama> Via regex?
[08:48:00] <kurushiyama> granite: What language do you use?
[08:48:04] <granite> Java
[08:48:19] <granite> Just use Mongodb's native Java api
[08:49:02] <kurushiyama> Ok. Then, again: how is he going to query this field? what does he want to do with the field?
[08:49:25] <granite> my lead's point is: when we retrieve a Document from mongodb, it needs to parse the "ids" array.
[08:49:49] <granite> But if we store it as a csv string, then it just returns it. And we'll split the string
[08:50:12] <granite> which is json array parsing (by mongodb) versus a String split (by application code)
[08:50:42] <granite> he think we don't need to query on that field
[08:52:22] <granite> So it's a preference between intuitive data structure versus performance (if csv really provides performance gain)
[08:53:31] <kurushiyama> Wait a sec? He _really_ thinks that parsing into a non-primitive data type is done faster in Java than in C++?
[08:54:49] <granite> kurushiyama the problem is, it would be hard to measure how much time mongodb spends parsing the array in the document
[08:55:07] <granite> so I can't just go and make a measure
[08:56:45] <kurushiyama> Right. Because we are talking of memory mapped files and C++ parsing of a []byte into an array. Ok, and you need the ids array (or string) as a CSV later down the road?
[08:57:20] <granite> No
[08:57:28] <kurushiyama> You just need a string?
[08:57:37] <granite> I need a list of ids
[08:57:37] <kurushiyama> Or a native Java array?
[08:57:47] <kurushiyama> Well, that is easy to measure.
[08:59:26] <granite> that's another mess, our interface for other teams provides a string, and it's the client's job to parse it. I would like to change this later to provide a list of integers, for the convenience of the client.
[08:59:53] <granite> just joining this company
[09:00:05] <kurushiyama> Create a list of 1M strings. Parse them into a Java Array. The time it takes is the net extra time you need for 1M operations. Because iirc, bson arrays are returned as Java Arrays.
[09:01:01] <granite> Great suggestion.
[09:01:54] <kurushiyama> You could even create 1M docs for each model, and then iterate them to get the data type you need.
[09:02:27] <kurushiyama> granite: you can use mgenerate to create those docs.
[09:03:52] <kurushiyama> granite: "Premature optimization is the root of all evil." -- Donald Knuth
[09:04:48] <granite> kurushiyama: No, storing csv is quite common in the team, as suggested by my team lead.
[09:05:08] <granite> kurushiyama: it would at least be an argument if it did provide a performance gain
[09:05:16] <granite> kurushiyama: I'll need to measure it
[09:05:46] <granite> kurushiyama: thank you very much. (what is iirc?)
[09:06:06] <kurushiyama> granite: As said above. Do an end-to-end test. From doc to data type as you need it. iirc = if i recall correctly
[09:06:55] <kurushiyama> granite: I'll eat my hat if manually parsing from CSV is slower than using an array right away.
[09:07:19] <kurushiyama> granite: Even without mustard.
[09:07:30] <granite> kurushiyama: you mean faster?
[09:07:39] <kurushiyama> granite: Aye, ofc
[09:07:55] <kurushiyama> granite: ofc = of course
[09:08:14] <granite> kurushiyama: So kind of you to explain :D
[09:09:00] <kurushiyama> granite: I am tempted to implement it in Go to prove the point. And that is native code binaries without runtime.
[09:09:46] <granite> kurushiyama: Maybe I could run repeated tests to compensate
[09:10:05] <kurushiyama> granite: Just use a large sample size. Should be more than sufficient
[09:10:20] <granite> but Great idea. Maybe I'll come around to report the result. See u until then (gotta work)
[09:10:46] <kurushiyama> granite: Guess what: I'll help you and implement it in Go for comparison.
[09:11:15] <granite> Damn you rocks!
[12:17:43] <kurushiyama> granite: I was able to measure the difference in a naive way. The difference of array vs string was between 4 and 11 µs on average over 1M docs each.
[12:19:47] <kurushiyama> granite: Without parsing it into the data you need.
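For reference, a rough Node.js sketch of the end-to-end test described above (the benchmarks in this discussion were actually Java and Go; the connection string and the 'asArray'/'asCsv' collections are made up): time how long it takes to end up with a plain array of ids per document for each storage format.

```
var MongoClient = require('mongodb').MongoClient;

MongoClient.connect('mongodb://localhost:27017/bench', function (err, db) {
  if (err) throw err;

  // read every doc in a collection and reduce its ids field to a JS array of numbers
  function run(coll, toIds, done) {
    var start = Date.now();
    var total = 0;
    db.collection(coll).find().forEach(function (doc) {
      total += toIds(doc).length;
    }, function (err) {
      if (err) throw err;
      console.log(coll + ': ' + total + ' ids in ' + (Date.now() - start) + 'ms');
      done();
    });
  }

  // docs like { ids: [1, 2, 3] } vs docs like { ids: "1,2,3" }
  run('asArray', function (doc) { return doc.ids; }, function () {
    run('asCsv', function (doc) { return doc.ids.split(',').map(Number); }, function () {
      db.close();
    });
  });
});
```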
[12:31:17] <StephenLynx> the new gridfs api from the node.js driver is pretty neat.
[12:31:25] <StephenLynx> even though it did away with a few features.
[12:37:44] <StephenLynx> such as being able to overwrite a file and delete files by name
[12:38:14] <StephenLynx> but given how much easier it became to stream, I think it was a positive change overall
[12:38:48] <StephenLynx> especially streaming specific ranges
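A small sketch of the GridFSBucket streaming being praised here (Node.js driver 2.1; the database name, file name and byte range are made up):

```
var mongodb = require('mongodb');

mongodb.MongoClient.connect('mongodb://localhost:27017/files', function (err, db) {
  if (err) throw err;
  var bucket = new mongodb.GridFSBucket(db);

  // stream only the first kilobyte of a stored file, looked up by name
  var stream = bucket.openDownloadStreamByName('video.mp4', { start: 0, end: 1024 });
  stream.pipe(process.stdout);
  stream.on('end', function () { db.close(); });
});
```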
[12:39:36] <enoch> hi all
[12:39:51] <enoch> my mongod (tokumx) gives segmentation false. How to debug it??
[12:42:01] <cheeser> probably by talking to percona
[12:43:27] <StephenLynx> :v
[12:43:40] <StephenLynx> and I think you mean segmentation fault
[12:50:06] <enoch> yes
[12:50:22] <enoch> no way to fix it?
[12:52:09] <cheeser> i don't think anyone here really knows much about tokumx
[12:52:16] <cheeser> you're the first i've ever seen using it here.
[12:52:26] <StephenLynx> yeah
[12:52:53] <cheeser> afaict, not even percona's doing much with it.
[12:56:28] <enoch> and trying to fix it using the mongodb way?
[12:56:46] <StephenLynx> what would be the mongodb way? :v
[12:56:48] <granite> kurushiyama: Cool!! haven't got the time to make my measure though.
[13:06:13] <kurushiyama> enoch: Pretty easy: install MongoDB, migrate the data.
[13:07:56] <cheeser> the data that's stuck in the corrupted, unbootable server.
[13:16:07] <cart_man> Hi...does anybody know how I can delete an entry with mongoose?
[13:16:41] <kurushiyama> cheeser: Well... hmmm. Maybe it is a 2.6 and a dump can be done on the datafiles. If he is lucky.
[13:19:42] <cheeser> server has to be up to do a mongodump
[13:25:38] <StephenLynx> cart_man, I suggest you don't use mongoose.
[13:28:53] <kurushiyama> cheeser: Not in 2.6 ;) It was possible to create a dump from the datafiles directly.
[13:29:42] <kurushiyama> cheeser: https://docs.mongodb.com/v2.6/reference/program/mongodump/#cmdoption--dbpath
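For reference, the 2.6-only invocation being linked to (paths are made up; the mongod has to be stopped so the data files are not locked):

```
mongodump --dbpath /var/lib/mongodb --out /backup/dump
```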
[13:34:18] <scmp> Hi, in a $group stage _id needs to be an array of two sorted strings. is $each and $sort the best approach or is there another way?
[13:36:19] <StephenLynx> I would create the array on another $group
[13:38:08] <cheeser> oh, right!
[13:38:16] <cheeser> whether that works on toku file stores....
[13:40:47] <scmp> StephenLynx: then i would still have two group criteria, one sorted and one unsorted
[13:42:14] <StephenLynx> you could sort on the second group stage
[13:42:19] <StephenLynx> I guess?
[13:42:20] <StephenLynx> :v
[13:43:28] <kba> so I'm trying to write a Node script to migrate a (11 GB), but I'm not very successful. The script just reads the old data, does some
[13:43:45] <StephenLynx> is it a direct migration?
[13:43:46] <kba> makes some simple transformations, then tries to save it again.
[13:43:47] <StephenLynx> 1:1?
[13:43:49] <StephenLynx> ah
[13:43:57] <kba> Yeah, I hit enter a little too quickly there
[13:44:17] <StephenLynx> and what is happening?
[13:44:27] <kba> Anyway, I first tried with find(...).toArray(), but that ran out of memory really quickly
[13:44:54] <kba> I'm now trying with find(...).each(), thinking that wouldn't try to load everything into memory at once, and it does get further than the other one, but also crashes
[13:45:01] <scmp> how complex is the transformation?
[13:45:18] <scmp> just field mapping or any content changes?
[13:45:29] <kba> Mostly just field mapping
[13:45:46] <kba> anyway, the .each approach even crashed the database
[13:45:58] <scmp> $out might be useful
[13:46:58] <StephenLynx> kba, hold on
[13:47:00] <StephenLynx> I know what you need
[13:47:20] <StephenLynx> http://mongodb.github.io/node-mongodb-native/2.1/api/Cursor.html#next
[13:47:35] <kba> I thought .each() used the cursor as well
[13:47:41] <StephenLynx> just remember to only call next once you have already finished processing the document you got before
[13:48:03] <StephenLynx> as in, you need to use asynchronous recursion.
[13:48:20] <StephenLynx> .each does use the cursor
[13:48:31] <StephenLynx> but it gets all documents at once
[13:48:50] <StephenLynx> next gets ONE document
[13:49:02] <StephenLynx> so you get ONE document, process it, get ONE document, process it
[13:49:13] <StephenLynx> once next doesn't give you anything, you know you finished
[13:49:25] <kba> I'll try that, thanks
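Roughly what that pattern looks like with the 2.1 driver (a sketch; the connection string, collection names and the field mapping are made up):

```
var MongoClient = require('mongodb').MongoClient;

MongoClient.connect('mongodb://localhost:27017/mydb', function (err, db) {
  if (err) throw err;
  var cursor = db.collection('oldDocs').find();

  function processNext() {
    cursor.next(function (err, doc) {
      if (err) throw err;
      if (doc === null) return db.close();   // cursor exhausted: migration done

      var mapped = { name: doc.old_name, time: doc.created };  // hypothetical field mapping
      db.collection('newDocs').insertOne(mapped, function (err) {
        if (err) throw err;
        processNext();   // only ask for the next doc once this one is saved
      });
    });
  }

  processNext();
});
```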
[13:50:56] <kurushiyama> Well, great. Have profiled a benchmark for granite. should actually prove that reading an array seems to be faster than a string. Now he is gone.
[13:51:32] <cheeser> that'll teach you to go the extra mile. there's never someone waiting at the end.
[13:52:29] <StephenLynx> kek
[13:52:40] <StephenLynx> I missed part of the convo
[13:52:45] <StephenLynx> what was he doing?
[13:53:18] <StephenLynx> "arrayitem1,arrayitem2"?
[13:53:25] <kurushiyama> Well, his boss was arguing that saving and dealing with a csv string would be faster. Aye
[13:53:30] <kurushiyama> Exactly
[13:53:41] <kurushiyama> I have written a Go Benchmark and profiled it.
[13:53:48] <StephenLynx> kek
[13:54:07] <StephenLynx> oh yeah, mongodb implemented actual array for the lulz
[13:54:13] <kba> StephenLynx: the 2nd example from http://mongodb.github.io/node-mongodb-native/2.1/api/Cursor.html#next seems a lot easier to implement
[13:54:21] <kba> is there a particular reason why you suggested the recursive method?
[13:54:36] <StephenLynx> IMO
[13:54:38] <StephenLynx> promises are cancer.
[13:54:46] <StephenLynx> but suit yourself, at least now they are standard.
[13:54:58] <StephenLynx> even though I am pretty sure their performance is awful.
[13:55:12] <kba> the 2nd approach just seems to use sync calls
[13:55:17] <StephenLynx> they are not
[13:55:19] <StephenLynx> they are promises.
[13:55:24] <StephenLynx> they abstract callbacks
[13:55:37] <kba> I see
[13:55:56] <StephenLynx> if I were you, I would take a long, hard look at promises before using them.
[13:56:01] <StephenLynx> especially regarding their performance.
[13:56:18] <kurushiyama> Turned out that the time spent reading the documents with an actual array is consistently lower than with a string of the same elements. However, depending on how the raw documents are dealt with, it can actually happen that parsing the csv string and creating the required data structure by hand from the raw values might be faster than bson unmarshalling into the data structure.
[13:56:31] <kba> I'll just use the first approach -- I assumed the 2nd was sync
[13:56:49] <StephenLynx> you never use sync code for large tasks.
[13:57:01] <StephenLynx> you are pretty much locking down the cpu core.
[13:57:24] <StephenLynx> what you can use sync code for is reading a 50 byte settings file or something
[13:57:40] <kba> you mean locking down node?
[13:57:43] <StephenLynx> no
[13:57:46] <StephenLynx> the core
[13:57:58] <StephenLynx> that core will do NOTHING but your task.
[13:57:58] <kurushiyama> kba: As in "hardware"
[13:58:01] <noobandnoober> hi
[13:58:16] <kba> usually, the scheduler will put a process to sleep if it's waiting for io
[13:58:16] <StephenLynx> and since your task is IO bound, you are just making the core wait
[13:58:22] <StephenLynx> usually
[13:58:30] <StephenLynx> now on sync code.
[13:58:33] <kurushiyama> noobandnoober: Hoi!
[13:58:34] <StephenLynx> not on*
[13:58:52] <StephenLynx> and even if the core is not locked
[13:58:54] <StephenLynx> your process is.
[13:59:12] <kba> if you're making a sync read operation, the process will yield and sleep until the scheduler wakes it up again
[13:59:18] <noobandnoober> i'm making a messaging app, in terms of performance, is it better to insert a document for each chat, or one for each message? (with a chat ID inside)
[13:59:19] <StephenLynx> hm
[13:59:20] <StephenLynx> makes sense.
[13:59:24] <kba> you must be talking about something specific here
[13:59:36] <StephenLynx> but your node process is still locked on sync tasks.
[13:59:41] <kba> indeed
[13:59:44] <StephenLynx> while it could be doing something else.
[13:59:45] <StephenLynx> for example
[13:59:57] <StephenLynx> you could use your process to handle a certain amount of documents at once
[14:00:18] <kba> yes, I'm aware of process.nextTick, the worker queue and such
[14:00:26] <StephenLynx> you don't even need to go as far
[14:00:31] <kba> but since this is just a migration tool, batch work would be efficient
[14:00:42] <kba> but if there's not a sync implementation, it doesn't matter
[14:00:51] <StephenLynx> a simple loop executing an async task would be enough
[14:00:57] <StephenLynx> for example
[14:01:05] <kba> async tasks use process.nexttick
[14:01:08] <StephenLynx> I know
[14:01:12] <StephenLynx> but you don't have to use them yourself
[14:01:16] <StephenLynx> the nexttick
[14:01:25] <StephenLynx> you use them via the cursor
[14:01:28] <StephenLynx> for example
[14:01:30] <kba> yeah
[14:01:35] <kba> anyway, I should get this working, thanks for the help
[14:01:38] <StephenLynx> np
[14:01:52] <StephenLynx> you have a function that gets one document using .next
[14:02:00] <StephenLynx> you start by calling this function ten times
[14:02:15] <StephenLynx> and you track how many documents are being processed
[14:02:26] <StephenLynx> then once you finish, you wait for all pending documents to be processed
[14:02:44] <StephenLynx> and each document that is finished, start another document
[14:02:59] <StephenLynx> that way you have effectively ten tasks running on a single core
[14:03:03] <StephenLynx> and a process
[14:03:18] <StephenLynx> which is much simpler than starting and coordinating then processes
[14:03:23] <StephenLynx> ten*
[14:07:20] <kba> I fear for concurrency issues with hasNext/next if I do that
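One way to get the "several at once" effect without ever having two next() calls outstanding (a simpler variation on the pattern above, not StephenLynx's exact recipe; transform() and the cursor/target collections are hypothetical): fill a small batch serially from the cursor, write the whole batch at once, repeat.

```
function migrateBatch(cursor, target, db) {
  var batch = [];

  function fill() {
    if (batch.length === 10) return flush(false);
    cursor.next(function (err, doc) {
      if (err) throw err;
      if (doc === null) return flush(true);   // cursor exhausted
      batch.push(transform(doc));             // hypothetical per-document mapping
      fill();
    });
  }

  function flush(last) {
    if (batch.length === 0) return last ? db.close() : undefined;
    target.insertMany(batch, function (err) {
      if (err) throw err;
      if (last) return db.close();
      batch = [];
      migrateBatch(cursor, target, db);        // start the next batch
    });
  }

  fill();
}
```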
[14:11:42] <cheeser> it's ok
[14:11:57] <cheeser> needs better tooling and a proper build/dep story
[14:13:16] <kurushiyama> cheeser: Well, I use submodules for vendoring. Works on the commit level ;) as for the builds: you have to get used to it. But works great.
[14:13:35] <cheeser> i wrote a makefile for the project i was on :D
[14:18:29] <kurushiyama> cheeser: Well, I did that, too. Until I noticed that it was basically just me using the given tools wrongly.
[14:21:01] <cheeser> we had a bash script parsing a Godeps that ran git to fetch things blah blah blah
[14:50:20] <WhereIsMySpoon> How do I do something in mongodb like this sql statement: "select count(*), name from MyTable where type="someType" and time < '2016-05-20' and time > '2016-05-19' group by name" ? assuming I have name and time and type as fields in my mongo object too.
[14:54:13] <scmp> "aggregating group operators are unary"? in a simple $group with _id and $push... i don't get it
[14:55:13] <scmp> also, it's not possible to use $push/$each/$sort in a $project stage?
[14:56:20] <oky> scmp: are you doing a $push inside the $_id?
[14:56:54] <scmp> no, i already assumed that is not possible since it's a simple expression
[14:57:06] <oky> scmp: can you link your query
[14:57:39] <scmp> yeah, one sec
[14:58:12] <WhereIsMySpoon> also i need a having count(*) > 10 in that too
[14:58:41] <cart_man> StephenLynx: Why should I not use Mongoose? What else is there anyway ?
[14:59:09] <cart_man> StephenLynx: I Must say... Mongoose has been quite an abstract learning curve!
[14:59:29] <oky> WhereIsMySpoon: take a look at aggregation framework. the operations you mentioned are part of it
[14:59:35] <WhereIsMySpoon> yea..im just about to paste what i have
[14:59:39] <WhereIsMySpoon> its not -quite- working
[15:00:09] <WhereIsMySpoon> oky: https://gist.github.com/anonymous/68b2f9787dbd96367c55cf7056528944 this is what I have, i just need to add the having count bit on and i cant figure out how to get that in there
[15:01:03] <WhereIsMySpoon> i tried putting a total:{$gt:10} in the cond bit but that didnt work
[15:01:12] <scmp> oky: https://paste.debian.net/hidden/9ee6bbdc/
[15:01:23] <cart_man> Does anybody know how I can remove entries from a DB using Mongoose?
[15:02:05] <oky> scmp: i think it is saying that $group should not have two keys inside it
[15:02:20] <scmp> oky: the plan was to use group_id in the second $group
[15:02:35] <scmp> the first $group gives the error
[15:02:36] <oky> scmp: nvm. it'll take me a few moments
[15:05:29] <oky> scmp: does it work if you change the array in $push to a dict?
[15:06:48] <oky> WhereIsMySpoon: you can add a new pipeline stage with $match operator
[15:08:22] <oky> WhereIsMySpoon: it looks like you are using .group() on a collection. i'm unfamiliar with how to add a 'HAVING' clause to that (possible in the finalize), but i know you can do it reasonably easily with aggregation framework
[15:08:48] <WhereIsMySpoon> alright, yea, im converting to aggregate now tbh
[15:08:52] <WhereIsMySpoon> https://docs.mongodb.com/v3.0/reference/sql-aggregation-comparison/ this is so nice
[15:08:58] <WhereIsMySpoon> i wish all the operations had stuff like this
[15:09:06] <WhereIsMySpoon> or maybe my google fu is weak
[15:09:19] <scmp> oky: 1 sec, let me get a better example with what i want to accomplish, had the question here earlier
[15:11:07] <StephenLynx> cart_man, because it doesn't handle ids and dates properly
[15:11:10] <StephenLynx> it is too slow
[15:11:40] <StephenLynx> not to mention it forces you to use mongo in a way that is not intended by design
[15:11:44] <StephenLynx> which is like a relational database.
[15:11:52] <StephenLynx> what one should use for node.js is mongodb
[15:11:57] <StephenLynx> the native, 1st party driver
[15:19:19] <kurushiyama> cart_man: http://stackoverflow.com/questions/5809788/how-do-i-remove-documents-using-node-js-mongoose
[15:19:57] <cart_man> StephenLynx: Holy shit ive been having issues the entire day getting _id's ...........FUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUK
[15:20:07] <cart_man> Can not believe that it could be faulty libs
[15:20:10] <StephenLynx> yup
[15:20:15] <cart_man> sigh.............
[15:20:19] <kurushiyama> cart_man: But I can not stress enough how much I agree with StephenLynx
[15:20:19] <cart_man> FML Seriously!
[15:20:24] <StephenLynx> mongoose is garbage
[15:20:34] <cart_man> They really sell it well though
[15:20:45] <StephenLynx> heh
[15:20:54] <StephenLynx> they got a contact on mongodb
[15:20:59] <StephenLynx> I think the dev works for it
[15:21:07] <StephenLynx> thats literally the only reason it ever got any traction
[15:21:11] <StephenLynx> because its utter crap
[15:21:13] <kurushiyama> cart_man: Well, by that logic: "Billions of fleas eat poo"
[15:21:14] <cart_man> Yay to deadlines ... hehh
[15:21:22] <cart_man> lol thats true
[15:21:30] <scmp> oky: https://paste.debian.net/hidden/1a8dd5b9/ _id for the $group needs to be an array of two strings sorted. this gives me "invalid operator '$each'" now
[15:22:39] <kurushiyama> scmp: Um, you can not nest stage operators.
[15:23:57] <kurushiyama> scmp: Think of the aggregation as a UNIX pipeline
[15:24:24] <WhereIsMySpoon> er...eh? mongodb is complaining about needing to use the disk to do an aggregate on only 150k documents?
[15:24:35] <kurushiyama> The output of the current _stage_ is the input for the next.
[15:24:37] <StephenLynx> how large are the documents?
[15:24:52] <kurushiyama> scmp: ^
[15:25:14] <WhereIsMySpoon> StephenLynx: about 600 characters each
[15:25:24] <StephenLynx> hm
[15:25:35] <WhereIsMySpoon> oh great and it didnt even return anything
[15:25:37] <WhereIsMySpoon> sigh
[15:25:45] <WhereIsMySpoon> i really hate doing complex operations using mongo
[15:25:50] <StephenLynx> v:
[15:25:59] <StephenLynx> how complex is your operation, though?
[15:26:05] <WhereIsMySpoon> its an aggregate
[15:26:07] <kurushiyama> WhereIsMySpoon: Here is how I do it. Stage by stage.
[15:26:08] <StephenLynx> ok
[15:26:11] <WhereIsMySpoon> its not even that complex in sql
[15:26:12] <StephenLynx> but how complex is it?
[15:26:19] <WhereIsMySpoon> let me gist it for you
[15:27:28] <WhereIsMySpoon> StephenLynx: https://gist.github.com/anonymous/25cb4b079672b1ae09180b542419c5ab
[15:27:47] <scmp> kurushiyama: i get that, i assumed the $push would be an expression as well
[15:27:54] <StephenLynx> hm
[15:28:02] <StephenLynx> that is really odd, WhereIsMySpoon
[15:28:15] <WhereIsMySpoon> its analogous to select count(*), myId from MyTable where type = 'someType' and time > 'thedateicba' and time < 'thedateicba' group by myId having count(*) > 10
[15:28:21] <StephenLynx> the thing is
[15:28:22] <kurushiyama> scmp: Push is, sort is not
[15:28:24] <WhereIsMySpoon> which is very not complex
[15:28:31] <StephenLynx> on your match it won't have anything to match
[15:28:46] <StephenLynx> because the documents output by the group stage don't have the fields
[15:28:48] <kurushiyama> WhereIsMySpoon: Show us what you did so far.
[15:28:51] <StephenLynx> aside from count
[15:28:57] <WhereIsMySpoon> kurushiyama: that is what i did so far, in the gist
[15:29:01] <StephenLynx> so your match won't match anything.
[15:29:06] <WhereIsMySpoon> StephenLynx: oh, so do i need match first?
[15:29:15] <StephenLynx> yeah
[15:29:16] <WhereIsMySpoon> or do i need 2 match stages?
[15:29:22] <StephenLynx> you might
[15:29:31] <StephenLynx> i don't know what you are trying to achieve
[15:29:41] <kurushiyama> WhereIsMySpoon: For you too: the output of the current stage is the input of the next.
[15:29:44] <StephenLynx> that
[15:29:50] <WhereIsMySpoon> i typed the sql statement i want to achieve
[15:30:38] <scmp> kurushiyama: so there is no way to sort this during aggregation, i have to create an extra field
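For what it's worth, that "extra field" route can be done inside the pipeline with $cond: order the two strings in a $project and group on the resulting pair (a sketch; 'a' and 'b' are hypothetical field names holding the two strings):

```
db.coll.aggregate([
  { $project: {
      lo: { $cond: [ { $lte: ['$a', '$b'] }, '$a', '$b' ] },   // smaller of the two strings
      hi: { $cond: [ { $lte: ['$a', '$b'] }, '$b', '$a' ] }    // larger of the two strings
  } },
  { $group: { _id: { lo: '$lo', hi: '$hi' }, count: { $sum: 1 } } }
])
```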
[15:30:39] <WhereIsMySpoon> aha!
[15:30:42] <WhereIsMySpoon> 2 match stages did it
[15:30:52] <WhereIsMySpoon> one before with the type and time
[15:30:56] <WhereIsMySpoon> and one after with the count gt 10
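Put together, the pipeline matching the SQL above looks roughly like this (field names taken from the SQL; the date values are placeholders):

```
db.MyTable.aggregate([
  { $match: { type: 'someType',
              time: { $gt: ISODate('2016-05-19'), $lt: ISODate('2016-05-20') } } },
  { $group: { _id: '$name', total: { $sum: 1 } } },   // count(*) ... group by name
  { $match: { total: { $gt: 10 } } }                  // having count(*) > 10
])
```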
[15:31:37] <WhereIsMySpoon> thank you all
[15:31:52] <kurushiyama> WhereIsMySpoon: May well be, did not look into it. Best approach for getting help for an aggregation is always to give 3 or 4 sample docs, what you did so far and a document demonstrating the expected output.
[15:32:47] <WhereIsMySpoon> alright
[15:33:03] <WhereIsMySpoon> i just find the whole {{}{}}{}{}{}{{}{}}}{}{{{{}{{}}}} thing with mongo queries annoying
[15:33:09] <WhereIsMySpoon> =/
[15:33:43] <WhereIsMySpoon> sql feels much more concise and less fiddly, but nosql has its place for data, i know
[15:33:48] <WhereIsMySpoon> just venting :p
[15:34:05] <cheeser> it's json so you're gonna get that structure
[15:34:19] <StephenLynx> WhereIsMySpoon, on the other hand, you don't have to have 2 languages on the code.
[15:34:20] <StephenLynx> is all js
[15:34:29] <StephenLynx> also, auto formatting is a necessity.
[15:34:53] <WhereIsMySpoon> StephenLynx: if your backend is built with js, sure
[15:35:03] <StephenLynx> ah
[15:35:12] <StephenLynx> your gist was in js, so I assumed you were using it
[15:35:19] <WhereIsMySpoon> i did that just for formatting
[15:35:23] <WhereIsMySpoon> java for me
[15:36:46] <WhereIsMySpoon> anyhow thanks again all for the help
[15:36:48] <WhereIsMySpoon> <3
[15:44:55] <scmp> StephenLynx, oky, kurushiyama thx for the help
[16:10:21] <deshymers> Hello, I am trying to chain several $group stages using the aggregation framework. However I am having trouble passing the previous $group results down the pipeline. Here is my query, https://gist.github.com/deshymers/809e3a7783299517caa37a9feec00ba6 and here is the error I am getting, "exception: the group aggregate field 'activities' must be defined as an expression inside an object"
[16:11:31] <deshymers> I need to group by the object field and a time interval. I then need to group those results by the same interval, and then group those results by the subject field
[16:13:10] <deshymers> The reason I am using $push in the initial $group is I need to pull the result out based on the isStatus field, and I do this outside after I get my results back from the server
[16:23:58] <deshymers> oops I see my error in the query above
[16:24:14] <deshymers> had the 'activities' outside the $group
[16:34:05] <kurushiyama> deshymers: Was about to say it ;)
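That error message boils down to every accumulated field having to live inside the $group document, e.g. (a hypothetical minimal shape, not deshymers' actual query):

```
{ $group: { _id: { subject: '$subject', interval: '$interval' },
            activities: { $push: '$activity' } } }
// not: { $group: { _id: ... } }, activities: { $push: ... }
```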
[18:22:42] <alexi5> hello
[18:23:14] <alexi5> do you guys know any use cases of mongodb in ecommerce ?
[18:23:44] <kurushiyama> alexi5: Yup.
[18:23:58] <alexi5> can you give me some examples ?
[18:25:49] <kurushiyama> alexi5: Nope. NDA.
[18:26:11] <kurushiyama> alexi5: How about telling us what you want to do, and we can help you?
[18:26:11] <alexi5> ahh you are developing one :)
[18:26:25] <kurushiyama> alexi5: Nope. DBA of one.
[18:28:20] <alexi5> I am building a charging portal for a Wifi gateway. the gateway will redirect the user to the web application. the web application provides a list of wifi packages. the user selects a package, is forwarded to a payment gateway (paypal, etc..), pays, and the application gets the payment result. if the result is successful, it provisions the gateway with the internet package
[18:29:01] <kurushiyama> alexi5: I would surely and absolutely not use MongoDB for that.
[18:29:12] <alexi5> why not ?
[18:29:18] <kurushiyama> alexi5: Overkill
[18:29:30] <alexi5> its simple right :)
[18:31:46] <alexi5> the reason why I thought of mongodb was the data model required. For each package i will create pricing history as well as provisioning setting history associated with the package.
[18:32:33] <alexi5> for each user purchase, keep track of the users that use the login credentials, if the package allows for more than one user
[18:32:54] <alexi5> and ensure that user does not go over maximum allowed for package
[18:34:30] <kurushiyama> alexi5: Ok, you should first get your user stories right. One by one. Then, make priorities, from the most common to the least common user story, from there, derive the questions your data needs to answer. Then model accordingly.
[18:34:47] <alexi5> ok
[18:36:05] <alexi5> it would be nice to see what pros and cons of an existing application using mongodb (not including the one you are managing :D )
[18:36:24] <kurushiyama> alexi5: Modelling in MongoDB works basically the other way around than with SQL. SQL: Entity-Relationship diagram => model => bang your head against the wall with JOINs to get your questions answered.
[18:36:55] <alexi5> ok. I guess I have to refactor my way of thinking for mongodb
[18:36:56] <kurushiyama> MongoDB: Find the questions => Model them optimized for the most common use cases
[20:38:38] <saml_> how do you get distinct count of an indexed field?
[20:39:57] <saml_> db.docs.aggregate({$group:{_id:'$field'}}, {$group:{_id:1,c:{$sum:1}}})
[20:41:29] <Derick> I think you'd want '_id: NULL' in the second operator
[21:10:08] <saml_> ReferenceError: NULL is not defined
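The shell literal is lowercase, which is what the ReferenceError is about; a working version of the pipeline from above:

```
db.docs.aggregate([
  { $group: { _id: '$field' } },              // one doc per distinct value
  { $group: { _id: null, c: { $sum: 1 } } }   // count those docs
])
// for a single field, db.docs.distinct('field').length gives the same number,
// as long as the distinct values fit in the 16MB response limit
```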
[21:55:54] <Doyle> Hey. How can you return a list of documents with a specific field?
[22:08:40] <rkgarcia> Doyle, that have a specific field_
[22:08:41] <rkgarcia> ?
[22:08:46] <rkgarcia> or what do you mean?
[22:12:57] <Doyle> I found it with $exists
[22:13:00] <Doyle> thanks
[22:13:19] <uuanton> guys who can help to figure out why database query is slow
[22:13:54] <uuanton> simple query on one machine takes ms and the other machine takes seconds
[22:19:45] <Boomtime> @uuanton: run the query on both with .explain()
[22:45:36] <uuanton> thanks a lot
[22:47:21] <uuanton> Boomtime what things do i need to look at after .explain() ?
[22:58:36] <Boomtime> @uuanton: compare the winningPlan section between them to start with
[23:06:42] <uuanton> Boomtime i dont see winningPlan on 2.6.12 mongodb
[23:08:54] <Boomtime> erg.. 2.6, two major versions old.. ok, compare the 'cursor' line
[23:09:33] <Boomtime> in fact, just compare the first few fields of each - it should be pretty clear pretty quick if they are even remotely similar
[23:09:51] <Boomtime> you can always put both of the outputs into gist/pastebin and post a link here
[23:10:18] <uuanton> thanks i will try to do that
[23:11:29] <uuanton> db.getCollection('mycollection').find().limit(10).explain()
[23:11:56] <uuanton> didnt give me helpful information
[23:14:02] <uuanton> in fact they have same numbers
[23:17:15] <Boomtime> uuanton: can you post the complete results to a pastebin?
[23:17:20] <Boomtime> of both outputs
[23:19:22] <uuanton> http://pastebin.com/jqga8xpH
[23:19:35] <uuanton> both outputs identical only server name is different
[23:21:40] <Boomtime> then the result is identical
[23:22:29] <Boomtime> something has changed or there is a problem in your code
[23:23:12] <uuanton> im not running any code im querying database thru shell
[23:23:17] <Boomtime> you can try re-running the explain with a true parameter; explain(true)
[23:23:28] <Boomtime> please do this on both and post both results
[23:23:55] <Boomtime> is that seriously the query you ran?
[23:24:04] <Boomtime> -> db.getCollection('mycollection').find().limit(10).explain()
[23:24:11] <Boomtime> no predicates?
[23:24:32] <uuanton> hmm i not sure which one i should run to test it better
[23:25:22] <Boomtime> ok, if you can't be consistent with your test, then you don't get consistent answers
[23:26:01] <uuanton> ok one sec i will find a better query
[23:26:24] <Boomtime> or, you know, any query - empty predicates isn't a query, it's just a disk read
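In other words, something along these lines on both servers (the field name is made up), then compare the plan and timing fields:

```
db.mycollection.find({ type: 'someType' }).explain(true)
// on 2.6, compare "cursor", "nscanned", "nscannedObjects" and "millis" between the two hosts
```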
[23:29:11] <zsoc> Soo... as I have just learned... /apparently/ one cannot use an _id of a document as a reference in another document.. I'm getting "Mod on _id not allowed". But... I know reference fields work, so what am I doing wrong? Do I have to cast the ObjectId .toString or something first or... am I just misunderstanding how references work?
[23:32:48] <zsoc> Basically I'm trying to save the _id of a 'User' document in the user_id field of an "Account" document. This sort of makes sense to me but maybe I'm missing something.
[23:33:39] <Boomtime> that sounds like something you should be able to do.. but "Mod on _id" is not the right result - that sounds like it's trying to change the _id of an existing document
[23:33:47] <zsoc> hmmm well
[23:33:56] <zsoc> I'm using update() with an upsert
[23:34:09] <zsoc> what it's complaining about is a new document tho, not an update
[23:36:37] <zsoc> maybe there's some... var name confusion... there's two objects and one is representing the model and the other is representing the document and the only difference is the string case... but my IDE thinks it's fine (it's js ES6 so it should be...) but let me try... not doing that heh
[23:42:35] <zsoc> Nope. Same thing. Hmmmm. Ok well, I'm assuming the problem is mongoose related.. because that's every problem.
[23:51:11] <zsoc> OH
[23:51:30] <zsoc> The application i'm receiving the document from has a literal _id already set
[23:51:43] <zsoc> isn't that interesting. what a bad idea
[23:52:07] <zsoc> Is there any way to massage or suggest an _id? heh... i guess I'll set it to api_id or something and then delete the _id before trying the upsert
[23:53:44] <zsoc> oh... I can use {ObjectId} and convert the existing string to an ObjectId? How quaint.
[23:59:20] <zsoc> Object passed in must be a string of 12 bytes or 24 hex characters. It's 24 non-hex characters. This API is literally just anti-mongodb lol... alrightly then
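Since the incoming id is not a valid ObjectId string, one option is exactly what zsoc describes above: move it aside, let MongoDB generate its own _id, and upsert on the external key. A sketch with hypothetical names, shown with the plain driver rather than mongoose:

```
var doc = getDocFromApi();        // hypothetical: document arriving with its own _id set
var externalId = doc._id;
delete doc._id;                   // let MongoDB assign an ObjectId itself
doc.api_id = externalId;          // keep the external id in its own field

db.collection('accounts').updateOne(
  { api_id: externalId },         // stable key from the external system
  { $set: doc },
  { upsert: true },
  function (err) { if (err) throw err; }
);
```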