[03:00:00] <joannac> playing ping-pong is rarely a productive use of time
[03:00:51] <b0o_> forgive me, i've been idling in #openshift for 72 hours trying to get a question answered. perhaps i presumed too much.
[03:02:30] <b0o_> when executing a query and returning a document, i've been given to understand that mongo loads an entire document into memory before returning the requested fields. does it also load all sub-document collections?
[03:02:41] <cheeser> actually putting the question out there would let anyone who shows up answer it, perhaps while you're afk, and then you'd have an answer when you came back
[03:02:58] <joannac> I don't know what "sub document collections" means
[03:04:01] <b0o_> how would i properly refer to a group of subdocuments?
[03:07:01] <b0o_> i'm pretty new to this way of thinking about data. i was fortunate enough to be able to attend the world conference, and my head's still swimming in the complexities.
[03:09:58] <b0o_> folks from mongodb raised an eyebrow at my data model, and i've been wondering if i'm doing it wrong or how i might do it more efficiently. the direction i take at this point would depend on the answer to my question.
[03:15:25] <b0o_> sorry.. semantics again i think. i'm parsing a flat file that's 210MB which contains a nested data structure that extends to a couple dozen levels deep in some cases, so no single document is more than a few K.
[03:17:10] <b0o_> now that you mention it, i've heard the 16MB cap mentioned a number of times. that would tend to say that it's not loading the entire subdocument structure into memory then, yeah?
[09:15:09] <bluework> Any help with the mongodb c driver? It's triggering a stream error (Failed to buffer 4 bytes within 300000 milliseconds.) when I'm trying to bulk insert
[09:15:58] <bluework> I'm inserting locally, so I can't really pinpoint where it's triggering this error.
[09:16:02] <bmcgee> Hey guys. I have a collection of objects that have a createdAt timestamp. I want to create an analytics summary in which I aggregate for last 10 mins, last 1 hour and last 24 hours. Is it possible to do this in one shot, as opposed to 3 separate calls?
[09:16:58] <bluework> Returned error is Error: No healthy connections.
[09:25:51] <bmcgee> nevermind, reading through the hierarchical aggregation pattern in the docs
[09:34:19] <richthegeek_> bmcgee: you can do similar stuff with the aggregation framework which is much faster
[09:35:08] <richthegeek_> bmcgee: i did this yesterday in fact, nearly all of the operators work as expected with the exception of the arithmetical operators, for which there is no obvious translation
[09:35:42] <bmcgee> richthegeek_: hmm, how did you structure it to do the multiple roll ups?
[09:36:00] <richthegeek_> bmcgee: cron jobs and spit
[09:36:49] <richthegeek_> bmcgee: it's essentially the same pipeline in all cases but with the group fields operating on different fields between the initial summary and the subsequent rollups
[09:37:29] <richthegeek_> bmcgee: in my case, the results of each gets stored as a document so I do an $unwind on the results in each rollup, not sure if that's relevant to you
[09:37:53] <richthegeek_> a single document per window, i mean
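A minimal mongo-shell sketch of one such rollup window; the collection name ("events") and the value field ("value") are assumptions, only "createdAt" comes from the question above:

    // one window of the rollup; run the same pipeline from cron with
    // 10-minute, 1-hour, and 24-hour windowMs values for the three summaries
    var now = new Date();
    var windowMs = 10 * 60 * 1000;
    var start = new Date(now.getTime() - windowMs);
    db.events.aggregate([
        { $match: { createdAt: { $gte: start, $lt: now } } },
        { $group: { _id: null, count: { $sum: 1 }, avgValue: { $avg: "$value" } } }
    ]);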
[09:40:00] <dragonbe> anyone here know of a good mongodb SaaS supplier? A customer wants to shift to the cloud and implement MongoDB, but I couldn't find any suppliers through normal searches.
[09:40:40] <richthegeek_> dragonbe: mongohq, mongolabs are both well-known but I find them far too expensive personally
[09:41:21] <dragonbe> expensive is relative to the service they offer
[09:41:46] <dragonbe> are these recommended service providers or just a couple you’ve heard of?
[09:42:10] <richthegeek_> they're ones that a lot of people use and have had good experiences with from what i've heard
[09:42:54] <richthegeek_> we've avoided them because setting up a cluster on linode gives us 96GB of SSD backed storage for $120/month... far better value although obviously more effort
[09:44:23] <dragonbe> I understand, well thanks richthegeek_
[09:44:38] <dragonbe> Let me check their pricing models and advise my customers on this
[09:46:04] <richthegeek_> take a look at what MMS are doing with automation as well: https://mms.mongodb.com/learn-more/automation - it's not available yet (or at least, i don't have access) but might give you the balance of flexibility, ease-of-use, and cost that you need
[09:49:28] <joannac> luca3m: replica sets are for redundancy, not read scaling. http://askasya.com/post/canreplicashelpscaling
[11:21:35] <KushS> How do I clear a collection in mongoose?
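One common mongoose-era approach, sketched here with a hypothetical model name ("Item"); remove() with an empty filter empties the collection, drop() removes it outright:

    // node.js sketch; "Item" is a hypothetical mongoose model
    Item.remove({}, function (err) {
        if (err) return console.error(err);
        console.log('collection emptied');
    });
    // or drop the underlying collection entirely:
    // Item.collection.drop(function (err) { /* handle err */ });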
[12:24:39] <jmccree> Is there any way to get the mongo cli to output strict json?
[13:07:35] <svector> Anyone using time series data?
[13:07:47] <svector> I want suggestions on my case...
[13:08:20] <svector> We collect market price for many commodities over many markets...
[14:08:47] <_boot> i have an old piece of software which tails oplog.rs in the local database, however in 2.6 I can no longer create users in the local database, so I can't give this program access to the oplog - is there a way to work around this?
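One hedged workaround for 2.6: create the user in the admin database instead, with a custom role granting find on local.oplog.rs; the role, user, and password here are made up:

    // mongo shell, run against the admin database
    use admin
    db.createRole({
        role: "oplogReader",
        privileges: [
            { resource: { db: "local", collection: "oplog.rs" }, actions: ["find"] }
        ],
        roles: []
    })
    db.createUser({ user: "tailer", pwd: "secret", roles: ["oplogReader"] })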
[14:12:57] <bcave> has anyone successfully used mongooplog?
[14:13:40] <jmccree> bcave, printjson outputs mongo's extended json with shell type wrappers, not plain json.
[14:14:13] <bcave> In interactive mode, mongo prints the results of operations including the content of all cursors. In scripts, either use the JavaScript print() function or the mongo specific printjson() function which returns formatted JSON.
[14:15:01] <jmccree> bcave, yeah, but it still doesn't output valid regular json.
[14:15:11] <jmccree> ie: that you can parse with any old json parser.
[14:15:16] <bcave> maybe i'm misunderstanding what you want
[14:15:21] <bcave> i print json all the time with that
[14:15:34] <bcave> and use it in non-mongo js applications
[14:15:52] <jmccree> the timestamp objects break json parsers.
[14:17:57] <jmccree> specifically I wanted to use the rs.status() output, but the generated json included: "lastHeartbeat" : ISODate("2014-07-01T14:15:18Z"),
[14:18:35] <jmccree> and the ISODate triggered "invalid char". I ended up just pulling in the data via a mongo driver.
[14:18:43] <jmccree> couldn't find any way to do it cli only.
[14:27:54] <bcave> yeah, just had a look at it and didn't see anything overly apparent
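For the record, one shell-only workaround that usually works for this case, since ISODate values are plain Date objects and JSON.stringify serializes them as ISO strings (other shell types such as Timestamp or NumberLong may still need post-processing):

    // inside the mongo shell, bypass printjson entirely
    print(JSON.stringify(rs.status(), null, 2));
    // or non-interactively:
    //   mongo --quiet --eval 'print(JSON.stringify(rs.status()))'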
[15:16:52] <b0o-> hi. was here last night trying to settle a wager: when querying a document, are all subdocuments also pulled into memory?
[15:45:05] <umquant> does anyone here have first hand experience with logging real time sensor data from multiple sensors?
[15:53:20] <umquant> stefandxm, The plan is to have a daily document that we preallocate with data outside sensor range. We get values back every minute
[15:53:34] <b0o-> interesting. this conversation deals with the subject of another question i had regarding the storage of server metric data.. may i see your modeling approach umquant?
[15:53:20] <umquant> we log data from 50 sensors. So this schema has hours 0-23, each hour has minutes 0-59, and each minute has values from sensors 0-49
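A sketch of what one preallocated daily document might look like; the collection and field names and the -999 sentinel are assumptions, not taken from umquant's actual schema:

    // preallocate a day per unit: hours 0-23 -> minutes 0-59 -> sensors 0-49
    var doc = { unit: "unit-01", day: ISODate("2014-07-01T00:00:00Z"), hours: {} };
    for (var h = 0; h < 24; h++) {
        doc.hours[h] = {};
        for (var m = 0; m < 60; m++) {
            doc.hours[h][m] = {};
            for (var s = 0; s < 50; s++) {
                doc.hours[h][m][s] = -999; // value outside sensor range
            }
        }
    }
    db.sensor_days.insert(doc);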
[16:00:57] <stefandxm> so i don't see why mongo shouldn't be able to handle it
[16:01:12] <umquant> Do you think it would be bad to have a document per sensor per minute? So every unit has 50 sensors, which means 1,440 documents per sensor per day (72,000 per unit)
[16:12:42] <b0o-> my data structure is coming from a flat file that's 210MB, which is obviously too large for a single document, but contains many, many nested subdocuments
[16:13:08] <umquant> b0o-, like stefandxm asked regarding my case, how will the data need to be accessed?
[16:13:19] <umquant> How does it need to be grouped, etc
[16:13:55] <b0o-> ultimately, i would have hundreds of these, and since the application is populating a UI i would be pulling back tiny subsets of data from across many or all of the documents at a time
[16:14:56] <umquant> b0o-, In my instance, for generalized views I pulled from special non-granular documents
[16:15:20] <umquant> so the minute data is only needed when viewing one "unit" during a specific time frame
[16:15:45] <b0o-> each document is the entire configuration mbean tree of a weblogic domain, including all servers, each with its own sprawling tree of configuration items.
[16:16:27] <b0o-> so mine aren't data that change much over time, whereas for metrics i'm using different collections.
[16:17:40] <b0o-> i've got a half dozen of these documents in a collection now, and it's very fast.
[16:17:49] <b0o-> i just don't want to paint myself into a corner.
[16:23:15] <b0o-> umquant: to map my question to your data model, when accessing a minute within a particular hour, are all of the minutes for that hour being loaded into memory?
[16:23:48] <b0o-> and when accessing an hour, are all of the sensor readings for all of the minutes in that hour being loaded?
[16:24:25] <umquant> I am unsure of the answer to that specific question. I do know, however, that when I searched for certain hour/minute ranges the response time was significantly less than finding the entire daily document
[16:24:50] <b0o-> yeah, that question doesn't really fit your use case...
[16:25:27] <b0o-> out of curiosity, what tools do you recommend for introspecting the query engine runtime?
[16:30:17] <umquant> b0o-, As I said earlier, I am very new to mongo. I have been using the response time of my db client as a rough query time estimate
[17:32:39] <bcave> read up on the "does mongodb require a lot of ram" section
[17:35:18] <dman777_alter> hmmm...what exactly is a memory mapped file? I know of OS level syncing where data resides in memory and then is synced to disk.
[17:38:15] <dman777_alter> ah...ok. So then....it does cache its files. Does this mean if memory is available it will cache all of its collections and documents?
[17:38:43] <stefandxm> the operating system is in charge of what will be in ram and what will be in disk
[17:40:10] <dman777_alter> stefandxm: oh...ok. If that is the case...'MongoDB automatically uses all free memory on the machine as its cache.'...that is the OS deciding this...not Mongo, correct? Because Linux uses all free memory for I/O caching anyways
[17:43:19] <dman777_alter> so it's safe to say that, given mongodb's behavior, the operating system will give mongo whatever memory is left over after linux's own I/O buffering
[17:43:33] <stefandxm> its not safe to say anything
[17:43:56] <dman777_alter> ya, agreed...but logically it has to be that
[17:43:59] <stefandxm> but mongodb likes operating systems that give it well-oiled memory mapped files, i.e. in ram
[17:44:19] <stefandxm> you could say that if mongodb doesn't get plenty of ram it will not be an awesome database
[17:44:34] <dman777_alter> are linux systems geared to mongodb's liking?
[18:21:41] <WormDrink> aerospike's community barely exists - nobody in irc, 10 questions on stackoverflow
[18:21:43] <cheeser> couple that with filtered replica sets and it'd be interesting to use for testing against "live" data
[18:42:55] <umquant> I have a document per sensor that stores its values every minute, broken up into minutes within each hour and hours within the day https://gist.github.com/anonymous/d45bde18b79f4947aad3
[18:43:15] <umquant> Could anyone assist me with the find query needed to get hour and/or minute ranges?
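Assuming a day/hour/minute layout like the sketch earlier (the gist's real field names may differ), projection with dot notation pulls back just the hours or minutes wanted:

    // whole hour 13 for one unit/day
    db.sensor_days.find(
        { unit: "unit-01", day: ISODate("2014-07-01T00:00:00Z") },
        { "hours.13": 1 }
    );
    // minutes 15-20 of hour 13, built as a projection object
    var proj = {};
    for (var m = 15; m <= 20; m++) proj["hours.13." + m] = 1;
    db.sensor_days.find({ unit: "unit-01", day: ISODate("2014-07-01T00:00:00Z") }, proj);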
[18:55:27] <WormDrink> can an AP system be considered ACID?
[19:33:05] <WormDrink> Can an AP system (CAP theorem related, a system that provides availability in case of partitioning) be ACID (specifically durable)?
[19:51:49] <tscanausa> WormDrink: If implemented correctly, yes: you need a majority of nodes to write the value to disk, and then on read a majority that agree on the answer.
[19:53:24] <WormDrink> tscanausa, aerospike still allows writes to a minority
[19:53:39] <WormDrink> in case of partitioning - then they try to sync up later
[19:55:21] <tscanausa> Most systems trying to solve the CAP problem make trade-offs. In the case of aerospike, they are trading away the consistency needed for ACID in exchange for partition tolerance.
[20:13:01] <betty> I have a question about roles that I am having trouble finding the answer to. What role is required for mapreduce operations?
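As far as the 2.6 built-in roles go, read on the database should cover map-reduce with inline output, and readWrite when the results are written out to a collection; a sketch with made-up names:

    // mongo shell; "mydb", "mrUser" and the password are made up
    use mydb
    db.createUser({ user: "mrUser", pwd: "secret", roles: [ { role: "readWrite", db: "mydb" } ] })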