PMXBOT Log file Viewer

#mongodb logs for Tuesday the 1st of July, 2014

[00:26:09] <jigax> how can i update a collection by searching another and if found add a field from the other collection
[00:28:13] <cheeser> two different queries/updates
[00:32:55] <jigax> i have searched online and can't really find any reference
[00:33:18] <jigax> do you mind pointing me to some examples
[00:37:41] <joannac> var x = db.coll1.findOne({a:1}, {newfield:1})
[00:38:13] <joannac> db.coll2.update({_id: 1234}, {$set: {mynewfield: x.newfield}})
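
A minimal sketch of joannac's two-step approach above, with a guard for the case where the lookup matches nothing (the collection and field names come from the example, not from a real schema):

    // look up the source document, projecting only the field we need
    var x = db.coll1.findOne({a: 1}, {newfield: 1});
    // copy the field across only if the lookup actually matched
    if (x && x.newfield !== undefined) {
        db.coll2.update({_id: 1234}, {$set: {mynewfield: x.newfield}});
    }
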
[00:41:40] <jigax> thanks joannac
[02:56:28] <b0o_> qq inre: mongo's query engine.. anyone wanna take a stab at it?
[02:58:38] <joannac> b0o_: um, why don't you just ask your question?
[02:59:37] <b0o_> i figured that more polite as an intro than "anyone awake?", and it's more succinct than the question itself.
[02:59:39] <b0o_> anyway
[03:00:00] <joannac> playing ping-pong is rarely a productive use of time
[03:00:51] <b0o_> forgive me, i've been idling in #openshift for 72 hours trying to get a question answered. perhaps i presumed too much.
[03:02:30] <b0o_> when executing a query and returning a document, i've been given to understand that mongo loads an entire document into memory before returning the requested fields. does it also load all sub-document collections?
[03:02:41] <cheeser> actually putting the question out there would let anyone who shows up answer it, perhaps while you're afk, and then you'd have an answer when you came back
[03:02:58] <joannac> I don't know what "sub document collections" means
[03:04:01] <b0o_> how would i properly refer to a group of subdocuments?
[03:04:32] <joannac> an array?
[03:05:39] <joannac> in any case, I believe so
[03:05:51] <joannac> actually...
[03:07:01] <b0o_> i'm pretty new to this way of thinking about data. i was fortunate enough to be able to attend the world conference, and my head's still swimming in the complexities.
[03:09:58] <b0o_> folks from mongodb raised an eyebrow at my data model, and i've been wondering if i'm doing it wrong or how i might do it more efficiently. the direction i take at this point would depend on the answer to my question.
[03:10:22] <joannac> do you have a ticket open?
[03:10:30] <joannac> oh, folks at world
[03:10:42] <b0o_> yes.
[03:11:03] <b0o_> yes = the employees at the conference
[03:11:37] <b0o_> i'm storing the entire mbean tree for an oracle weblogic domain. a single document is at least 200M.
[03:11:54] <cheeser> not in mongo you're not.
[03:12:00] <joannac> not possible. mongodb documents are 16mb
[03:12:03] <joannac> maximum
[03:15:25] <b0o_> sorry.. semantics again i think. i'm parsing a flat file that's 210MB which contains a nested data structure that extends to a couple dozen levels deep in some cases, so no single document is more than a few K.
[03:16:40] <joannac> okay
[03:17:10] <b0o_> now that you mention it, i've heard the 16MB cap mentioned a number of times. that would tend to say that it's not loading the entire subdocument structure into memory then, yeah?
[03:22:16] <joannac> no?
[03:22:27] <joannac> one single document cannot be more than 16MB
[03:23:04] <joannac> I'm not sure how that relates to what you said
[05:57:27] <d0x> Hi, is there a tool like flywaydb.org for Schema migrations for a MongoDB/Java environment?
[05:59:48] <d0x> Ah, i just found one. https://github.com/secondmarket/mongeez/wiki/How-to-use-mongeez
[08:29:22] <salty-horse> is there a way to set a database to be read-only, in the hope of speeding up read operations?
[08:39:34] <Nodex> salty-horse : I don't think so but you can shard / replicate and read from a slave
[08:41:00] <salty-horse> Nodex, don't think I want to go through all that effort. thanks for the reply
[08:45:21] <Nodex> :)
[09:07:03] <luca3m> salty-horse: to speed up reads you can use a replica set
[09:14:43] <bluework> Hi here.
[09:15:09] <bluework> Any help with the mongodb c driver ? It's triggering a stream error (Failed to buffer 4 bytes within 300000 milliseconds.) when I'm trying to bulk insert
[09:15:58] <bluework> I'm inserting locally, so I can't really pinpoint where it's triggering this error.
[09:16:02] <bmcgee> Hey guys. I have a collection of objects that have a createdAt timestamp. I want to create an analytics summary in which I aggregate for last 10 mins, last 1 hour and last 24 hours. Is it possible to do this in one shot, as opposed to 3 separate calls?
[09:16:58] <bluework> Returned error is Error: No healthy connections.
[09:25:51] <bmcgee> nevermind, reading through the hierarchical aggregation pattern in the docs
[09:34:19] <richthegeek_> bmcgee: you can do similar stuff with the aggregation framework which is much faster
[09:35:08] <richthegeek_> bmcgee: i did this yesterday in fact, nearly all of the operators work as expected with the exception of the arithmetical operators for which there is no obvious translation
[09:35:42] <bmcgee> richthegeek_: hmm, how did you structure it to do the multiple roll ups?
[09:36:00] <richthegeek_> bmcgee: cron jobs and spit
[09:36:49] <richthegeek_> bmcgee: it's essentially the same pipeline in all cases but with the group fields operating on different fields between the initial summary and the subsequent rollups
[09:37:29] <richthegeek_> bmcgee: in my case, the results of each gets stored as a document so I do an $unwind on the results in each rollup, not sure if that's relevant to you
[09:37:53] <richthegeek_> a single document per window, i mean
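
A sketch of the windowed rollup richthegeek_ describes, assuming hypothetical `events` and `summaries` collections and a `createdAt` date field; the same pipeline would be rerun from cron with 1-hour and 24-hour windows:

    // summarize the last 10 minutes
    var since = new Date(Date.now() - 10 * 60 * 1000);
    var res = db.events.aggregate([
        {$match: {createdAt: {$gte: since}}},
        {$group: {_id: null, count: {$sum: 1}}}
    ]);
    // store one summary document per window, ready for $unwind in coarser rollups
    db.summaries.insert({window: "10min", at: new Date(), results: res.toArray()});
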
[09:38:08] <bmcgee> hmmm
[09:40:00] <dragonbe> does anyone here know of a good mongodb SaaS supplier? A customer wants to shift to the cloud and implement MongoDB, but I couldn't find any suppliers through normal searches.
[09:40:40] <richthegeek_> dragonbe: mongohq, mongolabs are both well-known but I find them far too expensive personally
[09:41:21] <dragonbe> expensive is relative to the service they offer
[09:41:46] <dragonbe> are these recommended service providers or just a couple you’ve heard of?
[09:42:10] <richthegeek_> they're ones that a lot of people use and have had good experiences with from what i've heard
[09:42:54] <richthegeek_> we've avoided them because setting up a cluster on linode gives us 96GB of SSD backed storage for $120/month... far better value although obviously more effort
[09:44:23] <dragonbe> I understand, well thanks richthegeek_
[09:44:38] <dragonbe> Let me check their pricing models and advise my customers on this
[09:46:04] <richthegeek_> take a look at what MMS are doing with automation as well: https://mms.mongodb.com/learn-more/automation - it's not available yet (or at least, i dont have access) but might give you the balance of flexibility, ease-of-use, and cost that you need
[09:49:28] <joannac> luca3m: replica sets are for redundancy, not read scaling. http://askasya.com/post/canreplicashelpscaling
[11:21:35] <KushS> How do I clear a collection in mongoose?
[12:24:39] <jmccree> Is there any way to get the mongo cli to output strict json?
[12:25:40] <joannac> don't think so
[13:07:35] <svector> Anyone using time series data?
[13:07:47] <svector> I want suggestions on my case...
[13:08:20] <svector> We collect market price for many commodities over many markets...
[14:08:47] <_boot> i have an old piece of software which tails oplog.rs in the local database, however in 2.6 I can no longer create users in the local database, so I can't give this program access to the oplog - is there a way to work around this?
[14:10:02] <saml> let me check
[14:10:17] <tscanausa> That is a very interesting question. ( I dont know the answer but i am curious. )
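
One workaround that should work on 2.6 (a sketch, not an answer given in the log; the user and role names are made up): define the user in the admin database with a custom role that can read local.oplog.rs, instead of creating a user in local itself:

    use admin
    db.createRole({
        role: "oplogReader",
        privileges: [{resource: {db: "local", collection: "oplog.rs"}, actions: ["find"]}],
        roles: []
    });
    db.createUser({user: "tailer", pwd: "secret", roles: ["oplogReader"]});
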
[14:10:25] <bcave> jmccree: printjson
[14:11:52] <bcave> jmccree: http://docs.mongodb.org/manual/tutorial/write-scripts-for-the-mongo-shell/
[14:12:57] <bcave> has anyone successfully used mongooplog?
[14:13:40] <jmccree> bcave, printjson outputs bson instead of json.
[14:14:13] <bcave> In interactive mode, mongo prints the results of operations including the content of all cursors. In scripts, either use the JavaScript print() function or the mongo specific printjson() function which returns formatted JSON.
[14:15:01] <jmccree> bcave, yeah, but it still doesn't output valid regular json.
[14:15:11] <jmccree> ie: that you can parse with any old json parser.
[14:15:16] <bcave> maybe i'm misunderstanding what you want
[14:15:21] <bcave> i print json all the time with that
[14:15:34] <bcave> and use it in non-mongo js applications
[14:15:52] <jmccree> the timestamp objects break json parsers.
[14:17:20] <bcave> oh, the types?
[14:17:23] <jmccree> yep
[14:17:31] <bcave> for longs and ints
[14:17:42] <bcave> ah, not sure about that
[14:17:57] <jmccree> specifically I was wanting to use rs.status() output, but json generated included: "lastHeartbeat" : ISODate("2014-07-01T14:15:18Z"),
[14:18:35] <jmccree> and the ISODate triggered "invalid char". I ended up just pulling in the data via a mongo driver.
[14:18:43] <jmccree> couldn't find any way to do it cli only.
[14:27:54] <bcave> yeah, just had a look at it and didn't see anything overly apparent
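
One possibility they don't land on (an untested sketch): let the shell's own JSON.stringify do the serialization, since it emits plain JSON rather than the ISODate()/NumberLong() shell extensions that printjson produces:

    mongo --quiet --eval 'print(JSON.stringify(rs.status()))'
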
[15:16:52] <b0o-> hi. was here last night trying to settle a wager: when querying a document, are all subdocuments also pulled into memory?
[15:45:05] <umquant> does anyone here have first hand experience with logging real time sensor data from multiple sensors?
[15:45:37] <stefandxm> yup
[15:46:43] <umquant> stefandxm, What interval were you logging at and how many sensors were you getting data back from?
[15:47:53] <stefandxm> we logged to the database at second-level resolution
[15:48:06] <stefandxm> how many samples there were depended on the sensor type
[15:48:59] <umquant> Ok, would you mind giving me your opinion on my prototype schema? I will give you the relevant background information as well
[15:49:28] <tscanausa> MongoWorld had a presentation on that topic
[15:49:43] <stefandxm> tscanausa, with quite a naive approach imo
[15:50:42] <tscanausa> sorry. I dont mean to start a flame war
[15:51:05] <stefandxm> me neither :)
[15:53:20] <umquant> stefandxm, The plan is to have a daily document that we preallocate with data outside sensor range. We get values back every minute
[15:53:34] <b0o-> interesting. this conversation deals with the subject of another question i had regarding the storage of server metric data.. may i see your modeling approach umquant?
[15:54:22] <umquant> we log data from 50 sensors. So this schema has hours 0-23, each hour has minutes 0-59, and each minute has values from sensors 0-49
[15:54:52] <umquant> https://gist.github.com/anonymous/0870cc72df3c2445fab9
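
The gist is anonymous, so here is a rough sketch (not the linked gist; the collection name and _id scheme are invented) of the preallocated daily document umquant describes: 24 hours of 60 minutes of 50 sensor slots, filled with a sentinel value outside the sensors' range:

    // build one day's document for one unit, preallocated with placeholders
    var doc = {_id: "unit1:2014-07-01", hours: {}};
    for (var h = 0; h < 24; h++) {
        doc.hours[h] = {};
        for (var m = 0; m < 60; m++) {
            doc.hours[h][m] = {};
            for (var s = 0; s < 50; s++) {
                doc.hours[h][m][s] = -9999;  // sentinel: outside sensor range
            }
        }
    }
    db.sensorData.insert(doc);
    // each minute, in-place $set updates overwrite the placeholders:
    db.sensorData.update({_id: "unit1:2014-07-01"}, {$set: {"hours.15.53.7": 42.1}});
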
[15:55:16] <stefandxm> well. whats the resolution ?
[15:55:34] <stefandxm> i guess the main question is how you wish to read/aggregate the data once it has been saved
[15:55:49] <stefandxm> just storing sensor values can be done in so many ways
[15:55:53] <umquant> Right
[15:56:02] <b0o-> also, are you planning to typically access the data based on time, or by sensor?
[15:56:19] <umquant> Accessing it by time more than likely
[15:56:29] <umquant> since each document is for one "unit" that has 50 sensors
[15:56:54] <umquant> stefandxm, More than likely we will want to see what sensor values were over some date/time range
[15:57:25] <umquant> we were thinking about having daily, weekly, monthly, and yearly documents that are updated at the same time
[15:58:02] <stefandxm> that can be nice
[15:58:39] <umquant> what do you think of the daily schema I linked
[15:59:15] <stefandxm> too tired to think ;)
[15:59:49] <stefandxm> but its dangerous to rely on arrays in documents imo
[15:59:58] <umquant> yea I am running into that now
[16:00:00] <stefandxm> id rather use more documents myself
[16:00:02] <umquant> it is making searching hard
[16:00:10] <umquant> can you explain how your schema worked?
[16:00:13] <stefandxm> and if you add data you might hit the cap
[16:00:17] <stefandxm> we used sql
[16:00:22] <stefandxm> and we bombed away
[16:00:57] <stefandxm> so i dont see why mongo shouldnt be able to handle it
[16:01:12] <umquant> Do you think it would be bad to have a document per sensor per minute? So every unit has 50 sensors which means 1440 documents per sensor per day
[16:01:24] <stefandxm> not at all
[16:04:24] <umquant> That just sounds like a crazy amount of data
[16:04:34] <umquant> I will try it out
[16:04:55] <stefandxm> data is nice
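
A sketch of the alternative stefandxm leans toward, one small document per sensor per minute (the `readings` collection and field names are hypothetical); a compound index keeps the date/time-range queries discussed above cheap:

    // one document per unit/sensor/minute
    db.readings.insert({
        unit: "unit1",
        sensor: 7,
        ts: ISODate("2014-07-01T15:53:00Z"),
        value: 42.1
    });
    db.readings.ensureIndex({unit: 1, sensor: 1, ts: 1});
    // a range query is then a plain index walk:
    db.readings.find({
        unit: "unit1",
        sensor: 7,
        ts: {$gte: ISODate("2014-07-01T00:00:00Z"), $lt: ISODate("2014-07-02T00:00:00Z")}
    });
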
[16:07:22] <umquant> stefandxm, may I ask what type of sensor data you were collecting? Just because I am interested
[16:09:03] <b0o-> stefandxm: also, would you be able to weigh in on my question above?
[16:09:21] <stefandxm> generic sensor data from factories to be able to do realtime alarms/oee
[16:09:29] <b0o-> well, perhaps you'd have an idea as well umquant..
[16:09:45] <umquant> stefandxm, Awesome. My application is very similar.
[16:09:49] <b0o-> when querying a document, are all subdocuments also pulled into memory?
[16:09:56] <tscanausa> yes
[16:09:58] <umquant> yes
[16:10:09] <cheeser> the entire document is loaded
[16:10:29] <umquant> is this also the case when you specify only certain portions of the document?
[16:10:51] <cheeser> write a query and see, but i don't think so
[16:11:24] <umquant> I would say it isn't, due to the huge time difference required by the query
[16:11:35] <tscanausa> I think it pulls the whole document but only sends portions back, unless they are in the index
[16:11:52] <cheeser> very easy to test though.
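
As cheeser says, it's easy to poke at from the shell; a rough sketch, assuming some collection `metrics` whose documents carry a bulky `hours` subtree. Note this mostly measures what is sent back over the wire rather than what mongod loads, which is tscanausa's caveat:

    var t0 = new Date();
    db.metrics.find().toArray();                  // full documents
    print("full docs: " + (new Date() - t0) + " ms");
    t0 = new Date();
    db.metrics.find({}, {hours: 0}).toArray();    // project the big subtree away
    print("projected: " + (new Date() - t0) + " ms");
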
[16:11:57] <b0o-> cheeser: you were on last night when i came through asking about this.
[16:12:40] <cheeser> i was
[16:12:42] <b0o-> my data structure is coming from a flat file that's 210MB, which is obviously too large for a single document, but contains many, many nested subdocuments
[16:13:08] <umquant> b0o-, like stefandxm asked regarding my case, how will the data need to be accessed?
[16:13:19] <umquant> How does it need to be grouped, etc
[16:13:55] <b0o-> ultimately, i would have hundreds of these, and since the application is populating a UI i would be pulling back tiny subsets of data from across many or all of the documents at a time
[16:14:56] <umquant> b0o-, In my instance for generalized views I pulled from special non granular documents
[16:15:20] <umquant> so the minute data is only needed when viewing one "unit" during a specific time frame
[16:15:45] <b0o-> each document is the entire configuration mbean tree of a weblogic domain, including all servers, each with its own sprawling tree of configuration items.
[16:16:27] <b0o-> so mine aren't data that change much over time. whereas for metrics i'm using different collections.
[16:17:40] <b0o-> i've got a half dozen of these documents in a collection now, and it's very fast.
[16:17:43] <umquant> ahh
[16:17:49] <b0o-> i just don't want to paint myself into a corner.
[16:23:15] <b0o-> umquant: to map my question to your data model, when accessing a minute within a particular hour, are all of the minutes for that hour being loaded into memory?
[16:23:48] <b0o-> and when accessing an hour, are all of the sensor readings for all of the minutes in that hour being loaded?
[16:24:25] <umquant> I am unsure of the answer to that specific question. I do know, however, that when I searched for certain hour/minute ranges the response time was significantly less than finding the entire daily document
[16:24:50] <b0o-> yeah, that question doesn't really fit your use case...
[16:25:27] <b0o-> out of curiosity, what tools do you recommend for introspecting the query engine runtime?
[16:30:17] <umquant> b0o-, As I said earlier, I am very new to mongo. I have been using the response time of my db client as a rough query time estimate
[17:00:09] <bcave> hi
[17:04:10] <tscanausa> bcave: Hi
[17:12:19] <bcave> how are you tscanausa
[17:13:24] <tscanausa> Depends on the topic. In general good. MongoDb node is making me pull some hair
[17:13:43] <bcave> hahaha, you too
[17:13:52] <bcave> what aspect of it?
[17:14:40] <tscanausa> I have a laundry list
[17:15:09] <bcave> haha. i have a short list. day 3 of rapidly losing hair over it
[17:15:43] <tscanausa> top of the list is some options you pass in the client it completely ignores even though the client "supports it"
[17:15:52] <bcave> what client?
[17:15:57] <tscanausa> node
[17:16:08] <bcave> mongoose?
[17:16:13] <tscanausa> nope native
[17:16:44] <bcave> what options?
[17:17:27] <tscanausa> connection and socket timeout
[17:20:09] <bcave> don't see those options in the docs
[17:20:20] <bcave> there's a wtimeout, but that's for write concerns, not socket
[17:20:57] <bcave> assuming its on MongoClient or Db ?
[17:24:30] <tscanausa> http://mongodb.github.io/node-mongodb-native/api-generated/server.html#Server
[17:24:38] <tscanausa> it is in the SocketOptions section
[17:28:38] <bcave> ah, server wasn't in the list. is that for administration?
[17:29:14] <bcave> oh there, i see
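
For reference, the Server page tscanausa linked hangs those timeouts off socketOptions; a sketch for the 1.x native driver (host, port, database name, and values are placeholders):

    var mongodb = require('mongodb');
    var server = new mongodb.Server('localhost', 27017, {
        socketOptions: {connectTimeoutMS: 5000, socketTimeoutMS: 5000}
    });
    new mongodb.Db('test', server, {w: 1}).open(function (err, db) {
        if (err) throw err;
        // ... use db, then:
        db.close();
    });
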
[17:29:21] <dman777_alter> is it true that mongodb caches everything to ram? as in all its collections' documents?
[17:30:02] <bcave> pastebin your code tscanausa
[17:30:08] <bcave> your options
[17:30:36] <tscanausa> dman777_alter: depends
[17:31:06] <stefandxm> dman777_alter, its not true
[17:31:20] <dman777_alter> ah...ok. I read that on stack overflow
[17:31:29] <stefandxm> dman777_alter, but its good if its possible
[17:32:12] <bcave> dman777_alter: http://docs.mongodb.org/manual/faq/fundamentals/
[17:32:39] <bcave> read up the "does mongodb require a lot of ram" section
[17:35:18] <dman777_alter> hmmm...what exactly is a memory mapped file? I know of OS level syncing where data resides in memory and then is synced to disk.
[17:37:20] <stefandxm> google it
[17:38:15] <dman777_alter> ah...ok. So then....it does cache its files. Does this mean if memory is available it will cache all of its collections and documents?
[17:38:43] <stefandxm> the operating system is in charge of what will be in ram and what will be in disk
[17:38:50] <stefandxm> mongodb doesnt know
[17:40:10] <dman777_alter> stefandxm: oh...ok. If that is the case...'MongoDB automatically uses all free memory on the machine as its cache.'...that is the OS deciding this, not Mongo, correct? Because Linux uses all free memory for I/O caching anyways
[17:40:38] <stefandxm> yes
[17:43:19] <dman777_alter> so it's safe to say that, given the behavior of mongodb, after linux uses memory to buffer disk I/O, the operating system will give whatever is left over to Mongo
[17:43:33] <stefandxm> its not safe to say anything
[17:43:56] <dman777_alter> ya, agreed...but logically it has to be that
[17:43:59] <stefandxm> but mongodb likes operating systems that give it well-oiled memory-mapped files, ie in ram
[17:44:19] <stefandxm> you could say that if mongodb doesnt get plenty of ram it will not be an awesome database
[17:44:34] <dman777_alter> are linux systems geared to mongodb's liking?
[17:44:39] <dman777_alter> for memory management?
[17:46:17] <dman777_alter> curious...if I had a collection that took about 50 gigs...how much memory would mongodb need to be an awesome database?
[17:48:18] <stefandxm> it would depend on how its accessed
[17:49:14] <stefandxm> since that would matter in a) how much needs to be in ram and b) how good any other database is ;)
[17:49:58] <dman777_alter> stefandxm: my question would be how much needs to be in ram if my collections/documents were 50gb?
[17:50:14] <stefandxm> your question is not well formed
[17:50:35] <dman777_alter> well, from an end point user :)
[17:50:44] <stefandxm> doesnt matter
[17:50:55] <stefandxm> unless you specify how its going to be accessed it wont matter
[17:52:56] <dman777_alter> local access from a local webserver
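
For reference, the shell can show how much of the mapped data is actually resident in RAM (a sketch; these serverStatus fields are from the MMAPv1 era being discussed):

    var mem = db.serverStatus().mem;
    // resident = MB currently in RAM; mapped = total MB of memory-mapped data files
    printjson({residentMB: mem.resident, virtualMB: mem.virtual, mappedMB: mem.mapped});
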
[18:06:10] <Black0range> Hey guys just had my first look at mongoDB and nosql. as i've understood it nosql is pretty much just one big hashtable?
[18:06:44] <cheeser> "nosql" is a meaningless marketing term.
[18:07:16] <Black0range> That explains why i've never understood what the beep it means
[18:08:04] <stefandxm> "not only sql"
[18:08:28] <stefandxm> and no. "one big hashtable" would be Big Table
[18:08:29] <cheeser> and that's an even dumber, newer marketing term. :)
[18:08:48] <stefandxm> cheeser, sorry?
[18:10:12] <cheeser> commenting on that term "not only sql." it's as silly as "no sql"
[18:10:28] <stefandxm> its the same
[18:10:33] <stefandxm> since ever
[18:10:43] <stefandxm> and its not silly. its not a database *type*
[18:10:52] <stefandxm> a database type would be a graph database, document database or similair
[18:10:57] <cheeser> it is silly
[18:11:02] <stefandxm> no its not silly
[18:11:17] <cheeser> i think it is
[18:11:19] <stefandxm> it means; "think about what you want to do, do not lock yourself into a generic approach"
[18:11:33] <stefandxm> ie, dont limit yourself to sql
[18:11:46] <cheeser> i know what it means.
[18:11:54] <stefandxm> so whats silly
[18:12:20] <cheeser> it's reactionary rebranding from "no sql" which was dumb to begin with
[18:12:24] <cheeser> it means about as much
[18:12:33] <stefandxm> what is dumb and silly?
[18:12:42] <cheeser> they're words
[18:12:48] <stefandxm> ..
[18:13:03] <Black0range> Gentlemen, Gentlemen please, I say please contain yourselves!
[18:13:12] <cheeser> i'm contained
[18:13:39] <stefandxm> Black0range, please step back, irc fight in progress
[18:13:57] <Black0range> stefandxm: yessir
[18:19:32] <WormDrink> so
[18:19:36] <WormDrink> we all moving to aerospike now ;)
[18:19:39] <WormDrink> lololol
[18:20:00] <WormDrink> is there a meaningful comparison of aerospike vs mongodb anywhere ?
[18:20:05] <cheeser> mongodb supports in memory databases now. ;)
[18:20:18] <WormDrink> cheeser, I know - it actually was possible before even with hax
[18:20:29] <WormDrink> cheeser, but this use case is pretty small
[18:21:05] <Black0range> Wait!
[18:21:06] <cheeser> i dunno. could be interesting for secondaries to be used as reporting targets
[18:21:09] <cheeser> e.g.
[18:21:41] <WormDrink> aerospike's community barely exists - nobody in irc, 10 questions on stackoverflow
[18:21:43] <cheeser> couple that with filtered replica sets and it'd be interesting to use for testing against "live" data
[18:42:55] <umquant> I have a document per sensor that shows its values every minute, breaking the day into hours and each hour into minutes https://gist.github.com/anonymous/d45bde18b79f4947aad3
[18:43:15] <umquant> Could anyone assist me with the find query needed to get hour and/or minute ranges?
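
Assuming a nested hours/minutes layout like the one sketched earlier (the field names are guesses at the gist's structure), dot notation in the projection pulls back a slice of the day without the rest of the document:

    // one hour (all of its minutes):
    db.sensorData.find({_id: "unit1:2014-07-01"}, {"hours.15": 1});
    // a single minute within that hour:
    db.sensorData.find({_id: "unit1:2014-07-01"}, {"hours.15.53": 1});
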
[18:55:27] <WormDrink> can an AP system be considered ACID ?
[18:55:40] <WormDrink> sorry
[18:55:53] <WormDrink> no or yes - that is the question
[18:57:22] <hahuang61> Sanjeev: hi
[18:57:43] <Sanjeev> hi howard
[19:33:05] <WormDrink> Can an AP System (CAP theorem related, a system that provides availability in case of partitioning) be ACID (specifically durable) ?
[19:51:49] <tscanausa> WormDrink: If implemented correctly, yes: a majority of nodes must write the value to disk, and then on read a majority must agree on the answer.
[19:53:24] <WormDrink> tscanausa, aerospike allows writes to minority still
[19:53:39] <WormDrink> in case of partitioning - then they try sync up later
[19:53:40] <WormDrink> I think
[19:55:21] <tscanausa> Most systems trying to solve the CAP problem make trade-offs. In the case of aerospike, they are trading away the consistency needed for ACID in favor of partition tolerance.
[20:13:01] <betty> I have a question about roles that I am having trouble finding the answer to. What role is required for mapreduce operations?
[20:29:09] <betty> exit
[22:37:05] <geekdm> Does anyone have experience with Text Search?
[22:38:58] <geekdm> I am seeing very different results when I search for filename (works) versus filename.ext (returns a lot more than the previous search)
[22:39:52] <geekdm> the period seems to cause the different results
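
This is consistent with the text index tokenizer treating punctuation such as "." as a delimiter: "filename.ext" becomes two tokens, either of which can match. Quoting the term as a phrase forces them to appear together (a sketch; the `files` collection and `name` field are hypothetical):

    db.files.ensureIndex({name: "text"});
    // phrase search: escaped quotes make $text require the exact string
    db.files.find({$text: {$search: "\"filename.ext\""}});
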