PMXBOT Log file Viewer


#mongodb logs for Sunday the 9th of August, 2015

[00:21:54] <Axy> How can I add field to every item in my collection
[00:21:58] <Axy> values will be different for all
[00:22:32] <Axy> I tried to do toArray and forEach there, then insert(modifiedDocs, ...) but I'm getting buffer errors
[00:22:43] <StephenLynx> bulkWrite
[00:22:44] <Axy> and it makes sense because I'm modifying the whole collection in memory and trying to push in
[00:23:50] <Axy> StephenLynx, hmm are they executed in order?
[00:24:24] <StephenLynx> by default, yes.
[00:24:30] <StephenLynx> at least on the node driver
[00:24:38] <Axy> wait I'll share a very small codebit
[00:24:53] <StephenLynx> why does that matter?
[00:25:24] <Axy> https://gist.github.com/anonymous/8f9408e9e9c10e4e61e1 StephenLynx
[00:25:29] <Axy> because I believe it explains the issue
[00:25:53] <Axy> I need to add "index" values to all my existing documents; I already have a method to add an index to new documents, but I needed to add this to the old ones
[00:26:06] <Axy> so this was my approach
[00:26:24] <Axy> (make a copy of the whole collection with indexes)
[00:27:13] <StephenLynx> wtf
[00:27:25] <StephenLynx> you can just set the index.
[00:28:05] <StephenLynx> ah, wait
[00:28:28] <StephenLynx> by index you mean you are adding a sequential field, is that it?
[00:28:34] <Axy> yes StephenLynx
[00:28:50] <Axy> I'm struggling with this issue for a week now
[00:28:53] <StephenLynx> you don't need to copy and create a new collection either.
[00:29:00] <Axy> it's just for safety
[00:29:05] <StephenLynx> just use a bulk write
[00:29:09] <StephenLynx> and set the values
[00:29:09] <Axy> in case I mess up
[00:29:23] <Axy> Okay - but then the issue is still, I need the old values
[00:29:32] <StephenLynx> what old values?
[00:29:37] <Axy> and right now I'm getting a memory issue when I find all
[00:29:58] <Axy> because it tries to keep everything in an array, I think my collection is too big to do so
[00:30:14] <Axy> How can I go over each item in order
[00:30:30] <StephenLynx> find, sort, each
[00:30:56] <Axy> each is a method?!
[00:30:57] <Axy> wtf
[00:31:07] <StephenLynx> let me check
[00:32:24] <StephenLynx> each exists, but I suggest you use next
[00:32:39] <StephenLynx> use next to create the operations on the bulk array
[00:33:02] <StephenLynx> when there are no more documents, write the operations
[00:33:06] <Axy> but next returns one document only
[00:33:16] <StephenLynx> yes
[00:33:23] <StephenLynx> you call it until there are no more documents
[00:33:35] <Axy> oh I see
[00:33:46] <Axy> since everything's async I don't know what's doable and what's not any more
[00:34:03] <Axy> can I just do .next and bulkwrite in a for loop, would they work in the exact order
[00:34:07] <Axy> (items added I mean)
[00:34:10] <StephenLynx> no
[00:34:27] <StephenLynx> you will have to use asynchronous recursion.
[00:34:49] <Axy> oh god
[00:34:55] <Axy> things get even more complicated every second
[00:35:02] <Axy> I'm becoming an async recursion myself
[00:35:03] <StephenLynx> is not complicated
[00:35:06] <StephenLynx> just different :v
[00:35:28] <Axy> StephenLynx, if you can guide me on how to approach this, I'd be extremely happy
[00:35:42] <StephenLynx> on function A, you get the cursor, right?
[00:35:47] <Axy> yes
[00:35:56] <StephenLynx> then function A calls function B and passes the cursor
[00:36:15] <Axy> okay
[00:36:18] <Axy> so they're nested
[00:36:22] <StephenLynx> no
[00:36:44] <StephenLynx> function B calls next and checks if anything arrived. if it did, it adds the operation to the operations array and calls function B again, passing the cursor and the operations array
[00:36:56] <Axy> T___________T
[00:37:14] <StephenLynx> if nothing arrives on B's next, it calls function C passing the operations it assembled
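A minimal sketch of the A/B/C recursion described above, assuming the Node.js driver of the time (cursor.next() plus bulkWrite()); the collection name posts and the sequential field name index are illustrative, not taken from the gist:

    // "function A": get the cursor and hand it to the consumer
    function addSequentialField(db, done) {
      var cursor = db.collection('posts').find({}).sort({ _id: 1 });
      consumeCursor(cursor, [], 0, db, done);
    }

    // "function B": pull one document per call, queue an update, recurse
    function consumeCursor(cursor, ops, counter, db, done) {
      cursor.next(function (err, doc) {
        if (err) { return done(err); }

        if (!doc) {                      // nothing arrived: cursor is exhausted
          return writeOps(db, ops, done);
        }

        ops.push({
          updateOne: {
            filter: { _id: doc._id },
            update: { $set: { index: counter } }
          }
        });

        consumeCursor(cursor, ops, counter + 1, db, done);
      });
    }

    // "function C": write the assembled operations in one ordered bulkWrite
    function writeOps(db, ops, done) {
      if (!ops.length) { return done(); }
      db.collection('posts').bulkWrite(ops, { ordered: true }, done);
    }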
[00:37:48] <Axy> okay I will try
[00:37:52] <Axy> I understand the concept
[00:38:15] <StephenLynx> if you keep using node/io long enough it will become natural to you.
[00:38:31] <Axy> If I could hold the whole collection in memory via toArray, could I just do, like, a for loop?
[00:38:35] <Axy> because I already have the items
[00:38:38] <StephenLynx> once you become more concerned with scope rather than with sequence of written code
[00:38:44] <StephenLynx> you could.
[00:38:55] <StephenLynx> use a toArray, iterate it and assemble the operations
[00:39:15] <StephenLynx> try using projection
[00:39:17] <Axy> ok, but unfortunately the whole toArray does not fit into the $5 droplet I own
[00:39:28] <Axy> projection?
[00:39:31] <StephenLynx> to get the least amount of data from the documents
[00:39:35] <Axy> Hm
[00:39:40] <StephenLynx> what information you need from these documents?
[00:39:54] <Axy> I need an incremental index on the documents
[00:40:01] <Axy> to be able to call them with an order when I need
[00:40:05] <Axy> like "give me the 18th document"
[00:40:07] <StephenLynx> ah, you are copying the whole thing
[00:40:13] <StephenLynx> so you will need the whole document.
[00:40:14] <Axy> so I need documents as they are, plus, an order
[00:40:17] <Axy> Yes
[00:40:23] <StephenLynx> if you were just updating them
[00:40:33] <StephenLynx> you could make it return just the bare minimum information on them
[00:40:36] <Axy> I mean I thought calling the documents with an index number would be possible anyway
[00:40:38] <StephenLynx> probably just the _id would work
[00:40:39] <Axy> like, naturally
[00:40:59] <Axy> Hm I can update them
[00:44:23] <Axy> hm just because I need an id to call the document and $set a field? StephenLynx ?
[00:44:34] <Axy> and I can do it via bulk
[00:44:50] <Axy> Then I'll just do that -- I'll mongodump everything and go this route
[00:44:56] <StephenLynx> yes, just project the _id
[00:45:01] <StephenLynx> and it will probably fit in RAM
[00:45:08] <Axy> yes it does, tried before
[00:45:18] <Axy> I already keep a portion of everything in ram
[00:45:23] <Axy> for speed
[00:46:20] <StephenLynx> since mongo has its own memcache
[00:46:24] <StephenLynx> that provides little gain
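For the update-in-place route suggested above (project only the _id, then $set the sequential field), a rough shell-syntax sketch; collection and field names are illustrative:

    // fetch only the _id of every document, in a stable order
    var ids = db.posts.find({}, { _id: 1 }).sort({ _id: 1 }).toArray();

    // queue one $set per document and write them in a single ordered bulk
    var bulk = db.posts.initializeOrderedBulkOp();
    for (var i = 0; i < ids.length; i++) {
      bulk.find({ _id: ids[i]._id }).updateOne({ $set: { index: i } });
    }
    bulk.execute();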
[00:46:49] <Axy> I can't believe I can't pull an already indexed field with an order/index number though
[00:47:07] <Axy> I mean okay the object id is unique and useful but I don't get why otherwise is not possible
[00:47:41] <StephenLynx> why do you need this number if you can use the array index?
[00:48:12] <Axy> I want to get rid of arrays and handle things with queries
[00:48:24] <Axy> when someone visits domain.com/post/1 I want them to see first post
[00:48:33] <Axy> and domain.com/post/1000 is the 1000th one
[00:48:56] <Axy> I tried using skip but it became really really bad
[00:48:58] <StephenLynx> so why don't you just skip the right amount in your collection using the sort method you have?
[00:49:10] <Axy> skip is extremely slow
[00:49:12] <StephenLynx> how bad?
[00:49:17] <Axy> it takes a few seconds to reach high numbers
[00:49:34] <Axy> to be exact, it takes around 7 seconds to reach post 40000
[00:49:43] <Axy> that bad
[00:50:48] <Axy> I mean I'm open to ideas
[00:51:26] <StephenLynx> yeah, using the incremented field is an option
[00:51:34] <StephenLynx> as some sort of pre-aggregation
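Roughly, the difference the incremented field makes, in shell syntax (the field name postNumber is illustrative):

    // deep skip: the server still walks and discards the first 999 documents
    db.posts.find().sort({ _id: 1 }).skip(999).limit(1);

    // pre-assigned sequential field with an index on it: a single indexed lookup
    db.posts.createIndex({ postNumber: 1 });
    db.posts.findOne({ postNumber: 1000 });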
[00:52:05] <Axy> but I'm just unable to imagine a solution, for an (example) university database where there are only 3 classes, maths, physics, and history --- and all students (separate documents) have scores for all three fields
[00:52:17] <Axy> what if I want to get the 10th student in history scores
[00:52:33] <Axy> I was imagining this to work like, indexing all fields, and somehow getting the n'th item, easily
[00:52:38] <Axy> depending on how I sort
[00:52:52] <StephenLynx> you could also save the HTML for that document
[00:53:09] <StephenLynx> so you use its filename on gridfs
[00:53:36] <Axy> Hm, not sure if I understand
[00:53:46] <StephenLynx> see lynxhub.com
[00:54:19] <Axy> so this is a sample document: {name:"lazykid",scores:{history:12,maths:21,physics:67}}
[00:54:23] <Axy> checking
[00:54:46] <StephenLynx> http://lynxhub.com/lynxchan/preview/41.html
[00:54:58] <StephenLynx> I don't look for the document id 41 on board lynxchan
[00:55:12] <StephenLynx> I save an html page for it and output it
[00:55:36] <Axy> Oh but it's a lot of processing
[00:55:41] <Axy> for this other example for instance
[00:55:47] <StephenLynx> is not
[00:55:52] <Axy> for the "classes" example, because scores are dynamic
[00:55:55] <StephenLynx> is even less, actually
[00:55:57] <Axy> and orders change
[00:56:13] <StephenLynx> because instead of generating the page on every request, you generate it once
[00:56:26] <Axy> right, but if the number is static
[00:56:35] <Axy> I mean, if 41 returns the same thing every time
[00:56:54] <Axy> but if it's an order of success, that changes, then it matters
[00:57:10] <Axy> I mean if 41 is the "41st highest-scoring student in the maths class"
[00:57:10] <StephenLynx> i understand your case now
[00:57:20] <StephenLynx> you use the path as a parameter
[00:57:26] <StephenLynx> instead of using a regular get parameter
[00:57:30] <Axy> so yeah, we already have indexes, we can sort things by a specific order, but we can't reach them by index number
[00:57:51] <StephenLynx> this is what I would do in your case:
[00:58:08] <StephenLynx> I would have pages listing several elements per page.
[00:58:20] <StephenLynx> each element would have a link using its _id
[00:58:46] <StephenLynx> now the page that opens the element doesn't need to use its index, because the index is already being used on the page that links to the element.
[00:58:49] <StephenLynx> understand?
[00:59:04] <Axy> Hm not sure
[00:59:08] <Axy> wait let me process
[00:59:29] <Axy> can you give an example
[00:59:33] <Axy> I know I want too much
[00:59:39] <Axy> just to understand, if it's easy
[01:00:07] <StephenLynx> it is easy, see lynxhub.com/logs.js
[01:00:40] <StephenLynx> you see the pages list objects in order
[01:01:01] <StephenLynx> imagine if each entry had a page of its own and there were links to it
[01:01:19] <Axy> hm
[01:01:24] <StephenLynx> http://lynxhub.com/boards.js even better
[01:01:27] <Axy> so in my example, "history"
[01:01:38] <StephenLynx> the boards are listed using certain parameters in order
[01:01:54] <StephenLynx> and each board has a link of its own that uses not the order, but a unique field
[01:02:07] <StephenLynx> so when I open a board, I don't need the order of the board
[01:02:27] <Axy> oh I understand, but it works when you need a visual order
[01:02:36] <Axy> it does not work when you want to view history/589
[01:02:47] <Axy> (589th most successful student in history class)
[01:02:52] <StephenLynx> you can put that number on the page listing the elements.
[01:03:16] <Axy> but the page has to be generated every time
[01:03:17] <Axy> still
[01:03:31] <Axy> I mean I'm asking for too much maybe, but what if scores changed every few seconds
[01:03:38] <StephenLynx> you would have to skip, however much less often
[01:03:50] <StephenLynx> then the user must refresh the page listing the stuff
[01:04:01] <Axy> maybe not
[01:04:05] <Axy> maybe they just want to visit url
[01:04:26] <StephenLynx> sometimes people want too much
[01:04:33] <Axy> haha
[01:04:39] <StephenLynx> your job is not to please them, but to give them what they need.
[01:04:43] <StephenLynx> not what they want.
[01:05:02] <Axy> I generally create unsolvable problems for myself and try to get out of them
[01:05:07] <Axy> mongodb is the first db I'm learning
[01:05:09] <StephenLynx> a system aimed to please above anything else is pretty much doomed, in my personal experience.
[01:05:14] <Axy> so I'm not familiar with even db concepts that much
[01:05:37] <Axy> but the thing is, I really don't get why I'm not able to get something that's already indexed via index number
[01:05:46] <Axy> my issue with mongodb atm is as simple as that
[01:06:29] <StephenLynx> because that's how mongo was designed.
[01:06:44] <StephenLynx> and the developers focus on solving the needs of their target demographic.
[01:07:03] <StephenLynx> instead of pleasing people outside this target demographic
[01:07:37] <Axy> is there any nosql db that can do this
[01:07:45] <StephenLynx> dunno
[01:07:59] <StephenLynx> tbh, I have a hunch your system does not fit mongo.
[01:08:18] <StephenLynx> college management system?
[01:08:34] <Axy> maybe, I just use it because I can easily plug huge request results in
[01:08:38] <Axy> as a bson
[01:08:43] <Axy> and search through them
[01:08:50] <Axy> and all of my documents have different structures
[01:08:53] <Axy> well, most
[01:09:09] <StephenLynx> is it a management system of sorts?
[01:09:14] <Axy> If I wanted to create the same thing via fields it would take incredible amounts of time, not sure if it's all even possible
[01:09:27] <Axy> No, it's just an imageboard thing where people can paste links to websites and I save them there
[01:09:28] <StephenLynx> give me an overview of it
[01:09:39] <StephenLynx> ah
[01:09:58] <Axy> so tumblr's api result is different than 500px's, for instance
[01:10:06] <Axy> but I still save the whole api result there as a document
[01:10:19] <Axy> "just in case"
[01:10:32] <Axy> and all of the added images have a number... for instance the 5000th image
[01:10:35] <StephenLynx> if it is an imageboard of sorts, I think I would still use saved HTML
[01:10:37] <Axy> it's the 5000th image that's added
[01:10:55] <Axy> that way, without having access to the board index or anything like that, people can just try different URLs and hunt for cool stuff
[01:11:16] <StephenLynx> I just create urls using this sequence
[01:11:28] <StephenLynx> and each board has a counter that I use $inc on to get the next one
[01:11:45] <StephenLynx> files use it, threads, posts
[01:11:57] <Axy> maybe it's the right way to do this, I just want to do this dynamically. For two reasons -- one is that I really love dynamic stuff and two is that this is not really a commercial project or something, this is just me learning nodejs and mongodb, that's all
[01:12:00] <StephenLynx> http://lynxhub.com/lynxchan/media/t_28.jpg
[01:12:05] <StephenLynx> see the 28?
[01:12:11] <Axy> yes!
[01:12:26] <StephenLynx> every file has a number in the name
[01:12:34] <StephenLynx> http://lynxhub.com/lynxchan/media/1.png
[01:12:48] <StephenLynx> but the file will always be there
[01:12:54] <StephenLynx> and dynamic is hard to do with heavy loads.
[01:12:59] <Axy> Yes I understand, here is what I built for the counter thing https://gist.github.com/anonymous/6d6119f5c45f82e64767
[01:13:13] <Axy> "counters" is a collection
[01:13:21] <Axy> with just... counter objects
[01:13:28] <Axy> keeps the count of how much stuff is in which collection
[01:13:47] <StephenLynx> I use each board documents to count.
[01:13:57] <Axy> I just increment it - simply just use this function when I'm inserting new docs
[01:13:58] <StephenLynx> because every file and post is bound to its board.
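The usual shape of such a counter, sketched in shell syntax (names are illustrative and not taken from the linked gist): a counters collection keyed by name, bumped atomically with $inc at insert time.

    // atomically increment and return the next number for a given counter
    function getNextSequence(name) {
      var ret = db.counters.findAndModify({
        query: { _id: name },
        update: { $inc: { seq: 1 } },
        new: true,
        upsert: true
      });
      return ret.seq;
    }

    db.images.insert({ url: 'http://example.com/pic.jpg', index: getNextSequence('images') });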
[01:14:07] <Axy> Hm
[01:31:52] <FailBit> is it safe to run createIndex() while the set is being used
[01:40:36] <cheeser> where set == replica set?
[01:41:26] <FailBit> no replica here
[01:43:23] <FailBit> confession: I have no idea how to actually run mongodb, I'm not responsible for that
[01:43:29] <FailBit> I'd just like to get this field indexed
[01:43:52] <StephenLynx> I run ensureIndex every time my software starts.
[01:49:37] <FailBit> I know I'm going to sound dumb here but if I run this
[01:49:39] <FailBit> db.tag_changes.createIndex({tag_id: 1})
[01:49:49] <FailBit> on production, what will happen on requests during index generation?
[01:51:32] <FailBit> what if it were this? db.tag_changes.createIndex({tag_id: 1}, {background: true})
[01:54:46] <FailBit> ?
[01:55:52] <cheeser> creating an index is a blocking operation but it will yield to other operations periodically so you don't lock the entire time.
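In shell terms, the two variants being discussed; background: true trades a slower build for availability, and a running build can be spotted via db.currentOp():

    db.tag_changes.createIndex({ tag_id: 1 });                        // foreground: blocks, yielding periodically
    db.tag_changes.createIndex({ tag_id: 1 }, { background: true });  // background: slower, but reads/writes continue

    db.currentOp(true);   // in-progress index builds show up here with a progress message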
[01:57:41] <FailBit> 10 rpm on tag_changes endpoints
[01:57:52] <FailBit> will this slow them down considerably?
[02:00:48] <FailBit> oh, if I actually look at things tag_changes is about 500 finds per minute
[02:01:35] <FailBit> with 3500511 objects in the collection
[02:02:58] <FailBit> I guess it would be better to have scheduled downtime to add this index?
[02:03:43] <StephenLynx> yes
[02:03:49] <cheeser> i don't imagine it'd take long to index that
[02:03:59] <cheeser> just schedule it in the middle of the night/weekend.
[02:04:15] <FailBit> weekend is when we get most of our traffic :p
[02:04:33] <FailBit> lightest traffic is tuesday morning
[02:05:30] <cheeser> there you go, then
[02:06:43] <FailBit> I'm going to pull down the latest dump of tag_changes and see how long it takes to index locally...
[02:07:23] <cheeser> good call.
[02:07:45] <cheeser> though keep in mind hardware differences and activity on the machine will vary the results
[02:07:58] <FailBit> yeah, production server is a good deal beastier than my box
[02:12:27] <FailBit> mongorestore into local db took about 2min
[02:13:27] <FailBit> gah
[02:13:36] <FailBit> exception: Btree::insert: key too large to index
[02:14:48] <FailBit> I'd like to smack whoever thought it was a good idea to use the ID field to store tag names.
[02:15:29] <StephenLynx> oh lawdy
[02:16:05] <FailBit> tag_changes.$tag_id_1 3339 { : \"but-its-worth-trying-10-name-anonymous-2010-10-06-09-27-try-opening-image-in-new-tab-11-name-fuck-2010-10-19-08-50-shits-still-not-working-12-name-ver...\" }
[02:16:08] <FailBit> yeah no
[02:17:02] <FailBit> what's the mongod switch that lets you turn off failure on key too large to index
[02:18:39] <FailBit> --setParameter failIndexKeyTooLong=false
[02:21:29] <FailBit> it took about 10 seconds on my local box
[02:21:32] <FailBit> that seems too fast
[02:22:02] <cheeser> run getIndexes() and see
[02:22:32] <FailBit> "v" : 1, "key" : {"tag_id" : 1}
[02:23:22] <FailBit> so I guess it worked?
[02:23:24] <FailBit> huh
[02:23:37] <FailBit> that seemed way too fast, especially for 900MB of tag_changes
[02:26:13] <FailBit> indexing the images_count field takes about 30s
[02:29:46] <FailBit> cheeser: about how long were you expecting it to take?
[02:29:56] <FailBit> 30s still seems too good to be true
[02:30:42] <cheeser> i didn't have an expectation
[02:35:11] <FailBit> I ran it on production, it took 10 seconds
[02:35:13] <FailBit> gg
[02:38:05] <cheeser> nice
[02:41:50] <akp> is it possible to use limit_execute with 2 backends? ie something like limit_execute data="db outbound carrier limit redis tn-limit tn limit2" >
[02:42:16] <cheeser> what?
[02:44:21] <akp> is it possible to use limit_execute like this - <action application="limit_execute" data="db realm resource1 limit1 redis realm resource2 limit2 " />
[02:48:54] <cheeser> that looks like a redis question
[02:49:29] <akp> ok, then change it to <action application="limit_execute" data="db realm resource1 limit1 db realm resource2 limit2 " />
[02:49:33] <akp> or hash for both of them
[02:49:53] <akp> but i think i probably want to use limit
[02:50:56] <cheeser> you're not going to make any more sense by removing redis fom your question.
[02:52:13] <akp> oh crap
[02:52:14] <akp> sorry
[02:52:21] <akp> wrong room.
[02:52:38] <cheeser> :)
[02:53:22] <akp> =[]
[06:46:38] <steamio> hi all, I have a quick question about replica sets - The scenario is that I have a large single instance and I'm creating a replica set. Other than restarting the main instance for replication support, will there be any downtime while the secondary is syncing?
[07:34:33] <joannac> steamio: no downtime, but you'll see some load on the primary
[07:36:59] <steamio> great, thanks
[07:37:43] <steamio> i always get weak at the knees before starting
[08:05:59] <hemmi> Hey there. If I have a schema where posts are embedded in users, how do I grab all posts for all users?
[08:07:05] <joannac> db.users.find({}, {posts:1})
[08:19:09] <hemmi> joannac: is that assuming i'm using indexes?
[10:03:43] <avner> Would here be a good place for Qs related to mongo driver behavior in a replicaset?
[10:17:22] <__marco> hello
[12:19:40] <moksa> hi
[12:20:08] <moksa> inside an object i have an element that is a list (python)
[12:20:30] <moksa> is it possible to search for all objects that contain a specific value in that list element?
[12:20:58] <moksa> like obj['element'] = ['car', 'bicicle']
[12:21:15] <moksa> and i want all objects that contain bicicle in 'element'
[12:22:22] <moksa> I was trying: .find({'element':{'$in': ['bicicle']}}))
[12:22:32] <moksa> but that is returning nothing..
[12:23:44] <joannac> moksa: pastebin one of your documents
[12:33:49] <moksa> joannac: found the issue
[12:33:59] <moksa> the problem was in the element itself
[12:34:04] <moksa> the query was ok
[12:34:06] <moksa> thanks anyway
[13:52:43] <lmatteis> hi guys. so i have a collection with over 11 thousand documents, each representing a log item that recorded the status of my HTTP server
[13:53:00] <lmatteis> now i'd like to present these results to a user as a webpage, but of course i can't send all 11k records to the user
[13:53:30] <lmatteis> what could be a nice query which would show, say, the average for every month?
[13:55:26] <lmatteis> the problem is that my date is stored as an epoch number :(
[13:55:47] <StephenLynx> pre-aggregation
[13:56:25] <StephenLynx> pre-aggregate these in a separate collection so it would reduce the amount of records drastically
[13:56:35] <StephenLynx> and a skip wouldn't be so slow.
[13:56:39] <StephenLynx> or even better
[13:56:49] <StephenLynx> present pages in intervals of periods
[13:56:57] <StephenLynx> like "fetch me all entries from this date to that date"
[13:57:23] <lmatteis> StephenLynx: right, but how do i perform the aggregate query in the first place?
[13:57:34] <lmatteis> like how do i group by month when i store an epoch number?
[13:58:02] <StephenLynx> group{_id:'$myPivotingField'}
[13:58:13] <StephenLynx> however
[13:58:17] <StephenLynx> nevermind, that wouldn't work.
[13:58:30] <StephenLynx> ok
[13:58:52] <StephenLynx> instantiate a date object, set it to the beginning of the month you wish
[13:58:58] <StephenLynx> then another one to the end of the month you wish
[13:59:06] <StephenLynx> and query all entries between these dates
[13:59:24] <StephenLynx> that is the way 10gen tells you to do paging
[13:59:30] <StephenLynx> because skips are slow for large collections
[13:59:34] <StephenLynx> too slow*
[14:00:20] <StephenLynx> you can then crush this data into a single document
[14:00:38] <StephenLynx> and query the collection holding the crushed data so you handle less documents
[14:19:11] <lmatteis> StephenLynx: not sure i understand
[14:19:23] <lmatteis> simply looking for the group by aggregation query
[14:19:37] <lmatteis> i probably have to somehow extract the date from the epoch?
[14:23:36] <lmatteis> is there a way to aggregate only on specific types of documents
[14:23:41] <lmatteis> and not all documents of the collection
[14:24:23] <StephenLynx> what do you mean epoch:
[14:24:24] <StephenLynx> ?
[14:24:25] <l403> hello, I've noticed this rather large (16M) file with executable permissions, /var/lib/mongodb/local.ns - I was wondering what it is
[14:24:30] <StephenLynx> you store the amount of ms?
[14:24:37] <lmatteis> yes
[14:24:40] <StephenLynx> oh
[14:24:42] <StephenLynx> hm
[14:24:51] <StephenLynx> you can query by it too.
[14:24:55] <StephenLynx> its just a number
[14:25:00] <StephenLynx> you store as a number, right?
[14:25:08] <lmatteis> yes
[14:25:20] <StephenLynx> so just extract the epoch of these dates
[14:25:20] <lmatteis> i can just do new Date(number) to get the Date type
[14:25:24] <lmatteis> ok
[14:25:33] <StephenLynx> and I suggest you convert these epochs to dates.
[14:25:35] <lmatteis> but can i run the aggregate() only on specific types of documents?
[14:25:45] <StephenLynx> yes, just use $match
[14:25:52] <StephenLynx> so you weed out anything unwanted.
[14:26:39] <lmatteis> StephenLynx: thanks. aggregate queries are read-only right? i don't want to screw things up in this db
[14:27:48] <StephenLynx> by default yes. and even then, you can only write it to a separate collection.
[14:28:01] <StephenLynx> the collection you read cannot be changed on the aggregate, afaik
[14:28:04] <lmatteis> k there's just too much data i guess :(
[14:28:11] <lmatteis> cuz a match is much slower than a find
[14:28:36] <StephenLynx> yeah, so I heard aggregate is slower than a regular find.
[14:28:44] <StephenLynx> that is why I suggested you pre-aggregate this data.
[14:28:53] <cheeser> i think you *can* write back to the same collection, actually, so you need to be really careful because it'll clobber whatever's in that collection
[14:29:32] <lmatteis> shit really
[14:29:48] <StephenLynx> but you have to explicitly write.
[14:29:49] <lmatteis> i'm just running db.coll.aggregate() from the command line mongodb
[14:30:02] <StephenLynx> unless you put an $out
[14:30:06] <StephenLynx> it wont write
[14:30:31] <lmatteis> ok
[14:34:43] <lmatteis> the aggregate is taking too long
[14:34:59] <lmatteis> there are millions of records
[14:35:13] <lmatteis> what should i do?
[14:35:58] <StephenLynx> pre-aggregate
[14:36:18] <StephenLynx> so it doesnt matter if it takes too long on the first time
[14:36:22] <StephenLynx> it will be done only once
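A sketch of that pre-aggregation in shell syntax: $match narrows to the documents of interest, adding the stored epoch milliseconds to new Date(0) turns the number into a Date so $year/$month can group it, and $out writes the crushed result to a separate collection. The collection names (logs, monthlyStats) and field names (type, ts, responseTime) are illustrative.

    db.logs.aggregate([
      { $match: { type: 'http' } },                        // weed out unwanted documents
      { $project: {
          responseTime: 1,
          date: { $add: [ new Date(0), '$ts' ] }           // epoch-ms number -> Date
      } },
      { $group: {
          _id: { year: { $year: '$date' }, month: { $month: '$date' } },
          avgResponseTime: { $avg: '$responseTime' },
          count: { $sum: 1 }
      } },
      { $out: 'monthlyStats' }                             // write the pre-aggregated data
    ]);

    // the webpage then reads the small collection instead of the raw log records
    db.monthlyStats.find().sort({ '_id.year': 1, '_id.month': 1 });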
[14:55:29] <mantovani> I'm using mongoexport
[14:55:55] <mantovani> and when I try to export a field which is inside another field I can't
[14:55:57] <mantovani> like
[14:56:04] <mantovani> mongoexport --db BigData --collection TwitterStreaming --fields="user.screen_name"
[14:56:16] <mantovani> if it was just "user" without the "." it would work
[14:57:57] <MalteJ> hi
[15:00:04] <mantovani> fixed
[15:00:34] <MalteJ> I am new to mongodb and have a problem: I have a document "user". Every user is allowed to have a maximum of, let's say, 5 documents of type "virtualmachine". How can I realize this so I don't have problems with concurrency (e.g. when reading all virtualmachines of a user, counting them, and then adding a new one if the count is <5)?
[15:05:52] <mantovani> MalteJ: you don't want to have concurrency problems? don't use mongodb
[15:06:17] <mantovani> MalteJ: why don't you use postgres?
[15:07:12] <MalteJ> for most of my stuff eventual consistency is ok
[15:07:52] <MalteJ> I am not sure if I should add an RDBMS to the stack just for some edge cases
[15:08:46] <mantovani> why are you using mongodb ?
[15:08:55] <mantovani> "eventual consistency"
[15:08:58] <mantovani> HAHAHAHAHAH
[15:09:10] <mantovani> you are not using mongodb for the correct purpose.
[15:09:41] <MalteJ> could you please elaborate?
[15:09:53] <mantovani> so do you know what ACID is?
[15:10:00] <mantovani> https://en.wikipedia.org/wiki/ACID
[15:10:08] <MalteJ> yes
[15:10:36] <JamesHarrison> MalteJ: you can atomically update a document
[15:10:48] <mantovani> ACID is what make your DB reliable
[15:10:53] <JamesHarrison> what you can't do, which mantovani is alluding to, is transactions
[15:11:23] <JamesHarrison> ACID isn't what makes your DB reliable it's just a convenient way to describe a particular pattern for consistency management, it's not the only way
[15:11:27] <mantovani> if you need transactions, you should used an ACID DB.
[15:11:38] <MalteJ> JamesHarrison: I could use the update method and query for the user and a timestamp of last change, right?
[15:11:43] <mantovani> JamesHarrison: it's the only way that works
[15:12:02] <mantovani> mongodb isn't atomic
[15:12:19] <JamesHarrison> ...
[15:12:31] <mantovani> you will need to implement it in your application, which is a lot more dangerous
[15:12:58] <JamesHarrison> okay
[15:12:59] <JamesHarrison> so
[15:13:04] <JamesHarrison> back in the real world
[15:13:09] <JamesHarrison> http://docs.mongodb.org/manual/core/write-operations-atomicity/ is worth a read
[15:13:38] <mantovani> can be now
[15:13:41] <mantovani> it wasn't atomic
[15:13:48] <JamesHarrison> you can do two-phase commits, yes, or you can use update if current pattern or similar
[15:13:55] <MalteJ> JamesHarrison: can I query for an array length?
[15:14:25] <JamesHarrison> MalteJ: you could do, though I'd probably store a counter
[15:14:50] <JamesHarrison> mantovani is correct that if you're implementing from scratch, this doesn't sound like something mongodb is likely to be hugely suited for
[15:15:04] <MalteJ> JamesHarrison: yeah, thank you! :)
[15:15:04] <JamesHarrison> RDBMses like PostgreSQL are still the right tool for most jobs
[15:15:50] <mantovani> JamesHarrison: as far as I remember, mongodb didn't have this $isolated operator.
[15:16:04] <mantovani> actually I remember it was in the introduction
[15:16:06] <JamesHarrison> mantovani: it's not that useful since it fails in sharded use-cases though
[15:16:34] <mantovani> oh
[15:16:49] <MalteJ> I have to synchronize my data with the vm hypervisor anyway. So an RDBMS with strong consistency does not really help when the DB and the real world are not consistent.
[15:16:51] <mantovani> so this operator can only be used if you have your data on just one instance?
[15:17:25] <MalteJ> mantovani: no, your writes go through the master
[15:17:55] <JamesHarrison> MalteJ: basically what I'd do would be to store an updated_at timestamp or version field on the object, query for that version/timestamp, and then findAndModify using the ID _and_ version to insert the document and update the version/timestamp
[15:17:57] <mantovani> why did he say that?
[15:17:58] <mantovani> 12:14 < JamesHarrison> mantovani: it's not that useful since it fails in sharded use-cases though
[15:18:26] <MalteJ> JamesHarrison: yes, thats what I will do. thanks :)
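A sketch of that pattern with the Node.js driver's findOneAndUpdate (the driver counterpart of the shell's findAndModify): read the user, then make the write conditional on both the _id and the version you read, so a concurrent change makes the update match nothing and you retry. The field names vms and version are illustrative.

    var users = db.collection('users');

    users.findOne({ _id: userId }, function (err, user) {
      if (err) { return callback(err); }
      if ((user.vms || []).length >= 5) {
        return callback(new Error('vm limit reached'));
      }

      users.findOneAndUpdate(
        { _id: userId, version: user.version },          // matches only if nobody changed it since the read
        { $push: { vms: newVm }, $inc: { version: 1 } },
        { returnOriginal: false },
        function (err, result) {
          if (err) { return callback(err); }
          if (!result.value) { return callback(new Error('concurrent update, retry')); }
          callback(null, result.value);
        }
      );
    });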
[15:18:27] <JamesHarrison> mantovani: $isolated does not work on _sharded_ clusters
[15:18:32] <mantovani> oh right
[15:18:35] <mantovani> MalteJ: see it ^^^^^
[15:18:36] <JamesHarrison> replicated/master-slave cluster otoh
[15:18:40] <JamesHarrison> it works fine on
[15:18:46] <JamesHarrison> if you shard it does not
[15:18:53] <mantovani> right
[15:19:01] <JamesHarrison> sharding is not common so it's not an awful approach
[15:19:17] <JamesHarrison> but it's still not a transaction
[15:19:31] <JamesHarrison> and really if you need transactions, you need a transactional RDBMS
[15:19:32] <mantovani> but whatever, you agree with me that for what MalteJ wants, postgres would fit better.
[15:19:50] <JamesHarrison> assuming nothing about the rest of his use-case and deployment approach yes
[15:20:03] <mantovani> yes, we don't know what he has there
[15:20:07] <JamesHarrison> but I'm assuming there's a compelling reason for using mongo over pgsql in his case :)
[15:20:17] <mantovani> :)
[15:20:43] <MalteJ> probably there is not :D
[15:21:33] <MalteJ> what would be a good reason?
[15:22:24] <JamesHarrison> realistically, for most use cases, none
[15:22:33] <JamesHarrison> replication and high availability in theory
[15:22:56] <JamesHarrison> but realistically I've not seen a mongo cluster with better reliability than a well-engineered single postgres instance
[15:23:13] <MalteJ> how do you like the mariadb galera cluster?
[15:23:25] <JamesHarrison> and newer postgresql does replication pretty well now
[15:23:54] <JamesHarrison> never used it, not a fan of mysql/mariadb
[15:24:37] <MalteJ> I have grown up with MySQL. so probably I'd go with this.
[15:24:41] <JamesHarrison> postgresql has lots of options for replication and so on
[15:24:43] <JamesHarrison> however
[15:24:51] <JamesHarrison> in most use cases it's unnecessary
[15:25:18] <JamesHarrison> if you need real HA (and most people don't) then you can do it quite readily these days without any additional software, just core postgresql
[15:25:33] <mantovani> MalteJ: well.... try postgres
[15:25:37] <JamesHarrison> ^
[15:25:51] <JamesHarrison> it's worlds better than mysql, especially for sysadmins, imho ;-)
[15:26:10] <JamesHarrison> one of those lovely bits of software that Just Works, and keeps Just Working
[15:26:30] <mantovani> beyond this, as far as I remember the postgres date type support is far better
[15:26:51] <mantovani> like if you are working with temporal tables
[15:26:55] <MalteJ> hmm, I have hoped I could abandon ORMs in my code :(
[15:27:12] <mantovani> you can set the value to 'infinity' instead of something like '9999-12-31'
[15:27:14] <JamesHarrison> native uuids are nice, and you can do a lot of the json-style data manipulation/querying that mongo does in postgres now
[15:27:33] <mantovani> so when you compare both fields postgres won't even compare because 'infinity' is always higher
[15:27:34] <JamesHarrison> MalteJ: ORMs are your friend!
[15:27:56] <MalteJ> there are no good ORMs for golang...
[15:28:10] <MalteJ> Hibernate is quite nice
[15:28:12] <JamesHarrison> eh
[15:28:45] <JamesHarrison> there's gorp, gorm, beedb, hood...
[15:28:56] <mantovani> MalteJ: postgres supports json
[15:28:58] <mantovani> native json
[15:29:07] <JamesHarrison> https://github.com/jinzhu/gorm
[15:29:07] <mantovani> actually you can have a field of type "json"
[15:29:23] <MalteJ> yeah, I'll try gorm
[15:30:48] <mantovani> http://bit.ly/1KbWR46
[15:30:50] <mantovani> MalteJ: ^
[15:31:32] <MalteJ> lol, there's an SAP HANA client implementation :D
[15:31:40] <dddh> sql > orm
[15:31:54] <mantovani> not necessary
[15:32:00] <JamesHarrison> dddh: there are very few cases where that is the case
[15:32:16] <JamesHarrison> either for performance or developer friendliness
[15:32:20] <JamesHarrison> not to mention DB portability
[15:32:31] <JamesHarrison> and most ORMs will get out of your way and let you poke the DB direct with a stick if you need to
[15:33:33] <dddh> sql professionals usually use stored procedures for specific database
[15:33:58] <dddh> and portability very often means you do not use what you pay for
[15:34:16] <mantovani> WTF
[15:34:41] <JamesHarrison> brave new world - I'd argue stored procedures have very limited use cases, and the overhead of database agnosticism is outweighed heavily by the improvement in developer productivity and reduced lock-in risk
[15:34:55] <JamesHarrison> there are still places where stored procedures/triggers etc make sense, sure
[15:35:05] <mantovani> yes it does
[15:35:26] <mantovani> like many cases it depends if you have/want use the DB machine's resources
[15:35:49] <mantovani> in OLAP we do ELT in many cases for performance
[15:35:51] <JamesHarrison> but for 99% of apps where you just need a reliable datastore, even a performant one, ORMs will be just fine
[15:37:06] <MalteJ> the question is, what is cheaper? coding, testing, debugging for the super-hyper proprietary rdbms functions or just adding more server hardware?
[15:37:29] <JamesHarrison> indeed
[15:37:49] <mantovani> MalteJ: ask google that
[15:37:55] <JamesHarrison> most of the time unless you're dealing with very tight performance requirements or very large amount of data, the latter is usually the best option
[15:38:12] <mantovani> MalteJ: or much more simple, try amazon and you will have your answer.
[15:41:12] <mantovani> MalteJ: actually when you use an rdbms, since it does a lot for you, you will save on code, tests and debugging
[15:41:35] <mantovani> if you need to implement what it does for you manually, you will have more code, tests and debugging
[15:41:47] <mantovani> think about it
[15:42:31] <MalteJ> the worst thing I have tried was cassandra. all this denormalized stuff. you write everything 5 times :D
[15:43:41] <JamesHarrison> cassandra is a really nice and powerful datastore
[15:43:50] <JamesHarrison> it is not to be confused with anything remotely like an RDBMS
[15:44:02] <JamesHarrison> but for the right use cases it can't be beat
[15:44:23] <JamesHarrison> I've used it in graph database use cases with apache titan, works fantastically
[15:45:38] <mantovani> JamesHarrison: hbase should perform well for it too.
[15:45:49] <JamesHarrison> mantovani: yes, that's another option
[15:47:59] <mantovani> I'm using mongodb because it fits perfectly, I take tweets (which are json) and store them there
[15:48:17] <mantovani> actually I don't have to do any work
[15:53:15] <dddh> mantovani: how much data from twitter do you have?
[15:53:58] <mantovani> few just 80M
[15:54:03] <mantovani> 80 tweets
[15:54:12] <mantovani> sorry, 80M tweets*
[15:55:56] <mantovani> dddh: is about 300GB
[16:42:46] <dddh> oh
[18:26:12] <diegoaguilar> Can someone help? http://dba.stackexchange.com/questions/110508