PMXBOT Log file Viewer


#mongodb logs for Sunday the 9th of August, 2015

[00:21:54] <Axy> How can I add field to every item in my collection
[00:21:58] <Axy> values will be different for all
[00:22:32] <Axy> I tried to do toArray and forEach there, then insert(modifiedDocs, ...) but I'm getting buffer errors
[00:22:43] <StephenLynx> bulkWrite
[00:22:44] <Axy> and it makes sense because I'm modifying the whole collection in memory and trying to push in
[00:23:50] <Axy> StephenLynx, hmm are they executed in order?
[00:24:24] <StephenLynx> by default, yes.
[00:24:30] <StephenLynx> at least on the node driver
[00:24:38] <Axy> wait I'll share a very small codebit
[00:24:53] <StephenLynx> why does that matter?
[00:25:24] <Axy> https://gist.github.com/anonymous/8f9408e9e9c10e4e61e1 StephenLynx
[00:25:29] <Axy> because I believe it explains the issue
[00:25:53] <Axy> I need to add "index" values to all my existing documents; I already have a method to add an index to new documents, but I needed to add this to the old ones
[00:26:06] <Axy> so this was my approach
[00:26:24] <Axy> (make a copy of the whole collection with indexes)
[00:27:13] <StephenLynx> wtf
[00:27:25] <StephenLynx> you can just set the index.
[00:28:05] <StephenLynx> ah, wait
[00:28:28] <StephenLynx> by index you mean you are adding a sequential field, is that it?
[00:28:34] <Axy> yes StephenLynx
[00:28:50] <Axy> I'm struggling with this issue for a week now
[00:28:53] <StephenLynx> you don't need to copy and create a new collection either.
[00:29:00] <Axy> it's just for safety
[00:29:05] <StephenLynx> just use a bulk write
[00:29:09] <StephenLynx> and set the values
[00:29:09] <Axy> in case I mess up
[00:29:23] <Axy> Okay - but then the issue is still, I need the old values
[00:29:32] <StephenLynx> what old values?
[00:29:37] <Axy> and right now I'm getting a memory issue when I find all
[00:29:58] <Axy> because it tries to keep everything in an array, I think my collection is too big to do so
[00:30:14] <Axy> How can I go over each item in order
[00:30:30] <StephenLynx> find, sort, each
[00:30:56] <Axy> each is a method?!
[00:30:57] <Axy> wtf
[00:31:07] <StephenLynx> let me check
[00:32:24] <StephenLynx> each exists, but I suggest you use next
[00:32:39] <StephenLynx> use next to create the operations on the bulk array
[00:33:02] <StephenLynx> when there are no more documents, write the operations
[00:33:06] <Axy> but next returns one document only
[00:33:16] <StephenLynx> yes
[00:33:23] <StephenLynx> you call it until there are no more documents
[00:33:35] <Axy> oh I see
[00:33:46] <Axy> since everything's async I don't know what's doable and what's not any more
[00:34:03] <Axy> can I just do .next and bulkwrite in a for loop, would they work in the exact order
[00:34:07] <Axy> (items added I mean)
[00:34:10] <StephenLynx> no
[00:34:27] <StephenLynx> you will have to use asynchronous recursion.
[00:34:49] <Axy> oh god
[00:34:55] <Axy> things get even more complicated every second
[00:35:02] <Axy> I'm becoming an async recursion myself
[00:35:03] <StephenLynx> is not complicated
[00:35:06] <StephenLynx> just different :v
[00:35:28] <Axy> StephenLynx, if you can guide me on how to approach this, I'd be extremely happy
[00:35:42] <StephenLynx> on function A, you get the cursor, right?
[00:35:47] <Axy> yes
[00:35:56] <StephenLynx> then function A calls function B and passes the cursor
[00:36:15] <Axy> okay
[00:36:18] <Axy> so they're nested
[00:36:22] <StephenLynx> no
[00:36:44] <StephenLynx> function B calls next and checks if anything arrived. if it did, it adds the operation to the operations array and calls function B again, passing the cursor and the operations array
[00:36:56] <Axy> T___________T
[00:37:14] <StephenLynx> if nothing arrives on B's next, it calls function C passing the operations it assembled
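A minimal sketch of the A/B/C recursion described above, assuming the Node.js driver of the time (cursor.next() plus bulkWrite()); the collection name posts and the sequential field name index are illustrative, not taken from the gist:

    // "function A": get the cursor and hand it to the consumer
    function addSequentialField(db, done) {
      var cursor = db.collection('posts').find({}).sort({ _id: 1 });
      consumeCursor(cursor, [], 0, db, done);
    }

    // "function B": pull one document per call, queue an update, recurse
    function consumeCursor(cursor, ops, counter, db, done) {
      cursor.next(function (err, doc) {
        if (err) { return done(err); }

        if (!doc) {                      // nothing arrived: cursor is exhausted
          return writeOps(db, ops, done);
        }

        ops.push({
          updateOne: {
            filter: { _id: doc._id },
            update: { $set: { index: counter } }
          }
        });

        consumeCursor(cursor, ops, counter + 1, db, done);
      });
    }

    // "function C": write the assembled operations in one ordered bulkWrite
    function writeOps(db, ops, done) {
      if (!ops.length) { return done(); }
      db.collection('posts').bulkWrite(ops, { ordered: true }, done);
    }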
[00:37:48] <Axy> okay I will try
[00:37:52] <Axy> I understand the concept
[00:38:15] <StephenLynx> if you keep using node/io long enough it will become natural to you.
[00:38:31] <Axy> If I could hold the whole collection in memory via toArray, could I just do, like, a for loop?
[00:38:35] <Axy> because I already have the items
[00:38:38] <StephenLynx> once you become more concerned with scope rather than with sequence of written code
[00:38:44] <StephenLynx> you could.
[00:38:55] <StephenLynx> use a toArray, iterate it and assemble the operations
[00:39:15] <StephenLynx> try using projection
[00:39:17] <Axy> ok, but unfortunately the whole toArray does not fit into the $5 droplet I own
[00:39:28] <Axy> projection?
[00:39:31] <StephenLynx> to get the least amount of data from the documents
[00:39:35] <Axy> Hm
[00:39:40] <StephenLynx> what information you need from these documents?
[00:39:54] <Axy> I need an incremental index on the documents
[00:40:01] <Axy> to be able to call them with an order when I need
[00:40:05] <Axy> like "give me the 18th document"
[00:40:07] <StephenLynx> ah, you are copying the whole thing
[00:40:13] <StephenLynx> so you will need the whole document.
[00:40:14] <Axy> so I need documents as they are, plus, an order
[00:40:17] <Axy> Yes
[00:40:23] <StephenLynx> if you were just updating them
[00:40:33] <StephenLynx> you could make it return just the bare minimum information on them
[00:40:36] <Axy> I mean I thought calling the documents with an index number would be possible anyway
[00:40:38] <StephenLynx> probably just the _id would work
[00:40:39] <Axy> like, naturally
[00:40:59] <Axy> Hm I can update them
[00:44:23] <Axy> hm just because I need an id to call the document and $set a field? StephenLynx ?
[00:44:34] <Axy> and I can do it via bulk
[00:44:50] <Axy> Then I'll just do that -- I'll mongodump everything and go this route
[00:44:56] <StephenLynx> yes, just project the _id
[00:45:01] <StephenLynx> and it will probably fit in RAM
[00:45:08] <Axy> yes it does, tried before
[00:45:18] <Axy> I already keep a portion of everything in ram
[00:45:23] <Axy> for speed
[00:46:20] <StephenLynx> since mongo has its own memcache
[00:46:24] <StephenLynx> that provides little gain
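For the update-in-place route suggested above (project only the _id, then $set the sequential field), a rough shell-syntax sketch; collection and field names are illustrative:

    // fetch only the _id of every document, in a stable order
    var ids = db.posts.find({}, { _id: 1 }).sort({ _id: 1 }).toArray();

    // queue one $set per document and write them in a single ordered bulk
    var bulk = db.posts.initializeOrderedBulkOp();
    for (var i = 0; i < ids.length; i++) {
      bulk.find({ _id: ids[i]._id }).updateOne({ $set: { index: i } });
    }
    bulk.execute();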
[00:46:49] <Axy> I can't believe I can't pull an already indexed field with an order/index number though
[00:47:07] <Axy> I mean okay the object id is unique and useful but I don't get why otherwise is not possible
[00:47:41] <StephenLynx> why do you need this number if you can use the array index?
[00:48:12] <Axy> I want to get rid of arrays and handle things with queries
[00:48:24] <Axy> when someone visits domain.com/post/1 I want them to see first post
[00:48:33] <Axy> and domain.com/post/1000 is the 1000th one
[00:48:56] <Axy> I tried using skip but it became really really bad
[00:48:58] <StephenLynx> so why don't you just skip the right amount in your collection using the sort method you have?
[00:49:10] <Axy> skip is extremely slow
[00:49:12] <StephenLynx> how bad?
[00:49:17] <Axy> it takes a few seconds to reach high numbers
[00:49:34] <Axy> to be exact, it takes around 7 seconds to reach post 40000
[00:49:43] <Axy> that bad
[00:50:48] <Axy> I mean I'm open to ideas
[00:51:26] <StephenLynx> yeah, using the incremented field is an option
[00:51:34] <StephenLynx> as some sort of pre-aggregation
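Roughly, the difference the incremented field makes, in shell syntax (the field name postNumber is illustrative):

    // deep skip: the server still walks and discards the first 999 documents
    db.posts.find().sort({ _id: 1 }).skip(999).limit(1);

    // pre-assigned sequential field with an index on it: a single indexed lookup
    db.posts.createIndex({ postNumber: 1 });
    db.posts.findOne({ postNumber: 1000 });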
[00:52:05] <Axy> but I'm just unable to imagine a solution, for an (example) university database where there are only 3 classes, maths, physics, and history --- and all students (separate documents) have scores for all three fields
[00:52:17] <Axy> what if I want to get the 10th student in history scores
[00:52:33] <Axy> I was imagining this to work like, indexing all fields, and somehow getting the n'th item, easily
[00:52:38] <Axy> depending on how I sort
[00:52:52] <StephenLynx> you could also save the HTML for that document
[00:53:09] <StephenLynx> so you use its filename on gridfs
[00:53:36] <Axy> Hm, not sure if I understand
[00:53:46] <StephenLynx> see lynxhub.com
[00:54:19] <Axy> so this is a sample document: {name:"lazykid",scores:{history:12,maths:21,physics:67}}
[00:54:23] <Axy> checking
[00:54:46] <StephenLynx> http://lynxhub.com/lynxchan/preview/41.html
[00:54:58] <StephenLynx> I don't look for the document id 41 on board lynxchan
[00:55:12] <StephenLynx> I save an html page for it and output it
[00:55:36] <Axy> Oh but it's a lot of processing
[00:55:41] <Axy> for this other example for instance
[00:55:47] <StephenLynx> is not
[00:55:52] <Axy> for the "classes" example, because scores are dynamic
[00:55:55] <StephenLynx> is even less, actually
[00:55:57] <Axy> and orders change
[00:56:13] <StephenLynx> because instead of generating the page on every request, you generate it once
[00:56:26] <Axy> right, but if the number is static
[00:56:35] <Axy> I mean, if 41 returns the same thing every time
[00:56:54] <Axy> but if it's an order of success, that changes, then it matters
[00:57:10] <Axy> I mean if 41 is the "41st highest-scoring student in the maths class"
[00:57:10] <StephenLynx> i understand your case now
[00:57:20] <StephenLynx> you use the path as a parameter
[00:57:26] <StephenLynx> instead of using a regular get parameter
[00:57:30] <Axy> so yeah, we already have indexes, we can sort things by a specific order, but we can't reach them by index number
[00:57:51] <StephenLynx> this is what I would do in your case:
[00:58:08] <StephenLynx> I would have pages listing several elements per page.
[00:58:20] <StephenLynx> each element would have a link using its _id
[00:58:46] <StephenLynx> now the page that opens the element doesn't need to use its index, because the index is already being used on the page that links to the element.
[00:58:49] <StephenLynx> understand?
[00:59:04] <Axy> Hm not sure
[00:59:08] <Axy> wait let me process
[00:59:29] <Axy> can you give an example
[00:59:33] <Axy> I know I want too much
[00:59:39] <Axy> just to understand, if it's easy
[01:00:07] <StephenLynx> it is easy, see lynxhub.com/logs.js
[01:00:40] <StephenLynx> you see the pages list objects in order
[01:01:01] <StephenLynx> imagine if each entry had a page of its own and there were links to it
[01:01:19] <Axy> hm
[01:01:24] <StephenLynx> http://lynxhub.com/boards.js even better
[01:01:27] <Axy> so in my example, "history"
[01:01:38] <StephenLynx> the boards are listed using certain parameters in order
[01:01:54] <StephenLynx> and each board has a link of its own that uses not the order, but a unique field
[01:02:07] <StephenLynx> so when I open a board, I don't need the order of the board
[01:02:27] <Axy> oh I understand, but it works when you need a visual order
[01:02:36] <Axy> it does not work when you want to view history/589
[01:02:47] <Axy> (589th most successful student in history class)
[01:02:52] <StephenLynx> you can put that number on the page listing the elements.
[01:03:16] <Axy> but the page has to be generated every time
[01:03:17] <Axy> still
[01:03:31] <Axy> I mean I'm asking for too much maybe, but what if scores changed every few seconds
[01:03:38] <StephenLynx> you would have to skip, however much less often
[01:03:50] <StephenLynx> then the user must refresh the page listing the stuff
[01:04:01] <Axy> maybe not
[01:04:05] <Axy> maybe they just want to visit url
[01:04:26] <StephenLynx> sometimes people want too much
[01:04:33] <Axy> haha
[01:04:39] <StephenLynx> your job is not to please them, but to give them what they need.
[01:04:43] <StephenLynx> not what they want.
[01:05:02] <Axy> I generally create unsolvable problems for myself and try to get out of them
[01:05:07] <Axy> mongodb is the first db I'm learning
[01:05:09] <StephenLynx> a system aimed to please above anything else is pretty much doomed, in my personal experience.
[01:05:14] <Axy> so I'm not familiar with even db concepts that much
[01:05:37] <Axy> but the thing is, I really don't get why I'm not able to get something that's already indexed via index number
[01:05:46] <Axy> my issue with mongodb atm is as simple as that
[01:06:29] <StephenLynx> because that's how mongo was designed.
[01:06:44] <StephenLynx> and the developers focus on solving the needs of their target demographic.
[01:07:03] <StephenLynx> instead of pleasing people outside this target demographic
[01:07:37] <Axy> is there any nosql db that can do this
[01:07:45] <StephenLynx> dunno
[01:07:59] <StephenLynx> tbh, I have a hunch your system does not fit mongo.
[01:08:18] <StephenLynx> college management system?
[01:08:34] <Axy> maybe, I just use it because I can easily plug huge request results in
[01:08:38] <Axy> as a bson
[01:08:43] <Axy> and search through them
[01:08:50] <Axy> and all of my documents have different structures
[01:08:53] <Axy> well, most
[01:09:09] <StephenLynx> is it a management system of sorts?
[01:09:14] <Axy> If I wanted to create the same thing via fields it would take incredible amounts of time, not sure if it's all even possible
[01:09:27] <Axy> No, it's just an imageboard thing where people can paste links to websites and I save them there
[01:09:28] <StephenLynx> give me an overview of it
[01:09:39] <StephenLynx> ah
[01:09:58] <Axy> so tumblr's api result is different than 500px's, for instance
[01:10:06] <Axy> but I still save the whole api result there as a document
[01:10:19] <Axy> "just in case"
[01:10:32] <Axy> and all of the added images have a number... for instance the 5000th image
[01:10:35] <StephenLynx> if it is an imageboard of sorts, I think I would still use saved HTML
[01:10:37] <Axy> it's the 5000th image that's added
[01:10:55] <Axy> that way, without having access to the board index or anything like that, people can just try different URLs and hunt for cool stuff
[01:11:16] <StephenLynx> I just create urls using this sequence
[01:11:28] <StephenLynx> and each board has a counter that I use $inc on to get the next one
[01:11:45] <StephenLynx> files use it, threads, posts
[01:11:57] <Axy> maybe it's the right way to do this, I just want to do this dynamically. For two reasons -- one is that I really love dynamic stuff and two is that this is not really a commercial project or something, this is just me learning nodejs and mongodb, that's all
[01:12:00] <StephenLynx> http://lynxhub.com/lynxchan/media/t_28.jpg
[01:12:05] <StephenLynx> see the 28?
[01:12:11] <Axy> yes!
[01:12:26] <StephenLynx> every file has a number in the name
[01:12:34] <StephenLynx> http://lynxhub.com/lynxchan/media/1.png
[01:12:48] <StephenLynx> but the file will always be there
[01:12:54] <StephenLynx> and dynamic is hard to do with heavy loads.
[01:12:59] <Axy> Yes I understand, here is what I built for the counter thing https://gist.github.com/anonymous/6d6119f5c45f82e64767
[01:13:13] <Axy> "counters" is a collection
[01:13:21] <Axy> with just... counter objects
[01:13:28] <Axy> keeps the count of how much stuff is in which collection
[01:13:47] <StephenLynx> I use each board documents to count.
[01:13:57] <Axy> I just increment it - simply just use this function when I'm inserting new docs
[01:13:58] <StephenLynx> because every file and post is bound to its board.
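The usual shape of such a counter, sketched in shell syntax (names are illustrative and not taken from the linked gist): a counters collection keyed by name, bumped atomically with $inc at insert time.

    // atomically increment and return the next number for a given counter
    function getNextSequence(name) {
      var ret = db.counters.findAndModify({
        query: { _id: name },
        update: { $inc: { seq: 1 } },
        new: true,
        upsert: true
      });
      return ret.seq;
    }

    db.images.insert({ url: 'http://example.com/pic.jpg', index: getNextSequence('images') });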
[01:14:07] <Axy> Hm
[01:31:52] <FailBit> is it safe to run createIndex() while the set is being used
[01:40:36] <cheeser> where set == replica set?
[01:41:26] <FailBit> no replica here
[01:43:23] <FailBit> confession: I have no idea how to actually run mongodb, I'm not responsible for that
[01:43:29] <FailBit> I'd just like to get this field indexed
[01:43:52] <StephenLynx> I run ensureIndex every time my software starts.
[01:49:37] <FailBit> I know I'm going to sound dumb here but if I run this
[01:49:39] <FailBit> db.tag_changes.createIndex({tag_id: 1})
[01:49:49] <FailBit> on production, what will happen on requests during index generation?
[01:51:32] <FailBit> what if it were this? db.tag_changes.createIndex({tag_id: 1}, {background: true})
[01:54:46] <FailBit> ?
[01:55:52] <cheeser> creating an index is a blocking operation but it will yield to other operations periodically so you don't lock the entire time.
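In shell terms, the two variants being discussed; background: true trades a slower build for availability, and a running build can be spotted via db.currentOp():

    db.tag_changes.createIndex({ tag_id: 1 });                        // foreground: blocks, yielding periodically
    db.tag_changes.createIndex({ tag_id: 1 }, { background: true });  // background: slower, but reads/writes continue

    db.currentOp(true);   // in-progress index builds show up here with a progress message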
[01:57:41] <FailBit> 10 rpm on tag_changes endpoints
[01:57:52] <FailBit> will this slow them down considerably?
[02:00:48] <FailBit> oh, if I actually look at things tag_changes is about 500 finds per minute
[02:01:35] <FailBit> with 3500511 objects in the collection
[02:02:58] <FailBit> I guess it would be better to have scheduled downtime to add this index?
[02:03:43] <StephenLynx> yes
[02:03:49] <cheeser> i don't imagine it'd take long to index that
[02:03:59] <cheeser> just schedule it in the middle of the night/weekend.
[02:04:15] <FailBit> weekend is when we get most of our traffic :p
[02:04:33] <FailBit> lightest traffic is tuesday morning
[02:05:30] <cheeser> there you go, then
[02:06:43] <FailBit> I'm going to pull down the latest dump of tag_changes and see how long it takes to index locally...
[02:07:23] <cheeser> good call.
[02:07:45] <cheeser> though keep in mind hardware differences and activity on the machine will vary the results
[02:07:58] <FailBit> yeah, production server is a good deal beastier than my box
[02:12:27] <FailBit> mongorestore into local db took about 2min
[02:13:27] <FailBit> gah
[02:13:36] <FailBit> exception: Btree::insert: key too large to index
[02:14:48] <FailBit> I'd like to smack whoever thought it was a good idea to use the ID field to store tag names.
[02:15:29] <StephenLynx> oh lawdy
[02:16:05] <FailBit> tag_changes.$tag_id_1 3339 { : \"but-its-worth-trying-10-name-anonymous-2010-10-06-09-27-try-opening-image-in-new-tab-11-name-fuck-2010-10-19-08-50-shits-still-not-working-12-name-ver...\" }
[02:16:08] <FailBit> yeah no
[02:17:02] <FailBit> what's the mongod switch that lets you turn off failure on key too large to index
[02:18:39] <FailBit> --setParameter failIndexKeyTooLong=false
[02:21:29] <FailBit> it took about 10 seconds on my local box
[02:21:32] <FailBit> that seems too fast
[02:22:02] <cheeser> run getIndexes() and see
[02:22:32] <FailBit> "v" : 1, "key" : {"tag_id" : 1}
[02:23:22] <FailBit> so I guess it worked?
[02:23:24] <FailBit> huh
[02:23:37] <FailBit> that seemed way too fast, especially for 900MB of tag_changes
[02:26:13] <FailBit> indexing the images_count field takes about 30s
[02:29:46] <FailBit> cheeser: about how long were you expecting it to take?
[02:29:56] <FailBit> 30s still seems too good to be true
[02:30:42] <cheeser> i didn't have an expectation
[02:35:11] <FailBit> I ran it on production, it took 10 seconds
[02:35:13] <FailBit> gg
[02:38:05] <cheeser> nice
[02:41:50] <akp> is it possible to use limit_execute with 2 backends? ie something like limit_execute data="db outbound carrier limit redis tn-limit tn limit2" >
[02:42:16] <cheeser> what?
[02:44:21] <akp> is it possible to use limit_execute like this - <action application="limit_execute" data="db realm resource1 limit1 redis realm resource2 limit2 " />
[02:48:54] <cheeser> that looks like a redis question
[02:49:29] <akp> ok, then change it to <action application="limit_execute" data="db realm resource1 limit1 db realm resource2 limit2 " />
[02:49:33] <akp> or hash for both of them
[02:49:53] <akp> but i think i probably want to use limit
[02:50:56] <cheeser> you're not going to make any more sense by removing redis fom your question.
[02:52:13] <akp> oh crap
[02:52:14] <akp> sorry
[02:52:21] <akp> wrong room.
[02:52:38] <cheeser> :)
[02:53:22] <akp> =[]
[06:46:38] <steamio> hi all, I have a quick question about replica sets - The scenario is that I have a large single instance and I'm creating a replica set. Other than restarting the main instance for replication support, will there be any downtime while the secondary is syncing?
[07:34:33] <joannac> steamio: no downtime, but you'll see some load on the primary
[07:36:59] <steamio> great, thanks
[07:37:43] <steamio> i always get weak at the knees before starting
[08:05:59] <hemmi> Hey there. If I have a schema where posts are embedded in users, how do I grab all posts for all users?
[08:07:05] <joannac> db.users.find({}, {posts:1})
[08:19:09] <hemmi> joannac: is that assuming i'm using indexes?
[10:03:43] <avner> Would here be a good place for Qs related to mongo driver behavior in a replicaset?
[10:17:22] <__marco> hello
[12:19:40] <moksa> hi
[12:20:08] <moksa> inside an object i have an element that is a list (python)
[12:20:30] <moksa> is it possible to search for all objects that contain a specific value in that list element?
[12:20:58] <moksa> like obj['element'] = ['car', 'bicicle']
[12:21:15] <moksa> and i want all objects that contain bicicle in 'element'
[12:22:22] <moksa> I was trying: .find({'element':{'$in': ['bicicle']}}))
[12:22:32] <moksa> but that is returning nothing..
[12:23:44] <joannac> moksa: pastebin one of your documents
[12:33:49] <moksa> joannac: found the issue
[12:33:59] <moksa> the problem was in the element itself
[12:34:04] <moksa> the query was ok
[12:34:06] <moksa> thanks anyway
[13:52:43] <lmatteis> hi guys. so i have a collection with over 11 thousand documents, each representing a log item that recorded the status of my HTTP server
[13:53:00] <lmatteis> now i'd like to present these results to a user as a webpage, but of course i can't send all 11k records to the user
[13:53:30] <lmatteis> what could be a nice query which would show, say, the average for every month?
[13:55:26] <lmatteis> the problem is that my date is stored as an epoch number :(
[13:55:47] <StephenLynx> pre-aggregation
[13:56:25] <StephenLynx> pre-aggregate these in a separate collection so it would reduce the amount of records drastically
[13:56:35] <StephenLynx> and a skip wouldn't be so slow.
[13:56:39] <StephenLynx> or even better
[13:56:49] <StephenLynx> present pages in intervals of periods
[13:56:57] <StephenLynx> like "fetch me all entries from this date to that date"
[13:57:23] <lmatteis> StephenLynx: right, but how do i perform the aggregate query in the first place?
[13:57:34] <lmatteis> like how do i group by month when i store an epoch number?
[13:58:02] <StephenLynx> group{_id:'$myPivotingField'}
[13:58:13] <StephenLynx> however
[13:58:17] <StephenLynx> nevermind, that wouldn't work.
[13:58:30] <StephenLynx> ok
[13:58:52] <StephenLynx> instantiate a date object, set it to the beginning of the month you wish
[13:58:58] <StephenLynx> then another one to the end of the month you wish
[13:59:06] <StephenLynx> and query all entries between these dates
[13:59:24] <StephenLynx> that is the way 10gen tells you to do paging
[13:59:30] <StephenLynx> because skips are slow for large collections
[13:59:34] <StephenLynx> too slow*
[14:00:20] <StephenLynx> you can then crush this data into a single document
[14:00:38] <StephenLynx> and query the collection holding the crushed data so you handle less documents
[14:19:11] <lmatteis> StephenLynx: not sure i understand
[14:19:23] <lmatteis> simply looking for the group by aggregation query
[14:19:37] <lmatteis> i probably have to somehow extract the date from the epoch?
[14:23:36] <lmatteis> is there a way to aggregate only on specific types of documents
[14:23:41] <lmatteis> and not all documents of the collection
[14:24:23] <StephenLynx> what do you mean epoch:
[14:24:24] <StephenLynx> ?
[14:24:25] <l403> hello, I've noticed this rather large (16M) file with executable permissions, /var/lib/mongodb/local.ns - I was wondering what it is
[14:24:30] <StephenLynx> you store the amount of ms?
[14:24:37] <lmatteis> yes
[14:24:40] <StephenLynx> oh
[14:24:42] <StephenLynx> hm
[14:24:51] <StephenLynx> you can query by it too.
[14:24:55] <StephenLynx> its just a number
[14:25:00] <StephenLynx> you store as a number, right?
[14:25:08] <lmatteis> yes
[14:25:20] <StephenLynx> so just extract the epoch of these dates
[14:25:20] <lmatteis> i can just do new Date(number) to get the Date type
[14:25:24] <lmatteis> ok
[14:25:33] <StephenLynx> and I suggest you convert these epochs to dates.
[14:25:35] <lmatteis> but can i run the aggregate() only on specific types of documents?
[14:25:45] <StephenLynx> yes, just use $match
[14:25:52] <StephenLynx> so you weed out anything unwanted.
[14:26:39] <lmatteis> StephenLynx: thanks. aggregate queries are read-only right? i don't want to screw things up in this db
[14:27:48] <StephenLynx> by default yes. and even then, you can only write it to a separate collection.
[14:28:01] <StephenLynx> the collection you read cannot be changed on the aggregate, afaik
[14:28:04] <lmatteis> k there's just too much data i guess :(
[14:28:11] <lmatteis> cuz a match is much slower than a find
[14:28:36] <StephenLynx> yeah, so I heard aggregate is slower than a regular find.
[14:28:44] <StephenLynx> that is why I suggested you pre-aggregate this data.
[14:28:53] <cheeser> i think you *can* write back to the same collection, actually, so you need to be really careful because it'll clobber whatever's in that collection
[14:29:32] <lmatteis> shit really
[14:29:48] <StephenLynx> but you have to explicitly write.
[14:29:49] <lmatteis> i'm just running db.coll.aggregate() from the command line mongodb
[14:30:02] <StephenLynx> unless you put an $out
[14:30:06] <StephenLynx> it wont write
[14:30:31] <lmatteis> ok
[14:34:43] <lmatteis> the aggregate is taking too long
[14:34:59] <lmatteis> there are millions of records
[14:35:13] <lmatteis> what should i do?
[14:35:58] <StephenLynx> pre-aggregate
[14:36:18] <StephenLynx> so it doesnt matter if it takes too long on the first time
[14:36:22] <StephenLynx> it will be done only once
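A sketch of that pre-aggregation in shell syntax: $match narrows to the documents of interest, adding the stored epoch milliseconds to new Date(0) turns the number into a Date so $year/$month can group it, and $out writes the crushed result to a separate collection. The collection names (logs, monthlyStats) and field names (type, ts, responseTime) are illustrative.

    db.logs.aggregate([
      { $match: { type: 'http' } },                        // weed out unwanted documents
      { $project: {
          responseTime: 1,
          date: { $add: [ new Date(0), '$ts' ] }           // epoch-ms number -> Date
      } },
      { $group: {
          _id: { year: { $year: '$date' }, month: { $month: '$date' } },
          avgResponseTime: { $avg: '$responseTime' },
          count: { $sum: 1 }
      } },
      { $out: 'monthlyStats' }                             // write the pre-aggregated data
    ]);

    // the webpage then reads the small collection instead of the raw log records
    db.monthlyStats.find().sort({ '_id.year': 1, '_id.month': 1 });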
[14:55:29] <mantovani> I'm using mongoexport
[14:55:55] <mantovani> and when I try to export a field which is inside another field I can't
[14:55:57] <mantovani> like
[14:56:04] <mantovani> mongoexport --db BigData --collection TwitterStreaming --fields="user.screen_name"
[14:56:16] <mantovani> if it was just "user" without the "." it would work
[14:57:57] <MalteJ> hi
[15:00:04] <mantovani> fixed
[15:00:34] <MalteJ> I am new to mongodb and have a problem: I have a document "user". Every user is allowed to have a maximum of, let's say, 5 documents of type "virtualmachine". How can I realize this so I don't have problems with concurrency (e.g. when reading all virtualmachines of a user, counting them, and then adding a new one if the count is <5)?
[15:05:52] <mantovani> MalteJ: you don't want to have concurrency problems? don't use mongodb
[15:06:17] <mantovani> MalteJ: why don't you use postgres?
[15:07:12] <MalteJ> for most of my stuff eventual consistency is ok
[15:07:52] <MalteJ> I am not sure if I should add an RDBMS to the stack just for some edge cases
[15:08:46] <mantovani> why are you using mongodb ?
[15:08:55] <mantovani> "eventual consistency"
[15:08:58] <mantovani> HAHAHAHAHAH
[15:09:10] <mantovani> you are not using mongodb for the correct purpose.
[15:09:41] <MalteJ> could you please elaborate?
[15:09:53] <mantovani> so do you know what ACID is?
[15:10:00] <mantovani> https://en.wikipedia.org/wiki/ACID
[15:10:08] <MalteJ> yes
[15:10:36] <JamesHarrison> MalteJ: you can atomically update a document
[15:10:48] <mantovani> ACID is what make your DB reliable
[15:10:53] <JamesHarrison> what you can't do, which mantovani is alluding to, is transactions
[15:11:23] <JamesHarrison> ACID isn't what makes your DB reliable it's just a convenient way to describe a particular pattern for consistency management, it's not the only way
[15:11:27] <mantovani> if you need transactions, you should used an ACID DB.
[15:11:38] <MalteJ> JamesHarrison: I could use the update method and query for the user and a timestamp of last change, right?
[15:11:43] <mantovani> JamesHarrison: it's the only way that works
[15:12:02] <mantovani> mongodb isn't atomic
[15:12:19] <JamesHarrison> ...
[15:12:31] <mantovani> you will need to implement it in your application, which is a lot more dangerous
[15:12:58] <JamesHarrison> okay
[15:12:59] <JamesHarrison> so
[15:13:04] <JamesHarrison> back in the real world
[15:13:09] <JamesHarrison> http://docs.mongodb.org/manual/core/write-operations-atomicity/ is worth a read
[15:13:38] <mantovani> can be now
[15:13:41] <mantovani> it wasn't atomic
[15:13:48] <JamesHarrison> you can do two-phase commits, yes, or you can use update if current pattern or similar
[15:13:55] <MalteJ> JamesHarrison: can I query for an array length?
[15:14:25] <JamesHarrison> MalteJ: you could do, though I'd probably store a counter
[15:14:50] <JamesHarrison> mantovani is correct that if you're implementing from scratch, this doesn't sound like something mongodb is likely to be hugely suited for
[15:15:04] <MalteJ> JamesHarrison: yeah, thank you! :)
[15:15:04] <JamesHarrison> RDBMses like PostgreSQL are still the right tool for most jobs
[15:15:50] <mantovani> JamesHarrison: as far as I remember, mongodb didn't have this $isolated operator.
[15:16:04] <mantovani> actually I remember it was in the introduction
[15:16:06] <JamesHarrison> mantovani: it's not that useful since it fails in sharded use-cases though
[15:16:34] <mantovani> oh
[15:16:49] <MalteJ> I have to synchronize my data with the vm hypervisor anyway. So an RDBMS with strong consistency does not really help when the DB and the real world are not consistent.
[15:16:51] <mantovani> so this operator can only be used if you have your data on just one instance?
[15:17:25] <MalteJ> mantovani: no, your writes go through the master
[15:17:55] <JamesHarrison> MalteJ: basically what I'd do would be to store an updated_at timestamp or version field on the object, query for that version/timestamp, and then findAndModify using the ID _and_ version to insert the document and update the version/timestamp
[15:17:57] <mantovani> why did he say that?
[15:17:58] <mantovani> 12:14 < JamesHarrison> mantovani: it's not that useful since it fails in sharded use-cases though
[15:18:26] <MalteJ> JamesHarrison: yes, thats what I will do. thanks :)
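A sketch of that pattern with the Node.js driver's findOneAndUpdate (the driver counterpart of the shell's findAndModify): read the user, then make the write conditional on both the _id and the version you read, so a concurrent change makes the update match nothing and you retry. The field names vms and version are illustrative.

    var users = db.collection('users');

    users.findOne({ _id: userId }, function (err, user) {
      if (err) { return callback(err); }
      if ((user.vms || []).length >= 5) {
        return callback(new Error('vm limit reached'));
      }

      users.findOneAndUpdate(
        { _id: userId, version: user.version },          // matches only if nobody changed it since the read
        { $push: { vms: newVm }, $inc: { version: 1 } },
        { returnOriginal: false },
        function (err, result) {
          if (err) { return callback(err); }
          if (!result.value) { return callback(new Error('concurrent update, retry')); }
          callback(null, result.value);
        }
      );
    });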
[15:18:27] <JamesHarrison> mantovani: $isolated does not work on _sharded_ clusters
[15:18:32] <mantovani> oh right
[15:18:35] <mantovani> MalteJ: see it ^^^^^
[15:18:36] <JamesHarrison> replicated/master-slave cluster otoh
[15:18:40] <JamesHarrison> it works fine on
[15:18:46] <JamesHarrison> if you shard it does not
[15:18:53] <mantovani> right
[15:19:01] <JamesHarrison> sharding is not common so it's not an awful approach
[15:19:17] <JamesHarrison> but it's still not a transaction
[15:19:31] <JamesHarrison> and really if you need transactions, you need a transactional RDBMS
[15:19:32] <mantovani> but whatever, you agree with me that for what MalteJ wants, postgres would fit better.
[15:19:50] <JamesHarrison> assuming nothing about the rest of his use-case and deployment approach yes
[15:20:03] <mantovani> yes, we don't know what he has there
[15:20:07] <JamesHarrison> but I'm assuming there's a compelling reason for using mongo over pgsql in his case :)
[15:20:17] <mantovani> :)
[15:20:43] <MalteJ> probably there is not :D
[15:21:33] <MalteJ> what would be a good reason?
[15:22:24] <JamesHarrison> realistically, for most use cases, none
[15:22:33] <JamesHarrison> replication and high availability in theory
[15:22:56] <JamesHarrison> but realistically I've not seen a mongo cluster with better reliability than a well-engineered single postgres instance
[15:23:13] <MalteJ> how do you like the mariadb galera cluster?
[15:23:25] <JamesHarrison> and newer postgresql does replication pretty well now
[15:23:54] <JamesHarrison> never used it, not a fan of mysql/mariadb
[15:24:37] <MalteJ> I have grown up with MySQL. so probably I'd go with this.
[15:24:41] <JamesHarrison> postgresql has lots of options for replication and so on
[15:24:43] <JamesHarrison> however
[15:24:51] <JamesHarrison> in most use cases it's unnecessary
[15:25:18] <JamesHarrison> if you need real HA (and most people don't) then you can do it quite readily these days without any additional software, just core postgresql
[15:25:33] <mantovani> MalteJ: well.... try postgres
[15:25:37] <JamesHarrison> ^
[15:25:51] <JamesHarrison> it's worlds better than mysql, especially for sysadmins, imho ;-)
[15:26:10] <JamesHarrison> one of those lovely bits of software that Just Works, and keeps Just Working
[15:26:30] <mantovani> beyond this, as far as I remember the postgres date type support is far better
[15:26:51] <mantovani> like if you are working with temporal tables
[15:26:55] <MalteJ> hmm, I have hoped I could abandon ORMs in my code :(
[15:27:12] <mantovani> you can set the value to 'infinity' instead of something like '9999-12-31'
[15:27:14] <JamesHarrison> native uuids are nice, and you can do a lot of the json-style data manipulation/querying that mongo does in postgres now
[15:27:33] <mantovani> so when you compare both fields postgres won't even compare because 'infinity' is always higher
[15:27:34] <JamesHarrison> MalteJ: ORMs are your friend!
[15:27:56] <MalteJ> there are no good ORMs for golang...
[15:28:10] <MalteJ> Hibernate is quite nice
[15:28:12] <JamesHarrison> eh
[15:28:45] <JamesHarrison> there's gorp, gorm, beedb, hood...
[15:28:56] <mantovani> MalteJ: postgres supports json
[15:28:58] <mantovani> native json
[15:29:07] <JamesHarrison> https://github.com/jinzhu/gorm
[15:29:07] <mantovani> actually you can have a field of type "json"
[15:29:23] <MalteJ> yeah, I'll try gorm
[15:30:48] <mantovani> http://bit.ly/1KbWR46
[15:30:50] <mantovani> MalteJ: ^
[15:31:32] <MalteJ> lol, there's an SAP HANA client implementation :D
[15:31:40] <dddh> sql > orm
[15:31:54] <mantovani> not necessary
[15:32:00] <JamesHarrison> dddh: there are very few cases where that is the case
[15:32:16] <JamesHarrison> either for performance or developer friendliness
[15:32:20] <JamesHarrison> not to mention DB portability
[15:32:31] <JamesHarrison> and most ORMs will get out of your way and let you poke the DB direct with a stick if you need to
[15:33:33] <dddh> sql professionals usually use stored procedures for specific database
[15:33:58] <dddh> and portability very often means you do not use what you pay for
[15:34:16] <mantovani> WTF
[15:34:41] <JamesHarrison> brave new world - I'd argue stored procedures have very limited use cases, and the overhead of database agnosticism is outweighed heavily by the improvement in developer productivity and reduced lock-in risk
[15:34:55] <JamesHarrison> there are still places where stored procedures/triggers etc make sense, sure
[15:35:05] <mantovani> yes it does
[15:35:26] <mantovani> like many cases it depends if you have/want use the DB machine's resources
[15:35:49] <mantovani> in OLAP we do ELT in many cases for performance
[15:35:51] <JamesHarrison> but for 99% of apps where you just need a reliable datastore, even a performant one, ORMs will be just fine
[15:37:06] <MalteJ> the question is, what is cheaper? coding, testing, debugging for the super-hyper proprietary rdbms functions or just adding more server hardware?
[15:37:29] <JamesHarrison> indeed
[15:37:49] <mantovani> MalteJ: ask google that
[15:37:55] <JamesHarrison> most of the time unless you're dealing with very tight performance requirements or very large amount of data, the latter is usually the best option
[15:38:12] <mantovani> MalteJ: or much more simple, try amazon and you will have your answer.
[15:41:12] <mantovani> MalteJ: actually when you use an rdbms, since it does a lot for you, you will save on code, tests and debugging
[15:41:35] <mantovani> if you need to implement what it does for you manually, you will have more code, tests and debugging
[15:41:47] <mantovani> think about it
[15:42:31] <MalteJ> the worst thing I have tried was cassandra. all this denormalized stuff. you write everything 5 times :D
[15:43:41] <JamesHarrison> cassandra is a really nice and powerful datastore
[15:43:50] <JamesHarrison> it is not to be confused with anything remotely like an RDBMS
[15:44:02] <JamesHarrison> but for the right use cases it can't be beat
[15:44:23] <JamesHarrison> I've used it in graph database use cases with apache titan, works fantastically
[15:45:38] <mantovani> JamesHarrison: hbase should perform well for it too.
[15:45:49] <JamesHarrison> mantovani: yes, that's another option
[15:47:59] <mantovani> I'm using mongodb because it fits perfectly, I take tweets (which are json) and store them there
[15:48:17] <mantovani> actually I don't have to do any work
[15:53:15] <dddh> mantovani: how much data from twitter do you have?
[15:53:58] <mantovani> few just 80M
[15:54:03] <mantovani> 80 tweets
[15:54:12] <mantovani> sorry, 80M tweets*
[15:55:56] <mantovani> dddh: is about 300GB
[16:42:46] <dddh> oh
[18:26:12] <diegoaguilar> Can someone help? http://dba.stackexchange.com/questions/110508