[00:41:03] <hdm> using a sharded setup, sharding definitely working, i see all of my inserts under the "notSharded" section of db.collection.stats() [ under ops ], is that right? the shard key is not random and chunk splits should lead to more even distribution of inserts
[00:46:46] <mrpro> is it possible to save to a collection only if a record doesn't exist, based on some criteria?
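No answer came in channel, but a minimal shell sketch of the usual patterns, assuming a hypothetical users collection keyed by email: either enforce the criteria with a unique index so plain inserts fail on duplicates, or use an upsert (which inserts only when nothing matches, though it also updates on a match).

```javascript
// Insert-only-if-missing via a unique index: the second insert
// fails with a duplicate key error instead of creating a record.
db.users.ensureIndex({ email: 1 }, { unique: true });
db.users.insert({ email: "a@example.com", name: "A" }); // inserted
db.users.insert({ email: "a@example.com", name: "B" }); // dup key error

// Or an upsert: inserts when no document matches the criteria,
// but note it updates the existing document when one does match.
db.users.update({ email: "b@example.com" },
                { $set: { name: "B" } },
                { upsert: true });
```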
[01:19:46] <hdm> fyi to any 10gen folks, the mms agent is causing problems: 29138 root 20 0 4862m 1.4g 4308 S 140 0.7 25:52.42 python
[01:20:13] <hdm> each offer could have its version at top level, or versions as sub-docs
[01:20:17] <unknet> i need to make references to product versions really
[01:20:28] <unknet> not the current product version
[01:20:36] <hdm> sounds like you need relational even for products then
[01:21:06] <hdm> if you are set on document model and want single queries, you can look at something like riak's "links" (but it's much slower than mongo imo)
[01:26:00] <hdm> unknet: yes, so expire them and delete via bg jobs
[01:26:10] <hdm> delete where user.offers.offer_expire < todays_date
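A minimal shell sketch of that cleanup, assuming offers live as sub-documents in an offers array with an offer_expire date (collection and field names are illustrative):

```javascript
// Background job: pull expired offers out of every user document.
db.users.update(
  {},
  { $pull: { offers: { offer_expire: { $lt: new Date() } } } },
  { multi: true }
);
```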
[01:26:16] <unknet> it could be another solution yes
[01:26:34] <unknet> to put a limit on offer number
[01:26:54] <Oddman> unknet, 16mb is not a hard limit
[01:27:04] <hdm> if you think of each offer being a unique combination of product + expiration + values for a user, you can stick it into each user's doc
[01:30:40] <Oddman> mongoid is delicious for ruby/rails peeps
[01:31:04] <unknet> hdm why can't i have an embedded model with a belongs_to relation in mongoid?
[01:31:07] <rubydog> Oddman: hdm, just looking at it :)
[01:31:19] <hdm> video of a hurried/crappy version of the research ^ http://www.irongeek.com/i.php?page=videos/bsideslasvegas2012/1.1.4-hd-moore-empirical-exploitation&mode=print
[01:31:32] <Oddman> unknet, it's defined differently
[01:31:47] <hdm> basically scanning the internet, constantly, for months
[01:33:22] <Oddman> it refers to them as embedded_in or embeds_many
[01:33:34] <hdm> rubydog: fwiw, plain mongo was best for standalone stuff that needed to do large-scale updates/queries, orm models suck at upserts/multi-update
[07:57:13] <Rhaven> I'm struggling with some high latency since I added a new shard to my cluster, especially with the moveChunk command that the balancer runs every time it needs to restore the balance of data between shards. I tried to define a balancing window but nothing changed. Is there a way to prevent this latency?
[08:20:04] <remonvv> Anyone any idea what this is : Chunk map pointed to incorrect chunk
[09:50:10] <Signum> Hi. I'm trying to introduce GridFS file storage in MongoDB in a web application here. I'm just not sure how to refer to the files from the JSON documents. Is GridFS something different than regular collections? Do I just refer to them by filename? Or can I embed files somehow in JSON documents?
[09:50:23] <Signum> Btw... the documentation part of the MongoDB home page seems broken.
[10:03:01] <NodeX> one collection is the file metadata, the other is the chunks of the files
[10:03:24] <NodeX> inside the files collection you can use metadata to identify / tag your files
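For reference, a sketch of what that looks like in the shell; the metadata fields shown (package, uploader_ip) are illustrative, not built in:

```javascript
// GridFS keeps one document per file in fs.files (filename, length,
// md5, uploadDate, plus a free-form "metadata" sub-document) and
// the binary content split across documents in fs.chunks.
db.fs.files.findOne();
// { _id: ObjectId("..."), filename: "shot.png", length: 12345,
//   chunkSize: 262144, uploadDate: ISODate("..."), md5: "...",
//   metadata: { package: "bash", uploader_ip: "..." } }

// Query files by your own metadata:
db.fs.files.find({ "metadata.package": "bash" });
```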
[10:05:10] <Signum> NodeX: Thanks. So GridFS comes with that metadata collection by default which I can extend by putting more values into it?
[10:06:07] <Signum> NodeX: My application is screenshots.debian.net (currently based on PostgreSQL and files on disk) which I'd like to rewrite and also use MongoDB. My idea was to connect the screenshot PNG files directly with the packages so that if a package got deleted the files are removed automatically.
[10:38:55] <NodeX> Signum : sorry, I got distracted
[10:40:03] <NodeX> you can't connect (JOIN) anything in mongo, so your app would have to save perhaps a package ID with a screenshot and if/when a package is deleted then delete the files too
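A sketch of that app-side cascade, under the same illustrative metadata.package convention (drivers usually offer a GridFS delete helper that removes both the file document and its chunks; this is the manual equivalent):

```javascript
// When a package goes away, remove its screenshot files too.
db.fs.files.find({ "metadata.package": "bash" }).forEach(function (f) {
  db.fs.chunks.remove({ files_id: f._id }); // binary chunks first
  db.fs.files.remove({ _id: f._id });       // then the file document
});
db.packages.remove({ name: "bash" });
```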
[11:48:52] <Signum> NodeX: sorry, I got disconnected from my screen session and didn't even notice. :)
[11:49:15] <Signum> NodeX: Okay, thanks. So there's nothing like a cascade and I'll have to deal with that application-wise.
[11:50:40] <NodeX> I don't know what cascade is, sorry
[11:57:00] <Signum> NodeX: In relational databases cascades mean that related records are updated or removed automatically.
[11:57:42] <kali> Signum: yep, no cascade nor trigger, you need to do that app-side
[11:57:43] <Signum> NodeX: In MongoDB there seem to be DBrefs but they don't delete related documents automatically.
[11:58:29] <kali> Signum: dbref is just a convention or a protocol. mongodb does not do anything clever with them
[11:58:59] <Signum> So basically I'd have a database with a "packages" collection and a "screenshots" collection. For each package I'd store several screenshots. And I could extend the metadata JSON of a screenshot by data like "uploader-IP-address".
[12:22:47] <Cubud> I am just looking into MongoDB to see if it will be suitable for what I need
[12:22:55] <Cubud> The closest thing I can think of is YouTube
[12:23:32] <Cubud> I am currently under the impression that I would have a collection "uservotes" which would record the user + videoID + thumb up/down
[12:23:33] <NodeX> Signum : Mongo is not SQL - when you wrap your way of thinking around that you will understand it a lot more
[12:23:55] <Cubud> And then do a map/reduce to work out the current votes for a video. Am I right so far?
[12:24:18] <NodeX> better to keep a counter of votes
[12:24:36] <Cubud> That makes sense to me, except I cannot have transactions across collections
[12:24:48] <NodeX> you can't have transactions period
[12:24:49] <Cubud> I would otherwise imagine doing this
[12:25:10] <Cubud> Well, update to a single document is atomic, and a batch update I think is atomic too
[12:25:16] <Cubud> that's what I mean by "transactions" :)
[12:26:25] <Cubud> So if I have a single source of data for thumbs "uservotes" then that would only require a single atomic operation, but if that also updates totals in a video then they could easily become out of sync
[12:27:21] <NodeX> yes because that's a "transaction" :)
[12:27:40] <Cubud> Yeah, but I want to understand the MongoDB solution to the same problem
[12:27:51] <NodeX> you could bake them into your app by counting the number of votes before and after
[12:27:54] <Cubud> How would I implement such an app with Mongo
[12:28:05] <NodeX> if you want true transactions then dont use mongo
[12:28:16] <Signum> NodeX: Right, I'm getting into nosql slowly. I have a demo implementation of my data storage in CouchDB but for several reasons (missing data types, revisions that blow up the database, missing search features) I stopped following that approach.
[12:28:28] <NodeX> if you realise that a vote either way means nothing then it doesn't matter
[12:28:39] <Cubud> I am not bothered about transactions so much as I just want to make sure that the video ratings don't become permanently incorrect
[12:29:29] <NodeX> Cubud : you would probably write a sanity script then, to run once every X, that counts the votes and compares them to the number in the counter field
[12:29:38] <Cubud> You see I have been reading a Manning book called something like "Big Data" (I don't recall) and it suggests that I record all user actions
[12:29:45] <Cubud> PersonX voted video Y with a thumb up
[12:29:49] <NodeX> it won't tell you which ones didn't transact but it will tell you there is a mismatch
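A sketch of such a sanity script, using the collection and field names from this conversation (uservotes, ThumbsUp) as assumptions:

```javascript
// Recount raw up-votes per video and compare to the cached counter.
db.videos.find().forEach(function (v) {
  var actual = db.uservotes.count({ video: v._id, thumb: "up" });
  if (actual !== v.ThumbsUp) {
    print("mismatch on " + v._id +
          ": counter=" + v.ThumbsUp + " actual=" + actual);
  }
});
```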
[12:29:58] <Cubud> 2 days later - PersonX voted video Y with a thumb down
[12:30:29] <NodeX> Cubud : I track every user interaction on my sites by default so I can always go back to it should I need to
[12:30:55] <Cubud> Yes that is what it suggests, so then if the data ended up screwed up due to a coding error it can be corrected
[12:31:46] <Cubud> But I am struggling to apply this idea to a youtube type app requirement
[12:31:48] <NodeX> I do it to find out how users interact with my apps but that is an added bonus
[12:32:21] <Cubud> In the case where it is the first thumb on the video it should just add 1 to "ThumbsUp"
[12:32:42] <Cubud> In the case where the user changes their mind it should add 1 to "ThumbsDown" and subtract 1 from "ThumbsUp"
[12:33:17] <Cubud> The code to do this is obvious, but without a transaction it makes less sense to me
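Roughly the updates being described, each a single atomic operation on the video document (field names are Cubud's; videoId is a placeholder):

```javascript
// First vote on the video: one atomic increment.
db.videos.update({ _id: videoId }, { $inc: { ThumbsUp: 1 } });

// Changed mind (up -> down): both counters move in one atomic
// update, but the separate write to the raw-votes collection is
// what can't be bundled with it transactionally.
db.videos.update({ _id: videoId },
                 { $inc: { ThumbsUp: -1, ThumbsDown: 1 } });
```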
[12:33:46] <Cubud> I am trying to understand how to make sure it is correct "eventually" without having to rebuild everything repeatedly when it doesn't need it
[12:35:05] <Cubud> I was under the impression that MapReduce was the solution, and that the results are incrementally updated for successive queries after an update to the core data
[12:35:51] <remonvv> Cubud, m/r is not needed and not even a practical solution for this.
[12:36:32] <Cubud> I'd like to have some alternative suggestions, it will help me change my line of thought
[12:36:33] <remonvv> You can either do atomic updates directly on your records or perform updates based on app-server local aggregation (e.g. collect all votes every second and update at most once per second to the database).
[12:39:09] <remonvv> if it's application critical that changing from a thumbs up to a thumbs down is accurate you will have to store the thumbs up/down per user. If it isn't you can maintain an app-server in-memory cache to determine if a vote is new or a change of mind.
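A sketch of that second option, app-server local aggregation; everything here (the pending map, the flush timer) is illustrative:

```javascript
// Accumulate vote deltas in memory, flushing to the database at
// most once per second as single $inc updates per video.
var pending = {}; // videoId -> { up: n, down: n }

function recordVote(videoId, isUp) {
  var p = pending[videoId] || (pending[videoId] = { up: 0, down: 0 });
  if (isUp) { p.up++; } else { p.down++; }
}

function flush() {
  for (var id in pending) {
    db.videos.update({ _id: id },
                     { $inc: { ThumbsUp: pending[id].up,
                               ThumbsDown: pending[id].down } });
  }
  pending = {};
}
// call flush() from a ~1s timer on the app server
```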
[12:39:49] <Cubud> I don't mind it being wrong for a while
[12:40:52] <Cubud> I thought I would need (A) A collection for user thumb actions; (B) A collection for the latest action against a video; (C) The video itself
[12:42:49] <Cubud> So for example I would record that Cubud voted Up on video1 on a certain date/time
[12:43:47] <Cubud> Then another bit of code would update the latestuservotes collection by doing an upsert where user=Cubud, video=Video1, and LastThumb NE new thumb
[12:44:00] <Cubud> so it will only change it if the user has changed their mind
[12:44:59] <Cubud> which would also update a "LastModified" date perhaps
[12:45:55] <Cubud> and then I could update videos where the video ThumbsLastUpdated < latestuservotes.LastModified
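Putting Cubud's scheme into shell form (collection and field names are his; the unique index is an added assumption that keeps the "no change" case from upserting a duplicate row):

```javascript
// 1. Append the raw action to the log of user thumb actions.
db.uservotes.insert({ user: "Cubud", video: "video1",
                      thumb: "up", at: new Date() });

// A unique index on user+video: when the thumb hasn't changed,
// the query below matches nothing and the upsert's insert attempt
// fails on the index instead of creating a duplicate.
db.latestuservotes.ensureIndex({ user: 1, video: 1 }, { unique: true });

// 2. Upsert the latest vote only if the user changed their mind,
//    bumping LastModified as we go.
db.latestuservotes.update(
  { user: "Cubud", video: "video1", LastThumb: { $ne: "up" } },
  { $set: { LastThumb: "up", LastModified: new Date() } },
  { upsert: true }
);

// 3. A background pass would then refresh counters on videos whose
//    ThumbsLastUpdated is older than the newest LastModified.
```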
[13:32:41] <Signum> Does anyone know how to access (download) files from GridFS via rockmongo? The rockmongo documentation claims that this is possible but I'm too stubborn to find the actual option.
[13:33:37] <remonvv> Cubud, m/r makes use of the JavaScript context which is single threaded.
[13:33:53] <remonvv> Hence suggesting to completely ignore all JS based functionality for anything involving production systems ;)
[13:34:00] <remonvv> If you want proper m/r you can use Hadoop
[13:34:13] <remonvv> But again m/r is not a fit for your requirements.
[13:34:33] <remonvv> Also, the native m/r alternative for 2.2+ is the Aggregation Framework (AF) which is better.
[13:36:32] <remonvv> An aggregation alternative, not an m/r alternative.
[13:56:35] <Rhaven> I'm struggling with some high latency since I added a new shard to my cluster, especially with the moveChunk command that the balancer runs every time it needs to restore the balance of data between shards. I tried to define a balancing window but nothing changed. Is there a way to prevent this latency?
[14:18:45] <remonvv> Is the latency specific to the new shard?
[14:35:04] <Neptu> hey, I have a serious problem with the c++ driver... I get a boost assertion on a lock when I break the replica set while it's reading...
[14:43:42] <hdm> happens with the official 2.2 client too
[16:00:28] <konr_trab> I'm new to mongo, and in a .txt file, I've got an array of elements I'd like to put into a new collection in mongo. How can I do so?
[16:10:47] <doryexmachina> Hey folks, hit an issue with bson/ruby, and I'm hoping someone can point me in the right direction. I'm a Python guy, and I'm trying to get a Ruby project up and talking to Mongo. My Mongo install is fine locally, and I can talk to Mongo via Pymongo easily. However, via Ruby, I keep getting this:
[16:10:48] <doryexmachina> `serialize_number_element': MongoDB can only handle 8-byte ints (RangeError)
[16:10:55] <konr_trab> or for that matter, how to read a text file
[16:11:02] <doryexmachina> Has anyone hit this, and/or can point me in the right place to check it out?
[16:17:58] <IAD> Is it possible to sort by an array? I have an array of sphinx_ids...
[16:33:40] <konr_trab> of course, to read a file, do mongo < file!
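For loading data rather than running scripts, mongoimport is often simpler (e.g. `mongoimport -d test -c items --jsonArray --file elements.txt` for a file holding a JSON array). A shell alternative, assuming the .txt file can be evaluated as JavaScript that defines an `elements` array (all names here are illustrative):

```javascript
// elements.txt contains something like:  var elements = [ ... ];
load("elements.txt");               // evaluate the file in the shell
elements.forEach(function (e) {     // one document per array element
  db.items.insert(e);
});
```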
[17:30:47] <stanislav> Hello, I am having this problem with PyMongo+gevent recently. Once a day my connection count rises from 200 to 600-800 and MongoDB stops responding to new connections and the Python processes basically die. I am completely unsure why this is happening and if anyone has any ideas it would be greatly appreciated.
[17:46:42] <q_> is it bad to call a map reduce from within a db.eval () ? Technically, I'm calling a stored function that calls a map reduce
[17:54:22] <therealkoopa> How bad is it to use the ObjectId for a record as a public link or key? For instance, having a url endpoint: http://localhost:3000/foo/<Object Id>
[17:55:08] <therealkoopa> Should all user-visible ids be randomly generated, separate from the mongo objectid? I read that mongo objectids are predictable and use a known algorithm, so you can find out information from the id itself.
[17:56:44] <NodeX> therealkoopa : what does it matter if someone can guess one?
[17:57:42] <therealkoopa> I'm not sure it does matter. I'm just wondering if it's a bad idea to use the mongo objectid for url endpoints.
[18:19:38] <NodeX> they are not the best looking things in the world in terms of URLs but they suffice
[18:24:09] <cure_> is there a sample collection for mongodb ?
[19:03:55] <bhosie> i'm trying to work with the aggregation framework and group docs together by day/month, but i'm running into an issue. How can I group by day and month of an ISODate() field? my second example returns false here: http://pastie.org/private/s1hmqprdybyinlzdvfl2yg
[19:15:24] <bhosie> yeah they work fine separately (in previous pastie)
[19:16:06] <NodeX> it might be expected behaviour, you should file a jira if you would like to see it changed
[19:16:33] <bhosie> so i guess second question would be what is the correct way (or alternate way) to do this?
[20:32:00] <bhosie> NodeX: just wanted to follow up. this gives me the result i was looking for. thanks for taking a look earlier: http://pastie.org/private/undr4duwy809saijkw4qnw
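The pasties are private, but the usual shape of that pipeline looks like the sketch below; the collection name (events) and date field (ts) are assumptions:

```javascript
// Extract day/month from the ISODate with $project, then group
// on the compound key and count documents per (month, day).
db.events.aggregate([
  { $project: { day:   { $dayOfMonth: "$ts" },
                month: { $month: "$ts" } } },
  { $group: { _id: { month: "$month", day: "$day" },
              count: { $sum: 1 } } }
]);
```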
[21:03:09] <Signum> Any rockmongo users here who can tell me how to access GridFS files? The docs claim that files and chunks can be accessed from RockMongo but I fail to see how.
[21:37:44] <toddnine> Hi all. Is there a way to run the smoketests against an existing mongo instance?
[21:53:52] <toddnine> I'm trying to run something like all.js, but against an existing db within a mongo instance
[22:34:09] <patrickod> is it possible to store aggregation documents in a temp collection so that I can get past the 16mb document size limit?
[22:34:33] <patrickod> the dataset I'm using is apparently too large for the aggregation that I have to run effectively
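No answer came in channel; at the time, one workaround was mapReduce, which (unlike the 2.2 aggregation framework, whose result comes back as one size-limited document) can write its output to a collection. A hypothetical sketch; the collection, key field, and output name are all assumptions:

```javascript
// Aggregate into a named output collection instead of an inline
// result document, sidestepping the 16MB result limit.
db.events.mapReduce(
  function () { emit(this.key, 1); },              // map
  function (k, vals) { return Array.sum(vals); },  // reduce
  { out: "event_counts" }                          // output collection
);
db.event_counts.find();
```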