[00:41:03] <hdm> using a sharded setup, sharding definitely working, i see all of my inserts under the "notSharded" section of db.collection.stats() [ under ops ], is that right? the shard key is not random and chunk splits should lead to more even distribution of inserts
[00:46:46] <mrpro> is it possible to save to a collection only if a record doesn't exist, based on some criteria?
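No answer came in channel, but a minimal shell sketch of the usual patterns, assuming a hypothetical users collection keyed by email: either enforce the criteria with a unique index so plain inserts fail on duplicates, or use an upsert (which inserts only when nothing matches, though it also updates on a match).

```javascript
// Insert-only-if-missing via a unique index: the second insert
// fails with a duplicate key error instead of creating a record.
db.users.ensureIndex({ email: 1 }, { unique: true });
db.users.insert({ email: "a@example.com", name: "A" }); // inserted
db.users.insert({ email: "a@example.com", name: "B" }); // dup key error

// Or an upsert: inserts when no document matches the criteria,
// but note it updates the existing document when one does match.
db.users.update({ email: "b@example.com" },
                { $set: { name: "B" } },
                { upsert: true });
```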
[01:19:46] <hdm> fyi to any 10gen folks, the mms agent is causing problems: 29138 root 20 0 4862m 1.4g 4308 S 140 0.7 25:52.42 python
[01:20:13] <hdm> each offer could have its version at top level, or versions as sub-docs
[01:20:17] <unknet> i need to make references to product versions really
[01:20:28] <unknet> not the current product version
[01:20:36] <hdm> sounds like you need relational even for products then
[01:21:06] <hdm> if you are set on document model and want single queries, you can look at something like riak's "links" (but it's much slower than mongo imo)
[01:26:00] <hdm> unknet: yes, so expire them and delete via bg jobs
[01:26:10] <hdm> delete where user.offers.offer_expire < todays_date
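A minimal shell sketch of that cleanup, assuming offers live as sub-documents in an offers array with an offer_expire date (collection and field names are illustrative):

```javascript
// Background job: pull expired offers out of every user document.
db.users.update(
  {},
  { $pull: { offers: { offer_expire: { $lt: new Date() } } } },
  { multi: true }
);
```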
[01:26:16] <unknet> it could be another solution yes
[01:26:34] <unknet> to put a limit on offer number
[01:26:54] <Oddman> unknet, 16mb is not a hard limit
[01:27:04] <hdm> if you think of each offer being a unique combination of product + expiration + values for a user, you can stick it into each user's doc
[01:30:40] <Oddman> mongoid is delicious for ruby/rails peeps
[01:31:04] <unknet> hdm why can't i have an embedded model with a belongs_to relation in mongoid?
[01:31:07] <rubydog> Oddman: hdm, just looking at it :)
[01:31:19] <hdm> video of a hurried/crappy version of the research ^ http://www.irongeek.com/i.php?page=videos/bsideslasvegas2012/1.1.4-hd-moore-empirical-exploitation&mode=print
[01:31:32] <Oddman> unknet, it's defined differently
[01:31:47] <hdm> basically scanning the internet, constantly, for months
[01:33:22] <Oddman> it refers to them as embedded_in or embeds_many
[01:33:34] <hdm> rubydog: fwiw, plain mongo was best for standalone stuff that needed to do large-scale updates/queries, orm models suck at upserts/multi-update
[07:57:13] <Rhaven> I'm struggling with some high latency since I added a new shard to my cluster, especially with the moveChunk command that the balancer runs every time it needs to restore the balance of data between shards. I tried to define a balancing window but nothing changed. Is there a way to prevent this latency?
[08:20:04] <remonvv> Anyone any idea what this is : Chunk map pointed to incorrect chunk
[09:50:10] <Signum> Hi. I'm trying to introduce GridFS file storage in MongoDB in a web application here. I'm just not sure how to refer to the files from the JSON documents. Is GridFS something different than regular collections? Do I just refer to them by filename? Or can I embed files somehow in JSON documents?
[09:50:23] <Signum> Btw... the documentation part of the MongoDB home page seems broken.
[10:03:01] <NodeX> one collection is the file metadata, the other is the chunks of the files
[10:03:24] <NodeX> inside the files collection you can use metadata to identify / tag your files
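For reference, a sketch of what that looks like in the shell; the metadata fields shown (package, uploader_ip) are illustrative, not built in:

```javascript
// GridFS keeps one document per file in fs.files (filename, length,
// md5, uploadDate, plus a free-form "metadata" sub-document) and
// the binary content split across documents in fs.chunks.
db.fs.files.findOne();
// { _id: ObjectId("..."), filename: "shot.png", length: 12345,
//   chunkSize: 262144, uploadDate: ISODate("..."), md5: "...",
//   metadata: { package: "bash", uploader_ip: "..." } }

// Query files by your own metadata:
db.fs.files.find({ "metadata.package": "bash" });
```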
[10:05:10] <Signum> NodeX: Thanks. So GridFS comes with that metadata collection by default which I can extend by putting more values into it?
[10:06:07] <Signum> NodeX: My application is screenshots.debian.net (currently based on PostgreSQL and files on disk) which I'd like to rewrite and also use MongoDB. My idea was to connect the screenshot PNG files directly with the packages so that if a package got deleted the files are removed automatically.
[10:38:55] <NodeX> Signum : sorry, I got distracted
[10:40:03] <NodeX> you can't connect (JOIN) anything in mongo, so your app would have to save perhaps a package ID with a screenshot and if/when a package is deleted then delete the files too
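A sketch of that app-side cascade, under the same illustrative metadata.package convention (drivers usually offer a GridFS delete helper that removes both the file document and its chunks; this is the manual equivalent):

```javascript
// When a package goes away, remove its screenshot files too.
db.fs.files.find({ "metadata.package": "bash" }).forEach(function (f) {
  db.fs.chunks.remove({ files_id: f._id }); // binary chunks first
  db.fs.files.remove({ _id: f._id });       // then the file document
});
db.packages.remove({ name: "bash" });
```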
[11:48:52] <Signum> NodeX: sorry, I got disconnected from my screen session and didn't even notice. :)
[11:49:15] <Signum> NodeX: Okay, thanks. So there's nothing like a cascade and I'll have to deal with that application-wise.
[11:50:40] <NodeX> I don't know what cascade is, sorry
[11:57:00] <Signum> NodeX: In relational databases cascades mean that related records are updated or removed automatically.
[11:57:42] <kali> Signum: yep, no cascade nor trigger, you need to do that app-side
[11:57:43] <Signum> NodeX: In MongoDB there seem to be DBrefs but they don't delete related documents automatically.
[11:58:29] <kali> Signum: dbref is just a convention or a protocol. mongodb does not do anything clever with them
[11:58:59] <Signum> So basically I'd have a database with a "packages" collection and a "screenshots" collection. For each package I'd store several screenshots. And I could extend the metadata JSON of a screenshot by data like "uploader-IP-address".
[12:22:47] <Cubud> I am just looking into MongoDB to see if it will be suitable for what I need
[12:22:55] <Cubud> The closest thing I can think of is YouTube
[12:23:32] <Cubud> I am currently under the impression that I would have a collection "uservotes" which would record the user + videoID + thumb up/down
[12:23:33] <NodeX> Signum : Mongo is not SQL - when you wrap your way of thinking around that you will understand it a lot more
[12:23:55] <Cubud> And then do a map/reduce to work out the current votes for a video. Am I right so far?
[12:24:18] <NodeX> better to keep a counter of votes
[12:24:36] <Cubud> That makes sense to me, except I cannot have transactions across collections
[12:24:48] <NodeX> you can't have transactions period
[12:24:49] <Cubud> I would otherwise imagine doing this
[12:25:10] <Cubud> Well, update to a single document is atomic, and a batch update I think is atomic too
[12:25:16] <Cubud> that's what I mean by "transactions" :)
[12:26:25] <Cubud> So if I have a single source of data for thumbs "uservotes" then that would only require a single atomic operation, but if that also updates totals in a video then they could easily become out of sync
[12:27:21] <NodeX> yes because that's a "transaction" :)
[12:27:40] <Cubud> Yeah, but I want to understand the MongoDB solution to the same problem
[12:27:51] <NodeX> you could bake them into your app by counting the number of votes before and after
[12:27:54] <Cubud> How would I implement such an app with Mongo
[12:28:05] <NodeX> if you want true transactions then dont use mongo
[12:28:16] <Signum> NodeX: Right, I'm getting into nosql slowly. I have a demo implementation of my data storage in CouchDB but for several reasons (missing data types, revisions that blow up the database, missing search features) I stopped following that approach.
[12:28:28] <NodeX> if you realise that a vote either way means nothing then it doesn't matter
[12:28:39] <Cubud> I am not bothered about transactions so much as I just want to make sure that the video ratings don't become permanently incorrect
[12:29:29] <NodeX> Cubud : you would probably write a sanity script then, to run once every X, that counts the votes and compares them to the number in the counter field
[12:29:38] <Cubud> You see I have been reading a Manning book called something like "Big Data" (I don't recall) and it suggests that I record all user actions
[12:29:45] <Cubud> PersonX voted video Y with a thumb up
[12:29:49] <NodeX> it won't tell you which ones didn't transact but it will tell you there is a mismatch
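A sketch of such a sanity script, using the collection and field names from this conversation (uservotes, ThumbsUp) as assumptions:

```javascript
// Recount raw up-votes per video and compare to the cached counter.
db.videos.find().forEach(function (v) {
  var actual = db.uservotes.count({ video: v._id, thumb: "up" });
  if (actual !== v.ThumbsUp) {
    print("mismatch on " + v._id +
          ": counter=" + v.ThumbsUp + " actual=" + actual);
  }
});
```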
[12:29:58] <Cubud> 2 days later - PersonX voted video Y with a thumb down
[12:30:29] <NodeX> Cubud : I track every user interaction on my sites by default so I can always go back to it should I need to
[12:30:55] <Cubud> Yes that is what it suggests, so then if the data ended up screwed up due to a coding error it can be corrected
[12:31:46] <Cubud> But I am struggling to apply this idea to a youtube type app requirement
[12:31:48] <NodeX> I do it to find out how users interact with my apps but that is an added bonus
[12:32:21] <Cubud> In the case where it is the first thumb on the video it should just add 1 to "ThumbsUp"
[12:32:42] <Cubud> In the case where the user changes their mind it should add 1 to "ThumbsDown" and subtract 1 from "ThumbsUp"
[12:33:17] <Cubud> The code to do this is obvious, but without a transaction it makes less sense to me
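Roughly the updates being described, each a single atomic operation on the video document (field names are Cubud's; videoId is a placeholder):

```javascript
// First vote on the video: one atomic increment.
db.videos.update({ _id: videoId }, { $inc: { ThumbsUp: 1 } });

// Changed mind (up -> down): both counters move in one atomic
// update, but the separate write to the raw-votes collection is
// what can't be bundled with it transactionally.
db.videos.update({ _id: videoId },
                 { $inc: { ThumbsUp: -1, ThumbsDown: 1 } });
```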
[12:33:46] <Cubud> I am trying to understand how to make sure it is correct "eventually" without having to rebuild everything repeatedly when it doesn't need it
[12:35:05] <Cubud> I was under the impression that MapReduce was the solution, and that the results are incrementally updated for successive queries after an update to the core data
[12:35:51] <remonvv> Cubud, m/r is not needed and not even a practical solution for this.
[12:36:32] <Cubud> I'd like to have some alternative suggestions, it will help me change my line of thought
[12:36:33] <remonvv> You can either do atomic updates directly on your records or perform updates based on app-server local aggregation (e.g. collect all votes every second and update at most once per second to the database).
[12:39:09] <remonvv> if it's application critical that changing from a thumbs up to a thumbs down is accurate you will have to store the thumbs up/down per user. If it isn't you can maintain an app-server in-memory cache to determine if a vote is new or a change of mind.
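A sketch of that second option, app-server local aggregation; everything here (the pending map, the flush timer) is illustrative:

```javascript
// Accumulate vote deltas in memory, flushing to the database at
// most once per second as single $inc updates per video.
var pending = {}; // videoId -> { up: n, down: n }

function recordVote(videoId, isUp) {
  var p = pending[videoId] || (pending[videoId] = { up: 0, down: 0 });
  if (isUp) { p.up++; } else { p.down++; }
}

function flush() {
  for (var id in pending) {
    db.videos.update({ _id: id },
                     { $inc: { ThumbsUp: pending[id].up,
                               ThumbsDown: pending[id].down } });
  }
  pending = {};
}
// call flush() from a ~1s timer on the app server
```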
[12:39:49] <Cubud> I don't mind it being wrong for a while
[12:40:52] <Cubud> I thought I would need (A) A collection for user thumb actions; (B) A collection for the latest action against a video; (C) The video itself
[12:42:49] <Cubud> So for example I would record that Cubud voted Up on video1 on a certain date/time
[12:43:47] <Cubud> Then another bit of code would update the latestuservotes collection by doing an upsert where user=Cubud, video=Video1, and LastThumb NE new thumb
[12:44:00] <Cubud> so it will only change it if the user has changed their mind
[12:44:59] <Cubud> which would also update a "LastModified" date perhaps
[12:45:55] <Cubud> and then I could update videos where the video ThumbsLastUpdated < latestuservotes.LastModified
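Putting Cubud's scheme into shell form (collection and field names are his; the unique index is an added assumption that keeps the "no change" case from upserting a duplicate row):

```javascript
// 1. Append the raw action to the log of user thumb actions.
db.uservotes.insert({ user: "Cubud", video: "video1",
                      thumb: "up", at: new Date() });

// A unique index on user+video: when the thumb hasn't changed,
// the query below matches nothing and the upsert's insert attempt
// fails on the index instead of creating a duplicate.
db.latestuservotes.ensureIndex({ user: 1, video: 1 }, { unique: true });

// 2. Upsert the latest vote only if the user changed their mind,
//    bumping LastModified as we go.
db.latestuservotes.update(
  { user: "Cubud", video: "video1", LastThumb: { $ne: "up" } },
  { $set: { LastThumb: "up", LastModified: new Date() } },
  { upsert: true }
);

// 3. A background pass would then refresh counters on videos whose
//    ThumbsLastUpdated is older than the newest LastModified.
```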
[13:32:41] <Signum> Does anyone know how to access (download) files from GridFS via rockmongo? The rockmongo documentation claims that this is possible but I'm too stubborn to find the actual option.
[13:33:37] <remonvv> Cubud, m/r makes use of the JavaScript context which is single threaded.
[13:33:53] <remonvv> Hence suggesting to completely ignore all JS based functionality for anything involving production systems ;)
[13:34:00] <remonvv> If you want proper m/r you can use Hadoop
[13:34:13] <remonvv> But again m/r is not a fit for your requirements.
[13:34:33] <remonvv> Also, the native m/r alternative for 2.2+ is the Aggregation Framework (AF) which is better.
[13:36:32] <remonvv> An aggregation alternative, not an m/r alternative.
[13:56:35] <Rhaven> I'm struggling with some high latency since I added a new shard to my cluster, especially with the moveChunk command that the balancer runs every time it needs to restore the balance of data between shards. I tried to define a balancing window but nothing changed. Is there a way to prevent this latency?
[14:18:45] <remonvv> Is the latency specific to the new shard?
[14:35:04] <Neptu> hey, I have a serious problem with the c++ driver... I get a boost assertion on a lock when I break the replica set while it's reading...
[14:43:42] <hdm> happens with the official 2.2 client too
[16:00:28] <konr_trab> I'm new to mongo, and in a .txt file, I've got an array of elements I'd like to put into a new collection in mongo. How can I do so?
[16:10:47] <doryexmachina> Hey folks, hit an issue with bson/ruby, and I'm hoping someone can point me in the right direction. I'm a Python guy, and I'm trying to get a Ruby project up and talking to Mongo. My Mongo install is fine locally, and I can talk to Mongo via Pymongo easily. However, via Ruby, I keep getting this:
[16:10:48] <doryexmachina> `serialize_number_element': MongoDB can only handle 8-byte ints (RangeError)
[16:10:55] <konr_trab> or for that matter, how to read a text file
[16:11:02] <doryexmachina> Has anyone hit this, and/or can point me in the right place to check it out?
[16:17:58] <IAD> Is it possible to sort by an array? I have an array of sphinx_ids...
[16:33:40] <konr_trab> of course, to read a file, do mongo < file!
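For loading data rather than running scripts, mongoimport is often simpler (e.g. `mongoimport -d test -c items --jsonArray --file elements.txt` for a file holding a JSON array). A shell alternative, assuming the .txt file can be evaluated as JavaScript that defines an `elements` array (all names here are illustrative):

```javascript
// elements.txt contains something like:  var elements = [ ... ];
load("elements.txt");               // evaluate the file in the shell
elements.forEach(function (e) {     // one document per array element
  db.items.insert(e);
});
```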
[17:30:47] <stanislav> Hello, I am having this problem with PyMongo+gevent recently. Once a day my connection count rises from 200 to 600-800 and MongoDB stops responding to new connections and the Python processes basically die. I am completely unsure why this is happening and if anyone has any ideas it would be greatly appreciated.
[17:46:42] <q_> is it bad to call a map reduce from within a db.eval () ? Technically, I'm calling a stored function that calls a map reduce
[17:54:22] <therealkoopa> How bad is it to use the ObjectId for a record as a public link or key? For instance, having a url endpoint: http://localhost:3000/foo/<Object Id>
[17:55:08] <therealkoopa> Should all user-visible ids be randomly generated, separate from the mongo objectid? I read that mongo objectids are predictable and use a known algorithm, so you can find out information from the id itself.
[17:56:44] <NodeX> therealkoopa : what does it matter if someone can guess one?
[17:57:42] <therealkoopa> I'm not sure it does matter. I'm just wondering if it's a bad idea to use the mongo objectid for url endpoints.
[18:19:38] <NodeX> they are not the best looking things in the world in terms of URLs but they suffice
[18:24:09] <cure_> is there a sample collection for mongodb ?
[19:03:55] <bhosie> i'm trying to work with the aggregation framework and group docs together by day/month, but i'm running into an issue. How can I group by day and month of an ISODate() field? my second example returns false here: http://pastie.org/private/s1hmqprdybyinlzdvfl2yg
[19:15:24] <bhosie> yeah they work fine separately (in previous pastie)
[19:16:06] <NodeX> it might be expected behaviour, you should file a jira if you would like to see it changed
[19:16:33] <bhosie> so i guess second question would be what is the correct way (or alternate way) to do this?
[20:32:00] <bhosie> NodeX: just wanted to follow up. this gives me the result i was looking for. thanks for taking a look earlier: http://pastie.org/private/undr4duwy809saijkw4qnw
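The pasties are private, but the usual shape of that pipeline looks like the sketch below; the collection name (events) and date field (ts) are assumptions:

```javascript
// Extract day/month from the ISODate with $project, then group
// on the compound key and count documents per (month, day).
db.events.aggregate([
  { $project: { day:   { $dayOfMonth: "$ts" },
                month: { $month: "$ts" } } },
  { $group: { _id: { month: "$month", day: "$day" },
              count: { $sum: 1 } } }
]);
```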
[21:03:09] <Signum> Any rockmongo users here who can tell me how to access GridFS files? The docs claim that files and chunks can be accessed from RockMongo but I fail to see how.
[21:37:44] <toddnine> Hi all. Is there a way to run the smoketests against an existing mongo instance?
[21:53:52] <toddnine> I'm trying to run something like all.js, but against an existing db within a mongo instance
[22:34:09] <patrickod> is it possible to store aggregation documents in a temp collection so that I can get past the 16mb document size limit?
[22:34:33] <patrickod> the dataset I'm using is apparently too large for the aggregation that I have to run effectively
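No answer came in channel; at the time, one workaround was mapReduce, which (unlike the 2.2 aggregation framework, whose result comes back as one size-limited document) can write its output to a collection. A hypothetical sketch; the collection, key field, and output name are all assumptions:

```javascript
// Aggregate into a named output collection instead of an inline
// result document, sidestepping the 16MB result limit.
db.events.mapReduce(
  function () { emit(this.key, 1); },              // map
  function (k, vals) { return Array.sum(vals); },  // reduce
  { out: "event_counts" }                          // output collection
);
db.event_counts.find();
```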