PMXBOT Log file Viewer


#mongodb logs for Thursday the 1st of November, 2012

[00:46:06] <therealkoopa> What's the best datatype for an encrypted object? Store it as a string? Or as binary?
[01:15:08] <skot> binary
[01:16:26] <Determinist> octal
[02:30:12] <JoseffB> heyall
[02:30:16] <JoseffB> anyone here to help?
[02:31:35] <JoseffB> ne1 here?
[02:43:00] <hdm> JoseffB: try asking a question
[02:43:10] <JoseffB> hey
[02:43:18] <JoseffB> yea, trying might help
[02:43:20] <JoseffB> :)
[02:43:35] <JoseffB> trying to add an embedded doc to my doc via php
[02:43:41] <JoseffB> schema looks like this:
[02:44:08] <JoseffB> -object->ratings->{..}
[02:44:16] <JoseffB> where {..} are arrays per rating
[02:44:27] <JoseffB> I tried save
[02:44:45] <JoseffB> but it adds the result under the object level
[02:45:40] <JoseffB> actually it saves right but to a new record altogether
[02:45:51] <JoseffB> instead of appending that existing record
[02:47:43] <hdm> sorry, not familiar with the PHP api for it; in ruby's driver you use update or upserts
[02:47:55] <JoseffB> have the same in php
[02:48:02] <JoseffB> I just dont know how to format it right
[02:48:18] <IAD> JoseffB: use $set
[02:49:17] <JoseffB> set to add a new array ?
[02:50:26] <IAD> och, embedded. $addToSet http://www.mongodb.org/display/DOCS/Updating#Updating-%24addToSetand%24each
[02:53:52] <JoseffB> ok so it updates the array to the root
[02:54:06] <JoseffB> so now _object and ratings are peers
[02:54:14] <JoseffB> but ratings should be under _object
[02:54:34] <JoseffB> when I try to fix the data array to include that top array I get this error
[02:54:45] <JoseffB> Cannot apply $addToSet modifier to non-array
[02:55:09] <Determinist> If I try to pull the users being followed by 1234, this doesn't really work.
[02:56:16] <Determinist> If I have to maintain names for both the follower userID and the userID of the user being followed, that means I'll have to keep updating the names in a bunch of docs when a user changes their name.
[02:56:24] <IAD> JoseffB: please paste your doc somewhere
[02:56:26] <Determinist> any recommendations?
[02:58:12] <JoseffB> IAD: https://gist.github.com/3991354
[02:58:35] <JoseffB> trying to add a new array under the ratings branch
[02:58:48] <IAD> Determinist: yep, you should
[02:59:10] <JoseffB> shoot deleted by accident
[02:59:33] <Determinist> IAD: i should what?
[02:59:59] <JoseffB> IAD: https://gist.github.com/3991365
[03:00:09] <JoseffB> trimmed it down to the ratings branch
[03:02:36] <IAD> Determinist: you have to keep the key values
[03:04:12] <Determinist> IAD: you mean embed the names into the followers collection? What happens if a single user is following 6000 users and that user changes his name? That means an update command for 6000 documents. Sounds like a bad recipe for scaling? Unless, of course there's no way around that...
[03:06:37] <JoseffB> I dont know why addtoset says object is a non-array
[03:06:41] <IAD> Determinist: you can use the _id field for link users (not username). it will be a little easier
[03:07:18] <Determinist> IAD: still doesn't alleviate the main bottlenecks of having to do massive updates, but what can ya do.
[03:09:55] <IAD> JoseffB: show the PHP code to update
[03:10:33] <JoseffB> updated with php
[03:10:55] <JoseffB> that gives error of non-array
[03:12:39] <IAD> Determinist: perhaps these requests are rare
[03:13:10] <Determinist> IAD: perhaps. scaling 101 - plan for the worst, hope for the best.
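A minimal shell sketch of the id-reference approach IAD is suggesting (collection and field names here are invented for illustration, not taken from the log): the follow relationship stores only _id values, so renaming a user touches exactly one document no matter how many follows point at them.

    var aliceId = ObjectId(), bobId = ObjectId();
    db.users.insert({ _id: aliceId, name: "alice" });
    db.users.insert({ _id: bobId, name: "bob" });
    // the follow relationship stores ids only, never names
    db.follows.insert({ follower: aliceId, followed: bobId });
    // a rename is a single update, regardless of follower count
    db.users.update({ _id: bobId }, { $set: { name: "robert" } });

The cost moves to read time: displaying a follower list means looking names up in users by the collected ids.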
[03:15:53] <IAD> JoseffB: you just want to add new {array} into "records" ?
[03:16:03] <JoseffB> into ratings yea
[03:16:31] <JoseffB> so like there should be _object->ratings->newarray
[03:18:34] <IAD> it will be like: db.[your_collection].update({_id : your_id}, {$set : {'ratings' : {$push : new_data_array}}})
[03:21:25] <JoseffB> ahh missed the push!
[03:22:09] <IAD> and may be it will work without $set
[03:26:39] <JoseffB> says not okForStorage
[03:26:42] <JoseffB> as an error
[03:27:03] <JoseffB> if I take away the set it overwrites every record in the table/collection
[03:27:03] <JoseffB> lol
[03:27:37] <JoseffB> $r = $ratingRec->update(array('_id' => new MongoId($pid)),array('$set' => array('ratings' => array('$push' => $obj))));
[03:27:49] <JoseffB> thats what I used and I took the object and ratings off the data
[03:34:47] <IAD> JoseffB: may be it will be helpful: http://stackoverflow.com/questions/10420635/mongodb-php-how-to-push-item-to-array-with-a-spcecific-key
[03:35:23] <IAD> it will be more convenient with the key
[03:37:59] <JoseffB> works but doesn't append
[03:38:07] <JoseffB> it completely replaces all the other ratings
[03:38:16] <JoseffB> in every other record as well
[03:40:27] <IAD> You can use a save with full overwrite: $a= $collection->findOne($where); $a['ratings'][] = $new_array; $collection->save($a);
[03:40:46] <JoseffB> figured it out
[03:40:58] <JoseffB> $r = $ratingRec->update(
[03:40:59] <JoseffB> array('_id' => new MongoId($pid)),
[03:40:59] <JoseffB> array( '$addToSet' => array( '_object.ratings' => $obj ))
[03:40:59] <JoseffB> );
[03:41:09] <JoseffB> need to use . notiation
[03:41:13] <JoseffB> notation
[03:41:34] <JoseffB> IAD you're awesome
[03:41:39] <IAD> och, I forgot that you use _object
[03:41:40] <JoseffB> thanks for putting me on the right track
[03:41:54] <JoseffB> now need to figure out how to edit this
[03:41:55] <JoseffB> :)
[03:43:37] <JoseffB> its just set instead of addtoset right?
[03:47:04] <IAD> Promise me to think about "id" in embedded documents =)
[03:47:33] <JoseffB> to add an id inside
[03:47:35] <JoseffB> good idea
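For reference, JoseffB's working PHP update translated to the shell, plus the edit he asks about next: dot notation reaches into the embedded document, and the positional $ operator updates a matched array element. The collection name and rating fields are made up for this sketch.

    db.things.insert({ _id: 1, _object: { ratings: [] } });
    // append a rating under _object.ratings (JoseffB's $addToSet fix)
    db.things.update({ _id: 1 },
                     { $addToSet: { "_object.ratings": { user: "u1", score: 5 } } });
    // editing an existing rating later: $set with a dotted/positional path
    db.things.update({ _id: 1, "_object.ratings.user": "u1" },
                     { $set: { "_object.ratings.$.score": 4 } });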
[04:28:19] <JoseffB> ok so now I cant get this to update
[04:52:04] <hdm> i am seeing performance drops on insert for a single instance of mongod, even when writes occur to different databases. these go away when i manually restart mongodb, and dont show up as a higher % of faults (but a higher % of locks)
[04:52:36] <hdm> any ideas why an instance where a new db is created every ~6 hours would spiral down in performance, with similar contents for each db, and inserts only into the latest
[04:53:36] <hdm> performance drops from 12k to 5k to 3k and now 1k per second over 48 hours, but across 7+ dbs, with only the latest being inserted
[04:54:03] <hdm> vsize/mapped shows ALL dbs accounted for, res is harder to tell
[04:54:15] <hdm> ex: 1357 1 0 0 0 2 0 681g 1363g 138g 0 critical_201208:30.5% 0 0|0 0|19 1m 3k 20 23:51:24
[05:08:53] <neiz> anyone familiar with the C# driver? Been trying to retrieve the only document in my collection (.FindOne()), but I cannot seem to coerce it into working
[05:19:26] <tpae> hello
[05:19:47] <tpae> using tailable collection, can i listen for changes in that collection, and execute a command?
[05:24:58] <hdm> tpae: sure, if you write some code that does that
[05:25:31] <tpae> let's say.. i can build a queue using tailable collection?
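A rough shell sketch of the queue idea tpae is describing, assuming a capped collection (tailable cursors only work on capped collections; the collection name and size are placeholders):

    db.createCollection("queue", { capped: true, size: 1048576 });
    var cur = db.queue.find().addOption(DBQuery.Option.tailable)
                             .addOption(DBQuery.Option.awaitData);
    while (cur.hasNext()) {
        var job = cur.next();
        printjson(job);   // "execute a command" for each new document here
    }

The driver-level equivalent (a tailable, await-data cursor in a loop) is the usual pattern; producer inserts show up on the cursor as they arrive.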
[05:29:06] <socket> from what i see, mongo does not require user/password. how does it make sure data isnt hacked?
[05:51:03] <hdm> socket: enable a username/password and ssl
[05:51:13] <hdm> google for mongo security turns it up pretty quick
[05:51:22] <hdm> worst case, ip acls
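Concretely, hdm's first suggestion means starting mongod with --auth and then creating users from the shell; the 2012-era helper is db.addUser. Database names, usernames, and passwords below are placeholders.

    use admin
    db.addUser("siteAdmin", "longRandomPassword")
    use mydb
    db.addUser("appUser", "anotherLongPassword")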
[06:06:53] <holly1> Anybody know any mongodb admin UI's that support batch updates?: http://www.mongodb.org/display/DOCS/Admin+UIs
[06:46:42] <ali__> any ideas on mongodb versioning?
[06:46:57] <ali__> I need to track changes only
[06:54:28] <IAD> ali__: pretend to be a slave, or have a central point of communication with the database
[08:10:47] <socke> hey dudes, im migrating from rdbms to mongo, i have a user system and a gps tracking system that tracks the users (which are cars). when designing a mongo db, should the user system be a part of the tracking system?
[08:10:54] <socke> if so, how?
[08:11:46] <ron> socke: when moving to nosql dbs, your data structure should almost always conform to the way you want to query your data rather than how you want to save it.
[08:12:58] <socke> i see, thanks
[08:14:30] <socke> the thing is, i need to do user authentication first and then start tracking the user. at a later point i'd like to see the path the user took in a given time frame
[10:21:57] <samurai2> hi there, which one is faster : 1. filtering before map/reduce or 2. filtering inside map function? thanks
[10:22:10] <NodeX> before
[10:23:53] <kali> samurai2: make sure you have a look at the aggregation framework too. anything you can do with it will be more efficient than map/reduce
[10:24:08] <NodeX> +1
[10:25:31] <samurai2> kali : but aggregation framework has a limitation of only 16 MB on its result right?
[10:26:24] <kali> samurai2: sure. but if you're generating a result bigger than 16MB, chances are you're abusing mongodb
[10:26:46] <kali> samurai2: mongo is designed for fast and small queries
[10:27:51] <samurai2> kali : and if I also need to filter based on geospatial indexing it won't work as well right?
[10:28:23] <kali> samurai2: i'm not clear on what works and what doesn't with geospatial indexing, i'm not using it
[10:28:41] <NodeX> you can't aggregate geospatial (yet)
[10:28:52] <NodeX> it's coming in the future apparently
[10:30:27] <samurai2> I'm hoping it can directly output the result into a certain collection and also we can decide the output key, not only key and value. :)
[10:30:47] <samurai2> and without output size limitation
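The "filter before" advice carries over to the aggregation framework: put $match at the front of the pipeline so it can use indexes and shrink the data the later stages see. A 2.2-era shell sketch with invented collection and field names:

    db.events.aggregate(
        { $match: { type: "click", ts: { $gte: ISODate("2012-10-01") } } },
        { $group: { _id: "$userId", clicks: { $sum: 1 } } }
    );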
[11:22:23] <topriddy> got this error from a java/morphia app. out of memory. mongodb top suspects http://pastebin.com/mFrgnWJz
[11:22:31] <topriddy> hoping someone is familiar with same
[11:24:08] <kali> topriddy: a quick fix is to increase the heap size with -Xmx
[11:25:14] <kali> topriddy: but you probably need to get some insight about what's going on in your app too
[11:26:12] <kali> topriddy: try -verbosegc, to start, but then you may need something more invasive (try yourkit profiler for instance, there is a trial)
[11:27:12] <kali> topriddy: the fact mongodb appears so much in the stack is not that suspect, a database driver is likely to allocate lots of things
[11:28:21] <topriddy> kali: please do you use the mongodb/morphia/java stack?
[11:28:47] <kali> mongodb and java, not morphia
[11:29:07] <topriddy> kali: how do you skip morphia??
[11:29:25] <kali> topriddy: ... i just call the java driver
[11:30:51] <topriddy> kali: hmm...am suspecting maybe the jvm is simply the problem. it probably needs more heapspace.
[11:31:44] <kali> topriddy: you can try and up Xmx, but if the problem does not go away, you'll have to profile gc and memory usage
[11:32:39] <topriddy> kali: my experience with profiling gc/memory usage is low.
[11:33:18] <kali> topriddy: well, unfortunately, this is the real world of the jvm in production :)
[11:34:29] <topriddy> kali: thanks man. you have been helpful. i sure hope "mongodb not being good enough for production" is not the culprit though
[11:36:28] <kali> topriddy: this is memory consumption in the jvm, not in mongodb. i'm pretty sure there are no hidden dragons in the mongo java driver
[11:38:17] <kali> topriddy: usually this is due to bad application code (like overenthusiastic caching, leading to reference/memory leak) or heap size not big enough for the jvm working set
[12:14:14] <topriddy> kali: how do you even cache explicitly? i'm not doing any caching, and haven't read about it in mongo yet
[12:34:20] <fatninja> I have the var a = "string"; and I want to search all fields that contain a, so I should do something like db.users.find({name: /a/}); But we know that doesn't work, (using server side javascript)
[12:36:45] <NodeX> 1. that won't use an index
[12:37:06] <NodeX> 2. I am not fully sure that mongo reads global variables in its query syntax
[12:39:01] <fatninja> NodeX, I use a driver to communicate to the db. But I think the solution is this: var regex = new RegExp(a, "g"); and {name: regex}
[12:41:35] <NodeX> perhaps, I dont do that sort of thing on the shell so I couldn't comment
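Spelling out the fix fatninja arrives at: build the RegExp from the variable instead of writing the literal /a/ (which matches the letter "a"). An anchored, case-sensitive pattern is also the only form that can use an index, per NodeX's first point.

    var a = "string";
    db.users.find({ name: new RegExp(a) });        // substring match, no index use
    db.users.find({ name: new RegExp("^" + a) });  // prefix match, can use an index on name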
[13:28:19] <NodeX> http://www.guardian.co.uk/technology/2012/nov/01/apple-samsung-statement
[13:28:20] <NodeX> lol
[13:28:42] <NodeX> 'Apple tried to argue that it would take at least 14 days to put a corrective statement on the site – a claim that one judge said he "cannot believe".'
[13:29:17] <NodeX> Apple = 10 year old child who's been told off and is now sulking
[13:45:39] <lapdis> Apple = Geniuses that are gaming the system
[13:45:55] <NodeX> lol
[13:46:11] <robo> hello: right now we're using replica sets and I'm seeing heavy I/O across all our replica set nodes because of the overhead of replication itself, I believe. I'm trying to talk our DBA's into going with shards but they are a bit nervous because of the testing. Anyone have any gems of wisdom to drop on this?
[13:46:38] <lapdis> sry, I believe this channel is dedicated to bashing apple
[13:47:02] <robo> sounds challenging
[13:47:14] <NodeX> or to put it another way.... Apple are overpriced manufacturers of equipment that use open source cores and close them off for their own personal gain
[13:47:45] <robo> yeah, that's a very profitable business model these days
[13:48:07] <NodeX> then they whine when they don't get their own way
[13:48:11] <NodeX> like children :)
[13:48:23] <skot> robo: are you on mms, or do you have mongostat + iostat numbers for your nodes?
[13:48:36] <skot> robo: if so, please post links to them.
[13:48:37] <robo> skot, yes to both
[13:49:27] <robo> skot, https://mms.10gen.com/dashboard/view/4f5f957a87d1d86fa8b8821c#chartLinkMinuteOneHour
[13:50:16] <skot> what time period are you seeing high io?
[13:50:45] <skot> within the last hour?
[13:51:35] <robo> 4am we had a spike. Yesterday at 7am-10am
[13:51:53] <skot> timezone?
[13:51:57] <robo> ET
[13:52:01] <robo> it's 9:49am right now
[13:52:03] <robo> sorry :-)
[13:54:01] <robo> skot, the actual disk i/o graphs are in zenoss
[13:54:11] <robo> let me see if i can get them uploaded
[13:55:22] <robo> skot, http://imgur.com/yQ9av
[13:55:34] <skot> If you look at MMS: https://mms.10gen.com/host/chartReplicaSetHosts/4f5f957a87d1d86fa8b8821c/mongodbprod#chartLinkMinuteTwelveHours
[13:56:15] <skot> you can see that you did lots of writes (updates/inserts) at 3:30am ET
[13:56:36] <skot> and those writes were replicated causing all nodes to do heavy io
[13:56:54] <robo> yeah, about right. They run these import jobs that do massive inserts
[13:56:54] <skot> looks like it lasted 15-20 min
[13:57:17] <robo> skot, how can you tell they are doing inserts by those graphs?
[13:58:10] <skot> In MMS, yes
[13:58:15] <robo> n/m, i see that opcounter graph that shows what it was doing when you hold your mouse over it
[13:58:16] <skot> look at the opcounter
[13:58:22] <skot> yep
[13:58:44] <skot> looks like normal and expected behavior, and disk io.
[13:59:00] <skot> I'd suggest throttling your import if you want it to tax your systems less.
[13:59:09] <robo> so they have to run these to keep our data fresh. The problem with replica sets is that they have to replicate to all nodes. I figure that if you use shards it will cut down on the overall i/o used
[13:59:33] <skot> it will redistribute it, but the same data needs to be written and replicated
[13:59:52] <skot> You can also just add more/faster disks.
[14:00:23] <skot> I wouldn't suggest sharding for this issue, just making the process less impactful or adding more resources to the nodes.
[14:00:32] <robo> the backend data disks are fiber channel luns to EMC VNX
[14:00:53] <robo> probably can't get much faster unless we go with DAS
[14:01:25] <robo> skot, what do you think of sharding overall? I talked to a few people that hit some major bugs with it
[14:01:45] <robo> i think it's a must-do for us because it's going to be hard to scale with replica sets as we put more traffic on it
[14:01:47] <skot> It has its uses but it depends what your problems are.
[14:02:25] <skot> your set doesn't seem to have traffic patterns where a single instance can't handle the load, does it?
[14:02:26] <robo> okay, so first hit problems then think of sharding
[14:02:44] <robo> nope. But we only have about 20% of our traffic going to it right now (slowly moving off oracle.)
[14:03:07] <robo> ugh, brb meeting
[14:03:09] <skot> The replica set looks mostly idle
[14:03:51] <skot> there is probably lots you can optimize in your app/queries/indexes before you need sharding
[14:03:57] <skot> hard to guess without more info
[14:09:05] <kali> throttling imports on a production db is a must, or mongo will eat all it can until it chokes
[14:19:12] <xcat> Can I downgrade to 2.0.0 on Ubuntu?
[14:19:23] <xcat> 2.2.0 I mean
[14:19:26] <xcat> Using the package manager
[14:32:27] <wiseguysonly> I'm a bit unsure how I would do this in Mongo -> SELECT SUM(field1) GROUP BY DAY(date_field). So I get all days available
[14:56:14] <kali> wiseguysonly: look at the aggregation framework
[14:56:36] <wiseguysonly> Hey kali cheers - just having a go with mapreduce now
[14:56:47] <kali> wiseguysonly: AF is more efficient
[14:56:59] <kali> wiseguysonly: and you're right in the use case
[14:57:21] <mrpro> why are writes faster when the table is empty
[14:57:21] <wiseguysonly> I think my current version of mongo is 2.1
[14:57:26] <mrpro> as more and more stuff gets inserted
[14:57:30] <mrpro> % locked increases
[14:57:53] <kali> wiseguysonly: 2.1 is a development version, but latest 2.1 have the AF
[14:58:10] <wiseguysonly> ok, I'll get my head in those docs then :)
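For wiseguysonly's SELECT SUM(field1) GROUP BY DAY(date_field), a 2.2-era aggregation sketch using the date operators (the collection name is invented; the field names follow the SQL in the question):

    db.sales.aggregate(
        { $project: { field1: 1,
                      y: { $year: "$date_field" },
                      m: { $month: "$date_field" },
                      d: { $dayOfMonth: "$date_field" } } },
        { $group: { _id: { y: "$y", m: "$m", d: "$d" },
                    total: { $sum: "$field1" } } }
    );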
[14:58:13] <kali> mrpro: index maintenance is more expensive when its size grows
[14:58:20] <mrpro> thats terrible
[14:58:22] <kali> mrpro: ln(n)
[14:58:25] <mrpro> my tps is not even high
[14:58:32] <mrpro> and then i see queued writes
[14:58:36] <mrpro> global lock through the roof
[14:58:49] <mrpro> not even that many records
[14:58:55] <kali> mrpro: check that your RAM is big enough to have your indexes in there
[14:59:04] <mrpro> my ram is huge
[15:00:28] <kali> and the indexes ? :)
[15:01:23] <mrpro> kali
[15:01:33] <mrpro> my process uses only like 1.2GB
[15:01:38] <mrpro> i got 16GB on box
[15:07:21] <kali> mrpro: process memory is irrelevant. the indexes will be in the kernel cache and buffer zones
[15:07:37] <kali> mrpro: look at stats() on your collections
[15:07:45] <mrpro> while things are running?
[15:07:50] <kali> mrpro: or at least the collection you're inserting to
[15:07:53] <kali> yes
[15:17:24] <mrpro> kali ok
[15:17:28] <mrpro> i'll try to ramp it up now
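What kali is pointing mrpro at, for reference: collection stats report data and index sizes, and serverStatus shows lock and queue information, so you can check whether the indexes still fit in RAM while the insert load runs. The collection name is a placeholder.

    db.mycollection.stats()        // look at "size", "totalIndexSize", "indexSizes"
    db.stats()                     // per-database totals
    db.serverStatus().globalLock   // current/queued readers and writers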
[16:21:47] <mrpro> hey kali
[16:21:50] <mrpro> there?
[16:24:29] <gshipley> Anyone know if 2.2.1 is just a bugfix release with the 3-4 issues listed in the change doc? I am not sure I am reading it correctly?
[16:41:43] <_m> According to their server's JIRA: odd version numbers are unstable dev branches (1.1, 1.3, 1.5); even numbers are stable branches and only get bug fixes, etc. (1.2, 1.4, 1.6)
[16:41:53] <_m> https://jira.mongodb.org/browse/SERVER
[16:43:27] <_m> Which would mean 2.2.1 is a mostly-bugfix release.
[16:43:56] <NodeX> I think it is
[16:44:30] <_m> ^ Succinct answer. =)
[17:01:22] <mikejw> with doctrine odm is it possible to remove a reference to a document from within an executed querybuilder cursor?
[17:26:32] <NodeX> 42
[17:31:21] <wereHamster> does mongodb sort by unique documents or can a document appear multiple times if the query matches it multiple times?
[17:31:53] <Derick> it can return documents multiple times, but not for the reason you mention
[17:32:21] <IAD1> NodeX: Don't Panic
[17:32:41] <wereHamster> I basically have: Game = { players: [...]}, each player has a score, and I want to rank games by the highest score any player has reached in it.
[17:33:17] <mikejw> on second thoughts I guess it doesn't really make sense retrieving objects from the cursor more than once :)
[17:33:19] <wereHamster> or should I create a field on the Game (maxScoreByAnyPlayer) and sort by that?
[17:33:38] <IAD1> yep, or map-reduce
[17:33:57] <hdm> mikejw: the lazy way is just index on the score
[17:34:14] <mikejw> score?
[17:34:20] <hdm> and then db.players.find().sort({ score : -1 }).limit(1)
[17:34:23] <ron> score!
[17:34:25] <hdm> set your score index to be -1
[17:34:35] <hdm> and that uses the index to always find the highest ranked player(s)
[17:34:42] <wereHamster> db.players don't have score. the players in the game have scores.
[17:34:52] <hdm> sure, wherever you store it
[17:34:53] <mikejw> hdm: I think you're confusing me with another mongolian :)
[17:34:54] <hdm> index it
[17:34:58] <IAD1> "game.score"
[17:35:07] <wereHamster> so it would be something like: db.games.find(...).sort({ 'players.score': -1 })
[17:35:29] <hdm> mikejw: woops, thanks
[17:35:35] <mikejw> np :)
[17:35:53] <hdm> wereHamster: yup, just index it with db.games.ensureIndex({ 'players.score' : -1 })
[17:36:00] <wereHamster> I guess it's probably better to make a separate field on the Game document. Makes the index smaller.
[17:36:00] <hdm> and the default sort order would be descending
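The two options wereHamster is weighing, side by side: sorting descending on a multikey index effectively ranks each game by its highest players.score, while maxScoreByAnyPlayer is the denormalized field he names. Document values here are invented.

    // option 1: index the embedded scores and sort on them
    db.games.ensureIndex({ "players.score": -1 });
    db.games.find().sort({ "players.score": -1 }).limit(10);
    // option 2: maintain a top-level field on each game and index that (smaller index)
    db.games.update({ _id: 1 }, { $set: { maxScoreByAnyPlayer: 9001 } });
    db.games.ensureIndex({ maxScoreByAnyPlayer: -1 });
    db.games.find().sort({ maxScoreByAnyPlayer: -1 }).limit(10);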
[19:35:21] <andrecolin> hello everyone
[19:37:05] <andrecolin> have a setup running auth, i am using a simple ruby script, i can login to the admin db, change to my test db, can query the collections and i see the system.users collection, how can i access the user names in that collection using my code
[19:38:43] <hdm> treat it as any other collection
[19:39:03] <hdm> db.system.users.find()
[19:39:19] <hdm> collection name "system.users" database "admin"
[19:40:57] <andrecolin> i can do the db.system.users.find() fine in the shell console
[19:41:19] <andrecolin> but something is happening when i try it with code
[19:44:25] <crudson> andrecolin: 'something'?
[19:45:25] <andrecolin> i can do "my_db.collection_names" and see the "system.users" collection
[19:45:43] <_m> andrecolin: Pastie or gist the error you're getting and we'll be better able to help you.
[19:46:09] <_m> Perhaps also the offending lines
[19:46:19] <andrecolin> i tried my_db["system.users"].each
[19:46:36] <_m> Okay, that's still not helpful.
[19:48:38] <_m> Is my_db really a hash? How did you create the connection? What are the actual errors? Pastie/gist so we have enough context to actually help.
[19:48:58] <andrecolin> let me do a pastie
[19:49:48] <bricker> Hello - at /var/lib/mongo I have a bunch of files of databasename.1, databasename.2, etc. Are these backups (i.e., safe to delete the older ones)?
[19:50:15] <bricker> each is exactly 2GB big
[19:50:38] <crudson> my_db["system.users"].each should be my_db["system.users"].find.each
[19:51:44] <crudson> bricker: no! That just means that your db is at least nfiles-1 * 2GB in size
[19:51:54] <andrecolin> http://pastie.org/5168266
[19:52:23] <crudson> bricker: that's your data, it gets split up into files of increasing size, maxing at 2gb(ish), but also preallocated before you hit the limit
[19:52:38] <bricker> crudson: I see! Thanks
[19:53:05] <crudson> bricker: not quite the equation I gave as they start out small, but that's the general idea
[19:53:48] <crudson> andrecolin: my_db["system.users"].each should be my_db["system.users"].find.each
[19:54:07] <andrecolin> ah, let me try that
[19:56:42] <crudson> bricker: you'll see 64M, 128M, 256M, 1G, 2G, 2G, 2G, 2G....... http://www.mongodb.org/display/DOCS/Excessive+Disk+Space#ExcessiveDiskSpace-DatafilePreallocation
[19:57:08] <andrecolin> crudson: my test box just went south, need to reboot, will be back with status
[20:12:21] <mrichman> Where can I find a sample database?
[20:14:55] <bricker> Forgive me, I'm new to mongodb and am afraid of deleting something that I didn't mean to. Will this query remove any entries with the timestamp before October 1? db.stream_minutes_events.remove("t" : { $lt: ISODate("2012-10-01T00:00:00.000Z") })
[20:15:17] <bricker> er...
[20:15:26] <bricker> Forgive me, I'm new to mongodb and am afraid of deleting something that I didn't mean to. Will this query remove any entries with the timestamp before October 1? db.stream_minutes_events.remove({ "t" : { $lt: ISODate("2012-10-01T00:00:00.000Z") } })
[20:15:37] <bricker> oopsie
[20:15:39] <bricker> This query:
[20:15:40] <bricker> db.stream_minutes_events.remove({ "t" : { $lt: ISODate("2012-10-01T00:00:00.000Z") } })
[20:15:43] <bricker> :)
[20:16:35] <naquad> can i use mongodb for high load project (i'm talking about 10k-25k concurrent requests)?
[20:19:06] <hdm> naquad: yes, you might need a few shards, and make sure queries are working from indexes in ram, SSDs help with high throughput too
[20:19:15] <crudson> bricker: I'd shove a load of documents with varying dates into a test collection and check it.
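Expanding on crudson's suggestion, one low-risk way to check bricker's remove is to run the same criteria as a count first; if the number looks right, the remove uses the identical filter:

    var cutoff = ISODate("2012-10-01T00:00:00.000Z");
    db.stream_minutes_events.find({ t: { $lt: cutoff } }).count();  // how many would be deleted
    db.stream_minutes_events.remove({ t: { $lt: cutoff } });        // then delete them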
[20:19:26] <naquad> hdm, thanks
[20:19:32] <naquad> also what about this: http://pastebin.com/FD3xe6Jt - true or false?
[20:20:21] <hdm> its someone's experience, everyone has their own opinion, and until a few months ago i had unrelated headaches that drove me crazy
[20:20:44] <hdm> but mongo 2.2 has been a huge improvement overall and most of the issues i ran into are now just a matter of allocation/sizing up front
[20:20:46] <crudson> that's also a pretty old blog that has done the rounds a lot
[20:21:55] <hdm> the safe writes thing is overblown, you can enable it in your driver, or not, it depends if you need confirmation of every write, some use cases dont
[20:22:25] <hdm> the sharding stuff is still suboptimal imo, ive seen weird/bad balancing, and adding/removing shards is a pain, especially compared to something like elastic
[20:23:47] <hdm> fwiw, i run a ~1.5 billion record / 3.5Tb dataset for a hobby project, my roadblocks tend to be things like concurrent cpu use since im focusing on single/huge nodes, or slow disk io once indexes go over ram
[20:24:25] <mrichman> that sounds like quite a hobby project -- what is it?
[20:24:27] <hdm> using mongo properly with horizontal scaling/replica sets/sharding would work better if i could afford to
[20:24:54] <hdm> mrichman: scanning the internet for ~6 months or so
[20:25:01] <hdm> banners of various services
[20:25:31] <hdm> https://speakerdeck.com/hdm/derbycon-2012-the-wild-west <- not much there about the mongo setup, but thats the source of it
[20:30:55] <mrichman> are there any sample databases out there to speak off, something like Northwind in SQL Server? Just need to sink my teeth into something real
[20:31:16] <mrichman> s/off/of
[20:32:16] <NodeX> not really
[20:32:42] <NodeX> one main point of nosql style data stores is that it's your choice of data
[20:33:42] <mrichman> understood…but say i wanted to play with map reduce on a large dataset…kinda hard unless i contrive the data myself….seems like there should be some archetypal examples out there
[20:34:00] <hdm> mrichman: pretty easy to load up things like us census data
[20:34:13] <NodeX> any csv / json doc will load easy
[20:34:25] <NodeX> doc / collection of docs
[20:34:49] <hdm> http://aws.amazon.com/publicdatasets/ < lots of stuff there and there -> https://explore.data.gov/catalog/raw/
[20:34:57] <mrichman> thanks
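As NodeX says, any CSV or JSON file loads easily; the usual tool is mongoimport. The database, collection, and file names below are placeholders for whatever public dataset you grab:

    mongoimport --db sandbox --collection census --type csv --headerline --file census.csv
    mongoimport --db sandbox --collection events --file events.json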
[20:40:23] <TecnoBrat> hdm: this is yours?
[20:41:45] <hdm> TecnoBrat: the slide deck yeah
[20:41:59] <TecnoBrat> neat
[20:42:04] <TecnoBrat> sounds like a fun hobby project
[20:42:21] <hdm> totally :)
[20:42:27] <TecnoBrat> normally ISPs just kick you rather than letting you do educational stuff like that
[20:42:32] <TecnoBrat> so glad you found a proper home
[20:42:40] <hdm> its been a long path to find a home for it
[20:42:51] <hdm> ~2200 abuse reports or something now
[20:42:59] <TecnoBrat> is the data public?
[20:43:17] <hdm> not yet, but working with the guys at Shodan to incorporate it
[20:43:23] <TecnoBrat> mars rovers!
[20:43:25] <hdm> ill dump out some sample sets / aggregate data soon
[20:44:26] <hdm> TechCel: https://twitter.com/hdmoore/status/262080277915004928/photo/1 < two weeks ago
[20:44:33] <hdm> got to go see them up close and personal at jpl
[20:45:12] <TecnoBrat> nice
[20:46:10] <TecnoBrat> hdm: wow ... people really don't understand security
[20:46:30] <TecnoBrat> even technical people (or should be ... cisco for example)
[20:46:38] <LouisT> what is this security you speak of!?
[20:48:12] <hdm> hmm, so back to mongo, how do you do the equivalent of "use database" in JS scripts?
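For hdm's question: "use <db>" is an interactive shell directive; in a .js file run with the mongo shell, the equivalent is getSiblingDB. The database and collection names below are examples.

    var other = db.getSiblingDB("otherdb");
    other.mycollection.find().limit(5).forEach(printjson);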
[20:48:20] <TecnoBrat> hdm: thats fun as hell
[20:48:31] <TecnoBrat> do you currently run a sharded cluster for mongo?
[20:48:39] <TecnoBrat> whats your mongo setup like?
[20:49:36] <hdm> single server for most of it, 2x8-core xeon, 192gb ram, 1T of 4xSSD, 40Tb NAS, 3x4Tb direct HDDs
[20:50:14] <hdm> second box is 2x6-core, 96gb ram, same storage specs, used to prototype stuff
[20:50:51] <hdm> sharding is nice on single-boxes since you get to use multiple cpus for queries that do calculations
[20:50:54] <TecnoBrat> 3.5TB dataset you said?
[20:51:01] <hdm> but the disk i/o is still the bottleneck
[20:51:07] <TecnoBrat> right
[20:51:20] <hdm> yeah, changes depending on what set, im reloading it to a different schema on one of the boxes almost all the time
[20:51:25] <TecnoBrat> are you running multiple mongod instances etc .. or are you using VMs?
[20:52:03] <hdm> VMs failed miserably (kvm and vmware), weird storage issues, multiple mongods was ok, but sharding fell on its face to the point it wouldnt scale
[20:52:30] <hdm> the primary shard ended up with 3 x the data of the others and the disk i/o was maxed, so it couldnt slough it off and handle incoming inserts fast enough
[20:53:09] <hdm> even with only _id indexes, the read overhead got crazy, at the time drives were raid-0 x 4, so decent number of spindles
[20:53:45] <hdm> the 1T ssds (raid-0 as well) get 1000MB r/w and 25,000 IOPS, but only go to 1T, which doesnt work so well for the current size
[20:54:42] <TecnoBrat> right
[20:54:44] <TecnoBrat> interesting
[20:55:01] <TecnoBrat> so sorry, I'm confused, you have some sharding setup now? or not at all?
[20:55:20] <hdm> not any more, it choked after 3 weeks with the primary unable to hand off fast enough
[20:55:44] <hdm> current setup is to create new dbs on the fly for each month and put unique constraints into a multi-key _id
[20:56:00] <hdm> limits it to a single index, keeps indexes under 32Gb or so
[20:56:07] <hdm> (per db)
[20:56:45] <hdm> its reloading the data still, but going way faster; better than mongo's normal downward spiral where insert performance dies as the index grows beyond ram
[21:14:30] <Guest46986> Following this information: http://docs.mongodb.org/manual/tutorial/deploy-replica-set/
[21:14:30] <Guest46986> I've tried time and time again. And all I can get is: exception: need most members up to reconfigure, not ok.
[21:14:30] <Guest46986> All three instances are up, followed the documentation just fine.
[21:14:30] <Guest46986> I was able to run rs.initiate() once, looked like it worked, so I did it again for the other instance,
[21:14:30] <Guest46986> and the errors rained.
[21:14:30] <Guest46986> Of course I did some looking around, I noticed mention of a "\etc\hosts" <--- The heck is that? Do I need it? Is there something else I'm supposed to review that I might have missed?
[21:14:30] <Guest46986> Running Windows Server 2008 R2 64-bit
[21:14:31] <Guest46986> Thank you.
[21:15:14] <frsk> Have you used rs.add() at some point?
[21:16:48] <Guest46986> I do believe that is when it throws the error.
[21:17:36] <frsk> Does rs.conf() mention the other servers at all?
[21:18:26] <Guest46986> Yes. Mentions 1 of the name "rs0"
[21:18:46] <Guest46986> Which would be the first one I initiated.
[21:19:01] <frsk> And the one which you're running the command on?
[21:19:16] <Guest46986> correct
[21:19:27] <frsk> Run rs.add("hostname") to add the other servers as well
[21:19:39] <frsk> If it fails, what does it say?
[21:20:54] <Guest46986> Cannot use localhost.
[21:21:22] <Guest46986> Trying 128.0.0.1
[21:21:30] <frsk> You're specifying a port as well
[21:21:30] <frsk> ?
[21:21:39] <hdm> add a hostname to /etc/hosts after 127.0.0.1
[21:21:48] <hdm> myserver or something, this will let you change it later
[21:21:52] <hdm> and not break your rs
[21:22:09] <Guest46986> At the moment. Follow what you posted.
[21:22:15] <Guest46986> NOt at***
[21:22:35] <hdm> $ cat /etc/hosts
[21:22:35] <hdm> 127.0.0.1 localhost shard1 shard2 shard3 shard4 config1 config2 config3 config4
[21:22:46] <hdm> ^ example of how to test sharding/replicas/config locally
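With hostnames sorted out, the usual sequence is to run rs.initiate() on exactly one member and then rs.add() the rest from that same member; running rs.initiate() separately on each instance (as Guest46986 describes above) is one common way to end up with conflicting configs. The hostnames and ports below are illustrative.

    rs.initiate();
    rs.add("shard1:27018");
    rs.add("shard2:27019");
    rs.status();   // every member should be listed, with one PRIMARY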
[21:24:10] <Guest46986> With or without port yield the same error.
[21:24:29] <Guest46986> exception: need most members up to reconfig...
[21:28:16] <Guest46986> If anyone is available to help me solve my issue, I've posted it here as well: https://groups.google.com/forum/?fromgroups=#!topic/mongodb-user/QTqV1G3LUGU
[21:34:26] <Guest46986> Following this information: http://docs.mongodb.org/manual/tutorial/deploy-replica-set/
[21:34:26] <Guest46986> I've tried time and time again. And all I can get is: exception: need most members up to reconfigure, not ok.
[21:34:26] <Guest46986> All three instances are up, followed the documentation just fine. What's the deal?
[21:34:26] <Guest46986> I was able to run rs.initiate() once, looked like it worked, so I did it again for the other instance,
[21:34:26] <Guest46986> and the errors rained.
[21:34:27] <Guest46986> Of course I did some looking around, I noticed mention of a "\etc\hosts" <--- The heck is that? Do I need it? If so why didn't I come across it
[21:34:27] <Guest46986> in the installation or mongo setup documentation? Is there something else I'm supposed to read that I might have missed?
[21:34:28] <Guest46986> Running Windows Server 2008 R2 64-bit
[21:34:28] <Guest46986> Thank you.
[21:35:15] <cyberfart> I have a quick question: are "$in" queries considered equality test or range query? I'm trying to work out the better compound_index ordering
[21:39:49] <TecnoBrat> cyberfart: its not a range ... its an array. You are finding any records which have a value that exists in the array
[21:40:09] <TecnoBrat> "$in: [1, 4]" means that the field is either a 1 or a 4
[21:54:40] <Guest46986> Following this information: http://docs.mongodb.org/manual/tutorial/deploy-replica-set/
[21:54:41] <Guest46986> I've tried time and time again. And all I can get is: exception: need most members up to reconfigure, not ok.
[21:54:41] <Guest46986> All three instances are up, followed the documentation just fine. What's the deal?
[21:54:41] <Guest46986> I was able to run rs.initiate() once, looked like it worked, so I did it again for the other instance,
[21:54:41] <Guest46986> and the errors rained.
[21:54:41] <Guest46986> Of course I did some looking around, I noticed mention of a "\etc\hosts" <--- The heck is that? Do I need it? If so why didn't I come across it
[21:54:41] <Guest46986> in the installation or mongo setup documentation? Is there something else I'm supposed to read that I might have missed?
[21:54:42] <Guest46986> Running Windows Server 2008 R2 64-bit
[21:54:42] <Guest46986> Thank you.
[21:55:39] <raja> how can i do a query where i don't want a unique on a field but a limit.. for example.. i want to query all stories but limit it to 2 per userid
[21:56:50] <Guest46986> db.user.findOne()?
[21:58:29] <cyberfart> ty TecnoBrat , I will put that field before range fields on the index then
[21:58:56] <LouisT> raja: only thing that i can come up with is something using $where =/
[21:59:19] <raja> hmmm.. any further hints into what to supply $where
[22:01:02] <LouisT> raja: not off the top of my head, no
[22:02:19] <hdm> raja: you can do that via aggregation framework
[22:02:30] <hdm> raja: if the easy ways arent doable
[22:02:54] <Guest46986> This is depressing. Can someone at least shoot me a link to a decent tutorial on how to set up replication sets?
[22:03:06] <raja> hdm: looking at aggregation docs but don't see it.. aggregation supports distinct by key but not a limit by key value
[22:03:13] <hdm> Guest46986: you ignored previous advice and then pasted the same thing twice, no thanks
[22:03:30] <hdm> raja: you can create any key combination you want and then limit it
[22:04:13] <Guest46986> ? I'm assuming you mean: <hdm> add a hostname to /etc/hosts after 127.0.0.1
[22:04:20] <hdm> http://pastie.org/5168971
[22:04:21] <Guest46986> Had no idea you were talking to me.
[22:04:40] <Guest46986> Because I clearly stated in my post that I don't have those directories.
[22:04:43] <hdm> raja: ^ add a {{ '$limit' => 2 }} after $project 'ing your unique key
[22:05:04] <raja> hdm: cool thanks for the example
[22:07:01] <Guest46986> Look I'm sorry I upset you, was not my intention. I just figured you were talking to someone else. Don't see why you would bring up /etc/hosts if you were.
[22:10:40] <Guest46986> cat /etc/hosts returns: etc is not defined.
[22:11:20] <ron> ah?
[22:11:50] <Guest46986> I do not understand where I should put these directories or configure them. They were not covered in the documentation I followed.
[22:13:12] <LouisT> Guest46986: this is on a windows server?
[22:13:19] <Guest46986> Correct
[22:13:26] <LouisT> then you won't have an /etc/hosts
[22:13:45] <Guest46986> This is what I figured as well. But people keep telling me I need it
[22:13:58] <Guest46986> Like that hdm guy lol
[22:14:58] <Guest46986> Well thanks for responding at least. Then is there something I can do? Some other documentation I can follow for windows?
[22:15:09] <LouisT> "Configure DNS names appropriately"
[22:15:14] <LouisT> you'd have to do that on windows
[22:15:29] <LouisT> well, i'm not aware of a /etc/hosts equivalent on windows
[22:15:37] <LouisT> for windows*
[22:16:39] <jcims> c:\windows\windows32\drivers\etc\hosts
[22:17:03] <LouisT> oh really? that's cool
[22:17:06] <Guest46986> Really?
[22:17:07] <jcims> derp windows32 -> system32
[22:17:11] <jcims> yass'm
[22:17:29] <jcims> there is probably a hosts.sam file (sam = sample)
[22:17:34] <LouisT> Guest46986: http://en.wikipedia.org/wiki/Hosts_file
[22:18:50] <Guest46986> Sweet thanks a ton.
[22:43:45] <Raynos> Whats a good way to store meta data on a collection
[22:43:49] <Raynos> like for example "last map reduced"
[23:05:25] <Gargoyle> evenin'
[23:16:09] <jrdn> random question
[23:16:27] <ron> do pigs live on the moon?
[23:17:29] <jrdn> if a value is known in a column, is it the same performance doing: x = 1; x++; db.foo.update(criteria, { $set: { bar: x } })
[23:17:42] <ron> mine was more random.
[23:17:44] <jrdn> vs doing an $inc on it
[23:18:05] <jrdn> in regards to mongo performance
[23:18:38] <ron> I imagine $inc would be safer.
[23:18:42] <jrdn> i was just wondering if i should build increment functionality in my domain model persistence manager since the update needs the domain model
[23:19:12] <Zelest> somewhat odd question; a server with only ~10MB/s write-speed, is that usable for mongodb?
[23:19:16] <Zelest> or should I find another host?
[23:19:56] <jrdn> if you have bad IO and high traffic, you'll cry
[23:19:59] <ron> jrdn: using the $inc is an atomic operation. using the first option is not. in a multi-threaded environment, that matters.
[23:20:13] <jrdn> and @ron, pings don't live on the moon.. your mom is on earth
[23:20:14] <jrdn> OHH!!!
[23:20:15] <jrdn> jk
[23:21:22] <Zelest> yeah, I guess the proper answer is "it's up to your applications needs" ..
[23:21:24] <jrdn> .. set is atomic
[23:21:31] <jrdn> ?
[23:21:53] <Zelest> i get to choose between low disk I/O or high network delays. :(
[23:21:53] <ron> set is atomic. your calculation isn't.
[23:21:58] <jrdn> Zelest, yeah we're on EC2 on a regular EBS volume and we're getting high IO alerts every minute when data is being flushed to disk
[23:22:18] <jrdn> @ron, yah that's why I said in regards to mongodb
[23:22:37] <ron> but.. but...
[23:22:42] <ron> oh, I'm going to sleep.
[23:23:08] <jrdn> having my application do $x += 10, then update with $set vs $inc with value of 10 is micro micro micro
[23:23:39] <jrdn> anyway, gnight
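The difference ron is describing, side by side in the shell (document and field values are placeholders): each individual update is atomic on the server, but the read-modify-write version can lose a concurrent increment between the find and the $set.

    // read-modify-write: not safe with concurrent writers
    var doc = db.foo.findOne({ _id: 1 });
    db.foo.update({ _id: 1 }, { $set: { bar: doc.bar + 10 } });
    // server-side increment: atomic, no lost updates
    db.foo.update({ _id: 1 }, { $inc: { bar: 10 } });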
[23:27:39] <Raynos> What should the sort Object parameter look like in mapReduce
[23:27:48] <Raynos> I know sort arrays look like [[key, direction]]
[23:27:53] <Raynos> does it want { key: direction } ?
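For Raynos's question: the [[key, direction]] array form is a driver convention; the mapReduce command (and the shell helper) takes the { key: direction } document form, and the sort key generally needs an index. A self-contained shell sketch with invented collection and field names:

    var mapFn = function () { emit(this.userId, 1); };
    var reduceFn = function (key, values) { return Array.sum(values); };
    db.events.ensureIndex({ ts: 1 });             // sort field should be indexed
    db.events.mapReduce(mapFn, reduceFn, {
        sort: { ts: 1 },                          // { key: direction } form
        out: { inline: 1 }
    });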
[23:30:19] <bricker> When I delete entries from a collection, when does that data get removed from disk?
[23:39:39] <wereHamster> bricker: at an unspecified time later.
[23:42:23] <bricker> wereHamster: Thanks, just wanted to make sure I didn't have to do it manually or something :)
[23:44:26] <wereHamster> that unspecified time may also be never. Why do you care?
[23:45:01] <bricker> wereHamster: Oh... because we have some data that is taking up a huge amount of space on our server, and we're just trying to clean it up
[23:45:27] <wereHamster> you may want to try db.repair()
[23:45:39] <bricker> wereHamster: old data that we no longer care about (stats collected from Cube)
[23:45:45] <bricker> wereHamster: I'll look at it, thanks again
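For the record, the shell helper wereHamster is referring to is db.repairDatabase(); it rewrites the data files (which is what actually releases the space deleted documents occupied) and needs roughly the current size of the data files in free disk while it runs. The database name below is a placeholder.

    use mydb
    db.repairDatabase()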
[23:48:46] <wereHamster> ah, cube. Do you still use it?
[23:55:30] <bricker> wereHamster: yes but we only need stats month-to-month