PMXBOT Log file Viewer


#mongodb logs for Thursday the 1st of November, 2012

[00:46:06] <therealkoopa> What's the best datatype for an encrypted object? Store it as a string? Or as binary?
[01:15:08] <skot> binary
[01:16:26] <Determinist> octal
[02:30:12] <JoseffB> heyall
[02:30:16] <JoseffB> anyone here to help?
[02:31:35] <JoseffB> ne1 here?
[02:43:00] <hdm> JoseffB: try asking a question
[02:43:10] <JoseffB> hey
[02:43:18] <JoseffB> yea, trying might help
[02:43:20] <JoseffB> :)
[02:43:35] <JoseffB> trying to add an embedded doc to my doc via php
[02:43:41] <JoseffB> schema looks like this:
[02:44:08] <JoseffB> -object->ratings->{..}
[02:44:16] <JoseffB> where {..} are arrays per rating
[02:44:27] <JoseffB> I tried save
[02:44:45] <JoseffB> but it adds the result under the object level
[02:45:40] <JoseffB> actually it saves right but to a new record altogether
[02:45:51] <JoseffB> instead of appending that existing record
[02:47:43] <hdm> sorry, not familiar with the PHP api for it; in ruby's driver you use update or upserts
[02:47:55] <JoseffB> have the same in php
[02:48:02] <JoseffB> I just dont know how to format it right
[02:48:18] <IAD> JoseffB: use $set
[02:49:17] <JoseffB> set to add a new array ?
[02:50:26] <IAD> och, embedded. $addToSet http://www.mongodb.org/display/DOCS/Updating#Updating-%24addToSetand%24each
[02:53:52] <JoseffB> ok so it updates the array to the root
[02:54:06] <JoseffB> so now _object and ratings are peers
[02:54:14] <JoseffB> but ratings should be under _object
[02:54:34] <JoseffB> when I try to fix the data array to include that top array I get this error
[02:54:45] <JoseffB> Cannot apply $addToSet modifier to non-array
[02:55:09] <Determinist> If I try to pull the users being followed by 1234, this doesn't really work.
[02:56:16] <Determinist> If I have to maintain names for both the follower userID and the userID of the user being followed, that means I'll have to keep updating the names in a bunch of docs when a user changes their name.
[02:56:24] <IAD> JoseffB: please paste your doc somewhere
[02:56:26] <Determinist> any recommendations?
[02:58:12] <JoseffB> IAD: https://gist.github.com/3991354
[02:58:35] <JoseffB> trying to add a new array under the ratings branch
[02:58:48] <IAD> Determinist: yep, you should
[02:59:10] <JoseffB> shoot deleted by accident
[02:59:33] <Determinist> IAD: i should what?
[02:59:59] <JoseffB> IAD: https://gist.github.com/3991365
[03:00:09] <JoseffB> trimmed it down to the ratings branch
[03:02:36] <IAD> Determinist: you have to keep the key values
[03:04:12] <Determinist> IAD: you mean embed the names into the followers collection? What happens if a single user is following 6000 users and that user changes his name? That means an update command for 6000 documents. Sounds like a bad recipe for scaling? Unless, of course there's no way around that...
[03:06:37] <JoseffB> I dont know why addtoset says object is a non-array
[03:06:41] <IAD> Determinist: you can use the _id field for link users (not username). it will be a little easier
[03:07:18] <Determinist> IAD: still doesn't alleviate the main bottlenecks of having to do massive updates, but what can ya do.
[03:09:55] <IAD> JoseffB: show the PHP code to update
[03:10:33] <JoseffB> updated with php
[03:10:55] <JoseffB> that gives error of non-array
[03:12:39] <IAD> Determinist: perhaps these requests are rare
[03:13:10] <Determinist> IAD: perhaps. scaling 101 - plan for the worst, hope for the best.
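A minimal shell sketch of the id-reference approach IAD is suggesting (collection and field names here are invented for illustration, not taken from the log): the follow relationship stores only _id values, so renaming a user touches exactly one document no matter how many follows point at them.

    var aliceId = ObjectId(), bobId = ObjectId();
    db.users.insert({ _id: aliceId, name: "alice" });
    db.users.insert({ _id: bobId, name: "bob" });
    // the follow relationship stores ids only, never names
    db.follows.insert({ follower: aliceId, followed: bobId });
    // a rename is a single update, regardless of follower count
    db.users.update({ _id: bobId }, { $set: { name: "robert" } });

The cost moves to read time: displaying a follower list means looking names up in users by the collected ids.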
[03:15:53] <IAD> JoseffB: you just want to add new {array} into "records" ?
[03:16:03] <JoseffB> into ratings yea
[03:16:31] <JoseffB> so like there should be _object->ratings->newarray
[03:18:34] <IAD> it will be like: db.[your_collection].update({_id : your_id}, {$set : {'ratings' : {$push : new_data_array}}})
[03:21:25] <JoseffB> ahh missed the push!
[03:22:09] <IAD> and may be it will work without $set
[03:26:39] <JoseffB> says not okForStorage
[03:26:42] <JoseffB> as an error
[03:27:03] <JoseffB> if I take away the set it overwrites every record in the table/collection
[03:27:03] <JoseffB> lol
[03:27:37] <JoseffB> $r = $ratingRec->update(array('_id' => new MongoId($pid)),array('$set' => array('ratings' => array('$push' => $obj))));
[03:27:49] <JoseffB> thats what I used and I took the object and ratings off the data
[03:34:47] <IAD> JoseffB: may be it will be helpful: http://stackoverflow.com/questions/10420635/mongodb-php-how-to-push-item-to-array-with-a-spcecific-key
[03:35:23] <IAD> it will be more convenient with the key
[03:37:59] <JoseffB> works but doesn't append
[03:38:07] <JoseffB> it completely replaces all the other ratings
[03:38:16] <JoseffB> in every other record as well
[03:40:27] <IAD> You can use a save with full overwrite: $a= $collection->findOne($where); $a['ratings'][] = $new_array; $collection->save($a);
[03:40:46] <JoseffB> figured it out
[03:40:58] <JoseffB> $r = $ratingRec->update(
[03:40:59] <JoseffB> array('_id' => new MongoId($pid)),
[03:40:59] <JoseffB> array( '$addToSet' => array( '_object.ratings' => $obj ))
[03:40:59] <JoseffB> );
[03:41:09] <JoseffB> need to use . notiation
[03:41:13] <JoseffB> notation
[03:41:34] <JoseffB> IAD you're awesome
[03:41:39] <IAD> och, I forgot that you use _object
[03:41:40] <JoseffB> thanks for putting me on the right track
[03:41:54] <JoseffB> now need to figure out how to edit this
[03:41:55] <JoseffB> :)
[03:43:37] <JoseffB> its just set instead of addtoset right?
[03:47:04] <IAD> Promise me to think about "id" in embedded documents =)
[03:47:33] <JoseffB> to add an id inside
[03:47:35] <JoseffB> good idea
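For reference, JoseffB's working PHP update translated to the shell, plus the edit he asks about next: dot notation reaches into the embedded document, and the positional $ operator updates a matched array element. The collection name and rating fields are made up for this sketch.

    db.things.insert({ _id: 1, _object: { ratings: [] } });
    // append a rating under _object.ratings (JoseffB's $addToSet fix)
    db.things.update({ _id: 1 },
                     { $addToSet: { "_object.ratings": { user: "u1", score: 5 } } });
    // editing an existing rating later: $set with a dotted/positional path
    db.things.update({ _id: 1, "_object.ratings.user": "u1" },
                     { $set: { "_object.ratings.$.score": 4 } });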
[04:28:19] <JoseffB> ok so now I cant get this to update
[04:52:04] <hdm> i am seeing performance drops on insert for a single instance of mongod, even when writes occur to different databases. these go away when i manually restart mongodb, and dont show up as a higher % of faults (but a higher % of locks)
[04:52:36] <hdm> any ideas why an instance where a new db is created every ~6 hours would spiral down in performance, with similar contents for each db, and inserts only into the latest
[04:53:36] <hdm> performance drops from 12k to 5k to 3k and now 1k per second over 48 hours, but across 7+ dbs, with only the latest being inserted
[04:54:03] <hdm> vsize/mapped shows ALL dbs accounted for, res is harder to tell
[04:54:15] <hdm> ex: 1357 1 0 0 0 2 0 681g 1363g 138g 0 critical_201208:30.5% 0 0|0 0|19 1m 3k 20 23:51:24
[05:08:53] <neiz> anyone familiar with the C# driver? Been trying to retrieve the only document in my collection (.FindOne()), but I cannot seem to coerce it into working
[05:19:26] <tpae> hello
[05:19:47] <tpae> using tailable collection, can i listen for changes in that collection, and execute a command?
[05:24:58] <hdm> tpae: sure, if you write some code that does that
[05:25:31] <tpae> let's say.. i can build a queue using tailable collection?
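A rough shell sketch of the queue idea tpae is describing, assuming a capped collection (tailable cursors only work on capped collections; the collection name and size are placeholders):

    db.createCollection("queue", { capped: true, size: 1048576 });
    var cur = db.queue.find().addOption(DBQuery.Option.tailable)
                             .addOption(DBQuery.Option.awaitData);
    while (cur.hasNext()) {
        var job = cur.next();
        printjson(job);   // "execute a command" for each new document here
    }

The driver-level equivalent (a tailable, await-data cursor in a loop) is the usual pattern; producer inserts show up on the cursor as they arrive.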
[05:29:06] <socket> from what i see, mongo does not require user/password. how does it make sure data isnt hacked?
[05:51:03] <hdm> socket: enable a username/password and ssl
[05:51:13] <hdm> google for mongo security turns it up pretty quick
[05:51:22] <hdm> worst case, ip acls
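Concretely, hdm's first suggestion means starting mongod with --auth and then creating users from the shell; the 2012-era helper is db.addUser. Database names, usernames, and passwords below are placeholders.

    use admin
    db.addUser("siteAdmin", "longRandomPassword")
    use mydb
    db.addUser("appUser", "anotherLongPassword")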
[06:06:53] <holly1> Anybody know any mongodb admin UI's that support batch updates?: http://www.mongodb.org/display/DOCS/Admin+UIs
[06:46:42] <ali__> any ideas on mongodb versioning?
[06:46:57] <ali__> I need to track changes only
[06:54:28] <IAD> ali__: pretend to be a slave, or have a central point of communication with the database
[08:10:47] <socke> hey dudes, im migrating from rdbms to mongo, i have a user system and a gps tracking system that tracks the users (which are cars). when designing a mongo db, should the user system be a part of the tracking system?
[08:10:54] <socke> if so, how?
[08:11:46] <ron> socke: when moving to nosql dbs, your data structure should almost always conform to the way you want to query your data rather than how you want to save it.
[08:12:58] <socke> i see, thanks
[08:14:30] <socke> the thing is, i need to do user authentication first and then start tracking the user. at a later point i'd like to see the path the user took in a given time frame
[10:21:57] <samurai2> hi there, which one is faster : 1. filtering before map/reduce or 2. filtering inside map function? thanks
[10:22:10] <NodeX> before
[10:23:53] <kali> samurai2: make sure you have a look at the aggregation framework too. anything you can do with it will be more efficient than map/reduce
[10:24:08] <NodeX> +1
[10:25:31] <samurai2> kali : but aggregation framework has a limitation of only 16 MB on its result right?
[10:26:24] <kali> samurai2: sure. but if you're generating a result bigger than 16MB, chances are you're abusing mongodb
[10:26:46] <kali> samurai2: mongo is designed for fast and small queries
[10:27:51] <samurai2> kali : and if I also need to filter based on geospatial indexing it won't work as well right?
[10:28:23] <kali> samurai2: i'm not clear on what works and what doesn't with geospatial indexing, i'm not using it
[10:28:41] <NodeX> you can't aggregate geospatial (yet)
[10:28:52] <NodeX> it's coming in the future apparently
[10:30:27] <samurai2> I'm hoping it can directly output the result into a certain collection and also we can decide the output key, not only key and value. :)
[10:30:47] <samurai2> and without output size limitation
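The "filter before" advice carries over to the aggregation framework: put $match at the front of the pipeline so it can use indexes and shrink the data the later stages see. A 2.2-era shell sketch with invented collection and field names:

    db.events.aggregate(
        { $match: { type: "click", ts: { $gte: ISODate("2012-10-01") } } },
        { $group: { _id: "$userId", clicks: { $sum: 1 } } }
    );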
[11:22:23] <topriddy> got this error from a java/morphia app. out of memory. mongodb top suspects http://pastebin.com/mFrgnWJz
[11:22:31] <topriddy> hoping someone is familiar with same
[11:24:08] <kali> topriddy: a quick fix is to increase the heap size with -Xmx
[11:25:14] <kali> topriddy: but you probably need to get some insight about what's going on in your app too
[11:26:12] <kali> topriddy: try -verbosegc, to start, but then you may need something more invasive (try yourkit profiler for instance, there is a trial)
[11:27:12] <kali> topriddy: the fact mongodb appears so much in the stack is not that suspect, a database driver is likely to allocate lots of things
[11:28:21] <topriddy> kali: please do you use the mongodb/morphia/java stack?
[11:28:47] <kali> mongodb and java, not morphia
[11:29:07] <topriddy> kali: how do you skip morphia??
[11:29:25] <kali> topriddy: ... i just call the java driver
[11:30:51] <topriddy> kali: hmm...am suspecting maybe the jvm is simply the problem. it probably needs more heapspace.
[11:31:44] <kali> topriddy: you can try and up Xmx, but if the problem does not go away, you'll have to profile gc and memory usage
[11:32:39] <topriddy> kali: my experience with profiling gc/memory usage is low.
[11:33:18] <kali> topriddy: well, unfortunately, this is the real world of the jvm in production :)
[11:34:29] <topriddy> kali: thanks man. you have been helpful. i sure hope "mongodb not being good enough for production" is not the culprit though
[11:36:28] <kali> topriddy: this is memory consumption in the jvm, not in mongodb. i'm pretty sure there are no hidden dragons in the mongo java driver
[11:38:17] <kali> topriddy: usually this is due to bad application code (like overenthusiastic caching, leading to reference/memory leak) or heap size not big enough for the jvm working set
[12:14:14] <topriddy> kali: how do you even cache explicitly? i'm not doing any caching, and haven't read about it in mongo yet
[12:34:20] <fatninja> I have the var a = "string"; and I want to search all fields that contain a, so I should do something like db.users.find({name: /a/}); But we know that doesn't work, (using server side javascript)
[12:36:45] <NodeX> 1. that won't use an index
[12:37:06] <NodeX> 2. I am not fully sure that mongo reads global variables in its query syntax
[12:39:01] <fatninja> NodeX, I use a driver to communicate to the db. But I think the solution is this: var regex = new RegExp(a, "g"); and {name: regex}
[12:41:35] <NodeX> perhaps, I dont do that sort of thing on the shell so I couldn't comment
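Spelling out the fix fatninja arrives at: build the RegExp from the variable instead of writing the literal /a/ (which matches the letter "a"). An anchored, case-sensitive pattern is also the only form that can use an index, per NodeX's first point.

    var a = "string";
    db.users.find({ name: new RegExp(a) });        // substring match, no index use
    db.users.find({ name: new RegExp("^" + a) });  // prefix match, can use an index on name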
[13:28:19] <NodeX> http://www.guardian.co.uk/technology/2012/nov/01/apple-samsung-statement
[13:28:20] <NodeX> lol
[13:28:42] <NodeX> 'Apple tried to argue that it would take at least 14 days to put a corrective statement on the site – a claim that one judge said he "cannot believe".'
[13:29:17] <NodeX> Apple = 10 year old child who's been told off and is now sulking
[13:45:39] <lapdis> Apple = Geniuses that are gaming the system
[13:45:55] <NodeX> lol
[13:46:11] <robo> hello: right now we're using replica sets and I'm seeing heavy I/O across all our replica set nodes because of the overhead of replication itself, I believe. I'm trying to talk our DBA's into going with shards but they are a bit nervous because of the testing. Anyone have any gems of wisdom to drop on this?
[13:46:38] <lapdis> sry, I believe this channel is dedicated to bashing apple
[13:47:02] <robo> sounds challenging
[13:47:14] <NodeX> or to put it another way.... Apple are overpriced manufacturers of equipment that use open source cores and close them off for their own personal gain
[13:47:45] <robo> yeah, that's a very profitable business model these days
[13:48:07] <NodeX> then they whine when they don't get their own way
[13:48:11] <NodeX> like children :)
[13:48:23] <skot> robo: are you on mms, or do you have mongostat + iostat numbers for your nodes?
[13:48:36] <skot> robo: if so, please post links to them.
[13:48:37] <robo> skot, yes to both
[13:49:27] <robo> skot, https://mms.10gen.com/dashboard/view/4f5f957a87d1d86fa8b8821c#chartLinkMinuteOneHour
[13:50:16] <skot> what time period are you seeing high io?
[13:50:45] <skot> within the last hour?
[13:51:35] <robo> 4am we had a spike. Yesterday at 7am-10am
[13:51:53] <skot> timezone?
[13:51:57] <robo> ET
[13:52:01] <robo> it's 9:49am right now
[13:52:03] <robo> sorry :-)
[13:54:01] <robo> skot, the actual disk i/o graphs are in zenoss
[13:54:11] <robo> let me see if i can get them uploaded
[13:55:22] <robo> skot, http://imgur.com/yQ9av
[13:55:34] <skot> If you look at MMS: https://mms.10gen.com/host/chartReplicaSetHosts/4f5f957a87d1d86fa8b8821c/mongodbprod#chartLinkMinuteTwelveHours
[13:56:15] <skot> you can see that you did lots of writes (updates/inserts) at 3:30am ET
[13:56:36] <skot> and those writes were replicated causing all nodes to do heavy io
[13:56:54] <robo> yeah, about right. They run these import jobs that do massive inserts
[13:56:54] <skot> looks like it lasted 15-20 min
[13:57:17] <robo> skot, how can you tell they are doing inserts by those graphs?
[13:58:10] <skot> In MMS, yes
[13:58:15] <robo> n/m, i see that opcounter graph that shows what it was doing when you hold your mouse over it
[13:58:16] <skot> look at the opcounter
[13:58:22] <skot> yep
[13:58:44] <skot> looks like normal and expected behavior, and disk io.
[13:59:00] <skot> I'd suggest throttling your import if you want it to tax your systems less.
[13:59:09] <robo> so they have to run these to keep our data fresh. The problem with replica sets is that they have to replicate to all nodes. I figure that if you use shards it will cut down on the overall i/o used
[13:59:33] <skot> it will redistribute it, but the same data needs to be written and replicated
[13:59:52] <skot> You can also just add more/faster disks.
[14:00:23] <skot> I wouldn't suggest sharding for this issue, just making the process less impactful or adding more resources to the nodes.
[14:00:32] <robo> the backend data disks are fiber channel luns to EMC VNX
[14:00:53] <robo> probably can't get much faster unless we go with DAS
[14:01:25] <robo> skot, what do you think of sharding overall? I talked to a few people that hit some major bugs with it
[14:01:45] <robo> i think it's a must-do for us because it's going to be hard to scale with replica sets as we put more traffic on it
[14:01:47] <skot> It has its uses but it depends what your problems are.
[14:02:25] <skot> your set doesn't seem to have traffic patterns where a single instance can't handle the load, does it?
[14:02:26] <robo> okay, so first hit problems then think of sharding
[14:02:44] <robo> nope. But we only have about 20% of our traffic going to it right now (slowly moving off oracle.)
[14:03:07] <robo> ugh, brb meeting
[14:03:09] <skot> The replica set looks mostly idle
[14:03:51] <skot> there is probably lots you can optimize in your app/queries/indexes before you need sharding
[14:03:57] <skot> hard to guess without more info
[14:09:05] <kali> throttling imports on a production db is a must, or mongo will eat all it can until it chokes
[14:19:12] <xcat> Can I downgrade to 2.0.0 on Ubuntu?
[14:19:23] <xcat> 2.2.0 I mean
[14:19:26] <xcat> Using the package manager
[14:32:27] <wiseguysonly> I'm a bit unsure how I would do this in Mongo -> SELECT SUM(field1) GROUP BY DAY(date_field). So I get all days available
[14:56:14] <kali> wiseguysonly: look at the aggregation framework
[14:56:36] <wiseguysonly> Hey kali cheers - just having a go with mapreduce now
[14:56:47] <kali> wiseguysonly: AF is more efficient
[14:56:59] <kali> wiseguysonly: and you're right in the use case
[14:57:21] <mrpro> why are writes faster when the table is empty
[14:57:21] <wiseguysonly> I think my current version of mongo is 2.1
[14:57:26] <mrpro> as more and more stuff gets inserted
[14:57:30] <mrpro> % locked increases
[14:57:53] <kali> wiseguysonly: 2.1 is a development version, but latest 2.1 have the AF
[14:58:10] <wiseguysonly> ok, I'll get my head in those docs then :)
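For wiseguysonly's SELECT SUM(field1) GROUP BY DAY(date_field), a 2.2-era aggregation sketch using the date operators (the collection name is invented; the field names follow the SQL in the question):

    db.sales.aggregate(
        { $project: { field1: 1,
                      y: { $year: "$date_field" },
                      m: { $month: "$date_field" },
                      d: { $dayOfMonth: "$date_field" } } },
        { $group: { _id: { y: "$y", m: "$m", d: "$d" },
                    total: { $sum: "$field1" } } }
    );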
[14:58:13] <kali> mrpro: index maintenance is more expensive when its size grows
[14:58:20] <mrpro> thats terrible
[14:58:22] <kali> mrpro: ln(n)
[14:58:25] <mrpro> my tps is not even high
[14:58:32] <mrpro> and then i see queued writes
[14:58:36] <mrpro> global lock through the roof
[14:58:49] <mrpro> not even that many records
[14:58:55] <kali> mrpro: check that your RAM is big enough to have your indexes in there
[14:59:04] <mrpro> my ram is huge
[15:00:28] <kali> and the indexes ? :)
[15:01:23] <mrpro> kali
[15:01:33] <mrpro> my process uses only like 1.2GB
[15:01:38] <mrpro> i got 16GB on box
[15:07:21] <kali> mrpro: process memory is irrelevant. the indexes will be in the kernel cache and buffer zones
[15:07:37] <kali> mrpro: look at stats() on your collections
[15:07:45] <mrpro> while things are running?
[15:07:50] <kali> mrpro: or at least the collection you're inserting to
[15:07:53] <kali> yes
[15:17:24] <mrpro> kali ok
[15:17:28] <mrpro> i'll try to ramp it up now
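What kali is pointing mrpro at, for reference: collection stats report data and index sizes, and serverStatus shows lock and queue information, so you can check whether the indexes still fit in RAM while the insert load runs. The collection name is a placeholder.

    db.mycollection.stats()        // look at "size", "totalIndexSize", "indexSizes"
    db.stats()                     // per-database totals
    db.serverStatus().globalLock   // current/queued readers and writers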
[16:21:47] <mrpro> hey kali
[16:21:50] <mrpro> there?
[16:24:29] <gshipley> Anyone know if 2.2.1 is just a bugfix release with the 3-4 issues listed in the change doc? I am not sure I am reading it correctly?
[16:41:43] <_m> According to their server's JIRA: odd version numbers are unstable dev branches (1.1, 1.3, 1.5); even numbers are stable branches and only get bug fixes, etc. (1.2, 1.4, 1.6)
[16:41:53] <_m> https://jira.mongodb.org/browse/SERVER
[16:43:27] <_m> Which would mean 2.2.1 is a mostly-bugfix release.
[16:43:56] <NodeX> I think it is
[16:44:30] <_m> ^ Succinct answer. =)
[17:01:22] <mikejw> with doctrine odm is it possible to remove a reference to a document from within an executed querybuilder cursor?
[17:26:32] <NodeX> 42
[17:31:21] <wereHamster> does mongodb sort by unique documents or can a document appear multiple times if the query matches it multiple times?
[17:31:53] <Derick> it can return documents multiple times, but not for the reason you mention
[17:32:21] <IAD1> NodeX: Don't Panic
[17:32:41] <wereHamster> I basically have: Game = { players: [...]}, each player has a score, and I want to rank games by the highest score any player has reached in it.
[17:33:17] <mikejw> on second thoughts I guess it doesn't really make sense retrieving objects from the cursor more than once :)
[17:33:19] <wereHamster> or should I create a field on the Game (maxScoreByAnyPlayer) and sort by that?
[17:33:38] <IAD1> yep, or map-reduce
[17:33:57] <hdm> mikejw: the lazy way is just index on the score
[17:34:14] <mikejw> score?
[17:34:20] <hdm> and then db.players.find().sort({ score : -1 }).limit(1)
[17:34:23] <ron> score!
[17:34:25] <hdm> set your score index to be -1
[17:34:35] <hdm> and that uses the index to always find the highest ranked player(s)
[17:34:42] <wereHamster> db.players don't have score. the players in the game have scores.
[17:34:52] <hdm> sure, wherever you store it
[17:34:53] <mikejw> hdm: I think you're confusing me with another mongolian :)
[17:34:54] <hdm> index it
[17:34:58] <IAD1> "game.score"
[17:35:07] <wereHamster> so it would be something like: db.games.find(...).sort({ 'players.score': -1 })
[17:35:29] <hdm> mikejw: woops, thanks
[17:35:35] <mikejw> np :)
[17:35:53] <hdm> wereHamster: yup, just index it with db.games.ensureIndex({ 'players.score' : -1 })
[17:36:00] <wereHamster> I guess it's probably better to make a separate field on the Game document. Makes the index smaller.
[17:36:00] <hdm> and the default sort order would be descending
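The two options wereHamster is weighing, side by side: sorting descending on a multikey index effectively ranks each game by its highest players.score, while maxScoreByAnyPlayer is the denormalized field he names. Document values here are invented.

    // option 1: index the embedded scores and sort on them
    db.games.ensureIndex({ "players.score": -1 });
    db.games.find().sort({ "players.score": -1 }).limit(10);
    // option 2: maintain a top-level field on each game and index that (smaller index)
    db.games.update({ _id: 1 }, { $set: { maxScoreByAnyPlayer: 9001 } });
    db.games.ensureIndex({ maxScoreByAnyPlayer: -1 });
    db.games.find().sort({ maxScoreByAnyPlayer: -1 }).limit(10);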
[19:35:21] <andrecolin> hello everyone
[19:37:05] <andrecolin> have a setup running auth, i am using a simple ruby script, i can login to the admin db, change to my test db, can query the collections and i see the system.users collection, how can i access the user names in that collection using my code
[19:38:43] <hdm> treat it as any other collection
[19:39:03] <hdm> db.system.users.find()
[19:39:19] <hdm> collection name "system.users" database "admin"
[19:40:57] <andrecolin> i can do the db.system.users.find() fine in the shell console
[19:41:19] <andrecolin> but something is happening when i try it with code
[19:44:25] <crudson> andrecolin: 'something'?
[19:45:25] <andrecolin> i can do "my_db.collection_names" and see the "system.users" collection
[19:45:43] <_m> andrecolin: Pastie or gist the error you're getting and we'll be better able to help you.
[19:46:09] <_m> Perhaps also the offending lines
[19:46:19] <andrecolin> i tried my_db["system.users"].each
[19:46:36] <_m> Okay, that's still not helpful.
[19:48:38] <_m> Is my_db really a hash? How did you create the connection? What are the actual errors? Pastie/gist so we have enough context to actually help.
[19:48:58] <andrecolin> let me do a pastie
[19:49:48] <bricker> Hello - at /var/lib/mongo I have a bunch of files of databasename.1, databasename.2, etc. Are these backups (i.e., safe to delete the older ones)?
[19:50:15] <bricker> each is exactly 2GB big
[19:50:38] <crudson> my_db["system.users"].each should be my_db["system.users"].find.each
[19:51:44] <crudson> bricker: no! That just means that your db is at least nfiles-1 * 2GB in size
[19:51:54] <andrecolin> http://pastie.org/5168266
[19:52:23] <crudson> bricker: that's your data, it gets split up into files of increasing size, maxing at 2gb(ish), but also preallocated before you hit the limit
[19:52:38] <bricker> crudson: I see! Thanks
[19:53:05] <crudson> bricker: not quite the equation I gave as they start out small, but that's the general idea
[19:53:48] <crudson> andrecolin: my_db["system.users"].each should be my_db["system.users"].find.each
[19:54:07] <andrecolin> ah, let me try that
[19:56:42] <crudson> bricker: you'll see 64M, 128M, 256M, 1G, 2G, 2G, 2G, 2G....... http://www.mongodb.org/display/DOCS/Excessive+Disk+Space#ExcessiveDiskSpace-DatafilePreallocation
[19:57:08] <andrecolin> crudson: my test box just went south, need to reboot, will be back with status
[20:12:21] <mrichman> Where can I find a sample database?
[20:14:55] <bricker> Forgive me, I'm new to mongodb and am afraid of deleting something that I didn't mean to. Will this query remove any entries with the timestamp before October 1? db.stream_minutes_events.remove("t" : { $lt: ISODate("2012-10-01T00:00:00.000Z") })
[20:15:17] <bricker> er...
[20:15:26] <bricker> Forgive me, I'm new to mongodb and am afraid of deleting something that I didn't mean to. Will this query remove any entries with the timestamp before October 1? db.stream_minutes_events.remove({ "t" : { $lt: ISODate("2012-10-01T00:00:00.000Z") } })
[20:15:37] <bricker> oopsie
[20:15:39] <bricker> This query:
[20:15:40] <bricker> db.stream_minutes_events.remove({ "t" : { $lt: ISODate("2012-10-01T00:00:00.000Z") } })
[20:15:43] <bricker> :)
[20:16:35] <naquad> can i use mongodb for high load project (i'm talking about 10k-25k concurrent requests)?
[20:19:06] <hdm> naquad: yes, you might need a few shards, and make sure queries are working from indexes in ram, SSDs help with high throughput too
[20:19:15] <crudson> bricker: I'd shove a load of documents with varying dates into a test collection and check it.
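Expanding on crudson's suggestion, one low-risk way to check bricker's remove is to run the same criteria as a count first; if the number looks right, the remove uses the identical filter:

    var cutoff = ISODate("2012-10-01T00:00:00.000Z");
    db.stream_minutes_events.find({ t: { $lt: cutoff } }).count();  // how many would be deleted
    db.stream_minutes_events.remove({ t: { $lt: cutoff } });        // then delete them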
[20:19:26] <naquad> hdm, thanks
[20:19:32] <naquad> also what about this: http://pastebin.com/FD3xe6Jt - true or false?
[20:20:21] <hdm> its someone's experience, everyone has their own opinion, and until a few months ago i had unrelated headaches that drove me crazy
[20:20:44] <hdm> but mongo 2.2 has been a huge improvement overall and most of the issues i ran into are now just a matter of allocation/sizing up front
[20:20:46] <crudson> that's also a pretty old blog that has done the rounds a lot
[20:21:55] <hdm> the safe writes thing is overblown, you can enable it in your driver, or not, it depends if you need confirmation of every write, some use cases dont
[20:22:25] <hdm> the sharding stuff is still suboptimal imo, ive seen weird/bad balancing, and adding/removing shards is a pain, especially compared to something like elastic
[20:23:47] <hdm> fwiw, i run a ~1.5 billion record / 3.5Tb dataset for a hobby project, my roadblocks tend to be things like concurrent cpu use since im focusing on single/huge nodes, or slow disk io once indexes go over ram
[20:24:25] <mrichman> that sounds like quite a hobby project -- what is it?
[20:24:27] <hdm> using mongo properly with horizontal scaling/replica sets/sharding would work better if i could afford to
[20:24:54] <hdm> mrichman: scanning the internet for ~6 months or so
[20:25:01] <hdm> banners of various services
[20:25:31] <hdm> https://speakerdeck.com/hdm/derbycon-2012-the-wild-west <- not much there about the mongo setup, but thats the source of it
[20:30:55] <mrichman> are there any sample databases out there to speak off, something like Northwind in SQL Server? Just need to sink my teeth into something real
[20:31:16] <mrichman> s/off/of
[20:32:16] <NodeX> not really
[20:32:42] <NodeX> one main point of nosql style data stores is that it's your choice of data
[20:33:42] <mrichman> understood…but say i wanted to play with map reduce on a large dataset…kinda hard unless i contrive the data myself….seems like there should be some archetypal examples out there
[20:34:00] <hdm> mrichman: pretty easy to load up things like us census data
[20:34:13] <NodeX> any csv / json doc will load easy
[20:34:25] <NodeX> doc / collection of docs
[20:34:49] <hdm> http://aws.amazon.com/publicdatasets/ < lots of stuff there and there -> https://explore.data.gov/catalog/raw/
[20:34:57] <mrichman> thanks
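As NodeX says, any CSV or JSON file loads easily; the usual tool is mongoimport. The database, collection, and file names below are placeholders for whatever public dataset you grab:

    mongoimport --db sandbox --collection census --type csv --headerline --file census.csv
    mongoimport --db sandbox --collection events --file events.json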
[20:40:23] <TecnoBrat> hdm: this is yours?
[20:41:45] <hdm> TecnoBrat: the slide deck yeah
[20:41:59] <TecnoBrat> neat
[20:42:04] <TecnoBrat> sounds like a fun hobby project
[20:42:21] <hdm> totally :)
[20:42:27] <TecnoBrat> normally ISPs just kick you rather than letting you do educational stuff like that
[20:42:32] <TecnoBrat> so glad you found a proper home
[20:42:40] <hdm> its been a long path to find a home for it
[20:42:51] <hdm> ~2200 abuse reports or something now
[20:42:59] <TecnoBrat> is the data public?
[20:43:17] <hdm> not yet, but working with the guys at Shodan to incorporate it
[20:43:23] <TecnoBrat> mars rovers!
[20:43:25] <hdm> ill dump out some sample sets / aggregate data soon
[20:44:26] <hdm> TechCel: https://twitter.com/hdmoore/status/262080277915004928/photo/1 < two weeks ago
[20:44:33] <hdm> got to go see them up close and personal at jpl
[20:45:12] <TecnoBrat> nice
[20:46:10] <TecnoBrat> hdm: wow ... people really don't understand security
[20:46:30] <TecnoBrat> even technical people (or should be ... cisco for example)
[20:46:38] <LouisT> what is this security you speak of!?
[20:48:12] <hdm> hmm, so back to mongo, how do you do the equivalent of "use database" in JS scripts?
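For hdm's question: "use <db>" is an interactive shell directive; in a .js file run with the mongo shell, the equivalent is getSiblingDB. The database and collection names below are examples.

    var other = db.getSiblingDB("otherdb");
    other.mycollection.find().limit(5).forEach(printjson);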
[20:48:20] <TecnoBrat> hdm: thats fun as hell
[20:48:31] <TecnoBrat> do you currently run a sharded cluster for mongo?
[20:48:39] <TecnoBrat> whats your mongo setup like?
[20:49:36] <hdm> single server for most of it, 2x8-core xeon, 192gb ram, 1T of 4xSSD, 40Tb NAS, 3x4Tb direct HDDs
[20:50:14] <hdm> second box is 2x6-core, 96gb ram, same storage specs, used to prototype stuff
[20:50:51] <hdm> sharding is nice on single-boxes since you get to use multiple cpus for queries that do calculations
[20:50:54] <TecnoBrat> 3.5TB dataset you said?
[20:51:01] <hdm> but the disk i/o is still the bottleneck
[20:51:07] <TecnoBrat> right
[20:51:20] <hdm> yeah, changes depending on what set, im reloading it to a different schema on one of the boxes almost all the time
[20:51:25] <TecnoBrat> are you running multiple mongod instances etc .. or are you using VMs?
[20:52:03] <hdm> VMs failed miserably (kvm and vmware), weird storage issues, multiple mongods was ok, but sharding fell on its face to the point it wouldnt scale
[20:52:30] <hdm> the primary shard ended up with 3 x the data of the others and the disk i/o was maxed, so it couldnt slough it off and handle incoming inserts fast enough
[20:53:09] <hdm> even with only _id indexes, the read overhead got crazy, at the time drives were raid-0 x 4, so decent number of spindles
[20:53:45] <hdm> the 1T ssds (raid-0 as well) get 1000MB r/w and 25,000 IOPS, but only go to 1T, which doesnt work so well for the current size
[20:54:42] <TecnoBrat> right
[20:54:44] <TecnoBrat> interesting
[20:55:01] <TecnoBrat> so sorry, I'm confused, you have some sharding setup now? or not at all?
[20:55:20] <hdm> not any more, it choked after 3 weeks with the primary unable to hand off fast enough
[20:55:44] <hdm> current setup is to create new dbs on the fly for each month and put unique constraints into a multi-key _id
[20:56:00] <hdm> limits it to a single index, keeps indexes under 32Gb or so
[20:56:07] <hdm> (per db)
[20:56:45] <hdm> its reloading the data still, but going way faster; better than mongo's normal downward spiral where insert performance dies as the index grows beyond ram
[21:14:30] <Guest46986> Following this information: http://docs.mongodb.org/manual/tutorial/deploy-replica-set/
[21:14:30] <Guest46986> I've tried time and time again. And all I can get is: exception: need most members up to reconfigure, not ok.
[21:14:30] <Guest46986> All three instances are up, followed the documentation just fine.
[21:14:30] <Guest46986> I was able to run rs.initiate() once, looked like it worked, so I did it again for the other instance,
[21:14:30] <Guest46986> and the errors rained.
[21:14:30] <Guest46986> Of course I did some looking around, I noticed mention of a "\etc\hosts" <--- The heck is that? Do I need it? Is there something else I'm supposed to review that I might have missed?
[21:14:30] <Guest46986> Running Windows Server 2008 R2 64-bit
[21:14:31] <Guest46986> Thank you.
[21:15:14] <frsk> Have you used rs.add() at some point?
[21:16:48] <Guest46986> I do believe that is when it throws the error.
[21:17:36] <frsk> Does rs.conf() mention the other servers at all?
[21:18:26] <Guest46986> Yes. Mentions 1 of the name "rs0"
[21:18:46] <Guest46986> Which would be the first one I initiated.
[21:19:01] <frsk> And the one which you're running the command on?
[21:19:16] <Guest46986> correct
[21:19:27] <frsk> Run rs.add("hostname") to add the other servers as well
[21:19:39] <frsk> If it fails, what does it say?
[21:20:54] <Guest46986> Cannot use localhost.
[21:21:22] <Guest46986> Trying 128.0.0.1
[21:21:30] <frsk> You're specifying a port as well
[21:21:30] <frsk> ?
[21:21:39] <hdm> add a hostname to /etc/hosts after 127.0.0.1
[21:21:48] <hdm> myserver or something, this will let you change it later
[21:21:52] <hdm> and not break your rs
[21:22:09] <Guest46986> At the moment. Follow what you posted.
[21:22:15] <Guest46986> NOt at***
[21:22:35] <hdm> $ cat /etc/hosts
[21:22:35] <hdm> 127.0.0.1 localhost shard1 shard2 shard3 shard4 config1 config2 config3 config4
[21:22:46] <hdm> ^ example of how to test sharding/replicas/config locally
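With hostnames sorted out, the usual sequence is to run rs.initiate() on exactly one member and then rs.add() the rest from that same member; running rs.initiate() separately on each instance (as Guest46986 describes above) is one common way to end up with conflicting configs. The hostnames and ports below are illustrative.

    rs.initiate();
    rs.add("shard1:27018");
    rs.add("shard2:27019");
    rs.status();   // every member should be listed, with one PRIMARY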
[21:24:10] <Guest46986> With or without port yield the same error.
[21:24:29] <Guest46986> exception: need most members up to reconfig...
[21:28:16] <Guest46986> If anyone is available to help me solve my issue, I've posted it here as well: https://groups.google.com/forum/?fromgroups=#!topic/mongodb-user/QTqV1G3LUGU
[21:34:26] <Guest46986> Following this information: http://docs.mongodb.org/manual/tutorial/deploy-replica-set/
[21:34:26] <Guest46986> I've tried time and time again. And all I can get is: exception: need most members up to reconfigure, not ok.
[21:34:26] <Guest46986> All three instances are up, followed the documentation just fine. What's the deal?
[21:34:26] <Guest46986> I was able to run rs.initiate() once, looked like it worked, so I did it again for the other instance,
[21:34:26] <Guest46986> and the errors rained.
[21:34:27] <Guest46986> Of course I did some looking around, I noticed mention of a "\etc\hosts" <--- The heck is that? Do I need it? If so why didn't I come across it
[21:34:27] <Guest46986> in the installation or mongo setup documentation? Is there something else I'm supposed to read that I might have missed?
[21:34:28] <Guest46986> Running Windows Server 2008 R2 64-bit
[21:34:28] <Guest46986> Thank you.
[21:35:15] <cyberfart> I have a quick question: are "$in" queries considered equality test or range query? I'm trying to work out the better compound_index ordering
[21:39:49] <TecnoBrat> cyberfart: its not a range ... its an array. You are finding any records which have a value that exists in the array
[21:40:09] <TecnoBrat> "$in: [1, 4]" means that the field is either a 1 or a 4
[21:54:40] <Guest46986> Following this information: http://docs.mongodb.org/manual/tutorial/deploy-replica-set/
[21:54:41] <Guest46986> I've tried time and time again. And all I can get is: exception: need most members up to reconfigure, not ok.
[21:54:41] <Guest46986> All three instances are up, followed the documentation just fine. What's the deal?
[21:54:41] <Guest46986> I was able to run rs.initiate() once, looked like it worked, so I did it again for the other instance,
[21:54:41] <Guest46986> and the errors rained.
[21:54:41] <Guest46986> Of course I did some looking around, I noticed mention of a "\etc\hosts" <--- The heck is that? Do I need it? If so why didn't I come across it
[21:54:41] <Guest46986> in the installation or mongo setup documentation? Is there something else I'm supposed to read that I might have missed?
[21:54:42] <Guest46986> Running Windows Server 2008 R2 64-bit
[21:54:42] <Guest46986> Thank you.
[21:55:39] <raja> how can i do a query where i don't want a unique on a field but a limit.. for example.. i want to query all stories but limit it to 2 per userid
[21:56:50] <Guest46986> db.user.findOne()?
[21:58:29] <cyberfart> ty TecnoBrat , I will put that field before range fields on the index then
[21:58:56] <LouisT> raja: only thing that i can come up with is something using $where =/
[21:59:19] <raja> hmmm.. any further hints into what to supply $where
[22:01:02] <LouisT> raja: not off the top of my head, no
[22:02:19] <hdm> raja: you can do that via aggregation framework
[22:02:30] <hdm> raja: if the easy ways arent doable
[22:02:54] <Guest46986> This is depressing. Can someone at least shoot me a link to a decent tutorial on how to set up replication sets?
[22:03:06] <raja> hdm: looking at aggregation docs but don't see it.. aggregation supports distinct by key but not a limit by key value
[22:03:13] <hdm> Guest46986: you ignored previous advice and then pasted the same thing twice, no thanks
[22:03:30] <hdm> raja: you can create any key combination you want and then limit it
[22:04:13] <Guest46986> ? I'm assuming you mean: <hdm> add a hostname to /etc/hosts after 127.0.0.1
[22:04:20] <hdm> http://pastie.org/5168971
[22:04:21] <Guest46986> Had no idea you were talking to me.
[22:04:40] <Guest46986> Because I clearly stated in my post that I don't have those directories.
[22:04:43] <hdm> raja: ^ add a {{ '$limit' => 2 }} after $project 'ing your unique key
[22:05:04] <raja> hdm: cool thanks for the example
[22:07:01] <Guest46986> Look I'm sorry I upset you, was not my intention. I just figured you were talking to someone else. Don't see why you would bring up /etc/hosts if you were.
[22:10:40] <Guest46986> cat /etc/hosts returns: etc is not defined.
[22:11:20] <ron> ah?
[22:11:50] <Guest46986> I do not understand where I should put these directories or configure them. They were not covered in the documentation I followed.
[22:13:12] <LouisT> Guest46986: this is on a windows server?
[22:13:19] <Guest46986> Correct
[22:13:26] <LouisT> then you won't have an /etc/hosts
[22:13:45] <Guest46986> This is what I figured as well. But people keep telling me I need it
[22:13:58] <Guest46986> Like that hdm guy lol
[22:14:58] <Guest46986> Well thanks for responding at least. Then is there something I can do? Some other documentation I can follow for windows?
[22:15:09] <LouisT> "Configure DNS names appropriately"
[22:15:14] <LouisT> you'd have to do that on windows
[22:15:29] <LouisT> well, i'm not aware of a /etc/hosts equivalent on windows
[22:15:37] <LouisT> for windows*
[22:16:39] <jcims> c:\windows\windows32\drivers\etc\hosts
[22:17:03] <LouisT> oh really? that's cool
[22:17:06] <Guest46986> Really?
[22:17:07] <jcims> derp windows32 -> system32
[22:17:11] <jcims> yass'm
[22:17:29] <jcims> there is probably a hosts.sam file (sam = sample)
[22:17:34] <LouisT> Guest46986: http://en.wikipedia.org/wiki/Hosts_file
[22:18:50] <Guest46986> Sweet thanks a ton.
[22:43:45] <Raynos> Whats a good way to store meta data on a collection
[22:43:49] <Raynos> like for example "last map reduced"
[23:05:25] <Gargoyle> evenin'
[23:16:09] <jrdn> random question
[23:16:27] <ron> do pigs live on the moon?
[23:17:29] <jrdn> if a value is known in a column, is it the same performance doing: x = 1; x++; db.foo.update(criteria, { $set: { bar: x } })
[23:17:42] <ron> mine was more random.
[23:17:44] <jrdn> vs doing an $inc on it
[23:18:05] <jrdn> in regards to mongo performance
[23:18:38] <ron> I imagine $inc would be safer.
[23:18:42] <jrdn> i was just wondering if i should build increment functionality in my domain model persistence manager since the update needs the domain model
[23:19:12] <Zelest> somewhat odd question; a server with only ~10MB/s write-speed, is that usable for mongodb?
[23:19:16] <Zelest> or should I find another host?
[23:19:56] <jrdn> if you have bad IO and high traffic, you'll cry
[23:19:59] <ron> jrdn: using the $inc is an atomic operation. using the first option is not. in a multi-threaded environment, that matters.
[23:20:13] <jrdn> and @ron, pings don't live on the moon.. your mom is on earth
[23:20:14] <jrdn> OHH!!!
[23:20:15] <jrdn> jk
[23:21:22] <Zelest> yeah, I guess the proper answer is "it's up to your applications needs" ..
[23:21:24] <jrdn> .. set is atomic
[23:21:31] <jrdn> ?
[23:21:53] <Zelest> i get to choose between low disk I/O or high network delays. :(
[23:21:53] <ron> set is atomic. your calculation isn't.
[23:21:58] <jrdn> Zelest, yeah we're on EC2 on a regular EBS volume and we're getting high IO alerts every minute when data is being flushed to disk
[23:22:18] <jrdn> @ron, yah that's why I said in regards to mongodb
[23:22:37] <ron> but.. but...
[23:22:42] <ron> oh, I'm going to sleep.
[23:23:08] <jrdn> having my application do $x += 10, then update with $set vs $inc with value of 10 is micro micro micro
[23:23:39] <jrdn> anyway, gnight
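The difference ron is describing, side by side in the shell (document and field values are placeholders): each individual update is atomic on the server, but the read-modify-write version can lose a concurrent increment between the find and the $set.

    // read-modify-write: not safe with concurrent writers
    var doc = db.foo.findOne({ _id: 1 });
    db.foo.update({ _id: 1 }, { $set: { bar: doc.bar + 10 } });
    // server-side increment: atomic, no lost updates
    db.foo.update({ _id: 1 }, { $inc: { bar: 10 } });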
[23:27:39] <Raynos> What should the sort Object parameter look like in mapReduce
[23:27:48] <Raynos> I know sort arrays look like [[key, direction]]
[23:27:53] <Raynos> does it want { key: direction } ?
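For Raynos's question: the [[key, direction]] array form is a driver convention; the mapReduce command (and the shell helper) takes the { key: direction } document form, and the sort key generally needs an index. A self-contained shell sketch with invented collection and field names:

    var mapFn = function () { emit(this.userId, 1); };
    var reduceFn = function (key, values) { return Array.sum(values); };
    db.events.ensureIndex({ ts: 1 });             // sort field should be indexed
    db.events.mapReduce(mapFn, reduceFn, {
        sort: { ts: 1 },                          // { key: direction } form
        out: { inline: 1 }
    });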
[23:30:19] <bricker> When I delete entries from a collection, when does that data get removed from disk?
[23:39:39] <wereHamster> bricker: at an unspecified time later.
[23:42:23] <bricker> wereHamster: Thanks, just wanted to make sure I didn't have to do it manually or something :)
[23:44:26] <wereHamster> that unspecified time may also be never. Why do you care?
[23:45:01] <bricker> wereHamster: Oh... because we have some data that is taking up a huge amount of space on our server, and we're just trying to clean it up
[23:45:27] <wereHamster> you may want to try db.repair()
[23:45:39] <bricker> wereHamster: old data that we no longer care about (stats collected from Cube)
[23:45:45] <bricker> wereHamster: I'll look at it, thanks again
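For the record, the shell helper wereHamster is referring to is db.repairDatabase(); it rewrites the data files (which is what actually releases the space deleted documents occupied) and needs roughly the current size of the data files in free disk while it runs. The database name below is a placeholder.

    use mydb
    db.repairDatabase()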
[23:48:46] <wereHamster> ah, cube. Do you still use it?
[23:55:30] <bricker> wereHamster: yes but we only need stats month-to-month