#mongodb logs for Monday the 6th of August, 2012

[00:00:17] <TkTech> sapht: I presume so. I'm on horrendous internet, can't load a webpage to check.
[00:00:32] <sapht> turns out i can check collection.options()['capped'] as well
[00:00:41] <sapht> i'll go with that
[00:06:43] <tomoyuki28jp> What's the equivalent of MySQL's POWER(), aka POW()?
[00:08:46] <sapht> i think there's Math.pow
[00:09:21] <tomoyuki28jp> sapht: thanks, I will try that one.
[00:09:56] <tomoyuki28jp> sapht: works perfect! thanks.
[00:10:08] <sapht> nice np :)
[00:10:13] <tomoyuki28jp> :)
[00:10:29] <sapht> you can check out spidermonkey's stdlib if you have similar problems
[00:10:52] <sapht> same JS engine that powers firefox
[00:12:11] <tomoyuki28jp> sapht: I see, that's really useful info to me, thanks!
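
A quick illustration of the point above: the mongo shell is a full JavaScript interpreter, so the standard Math object is available directly, including inside server-side JS such as a $where clause (collection and field names here are invented):

    > Math.pow(2, 10)
    1024
    > // also usable in (slow) server-side JavaScript:
    > db.items.find({ $where: "Math.pow(this.base, 2) > 100" })
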
[02:01:20] <tomoyuki28jp> How can I do thing like this? db.news.find().sort(Math.pow(((new Date() - news.updated_at) / 3600000) + 2, 1.5));
[02:05:51] <tomoyuki28jp> To implement a reddit-like ranking algorithm, is using map-reduce the only way? It should be slower than a native function, right?
[02:06:55] <tomoyuki28jp> Anybody watching this?
[02:07:04] <tomoyuki28jp> sapht: you there?
[02:08:14] <tomoyuki28jp> :(
[02:23:47] <tomoyuki28jp> I'm back.
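
The sort() question above went unanswered in channel; sort() takes a document of field names and directions, not a JS expression, so the usual pattern is to precompute the score when a document changes and sort on an index over it. A minimal sketch using the fields from the question (score is an invented field):

    var doc = db.news.findOne();
    var hours = (new Date() - doc.updated_at) / 3600000;
    db.news.update({ _id: doc._id },
                   { $set: { score: Math.pow(hours + 2, 1.5) } });
    db.news.ensureIndex({ score: 1 });
    db.news.find().sort({ score: -1 });   // hottest first
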
[05:41:53] <lupisak> Hi, everyone. what do you guys recommend? mongoid or mongomapper?
[06:15:04] <wereHamster> lupisak: whatever floats your boat.
[06:18:56] <lupisak> wereHamster: well none of them do at the moment :)
[06:52:54] <jwilliams_> how can we know if the balancer has finished its task?
[06:53:36] <jwilliams_> is there any keyword that can be used to check whether the balancing task is done?
[06:54:18] <jwilliams_> with v2.0.1, only [balancer] can be seen while it is being executed.
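
For what it's worth, on a 2.0.x sharded cluster the balancer state can be read from the config database; a sketch, not an official status API:

    use config
    db.locks.findOne({ _id: "balancer" })   // "state" : 0 means the lock is free, i.e. no migration running
    // finished migrations are recorded in the changelog:
    db.changelog.find({ what: "moveChunk.commit" }).sort({ time: -1 }).limit(5)
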
[08:16:59] <Turicas> what's the best way to ensure that a specific index was used in my query?
[08:17:45] <ron> check in the past or ensure in the future?
[08:18:57] <Turicas> ron, the problem is that I want indexOnly=true for some query but it is not using the index I want, so indexOnly=false... (if it used it, indexOnly would probably be true)
[08:21:21] <ron> Turicas: http://www.mongodb.org/display/DOCS/Advanced+Queries#AdvancedQueries-%24hint
[08:23:18] <ron> assuming that's what you were asking about.
[08:25:29] <Turicas> ron, thank you! :)
[08:25:29] <ron> Turicas: np
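
The hint + explain combination being pointed at looks roughly like this in the shell (index and field names invented); note a covered query also needs _id excluded from the projection:

    db.coll.ensureIndex({ a: 1, b: 1 });
    db.coll.find({ a: 5 }, { a: 1, b: 1, _id: 0 })
           .hint({ a: 1, b: 1 })
           .explain();
    // expect: "cursor" : "BtreeCursor a_1_b_1", "indexOnly" : true
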
[09:02:23] <jwilliams_> is there any possibility that during a chunk move the docs count value may be reduced? e.g. original count is 123; during an insert there is a chunk moving, then count() again during the procedure, and the count becomes 100?
[09:02:28] <jwilliams_> temporarily
[09:11:09] <[AD]Turbo> hi there
[09:35:08] <jwilliams_> i have clients executing insert command with java driver 2.6.5. at the same time, server is running balancer. after insert processes finish, i notice the collection count value may fluctuate
[09:35:34] <jwilliams_> e.g. count before doing insert is 200; first count during insert shows 180, then increase back to 200, then 210. and then again drops to 180.
[09:35:51] <jwilliams_> is this normal (e.g. because of moving chunk)?
[10:03:52] <Turicas> Anyone using Python here? I've just released mongodict: http://pypi.python.org/pypi/mongodict (use MongoDB as a key-value store)
[10:13:27] <newcomer> hi folks.. I'm new to mongodb.
[10:13:38] <newcomer> I have the below structure under RDBMS that I plan to move to mongodb. However, when I check the size of the mongo db, I can see it bloats. RDBMS: table hotel_basic_info (hotel_id), table hotel_dtl_info, table hotel_images, table hotel_facility, table hotel_blocked_dates. I tried creating a MongoDB doc as hotel_id [basic+dtl+images+facility+blockeddates]
[10:13:49] <newcomer> any ideas ?
[10:16:20] <NodeX> what's the problem?
[10:17:36] <newcomer> the RDBMS shows a total size of 1.25GB odd but mongo goes to 22GB+
[10:20:23] <NodeX> mongo pre-allocates
[10:20:44] <NodeX> how many documents are in the collection?
[10:20:53] <newcomer> yes I read those doc links & hence set prealloc 0's to false
[10:21:04] <newcomer> also set small size to true
[10:21:22] <newcomer> ran a repair & compress too
[10:21:34] <NodeX> is journaling on?
[10:21:37] <newcomer> yes
[10:21:43] <newcomer> master-slave mode
[10:21:55] <NodeX> how many documents?
[10:22:03] <Derick> master-slave or replicaset?
[10:22:15] <newcomer> 60533 collections
[10:22:24] <newcomer> @Derick: master-slave
[10:22:30] <NodeX> 60k collections?
[10:22:36] <NodeX> do you mean documents?
[10:22:38] <Derick> newcomer: you really want a replicaset I think
[10:23:16] <newcomer> well I suspect my logic.. maybe you could help me clear the air
[10:24:15] <NodeX> documents = rows
[10:24:20] <NodeX> collections = tables
[10:24:38] <newcomer> for each hotel I create a collection then for each collection 1 doc - basic + 1 doc dtl + many docs for imgs + many docs for facility
[10:25:13] <NodeX> does your RDBMS have 65k tables?
[10:25:29] <newcomer> the idea being I pick a collection & get all its related info (basic+dtl+imgs+facility) in 1 shot
[10:25:42] <newcomer> no RDBMS has 5 tables
[10:25:45] <jwilliams_> is db.runCommand({ flushRouterConfig : 1 }) the right command to flush the cache so that the newer inserted data can be displayed?
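
This one went unanswered: flushRouterConfig is a real admin command, but it makes a mongos reload its cached sharding metadata from the config servers; it is not a data-cache flush and won't change which documents a query sees:

    // run against a mongos
    db.adminCommand({ flushRouterConfig: 1 })
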
[10:25:58] <Derick> newcomer: you should stick all of those in one *document* and all documents containing a hotel + info each, in one collection
[10:26:06] <NodeX> +1
[10:26:16] <NodeX> 60k + collections is madness
[10:27:22] <NodeX> each collection carries an overhead with the journaling iirc which is why your data is so large I would imagine
[10:31:13] <newcomer> I'm running a collection from hotel1 to hotel60k & each of those having their own docs which are [info+basic+imgs(multiple)+facilities(multiple)]
[10:31:39] <newcomer> basic idea being to avoid sql joins & get all data at 1 go
[10:31:50] <NodeX> use nested elements
[10:32:03] <NodeX> 1 collection, multiple documents with nested elements
[10:32:39] <newcomer> any docs / Java code examples I can refer to ?
[10:33:31] <NodeX> refer to what?
[10:34:07] <newcomer> docs
[10:34:20] <NodeX> I dont understand, docs on what?
[10:34:58] <newcomer> I have no clue on nested elements ... a search on the mongo site doesn't help either
[10:35:21] <newcomer> where can I start to read up on this ?
[10:35:28] <NodeX> do you know json ?
[10:36:33] <newcomer> I'm mainly from the RDBMS world.. but can read JSON ... extreme basics
[10:36:46] <newcomer> also know a bit of JAVA
[10:37:00] <NodeX> you can think of a document as a JSON object
[10:37:18] <newcomer> yes..
[10:37:23] <NodeX> and just as you can nest JSON, you can nest in mongo
[10:37:53] <NodeX> perhaps a document might look like this.
[10:37:57] <newcomer> nest JSON ... gone for a toss ... sorry
[10:38:53] <NodeX> {name:"Hotel 1", attributes : {images: [1,2,3,4], info:"Some information", facilities:["swimming pool","tennis"]}}
[10:39:24] <newcomer> ok .. clear now ... conceptually
[10:39:28] <newcomer> thanks
[10:40:34] <newcomer> now in JAVA should I say doc1.put("hotel1", doc1info) ??? where doc1info is another doc
[10:41:14] <NodeX> a nested document yes
[10:41:26] <newcomer> ok .. gr8
[10:44:45] <newcomer> also once I manage to insert .. how do I query ? say I want only images(multiple) for hotel10
[10:45:22] <ron> have you read the documentation at all?
[10:45:29] <ron> tutorials? anything?
[10:46:29] <newcomer> with the basic docs I have been successful in querying, updating, deleting ... what abt nested ?
[10:48:05] <newcomer> also what abt the performance impact if I use nesting ?
[10:49:18] <NodeX> you get to the nested by querying the document
[10:49:30] <newcomer> ok..
[10:49:34] <NodeX> the nested part is obtained using dot notation
[10:49:55] <newcomer> ok... hotel10.getImages() something like that ..
[10:50:08] <NodeX> no
[10:50:34] <NodeX> I dont know your programming language so I cant offer advice
[10:51:40] <newcomer> I'm having some JAVA knowledge
[10:52:31] <NodeX> Perhaps it's a good idea to either employ someone who knows what they're doing or read some books
[10:59:21] <newcomer> hmm
[11:03:29] <newcomer> I'll do 1 thing.. create a collection as "Hotels" then for each hotel I'll retrieve the basic, details, images, facility as separate docs & then put em into the hotel doc ... that way I shall have only 60k docs with the info nested
[11:04:07] <NodeX> you dont need to put them into separate docs
[11:14:02] <newcomer> k ... then
[11:14:24] <newcomer> {name:"Hotel 1", attributes : {images: [1,2,3,4], info:"Some information", facilities:["swimming pool","tennis"]}}
[11:15:02] <NodeX> yes, that's what I suggested you do
[11:15:20] <newcomer> outer as hotel ... inner as attributes where outer & inner are docs ?
[11:15:38] <newcomer> using ur example as it works well with the idea
[11:15:55] <NodeX> I dont know where inner and outer appeared from
[11:16:24] <newcomer> means inner doc & outer doc
[11:16:40] <NodeX> :)
[11:16:53] <newcomer> not go around adding all the attributes as individual doc
[11:16:57] <newcomer> right ?
[11:17:24] <NodeX> yes correct as "Nested"
[11:17:35] <NodeX> (as above) - the "attributes" key is nested
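
In shell terms, the dot notation NodeX describes looks like this against his example document (collection name invented):

    // fetch only the images for one hotel
    db.hotels.find({ name: "Hotel 1" }, { "attributes.images": 1 });
    // match hotels offering a given facility inside the nested array
    db.hotels.find({ "attributes.facilities": "tennis" });
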
[11:19:57] <newcomer> name = key || attributes = key ...... hotel1 = doc & {images ...... tennis} = doc
[11:20:23] <newcomer> so "nested"
[11:20:36] <NodeX> I don't know what you're trying to say
[11:20:54] <NodeX> image and tennis are NOT a document
[11:20:58] <newcomer> there are only 2 keys right
[11:20:59] <ron> NodeX: I'm starting to adore you, in SPITE of PHP.
[11:20:59] <NodeX> they are NESTED values
[11:21:06] <NodeX> hah
[11:21:42] <newcomer> thanks ... taken an hour of yours but yes I can now proceed on what's called "NESTED"
[11:21:56] <newcomer> confident a bit on the concept
[11:22:00] <newcomer> laugh
[11:22:19] <NodeX> :)
[11:22:34] <newcomer> this is what happens when somebody makes a UMP across ... the J goes
[11:23:03] <newcomer> also this would improve my (code) performance right
[11:23:35] <NodeX> depends how your code is written
[11:23:58] <newcomer> ahh yes ... :)
[11:24:25] <newcomer> anyways will now get to coding ... let the party begin
[11:24:35] <newcomer> thanks .. bye.
[11:27:27] <Bartzy> Hi
[11:27:36] <Bartzy> What is a good case to disable safe mode inserts/updates ?
[11:27:59] <ron> hmm.. feeling suicidal.
[11:28:10] <Bartzy> When safe mode is disabled, the driver doesn't even check if sending to the socket suceeded ?
[11:28:27] <Bartzy> ron: Why? I mean, we develop a social app. Likes, comments, views on photos
[11:28:41] <kali> Bartzy: logging for instance... I prefer taking the risk to lose a few lines of log to the performance impact
[11:28:44] <ron> I wasn't asking, I was giving you a good case.
[11:28:54] <ron> :)
[11:29:06] <Bartzy> User liked something. we update that something's document with increment to the like count and the user id. Why enable safe mode here ?
[11:29:41] <Bartzy> kali: OK - so for user data (likes, comments, profile of course) - it is preferable to "lose" the performance benefits and have safe mode on ?
[11:30:23] <kali> Bartzy: this is for you to decide, really, but this is my setup. safe everywhere except for logging
[11:30:44] <Bartzy> kali: And this is usually the best practice ?
[11:30:54] <Bartzy> I saw that mentioned on MongoDB in Action too.
[11:31:04] <Null_Route> Hi Guys! We're running MongoDB 2.0.2, and we've been recently seeing strange network disconnects - replSet unexpected exception in ReplSetHealthPollTask
[11:31:06] <ron> it mostly depends on your needs.
[11:31:09] <Tankado> Is there going to be (or is there already) a version of mongodb which works on PowerPC ?
[11:31:10] <kali> Bartzy: this is my practice, and i fully agree with myself :)
[11:31:25] <Null_Route> Obviously, we're looking into infrastructure errors, (no obvious networking issues)
[11:31:42] <Bartzy> kali: safe mode off means that the driver doesn't even know if the socket is opened? :)
[11:31:43] <Null_Route> ...but I was curious as to whether this is a symptom of any MongoDB issues
[11:31:51] <Bartzy> kali: Or for example if there is a network error ?
[11:32:00] <ron> Bartzy: keep in mind that kali is a hindu goddess, so she knows best.
[11:33:00] <kali> Bartzy: nope, it just does not wait for mongo server to "confirm" the write, but it is received by the server process
[11:33:08] <Bartzy> if my MongoDB server is 2-3ms away from my web server clients (don't yell :)) - enabling safe mode means each write will take at least that much - right ?
[11:33:39] <ron> well, there are different levels of safety.
[11:34:08] <Bartzy> kali: So when safe mode is off, the driver still knows that the server received the request, but doesn't wait for the server to "confirm" that it has written it ?
[11:34:19] <Bartzy> if so, what would make the server reject it...
[11:34:30] <Bartzy> besides obvious uniqueness stuff
[11:36:23] <Tankado> anyone has an answer regarding the support of mongoDB in ppc? :)
[11:36:26] <kali> Bartzy: i think the server has an incoming queue for writes, but we're at the limit of my knowledge of mongodb's internal... here be dragons
[11:36:32] <Tankado> it seems it's not supported but i want to be sure
[11:36:37] <kali> ron: god, no goddess please
[11:36:58] <ron> kali: don't blame me, blame the hindu. they made her a goddess.
[11:37:30] <ron> kali: looks pretty though, blue, long hair, third eye, lots of heads hanging from her neck. kinda sexy.
[11:37:39] <kali> ron: yeah, she's a usurper
[11:38:09] <Bartzy> kali, ron: So anyone knows perhaps? Interesting to know instead of just enabling safe mode
[11:38:12] <Bartzy> :)
[11:38:37] <ron> knows what?
[11:38:45] <Bartzy> how safe mode off behaves
[11:38:50] <ron> imagine disk space.. index constraints...
[11:39:08] <ron> if you have a replica set, then possibly unavailability of one of the servers
[11:41:40] <ron> basically, you need to consider the ramifications of data-loss for different parts of your application, and decide whether you want to handle it differently for different data types. if you can 'average' the safety, use that for simplification (especially at first). if not, employ a more complicated solution.
[11:42:13] <ron> for us, most data loss is not something we can afford, so we go safe. I imagine that if we used mongo for logging as well, we could have afforded it to be unsafe.
[11:43:23] <NodeX> I use safe mode for some things and not for others
[11:43:46] <NodeX> general consensus is if you can afford to lose it then dont use safe mode because performance is improved
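
Concretely, "safe mode" in the drivers of this era just means following each write with getLastError; in the shell the two modes can be emulated like this (a sketch):

    db.log.insert({ msg: "hit" });                               // fire-and-forget ("unsafe")
    db.runCommand({ getLastError: 1, w: 1 });                    // "safe": wait for the server to ack the write
    db.runCommand({ getLastError: 1, w: "majority", j: true });  // stronger, and slower, guarantees
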
[11:44:25] <ron> NodeX: now you lost 2 points for stating the obvious.
[11:44:26] <ron> :)
[11:44:56] <NodeX> lmfao
[11:45:11] <ron> and 500 more for using PHP.
[11:45:12] <ron> today.
[11:45:13] <NodeX> tbh I didn't read your comments because you love java and hate php :P
[11:45:30] <NodeX> ding ning is playing ping pong
[11:45:34] <NodeX> and she won
[11:45:48] <ron> so give her a ding dong
[11:46:02] <NodeX> lolol
[11:46:09] <Derick> for her ping pong?
[11:46:45] <NodeX> no her wing wang
[11:48:08] <ron> okay, that's just sick.
[11:48:09] <ron> ;)
[11:48:50] <NodeX> anyone else think it's a massive waste of gas to burn the olympic torch for 16 days?
[11:49:24] <Tankado> Derick : is mongoDB supported for ppc ? or are there plans to make it work on ppc?
[11:50:01] <Derick> Tankado: I don't think we officially support it, or are planning to do that *right* now. But if you compile it yourself, what doesn't work?
[11:51:22] <Tankado> I didnt try it, i just saw there is a bug with decoding the data listed in a few places when i searched the web (says mongo only supports little endian while ppc is big endian)
[11:51:50] <Tankado> i dont want to start integrating it and then find out its not compatible or has bugs in ppc
[11:52:17] <Derick> why are you using PPC btw? (just curious)
[11:52:25] <Tankado> an embedded platform
[11:52:50] <Tankado> running linux on it
[11:53:17] <Derick> MongoDB isn't really suited for that I think, how much memory does that have?
[11:53:55] <kali> NodeX: i was wondering if they switch it off at night when nobody is in the stadium
[11:55:04] <NodeX> kali : I dont think they do, but even so it's a waste
[11:55:04] <Null_Route> kali: One of the Olympic "things" is that the torch burns throughout the games
[11:55:22] <Null_Route> it was big news when they turned it off in order to move it to the side of the stadium
[11:55:38] <NodeX> Death to our planet and eat more resources but if it's in the name of the Olympics it's ok LOLOL
[11:56:11] <Null_Route> Burning a gas torch for 17 days isn't as bad as, say, one morning's commute in Los Angeles
[11:56:18] <kali> they could have a stand-by mode, like in any home water heater :)
[11:56:29] <Derick> i think they do
[11:56:34] <NodeX> Burning gas is either good or bad, the context it's burnt in doesn't come into play
[11:57:04] <NodeX> Next week they'll be moaning again about the finite resources of our planet
[11:58:23] <Bartzy> Hey again :)
[11:58:37] <Bartzy> Why will anyone ever need to change the vote count per server ?
[11:58:39] <Null_Route> Anyway, compare that to the "urbanization" of countless acres of London to prepare for the games
[11:58:41] <Bartzy> in a replica set
[11:58:58] <NodeX> Null_Route : I dont agree with any of it in the name of sport or anything else
[11:59:12] <NodeX> things are either broke or they are not
[11:59:25] <Derick> Null_Route: that area was mostly industrial wasteland anyway... it's a whole lot better now
[11:59:30] <Null_Route> NodeX: The olympics is in the name of humanity - The ability of human endeavor, and other blabla
[11:59:34] <Derick> Bartzy: no, hardly ever
[11:59:46] <NodeX> Null_Route: it doesn't matter what it's in the name of
[12:00:06] <Null_Route> _I"m_ not justifying it
[12:00:09] <Bartzy> Derick: But when do you need it ?
[12:00:09] <NodeX> it doesn't change the fact the planet is screwed already before it
[12:00:25] <NodeX> I just like the hypocrisy
[12:00:30] <Null_Route> All I want is for my Replica Sets to work!
[12:00:31] <NodeX> (of them)
[12:00:45] <Null_Route> :-D
[12:01:58] <Null_Route> My MongoDB issues go pretty deep - But it's the Symptoms which are killing me - Mongos returns a DBException error to the Java connector when it encounters a RETRYABLE issue
[12:02:24] <Null_Route> ...which kills all our mongos's whenever something fails over (which redundancy is supposed to make transparent)
[12:04:13] <Null_Route> Y'see, for some reason, our Mongo's are losing connectivity. As such, the mongos's need to update their sharding tables. The first request to each shard, though, returns an EXCEPTION, which our JAVA code rightly croaks on. But all that's needed is to retry the query. Why does it throw an exception if it's not fatal?
[12:04:33] <ron> solution: fix the damn connectivity
[12:04:59] <Null_Route> ron: Working on it, but that's NOT the problem - I should be able to failover MongoDB's without restarting all my Mongos's!
[12:05:29] <Null_Route> and Mongos shouldn't return an Exception for something which it knows is retryable
[12:06:00] <ron> you don't think that a database server should have its connectivity? I'd say it's A problem if not THE problem. dude, it's your backend!
[12:06:01] <Null_Route> the first of the 2 being the biggest issue. A failover should be COMPLETELY transparent - that's how it's built
[12:06:37] <sunoano> any meteor users in here? if so, do you run meteor on top of a replica set, maybe even sharded?
[12:07:13] <Null_Route> ron: There _IS_ an issue with connectivity (or somewhere in the OS/infrastructure), but the system STILL should be able to failover cleanly
[12:07:20] <ron> Null_Route: sorry, I can't really help you with that :-/
[12:07:56] <Null_Route> ron: You can't help me with the infrastructure issues, I understand. But why is the first suggestion when there is a failover "Have you restarted the Mongos's?"
[12:08:18] <ron> Null_Route: no, no, I can't help you with THAT.
[12:09:00] <ron> Derick can, but he doesn't want to.
[12:09:05] <ron> don't know why.
[12:09:13] <Null_Route> huh?
[12:09:28] <ron> I'm.. kidding.
[12:10:23] <Derick> ? :-)
[12:10:44] <NodeX> ron = biggest troll on freenode
[12:10:49] <ron> Derick: Null_Route wants answers only you can provide.
[12:10:57] <ron> NodeX: dude, you have no idea how wrong you are.
[12:11:04] <Derick> during failover it's possible that you can't do writes until the new primary is elected. During this period, all non-retryable operations (such as inserts/updates) will fail
[12:11:08] <NodeX> lulz
[12:11:53] <Null_Route> Derick: is it possible that the same thing happens on all shard balances?
[12:14:06] <Derick> no
[12:14:22] <Null_Route> ok
[12:15:16] <Null_Route> Well, I'm going to continue looking at the networking (try bringing down the port-channels - maybe there's an issue there..)
[12:33:40] <remonvv> Null_Route, everything is your own fault unless you prove otherwise at which point we will claim your question was confusing or incomplete.
[12:34:06] <remonvv> Also, ron wears baby pink briefs.
[12:36:23] <ron> Null_Route: and trust me when I say it, remonvv knows it close-hand.
[12:42:43] <remonvv> I can neither confirm nor deny the truth of that statement.
[12:43:25] <Null_Route> Yaaay!
[12:57:36] <PDani> hi
[12:59:09] <PDani> is there a way I can bulk-update many documents in one request? The problem is that I have many relatively small documents, and it seems network latency is the bottleneck in updating them one-by-one
[12:59:26] <NodeX> update them all with the same value?
[13:00:01] <PDani> no, with different values
[13:01:05] <NodeX> not that I am aware of, there are bulk insert tools
[13:01:14] <NodeX> but I have not heard of a bulk update tool
[13:01:23] <PDani> thx
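
As said, there was no bulk-update API at the time; the nearest thing is a multi-update, when every matching document gets the same change (the fourth argument to update is the multi flag, per the signature quoted later in this log):

    // one round trip, many documents, same modification
    db.items.update({ state: "old" }, { $set: { state: "new" } }, false, true);
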
[13:52:31] <Bartzy> if I have an update that is doing both $addToSet to an array in the document, and $inc to some counter
[13:52:52] <Bartzy> How can I make Mongo not increment the counter if the addToSet didn't go through because it was duplicate ?
[13:53:50] <Derick> you can't do that in one operation
[13:55:03] <Bartzy> Derick: Why not ?
[13:55:38] <Derick> you can of course prevent it by using a query in your update
[13:56:14] <Derick> db.col.update( { arrayField: { $ne: 'value' } }, { $addToSet: ..., $inc } )...
[13:56:21] <Derick> or similar
[13:56:50] <Bartzy> What do you mean ?
[13:56:58] <Bartzy> And then, why use add to set ?
[13:57:01] <Derick> show me your query please
[13:58:06] <Bartzy> db.things.update({$addToSet: { likes: {uid: 123, name: "Bar"}}, $inc: {likes_c: 1}}
[13:58:08] <Bartzy> Something like that
[13:58:27] <Bartzy> i.e add this user and their name to the likes array, and increment the like count.
[13:59:42] <Derick> "something like that"... no
[13:59:47] <Derick> your addToSet is wrong like this
[14:00:00] <Derick> you miss the "select" part in that query statement
[14:00:35] <Derick> db.collection.update( criteria, objNew, upsert, multi )
[14:00:44] <Derick> you left out "criteria"
[14:01:00] <Bartzy> yeah sorry.
[14:01:14] <Bartzy> But you get what I mean? I do it in PHP and didn't want to paste that into here.
[14:01:24] <Derick> so use a pastebin
[14:01:29] <Derick> (pastebin.ca f.e.)
[14:01:48] <Derick> on the shell, you'd do:
[14:02:27] <Derick> db.things.update( { "likes.uid" : { $ne: 123 } }, { $addToSet: { likes: { uid: 123, name: "Bar" } }, $inc: { likes_c: 1 } } );
[14:03:07] <Derick> i guess you need to put the item ID in the criteria part as well though
[14:03:48] <Bartzy> yes, but why use $ne ?
[14:03:52] <Bartzy> and not just addToSet ?
[14:04:02] <Bartzy> because without I can't not $inc if it exists ?
[14:04:09] <Derick> correct
[14:04:15] <Bartzy> So why addToSet and not push ?
[14:04:26] <Bartzy> with $ne, that is.
[14:04:29] <Derick> that's a fair point - push would work fine
[14:04:55] <Bartzy> And push should be more appropriate here because it would be faster and the addToSet functionality is already provided by $ne ?
[14:05:02] <Derick> yes
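
Putting the whole exchange together, the guarded like-update looks like this (photoId stands in for the real item id):

    db.things.update(
      { _id: photoId, "likes.uid": { $ne: 123 } },   // only match if user 123 hasn't liked this photo yet
      { $push: { likes: { uid: 123, name: "Bar" } }, $inc: { likes_c: 1 } }
    );
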
[14:05:06] <Bartzy> Also - if the likes array is 50,000 elements long
[14:05:13] <Bartzy> $ne wouldn't be a problem?
[14:05:17] <Bartzy> if likes is not indexed
[14:05:29] <Bartzy> Or even if it is - I don't think an index is used for $ne ?
[14:05:34] <Derick> you'd want to index it
[14:05:40] <Derick> sorry, yes, you're right
[14:05:55] <Derick> but you'd have an index on the item ID anyway that you want to add a "like" to, no?
[14:07:26] <Bartzy> yes, it's _id .
[14:07:43] <Bartzy> so if I don't index it and it's 50,000 elements long - that's a problem with $ne ?
[14:07:50] <Bartzy> Or fast enough? :|
[14:07:51] <Derick> it will still use that index on _id then
[14:08:09] <Bartzy> right - to find the document. But to find out if that like is there - it won't.. That's an issue ?
[14:12:24] <Bartzy> Derick: ? :)
[14:14:51] <Derick> Bartzy: sorry, didn't get that one
[14:17:59] <Bartzy> Do you need me to resend it ?
[14:18:53] <Derick> Bartzy: no, I just didn't understand you
[14:19:02] <Bartzy> Ah
[14:19:25] <Bartzy> so the query is update a document where _id is 123 and likes.uid $ne 100
[14:19:27] <Bartzy> for example
[14:19:39] <Derick> yes
[14:20:00] <Bartzy> So index is used to find where in disk is that document
[14:21:00] <Bartzy> but to know the answer to whether there is no uid = 100 in that likes array in that document - no index can be used
[14:21:07] <Bartzy> so the document has to be fetched from disk.
[14:21:27] <Bartzy> Now if it's an array of 50 elements, probably doesn't matter, performance wise.
[14:21:34] <Bartzy> My question is what if this is a 50,000 element array?
[14:21:35] <Bartzy> or 200,000 ?
[14:21:43] <NodeX> do you have an index on _id and uid ?
[14:21:43] <Bartzy> Does that become a performance issue ?
[14:21:49] <NodeX> (compound index)
[14:22:02] <Bartzy> NodeX: On _id yes. On likes.uid there is no need, since $ne doesn't use an index.
[14:22:18] <Bartzy> compound index - no. Again because $ne will not use that.
[14:22:22] <Derick> Bartzy: it needs to do the same for addToSet anyway... doesn't really matter
[14:22:39] <Bartzy> Derick: Cool. And that should be "Fast Enough" ?:P
[14:22:47] <Derick> Yes, fast enough™
[14:22:55] <Bartzy> you signed on it!
[14:22:55] <Bartzy> :P
[14:23:08] <Bartzy> What is the general limit of element count where it is going to be an issue ?
[14:23:19] <NodeX> if you're worried about performance why dont you update all of them then in a second query update (revert) uid=123
[14:23:31] <Bartzy> I know I should benchmark. But in general... what is a bad practice when talking about array elements count and $ne
[14:23:57] <Bartzy> NodeX: Didn't get that , sorry ? Can you further explain
[14:24:42] <NodeX> hold on, I'm confused
[14:24:59] <NodeX> sorry, my mistake
[14:25:23] <NodeX> I didn't read your problem correctly
[14:33:03] <Bartzy> NodeX, Derick: So any insight on the general "best practice" limit of the element count ?
[14:33:15] <Bartzy> What would be a bad idea to use $ne on ?
[14:33:45] <Derick> Bartzy: elements inside the array in a document?
[14:34:21] <Derick> you're limited to 16MB per document
[14:34:21] <Derick> and you need to realize that MongoDB might need to move documents around if you're making them larger (by appending to an array)
[14:34:27] <Null_Route> NodeX: Bartzy if you guys remember my issue with failing MongoDB instances - It was caused by CentOS setting ulimit nproc to 1024
[14:34:53] <Null_Route> - it's default for CentOS6
[14:35:27] <Derick> nproc?
[14:35:43] <Null_Route> Number of user processes allowed
[14:35:50] <Derick> oh
[14:35:52] <Null_Route> more info at https://groups.google.com/forum/?fromgroups#!topic/mongodb-user/je-GuiBh-60
[14:35:59] <Derick> why do you have so many processes? :-)
[14:36:21] <Null_Route> Dev environments for 20 developers
[14:36:36] <Null_Route> That's my guess
[14:36:55] <Null_Route> but I was getting issues with pthreads not being able to fork
[14:37:12] <Derick> right, cause threads are also "processes"
[14:37:21] <Bartzy> Derick: Yes, I understand that (about moving documents). I have nothing to do about it - people like photos. I need to push that :D
[14:37:43] <Bartzy> Derick: I know of the 16MB per document. How can I get the size of a specific document, or the average size of documents in the collection.. or the maximum size? :)
[14:37:54] <Derick> you could push your likes out of the document into their own collection if it becomes a performance issue
[14:38:25] <Bartzy> Derick: And last question - The 16MB limit is general. What is considered a bad number to do that $ne query on ?
[14:38:38] <Bartzy> Derick: Yeah but I would like to know in what number of likes I should do that
[14:38:54] <Derick> Bartzy: average size you can do on the shell
[14:38:54] <Bartzy> and that is pretty much normalization, which I wanted to avoid by using Mongo in the first place
[14:38:55] <Derick> one sec
[14:39:42] <houms> i have signed up for 10gen monitoring and have a test bed of just three servers. They are in a replica set only, and i have the agent running on the 3 servers. it shows on the dashboard but it does not show any data or any of the other members
[14:39:45] <Derick> db.things.stats();
[14:39:47] <Derick> has a
[14:40:14] <Derick> "avgObjSize" : 32,
[14:40:44] <Null_Route> houms: how long has it been up?
[14:41:01] <Null_Route> I've seen MMS take some time to identify things - IIRC, it identifies things in stages
[14:41:05] <Derick> max size, I don't know :-)
[14:41:41] <Null_Route> houms: IIRC, there is a place to see the logs
[14:41:41] <Null_Route> the discovery logs in the web console
[14:41:53] <houms> almost 5minutes now.
[14:41:59] <houms> ahh you were right
[14:42:02] <houms> had to be patient
[14:42:05] <houms> they just showed up
[14:42:08] <houms> thanks Null
[14:42:09] <Null_Route> sweet
[14:42:31] <Null_Route> no worries!
[14:42:33] <Null_Route> cheers!
[14:42:41] <Derick> Bartzy: as for the number of likes... difficult to tell without knowing about your data load and access patterns
[14:44:26] <NodeX> Bartzy: I have a ratings collection that's a similar concept, I have around 800 "ratings" in a similar nested document to yours and the performance is good
[14:44:40] <NodeX> (800 ratings for one document)
[14:45:18] <Bartzy> NodeX: Yeah 800 is really small for some of our documents
[14:45:26] <Bartzy> it would be in the hundreds of thousands
[14:45:37] <Derick> that many likes per document?
[14:45:48] <Bartzy> Derick: How can I check a specific object size? (document)
[14:46:03] <Bartzy> Derick: the popular ones - and I'm talking about views this time (same concept as likes)
[14:46:28] <NodeX> Bartzy : do you always need the likes for the document in a call or just the count of them?
[14:46:51] <Bartzy> NodeX: Well, It's an integral part of the document (it's a photos collection)
[14:47:08] <NodeX> but do you need -all- of them
[14:47:23] <NodeX> surely 100k likes in a document would take forever to render anyway?
[14:47:32] <Bartzy> in the index page I show the likes and views counts, and the last 3 comments and the comments counts - and also if the current user has liked, viewed or commented on that photo
[14:47:54] <Bartzy> and I show 50 of these photos in the index page
[14:47:57] <NodeX> so you only really need the last 3 then?
[14:48:01] <Derick> Bartzy: you can call bsonsize() on it on the shell
[14:48:05] <Bartzy> When a user clicks on the photo, I need all.
[14:48:29] <Bartzy> Derick: bsonsize(db.photos.find(...)) ?
[14:48:31] <NodeX> you need all 100k likes .. do you know how long that will take to render out in html?
[14:48:36] <Derick> Bartzy: findOne
[14:48:40] <Bartzy> yeah.
[14:48:42] <NodeX> my point is paginate it
[14:49:06] <Bartzy> NodeX: Right, but where will the likes/comments/views be ? Not in the same document ?
[14:49:06] <NodeX> as Derick suggested store some and the rest in a separate collection which you can paginate
[14:49:06] <Bartzy> in their own collection ?
[14:49:18] <Derick> Bartzy: denormalize that data out - store all the likes in one collection, but only the last three and the additional info with a document - it's ok to have to do two updates..
[14:49:41] <Bartzy> what if a user deletes a comment then
[14:49:53] <Bartzy> I delete it in both the photo document and in the comments collection ?
[14:50:07] <Bartzy> comment=like=views , kinda. so I mix and match those words
[14:50:07] <NodeX> store the meta data inside its own collection
[14:50:58] <Bartzy> NodeX: Meaning ?
[14:51:05] <NodeX> {id:'the-photo-id', type:'like', uid:123} .... {id:'the-photo-id', type:'comment', uid:123, value:'This is my comment'}
[14:51:27] <Bartzy> why not a separate collections for comments, likes and views?
[14:51:28] <NodeX> much easier to manage and delete/update / paginate/ backup / restore
[14:52:05] <NodeX> because 3 collections means 3 separate queries
[14:52:18] <Derick> Bartzy: it's Object.bsonsize(db.c.findOne()); btw
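
There is no built-in "max document size" helper, as Derick says, but a brute-force scan works for ad-hoc inspection (full collection scan, so don't run it casually in production):

    var max = 0;
    db.things.find().forEach(function (d) {
      var s = Object.bsonsize(d);
      if (s > max) max = s;
    });
    print(max);   // largest document, in bytes
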
[14:52:30] <NodeX> ({id:'the-photo-id'}) gets you all meta data in one query
[14:52:34] <BurtyB2> ur version is too old" :(
[14:53:55] <Bartzy> NodeX: So when I need all the data for a specific photo I do db.metadata.find({_id:photo_id}) ... How do I differentiate in my app code between comment and view and like ? Just loop through the results and fill an array of comments when type=comment, likes when type=likes ?
[14:53:55] <NodeX> if that's what your app requires then yes
[14:54:03] <NodeX> else query id= foo and type=bar
[14:54:12] <Bartzy> And that's 3 queries, again.
[14:54:17] <Bartzy> But why not in the same collection ?
[14:54:18] <NodeX> then dont do it
[14:54:23] <Bartzy> Because likes array gets big ?
[14:54:36] <NodeX> dude, it's your data, do whatever you feel you need to
[14:55:06] <Bartzy> I mean - everything (photo data + metadata) is much easier. Question is if we'll hit a performance problem.
[14:55:21] <NodeX> for me personally I structure my data and my app to compromise between speed and user interface
[14:55:43] <NodeX> with that I get 0.5 second page loads and little load on my servers
[14:55:55] <Bartzy> So I'll ask my initial question -
[14:57:48] <NodeX> I will say from a networking POV that moving that much data over the wire will introduce a lot of latency into your app
[14:58:00] <NodeX> you also wont be able to paginate it very well, nor sort nor order
[14:58:09] <NodeX> (database side at least)
[14:59:54] <NodeX> In my own personal experience I used to store references and meta data for images as nested arrays inside a "galleries" collection, it seemed a good idea at the time until I wanted to do more with the data, I changed it very quickly to its own collection
[15:02:58] <Bartzy> NodeX: Each member is very small - uid and name string.
[15:03:23] <Bartzy> NodeX: Why can't I sort or order very well ?
[15:03:45] <Bartzy> I can also not get the entire "likes" array in the query. Just get the likes counter when needed.
[15:03:59] <NodeX> because they're nested elements
[15:04:13] <NodeX> store the counter with the document
[15:04:25] <NodeX> best thing is to do it the way you think is best
[15:04:29] <NodeX> when it doesn't work come back
[15:04:44] <NodeX> because going round and round in circles is not productive ;)
[15:05:34] <Bartzy> Yeah, I know.
[15:05:49] <Bartzy> I'm just designing it right now, we started to code it, and hit some "thinking" walls :p
[15:06:09] <Bartzy> We actually do need to get all the likes / comments / views array, because we need to know if the user is in that array.
[15:06:36] <Bartzy> in order to solve it in a prettier way - we need a users collection where each user has a "likes", "comments", "views" array, right?
[15:07:07] <NodeX> not in my opinion
[15:07:13] <Bartzy> So how ?
[15:07:19] <NodeX> but that is one way to solve it
[15:07:52] <NodeX> but you still lose query ability, so it's best to have likes,views,comments collections (or one for all)
[15:08:28] <Bartzy> NodeX: But with a likes collections, if I show 50 photos in a page and want to know for each one if the user liked any of them (to show it in the UI) - I need to do 50 queries
[15:09:04] <Bartzy> db.metadata.find({_id:'photo-id-1', uid:12345}).... for each photo-id in the page I'm showing
[15:09:10] <Bartzy> or am I missing something ?
[15:10:09] <NodeX> correct, and again, it all depends on how you model your data
[15:10:19] <NodeX> you can't have it every way
[15:10:24] <Bartzy> So how 50 + queries per page are acceptable ? :|
[15:11:53] <NodeX> did I say they were?
[15:12:33] <NodeX> you have to have a compromise between performance and UI
[15:12:36] <Bartzy> You said that you would not do it with a users collection
[15:12:43] <NodeX> NO I wouldn't
[15:12:57] <Bartzy> OK - with the UI I explained - how would you do it ?
[15:13:03] <NodeX> but I also would not make my app be stuck with having one view of the data
[15:13:16] <Bartzy> It's a pinterest like app
[15:13:20] <Bartzy> :)
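
For the record, the 50-queries-per-page concern has a cheaper shape: a single $in over the page's photo ids against the metadata collection sketched above:

    var photoIds = ["photo-id-1", "photo-id-2" /* ... up to 50 */];
    // one query: which of these photos has user 12345 liked?
    db.metadata.find({ id: { $in: photoIds }, uid: 12345, type: "like" }, { id: 1 });
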
[15:15:29] <NodeX> If i -had- to do that sort of thing I would probably duplicate data
[15:16:23] <NodeX> something along the lines of storing the uid and if it's a like/comment and maybe the ID to it in the likes/comments collection
[15:17:25] <Bartzy> NodeX: Can you further explain ?:|
[15:17:31] <Bartzy> I didn't understand your example
[15:18:31] <NodeX> it's what you have already said you're doing but with a collection on top
[15:19:44] <NodeX> but duplicate data may or may not be acceptable to you
[15:20:58] <Bartzy> It is
[15:21:04] <Bartzy> What the collection on top is doing ?
[15:21:09] <NodeX> in reality you need an edge graph database
[15:23:31] <Bartzy> It's a pretty simple schema :\
[15:24:40] <ron> neo4j!
[15:25:10] <NodeX> +1
[15:26:45] <ron> though I wrote that only in response to NodeX's latest comment, not the whole conversation.
[15:26:50] <ron> too lazy to read it all.
[15:27:42] <NodeX> lolol
[15:33:21] <dkap> why is it that all my documents have an {_v : *} entry?
[15:33:59] <ron> dkap: because you put it there.
[16:08:03] <NodeX> lolol
[16:09:00] <ron> NodeX: tell me that was trolling. I dare ya.
[16:17:24] <NodeX> no, that was pure funny
[17:00:18] <diego_k> Hi, is there some way I can refer with dot notation to the last element of an array?
[17:00:31] <diego_k> something like 'array.-1'
[17:00:32] <diego_k> :)
[17:14:21] <NodeX> no
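
NodeX's "no" holds for matching with dot notation, but for projections the shell of this era did support grabbing the last element via a negative $slice:

    // return only the last element of `array` for each document
    db.coll.find({}, { array: { $slice: -1 } });
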
[17:28:52] <jergason> hello friends
[17:28:55] <jergason> trying to do pagination with mongodb
[17:29:03] <jergason> are range queries super slow generally?
[17:32:35] <jergason> Sorry, I mean queries using skip
[17:32:53] <jergason> According to the docs, those are quite slow
[17:34:30] <mala> people, can someone help me, i have lots of data with a singular state (ex: NY, CA) with
[17:38:34] <ron> jergason: well, depends on how much you skip.
[17:39:06] <jergason> ron: if you skip as part of a find, will it only skip within results returned by that query, or through the whole collection?
[17:39:40] <kali> jergason: results returned by the query. is the query fast without the skip ?
[17:39:55] <ron> jergason: what kali said.
[17:40:02] <jergason> you know, I don't actually know.
[17:40:17] <ron> heh :)
[17:40:40] <kali> oO
[17:41:00] <jergason> this is the part where you tell me to measure before I optimize, right?
[17:41:02] <jergason> :)
[17:41:03] <ron> jergason: the skip operation basically scans through the results on the server side so that you don't have to on the client side.
[17:41:13] <ron> jergason: always.
[17:42:15] <jergason> any built-in support for just timing stuff on the mongo client? or just do Date.now() stuff?
[17:42:25] <jergason> mongo command line client that is
[17:43:02] <ron> explain.
[17:43:04] <kali> http://www.mongodb.org/display/DOCS/Database+Profiler#DatabaseProfiler-Throughtheprofilecommand
[17:43:14] <kali> don't do that on a prod server
[17:43:31] <kali> don't do "profile" on a prod server. "explain" is fine
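
The usual workaround for deep skip is range-based pagination: remember where the last page ended instead of counting past it. A sketch (lastSeenId is whatever _id closed the previous page):

    // skip-based: the server still walks over every skipped result
    db.posts.find().sort({ _id: 1 }).skip(10000).limit(20).explain();
    // range-based: start from the last _id of the previous page
    db.posts.find({ _id: { $gt: lastSeenId } }).sort({ _id: 1 }).limit(20);
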
[17:45:24] <kali> (olympics are fun but synchronized swimming is way too creepy for me)
[17:56:38] <jergason> ron and kali: thanks a bunch for your help
[19:46:41] <jaha> what's the best way to store html in a mongo doc?
[19:50:18] <ra1n3r> hi. noob here. i'm trying to use the java driver for mongodb. but i'm not exactly sure what to do with the jar file i just downloaded... how do i add it to my classpath?
[19:50:52] <ra1n3r> if i do echo $CLASSPATH theres no output
[19:54:36] <ron> have you written any java code before?
[19:55:26] <ra1n3r> ron: yes i have
[19:55:43] <ron> ra1n3r: then you shouldn't ask such a question.
[19:56:12] <ra1n3r> ron: i also did String classpath = System.getProperty("java.class.path");
[19:56:19] <ra1n3r> and all i got was "."
[19:56:36] <ra1n3r> so it means current directory, but its still not working when i'm trying to use mongo in java
[19:56:43] <ra1n3r> its been a while since i used java, rusty...
[19:56:43] <ron> okay, so add it to your classpath.
[20:08:32] <bichonfrise74> I'm using 2.0.6 version, and I wanted to change the oplog size through /etc/mongodb.conf (oplogSize = 50000)… but when I do show dbs, the local collection shows 1 GB… is this correct?
[20:08:50] <bichonfrise74> I was expecting to see 50GB in the local collection
[20:11:13] <bichonfrise74> but the mongodb.log shows 50GB for the oplog size
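
A note on the observation above: oplogSize in the config file is given in megabytes, and it only takes effect when the oplog is first created; changing it later does not resize an existing oplog. The live value can be checked from the shell:

    use local
    db.printReplicationInfo()   // prints "configured oplog size: ... MB" plus the time window it covers
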
[20:42:02] <ra1n3r> having trouble with java driver for mongo. when i just do a simple Mongo connection = new Mongo("localhost", 27017);
[20:42:08] <ra1n3r> I get the error "Default constructor cannot handle exception type UnknownHostException thrown by implicit super constructor. Must define an explicit constructor"
[20:49:20] <jjbohn|lunch> back
[20:49:31] <ron> yay.
[22:44:59] <alexyz> how does Mongo handle client connections? connection per thread or something else?
[22:46:02] <Derick> alexyz: depends on the driver
[22:46:16] <alexyz> Derick: server side
[22:46:25] <Derick> one connection per thread
[22:46:49] <Derick> (sorry for the somewhat terse replies :-) )
[22:47:20] <alexyz> Derick: no problem. thanks
[23:08:41] <vsmatck> I read some blurb a while ago about that possibly changing in the future.
[23:09:50] <vsmatck> I wonder if anyone has thought about out of order responses. The mongo protocol contains enough information to support that now.
[23:11:44] <vsmatck> Seems like if you moved from one thread per connection, to 1 network thread and a pool of workers, then it would allow you to get more performance by supporting out of order responses.
[23:14:38] <vsmatck> If you had two pipelined requests where the first would cause a pagefault, and the second would be served from memory then the second fast request would get stuck behind the first slow one.
[23:17:09] <vsmatck> Seems like the queue the pool of workers would pull from could be protected by a reader/writer lock.
[23:17:56] <vsmatck> NVM on that last thing. Makes no sense.
[23:19:24] <vsmatck> Or maybe it does. Each queue element could contain a lock function which would either read lock or write lock depending on the job.
[23:19:50] <vsmatck> That would ensure fairness. Or that only one worker gets released when there is a write job.
[23:22:42] <vsmatck> NVM this idea seems dumb.