#mongodb logs for Monday the 15th of October, 2012

[03:43:52] <hdon> hi all :) what do you think about spitting out a warning at this line? https://github.com/mongodb/node-mongodb-native/blob/master/lib/mongodb/db.js#L2057
[03:46:25] <hdon> hmm, that doesn't quite accomplish what i want
[03:46:37] <hdon> ah, those come from urlOptions
[03:48:34] <hdon> hmm, it looks like the Db options are not validated in the constructor
[04:30:18] <detaos> hi everybody
[04:34:03] <detaos> how can i select a database in C++? i'm looking for the C++ equivalent of the PHP function selectDB.
[07:30:40] <[AD]Turbo> ciao
[07:31:44] <dforce> Can someone help me with getting pictures from Gridfs and showing them in django
[08:47:14] <dforce> Can someone help me with getting pictures from Gridfs and showing them in django
[09:16:36] <Bartzy> Hello :)
[09:16:42] <Bartzy> Is that correct - http://www.deanlee.cn/programming/mongodb-optimize-index-avoid-scanandorder/ ?
[09:31:17] <NodeX> case by case basis
[09:54:04] <topriddy> hi
[09:55:12] <NodeX> low
[09:55:27] <topriddy> NodeX: i remember you. thanks for last time :D
[09:55:38] <topriddy> NodeX: very educative and informative for me :)
[09:55:38] <NodeX> no probs ;)
[09:55:50] <topriddy> ...considering using mongodb to store users' profile pictures, i don't know how transactions would play here.
[09:56:05] <NodeX> Mongo doesn't have transactions
[09:56:30] <topriddy> what happens when multiple writes happen?
[09:56:46] <NodeX> last write wins
[09:56:56] <NodeX> FIFO (first in first out)
[09:57:21] <topriddy> NodeX: okay. should be faster than doing it from filesystem?
[09:57:24] <Derick> this is why you use update operators, and not just "set"
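Derick's point, expanded: with concurrent writers, saving a whole document back means "last write wins" and one writer's changes are silently lost, whereas atomic operators like $set and $inc modify only the named fields. A minimal mongo-shell sketch, assuming a hypothetical "users" collection:

    // whole-document replacement: two concurrent saves race, and the
    // later one silently discards the earlier one's changes
    db.users.update({_id: 1}, {name: "topriddy", picture: "new.jpg"});

    // atomic update operators touch only the named fields, so concurrent
    // updates to different fields do not clobber each other
    db.users.update({_id: 1}, {$set: {picture: "new.jpg"}});
    db.users.update({_id: 1}, {$inc: {logins: 1}});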
[09:57:40] <NodeX> topriddy : no
[09:57:57] <NodeX> Mongo has to look up the file, then get it from the FS / memory
[09:58:53] <NodeX> I use a mixture, I have a cache of frequently accessed images and let the OS manage the cache
[09:59:01] <NodeX> (nginx in my case)
[10:02:15] <NodeX> [10:56:50] <NodeX> I use a mixture, I have a cache of frequently accessed images and let the OS manage the cache
[10:17:33] <topriddy> NodeX: thanks. office network shitty as always
[10:17:46] <NodeX> no probs
[10:18:12] <topriddy> NodeX: memcache?
[10:18:33] <NodeX> it's a lot of memory to dedicate to caching images
[10:19:04] <topriddy> yeah OS can manage memcache
[10:20:08] <NodeX> how many images are we talking about?
[10:21:28] <topriddy> NodeX: every user has a profile picture. so say for 200,000 users.
[10:23:04] <NodeX> you'll be fine with the OS managing the images
[10:23:40] <NodeX> what I do is store daily folders (for ease of backup) and also store a cache that's the last week of images
[10:24:17] <NodeX> I tested Nginx as a cache serving static images vs memcache and redis and nginx was comparable and took one less daemon out of the stack
[10:25:08] <topriddy> NodeX: if i'm storing images as encoded strings in mongodb, not sure the OS would help me much. unless i store them in the filesystem
[10:25:26] <BurtyBB> Anyone know if MMS is having problems as I'm getting an error 500 reported by the agent?
[10:26:05] <NodeX> you should use gridfs to store the images if you're going to do that
[10:26:14] <NodeX> you've got a 16mb limit on docs in straight mongo
[10:27:32] <topriddy> NodeX: yeah gridfs. read somewhere that the gridfs project wasn't a priority of the mongodb dev team though.
[10:27:47] <NodeX> it works fine
[10:28:03] <NodeX> and it's built to deal with storing files of any size efficiently
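For context: GridFS splits a file into chunks stored in two ordinary collections (fs.files for metadata, fs.chunks for the data), which is how it sidesteps the 16MB document limit. The mongofiles tool that ships with MongoDB is the quickest way to try it; a minimal sketch with a hypothetical file name:

    $ mongofiles -d images put avatar.jpg    # store the file in GridFS
    $ mongofiles -d images list              # list stored files
    $ mongofiles -d images get avatar.jpg    # write it back to disk

Drivers expose the same mechanism programmatically (GridStore in the Node.js native driver, GridFS in the Java driver topriddy is using).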
[10:31:28] <SergeyUkolov> hi all. I have one problem. For some reason the mongod process doesn't want to use all the RAM on our servers. It uses only ~20%, but we have a 220Gb DB and a lot of reads. mongostat shows 70-100 faults. Does anybody know how that is possible?
[10:31:47] <topriddy> NodeX: okay. would look at the docs. i'm working with Java and morphia though. And i'm sending the images as binary encoded string from a mobile app
[10:32:06] <NodeX> topriddy : it's easy enough to decode back to a temp file ;)
[10:32:34] <SergeyUkolov> btw, I just ran the touch command on a big collection, loading both indexes and data, and I still see 20% of used memory in top output
[10:33:47] <NodeX> can you pastebin your indexes SergeyUkolov ?
[10:34:05] <NodeX> db.yourCollection.getIndexes();
[10:34:53] <topriddy> NodeX: true.
[10:35:36] <NodeX> just my experience topriddy that using gridfs is more scalable
[10:35:41] <NodeX> but it might not fit your model
[10:36:06] <SergeyUkolov> NodeX: should I paste its output here?
[10:37:13] <NodeX> pastebin
[10:39:01] <topriddy> NodeX: yeah. looking around i see most ppl are of the opinion to just store files at the OS level
[10:40:29] <SergeyUkolov> NodeX: Sorry, didn't use this service before :) http://pastebin.com/VepJaxUr
[10:41:58] <SergeyUkolov> btw, it is not a problem of a single collection... we have around 300 collections in the DB, total size is 220Gb, but it uses only 20% of 32Gb RAM and doesn't want to use more.
[10:42:34] <SergeyUkolov> sometimes 21%, 20.6%... but never 90%
[10:44:14] <fsfsfs> hej
[10:45:03] <NodeX> SergeyUkolov : I would say that your queries are not using indexes and ergo your data is not in ram
[10:45:25] <NodeX> can you do db.yourCollection.find({a-typical-query}).explain();
[10:45:28] <fsfsfs> I'm trying to hack libICU into mongod (POC quality is totally OK for now) to support Unicode collation (that is, 1. correct sorting in non-ASCII, 2. correct case-insensitive sorting)
[10:45:29] <NodeX> and pastebin the results
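What to look for in the explain() output NodeX is asking for (field names as of the 2.x shell; the query is a hypothetical example):

    db.yourCollection.find({user_id: 12345}).explain();
    // "cursor" : "BtreeCursor user_id_1"  -> an index was used
    // "cursor" : "BasicCursor"            -> full collection scan
    // "nscanned" vs "n"                   -> index entries examined vs
    //                                        docs returned; close is good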
[10:48:20] <fsfsfs> I have so far edited src/mongo/db/key.cpp (from line 453) and src/mongo/db/pipeline/value.cpp (from line ~ 824)
[10:48:36] <fsfsfs> to include comparison using libICU, and also wrote a few jsTests
[10:49:06] <fsfsfs> alas, at least on the 'mongo' REPL, I cannot see the new sorting.
[10:49:36] <fsfsfs> (I'd like to influence 'sort()' as well as the creation of indices, of course)
[10:50:05] <fsfsfs> Q: Is there any further sorting, maybe in the (non unicode enabled, so far) 'mongo' REPL?
[10:50:26] <fsfsfs> Q: Anyone else interested in this to maybe be my sparring partner in drafting a patch?
[10:50:44] <fsfsfs> Q: Anyone interested in discussing this w/ me tomorrow on MongoDB Munich?
[10:51:47] <manveru> fsfsfs: you're in munich?
[10:51:50] <fsfsfs> yep
[10:52:11] <manveru> nice :)
[10:52:15] <manveru> i live an hour south
[10:52:44] <fsfsfs> I could upload my stuff to github, but it's, well, non-functioning and all. Still. I'll push my branch so others can have a look
[10:52:49] <manveru> been looking at the meetup notifications for mongodb munich for a while, but no idea if it's worth the drive
[10:54:28] <manveru> regarding sorting, would like to take a look at that
[10:54:55] <manveru> though i suck at C(++), but unicode support would be sweet
[10:57:09] <fsfsfs> doing unicode correctly is kinda hard, fortunately others have done that for us
[10:57:19] <fsfsfs> (->libICU)
[10:57:24] <manveru> yeah
[10:57:29] <fsfsfs> of course it will give a performance hit, but well
[10:57:31] <manveru> wouldn't attempt otherwise
[10:57:41] <fsfsfs> not having unicode support in 2012 is ... almost insulting to me
[10:58:27] <fsfsfs> I believe I can figure out how to integrate the lib (well, my stuff compiles and all, but it doesn't look like it changed anything so far???)
[10:58:40] <manveru> mongodb is mostly in the US, afaict... so people don't care much
[10:58:46] <fsfsfs> but I don't know where best to store the new DS (not on-disk, but in-memory) and how an interface should look like
[10:59:09] <fsfsfs> MacOSX also is in the US and comes with i18n.
[10:59:13] <unknet> Hi
[10:59:34] <fsfsfs> + the fact that the USofA never cared about the rest of the world in computing gave us the mess that was code pages. yuck
[10:59:50] <manveru> :)
[10:59:56] <fsfsfs> well I care, and this is supposed to be open source ;)
[10:59:56] <unknet> somebody knows how objectid's are generated in a sharded environment?
[11:00:16] <fsfsfs> I believe integrating this would solve 3 of the 20 currently "most wanted" features
[11:00:32] <fsfsfs> so, if I find others that care as strongly as I do, this totally should be doable
[11:00:40] <fsfsfs> I don't know what the best interface would be, though
[11:00:56] <manveru> i think a new function might be better
[11:01:08] <manveru> because i imagine people depending on the current behaviour
[11:01:11] <fsfsfs> (should we just take the collation setting from the environment? would be very easy, and probably a good "80%" solution)
[11:01:20] <fsfsfs> (or save it together with a DB?)
[11:01:59] <fsfsfs> (might make it easier since we might need to handle less corner cases, like how to compare differently collated values)
[11:02:11] <fsfsfs> (or store it with the collection?)
[11:02:28] <fsfsfs> (and when to override this setting? make it possible to do this at every call?)
[11:02:31] <manveru> depending on locale may be a start, but will lead people to bang their heads against a wall for things like SAAS mongo
[11:02:33] <fsfsfs> well and the list goes on.
[11:03:29] <fsfsfs> well, default should be old behavior.
[11:03:48] <fsfsfs> fact is, since mongo is pretty new, we don't need to handle lots of legacy DBs encoded using some old code page
[11:04:01] <manveru> unless you want to provide support for more than unicode, i'd say just a sortUtf or such function would be enough
[11:04:25] <fsfsfs> that should make it way easier for starters.
[11:04:27] <manveru> json/bson is unicode anyway, so not much other data gonna be there
[11:04:43] <SergeyUkolov> so... nobody knows why mongod uses such a small amount of RAM?
[11:05:04] <manveru> fsfsfs: once we have something that works we can think about config
[11:05:12] <fsfsfs> well yes
[11:05:20] <manveru> the other way round just leads to never getting anything done :)
[11:05:27] <fsfsfs> I'll push my stuff right now inb4 lunch, just a sec
[11:05:31] <manveru> cu
[11:05:33] <fsfsfs> manveru, ACK!
[11:05:37] <SergeyUkolov> NodeX: oops, sorry, I didn't see the new messages.
[11:07:03] <SergeyUkolov> NodeX: no... all my queries use indexes. mongostat shows: idx miss % 0 always.
[11:09:03] <SergeyUkolov> NodeX: at least the touch command should push the collection's data into RAM, right? but I still see 20% +-1%
[11:09:09] <fsfsfs> I pushed my current try to get POC quality Unicode collation working here:
[11:09:10] <fsfsfs> https://github.com/hacklschorsch/mongo/tree/collation
[11:09:35] <fsfsfs> ah damn, just saw I was using a quite oldish MongoDB as a base, been at this for a while
[11:09:49] <fsfsfs> (last commit before mine is from Aug 30)
[11:09:53] <fsfsfs> will update this after lunch.
[11:09:54] <fsfsfs> cu!
[11:21:42] <ron> hmm, any free reporting tools that work out of the box with mongodb?
[11:21:55] <ron> sql dbs have numerous solutions :)
[11:22:38] <NodeX> MMS?
[11:23:28] <ron> MMS?
[11:24:24] <NodeX> mongo monitoring service
[11:24:32] <NodeX> (does reporting)
[11:25:41] <ron> it doesn't say anything there about reporting
[11:26:22] <NodeX> you can setup reports and dashboards
[11:26:31] <NodeX> well more dashboards
[11:26:52] <ron> reports based on our own data in the database? that's a bit odd.
[11:27:28] <NodeX> right, you didn't say that lol
[11:32:26] <ron> hmm, eclipse BIRT now supports mongodb. interesting.
[11:35:52] <ron> remonvv! sup!
[11:37:37] <s0urce> hi
[11:40:21] <remonvv> ron you gorgeous man, how are you
[11:40:28] <remonvv> hi s0urce
[11:40:29] <ron> remonvv: just peachy
[11:40:38] <ron> remonvv: do you use any BI tool with mogno?
[11:40:45] <remonvv> BI?
[11:40:52] <NodeX> Business Intel
[11:40:58] <s0urce> Could any1 help me with nested array find problem pls: http://pastebin.com/yyukDGm9
[11:41:16] <s0urce> i try to find all items which match a nested array find
[11:41:40] <ron> remonvv: reporting
[11:41:41] <remonvv> Ah, no, not really. We run our own aggregation on our data for customer reporting but other than that there's very little intelligence going on ;)
[11:41:43] <NodeX> parts.part_1:
[11:42:00] <ron> remonvv: http://www.mongodb.org/display/DOCS/Business+Intelligence
[11:42:22] <ron> remonvv: like, build a report on which games a certain user played within a range of time
[11:42:28] <remonvv> If you're seeing this, it means something is horribly wrong with the MongoDB confluence site.
[11:42:36] <remonvv> Oh, right, yeah we do that.
[11:42:46] <remonvv> I'm not that familiar with the term but apparently we're doing it.
[11:42:50] <ron> hehe
[11:42:53] <ron> like, eclipse BIRT
[11:43:59] <remonvv> Hm, looks interesting.
[11:44:04] <remonvv> Are you looking into that?
[11:45:08] <remonvv> We created much of our report tooling ourselves.
[11:46:11] <ron> yes, we're starting to look for a general solution
[11:46:19] <ron> I'd rather use a tool than develop from scratch
[11:47:16] <s0urce> Would be really nice if anyone with a clue could check my pastebin, don't wanna annoy you, but i haven't gotten it for hours and it's really important, tyvm: http://pastebin.com/yyukDGm9
[11:47:17] <remonvv> Well, our reporting functionality was pretty straightforward, so for us having to get up to speed with a tool was actually less time efficient than just building the aggregations and creating export routes to our CMS and PDF exports.
[11:48:07] <remonvv> s0urce, you're using $elemMatch on an embedded document rather than an array. Your schema isn't correct for your usecase.
[11:48:12] <remonvv> parts should be an array.
[11:48:14] <remonvv> Go from there.
[11:49:09] <s0urce> i saved it from json, and it created a document itself, how can i change it to save as an array? btw. can an array have associative keys?
[11:49:21] <NodeX> yes it can
[11:50:30] <SergeyUkolov> what is vsize in mongostat output?
[11:52:32] <remonvv> s0urce, without more information about which language you're using and who or what is responsible for the schema it's hard to help you further.
[11:52:53] <s0urce> i use node.js with the native-driver
[11:52:57] <remonvv> Parts needs to become an array, so parts:[{..,..}, {..,..}] rather than your current schema
[11:53:13] <remonvv> okay, then it's pretty straightforward.
[11:53:28] <s0urce> But i need to keep the keys
[11:53:32] <remonvv> As in, turn the parts field into an array in your code
[11:53:53] <remonvv> yes, so rather than parts:{key:{..,..}} you get parts:[{key:.., ...}]
[11:54:06] <remonvv> which allows for $elemMatch:{key:<something>}
[11:54:08] <remonvv> make sense?
[11:54:10] <s0urce> But i can use ['keys':'in_array'] in js?
[11:54:40] <remonvv> Uhm, i'm not a JS expert. I'm giving you the theory ;)
[11:54:53] <remonvv> Up to you to make it happen in the language you're comfortable with.
[11:54:59] <remonvv> maybe someone else can help you
[11:55:12] <s0urce> yeah, nice, but that seems to be the problem :)
[11:56:11] <remonvv> I guess so ;) but that's a JS issue, not mongodfb
[11:56:30] <s0urce> Maybe google will find it: "save object as array mongodb" :)
[11:56:31] <ron> mongodfb? that a new product?
[11:57:12] <remonvv> Oh shush
[12:00:52] <s0urce> Are you sure it is possible to give an array a keyname in mongodb? Can't find a single site about this.
[12:01:17] <NodeX> eh?
[12:01:25] <NodeX> you want to name a key with an array?
[12:01:35] <NodeX> [1] : 'foo' <--- like that
[12:01:48] <NodeX> that's not possible afaik
[12:01:58] <s0urce> ['something':'like this','or':'this']
[12:04:28] <remonvv> You can never give an array as key *name*
[12:04:41] <remonvv> you need parts : [...]
[12:04:51] <remonvv> e.g. parts[1] = {key:"lala", amount:3}
[12:05:16] <remonvv> I'd help you along further but JavaScript is like kryptonite to me.
[12:07:11] <remonvv> s0urce : http://pastebin.com/EzA039Un
[12:07:14] <remonvv> is what i mean
[12:07:25] <s0urce> hm.. would be sad to change this, cause then i can't read parts['part_1'] in my script, then i need to loop the results :/
[12:07:56] <remonvv> well it's either or.
[12:08:13] <remonvv> Either you make it an array and you can filter on elements in MongoDB or you make it an embedded document and you can filter in your application.
[12:08:29] <remonvv> It's up to you which is best for your application.
[12:08:57] <remonvv> Obviously parts.part1.name is easier than doing an $elemMatch query and then still having to loop through it
[12:09:00] <NodeX> s0urce : you can use dot notation lol
[12:09:08] <remonvv> Exactly.
[12:09:14] <NodeX> [12:39:39] <NodeX> parts.part_1: <----
[12:09:27] <remonvv> Ohh..was that his question?
[12:09:31] <s0urce> yes, i know, but i can't use it because i don't know which keys can match the link
[12:09:39] <remonvv> ah, yeah, so there you go
[12:09:45] <remonvv> you need $elemMatch or something similar
[12:09:51] <remonvv> which needs arrays
[12:09:52] <NodeX> [12:38:54] <s0urce> Could any1 help me with nested array find problem pls: http://pastebin.com/yyukDGm9
[12:10:08] <remonvv> Yeah, so, it's not a nested array ;)
[12:10:10] <remonvv> Anyway, afk
[12:10:12] <remonvv> meeting
[12:10:36] <s0urce> nested assoc array = document in mongodb, it seems :)
[12:10:51] <remonvv> yepo
[12:12:39] <s0urce> i need a wildcard dot notation -> parts.*.link :)
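To make the restructuring remonvv is describing concrete, a minimal sketch (hypothetical "items" collection and field names): turn the object keyed by part name into an array whose elements carry the old key as a field, after which both dot notation and $elemMatch work across all parts:

    // before: parts is an embedded document keyed by part name
    //   {parts: {part_1: {link: "/a"}, part_2: {link: "/b"}}}
    // after: parts is an array and the key becomes a field
    //   {parts: [{key: "part_1", link: "/a"}, {key: "part_2", link: "/b"}]}

    // the "wildcard" parts.*.link s0urce wants is then just dot notation:
    db.items.find({"parts.link": "/a"});
    // and per-element multi-field conditions use $elemMatch:
    db.items.find({parts: {$elemMatch: {key: "part_1", link: "/a"}}});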
[12:25:25] <simenbrekken> Is there an easy way to figure out if I have enough memory to keep the entire data set there without having to read from disk?
[12:25:34] <simenbrekken> Or just a way to give me the current memory usage
[12:27:05] <NodeX> top
[12:27:08] <NodeX> look at "RES"
[12:30:20] <fsfsfs> back from lunch. Anybody bothered checking out my attempt to get collation to work? https://github.com/hacklschorsch/mongo/tree/collation
[12:36:16] <fsfsfs> anybody up for MongoDB Munich tomorrow?
[12:36:56] <NodeX> if you buy me a plane ticket sure why not :)
[12:36:57] <NodeX> :P
[12:37:45] <fsfsfs> NodeX, sry :/ I can't
[12:37:53] <NodeX> just kidding dude
[12:38:00] <fsfsfs> yes yes
[12:38:49] <fsfsfs> still. I hope I'll meet some ppl deep into MongoDB to ask my stupid collation questions. If it's more marketing talk, well, the Hilton's food is supposed to be OK
[12:39:00] <fsfsfs> ;)
[12:41:03] <simenbrekken> NodeX: thanks :)
[12:41:14] <NodeX> :)
[12:41:38] <NodeX> also check db.stats() and see total size
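A sketch of the shell-side checks that complement top's RES column (db.stats() and serverStatus() are standard shell helpers; the rule of thumb is a heuristic, not a guarantee):

    db.stats();              // dataSize and indexSize for the current db
    db.serverStatus().mem;   // resident / virtual / mapped, in MB
    // rough heuristic: if the indexes plus your working set fit in
    // resident memory, reads should rarely fault to disk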
[13:10:21] <trupheenix> hi. I am using the txmongo driver which doesn't support replication on mongo db. any workarounds for this? isn't replication supposed to be handled by mongo db itself?
[13:16:57] <NodeX> yes it is
[13:17:02] <NodeX> you set it all up on the shell
[13:17:14] <trupheenix> NodeX, but what about my txmongo driver?
[13:17:25] <NodeX> what about it?
[13:17:35] <trupheenix> NodeX, it doesn't have support for replication like pymongo has
[13:17:58] <trupheenix> NodeX, I understand that sharding happens through the mongos process
[13:18:37] <NodeX> I dont understand what you mean by support for
[13:19:24] <trupheenix> NodeX, http://api.mongodb.org/python/current/api/pymongo/collection.html
[13:19:36] <trupheenix> Nodex look for the insert function
[13:19:47] <trupheenix> NodeX, txmongo driver doesn't have any such thing.
[13:20:52] <NodeX> so your driver does not have an insert function?
[13:21:11] <trupheenix> NodeX, it doesn't support the w parameter in insert
[13:24:08] <NodeX> seems it was never implemented
[13:24:38] <trupheenix> NodeX, yes. But does it prevent replication? Replication is handled by mongo db itself right?
[13:24:49] <trupheenix> NodeX, it's not a function of the database driver.
[13:25:01] <trupheenix> NodeX, I hope my thinking is right?
[13:25:40] <NodeX> the "w" flag means at least one node must be written to iirc
[13:27:45] <trupheenix> NodeX, For example, to wait for replication to 3 nodes, pass w=3. It means that the insert function should not return until 3 nodes are written to. But will this have any effect with a driver which doesn't understand replication when the database itself is configured to support replication?
[13:28:14] <idank> what's the correct way (indexes included) to test a date field if it's null or less than a given date? right now I'm using find({'$or' : [{'mtime' : {'$type' : 10}}, {'mtime' : {'$lt' : new Date()}}]})
[13:31:03] <trupheenix> NodeX I'm seeing Application Development with Replica Sets
[13:31:06] <NodeX> trupheenix : the DB supports replication, if you don't send the "w" flag then you lose consistency in the event of a server crash
[13:31:25] <NodeX> idank : that's inefficient
[13:31:50] <idank> NodeX: agreed, how can I improve it?
[13:32:05] <trupheenix> NodeX, that means replication doesn't happen?
[13:32:12] <NodeX> just out of curiosity see if mongo treats NULL as 0
[13:32:18] <NodeX> and use $lt : ...
[13:32:26] <idank> NodeX: it didn't which is why I added the $or
[13:32:27] <NodeX> trupheenix : no it means that it will eventually replicate
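To make the "w" flag concrete: in the 2012-era API a write is fire-and-forget until the client asks for acknowledgement via getLastError, which is what drivers' w parameter wraps. A driver without that option can issue the command itself; replication to the secondaries happens either way, and w only controls when the client is told the write is done. A mongo-shell sketch with a hypothetical collection:

    db.urls.insert({url: "http://example.com"});
    // block until the write has reached at least 3 replica set members,
    // or fail after 5 seconds:
    db.runCommand({getlasterror: 1, w: 3, wtimeout: 5000});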
[13:32:54] <NodeX> idank : I'll have to think about it, one minute
[13:40:16] <remonvv> idank, are you sure your example is correct? That reads "if mtime is of type Date or if Date < now"
[13:40:22] <remonvv> the second implies the first.
[13:40:42] <remonvv> sorry "or if mtime < now()"
[13:44:32] <remonvv> also, keep in mind null and non-existing are two different things
[13:52:16] <idank> remonvv: the query I need is "return all documents that mtime < now or mtime is null"
[13:52:33] <idank> I consider a null mtime as older than any given date
[13:53:00] <idank> and from what I checked null dates aren't included in mtime < now
[13:54:18] <NodeX> can you not update all dates with type null to have zero or something?
[13:54:25] <NodeX> this will make your query a lot better
[13:55:06] <idank> yeah I guess that makes more sense
[13:55:18] <ron> idan kober?!
[13:55:31] <idank> ron: different idan
[13:55:38] <ron> ah, oh well.
[13:56:30] <NodeX> better to do one large update and adapt the code rather than have an inefficient query
[13:56:56] <idank> how would I go about updating those null dates to zero?
[13:57:05] <NodeX> this is where the true power of schemaless comes in
[13:57:25] <idank> would it be better if I omitted the date field altogether?
[13:57:33] <NodeX> db.foo.update({bar:{$type:THE-TYPE}},{$set:{date:0}},false,true)
[13:58:02] <NodeX> in fact, that's misleading
[13:58:16] <NodeX> db.foo.update({date_field:{$type:THE-NULL-TYPE}},{$set:{the_date_field:0}},false,true)
[13:58:27] <NodeX> db.foo.update({date_field:{$type:THE-NULL-TYPE}},{$set:{date_field:0}},false,true)
[13:58:30] <NodeX> 3rd time's a charm
[13:59:02] <idank> sweet, thanks
[14:00:58] <NodeX> you could also set the date to the lowest date possible too - the $lt query would then catch them
[14:03:33] <idank> it just bothers me a little that I have to special case the date field now
[14:03:54] <idank> before I had to check if it's null
[14:04:01] <remonvv> idank, is it going to be null or will it not exist?
[14:04:03] <idank> now the value 0 has a special meaning
[14:04:27] <idank> remonvv: it's collection of urls and the last time they were accessed
[14:04:36] <idank> so a url that hasn't been accessed ever has a null mtime
[14:04:57] <idank> but I still need to keep the url so I know to access it sometime in the future
[14:04:57] <remonvv> okay so you're creating the document with {a:.., mtime: null} rather than {a:..}?
[14:05:22] <idank> I asked before if it would be better to get rid of the mtime field
[14:05:33] <idank> but then how is the query going to get better?
[14:06:26] <remonvv> Well, not better. All possible options involve $or.
[14:06:36] <idank> the 0 date doesn't
[14:06:57] <NodeX> if mtime serves some other function then you should add another field specifically for atime
[14:07:09] <NodeX> then upon insertion add the current date for it
[14:07:19] <remonvv> True, but that is data pollution so you have to figure out if that's worth it.
[14:07:31] <remonvv> I wouldn't want special case data checks in my code
[14:07:45] <NodeX> your app can then match atime against the existence of mtime to work out if it needs to do anything
[14:07:50] <remonvv> Note that $or is (or can be) actually very fast. Every clause can use its own index and it has an early out.
[14:08:00] <idank> the initial state of no mtime is pretty temporary
[14:08:08] <NodeX> seems like he wants that instead of a number in mtime
[14:08:25] <NodeX> "0" (int) can be cast to false in most languages
[14:08:37] <remonvv> right, then i'd do $or:[{mtime:{$gt...}}, {mtime:null}]
[14:08:56] <idank> I think that's how my initial query looked like
[14:09:04] <idank> and with an index on mtime it was slow
[14:09:15] <idank> ($lt rather than $gt)
[14:10:06] <NodeX> $or won't be as fast as without it, my advice is to avoid it if you can
[14:10:38] <idank> can I make it slightly faster than scanning the entire collection?
[14:10:47] <idank> because that's what I'm seeing now
[14:10:49] <remonvv> NodeX, that's only true if it has to evaluate more than one clause.
[14:11:03] <remonvv> idank, if you're scanning your entire collection you're missing the index
[14:11:10] <remonvv> post the entire query, your index is just broken then
[14:11:20] <remonvv> What i mentioned without any higher clauses should be instant
[14:11:22] <remonvv> ish
[14:11:55] <idank> ok, pasting
[14:12:52] <remonvv> db.test.find({$or:[{mtime:{$lt: 20}}, {mtime:{$exists:false}}]}).explain() hits the index and has n = returnset.length
[14:13:11] <remonvv> millis: 0, admittedly on a relatively small set
[14:13:21] <remonvv> but certainly not a table scan
[14:14:20] <remonvv> i tested with 100k documents and returning about 10
[14:14:24] <idank> I have mtime:{$type:10} instead
[14:14:31] <idank> sec, pasting it
[14:14:40] <remonvv> that's useless, that will only do anything if that field is already a date
[14:14:56] <remonvv> {mtime: null} doesn't give mongo a type to check on
[14:14:59] <remonvv> alright
[14:15:26] <idank> http://pastebin.com/FnUPRkHa
[14:15:53] <remonvv> getIndexes() as well please
[14:16:02] <remonvv> THat doesn't look like an {mtime:1} index
[14:16:16] <idank> http://pastebin.com/MFfkveFv
[14:16:16] <remonvv> AH sorry, never mind. That's your $type
[14:16:26] <remonvv> okay, replace with my query and check results please
[14:16:40] <remonvv> that $type check does nothing for you and just increases your candidate set by a huge amount, 100% even
[14:17:06] <remonvv> also, always put your most restrictive/fast clause first in $or
[14:17:18] <remonvv> pretty sure MongoDB does them in order
[14:17:39] <idank> ok yeah that works much faster
[14:17:42] <remonvv> :)
[14:17:53] <idank> strange because I was sure I tried $exists:false before and it didn't return some null dates
[14:18:03] <remonvv> syntax issue maybe
[14:18:17] <remonvv> i see a lot of $exists:{field:1} instead of field:{$exists:1} here
[14:18:24] <idank> so exists:false == true when the field doesn't exist at all or it's null?
[14:18:27] <remonvv> which doesn't actually error
[14:18:38] <remonvv> if it doesn't exist
[14:18:44] <remonvv> field: null means field exists and has a value
[14:18:47] <remonvv> namely null
[14:18:50] <idank> but my field exists, it's just null
[14:18:52] <remonvv> null is not a non-value
[14:19:01] <remonvv> then simply do mtime:null
[14:19:05] <ovidiutk> Hi. Question: is it possible to use a pager (like less) with the mongo interactive shell?
[14:19:06] <remonvv> will be even faster probably
[14:19:11] <idank> oh wait, this doesn't actually return what I wanted
[14:19:22] <idank> I need to remove the null dates first
[14:19:30] <remonvv> it wont if you have mtime: null in there
[14:19:39] <idank> yeah, gotta get rid of those first
[14:19:50] <remonvv> right, so make it $or:[{mtime:{$lt..}}, {mtime: null}]
[14:20:13] <idank> will that fast as well?
[14:20:16] <remonvv> yep
[14:20:28] <remonvv> try it ;)
[14:20:46] <idank> yeah explain() says it used the indexes
[14:21:04] <remonvv> nscanned?
[14:21:13] <idank> it's equal to n
[14:21:17] <remonvv> awesome
[14:21:32] <remonvv> also, note that {mtime: null} as a clause is really shitty if a lot of them are null
[14:21:46] <idank> type:10 also has nscanned* == n
[14:21:58] <idank> I think it's taking a long time because there are 650k documents :)
[14:22:00] <remonvv> as in, the index won't help much because the field value isn't very selective
[14:22:04] <remonvv> well yes, but n would have been huge
[14:22:23] <remonvv> well if the index is used your n should be close to your intended result set size
[14:22:31] <remonvv> if not you have to make your query more restrictive
[14:22:35] <idank> yeah that's what I'm seeing
[14:22:39] <remonvv> great
[14:22:54] <idank> what about testing if an array field has any elements in it?
[14:22:55] <remonvv> don't forget limit(n). You have an open ended range query now which is an issue without a limit
[14:23:13] <remonvv> $elemMatch if you have to check for more than 1 field, or simply dot notation if you only have to check for 1 field.
[14:23:22] <idank> right now I'm using field:{$gt:{}}
[14:23:46] <remonvv> e.g. find({array.name:"Remon"}) or find({array:{$elemMatch:{name:"Remon", awesomeness:10}}})
[14:24:03] <idank> what does that return?
[14:24:13] <remonvv> the entire document
[14:24:16] <idank> let me rephrase what I need
[14:24:22] <remonvv> Alright ;)
[14:24:32] <remonvv> You always query on documents, never on embedded structures.
[14:24:40] <idank> all documents that have a field "foo" with array size > 0
[14:24:43] <remonvv> Or well, queries always RETURN top level documents
[14:25:21] <NodeX> idank : there is no size factor
[14:25:39] <NodeX> you have to store that separately if for example you want the size of [1,2,3]
[14:25:55] <idank> but I need something weaker
[14:26:00] <idank> whether an array is empty or not
[14:26:20] <idank> foo:{$gt:{}}
[14:26:25] <remonvv> {$not:{$size:0}}
[14:26:42] <remonvv> you can't do anything more complicated with $size so if you need that you need to maintain an element counter in your document
[14:27:05] <remonvv> if you have foo:[] that works, if you can also have foo not existing you need to do an $exists as well
[14:27:07] <NodeX> you can match to a blank array iirc
[14:27:12] <NodeX> ^^
[14:27:13] <NodeX> lol
[14:27:31] <idank> can an index help that query?
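On idank's index question, a sketch of the usual options (hypothetical collection and field names; $size cannot use an index, so the counter-field pattern is the indexable one):

    // matches docs where foo exists and is a non-empty array
    db.coll.find({foo: {$exists: true, $not: {$size: 0}}});
    // equivalent trick: "is there an element at position 0?"
    db.coll.find({"foo.0": {$exists: true}});

    // for large collections, maintain a counter alongside the array:
    db.coll.update({_id: 1}, {$push: {foo: "x"}, $inc: {fooCount: 1}});
    db.coll.find({fooCount: {$gt: 0}});   // this one can use an index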
[14:28:00] <NodeX> [15:21:42] <remonvv> e.g. find({array.name:"Remon"}) or find({array:{$elemMatch:{name:"Remon", awesomeness:10}}}) <--- no documents could be found :P
[14:28:40] <NodeX> synonym: have
[14:28:42] <NodeX> oops
[14:29:16] <ralphholzmann> is it possible to replicate between different versions of mongo?
[14:29:27] <ralphholzmann> one server on 1.8 and the other on 2.2?
[14:29:37] <remonvv> NodeX, that isn't funny at all :(
[14:29:57] <remonvv> ralphholzmann, don't think that'd be a good idea.
[14:29:59] <ralphholzmann> I'm trying to figure out the best way to upgrade my mongo server from 1.8
[14:30:07] <ralphholzmann> remonvv: ah okay
[14:30:16] <ralphholzmann> ill just deal with the downtime then
[14:30:19] <NodeX> remonvv : LOL, you can't be that thin skinned - this is the interwebz lol
[14:30:36] <remonvv> ralphholzmann, there is a document upgrade path but it does involve downtime for non repset clusters.
[14:30:56] <ralphholzmann> ill take a look
[14:30:56] <remonvv> Iirc there was a hot upgrade path for repsets from 1.8 -> 2.0 and 2.0 -> 2.2 so you could try that route
[14:31:04] <remonvv> NodeX, i'm teary eyed already!
[14:31:12] <ralphholzmann> also, while I'm upgrading, would it be a good idea to go 64-bit?
[14:31:22] <remonvv> YES!
[14:31:24] <NodeX> :P
[14:31:38] <ralphholzmann> remonvv: thanks for your help
[14:31:39] <ralphholzmann> =)
[14:31:50] <remonvv> Ask anyone. 32-bit MongoDB is like wanting a really fast car and ending up at the Kia dealer.
[14:32:09] <remonvv> No problem. The links to the upgrade paths are Googlable but let me know if you can't find them
[14:32:34] <NodeX> 32bit, those were the days
[14:32:38] <ralphholzmann> cool
[14:32:43] <ralphholzmann> one last question
[14:33:12] <ralphholzmann> I remember getting warnings in the past about not having built mongo with utf-8 support
[14:33:17] <ralphholzmann> which is obviously something I want
[14:33:45] <ralphholzmann> is that enabled by default now?
[14:34:11] <remonvv> utf-8 is the default in MongoDB
[14:34:27] <ralphholzmann> cool
[14:34:29] <remonvv> As in, if you get a prebaked MongoDB package it is on.
[14:34:40] <ralphholzmann> as in
[14:34:45] <ralphholzmann> sudo apt-get install mongodb
[14:34:50] <remonvv> There you go.
[14:34:59] <ralphholzmann> \o/
[14:35:01] <remonvv> BSON spec only allows for utf-8 strings
[14:35:12] <remonvv> http://bsonspec.org/#/specification
[14:35:16] <remonvv> for your reading pleasure
[14:36:00] <ralphholzmann> oh man
[14:36:08] <ralphholzmann> $.fn.show
[14:36:15] <NodeX> I think the apt package is different for 2.2
[14:36:40] <NodeX> mongodb-10gen - An object/document-oriented database <-- 2.2
[14:40:01] <remonvv> ah, useful
[14:40:22] <NodeX> mongodb18-10gen, and mongodb20-10gen
[14:40:24] <NodeX> also
[16:20:29] <ron> ugh. I think we'll end up writing our own BI solution. blech.
[16:21:57] <NodeX> dont be so lazy
[16:21:57] <burley_sf> Anyone know what 'W' versus 'w' stands for in db.currentOp locks.* values? (global versus DB-specific per the docs on the lock timing, but I am seeing a DB-specific global lock in the output, which seems odd)
[16:22:17] <burley_sf> "^task" : "W"
[16:24:07] <ron> I wonder if mongo is a suitable solution for BI at all.
[16:24:39] <NodeX> what sort of reports are you after writing?
[16:24:56] <NodeX> I just wrote a load of aggregation jobs for our BI
[16:24:57] <ron> right now it's a specific limited non-dynamic set, but I don't know what the future holds.
[16:25:18] <ron> basically, different views on different data.
[16:26:01] <NodeX> that's pretty much what I have
[16:26:01] <ron> say, how many users bought this and that product within a given time. or which products did a customer buy within a given time.
[16:26:35] <ron> how many items of a given product did the users buy within a given time period.
[16:26:41] <ron> go OLAP.
[16:27:45] <NodeX> it took me a few days to write the pipelines for all those kinds of things
[16:38:12] <ron> it's definitely going to be fun.
[16:42:12] <NodeX> one thing I do recommend is aggregate lower, then lower again
[16:42:33] <NodeX> i.e. I turned my history into a daily thing, then daily to weekly etc etc
[16:43:15] <NodeX> and I keep my collections small and export anything more than 3 days old to json and archive it in case I need to run on it again
[16:43:41] <NodeX> unless you have dedicated aggregation boxes
[16:56:05] <ron> NodeX: when you say aggregate, do you mean using the new aggregation framework in mongodb or doing it manually within the application?
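A sketch of the roll-up NodeX is describing, using the 2.2 aggregation framework (collection and field names are hypothetical): aggregate raw events into a daily collection, then run the same kind of $group over the daily documents for weekly numbers, so later reports never touch the raw data:

    db.events.aggregate([
      {$match: {ts: {$gte: ISODate("2012-10-08"), $lt: ISODate("2012-10-15")}}},
      {$group: {_id: {day: {$dayOfYear: "$ts"}, product: "$product"},
                sold: {$sum: "$qty"}}}
    ]);
    // save the result into a per-day collection, then $group those
    // documents by week for the next level of roll-up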
[17:06:34] <Bartzy> Hi, I'm migrating data into MongoDB, and my "Schema" in Mongo uses ObjectId (_id) as a timestamp (so we won't need a timestamp key by itself).
[17:06:59] <Bartzy> First - is that correct to do so? Or is it better to just have a datetime field too.
[17:07:36] <Bartzy> Second, I saw Ruby and Python drivers have a from_time method for creating an ObjectId with a specific timestamp. I didn't see any for PHP - any ideas?
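For reference, the first four of an ObjectId's twelve bytes are seconds since the epoch (then 3 bytes of machine id, 2 of pid, 3 of counter), which is what makes the _id-as-timestamp trick work. Where a driver lacks a from_time helper, the boundary id can be built from hex by hand; a mongo-shell sketch with a hypothetical collection (the PHP MongoId constructor likewise accepts a 24-character hex string):

    var secs = Math.floor(new Date("2012-10-15") / 1000);
    var oid = ObjectId(secs.toString(16) + "0000000000000000");
    oid.getTimestamp();                    // ISODate("2012-10-15T00:00:00Z")
    db.photos.find({_id: {$gte: oid}});    // range query by creation time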
[17:17:15] <Bartzy> NodeX: I'm your favorite #mongodb user I guess :p
[17:23:01] <squawknull> if you have secondary-only replication members, does their oplog size still need to be the same size as the primary, or can they be smaller since they would never replicate out from themselves?
[17:24:09] <fsfsfs> Hmmm, I can do unicode-correct sorting now, but I broke the creation of the index.
[17:33:32] <Ephexeve> Hey guys, question, I am learning MongoDB for my job, but I see no difference between find() and findOne() -> I get the same result here
[17:40:20] <sambao21> findOne will only ever return 1 result, try your query that you know will return multiple docs
[17:44:34] <Ephexeve> sambao21: yeps true, but that would be the same as find({..}).limit(1)
[17:44:59] <Ephexeve> so in some case, there is no huge difference here I guess, if we are talking about speed, then I don't know
[17:45:19] <squawknull> can a secondary only cluster member have an oplog size of less than the primary? what would be the repercussions of that?
[17:45:34] <sambao21> yeah, but i think with findOne, it'll actually throw an error if it finds more than 1, i could be mistaken and thinking of something else
[18:04:00] <ukd1> I read somewhere (and now can't find it) that you can update a shard key -only- if it wouldn't cause it to move shard/chunk. However, reading the code it looks like it is now immutable?
[18:08:22] <Ephexeve> Hey guys, question, getting this error on the web "You are trying to access MongoDB on the native driver port. For http diagnostic access, add 1000 to the port number" Any clue?
[18:13:41] <kali> Ephexeve: have you tried adding 1000 to the port number ? :P
[18:14:42] <kali> Ephexeve: 27017 (i guess this is the port you're trying) is not intended for browser access
[18:14:59] <kali> Ephexeve: 28017 will work better
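kali's pointer, spelled out: mongod serves a plain http status page on (port + 1000), 28017 by default, and the relevant 2012-era config options look roughly like this (ini-style mongod.conf sketch):

    # /etc/mongod.conf
    rest = true               # enable the REST interface on the http port
    # nohttpinterface = true  # or switch the http status page off entirely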
[18:18:29] <Bartzy> kali :)
[18:18:37] <Bartzy> I have a question about indexes
[18:19:23] <Bartzy> I'm wondering if I should use _id in a compound index for time sorting (ObjectId), or to add to my schema (documents still not migrated to mongo) a 32-bit int timestamp and add it to a compound index
[18:20:00] <Bartzy> The question is - does it profoundly matter in big collections (hundreds of millions of documents) - 12 bytes ObjectId in the index vs 4-byte int ?
[18:21:00] <Bartzy> The storage for the data itself is of course bigger with the 4-byte int solution - because we store both _id and the timestamp - but that is pretty negligible. The question is if it's negligible for the index (size, performance) too
[18:24:58] <Ephexeve> kali: I did, and nothing :(
[18:25:06] <Ephexeve> I will try to fix this, second
[18:25:41] <kali> Ephexeve: check out mongod command line options, look for --nohttp or something like that
[18:26:22] <kali> Ephexeve: but honestly, your browser will not teach you much...
[18:28:18] <Ephexeve> kali: No no, I don't care about the browser right now, it's just that on Arch Linux mongodb works a bit... crappy
[18:28:28] <Ephexeve> so I need to configure this whole bullshit
[18:28:32] <Bartzy> kali? :(
[18:28:35] <kali> Ephexeve: look for the mongod options...
[18:28:51] <Bartzy> Any idea about my question? a bit weird I know :)
[18:28:58] <kali> Bartzy: well, what do you want me to say ? 4 is smaller than 12 :)
[18:29:06] <kali> Bartzy: but just a bit smaller
[18:29:07] <kali> :)
[18:29:16] <Bartzy> But am I just micro optimizing here and it's not important ?
[18:29:51] <kali> well, i don't know, it depends how much you have optimized the rest... starting with keynames
[18:30:09] <Bartzy> I read everywhere that using the ObjectId's timestamp instead of a separate one is a nice trick - but anyone is really using that in production ? It's pretty hard to maintain...
[18:30:51] <kali> Bartzy: i'm using that in production on a few big collections
[18:30:59] <kali> Bartzy: i would not do it for smaller ones
[18:32:47] <kali> Bartzy: it's interesting if you gain an index. in your case, it sounds like you're just getting one a tad smaller
[18:41:37] <Bartzy> kali: It's not possible to calculate an index size from the size of the key , right ?
[18:43:35] <kali> Bartzy: not directly... you can evaluate it, but it's quite difficult
[18:46:27] <Bartzy> kali: The opposite - with ObjectId I'm getting one that is bigger than timestamp
[18:46:55] <Bartzy> kali: but there is no need for the timestamp field data (around ~1GB with our amount of documents right now)
[18:57:57] <Ephexeve> I downloaded MongoDB, added it to /opt/ and extracted it, now I am trying to run it and get: ERROR: could not read from config file
[18:58:12] <Ephexeve> when I run ./mongod --config /etc/mongod.conf this is the error I get..
[18:58:13] <Ephexeve> Any clue?
[18:58:19] <NodeX> perms ?
[19:02:58] <Ephexeve> anyone?
[19:12:13] <NodeX> perms
[19:12:14] <NodeX> lol
[19:12:54] <NodeX> if you don't read responses then good luck :)
[19:39:57] <jrdn> anybody get clever with the aggregation framework and used it on top of an already pre-aggregated document? and if so, how are you doing it? :P
[20:40:15] <idank> does mongodb use base64 to store binary data?
[20:41:52] <idank> or is that merely how the mongoshell displays it?
[20:50:31] <wereHamster> idank: of course not. storing binary data as base64 would be completely stupid
[20:53:16] <idank> wereHamster: right ;) so it's just the shell
[20:53:17] <ron> How did you get to base64?
[20:53:54] <idank> that's what I see when I look at a document with binary data in the shell
[20:53:55] <wereHamster> ron: the shell displays binary data base64 encoded
[20:54:38] <ron> I see. We don't store binary data in mongo.
[20:57:58] <idank> can mongodb data files shrink in size if I delete data?
[20:58:11] <lonnylot> re: PHP - what is the advantage of using the MongoDate class instead of just storing a unix timestamp?
[20:58:40] <wereHamster> idank: only when you repair the database
[20:59:50] <idank> wereHamster: how do I do that?
[21:00:44] <ianblenke1> idank: http://www.mongodb.org/display/DOCS/Durability+and+Repair
[21:08:30] <wereHamster> idank: you google 'mongodb repair database'
[21:11:19] <idank> "errmsg" : "Cannot repair database mintigo having size: 32147243008 (bytes) because free disk space is: 14603206656 (bytes)"
[21:11:45] <idank> so to shrink the db I need to have free disk space the size of the db?
[21:24:25] <ribo> can/do replicas use compression to synchronize over slow/expensive links?
[22:29:38] <therealkoopa> I'm struggling to figure out how to fetch a bunch of subdocuments. How could I grab an array of all quux? https://gist.github.com/c966f781e4804fc5b3f7
[22:30:31] <therealkoopa> I'm trying to map reduce, but in the map function, I can't seem to do this.widgets.forEach(function(w) { w.quux.forEach(...)}} to emit
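A minimal sketch of the nested-array emit therealkoopa is attempting (the document shape {widgets: [{quux: [{name: "a"}, ...]}, ...]}, the collection name, and the element fields are assumptions, since the gist isn't reproduced here):

    var map = function() {
      this.widgets.forEach(function(w) {
        (w.quux || []).forEach(function(q) {   // guard widgets without quux
          emit(q.name, 1);                     // one emit per quux element
        });
      });
    };
    var reduce = function(key, values) { return Array.sum(values); };
    db.foos.mapReduce(map, reduce, {out: {inline: 1}});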
[22:32:23] <bitcycle> Hey all. I'm wondering if someone could recommend a good tutorial on how to define relationships between different documents in mongodb.
[22:39:35] <eka> hi all... how can I see if I'm suffering from db lock?
[22:40:07] <eka> the thing is that a web request that writes to mongo takes sometimes 20secs
[22:40:15] <eka> and I discarded all other options
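A sketch of the standard checks for eka's lock question (all standard shell helpers; field names as of the 2.x serverStatus output):

    db.currentOp();               // in-flight ops; look for
                                  // "waitingForLock" : true
    db.serverStatus().globalLock; // lockTime/totalTime ratio, plus
                                  // currentQueue.readers / writers
    // mongostat's "locked %" column tracks the same thing over time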