PMXBOT Log file Viewer

#mongodb logs for Tuesday the 26th of June, 2012

[00:00:24] <sirpengi> ferrouswheel: what's the actual error you get?
[00:00:27] <sirpengi> do you have a traceback?
[00:36:45] <doug> anyone install onto ubuntu and/or debian?
[00:36:55] <doug> w/chef..
[00:36:56] <doug> i'm reading http://docs.mongodb.org/manual/tutorial/install-mongodb-on-debian-or-ubuntu-linux/
[00:37:26] <doug> and comparing to https://github.com/edelight/chef-cookbooks/blob/master/mongodb/recipes/10gen_repo.rb
[01:07:05] <ferrouswheel> sirpengi, traceback at http://pastebin.com/HrcuZHR1
[02:00:47] <sirpengi> ferrouswheel: so if you make the request again (without restarting apache) does it not come back?
[02:08:45] <acidjazz> i need a lil design reco
[02:08:48] <acidjazz> if any1s around
[02:08:58] <acidjazz> this site im working on i have to create a 'like' system
[02:09:01] <acidjazz> similar to facebooks
[02:09:20] <acidjazz> and of course i can't have a limit on the amt of likes
[02:09:35] <acidjazz> should i just create a like collection and link it to users and content?
[02:09:51] <acidjazz> have a docu for each like?
[02:11:47] <ferrouswheel> sirpengi, it repeatedly wouldn't work... I tried about a dozen times, and had about 30 prior error emails in my inbox beforehand. started working again immediately after restart.
[02:12:28] <ferrouswheel> but it does seem pretty weird, I would've expected it to reconnect without issue. will see if it happens again.
[02:49:12] <dstorrs> anyone know what this means (approximate, as I can't cut/paste it and console just died, so it's from memory)? "DB 0 130004 keys volatile in HT something-or-other"
[02:49:35] <dstorrs> my server just got nuked by something. that message was being spewed to the console over and over.
[02:49:52] <dstorrs> only thing I could do was restart it
[02:50:38] <dstorrs> it looks like it's a Mongo thing, but I can't find anything on Google about it.
[03:00:32] <jstout24> how bad is it to have ~20 queries on a page? (all indexed queries, 95% on _id)
[03:00:48] <guest232131> on a page?
[03:01:05] <jstout24> (of course i'm going to start doing result caching, but if i can go live with something now, i'd rather do that then work on result caching later)
[03:01:17] <jstout24> basically in one request of my app
[03:01:24] <guest232131> ?
[03:01:31] <jstout24> o.. m.. g
[03:02:07] <jstout24> Visitor A goes to http://mysite.com/wtf… /wtf = a page = one request = ~20 queries
[03:02:57] <guest232131> the only bad thing is your question
[03:03:12] <guest232131> did you measure your performance?
[03:03:32] <guest232131> why would people be in charge of guessing your benchmark or performance?
[03:03:44] <guest232131> do people have to think for you?
[03:03:48] <guest232131> for trivial things?
[03:03:58] <jstout24> in practice
[03:05:39] <jstout24> it's normally not a question i'd ask considering i normally cache results in memcached, etc.
[03:06:23] <guest232131> this is up to you
[03:06:33] <jstout24> but if i have to do micro benchmarks on hardware and software instead of a yes or no to a simple question, i can do that also.
[03:06:35] <jstout24> =)!
[03:06:40] <guest232131> nobody knows your application, your requirements, your load
[03:11:19] <jstout24> neither do i, i've never run mongodb in production
[03:11:20] <jstout24> :P
[03:13:15] <jstout24> thanks guy!
[03:14:51] <dstorrs> hey all. I've got mongod v2.0.3. The machine just crashed and got restarted (as mentioned above). Now when I start Mongo, I see "listDatabases failed: ... "assertion" : "can't map file memory", ..."
[03:15:26] <dstorrs> what is this telling me? how do I proceed?
[03:16:09] <guest232131> 32bit?
[03:16:24] <dstorrs> should be 64, but how do I verify?
[03:16:42] <guest232131> by reading your mongod console messages or logs
[03:17:12] <dstorrs> file /usr/bin/mongod => " ELF 64-bit LSB executable, x86-64,..."
[03:19:14] <dstorrs> aha!
[03:19:36] <dstorrs> thank you guest232131, you got me to the right thing.
[03:19:50] <dstorrs> Log made it clear -- a file had become owned by root instead of 'mongod'
[03:22:11] <ferrouswheel> jstout24, depends on the size of the query and the size of the intersection of frequently accessed documents
[03:22:31] <guest232131> owned by root?
[05:37:56] <looopy> would mongodb be a 'great' fit for writing say...forum software?
[05:42:18] <guest232131> you can write a forum with every database
[05:54:49] <abstrusenick> is there another type of object id that only uses numbers?
[05:55:25] <ron> object id only uses numbers?
[05:55:31] <ron> problem solved!
[05:56:18] <abstrusenick> one problem with object id is the timestamp embedded
[05:57:44] <ron> it's still a number..
[05:58:04] <guest232131> 42
[05:58:40] <ron> and that's what ObjectId is. there's no 'other' ObjectId. but maybe if you explained the issue, we could provide with a solution.
[05:59:26] <abstrusenick> i want a shorter random id which only consists of numbers
[05:59:52] <abstrusenick> this is to be used only for users collection
[05:59:59] <abstrusenick> because i want to expose the ID publicly
[06:00:00] <ron> then generate it, what's the problem?
[06:00:05] <abstrusenick> i want to use the ID for permalink
[06:00:38] <abstrusenick> is there a way to make sure its unique on the database level?
[06:00:53] <ron> sure, use an index.
[06:01:03] <ron> with unique=true, that is.
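A minimal shell sketch of what ron is describing, enforcing uniqueness of a custom numeric id at the database level; the collection and field names (users, publicId) are assumptions.

    // assumed collection/field names; ensureIndex is the 2.x shell helper
    db.users.ensureIndex({ publicId: 1 }, { unique: true });
    db.users.insert({ publicId: 12345678, name: "first" });
    db.users.insert({ publicId: 12345678, name: "second" });  // fails with an E11000 duplicate key error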
[06:01:33] <abstrusenick> also I'm looking into an algorithm to generate IDs that can expand
[06:01:51] <ron> well, that's kind of out of scope for the channel :)
[06:01:54] <abstrusenick> so i was thinking if somebody had solved this issue before
[06:02:40] <abstrusenick> i see
[06:02:42] <ron> not sure why you want/need it to be random though.
[06:03:09] <abstrusenick> i just want a short sequence of numbers that cannot be guessed
[06:03:20] <abstrusenick> if its in order, people would know how many users you have
[06:03:37] <abstrusenick> and you don't want to expose the date the user was created
[06:03:54] <ron> okay, that I can relate to.
[06:04:34] <ron> how about using a UUID?
[06:04:57] <abstrusenick> yeah thats nice
[06:04:59] <abstrusenick> but its too long
[06:05:05] <abstrusenick> i want it to start short
[06:05:12] <abstrusenick> and expand as the number grows
[06:05:19] <abstrusenick> which is logical i think
[06:05:53] <abstrusenick> this is nice i guess
[06:05:53] <abstrusenick> https://github.com/aaronblohowiak/Random-ID
[06:06:37] <ron> if you use js, yes :)
[06:06:53] <ron> I'm sure there are alternatives out there in other languages.
[06:07:33] <abstrusenick> i guess i can do timestamp of current hour + 8 digits of random number
[06:07:35] <abstrusenick> that will scale right
[06:08:19] <ron> if it's only current hour, you may hit dupes.
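A rough plain-JavaScript sketch of the scheme being discussed (hour-granularity timestamp plus 8 random digits); as ron notes, collisions within the same hour are still possible, so the unique index mentioned earlier is what actually guarantees uniqueness. The function name is made up.

    // hours since epoch + 8 zero-padded random digits; collisions are still possible
    function makeShortNumericId() {
        var hours = Math.floor(new Date().getTime() / 3600000);
        var rand = Math.floor(Math.random() * 100000000);
        return Number(String(hours) + ("0000000" + rand).slice(-8));
    }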
[06:11:07] <abstrusenick> what's the reason for object id again?
[06:11:20] <abstrusenick> if i'm not mistaken it's beneficial for sharding?
[06:26:00] <ron> abstrusenick: I think so, not sure entirely.
[06:26:09] <ron> anyways, I gtg. good luck!
[06:30:28] <Miko2> Can I somehow tell mongodb's sorting that "" should come after "" and not the other way around?
[06:32:17] <guest232131> mongodb does not know about locale aware sorting
[06:36:11] <Miko2> Meaning, there's no way to get the correct alphabetical ordering out of mongo for Finnish?
[06:36:55] <guest232131> mongodb does not know about locale aware sorting
[06:37:50] <Miko2> Good to know. Oh, well. Have to make some quite ugly fixes then. :)
[06:38:12] <guest232131> we have to repeat answers twice?
[07:28:13] <jondot> hello
[07:28:48] <jondot> i want to model a tree in a way that several trees reference a common leaf.
[07:29:03] <NodeX> plant a seed :P
[07:29:10] <jondot> then i'd like to query for all the trees that reference it, with 'Materialized Path'
[07:30:01] <guest232131> use a graph database
[07:30:09] <jondot> this means i want something like /.*,b,c/ on the MP. question is - is it efficient on mongodb?
[07:30:27] <jondot> the docs claim that /^a,b,.*/ (left to right) is efficient
[07:30:31] <NodeX> http://www.mongodb.org/display/DOCS/Trees+in+MongoDB
[07:30:40] <jondot> NodeX: im looking at that, thanks
[07:31:08] <NodeX> it depends on how you are using it, as with most things in mongo it's on a per use case
[07:31:14] <guest232131> only fools store trees inside an rdbms or a nosql DB
[07:31:51] <jondot> guest232131: graph databases have a class of scale problems of their own.
[07:32:09] <guest232131> trees inside a standard db as well
[07:32:22] <jondot> guest232131: per use case it can be very efficient, actually.
[07:32:34] <jondot> querying for subtrees is very efficient and easy
[07:33:05] <NodeX> by use case I mean your data is different to other peoples and you query it differently, what works for me might not work for you and VV
[07:33:24] <jondot> if you need to walk arbitrary relationships where each step yields thousands of nodes - use graph db
[07:34:57] <jondot> NodeX: well the use case is simple. given a node N, i want to be able to fetch all its parents up to level X
[07:35:26] <NodeX> is that the -only- or -most- way you will query your data though ?
[07:35:34] <jondot> only way
[07:35:55] <NodeX> then /^a,b,/ is the most efficient
[07:36:03] <jondot> its opposite
[07:36:21] <jondot> /,N$/
[07:36:29] <jondot> thats why I'm asking - if it matters
[07:36:40] <NodeX> a,b,c .... .. query -> /c/ will give you all of them
[07:36:45] <jondot> some DBs implement this kind of search in a Trie which is a prefix tree.
[07:37:04] <jondot> so in that way it matters which direction is the wildcard
[07:37:20] <NodeX> prefixes must be used for efficiency in regex
[07:37:21] <jondot> how would mongodb index this?
[07:37:25] <NodeX> that much I can guarantee
[07:37:25] <[AD]Turbo> hola
[07:37:40] <jondot> NodeX: and postfixes?
[07:37:52] <NodeX> I have never used them so I couldn't comment
[07:38:00] <NodeX> certainly prefixes
[07:38:06] <NodeX> pre + post = not sure
[07:38:29] <NodeX> and all lowercase in regex else it wont hit an index
[07:38:35] <NodeX> lowercase/uppercase
[07:38:38] <NodeX> (one case)
[07:39:32] <jondot> any idea how it does the search in this case?
[07:40:19] <NodeX> I dont know the inner workings of it sorry
[07:40:40] <jondot> ok, thanks
[07:40:43] <NodeX> normally the best thing to do is wait to ask a 10gen employee later on or model the data and benchmark
[07:41:30] <NodeX> the only thing similar I had to do was with UK postcodes which can go forward and backward
[07:41:41] <NodeX> prefixing shaved 3 seconds off a query!
[07:42:23] <jondot> yea i see
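To illustrate the point NodeX is making about prefix regexes on a materialized path: only a regex anchored at the start (and in a single case) can be turned into a range scan on the index, while a suffix match forces a scan. A small shell sketch with assumed collection and field names:

    // assumed collection "nodes" with a materialized path like "a,b,c"
    db.nodes.ensureIndex({ path: 1 });
    db.nodes.find({ path: /^a,b,/ });   // prefix-anchored: can use the index as a range scan
    db.nodes.find({ path: /,c$/ });     // suffix-anchored: cannot, scans everything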
[09:16:22] <robx> Hello! I Have a quick question. I have a document structure where each document has a sub document like: { field : value, subdocument : { field : value } }. I need to do a query where one of the fields in the subdocument is unique. Any suggestions? Or is the only way to do a distinct first, save those results in memory then do findOne foreach of those results??
[09:21:25] <robx> Am I out of luck?
[09:22:45] <robx> /j #nodejs
[09:23:34] <_johnny> robx: but it IS unique, right?
[09:24:47] <robx> The document is not unique, that's the thing, but it exists in a lot of results
[09:25:12] <robx> e.g { title : "party", user : { id : 12321 } }
[09:25:21] <robx> then another document has the same thing, but title "koala"
[09:25:56] <_johnny> ok, and you need the user.id?
[09:26:12] <robx> I need the entire user object
[09:26:40] <robx> user : { firstname : "x", lastname : "y", id : 23912 } as an example (again ;))
[09:27:25] <robx> and each user exists in many documents
[09:27:27] <_johnny> based on what? yes. you have two of those, with diff title, and a user obj. what is it exactly that you try to query?
[09:27:38] <kandie> Hi, is there a way to compile the mongo/bson C++ driver to throw exceptions instead of assertions?
[09:28:16] <robx> based on the user_id
[09:28:29] <robx> These results are sent to a front-end and we dont want duplicates of each user
[09:29:15] <robx> Say we have 100k results, and 40k users, so some users will exist many times, what we want is to get those out but make sure they are unique
[09:29:50] <_johnny> sry 1 sec, phone
[09:31:04] <robx> No problem I appreciate you taking your time :)
[09:32:23] <_johnny> ok. i'm not sure i understand your query. you select all the objects with different titles, and pull the users from that? you don't have a userdb in another collection or?
[09:33:04] <robx> Not exactly, lets make this a real world example sec :P
[09:33:48] <robx> Say we have blog posts, and each of those blog posts has a subdocument of the user posting it. We have 20k blog posts, among 100 users. Then we have a "list all bloggers" page, where we want to get the data from the blog posts
[09:34:00] <robx> (since we don't have any other data to go by)
[09:34:33] <_johnny> so, no user lookup/collection?
[09:34:41] <robx> exactly
[09:34:45] <_johnny> ah, okay
[09:35:07] <_johnny> out of curiosity, how do you authenticate? 3rd party?
[09:35:28] <robx> To mongo? Via an API
[09:35:55] <_johnny> no, i mean the users
[09:37:04] <robx> This was just an example :) But the authentication could've been done via form posting over HTTPS and using crypto on the receiving end
[09:37:30] <kandie> guys, I'm really over my head with this I would really appreciate any help - I'm using the C++ driver v2.1.0, and no matter what I do, I get assertions (not exceptions) whenever I try to look up a field that might not exist on a _valid_ BSONObject object
[09:37:48] <_johnny> well, if you don't have the user data anywhere else, i think you'll have to pull all, and do a client (or whichever retrieves the query) unique method.
[09:38:09] <kandie> what's funny is that it works fine so long as the fields are actually defined, but it asserts otherwise (and I can't even use obj.hasField() because that breaks all the same)
[09:38:09] <_johnny> robx: right, but it's "anonymous" then? i mean, my user/pw is not checked anywhere
[09:38:29] <robx> You can save it crypted in memory
[09:39:58] <robx> kandie: are you running mongo under virtualbox?
[09:40:24] <_johnny> hehe, sorry, i really don't understand how you can have users without a user db. i can log on right now with "herp/derp" and write blog posts? if your userdb is in memory, it would be cleared and you'd lose all those 40k users you have
[09:41:28] <kandie> robx: not really, I'm using arch linux x86_64, and this is happening on our live servers which run ubuntu
[09:42:49] <robx> __johnny you could cache it in an encrypted flatfile :)
[09:43:18] <_johnny> heh, true. but you can't use it for lookup?
[09:43:52] <robx> __johnny indeed you can, you read the cache into memory (decrypt it) and validate using that
[09:43:58] <kandie> robx: hey this worked! running BSONObject::getOwned() on the object i get from the cursor made it tick.. but that's strange because the documentation clearly states that BSONObject uses reference-based smart ptrs, why should i need to do this?
[09:44:29] <_johnny> robx: ok, then what stops you from listing the users from that, rather than collecting from your blog posts and running a uniq() based on user id ?
[09:44:59] <robx> __johnny I don't have any listings of user in any file - as I said this was just an example :]
[09:45:36] <_johnny> still, i'd imagine if you put an index on posts.user you should just do a distinct query for posts.user. it should do what you want
[09:45:59] <robx> kandie: I'm not really sure - I would wait a couple of hours and then ask in this room again :)
[09:46:06] <_johnny> where posts { title: "test", user: { user_id: .., username:.. } } you mentioned before
[09:46:26] <robx> _johnny the index wouldn't do anything tho
[09:46:48] <_johnny> what do you mean wouldn't do anything? it would let you query posts.user
[09:47:16] <kandie> robx: I'll stick around, thanks mate
[09:48:24] <robx> Okay, lets say this, I've taken out all the user ids, stored em in an array, [100, 101, 102, 103, ...] Now I need to get all users with this, but only want to get each result once.
[09:48:39] <robx> Then I would have to iterate over the array and do findOne (20 times)
[09:48:47] <_johnny> right. i'm setting up a test collection right now to check
[09:50:07] <robx> :)
[09:51:46] <_johnny> robx: a structure along the lines of this http://pastebin.com/XuH72f5T ?
[09:52:02] <_johnny> or just id's, but.. whatever
[09:53:06] <robx> Yes, like that, and then multiple documents have the same subdocument user
[09:57:14] <_johnny> robx: this is what i got: http://pastebin.com/30Kk9kyr
[09:57:57] <_johnny> one distinct, on one key (index)
[09:58:08] <robx> What happens now if you duplicate those documents, say 20 times.
[09:58:17] <robx> And you only want to get each USER OBJECT, once
[09:58:36] <_johnny> same
[09:58:52] <_johnny> what i just did. 2345 (Rob X.) has two blog posts
[09:59:00] <_johnny> and now three, two of them identical
[09:59:17] <robx> and you only get one user object back?
[09:59:20] <_johnny> "values": [1234, 2345}
[09:59:25] <_johnny> ]*
[09:59:34] <_johnny> yes. try it out
[09:59:51] <_johnny> if you want the whole object and not use those indexes, just put "user" as key
[09:59:58] <_johnny> db.runCommand({ distinct: 'posts', key: 'user'})
[10:00:17] <robx> That would be heavy on 100k documents tho wouldnt it?
[10:01:24] <_johnny> compared to what, doing one query for ALL objects in the collection, and then 20 findOnes? :P
[10:02:11] <robx> __johnny Haha, I get your point :) I will try that one, thanks mate! Appreciate it
[10:02:20] <_johnny> np :)
[10:02:33] <NodeX> distinct is not that heavy
[10:02:34] <_johnny> if you have indexes for the user object, it shouldn't be *that* heavy
[10:02:51] <NodeX> I regularly run it on a million documents and it doesn't take long at all
[10:03:00] <_johnny> i mean, even if twitter would do a page with a list of all their users, it would take time :p
[10:03:17] <robx> Is it possible to mix where queries with distinct?
[10:03:21] <_johnny> NodeX: nice
[10:03:38] <NodeX> robx : no
[10:04:15] <NodeX> http://www.mongodb.org/display/DOCS/Aggregation#Aggregation-Distinct
[10:05:31] <robx> Then I still need to save those results in memory and then use those to query mongo. Since I need limit, offset and queries
[10:06:00] <NodeX> what are you trying to achieve - I missed the first part
[10:06:16] <robx> Hello! I Have a quick question. I have a document structure where each document has a sub document like: { field : value, subdocument : { field : value } }. I need to do a query where one of the fields in the subdocument is unique. Any suggestions? Or is the only way to do a distinct first, save those results in memory then do findOne foreach of those results??
[10:06:22] <robx> Say we have blog posts, and each of those blog posts has a subdocument of the user posting it. We have 20k blog posts, among 100 users. Then we have a "list all bloggers" page, where we want to get the data from the blog posts
[10:06:23] <robx> <robx> (since we don't have any other data to go by)
[10:07:40] <NodeX> are they always unique (the subdocuments)
[10:08:54] <NodeX> right I see ... but how come you cant query the "users" collection to get this list
[10:09:06] <NodeX> seems very expensive to distinct just for thta
[10:09:07] <NodeX> that *
[10:13:21] <_johnny> robx: but isn't it the list of those 100 people you want? o_O
[10:14:14] <NodeX> ^^ no need for a findOne() on each document
[10:15:47] <_johnny> that's what the distinct cmd above was for
[10:16:23] <_johnny> i have two users with 4 blog posts. the distinct gives a list of those two users, even though one user has three blog posts.
[10:16:34] <_johnny> substitute two for 100, and 4 for 20k :D
[10:16:44] <_johnny> the principle remains the same
[10:17:34] <robx> Johnny, now that's the least of my problems :P
[10:17:56] <_johnny> but that's what your question was about
[10:17:58] <robx> I need to get the UNIQUE USER OBJECTS
[10:18:06] <_johnny> right?
[10:18:33] <_johnny> http://pastebin.com/dAj7JVLz <-
[10:18:44] <_johnny> is that not the unique user objects? :P
[10:18:44] <robx> __johnny if you need to list all those users, including their firstnames, lastnames etc. What would the IDs give you?
[10:19:32] <NodeX> and here lies the problem of thinking of Mongo and data storage in relational terms
[10:19:43] <_johnny> well, that's what i said earlier. just put the key you want. i took user.user_id, but you can just put user if you want the entire object
[10:20:11] <_johnny> robx: the id doesn't give me anything. that was just an example of what you can represent it as. look at my pastebin above
[10:20:22] <_johnny> db.runCommand({ distinct: 'posts', key: 'user'}).values
[10:20:47] <_johnny> gives an array of the unique user objects - however you've defined them in your collection
[10:21:06] <_johnny> in my test collection i just made them as { user_id: ..., user_name: '..' }
[10:21:23] <robx> Okay, now I want to get all of those who's lastname is Smith among those users
[10:21:50] <NodeX> foo.lastname:'Smith'
[10:21:53] <_johnny> why not select those, instead of retrieving ALL users?
[10:22:26] <robx> since John Smith has posted 36 blog posts, and I only want to get him once
[10:22:46] <_johnny> NodeX: i think he means doing the lastname: 'Smith' query on the distinct cmd
[10:23:00] <NodeX> yeh ^^ that's what I did
[10:23:09] <_johnny> what is foo there?
[10:23:14] <NodeX> user
[10:23:16] <robx> the object representing the results i guess
[10:23:20] <NodeX> db.runCommand({ distinct: 'posts', key: 'user.lastname'})
[10:23:35] <NodeX> I see what he means now
[10:23:39] <_johnny> NodeX: i just tried it, but i can't query on that
[10:24:03] <NodeX> you must have the syntax wrong - that is how I distinct things
[10:24:29] <robx> The only alternative seems to be saving the entire distinct result in memory and then using that to serve the results to the client.
[10:24:45] <NodeX> http://www.mongodb.org/display/DOCS/Aggregation#Aggregation-Distinct <--- distinct may also reference a nested key
[10:25:00] <_johnny> NodeX: i just made the example for distincting: db.runCommand({ distinct: 'posts', key: 'user'}).values <- that's what he wants to query lastname: 'Smith' on
[10:25:15] <robx> johnny now you understand my problem! :D
[10:25:23] <NodeX> robx : if it's just a list of users why are you even querying the blogs collection ?
[10:25:23] <_johnny> i've always understood :p
[10:25:29] <NodeX> surely you have a user collection too ?
[10:25:36] <_johnny> NodeX: hehe, no, he does not
[10:25:48] <robx> NodeX, that's the problem. I don't.
[10:25:58] <_johnny> :)
[10:26:13] <NodeX> so anyone can blog on your system ?
[10:26:25] <NodeX> gimme the link for some free spamming ;)
[10:26:27] <robx> The blog is just a metaphor, an example
[10:26:33] <_johnny> NodeX: what he asks is essentially, can one use find on a db.runCommand().values result
[10:26:43] <_johnny> NodeX: hehe my thought as well :)
[10:27:10] <robx> Skip the entire blog thinking, it was just to represent the structure
[10:27:27] <_johnny> findOne({'user.lastname': 'Smith'})
[10:27:29] <_johnny> :P
[10:27:41] <NodeX> you can do this if you want to (on the shell) ... db.foo.distinct("bar").forEach(....;
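A filled-in version of the one-liner NodeX is sketching, for the structure robx described: distinct on the embedded user object, then a client-side filter on the returned array. Collection and field names are assumptions based on the earlier pastebin examples.

    // distinct() in the shell returns an array of the unique embedded user objects
    var users = db.posts.distinct("user");
    // filter client-side, e.g. only the Smiths, without a second query
    var smiths = users.filter(function (u) { return u.lastname === "Smith"; });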
[10:27:46] <_johnny> still don't get what you need that for though
[10:28:28] <robx> OKAY LAST TRY :D........
[10:31:00] <robx> I have an application, people can upload DOCUMENTS, they have registered through a php site. They have a user ID. The document upload server is a node app. Node receives the user ID with all upload requests. Each user can upload 10 documents, thus meaning the user exists in 10 mongo objects. Now, I want to list all the users who use my application.
[10:31:23] <robx> But I only want to list them once, and I want people to be able to filter the users as well.
[10:31:31] <NodeX> do you want to know the easiest way to do it?
[10:31:38] <robx> Create a user collection
[10:31:40] <NodeX> yes
[10:31:43] <NodeX> lol
[10:32:31] <NodeX> second piece of advice ... if you want to count the number of documents each user has without looping all the time - save a count with each user
[10:32:43] <NodeX> $inc : 1 on upload, $inc : -1 on delete
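A minimal sketch of the counter NodeX describes, assuming a users collection keyed by the user's id:

    db.users.update({ _id: userId }, { $inc: { docCount: 1 } });   // on upload
    db.users.update({ _id: userId }, { $inc: { docCount: -1 } });  // on delete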
[10:35:43] <NodeX> sector_string
[10:35:44] <NodeX> oops
[10:59:28] <_johnny> robx: while being the easiest, it is also the right way, and a more cost-efficient way :)
[11:19:56] <unsleep> are mongo collections limited to hard disk space?
[11:21:26] <Derick> yes... if you run out of disk space, you can't add more data
[11:21:36] <wereHamster> captain obvious! :)
[11:21:59] <augustl> we totally need "dbrotate", that gzips old data.
[11:22:18] <NinjaPenguin> does anyone have an ETA on the next update of the php driver?
[11:22:20] <unsleep> so mongo is not a scalable distributed document database then.. is it? :)
[11:22:43] <augustl> unsleep: I'd say the ability to add more disks to store more data is "scalable"
[11:23:40] <unsleep> but if you add more nodes to shard the same content without adding more space then.. it's not scalable.. is it?
[11:24:25] <augustl> I wouldn't say that a system that isn't able to store more data than what is available on the disks "isn't scalable"
[11:25:01] <augustl> "scalable" applies to the entire stack, not just the mechanics of the software
[11:25:30] <unsleep> i think its scalable in resources but not space :-?
[11:25:53] <augustl> unsleep: sounds like you're only talking about CPU scaling
[11:26:12] <augustl> so it does seem like in some cases, scaling for better CPU utilization also requires adding more storage
[11:26:29] <wereHamster> unsleep: adding more nodes == adding more space
[11:27:27] <augustl> unsleep: so I think what you're saying is that some times, scaling for more distributed CPU usage also requires adding more disk
[11:27:38] <augustl> I disagree that this means "isn't scalable" though
[11:27:40] <unsleep> so if i add a new node i don't need to do anything special to store a bigger collection than the first node's size?
[11:28:00] <unsleep> im confused about the terms, sorry
[11:28:47] <wereHamster> unsleep: well, depends on your mongodb cluster. If you are using replica sets then all nodes contain the same data (and so the db is restricted by the size of the smallest node). But if you shard, then mongodb distributes the data across all available nodes.
[11:29:17] <wereHamster> unsleep: http://www.mongodb.org/display/DOCS/Sharding+Introduction
[11:29:21] <augustl> no idea how mongo handles a 200mb large document if you only have db shards that are 100mb in size. Don't think that's a possible scenario though. Disclaimer: I haven't used mongo yet ;)
[11:29:57] <wereHamster> augustl: the size limit on a single document is 16M
[11:31:01] <augustl> and the minimum size of a db is 200mb iirc? So it seems like it is indeed an impossible scenario
[11:31:14] <wereHamster> no.
[11:31:15] <augustl> (to have a document so large that it needs to span multiple shards)
[11:31:37] <Derick> NinjaPenguin: what's your thing with the php driver? we'll probably make a release next week
[11:32:06] <Derick> unsleep: sharding *splits* the data in a collection to scale it
[11:33:02] <Derick> a bit late :-)
[11:35:17] <unsleep> im happy to know i don't need to change dbs xDDDD
[11:36:27] <NodeX> lol
[11:40:00] <teelmo> Hi.
[11:40:32] <teelmo> I was wondering if anyone can point into the right direction on this problem of mine..
[11:40:40] <NodeX> best to just ask ;)
[11:40:50] <NinjaPenguin> Derick we see a number of connection issues with it currently
[11:41:09] <teelmo> I've got data that has timestamps: Mon, 18 Jun 2012 21:08:40 +0000 ; Tue, 19 Jun 2012 15:50:13 +0000
[11:41:16] <NinjaPenguin> we run php-fpm under nginx and failover just doesnt work for us at all
[11:41:18] <teelmo> so there are two timestamps.
[11:41:57] <teelmo> and I wanted to make a query that would return the data hourly..
[11:42:01] <Derick> NinjaPenguin: that won't come in the first new version, but in 1.3/2.0 though
[11:42:08] <Derick> we're working hard on rewriting this
[11:42:14] <NinjaPenguin> if primary/secondary swap then we get multiple "Cannot determine master" issues
[11:42:15] <Derick> (right as we speak)
[11:42:24] <NinjaPenguin> which can only be resolved with restarting FPM instances
[11:42:26] <Derick> NinjaPenguin: yes, that should resolve itself within 60 secs though, not?
[11:42:42] <NinjaPenguin> no, we see longer intervals than 60 seconds
[11:43:22] <Derick> hmm, odd
[11:43:23] <teelmo> http://docs.mongodb.org/manual/use-cases/hierarchical-aggregation/ i was checking out this example.
[11:43:26] <Derick> and you're on 1.2.10 now?
[11:43:30] <NinjaPenguin> yeah
[11:43:34] <Derick> meh
[11:43:48] <NinjaPenguin> we also run non fpm in which the 60 second limit DOES apply
[11:43:55] <Derick> interesting
[11:44:20] <Derick> that makes little sense actually
[11:44:50] <NodeX> NinjaPenguin : a possible work around is to have your FPM restart after a lower number of requests
[11:44:52] <NinjaPenguin> could there be some issue with cycling the connections in FPM perhaps?
[11:45:13] <NodeX> it's the max_requests parameter iirc
[11:45:18] <Derick> NinjaPenguin: what happens if you set http://uk.php.net/manual/en/mongo.configuration.php#ini.mongo.is-master-interval and http://uk.php.net/manual/en/mongo.configuration.php#ini.mongo.ping-interval to 0
[11:45:41] <Derick> NinjaPenguin: shouldn't differ...
[11:45:47] <NinjaPenguin> we actually tried that in a local VM environment and it appeared to have no difference
[11:45:53] <Derick> hm
[11:45:55] <Derick> odd
[11:46:02] <NinjaPenguin> so we havn't excalated to live
[11:46:08] <NinjaPenguin> escalated*
[11:46:36] <Derick> NinjaPenguin: did you try with logging on?
[11:46:58] <NinjaPenguin> mongod logging?
[11:47:13] <Derick> no, driver logging
[11:47:35] <Derick> sadly, it throws php notics
[11:47:37] <Derick> notices
[11:47:44] <NinjaPenguin> i dont believe so, how do you enable this? that is certainly something I could look into
[11:47:59] <Derick> but with display_errors, that should be ok
[11:48:01] <Derick> one sec
[11:48:02] <NinjaPenguin> notices are fine as we can replicate on a local virtual environment
[11:48:14] <Derick> right, but you want to log them with php's log_errors
[11:48:27] <NinjaPenguin> (yeah we will log them all)
[11:48:39] <Derick> http://uk.php.net/manual/en/class.mongolog.php
[11:48:53] <NinjaPenguin> excellent, I will take a look into that
[11:48:54] <Derick> the first two lines of the example
[11:49:11] <NinjaPenguin> ok ALL and ALL
[11:49:14] <Derick> yes
[11:49:21] <Sim0n> Hiya. Im doing maintenance on a 3-node replica set. I have brought one of the nodes down for reasons not related to my question. Question is: can I install mongodb 2.2 on the down node and add it back to the replica set without complications?
[11:49:36] <Derick> 2.2 is not out yet
[11:49:52] <NinjaPenguin> any ETA on that
[11:49:57] <Sim0n> Silly me, just reading headlines on HN. :)
[11:50:09] <Derick> NinjaPenguin: summer
[11:50:13] <NinjaPenguin> cool
[11:50:28] <NinjaPenguin> so Derick is the issue that the driver is simply caching the master
[11:50:34] <Derick> no, it should not
[11:50:39] <NinjaPenguin> interesting
[11:50:44] <Derick> mongod kills the connection when it stops being primary
[11:50:56] <Derick> the driver should pick that up within the is_master timeout
[11:51:29] <NinjaPenguin> i see, i see then why running as FPM should make no real difference to that
[11:51:44] <Derick> correct
[11:51:53] <Derick> but that's not what happens it seems
[11:52:16] <NinjaPenguin> ok well I can now look into this further with the logging option
[11:52:24] <NinjaPenguin> one other, minor, question
[11:52:35] <Derick> sure
[11:52:42] <NinjaPenguin> is it possible to evenly distribute reads across multiple slaves
[11:52:52] <NinjaPenguin> i can see how we can specify a slave to use
[11:53:00] <NinjaPenguin> and so can see how to create that logic internally
[11:53:02] <Derick> should already work
[11:53:21] <Derick> the driver pings, and puts slaves into buckets
[11:53:21] <NinjaPenguin> on round robin?
[11:53:34] <Derick> and will randomly use them as long as they're in the same speed bucket
[11:53:58] <Derick> it's random rather than round robin
[11:54:00] <NinjaPenguin> very interesting indeed, i hadn't picked up on that from the documentation, apologies
[11:54:13] <Derick> we're adding some more functionality and read preferences into 1.3 too
[11:54:22] <NinjaPenguin> yeah fantastic
[11:54:28] <Derick> 2.0 will be breaking, where it has new connection handling
[11:54:31] <Derick> at least, that's the plan
[11:54:36] <NinjaPenguin> understandable
[11:54:44] <Derick> 2.0 should come Q3/Q4
[11:55:38] <NinjaPenguin> well I have additional angles to explore now with the driver logging, so thats great
[11:55:42] <NinjaPenguin> many thanks for your time
[11:55:45] <Derick> np
[12:15:15] <FerchoDB> does anyone use the C# mongo driver? I'm trying to do queries and return results as "dynamic". Is that possible? or must I use a mapped class?
[12:19:33] <Gabrys> hi here
[12:19:44] <Gabrys> I'm playing with mongodb sharding
[12:19:58] <FerchoDB> it seems that there was a "findOne" method that returned dynamic, but now there is only findOneAs
[12:20:39] <Gabrys> I'm pushing millions of documents to mongos using the mongoimport command and it seems it distributes the data to one shard at a time, then switches to the second
[12:21:35] <Gabrys> the performance is 10,000 inserts per second (and if pushing data to mongod directly it's 15,000/s)
[12:21:57] <Gabrys> I thought sharding should increase insert throughput, not limit it
[12:22:08] <Gabrys> and that shards would be equally loaded during massive import
[12:22:29] <NodeX> it depends on your shard key
[12:22:42] <Gabrys> I used _id as the shard key
[12:23:17] <FerchoDB> Correction: the findOne method was not in the official C# driver, I was looking an example made with another driver
[12:23:49] <NodeX> http://www.mongodb.org/display/DOCS/Choosing+a+Shard+Key#ChoosingaShardKey-Writescaling
[12:24:47] <NodeX> I would think by reading that document that an _id shard key would fill up shard 1, then move to shard 2 and so on
[12:26:41] <FerchoDB> correction again: in the official doc, it mentions findOne() as a method although it is not available on the last version of the library
[12:27:51] <Gabrys> NodeX, I tried more "natural" shard key before, but results were similar, I'll try again
[12:29:07] <Gabrys> NodeX, is it possible (and makes sense) to shard by ObjectId hash?
[12:30:00] <NodeX> you're probably better waiting for someone who knows more about sharding than me, certainly write scaling anyway
[12:36:18] <Derick> Gabrys: depends on why you shard
[12:36:32] <Gabrys> Derick, hi
[12:37:14] <Gabrys> sharding because I have lots of data, that will grow in future and want to import ~10M documents daily fast
[12:37:30] <Derick> do you want to shard because of write performance as well?
[12:37:35] <Gabrys> Derander_, yes
[12:37:37] <Gabrys> Derick, yes
[12:38:00] <Derick> in that case, your shard key needs to not "lean right" such as the objectid, as that starts with a timestamp
[12:38:17] <Gabrys> Derick, ok
[12:38:30] <Derick> which means all writes will have to go to one shard as that's where the new range lives (likely)
[12:38:32] <NodeX> that's what I thought but didn't know how to explain it !
[12:38:44] <Derick> I'm not stellar with this either
[12:38:59] <NodeX> I think about the alphabet analogy
[12:39:08] <NodeX> 26 shards - one for each letter - good shard key
[12:39:16] <Derick> right, but you can't have things zetter than zet :-)
[12:39:23] <Derick> or is it zed?
[12:39:26] <Derick> zedder than zed
[12:39:29] <Derick> still sounds funny
[12:39:37] <NodeX> both sound great :P
[12:39:41] <kali> zed is dead ?
[12:39:54] <NodeX> anyway the principle stands - an eual shard key
[12:39:58] <NodeX> equal *
[12:40:07] <NodeX> 12 shards for 12 months of the year
[12:40:09] <Gabrys> Derick, NodeX, so hash would be ok?
[12:40:16] <NodeX> 4/5 for weeks of the month
[12:40:29] <Derick> Gabrys: don't you have a more natural field?
[12:42:10] <Gabrys> I can shard the db by key: countryId, userId, categoryId for instance
[12:42:23] <Gabrys> but they have much more records for userId = 0 and categoryId = 0
[12:42:45] <Gabrys> and this is not time-skewed
[12:43:24] <NodeX> how many category IDs are there ?
[12:43:39] <NodeX> and do these divide into your shards fairly equally?
[12:43:39] <Gabrys> 25k
[12:43:55] <Derick> Gabrys: country id first doesn't sound like a bad idea
[12:44:04] <NodeX> yer, was just gonna suggest that
[12:44:08] <Derick> unless you think you'd need more than 50 shards or so :)
[12:44:16] <Gabrys> the problem is categoryId = 0 and userId = 0 are aggregated values
[12:44:28] <Gabrys> so for given userId > 0, there are only a few categoryIds
[12:44:36] <Gabrys> but for userId = 0 there are 25k categoryIds
[12:44:41] <Gabrys> same thing for categoryId
[12:44:43] <Derick> Gabrys: is there a year or month item?
[12:45:01] <Gabrys> Derick, date is later in the index
[12:45:04] <NodeX> or week of the year
[12:45:08] <Derick> that doesn't matter
[12:45:22] <Derick> I'd suggest to create a "identifier" field, that looks like:
[12:45:30] <Gabrys> the full (unique) index is countryId, userId, categoryId, timeQuant
[12:45:34] <Derick> country_code_month_day or something like that
[12:45:43] <Gabrys> timeQuant is date as string like '2012-06-01'
[12:45:46] <Derick> the shard key doesn't have to be a real key
[12:45:52] <Derick> you can make a new one up for it
[12:46:16] <Derick> f.e. "NL_06_01" (although I'd add userid to that too)
[12:46:17] <Gabrys> but every day, I import data for the previous day
[12:46:24] <Gabrys> so that would go to one shard
[12:46:30] <Derick> hence the country code
[12:46:40] <Gabrys> but the country id is always the same for now
[12:46:44] <Derick> ah :-)
[12:46:54] <Gabrys> yeah, sorry for not mentioning that before ;-)
[12:47:06] <Derick> you need a large-cardinality-even-distribution shard-key
[12:47:27] <NodeX> hour of the day ?
[12:47:29] <Derick> so lots of distinct values, that are evenly distributed
[12:47:34] <NodeX> that's relatively sparse
[12:47:44] <Derick> NodeX: should be good enough combined with others
[12:48:02] <NodeX> should get good read/write with that
[12:48:02] <Gabrys> I believe hash would be the best, unless you say otherwise
[12:48:16] <NodeX> what would your hash look like ?
[12:48:19] <Gabrys> I can generate that in file imported with mongoimport
[12:48:41] <Gabrys> could be simple crc, md5, sha1 or any other regular hash on row data
[12:49:07] <NodeX> that would be similar to using objectId as they would be almost unique
[12:49:17] <Gabrys> but not incremental
[12:49:32] <Derick> correct
[12:49:46] <Gabrys> I mean objectIds are similar for objects inserted at the same hour
[12:49:56] <Gabrys> hashes would be distributed evenly
[12:50:00] <Derick> but it'd be harder to figure out which data lives on which shard :)
[12:50:09] <Gabrys> that's right
[12:50:33] <Gabrys> any reason to bother about that?
[12:50:38] <NodeX> are you not worried about read performance?
[12:51:00] <phira> hashing only meets the criteria if all the data is unique
[12:51:06] <NodeX> is this purely for writes, is what I'm asking
[12:51:09] <phira> if you have multiple copies of the same data, it will generate the same hash
[12:51:16] <phira> this can lead to buckets overflowing.
[12:51:21] <Gabrys> right, that sucks
[12:51:38] <Gabrys> ok, I'll try with the countryId, userId, categoryId sharding key
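For reference, a sketch of sharding a collection on the compound key Gabrys settles on; the database and collection names here are made up, and the commands are run through mongos.

    db.adminCommand({ enablesharding: "stats" });
    db.adminCommand({ shardcollection: "stats.daily",
                      key: { countryId: 1, userId: 1, categoryId: 1 } });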
[12:52:00] <NodeX> are you indexing on import as well ?
[12:52:09] <Derick> phira: only in the case of an exact full row... you can however add a timestamp into the hash to fix that
[12:53:13] <phira> this is ProbablyTrue(™), in that a cryptographically secure hash should generate results that are roughly indistinguishable from a random output in that case. Non-crypto hashes can give you unbalanced results in that case.
[12:53:18] <phira> this CRC need not apply.
[12:53:31] <phira> this/thus
[12:55:30] <Derick> http://programmers.stackexchange.com/questions/49550/which-hashing-algorithm-is-best-for-uniqueness-and-speed/145633#145633 is a great post on this btw
[13:08:57] <_ollie> hi all, anyone aware of details of the $and operator?
[13:09:55] <_ollie> different documentation pages imply different semantics unfortunately…
[13:10:07] <_ollie> http://www.mongodb.org/display/DOCS/Advanced+Queries#AdvancedQueries-%24and
[13:10:11] <_ollie> http://docs.mongodb.org/manual/reference/operators/
[13:11:00] <_ollie> the first argues with multiple constraints on a single key where the latter implies you can also use it for different keys… esp. for the latter I wonder what the difference compared to a plain { foo : "bar", bar : "foo" } is…
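The case where $and adds something over the implicit and of { foo : "bar", bar : "foo" } is when you need two constraints on the same key, since a plain query document cannot repeat a field name. A small sketch with an assumed collection:

    // implicit and across different fields -- no $and needed
    db.items.find({ foo: "bar", bar: "foo" });
    // two constraints on the same field -- this is where $and is required
    db.items.find({ $and: [ { price: { $ne: 1.99 } }, { price: { $exists: true } } ] });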
[13:37:03] <FerchoLP> has anyone run mongodb on Damn Small Linux?
[13:37:55] <kali> does it make sense ?
[13:38:31] <guest232131> has anyone asked questions without value information?
[13:38:36] <Gabrys> Derick, NodeX thank you guys, countryId, userId, categoryId seems to distribute nicely
[13:38:53] <Gabrys> I had the timeQuant added before, that was wrong I suppose
[13:39:28] <Gabrys> gtg now, see you and thanks again
[13:39:44] <NodeX> good luck
[13:42:24] <Derick> np
[14:05:57] <souza> Hello guys!@
[14:08:05] <souza> Has anyone manipulated dates in C and mongoDB? i saw in the api that there's a typedef "bson_date_t", but i didn't find any more documentation about it, so if you know some link that talks about it, or know the answer to my question, i'll be very happy and it will solve my problem! :)
[14:09:24] <kali> souza: as far as i remember, it's a int64, number of millisec since epoch...
[14:10:17] <souza> Humm, and can i set this value using a simple assignment? like "bson_date_t date = 56468418465;"
[14:11:03] <kali> souza: http://api.mongodb.org/c/current/api/bson_8h_source.html look at line 151
[14:13:00] <souza> kali: humm, well it looks good, but i'm a newbie in C, do you know how i can set this value?
[14:15:43] <pchaussalet> Hi everybody
[14:16:17] <NodeX> good morning
[14:17:18] <pchaussalet> Is it possible to rename a database just with a rename of its datafile ?
[14:18:03] <pchaussalet> (I know it's not a recommended way to do it... :) )
[14:19:21] <NodeX> I dont think it is nor is it wise
[14:19:37] <NodeX> the indexes are bound with namespaces which are linked to database name iirc
[14:20:10] <NodeX> so for 1 it will render your indexes useless
[14:20:28] <guest232131> clone the database and drop the old one
[14:36:42] <pchaussalet> ok, that's what I thought... thanks !
[14:40:33] <guest232131> that's what google would have told you
[14:41:36] <pchaussalet> I didn't find it explicitly, I hoped there might be some hidden trick... :)
[15:17:48] <airportyh> Hello there, can anyone help me in working with binary data in python?
[15:18:06] <airportyh> seems like the class to use is bson.binary.Binary
[15:18:13] <guest232131> nobody can help you with unasked questions
[15:18:48] <airportyh> but that class looks like inherits str and is immutable
[15:19:12] <airportyh> Is that a correct assessment?
[15:19:32] <airportyh> if I want to modify binary data, does that mean I have to create a new blob entirely?
[15:19:59] <guest232131> how else?
[15:21:10] <airportyh> well, it would be more memory efficient if I could modify the buffer contents directly
[15:21:23] <airportyh> think large buffers
[15:21:42] <guest232131> what "would" be is of zero interest, of interest is what is actually implemented
[17:01:04] <augustl> seeing some weird behaviour with the mongodb driver for Node.js. A call to db.open() causes the whole process to terminate, after "undefined" is logged
[17:01:15] <augustl> what could cause that?
[17:12:37] <anlek> Hello everyone. I'm trying to export a few documents from a collection in one DB into a second DB. Any ideas on what the best way to do this is?
[17:15:03] <guest232131> find() + save()
[17:39:07] <multiHYP> hi
[17:40:02] <multiHYP> is there a limit to a row's size in a collection? I have an array of objects in each row, this array may have 3 - 300 elements.
[17:48:17] <linsys> there is a limit to the size of a document its 16mb
[17:49:34] <multiHYP> oh ok that won't be reached by even 1000s of my objects in the array.
[17:49:39] <multiHYP> thanks linsys
[18:31:03] <FerchoDB> hi. in the command journalLatencyTest, what does WithPauses mean on "timeMillis.8KBWithPauses" ?
[18:37:53] <vadi> How to do a query as the below one in PHP?
[18:38:33] <vadi> db.table.find({"name.en_US":"my string"})
[18:39:06] <vadi> That search works if executed via mongo shell, but doing the same in PHP does not find anything.
[18:40:24] <Derick> $db->collection->find( array ( 'name.enUS' => 'my string' ) );
[18:41:00] <Derick> plus the _
[18:41:02] <vadi> I think that is the one I am using. I am going to hard code it, to check again. thanks.
[18:45:08] <vadi> It does not work. Maybe the PHP Mongo driver does not support "property.property" in the search terms.
[18:51:21] <giessel> Can anyone tell me where in the source the 16mb document limit is enforced?
[18:51:47] <giessel> and, related, does anything else in the code depend on this limit?
[18:52:07] <vadi> Derick, It works. The problem is the rest to the PHP code. Thanks.
[18:52:53] <giessel> Deep/annoying questions, to be sure.
[18:58:00] <giessel> maybe it is driver specific?
[19:21:32] <dgottlieb> giessel: 16MB is definitely enforced on the server side. The constant can be configured at the compilation stage.
[19:23:07] <giessel> dgottlieb: awesome. ya i'm looking at the pymongo driver and it has a constant- but as written it is still at 4mb, which is obviously old
[19:23:09] <dgottlieb> giessel: as for drivers, I believe the standard is supposed to be that the driver sends a maxBsonSize query to a server after connecting, but I believe not all drivers are written to that spec
[19:23:38] <giessel> ya- there is a class property which i've changed, but i think it is over written at some point in the connection init
[19:23:56] <giessel> it def says 16mb on the error, even tho the constant is defined at 4mb
[19:24:00] <giessel> i will look at the config flag
[19:24:17] <giessel> we're working with large scientific data sets
[19:24:22] <giessel> 3d or 4d
[19:24:44] <giessel> breaking them up into smaller arrays could be done, but would be computationally annoying to rebuild them to do calculations
[19:25:03] <dgottlieb> you can try grepping for an isMaster command on connection init
[19:25:16] <dgottlieb> > db.isMaster(): { "ismaster" : true, "maxBsonObjectSize" : 16777216, "ok" : 1 }
[19:25:59] <giessel> i've written a workaround that checks for size limits and puts things in a gridfs instead, swapping out the entries in the doc for gridfs refs
[19:26:04] <giessel> but this is fragile and annoying, too
[19:27:29] <dgottlieb> i believe* you can try recompiling mongod with a bigger bson size and it sounds like the pymongo driver would pick up on the change
[19:27:54] <dgottlieb> if everything is coded correctly, there should be no weird dependencies :)
[19:28:01] <giessel> that's exactly what i'm going to try. HAH let's hope, huh?
[19:28:43] <giessel> i'll bbl. thanks for your help!
[20:14:33] <tystr> hmm
[20:14:57] <tystr> I keep getting this error on one of the secondaries I'm trying to add to a replica set:
[20:14:58] <tystr> [rsStart] replSet can't get local.system.replset config from self or any seed (EMPTYCONFIG)
[20:15:49] <tystr> rs.add('hostname'); echos this: { "down" : [ "hostname:27017" ], "ok" : 1 }
[20:18:33] <tystr> hmm nvm…seems an issue with aws, sry
[20:18:48] <tystr> indeed that host is unreachable for some reason…..
[20:42:32] <ismell> ello
[20:42:50] <ranman> hello
[20:43:09] <ismell> we are using the jruby gem and getting the following exception which core dumps tomcat: http://pastebin.com/JTkXVgKh
[20:43:28] <ismell> is this a known issue ?
[20:43:48] <ron> mongo 1.6.2?
[20:44:02] <ismell> yeah
[20:44:07] <ron> wow.
[20:44:21] <ismell> well the gem is
[20:53:55] <Swimming_Bird> i'm trying to do fulltext search on people's names and emails, is regex the only way to do it?
[20:54:47] <algernon> Swimming_Bird: use something like solr or elasticsearch for that
[20:54:52] <algernon> (along with mongo)
[20:55:10] <Swimming_Bird> ugh, was hoping to not add a 4th db to my stack :P
[20:55:15] <algernon> other than that, regexp it is, and that's going to be slow.
[20:57:13] <Swimming_Bird> i couldn't find a startswith; with btree indexes that usually is pretty efficient
[21:10:25] <dstorrs> when I do => db.currentOp() I see a list of ops going on. Are these all current operations, or is this a history?
[21:18:11] <dstorrs> assuming it's current ops, which seems supported by docs and op name. I was just hoping it wasn't.
[21:18:37] <ron> Derick and mstearn are ops.
[21:19:14] <Derick> ron: he meant database ops not IRC ops :-)
[21:19:19] <dstorrs> ron: I was asking about the db.currentOp() command. :>
[21:19:23] <ron> I know ;)
[21:19:28] <Derick> silly :-þ
[21:19:53] <dstorrs> next question -- I've got queries in the currentOp output that say "query not recording (too large)". What does this mean in practical terms? presumably I've hit an unindexed query, or something...?
[21:20:56] <Derick> hmm... that I don't know
[21:21:10] <dstorrs> according to the bug in JIRA, there's no way to know which query this is "unless it happens to appear in the log". What should I be looking for?
[21:25:18] <dstorrs> Derick: I've got a pair of map/reduce jobs running over a very large collection. they are taking way too long. If I kill the server, am I putting my data at risk?
[21:25:43] <dstorrs> I have journalling enabled, so I THINK I'm safe, but I'd like to check
[21:25:57] <dstorrs> v2.0.2
[21:26:17] <Derick> db.killOp() is better
[21:27:12] <dstorrs> ok. I've killed the clients that started the M/Rs but apparently that doesn't terminate the job...?
[21:27:30] <Derick> right
[21:31:29] <dstorrs> Derick: any way to do this faster than one killOp() after another? there's 40 or 50 of these.
[21:32:00] <Derick> to kill all
[21:32:10] <Derick> try: db.$cmd.sys.killop.find()
[21:32:16] <Derick> just a hunch, not sure whether that works :-)
[21:32:55] <Derick> well, db.killOp() does this:
[21:33:02] <Derick> function (op) {
[21:33:02] <Derick> if (!op) {
[21:33:02] <Derick> throw "no opNum to kill specified";
[21:33:02] <Derick> }
[21:33:02] <Derick> return db.$cmd.sys.killop.findOne({op:op});
[21:33:04] <Derick> }
[21:33:05] <dstorrs> When the 10gen guy says "not sure that works but try it", it makes me nervous.... :P
[21:33:06] <Derick> I am extrapolating
[21:33:22] <Derick> well, in the worst case it doesn't work and you have to do them by hand :)
[21:33:29] <dstorrs> ok.
[21:33:36] <locojay1> hi anyone running mongo-hadoop-streaming on cdh3u3 or cdh3u4? keep on getting java.io.IOException: Cannot run program "mapper.py": error=2, No such file or directory
[21:33:57] <locojay1> regular python streaming works fine
[21:35:30] <dstorrs> stupid question -- is that a literal $cmd ?
[21:35:38] <halcyon918> hey folks, I've been digging around the mongo docs and not seeing an answer… I'd like to see if I can connect to a remote replica set from the CLI to test it out (our DBAs set us up a replica set and I'd like to test it before I code into it). do I need a particular CLI command to do that?
[21:35:48] <Derick> dstorrs: yes
[21:35:54] <dstorrs> locojay1: not I, sorry. (didn't want to leave you Warnocked)
[21:37:09] <dstorrs> Derick: no joy. :< "no opID specified"
[21:37:14] <locojay1> dstorrs: thanks, no probs
[21:37:21] <Derick> dstorrs: fair enough :-)
[21:37:43] <dstorrs> are the op ids stored in a collection somewhere? should be possible to forEach over them, right?
[21:38:03] <Derick> they're not in a collection I think, but db.currentOp() returns an array
[21:39:47] <landongn> Hey all
[21:39:58] <landongn> Anyone have a situation in MMS where your cursor counts just skyrocket and stay there?
[21:41:31] <landongn> Trying to figure out what a baseline is for cursor counts on a slightly busy mongo replica set
[21:41:47] <landongn> can't find anything on the docs that talk about cursors and why there would be dangling (more than 10m) cursors
[21:41:50] <dstorrs> ahhhh.... much better.
[21:42:07] <dstorrs> Derick: ops.inprog.forEach(function(o) { db.killOp(o.opid) }) Worked a charm. Thanks.
[21:42:25] <Derick> awesome, we should put that in the docs :-)
[21:42:43] <dstorrs> (er...there's an implicit initial ops = db.currentOp() in there )
[21:42:46] <dstorrs> I'll go add it.
[21:43:00] <dstorrs> Maybe it could make it into the next release of the shell as db.killAllOps() ?
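A minimal sketch of the helper dstorrs is proposing, built from the same currentOp()/killOp() loop used above; the killAllOps name is hypothetical, and in practice you would probably filter the ops (e.g. only map/reduce) before killing them.

    db.killAllOps = function () {
        db.currentOp().inprog.forEach(function (op) {
            db.killOp(op.opid);
        });
    };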
[21:51:33] <dstorrs> Derick: in JIRA, should shell related items be submitted under 'Core Server'? I don't see a 'shell' category.
[21:52:00] <Derick> "documentation" perhaps?
[21:52:30] <kchodorow_> there should be a shell component
[21:52:35] <kchodorow_> (in core server)
[21:52:46] <Derick> it's a doc thing though
[21:53:17] <Derick> https://jira.mongodb.org/browse/DOCS but I think it's 10gen private
[21:55:38] <kchodorow_> it would be cool if we had some shell tool packaging system to "install" js files that weren't built-in
[21:56:44] <tychoish> Derick: the DOCS project is public
[21:57:01] <tychoish> and shell improvements should be filed in SERVER
[22:08:56] <ukd1> Hey guys, I'm doing some ruby and have been using the mongo driver directly, I'd like to use either mongoid or mongomapper - is there any clear reason to use one over the other?
[22:50:48] <syphon7> does anyone know if FindAndModify will create a document if it doesn't exist?
[22:53:48] <giessel> dgottlieb: hello again- finally back at a computer. do you have a reference for setting the max bson document size at compile time?
[22:54:01] <giessel> dgottlieb: i've looked in the Sconstruct file and didn't see any options...
[22:58:33] <dgottlieb> giessel: Looking into things a bit deeper, I believe I was mistaken when I said it was "configurable", but it seems all that's necessary is changing the constant in the source here: https://github.com/mongodb/mongo/blob/master/src/mongo/bson/util/builder.h
[22:59:59] <dgottlieb> another bit of "evidence": https://github.com/mongodb/mongo/commit/b357c3ea89ef9374dd775c326f75d404bebe7f68
[23:00:42] <giessel> dgottlieb: awesome!!!
[23:00:54] <giessel> dgottlieb: thank you for finding that for me, that's really super kind
[23:01:28] <giessel> let it be known that today, someone got EXACTLY the answer they were looking for when asking on irc ;)
[23:02:40] <dgottlieb> No worries. I've heard about the max bson size being changeable at compile time and all that, but I never had a need to try it myself. I'm curious to hear if everything works as expected or if maybe some other limitations pop up.
[23:05:07] <dgottlieb> syphon7: I'm pretty sure it does not, but you can pass in an upsert: true option
[23:05:08] <dgottlieb> http://www.mongodb.org/display/DOCS/findAndModify+Command
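To make that concrete: findAndModify does not create a missing document by default, but with upsert: true it will. A shell sketch against an assumed counters collection:

    // creates { _id: "jobs", seq: 1 } if it doesn't exist, otherwise increments it
    db.counters.findAndModify({
        query:  { _id: "jobs" },
        update: { $inc: { seq: 1 } },
        upsert: true,
        new:    true
    });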
[23:05:31] <giessel> i wonder how hard it would be to add it to the conf
[23:05:43] <giessel> idk much about scons tho
[23:06:45] <augustl> for the record, emptying all collections between each test is much faster than deleting the db between each test ;) </randomobservation>
[23:08:57] <dgottlieb> I doubt it would be hard to add as a compile option, I think it may just be more of an issue where scary things may happen if, say, a replica set or sharded cluster has mongods with mixed max bson sizes
[23:10:18] <dgottlieb> augustl: dropping a DB actually deletes files on the filesystem so the speed of that depends on which filesystem is being used. Dropping a collection does not reclaim any space on disk, but does have the advantage of being fast regardless of filesystem choice
[23:11:06] <augustl> dgottlieb: I noticed, it created 3 files for each test, one 16mb one 64mb and one 128mb, that will take a little bit of time on most systems I guess :)
[23:11:34] <augustl> it seems to write zeroes to the entire file too, so it's not a sparse file
[23:12:40] <syphon7> cool, thanks dgottlieb
[23:14:30] <dgottlieb> augustl: yes all new database files get pre-filled with 0s so (once again depending on the filesystem choice) it can take some time for creation
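A sketch of the "empty every collection instead of dropping the db" approach augustl mentions, which keeps the preallocated data files around between tests:

    // remove all documents but keep collections, indexes and datafiles
    db.getCollectionNames().forEach(function (name) {
        if (name.indexOf("system.") !== 0) {      // leave system collections alone
            db.getCollection(name).remove({});    // remove() keeps the allocated space
        }
    });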
[23:14:40] <dgottlieb> syphon7: np
[23:27:29] <doug> is there a way to suppress value output of the CLI?
[23:27:44] <doug> if i type "f = function() {}" at the > prompt, I get f = function() {} as output
[23:27:48] <doug> i'd like to suppress that
[23:37:23] <giessel> dgottlieb: ya i don't even know how to think about that
[23:38:04] <giessel> dgottlieb: i'm running into some compile issues because i think the large size i changed it to makes some comparisons/assertions fail due to int type issues
[23:43:52] <giessel> dgottlieb: or maybe i just need to change "UncommittedBytesLimit" as well
[23:57:19] <landongn> anyone here help me understand why i've got 240+ active cursors open at all times? it's only one primary on one replica set that just spikes and just sits there.
[23:57:44] <landongn> i can't seem to find any docs on it- and the db.currentOp() detail isn't all that helpful- there's nowhere near 240 active processes