#mongodb logs for Friday the 3rd of August, 2012

[00:06:09] <Godslastering> do mapreduce jobs run, in any way, concurrently? i.e. - will running mapreduce take advantage of multiple available cores?
[00:55:30] <dstorrs> I have a sharded DB. there are no backups or replicas. I tried to create an index, it ate the entire CPU and locked the DB. I ^C'd the operation (which I've been able to do before). That didn't work. I tried to killOp the operation.
[00:55:59] <dstorrs> then I tried to service mongoX stop my servers. one shard went down easily, the other took a while then said FAILED. it now has a lock file hanging around.
[00:56:13] <dstorrs> I'm reluctant to start up again until I know where I am.
[00:56:27] <dstorrs> Log files are available. Can anyone help?
[01:01:25] <dstorrs> help?
[01:01:28] <dstorrs> anyone?
[01:03:43] <nofxx> anyone got a replset can issue `mongotop` so I can sleep in peace today that my db is ok? local.oplog.rs 1945ms 1945ms 0ms
[01:03:55] <nofxx> knowing that my db*
[01:04:16] <nofxx> dstorrs, only one way to know it hehe... do you have real data there?
[01:04:34] <nofxx> dstorrs, make some cp -r copies if so, trial and error can't hurt
[01:04:53] <dstorrs> nofxx: yes. We have a boatload of data.
[01:05:12] <dstorrs> And the disk on shard 1 is just over 50% full, so I can't really copy it.
[01:05:30] <dstorrs> we were due to get another shard in within the next couple of weeks.
[01:05:36] <nofxx> dstorrs, tar czvf + scp to you
[01:06:51] <dstorrs> I was hoping for something more immediate. We're talking ~500G of data per box.
[01:07:02] <dstorrs> It would take 6 hours to back up.
[01:07:30] <dstorrs> I need to get some data out of the system for a customer ASAP. That's why I killed this index.
[01:10:29] <nofxx> dstorrs, wow... that's a good load. Grab all the info you have in a pastie, post it here and on the mailing list
[01:13:10] <dstorrs> coming
[01:14:27] <dstorrs> http://pastie.org/4381057
[01:14:33] <dstorrs> nofxx: ^^
[01:14:53] <dstorrs> last 500 lines of the log on shard 2, the one with slow shutdown. shard 1 went down obediently
[01:36:59] <ukd1> does mongodb ensure ordering of arrays in document? I.e. if {$set: {x: [1,2,3,4,5]}} will always return x in the same order?
[02:20:48] <crudson> ukd1: Why would anything change the ordering of array values?
[07:36:04] <[AD]Turbo> hola
[10:09:14] <NodeX> gotta love this google PR update
[10:09:18] <NodeX> makes no sense at all
[11:14:09] <akaIDIOT> I'm trying to express something that won't work in a single query
[11:14:45] <akaIDIOT> so I need another roundtrip, but for the 'inner' query i only need to retrieve something to restrict the result set for the second part
[11:15:26] <akaIDIOT> can i tell a query (from Java) to give me just the objids and tell the second query to restrict itself to the objids
[11:15:49] <akaIDIOT> the latter could just be a $nin: <objids>
[11:16:13] <akaIDIOT> or should i just iterate it and cache all the objids manually
[11:17:22] <NodeX> you can bring back the documents you want to parse and use $in
[11:17:56] <NodeX> since _id's are always indexed it would make it a fast(er) round trip
[11:23:22] <akaIDIOT> NodeX: thanks :)
[11:23:28] <akaIDIOT> i guess i'll do that
[11:27:54] <NodeX> ;)
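
A rough sketch of the pattern NodeX describes, in mongo shell syntax; the collection names "pages" and "visits" and the field names are made up for illustration. The point is fetching only _id values on the first round trip, then feeding them to $in (or $nin) on the second.

    // Round trip 1: project only the _id values that define the restriction.
    var ids = [];
    db.pages.find({ status: "published" }, { _id: 1 }).forEach(function (doc) {
        ids.push(doc._id);
    });
    // Round trip 2: restrict the second query to those ids (_id is always
    // indexed), or exclude them with $nin instead.
    db.visits.find({ page_id: { $in: ids } });
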
[11:31:42] <akaIDIOT> the $in operator does get very very slow if you feed it a long list of things, right? :P
[11:31:59] <akaIDIOT> which would possibly put me in a corner, but hey :P
[11:32:24] <akaIDIOT> could define my data model better, something for iteration 2 :P
[11:33:55] <nemothekid> akaIDIOT: yes it does!
[11:34:51] <nemothekid> we've found coll.remove({$in => [1,2,3…,50]}) is much slower than for i in 50 coll.remove(i)
[11:35:13] <akaIDIOT> my data is a tree
[11:35:42] <akaIDIOT> i've currently just saved all the ids of parents, children, ancestors and descendants as list properties
[11:35:51] <akaIDIOT> to just make it easy for now
[11:36:04] <akaIDIOT> should convert that to something like (pre, post, depth)
[11:36:24] <akaIDIOT> though that makes reading and loading the data quite a bit more complex
[11:37:17] <akaIDIOT> also that will only allow me to efficiently get the descendants of a single node
[11:37:22] <akaIDIOT> not a whole bunch of them
[12:16:25] <solars> hey, does anyone know if there is something like mongoid in ruby - for java?
[12:16:36] <solars> not just a driver, but an orm like thing?
[12:22:57] <kali> solars: there is experimental support in Hibernate, and DAO helpers in Spring
[12:23:07] <kali> solars: and probably a dozen others :)
[12:23:25] <kali> solars: but these are the two i've heard of
[12:25:52] <akaIDIOT> that query language rocks
[12:26:16] <akaIDIOT> managed to translate an SQL generator to a DBObject generator in Java in just a few hours
[12:26:30] <akaIDIOT> and it didnt even crash the first friggin time i ran it
[12:26:56] <akaIDIOT> ♥ to the team that designed a Java binding without ending up with >>>> generics anywhere :P
[12:27:32] <solars> kali, thanks!
[13:39:21] <souza> Hello guys!
[13:44:28] <souza> I'm trying to insert a big BSON document into my mongodb, but I can't get it to work :( - It runs normally, but nothing is inserted into the database. This is my code: http://pastebin.com/azpxbgbS and if i try to print the BSON that will be inserted into MongoDB, I get this stack trace > http://pastebin.com/tB6uTnJU
[13:44:44] <MANCHUCK> how do i create a user that wil be able to access all databases ?
[13:47:27] <souza> MANCHUCK: i recommend you read this > http://www.mongodb.org/display/DOCS/Security+and+Authentication =)
[13:48:19] <MANCHUCK> i followed that, however i still need to run the command on each database. i basically want to create a new root user
[13:49:28] <NodeX> you need to login to the admin database first
[13:50:39] <MANCHUCK> ahh that's what i was missing
[13:50:41] <MANCHUCK> ok thanks
[13:56:27] <souza> does anyone know something that could help me!? =)
[15:08:33] <simon___> Hiya!
[15:08:48] <simon___> Debian 6.0 vs Ubuntu 12.04 for a mongodb server? What are your thoughts?
[15:09:41] <simon___> I've been using 10.04 for a long time without any big issues. But thinking about how many more problems I've had with ubuntu on the desktop than with Debian, I'm thinking about moving over to Debian.
[15:10:20] <simon___> The reason I switched on the desktop is because of out-of-date desktop packages. On a server I really could not care less about the version of XFCE etc. :)
[15:10:54] <simon___> I.e. using Debian has always been a more stable experience for me when it comes to the desktop; is there any real difference on servers?
[15:11:02] <simon___> (haven't used Debian on servers in years)
[15:12:06] <vsmatck> Ubuntu is debian unstable with a few changes. So I just weigh it as debian stable vs debian unstable (and some people like to run testing, unstable, or mixed).
[15:13:29] <vsmatck> With mongodb the important consideration is mainly the filesystem.
[15:14:16] <simon___> Ah ok. Thanks.
[15:14:18] <simon___> Using XFS.
[15:16:47] <TubaraoSardinha> Any idea on why mongo won't save dates on local time using ruby driver?
[15:19:38] <kali> TubaraoSardinha: mongodb dates are stored as offset from 1970-01-01T00:00:00 utc, without timezone information
[15:21:34] <TubaraoSardinha> kali: I was expecting this, but the thing is that I need to store consolidated user actions per day and my fear is that some user actions might end up on the wrong day due to timezone differences.
[15:22:57] <TubaraoSardinha> I imagined something like actions: {03082012: 5, 04082012: 3} with the key representing the day
[15:23:09] <TubaraoSardinha> Maybe my logic is wrong
[15:32:08] <kali> TubaraoSardinha: i would not use mongodb dates for that, because they are actually timestamps
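
A minimal sketch of what kali is hinting at: if the buckets are calendar days in the application's own timezone, store the day as a plain key and increment a counter, rather than relying on BSON dates (which are UTC timestamps). The collection and field names here are hypothetical.

    // Count one action for a given user on a given local day, creating the
    // document on first use (upsert = true).
    db.daily_actions.update(
        { user_id: 42 },
        { $inc: { "actions.2012-08-03": 1 } },
        true
    );
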
[16:13:57] <jordanorelli> is there a tool out there that will let me specify a hash/dict/object with collection names for keys and query documents for values, that will dump all relevant documents?
[16:14:19] <jordanorelli> that is, for doing something like getting all the data for a particular week, to extract a portion of my data to make a testing database.
[16:23:11] <Godslastering> i'm attempting to do something like this ( http://paste.pound-python.org/show/ckzGbjCZHg3qOy5ixFEI/ ) in python using pymongo. this is running pretty slow to iterate over so many rows, and i'm wondering if i can do some custom advanced query in mongodb to speed it up. Basically, i want to peer into multiple collections with one query
[17:06:31] <nofxx> anyone got a replset who can run `mongotop` on the master? I'm not sure this is normal: local.oplog.rs 1094ms 1094ms 0ms
[17:30:55] <Godslastering> can anyone possibly help explain to me how the heck this is happening? http://paste.pound-python.org/show/uvXZxFrwxkdMvoFauLHk/
[18:00:30] <Godslastering> this is uh .... extremely annoying, does anyone have any clue how in the world this could happen? http://paste.pound-python.org/show/uvXZxFrwxkdMvoFauLHk/
[18:01:15] <chubz> Is there a way to start a replica set with all its members all at once?
[18:03:58] <mpobrien> @godslastering what does the resulting doc have in it?
[18:05:00] <Godslastering> mpobrien: {u'name':'bob',u'hostname':'bob.com'} it doesn't have u'ip' .. i'm just wondering how in the world mongodb is giving me the document if it doesn't have u'ip'. my query is wrong somehow, i'm guessing
[18:05:41] <mpobrien> well
[18:05:54] <mpobrien> i think the issue is
[18:06:10] <mpobrien> $nin : [None] basically means, find documents where ip isn't null
[18:06:19] <mpobrien> null is not necessarily the same thing as "nonexistent"
[18:06:56] <Godslastering> mpobrien: here is what i want: "every document where 'ip' is (1) not null and (2) not equal to the string '__unresolved__'"
[18:07:50] <mpobrien> ok
[18:08:02] <mpobrien> then in that case you may want to try adding an $exists
[18:08:19] <mpobrien> to the query
[18:08:24] <Godslastering> mpobrien: so, {'$exists':'ip'}?
[18:09:04] <mpobrien> {ip: {"$exists":True, "$nin": [ … ]} }
[18:09:22] <mpobrien> or simpler yet, just filter them out on the application side. if not 'ip' in entry: continue
[18:09:39] <Godslastering> mpobrien: that's what i was doing, and it was ... err. slow. i'll try that line you just said
[18:09:47] <mpobrien> ok
[18:10:23] <mpobrien> it shouldn't really be too bad unless there are a LOT of those docs that are missing the ip field; is that the case?
[18:10:40] <Godslastering> mpobrien: yes. about 3 million are missing the ip field, and about 12 million have the ip field
[18:11:46] <mpobrien> ok, that does make sense then
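
The same idea in mongo shell syntax, combining $exists with $nin so documents missing the field are excluded along with nulls and the placeholder value ("mycoll" is just the example collection name used later in this log):

    db.mycoll.find({ ip: { $exists: true, $nin: [null, "__unresolved__"] } });
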
[18:16:15] <Godslastering> also, if i have about 10 pre-allocated files, and i delete a 2gb chunk of data, can i tell mongodb to re-arrange the data and remove empty pre-allocated files?
[18:32:48] <kevBook> Hey guys
[18:33:27] <kevBook> my document has { startDate: , endDate }
[18:33:54] <kevBook> i want to search for all documents whose date range includes today
[18:34:02] <kevBook> any good ways to do that?
[18:34:13] <fabiobatalha> is there a way to check the BSON size?
[18:34:28] <fabiobatalha> of a document.
[18:35:07] <souza> fabiobatalha: what's the language?
[18:35:14] <fabiobatalha> pymongo
[18:35:18] <fabiobatalha> python
[18:35:42] <souza> fabiobatalha: good luck > http://api.mongodb.org/python/current/api/index.html
[18:44:35] <jgornick> Is there a way for me to monitor all queries on a database?
[18:44:49] <mpobrien> you can set profiling level to 2
[18:45:01] <mpobrien> and in the system.profile collection, all queries will be recorded.
[18:45:07] <jgornick> awesome!
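
A short sketch of the profiling suggestion: level 2 records every operation into the system.profile collection of the current database, which can then be queried like any other collection.

    db.setProfilingLevel(2);          // 2 = profile all operations
    // ... run the queries you want to observe, then look at the newest entries:
    db.system.profile.find().sort({ ts: -1 }).limit(10).pretty();
    db.setProfilingLevel(0);          // turn it off again; profiling adds overhead
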
[18:49:03] <Godslastering> mpobrien: http://paste.pound-python.org/show/uvXZxFrwxkdMvoFauLHk/ still having the same issue. i've tried the same query another place in my application and i'm getting the same error; this seems like a proper query, but it's acting wrongly
[18:53:36] <chubz> how come when i type rs.status().members.stateStr i get no output in mongo?
[18:53:38] <chubz> :(
[18:55:16] <mpobrien> @godslastering can you update that pastebin with your latest? it looks like the same thing to me
[18:55:24] <mpobrien> @chubz whats in rs.status()
[18:56:31] <Godslastering> mpobrien: http://paste.pound-python.org/show/unisqRfpCLy6ihqCbQpx/
[18:57:35] <chubz> mpobrien: rs.status() shows the members of my replica set and their name, health, state, stateStr, uptime, optime etc
[18:58:16] <mpobrien> yeah, but i mean, whats the raw output of the function? are you sure you are getting the right key?
[18:58:34] <mpobrien> IIRC members is an array, you probably need something like rs.status().members[0].stateStr
[18:59:13] <jgornick> Is it possible to clear out db.system.profile?
[18:59:25] <jgornick> Also, is it possible to remove a collection from the db entirely?
[18:59:51] <jgornick> Ahh, just got it: db.setProfilingLevel(0); db.system.profile.drop();
[19:00:44] <chubz> mpobrien: oh goodie thanks!
[19:01:46] <chubz> mpobrien: is there a way to display more than one member's state? like instead of rs.status().members[0].state something like rs.status().members[0..3].state
[19:02:07] <mpobrien> you can just write a function in javascript to do it
[19:02:22] <chubz> okay
[19:02:23] <chubz> thanks
[19:02:30] <chubz> was just wondering if there was a function for it
[19:02:49] <mpobrien> var x = rs.status(); for(var i=0;i<x.members.length;i++){ print(i + ": " + x.members[i].stateStr) }
[19:02:53] <mpobrien> something like that
[19:04:11] <chubz> thanks, ill try that out
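
Spelled out a little more fully (rs.status().members is an array, so iterating it prints every member's state):

    rs.status().members.forEach(function (m) {
        print(m.name + ": " + m.stateStr);
    });
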
[19:08:50] <mpobrien> @godslastering what happens if you run that query from the shell, do those docs come up
[19:09:40] <Godslastering> mpobrien: actually what i pasted is wrong. i dont want those docs. i'm lost, i've gotta come back to this problem later or tomorrow.
[19:09:55] <mpobrien> ok
[19:12:14] <slavik_> is it possible to recursively search for a key=value pair and return the top level doc if it is found somewhere?
[19:16:20] <wereHamster> slavik_: do it manually, using where
[19:23:05] <Godslastering> mpobrien: i am having an issue though: viewing my mongod output log, i'm getting like 80% of my queries over 150ms ... it's running on a quad core 3.2GHz i7 ... these are pretty simple queries, and i'm querying on indexed data. is this normal?
[19:24:03] <mpobrien> hm
[19:24:24] <mpobrien> you should double check what the query plan is with .explain() just to be sure
[19:24:47] <mpobrien> also look in mongostat to see if theres page faults
[19:25:15] <Godslastering> mpobrien: ok i'm running mongostat. what do i do now? what am i looking for?
[19:25:25] <mpobrien> do you have a "faults" field in there?
[19:26:18] <Godslastering> mpobrien: no. i don't believe so.
[19:26:46] <mpobrien> FYI cpu is rarely the bottleneck for queries, it's usually RAM or I/O
[19:27:22] <Godslastering> mpobrien: mongod is using 1gb ram, and i've got about 6gb of ram free on the system. how would i know if the bottleneck here was I/O?
[19:28:11] <mpobrien> try running iostat and monitor that
[19:28:38] <mpobrien> also in the log, what does it say when it reports a slow query? should have something like nscanned etc.
[19:28:59] <Godslastering> mpobrien: nscanned:2359481
[19:29:05] <Godslastering> on the last one i just saw fly by me
[19:29:27] <mpobrien> hmm
[19:29:33] <mpobrien> thats a lot
[19:29:57] <Godslastering> mpobrien: should it be lower? doesn't that mean that is how many entries are in that collection?
[19:30:18] <mpobrien> that means the number of entries that the DB needs to examine to find matches
[19:30:39] <Godslastering> mpobrien: hm... again, is that bad though? i did ensure an index on the fields i'm querying on
[19:30:52] <mpobrien> a high number like that would probably mean index isn't being used at all, or its not an effective one
[19:31:21] <Godslastering> mpobrien: appears not to be a disk bottleneck either. MB/s is only at about 0.25
[19:31:45] <Godslastering> lets see, it may be an index issue then.
[19:32:17] <mpobrien> try your queries with .explain() and check the output carefully - also using hte profiler might help
[19:34:09] <Godslastering> ok so i did a .explain(), mpobrien, and it's giving me this http://paste.pound-python.org/show/24581/
[19:34:17] <mpobrien> yeah
[19:34:21] <mpobrien> BasicCursor means no index was used
[19:34:26] <mpobrien> table scan
[19:34:49] <Godslastering> mpobrien: hm, ok. that explains it. but if i do db.mycollection.ensureIndex({ip:1,hostname:1,name:1}), why wouldn't this work?
[19:35:03] <mpobrien> whats the query you're running?
[19:35:34] <Godslastering> mpobrien: db.mycoll.find({ip:'182.168.0.1'}) in the one i pasted
[19:36:26] <mpobrien> ok
[19:37:43] <mpobrien> so
[19:37:49] <mpobrien> whats in db.mycoll.getIndexes()
[19:38:21] <Godslastering> [{_id:1},{name:1,ip:1,hostname:1}]
[19:38:47] <mpobrien> hm, that index ordering doesn't match your ensureIndex ordering
[19:39:07] <chubz> how do i check my mongodb api version #?
[19:39:19] <mpobrien> @chubz which api?
[19:39:46] <chubz> mongodb api?
[19:39:59] <mpobrien> api for what though? a specific driver, etc
[19:40:09] <chubz> java
[19:40:33] <mpobrien> should be in the filename of the jar file in your project
[19:41:02] <chubz> oh durrr. sorry brain fart
[19:41:20] <Godslastering> mpobrien: alright, db.mycoll.getIndexKeys() is telling me exactly what i'd expect, but it's still using a basiccursor in .explain()
[19:42:26] <Godslastering> mpobrien: and attempting to force an index with db.mycoll.find({ip:'192.168.0.1'}).hint({ip:1}).explain() is telling me bad hint
[19:43:00] <mpobrien> what does getIndexKeys() say?
[19:43:35] <Godslastering> mpobrien: that's what i pasted earlier, the [{_id:1}] stuff
[19:44:10] <mpobrien> thats still incorrect though - the ordering matters
[19:44:25] <mpobrien> if you have an index on (name, ip, hostname)
[19:44:39] <mpobrien> you can query on (name) or (name, ip), or (name, ip, hostname) and it will use the index
[19:44:45] <Godslastering> mpobrien: ok i was confused about how indices work
[19:44:57] <Godslastering> mpobrien: i just did db.mycoll.ensureIndex({ip:1}) and now it's using a BTreeCursor
[19:45:03] <mpobrien> but you can't query on just ip, because its not the first field
[19:45:06] <mpobrien> ok, perfect
[19:45:08] <mpobrien> thats what you need
[19:45:13] <mpobrien> that should be much faster
[19:45:20] <Godslastering> mpobrien: before i was doing db.mycoll.ensureIndex({ip:1,hostname:1,name:1}) thinking it would create all of those
[19:45:25] <mpobrien> ohh
[19:45:31] <mpobrien> nah, you need to create each one individually
[19:45:36] <Godslastering> mpobrien: thanks, i see now :)
[19:45:46] <mpobrien> or if you are querying on multiple fields together, you can use that syntax to create a compound index
[19:45:54] <mpobrien> but doesnt seem like that applies to you here
[19:46:04] <mpobrien> {ip : 1} should be fine
[19:47:44] <Godslastering> mpobrien: wow, since i did that i haven't seen a single slow query show up yet
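
A small sketch of the prefix rule mpobrien is describing: with only the compound index, a query on ip alone falls back to a table scan, and a separate single-field index is what makes it fast.

    db.mycoll.ensureIndex({ name: 1, ip: 1, hostname: 1 });
    db.mycoll.find({ name: "bob" }).explain();                     // uses the compound index
    db.mycoll.find({ name: "bob", ip: "192.168.0.1" }).explain();  // uses the compound index
    db.mycoll.find({ ip: "192.168.0.1" }).explain();               // BasicCursor: ip is not a prefix
    db.mycoll.ensureIndex({ ip: 1 });                              // now the ip-only query is indexed
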
[19:53:51] <Godslastering> mpobrien: ok, since i was confused before, let me be sure i know what i'm doing here: i only have to do ensureIndex once, even if i insert a lot of data after this? will it rebuild the index?
[19:54:50] <pas256> Hello
[19:59:04] <kali> Godslastering: yes. indexes are maintained when you write (not on the first read, à la couchdb)
[19:59:49] <Godslastering> kali: alright, thanks. i was pretty confused with the documentation that i read
[20:01:25] <kali> Godslastering: that said, if you have a big chunk of data to load, it might be a good idea to insert first and then set up the index
[20:02:48] <Godslastering> kali: i already have the bulk of my data inserted, from now on it will be just like 10-20000 entries an hour, compared to the initial chunk. should i check to be sure the indices exist at the beginning of my script before i run to make sure i dont get bad performance?
[20:04:12] <kali> Godslastering: you don't have to do anything. if the index is there, it won't go away :)
[20:04:26] <Godslastering> kali: alright, good. thanks.
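
Roughly what kali means, assuming a collection named mycoll: load the bulk data first, build the index once, and let later writes maintain it. Calling ensureIndex again at script startup is harmless, since it is a no-op when the index already exists.

    // ... bulk inserts happen here ...
    db.mycoll.ensureIndex({ ip: 1 });   // build once, after the initial load
    // Subsequent inserts keep the index up to date automatically:
    db.mycoll.insert({ name: "bob", hostname: "bob.com", ip: "192.168.0.1" });
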
[20:05:01] <Godslastering> kali: real quickly, how can i get all documents where the key 'ip' is either null or doesn't exist?
[20:05:27] <Godslastering> i was doing db.mycoll.find({ip:none}) but someone said that may not work?
[20:05:39] <Godslastering> (null, not none)
[20:06:17] <kali> i think ip:null actually works for both null and a missing value
[20:06:27] <kali> unless you define the index as sparse
[20:06:30] <Godslastering> ah, alright, good then.
[20:06:59] <Godslastering> and db.mycoll.find({ip:{$nin:[null,'__unresolved__']}}) was giving me some documents in which 'ip' wasn't even defined though
[20:08:34] <kali> Godslastering: http://uu.zoy.org/v/hifuno#lang=objectivec&style=solarized_light
[20:09:11] <kali> is it what you expect ?
[20:09:54] <Godslastering> kali: that's fine, yeah, but db.mycoll.find({ip:{$nin:[null,'__unresolved__']}}) is still giving me documents where ip might be null
[20:11:46] <kali> mmmm
[20:13:05] <Godslastering> kali: this is what's happening http://paste.pound-python.org/show/24582/
[20:13:38] <kali> Godslastering: http://uu.zoy.org/v/bizeri#lang=smalltalk&style=solarized_light this is working, but not nice
[20:14:09] <Godslastering> kali: oh that's nearly illegible! but it's working?
[20:17:23] <kali> Godslastering: i think you should consider a slight alteration of your schema :)
[20:17:31] <Godslastering> kali: how's that?
[20:17:53] <kali> have "unresolved: true" instead of the magic value
[20:17:54] <Godslastering> kali: here's my situation: i want to find entries where ip is not null and it doesn't equal '__unresolved__'
[20:18:05] <Godslastering> kali: oh silly me, that would make a lot of sense
[20:18:21] <kali> you're not silly, you're thinking SQL
[20:18:42] <Godslastering> kali: somehow, yeah, even though i hate SQL, i'm still thinking that way
[20:19:14] <kali> resolved: true might be even better
[20:19:55] <Godslastering> kali: will {unresolved:{$not:true}} give me where unresolved:false and where unresolved doesn't even exist?
[20:21:18] <kali> db.foo.find({ unresolved: { $nin: [true] }})
[20:21:38] <Godslastering> kali: thanks
[20:22:01] <kali> or db.foo.find({ unresolved: { $in: [false, null] }})
[20:22:14] <Godslastering> kali: i like $nin:[true] :p
[20:22:46] <kali> yeah, this is why i prefer resolved: true, but i think unresolved:true might be more efficient in terms of space
[20:22:57] <kali> i expect you have more resolved ips than unresolved
[20:23:08] <Godslastering> kali: psh, i'd be lucky if that was the case >_>
[20:23:43] <kali> if you're dealing with a huge collection, consider picking short names, too
[20:24:08] <kali> a bit frustrating, but it can make a big difference
[20:24:13] <Godslastering> kali: can i rename later, easily, when i'm out of the 'development' phase?
[20:24:32] <kali> at the price of a data reload, basically
[20:24:52] <Godslastering> kali: hmm. i'm not using _long_ names per se, but not short either. i'll have to change them soon, though, i guess
[20:25:16] <kali> the sooner the better. you want to do this before going through testing and QA
[20:25:27] <kali> or whatever equivalent you have :)
[20:25:32] <kali> if you have some.
[20:25:47] <Godslastering> kali: luckily this is a personal project, but because it _is_ personal, lots of disk usage can get annoying
[20:35:15] <Godslastering> kali: if mongodb allocated 20 files when i had a lot of data, and then i remove like 4gb, can i tell it to get rid of unnecessary pre-allocated files?
[20:37:52] <mpobrien> if you do a repairDatabase, it will re-write the datafiles with the minimum space needed
[20:38:11] <Godslastering> mpobrien: so, from the mongo client, db.mycoll.repairDatabase() ?
[20:38:34] <mpobrien> just db.repairDatabase()
[20:38:42] <Godslastering> mpobrien: ah, thanks. i'll give it a whirl
[20:38:45] <mpobrien> it will need to have extra disk space available to run
[20:39:00] <mpobrien> and it might take some time depending on how much data is in there
[20:39:21] <mpobrien> but once it's finished, it should reclaim a good amount of space
[20:39:33] <Godslastering> mpobrien: ok, running it now.
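
A sketch of checking the effect of repairDatabase: db.stats() reports the on-disk file size, so comparing it before and after shows how much space was reclaimed (the repair needs spare disk space and blocks the database while it runs).

    db.stats().fileSize;    // on-disk size before
    db.repairDatabase();    // rewrites the data files with the minimum space needed
    db.stats().fileSize;    // size after it finishes
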
[20:59:46] <Godslastering> if an insert takes a while, it's most likely disk I/O causing that, right?
[21:00:11] <mpobrien> could be allocating new datafile
[21:00:39] <Godslastering> mpobrien: yeah, but that's still disk i/o, right?
[21:00:43] <mpobrien> yeah
[21:00:54] <Godslastering> alright, so no easy optimization there
[21:01:03] <mpobrien> yeah, should be very rare though.
[21:01:15] <Godslastering> mpobrien: mhm i'm recreating my data set, so that makes sense
[21:01:20] <mpobrien> yup
[21:02:34] <pas256> is there a reason why MongoDB with a read only workload doing geospatial queries would not use all cores on a system?
[21:04:29] <kali> Godslastering: file allocation is fast if you use xfs or ext4
[21:04:54] <kali> Godslastering: and painfully slow with ext2 and ext3
[21:04:59] <Godslastering> kali: i'm using macosx, so ... i can't remember off the top of my head what i'm using
[21:05:13] <kali> hdfs
[21:05:36] <kali> or is it hfs ?
[21:06:47] <Godslastering> Mac OS X Extended (Journaled), kali .. heh
[21:07:50] <kali> this is hfs+
[21:14:13] <kali> Godslastering: as far as i can tell, there is no support for fallocate() on osx
[21:14:25] <kali> Godslastering: so fast file preallocation does not work
[21:18:47] <ojii> what's the best practice to get a random entry from a mongodb collection?
[21:19:33] <mpobrien> if the collection isn't too huge, do a count() to get the size of the collection
[21:19:52] <mpobrien> then .find({}).skip(<random number>).limit(1)
[21:20:03] <ojii> mpobrien, what's "too huge"?
[21:20:36] <mpobrien> well, depends on what acceptable performance is for you… the potential issue is that .skip() gets expensive if you have a lot of docs
[21:20:49] <mpobrien> so it works better on smaller collections
[21:21:12] <mpobrien> an alternative is, populate a random number into each doc when you insert it
[21:21:18] <mpobrien> create an index on it
[21:21:42] <mpobrien> then do .find( { rand_field : {$gt : <some random value>}}).limit(1)
[21:23:03] <ojii> mpobrien, doesn't sound too random though
[21:23:42] <mpobrien> it should be fine
[21:24:37] <mpobrien> you're basically just assigning a random value to each doc, and then choosing one based on another random value at runtime
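
The two approaches mpobrien sketches, in shell syntax; the collection name "things" and the field name "rand_field" are placeholders.

    // 1) skip() a random offset: simple, but skip gets expensive on large collections.
    var n = db.things.count();
    var doc = db.things.find().skip(Math.floor(Math.random() * n)).limit(1).next();

    // 2) store a random value on insert, index it, and pick the first document
    //    past a random threshold at read time (retry with $lt if nothing matches).
    db.things.insert({ payload: "...", rand_field: Math.random() });
    db.things.ensureIndex({ rand_field: 1 });
    var doc2 = db.things.findOne({ rand_field: { $gt: Math.random() } });
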
[21:27:13] <Godslastering> kali: and because it's mac os x, itunes uses more ram than mongodb! >.>
[22:42:28] <manveru> is there any way to have remove() return the removed document(s)?
[22:50:34] <mpobrien> manveru: you can use findAndModify for that
[22:50:51] <mpobrien> it has a "remove" option which will return a document and remove it in one operation
[22:50:58] <mpobrien> only works on a single doc at a time though.
[22:51:24] <manveru> awesome, that's good enough
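
A minimal sketch of the findAndModify form mpobrien mentions (the collection and filter are illustrative): with remove: true it returns the matched document and deletes it in one operation, one document per call.

    var removed = db.things.findAndModify({
        query:  { status: "done" },
        remove: true
    });
    printjson(removed);   // the document that was just removed, or null if none matched
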
[22:53:42] <nofxx> My server is almost idle, got some spare ram for mongo, 90% of queries are < 100ms, but every once in a while I get some finds by id going 300-700ms ...
[22:53:57] <nofxx> and this local.oplog.rs 1973ms 1973ms 0ms on mongotop.... is this normal? first trip, sailor...
[22:54:25] <nofxx> on the replset master*
[22:55:25] <pas256> is there a reason why MongoDB with a read only workload doing geospatial queries would not use all cores on a system?
[22:55:41] <nofxx> that, and mongoid always puts an $orderby, even in find by ids...: { $query: { _id: ObjectId('501c46c343d8a44c26000163') }, $orderby: { _id: 1 } } ... wondering if it could hurt performance in some way
[22:56:05] <nofxx> pas256, iirc you need sharding to use multiple cores
[22:57:56] <nofxx> manveru, using mongoid? 3 has findAndModify working
[22:58:07] <pas256> nofxx: On a smaller box, it used all cores
[22:58:14] <manveru> nofxx: mgo
[22:58:17] <pas256> now that we have a bigger box, it is not using all of them
[22:58:27] <pas256> weird
[22:58:45] <Liquid-Silence> hi all
[22:59:05] <Liquid-Silence> is there a way to remove an element from all documents in a collection?
[22:59:28] <jordanorelli> manveru: in mgo it's Find(…).Apply(). it might be fixed by now, but there's a bug on findAndModify if your session mode is set to eventual
[23:00:41] <Liquid-Silence> jordanorelli: any idea?
[23:01:16] <mpobrien> Liquid-Silence: use the $unset operator
[23:01:54] <Liquid-Silence> not too sure what you mean mate
[23:02:55] <mpobrien> db.collection.update( {}, {$unset: {fieldname : 1} }, false, true)
[23:03:10] <mpobrien> will remove a field from all docs in the collection.
[23:04:11] <Liquid-Silence> ok my fieldname is ResourceId
[23:04:18] <Liquid-Silence> can I do db.collection.update( {}, {$unset: {fieldname : ResourceId} }, false, true)
[23:04:19] <mpobrien> ok so
[23:04:23] <mpobrien> yeah
[23:04:27] <mpobrien> you will need to put ResourceId in quotes.
[23:04:38] <mpobrien> and use it as the key
[23:04:40] <mpobrien> not the value
[23:04:42] <mpobrien> like this:
[23:04:51] <mpobrien> db.collection.update( {}, {$unset: {"ResourceId" : 1} }, false, true)
[23:05:30] <Liquid-Silence> hmm where do I execute this in mongo view
[23:05:38] <mpobrien> in your mongo shell
[23:16:46] <Liquid-Silence> collection being the collection I am using
[23:16:47] <Liquid-Silence> ?
[23:16:51] <mpobrien> yes
[23:20:55] <Liquid-Silence> that did not work
[23:20:57] <Liquid-Silence> I still see it
[23:22:11] <Liquid-Silence> ResourceId is in a document
[23:22:17] <Liquid-Silence> mpobrien:
[23:23:57] <mpobrien> whats your document look like
[23:23:59] <mpobrien> can you post a sample
[23:25:53] <jordanorelli> what's the normal way to see if a field exists on an object in mongo's javascript implementation?
[23:26:08] <mpobrien> in a query?
[23:26:10] <jordanorelli> e.g., i get a document, regardless of whether it has some key, and then i want to see, for that document that i've already retrieved, if i have that key.
[23:26:11] <mpobrien> or just in the shell?
[23:26:30] <jordanorelli> in the shell. it's really an $eval block, but same thing.
[23:26:50] <mpobrien> just fetch the document by its _id (or whatever) and check to see if the value for the key is undefined.
[23:27:09] <jordanorelli> word
[23:27:58] <mpobrien> e.g.: doc = db.collection.findOne({_id:'foo'}); if( 'ResourceId' in doc ){ print("it has the field") }