#mongodb logs for Friday the 3rd of August, 2012

[00:06:09] <Godslastering> do mapreduce jobs run, in any way, concurrently? i.e. - will running mapreduce take advantage of multiple available cores?
[00:55:30] <dstorrs> I have a sharded DB. there are no backups or replicas. I tried to create an index, it ate the entire CPU and locked the DB. I ^C'd the operation (which I've been able to do before). That didn't work. I tried to killOp the operation.
[00:55:59] <dstorrs> then I tried to service mongoX stop my servers. one shard went down easily, the other took a while then said FAILED. it now has a lock file hanging around.
[00:56:13] <dstorrs> I'm reluctant to start up again until I know where I am.
[00:56:27] <dstorrs> Log files are available. Can anyone help?
[01:01:25] <dstorrs> help?
[01:01:28] <dstorrs> anyone?
[01:03:43] <nofxx> anyone got a replset can issue `mongotop` so I can sleep in peace today that my db is ok? local.oplog.rs 1945ms 1945ms 0ms
[01:03:55] <nofxx> knowing that my db*
[01:04:16] <nofxx> dstorrs, only one way to know it hehe... do you have real data there?
[01:04:34] <nofxx> dstorrs, make some cp -r copies if so, trial and error can't hurt
[01:04:53] <dstorrs> nofxx: yes. We have a boatload of data.
[01:05:12] <dstorrs> And the disk on shard 1 is just over 50% full, so I can't really copy it.
[01:05:30] <dstorrs> we were due to get another shard in within the next couple of weeks.
[01:05:36] <nofxx> dstorrs, tar czvf + scp to you
[01:06:51] <dstorrs> I was hoping for something more immediate. We're talking ~500G of data per box.
[01:07:02] <dstorrs> It would take 6 hours to back up.
[01:07:30] <dstorrs> I need to get some data out of the system for a customer ASAP. That's why I killed this index.
[01:10:29] <nofxx> dstorrs, wow... that's a good load. Grab all the info you have in a pastie, post it here and on the mailing list
[01:13:10] <dstorrs> coming
[01:14:27] <dstorrs> http://pastie.org/4381057
[01:14:33] <dstorrs> nofxx: ^^
[01:14:53] <dstorrs> last 500 lines of the log on shard 2, the one with slow shutdown. shard 1 went down obediently
[01:36:59] <ukd1> does mongodb ensure ordering of arrays in document? I.e. if {$set: {x: [1,2,3,4,5]}} will always return x in the same order?
[02:20:48] <crudson> ukd1: Why would anything change the ordering of array values?
[07:36:04] <[AD]Turbo> hola
[10:09:14] <NodeX> gotta love this google PR update
[10:09:18] <NodeX> makes no sense at all
[11:14:09] <akaIDIOT> I'm trying to express something that won't work in a single query
[11:14:45] <akaIDIOT> so I need another roundtrip, but for the 'inner' query i only need to retrieve something to restrict the result set for the second part
[11:15:26] <akaIDIOT> can i tell a query (from Java) to give me just the objids and tell the second query to restrict itself to the objids
[11:15:49] <akaIDIOT> the latter could just be a $nin: <objids>
[11:16:13] <akaIDIOT> or should i just iterate it and cache all the objids manually
[11:17:22] <NodeX> you can bring back the documents you want to parse and use $in
[11:17:56] <NodeX> since _id's are always indexed it would make it a fast(er) round trip
[11:23:22] <akaIDIOT> NodeX: thanks :)
[11:23:28] <akaIDIOT> i guess i'll do that
[11:27:54] <NodeX> ;)
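
A rough sketch of the pattern NodeX describes, in mongo shell syntax; the collection names "pages" and "visits" and the field names are made up for illustration. The point is fetching only _id values on the first round trip, then feeding them to $in (or $nin) on the second.

    // Round trip 1: project only the _id values that define the restriction.
    var ids = [];
    db.pages.find({ status: "published" }, { _id: 1 }).forEach(function (doc) {
        ids.push(doc._id);
    });
    // Round trip 2: restrict the second query to those ids (_id is always
    // indexed), or exclude them with $nin instead.
    db.visits.find({ page_id: { $in: ids } });
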
[11:31:42] <akaIDIOT> the $in operator does get very very slow if you feed it a long list of things, right? :P
[11:31:59] <akaIDIOT> which would possibly put me in a corner, but hey :P
[11:32:24] <akaIDIOT> could define my data model better, something for iteration 2 :P
[11:33:55] <nemothekid> akaIDIOT: yes it does!
[11:34:51] <nemothekid> we've found coll.remove({$in => [1,2,3…,50]}) is much slower than for i in 50 coll.remove(i)
[11:35:13] <akaIDIOT> my data is a tree
[11:35:42] <akaIDIOT> i've currently just saved all the ids of parents, children, ancestors and descendants as list properties
[11:35:51] <akaIDIOT> to just make it easy for now
[11:36:04] <akaIDIOT> should convert that to something like (pre, post, depth)
[11:36:24] <akaIDIOT> though that makes reading and loading the data quite a bit more complex
[11:37:17] <akaIDIOT> also that will only allow me to efficiently get the descendants of a single node
[11:37:22] <akaIDIOT> not a whole bunch of them
[12:16:25] <solars> hey, does anyone know if there is something like mongoid in ruby - for java?
[12:16:36] <solars> not just a driver, but an orm like thing?
[12:22:57] <kali> solars: there is experimental support in Hibernate, and DAO helpers in Spring
[12:23:07] <kali> solars: and probably a dozen others :)
[12:23:25] <kali> solars: but these are the two i've heard of
[12:25:52] <akaIDIOT> that query language rocks
[12:26:16] <akaIDIOT> managed to translate an SQL generator to a DBObject generator in Java in just a few hours
[12:26:30] <akaIDIOT> and it didnt even crash the first friggin time i ran it
[12:26:56] <akaIDIOT> ♥ to the team that designed a Java binding without ending up with >>>> generics anywhere :P
[12:27:32] <solars> kali, thanks!
[13:39:21] <souza> Hello guys!
[13:44:28] <souza> I'm trying to insert a big BSON document into my mongodb, but I can't get it to work :( - It runs normally, but nothing is inserted into the database. This is my code: http://pastebin.com/azpxbgbS and if i try to print the BSON that will be inserted into MongoDB, I get this stack trace > http://pastebin.com/tB6uTnJU
[13:44:44] <MANCHUCK> how do i create a user that wil be able to access all databases ?
[13:47:27] <souza> MANCHUCK: i recommend you read this > http://www.mongodb.org/display/DOCS/Security+and+Authentication =)
[13:48:19] <MANCHUCK> i followed that, however i still need to run the command on each database. i basically want to create a new root user
[13:49:28] <NodeX> you need to login to the admin database first
[13:50:39] <MANCHUCK> ahh that's what i was missing
[13:50:41] <MANCHUCK> ok thanks
[13:56:27] <souza> does anyone know something that could help me!? =)
[15:08:33] <simon___> Hiya!
[15:08:48] <simon___> Debian 6.0 vs Ubuntu 12.04 for a mongodb server? What are your thoughts?
[15:09:41] <simon___> I've been using 10.04 for a long time without any big issues. But thinking about how many more problems I've had with ubuntu on the desktop than with Debian, I'm thinking about moving over to Debian.
[15:10:20] <simon___> The reason I switched on the desktop is because of out-of-date desktop packages. On a server I really could not care less about the version of XFCE etc. :)
[15:10:54] <simon___> I.e. using Debian has always been a more stable experience for me when it comes to the desktop; is there any real difference on servers?
[15:11:02] <simon___> (haven't used Debian on servers in years)
[15:12:06] <vsmatck> Ubuntu is debian unstable with a few changes. So I just weigh it as debian stable vs debian unstable (and some people like to run testing, unstable, or mixed).
[15:13:29] <vsmatck> With mongodb the important consideration is mainly the filesystem.
[15:14:16] <simon___> Ah ok. Thanks.
[15:14:18] <simon___> Using XFS.
[15:16:47] <TubaraoSardinha> Any idea on why mongo won't save dates on local time using ruby driver?
[15:19:38] <kali> TubaraoSardinha: mongodb dates are stored as offset from 1970-01-01T00:00:00 utc, without timezone information
[15:21:34] <TubaraoSardinha> kali: I was expecting this, but the thing is that I need to store consolidated user actions per day and my fear is that some user actions might end up on the wrong day due to timezone differences.
[15:22:57] <TubaraoSardinha> I imagined something like actions: {03082012: 5, 04082012: 3} with the key representing the day
[15:23:09] <TubaraoSardinha> Maybe my logic is wrong
[15:32:08] <kali> TubaraoSardinha: i would not use mongodb dates for that, because they are actually timestamps
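
A minimal sketch of what kali is hinting at: if the buckets are calendar days in the application's own timezone, store the day as a plain key and increment a counter, rather than relying on BSON dates (which are UTC timestamps). The collection and field names here are hypothetical.

    // Count one action for a given user on a given local day, creating the
    // document on first use (upsert = true).
    db.daily_actions.update(
        { user_id: 42 },
        { $inc: { "actions.2012-08-03": 1 } },
        true
    );
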
[16:13:57] <jordanorelli> is there a tool out there that will let me specify a hash/dict/object with collection names for keys and query documents for values, that will dump all relevant documents?
[16:14:19] <jordanorelli> that is, for doing something like getting all the data for a particular week, to extract a portion of my data to make a testing database.
[16:23:11] <Godslastering> i'm attempting to do something like this ( http://paste.pound-python.org/show/ckzGbjCZHg3qOy5ixFEI/ ) in python using pymongo. this is running pretty slow to iterate over so many rows, and i'm wondering if i can do some custom advanced query in mongodb to speed it up. Basically, i want to peer into multiple collections with one query
[17:06:31] <nofxx> anyone got a replset who can run `mongotop` on the master? I'm not sure this is normal: local.oplog.rs 1094ms 1094ms 0ms
[17:30:55] <Godslastering> can anyone possibly help explain to me how the heck this is happening? http://paste.pound-python.org/show/uvXZxFrwxkdMvoFauLHk/
[18:00:30] <Godslastering> this is uh .... extremely annoying, does anyone have any clue how in the world this could happen? http://paste.pound-python.org/show/uvXZxFrwxkdMvoFauLHk/
[18:01:15] <chubz> Is there a way to start a replica set with all its members all at once?
[18:03:58] <mpobrien> @godslastering what does the resulting doc have in it?
[18:05:00] <Godslastering> mpobrien: {u'name':'bob',u'hostname':'bob.com'} it doesn't have u'ip' .. i'm just wondering how in the world mongodb is giving me the document if it doesn't have u'ip'. my query is wrong somehow, i'm guessing
[18:05:41] <mpobrien> well
[18:05:54] <mpobrien> i think the issue is
[18:06:10] <mpobrien> $nin : [None] basically means, find documents where ip isn't null
[18:06:19] <mpobrien> null is not necessarily the same thing as "nonexistent"
[18:06:56] <Godslastering> mpobrien: here is what i want: "every document where 'ip' is (1) not null and (2) not equal to the string '__unresolved__'"
[18:07:50] <mpobrien> ok
[18:08:02] <mpobrien> then in that case you may want to try adding an $exists
[18:08:19] <mpobrien> to the query
[18:08:24] <Godslastering> mpobrien: so, {'$exists':'ip'}?
[18:09:04] <mpobrien> {ip: {"$exists":True, "$nin": [ … ]} }
[18:09:22] <mpobrien> or simpler yet, just filter them out on the application side. if not 'ip' in entry: continue
[18:09:39] <Godslastering> mpobrien: that's what i was doing, and it was ... err. slow. i'll try that line you just said
[18:09:47] <mpobrien> ok
[18:10:23] <mpobrien> it shouldn't really be too bad unless there are a LOT of those docs that are missing the ip field; is that the case?
[18:10:40] <Godslastering> mpobrien: yes. about 3 million are missing the ip field, and about 12 million have the ip field
[18:11:46] <mpobrien> ok, that does make sense then
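
The same idea in mongo shell syntax, combining $exists with $nin so documents missing the field are excluded along with nulls and the placeholder value ("mycoll" is just the example collection name used later in this log):

    db.mycoll.find({ ip: { $exists: true, $nin: [null, "__unresolved__"] } });
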
[18:16:15] <Godslastering> also, if i have about 10 pre-allocated files, and i delete a 2gb chunk of data, can i tell mongodb to re-arrange the data and remove empty pre-allocated files?
[18:32:48] <kevBook> Hey guys
[18:33:27] <kevBook> my document has { startDate: , endDate }
[18:33:54] <kevBook> i want to search for all documents whose date range includes today
[18:34:02] <kevBook> any good ways to do that?
[18:34:13] <fabiobatalha> is there a way to check the BSON size?
[18:34:28] <fabiobatalha> of a document.
[18:35:07] <souza> fabiobatalha: what's the language?
[18:35:14] <fabiobatalha> pymongo
[18:35:18] <fabiobatalha> python
[18:35:42] <souza> fabiobatalha: good luck > http://api.mongodb.org/python/current/api/index.html
[18:44:35] <jgornick> Is there a way for me to monitor all queries on a database?
[18:44:49] <mpobrien> you can set profiling level to 2
[18:45:01] <mpobrien> and in the system.profile collection, all queries will be recorded.
[18:45:07] <jgornick> awesome!
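
A short sketch of the profiling suggestion: level 2 records every operation into the system.profile collection of the current database, which can then be queried like any other collection.

    db.setProfilingLevel(2);          // 2 = profile all operations
    // ... run the queries you want to observe, then look at the newest entries:
    db.system.profile.find().sort({ ts: -1 }).limit(10).pretty();
    db.setProfilingLevel(0);          // turn it off again; profiling adds overhead
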
[18:49:03] <Godslastering> mpobrien: http://paste.pound-python.org/show/uvXZxFrwxkdMvoFauLHk/ still having the same issue. i've tried the same query another place in my application and i'm getting the same error; this seems like a proper query, but it's acting wrongly
[18:53:36] <chubz> how come when i type rs.status().members.stateStr i get no output in mongo?
[18:53:38] <chubz> :(
[18:55:16] <mpobrien> @godslastering can you update that pastebin with your latest? it looks like the same thing to me
[18:55:24] <mpobrien> @chubz whats in rs.status()
[18:56:31] <Godslastering> mpobrien: http://paste.pound-python.org/show/unisqRfpCLy6ihqCbQpx/
[18:57:35] <chubz> mpobrien: rs.status() shows the members of my replica set and their name, health, state, stateStr, uptime, optime etc
[18:58:16] <mpobrien> yeah, but i mean, whats the raw output of the function? are you sure you are getting the right key?
[18:58:34] <mpobrien> IIRC members is an array, you probably need something like rs.status().members[0].stateStr
[18:59:13] <jgornick> Is it possible to clear out db.system.profile?
[18:59:25] <jgornick> Also, is it possible to remove a collection from the db entirely?
[18:59:51] <jgornick> Ahh, just got it: db.setProfilingLevel(0); db.system.profile.drop();
[19:00:44] <chubz> mpobrien: oh goodie thanks!
[19:01:46] <chubz> mpobrien: is there a way to display more than one member's state? like instead of rs.status().members[0].state something like rs.status().members[0..3].state
[19:02:07] <mpobrien> you can just write a function in javascript to do it
[19:02:22] <chubz> okay
[19:02:23] <chubz> thanks
[19:02:30] <chubz> was just wondering if there was a function for it
[19:02:49] <mpobrien> var x = rs.status(); for(var i=0;i<x.members.length;i++){ print(i + ": " + x.members[i].stateStr) }
[19:02:53] <mpobrien> something like that
[19:04:11] <chubz> thanks, ill try that out
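
Spelled out a little more fully (rs.status().members is an array, so iterating it prints every member's state):

    rs.status().members.forEach(function (m) {
        print(m.name + ": " + m.stateStr);
    });
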
[19:08:50] <mpobrien> @godslastering what happens if you run that query from the shell, do those docs come up
[19:09:40] <Godslastering> mpobrien: actually what i pasted is wrong. i dont want those docs. i'm lost, i've gotta come back to this problem later or tomorrow.
[19:09:55] <mpobrien> ok
[19:12:14] <slavik_> is it possible to recursively search for a key=value pair and return the top level doc if it is found somewhere?
[19:16:20] <wereHamster> slavik_: do it manually, using where
[19:23:05] <Godslastering> mpobrien: i am having an issue though: viewing my mongod output log, i'm getting like 80% of my queries over 150ms ... it's running on a quad core 3.2GHz i7 ... these are pretty simple queries, and i'm querying on indexed data. is this normal?
[19:24:03] <mpobrien> hm
[19:24:24] <mpobrien> you should double check what the query plan is with .explain() just to be sure
[19:24:47] <mpobrien> also look in mongostat to see if theres page faults
[19:25:15] <Godslastering> mpobrien: ok i'm running mongostat. what do i do now? what am i looking for?
[19:25:25] <mpobrien> do you have a "faults" field in there?
[19:26:18] <Godslastering> mpobrien: no. i don't believe so.
[19:26:46] <mpobrien> FYI cpu is rarely the bottleneck for queries, it's usually RAM or I/O
[19:27:22] <Godslastering> mpobrien: mongod is using 1gb ram, and i've got about 6gb of ram free on the system. how would i know if the bottleneck here was I/O?
[19:28:11] <mpobrien> try running iostat and monitor that
[19:28:38] <mpobrien> also in the log, what does it say when it reports a slow query? should have something like nscanned etc.
[19:28:59] <Godslastering> mpobrien: nscanned:2359481
[19:29:05] <Godslastering> on the last one i just saw fly by me
[19:29:27] <mpobrien> hmm
[19:29:33] <mpobrien> thats a lot
[19:29:57] <Godslastering> mpobrien: should it be lower? doesn't that mean that is how many entries are in that collection?
[19:30:18] <mpobrien> that means the number of entries that the DB needs to examine to find matches
[19:30:39] <Godslastering> mpobrien: hm... again, is that bad though? i did ensure an index on the fields i'm querying on
[19:30:52] <mpobrien> a high number like that would probably mean index isn't being used at all, or its not an effective one
[19:31:21] <Godslastering> mpobrien: appears not to be a disk bottleneck either. MB/s is only at about 0.25
[19:31:45] <Godslastering> lets see, it may be an index issue then.
[19:32:17] <mpobrien> try your queries with .explain() and check the output carefully - also using hte profiler might help
[19:34:09] <Godslastering> ok so i did a .explain(), mpobrien, and it's giving me this http://paste.pound-python.org/show/24581/
[19:34:17] <mpobrien> yeah
[19:34:21] <mpobrien> BasicCursor means no index was used
[19:34:26] <mpobrien> table scan
[19:34:49] <Godslastering> mpobrien: hm, ok. that explains it. but if i do db.mycollection.ensureIndex({ip:1,hostname:1,name:1}), why wouldn't this work?
[19:35:03] <mpobrien> whats the query you're running?
[19:35:34] <Godslastering> mpobrien: db.mycoll.find({ip:'182.168.0.1'}) in the one i pasted
[19:36:26] <mpobrien> ok
[19:37:43] <mpobrien> so
[19:37:49] <mpobrien> whats in db.mycoll.getIndexes()
[19:38:21] <Godslastering> [{_id:1},{name:1,ip:1,hostname:1}]
[19:38:47] <mpobrien> hm, that index ordering doesn't match your ensureIndex ordering
[19:39:07] <chubz> how do i check my mongodb api version #?
[19:39:19] <mpobrien> @chubz which api?
[19:39:46] <chubz> mongodb api?
[19:39:59] <mpobrien> api for what though? a specific driver, etc
[19:40:09] <chubz> java
[19:40:33] <mpobrien> should be in the filename of the jar file in your project
[19:41:02] <chubz> oh durrr. sorry brain fart
[19:41:20] <Godslastering> mpobrien: alright, db.mycoll.getIndexKeys() is telling me exactly what i'd expect, but it's still using a basiccursor in .explain()
[19:42:26] <Godslastering> mpobrien: and attempting to force an index with db.mycoll.find({ip:'192.168.0.1'}).hint({ip:1}).explain() is telling me bad hint
[19:43:00] <mpobrien> what does getIndexKeys() say?
[19:43:35] <Godslastering> mpobrien: that's what i pasted earlier, the [{_id:1}] stuff
[19:44:10] <mpobrien> thats still incorrect though - the ordering matters
[19:44:25] <mpobrien> if you have an index on (name, ip, hostname)
[19:44:39] <mpobrien> you can query on (name) or (name, ip), or (name, ip, hostname) and it will use the index
[19:44:45] <Godslastering> mpobrien: ok i was confused about how indices work
[19:44:57] <Godslastering> mpobrien: i just did db.mycoll.ensureIndex({ip:1}) and now it's using a BTreeCursor
[19:45:03] <mpobrien> but you can't query on just ip, because its not the first field
[19:45:06] <mpobrien> ok, perfect
[19:45:08] <mpobrien> thats what you need
[19:45:13] <mpobrien> that should be much faster
[19:45:20] <Godslastering> mpobrien: before i was doing db.mycoll.ensureIndex({ip:1,hostname:1,name:1}) thinking it would create all of those
[19:45:25] <mpobrien> ohh
[19:45:31] <mpobrien> nah, you need to create each one individually
[19:45:36] <Godslastering> mpobrien: thanks, i see now :)
[19:45:46] <mpobrien> or if you are querying on multiple fields together, you can use that syntax to create a compound index
[19:45:54] <mpobrien> but doesnt seem like that applies to you here
[19:46:04] <mpobrien> {ip : 1} should be fine
[19:47:44] <Godslastering> mpobrien: wow, since i did that i haven't seen a single slow query show up yet
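
A small sketch of the prefix rule mpobrien is describing: with only the compound index, a query on ip alone falls back to a table scan, and a separate single-field index is what makes it fast.

    db.mycoll.ensureIndex({ name: 1, ip: 1, hostname: 1 });
    db.mycoll.find({ name: "bob" }).explain();                     // uses the compound index
    db.mycoll.find({ name: "bob", ip: "192.168.0.1" }).explain();  // uses the compound index
    db.mycoll.find({ ip: "192.168.0.1" }).explain();               // BasicCursor: ip is not a prefix
    db.mycoll.ensureIndex({ ip: 1 });                              // now the ip-only query is indexed
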
[19:53:51] <Godslastering> mpobrien: ok, since i was confused before, let me be sure i know what i'm doing here: i only have to do ensureIndex once, even if i insert a lot of data after this? will it rebuild the index?
[19:54:50] <pas256> Hello
[19:59:04] <kali> Godslastering: yes. indexes are maintained when you write (not on the first read, à la couchdb)
[19:59:49] <Godslastering> kali: alright, thanks. i was pretty confused with the documentation that i read
[20:01:25] <kali> Godslastering: that said, if you have a big chunk of data to load, it might be a good idea to insert first and then set up the index
[20:02:48] <Godslastering> kali: i already have the bulk of my data inserted, from now on it will be just like 10-20000 entries an hour, compared to the initial chunk. should i check to be sure the indices exist at the beginning of my script before i run to make sure i dont get bad performance?
[20:04:12] <kali> Godslastering: you don't have to do anything. if the index is there, it won't go away :)
[20:04:26] <Godslastering> kali: alright, good. thanks.
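
Roughly what kali means, assuming a collection named mycoll: load the bulk data first, build the index once, and let later writes maintain it. Calling ensureIndex again at script startup is harmless, since it is a no-op when the index already exists.

    // ... bulk inserts happen here ...
    db.mycoll.ensureIndex({ ip: 1 });   // build once, after the initial load
    // Subsequent inserts keep the index up to date automatically:
    db.mycoll.insert({ name: "bob", hostname: "bob.com", ip: "192.168.0.1" });
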
[20:05:01] <Godslastering> kali: real quickly, how can i get all documents where the key 'ip' is either null or doesn't exist?
[20:05:27] <Godslastering> i was doing db.mycoll.find({ip:none}) but someone said that may not work?
[20:05:39] <Godslastering> (null, not none)
[20:06:17] <kali> i think ip:null actually works for both null and a missing value
[20:06:27] <kali> unless you define the index as sparse
[20:06:30] <Godslastering> ah, alright, good then.
[20:06:59] <Godslastering> and db.mycoll.find({ip:{$nin:[null,'__unresolved__']}}) was giving me some documents in which 'ip' wasn't even defined though
[20:08:34] <kali> Godslastering: http://uu.zoy.org/v/hifuno#lang=objectivec&style=solarized_light
[20:09:11] <kali> is it what you expect ?
[20:09:54] <Godslastering> kali: that's fine, yeah, but db.mycoll.find({ip:{$nin:[null,'__unresolved__']}}) is still giving me documents where ip might be null
[20:11:46] <kali> mmmm
[20:13:05] <Godslastering> kali: this is what's happening http://paste.pound-python.org/show/24582/
[20:13:38] <kali> Godslastering: http://uu.zoy.org/v/bizeri#lang=smalltalk&style=solarized_light this is working, but not nice
[20:14:09] <Godslastering> kali: oh that's nearly illegible! but it's working?
[20:17:23] <kali> Godslastering: i think you should consider a slight alteration of your schema :)
[20:17:31] <Godslastering> kali: how's that?
[20:17:53] <kali> have "unresolved: true" instead of the magic value
[20:17:54] <Godslastering> kali: here's my situation: i want to find entries where ip is not null and it doesn't equal '__unresolved__'
[20:18:05] <Godslastering> kali: oh silly me, that would make a lot of sense
[20:18:21] <kali> you're not silly, you're thinking SQL
[20:18:42] <Godslastering> kali: somehow, yeah, even though i hate SQL, i'm still thinking that way
[20:19:14] <kali> resolved: true might be even better
[20:19:55] <Godslastering> kali: will {unresolved:{$not:true}} give me where unresolved:false and where unresolved doesn't even exist?
[20:21:18] <kali> db.foo.find({ unresolved: { $nin: [true] }})
[20:21:38] <Godslastering> kali: thanks
[20:22:01] <kali> or db.foo.find({ unresolved: { $in: [false, null] }})
[20:22:14] <Godslastering> kali: i like $nin:[true] :p
[20:22:46] <kali> yeah, this is why i prefer resolved: true, but i think unresolved:true might be more efficient in terms of space
[20:22:57] <kali> i expect you have more resolved ips than unresolved
[20:23:08] <Godslastering> kali: psh, i'd be lucky if that was the case >_>
[20:23:43] <kali> if you're dealing with a huge collection, consider picking short names, too
[20:24:08] <kali> a bit frustrating, but it can make a big difference
[20:24:13] <Godslastering> kali: can i rename later, easily, when i'm out of the 'development' phase?
[20:24:32] <kali> at the price of a data reload, basically
[20:24:52] <Godslastering> kali: hmm. i'm not using _long_ names per se, but not short either. i'll have to change them soon, though, i guess
[20:25:16] <kali> the sooner the better. you want to do this before going through testing and QA
[20:25:27] <kali> or whatever equivalent you have :)
[20:25:32] <kali> if you have some.
[20:25:47] <Godslastering> kali: luckily this is a personal project, but because it _is_ personal, lots of disk usage can get annoying
[20:35:15] <Godslastering> kali: if mongodb allocated 20 files when i had a lot of data, and then i remove like 4gb, can i tell it to get rid of unnecessary pre-allocated files?
[20:37:52] <mpobrien> if you do a repairDatabase, it will re-write the datafiles with the minimum space needed
[20:38:11] <Godslastering> mpobrien: so, from the mongo client, db.mycoll.repairDatabase() ?
[20:38:34] <mpobrien> just db.repairDatabase()
[20:38:42] <Godslastering> mpobrien: ah, thanks. i'll give it a whirl
[20:38:45] <mpobrien> it will need to have extra disk space available to run
[20:39:00] <mpobrien> and it might take some time depending on how much data is in there
[20:39:21] <mpobrien> but once it's finished, it should reclaim a good amount of space
[20:39:33] <Godslastering> mpobrien: ok, running it now.
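
A sketch of checking the effect of repairDatabase: db.stats() reports the on-disk file size, so comparing it before and after shows how much space was reclaimed (the repair needs spare disk space and blocks the database while it runs).

    db.stats().fileSize;    // on-disk size before
    db.repairDatabase();    // rewrites the data files with the minimum space needed
    db.stats().fileSize;    // size after it finishes
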
[20:59:46] <Godslastering> if an insert takes a while, it's most likely disk I/O causing that, right?
[21:00:11] <mpobrien> could be allocating new datafile
[21:00:39] <Godslastering> mpobrien: yeah, but that's still disk i/o, right?
[21:00:43] <mpobrien> yeah
[21:00:54] <Godslastering> alright, so no easy optimization there
[21:01:03] <mpobrien> yeah, should be very rare though.
[21:01:15] <Godslastering> mpobrien: mhm i'm recreating my data set, so that makes sense
[21:01:20] <mpobrien> yup
[21:02:34] <pas256> is there a reason why MongoDB with a read only workload doing geospatial queries would not use all cores on a system?
[21:04:29] <kali> Godslastering: file allocation is fast if you use xfs or ext4
[21:04:54] <kali> Godslastering: and painfully slow with ext2 and ext3
[21:04:59] <Godslastering> kali: i'm using macosx, so ... i can't remember off the top of my head what i'm using
[21:05:13] <kali> hdfs
[21:05:36] <kali> or is it hfs ?
[21:06:47] <Godslastering> Mac OS X Extended (Journaled), kali .. heh
[21:07:50] <kali> this is hfs+
[21:14:13] <kali> Godslastering: as far as i can tell, there is no support for fallocate() on osx
[21:14:25] <kali> Godslastering: so fast file preallocation does not work
[21:18:47] <ojii> what's the best practice to get a random entry from a mongodb collection?
[21:19:33] <mpobrien> if the collection isn't too huge, do a count() to get the size of the collection
[21:19:52] <mpobrien> then .find({}).skip(<random number>).limit(1)
[21:20:03] <ojii> mpobrien, what's "too huge"?
[21:20:36] <mpobrien> well, depends on what acceptable performance is for you… the potential issue is that .skip() gets expensive if you have a lot of docs
[21:20:49] <mpobrien> so it works better on smaller collections
[21:21:12] <mpobrien> an alternative is, populate a random number into each doc when you insert it
[21:21:18] <mpobrien> create an index on it
[21:21:42] <mpobrien> then do .find( { rand_field : {$gt : <some random value>}}).limit(1)
[21:23:03] <ojii> mpobrien, doesn't sound too random though
[21:23:42] <mpobrien> it should be fine
[21:24:37] <mpobrien> you're basically just assigning a random value to each doc, and then choosing one based on another random value at runtime
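
The two approaches mpobrien sketches, in shell syntax; the collection name "things" and the field name "rand_field" are placeholders.

    // 1) skip() a random offset: simple, but skip gets expensive on large collections.
    var n = db.things.count();
    var doc = db.things.find().skip(Math.floor(Math.random() * n)).limit(1).next();

    // 2) store a random value on insert, index it, and pick the first document
    //    past a random threshold at read time (retry with $lt if nothing matches).
    db.things.insert({ payload: "...", rand_field: Math.random() });
    db.things.ensureIndex({ rand_field: 1 });
    var doc2 = db.things.findOne({ rand_field: { $gt: Math.random() } });
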
[21:27:13] <Godslastering> kali: and because it's mac os x, itunes uses more ram than mongodb! >.>
[22:42:28] <manveru> is there any way to have remove() return the removed document(s)?
[22:50:34] <mpobrien> manveru: you can use findAndModify for that
[22:50:51] <mpobrien> it has a "remove" option which will return a document and remove it in one operation
[22:50:58] <mpobrien> only works on a single doc at a time though.
[22:51:24] <manveru> awesome, that's good enough
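
A minimal sketch of the findAndModify form mpobrien mentions (the collection and filter are illustrative): with remove: true it returns the matched document and deletes it in one operation, one document per call.

    var removed = db.things.findAndModify({
        query:  { status: "done" },
        remove: true
    });
    printjson(removed);   // the document that was just removed, or null if none matched
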
[22:53:42] <nofxx> My server is almost idle, got some spare ram for mongo, 90% of queries are < 100ms, but every once in a while I get some finds by id going 300-700ms ...
[22:53:57] <nofxx> and this local.oplog.rs 1973ms 1973ms 0ms on mongotop.... is this normal? first trip, sailor...
[22:54:25] <nofxx> on the replset master*
[22:55:25] <pas256> is there a reason why MongoDB with a read only workload doing geospatial queries would not use all cores on a system?
[22:55:41] <nofxx> that, and mongoid always puts an $orderby, even in find by ids...: { $query: { _id: ObjectId('501c46c343d8a44c26000163') }, $orderby: { _id: 1 } } ... wondering if it could hurt performance in some way
[22:56:05] <nofxx> pas256, iirc you need sharding to use multiple cores
[22:57:56] <nofxx> manveru, using mongoid? 3 has findAndModify working
[22:58:07] <pas256> nofxx: On a smaller box, it used all cores
[22:58:14] <manveru> nofxx: mgo
[22:58:17] <pas256> now that we have a bigger box, it is not using all of them
[22:58:27] <pas256> weird
[22:58:45] <Liquid-Silence> hi all
[22:59:05] <Liquid-Silence> is there a way to remove an element from all documents in a collection?
[22:59:28] <jordanorelli> manveru: in mgo it's Find(…).Apply(). it might be fixed by now, but there's a bug on findAndModify if your session mode is set to eventual
[23:00:41] <Liquid-Silence> jordanorelli: any idea?
[23:01:16] <mpobrien> Liquid-Silence: use the $unset operator
[23:01:54] <Liquid-Silence> not too sure what you mean mate
[23:02:55] <mpobrien> db.collection.update( {}, {$unset: {fieldname : 1} }, false, true)
[23:03:10] <mpobrien> will remove a field from all docs in the collection.
[23:04:11] <Liquid-Silence> ok my fieldname is ResourceId
[23:04:18] <Liquid-Silence> can I do db.collection.update( {}, {$unset: {fieldname : ResourceId} }, false, true)
[23:04:19] <mpobrien> ok so
[23:04:23] <mpobrien> yeah
[23:04:27] <mpobrien> you will need to put ResourceId in quotes.
[23:04:38] <mpobrien> and use it as the key
[23:04:40] <mpobrien> not the value
[23:04:42] <mpobrien> like this:
[23:04:51] <mpobrien> db.collection.update( {}, {$unset: {"ResourceId" : 1} }, false, true)
[23:05:30] <Liquid-Silence> hmm where do I execute this in mongo view
[23:05:38] <mpobrien> in your mongo shell
[23:16:46] <Liquid-Silence> collection being the collection I am using
[23:16:47] <Liquid-Silence> ?
[23:16:51] <mpobrien> yes
[23:20:55] <Liquid-Silence> that did not work
[23:20:57] <Liquid-Silence> I still see it
[23:22:11] <Liquid-Silence> ResourceId is in a document
[23:22:17] <Liquid-Silence> mpobrien:
[23:23:57] <mpobrien> whats your document look like
[23:23:59] <mpobrien> can you post a sample
[23:25:53] <jordanorelli> what's the normal way to see if a field exists on an object in mongo's javascript implementation?
[23:26:08] <mpobrien> in a query?
[23:26:10] <jordanorelli> e.g., i get a document, regardless of whether it has some key, and then i want to see, for that document that i've already retrieved, if i have that key.
[23:26:11] <mpobrien> or just in the shell?
[23:26:30] <jordanorelli> in the shell. it's really an $eval block, but same thing.
[23:26:50] <mpobrien> just fetch the document by its _id (or whatever) and check to see if the value for the key is undefined.
[23:27:09] <jordanorelli> word
[23:27:58] <mpobrien> e.g.: doc = db.collection.findOne({_id:'foo'}); if( 'ResourceId' in doc ){ print("it has the field") }