PMXBOT Log file Viewer


#mongodb logs for Wednesday the 8th of October, 2014

[00:33:49] <hydrajump> has anyone used https://github.com/sunitparekh/data-anonymization to anonymise mongodb data?
[01:04:23] <amcgregor> hydrajump: So far I've just rolled my own exporters in Python to anonymize the data we share.
[01:05:01] <amcgregor> hydrajump: Mostly it's just hashing things for my dataset, though. Or generating unique, consistent random string pseudonyms to re-map session IDs and things.
[01:06:11] <amcgregor> (The latter is a special case to prevent correlation between individual exports, which simple hashing wouldn't accomplish.)
[04:29:30] <kakashi_> hi, I have a question about write_concern option in mongodb.
[04:30:45] <kakashi_> when we set it to { w: 0 }, we find that some data gets lost.
[04:32:13] <kakashi_> is there any way to check whether mongod hit an error while applying the writes?
[04:33:25] <joannac> yes, specify w:1
[04:34:53] <kakashi_> okay... but w:1 is pretty slow when we import large amounts of data
[04:38:11] <Boomtime> @kakashi: what are you using to import the data?
[04:39:25] <kakashi_> I just use python mongodb driver
[04:39:35] <Boomtime> multi-thread your import
[04:40:29] <kakashi_> got it, good idea!
[04:41:44] <Boomtime> alternatively, use the bulk API: http://api.mongodb.org/python/current/examples/bulk.html#ordered-bulk-write-operations
[04:44:01] <kakashi_> yes, we already use bulk insert
[04:44:23] <Boomtime> then multi-threading won't help much more
[04:45:00] <Boomtime> the slowdown is just the difference between the documents you can actually insert versus the documents which didn't get inserted at all when you were using w:0
[04:45:29] <Boomtime> you are almost certainly at the limit of your hard drives
[04:45:42] <kakashi_> okay... I got it
[04:45:47] <kakashi_> thanks
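
(A minimal sketch of the approach discussed above, assuming PyMongo 3.x: acknowledged writes (w:1) with batched, unordered bulk inserts. Database, collection, and document contents are hypothetical.)

    from pymongo import InsertOne, MongoClient
    from pymongo.write_concern import WriteConcern

    client = MongoClient()
    # w=1 means every batch is acknowledged, so insert errors are reported,
    # unlike w=0 where failed documents vanish silently.
    coll = client["mydb"].get_collection("events", write_concern=WriteConcern(w=1))

    batch = []
    for i in range(100000):
        batch.append(InsertOne({"seq": i}))
        if len(batch) == 1000:
            coll.bulk_write(batch, ordered=False)  # unordered: server may apply in parallel
            batch = []
    if batch:
        coll.bulk_write(batch, ordered=False)
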
[05:12:33] <culthero> Anyone have any suggestions for partitioning fulltext index searches when you don't have a fixed predicate for intersection?
[05:12:45] <culthero> Speeding them up rather, against large collections (50-100 GB)
[05:27:35] <culthero> I have 8m tweet documents, with text indexed, and searching against them for something that occurs 10k times is slow as shit, on a 4-shard setup with SSDs..
[05:29:11] <culthero> searching through the text index for something like RT
[05:29:55] <culthero> triggers between 2 and 10 minutes of 200MB/s reads from the disk.. can I somehow, maybe by timeframe, lessen the amount of index it needs to scan through?
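
(One option the channel didn't get to: since MongoDB 2.6 a text index can be compounded with an ascending prefix key; queries must then supply an equality match on the prefix, which confines the text scan to that slice of the index. A hedged PyMongo sketch with hypothetical field names:)

    from pymongo import ASCENDING, TEXT, MongoClient

    coll = MongoClient()["tweets_db"]["tweets"]
    # Partition the text index by day: every $text query must then include
    # an equality condition on "day", and only scans that day's entries.
    coll.create_index([("day", ASCENDING), ("text", TEXT)])

    cursor = coll.find({"day": "2014-10-08", "$text": {"$search": "RT"}})
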
[05:31:37] <Jester831> I'd like to use MongoDB ObjectIds for Redis bitmaps, which are 32-bit; what kinds of options are there to avoid collisions?
[05:43:58] <bin> guys i have some weird problem
[05:44:36] <bin> the _id field has index 1 (ascending)
[05:44:45] <bin> but my records are saved descending
[05:44:52] <bin> any idea why this is happening ?
[05:45:29] <joannac> what do you mean, your records are saved descending?
[05:45:30] <culthero> o.o so doing a simple db.collection.find() returns newest results without sort({_id: -1}) ?
[05:46:29] <bin> { "_id" : ObjectId("5434ce4764826092ecda9554") and after that { "_id" : ObjectId("5434ce4764826092ecda9553"),
[05:46:41] <bin> it should be { "_id" : ObjectId("5434ce4764826092ecda9553"), and after that { "_id" : ObjectId("5434ce4764826092ecda9554"),
[05:46:52] <bin> culthero: yeah
[05:47:18] <joannac> bin: single thread? pure insert (not an update)?
[05:47:55] <bin> database.get().getCollection("message_queue").insert(object)
[05:47:59] <bin> pure insert , not even save
[05:48:08] <bin> single thread ..
[05:48:12] <bin> it's a for loop
[05:48:21] <bin> in the for loop the sequence is correct
[05:48:48] <bin> but when saved in the collection first appear older documents
[05:49:05] <joannac> brand new database?
[05:49:08] <culthero> How are you viewing the collection? Through the mongo CLI or something else?
[05:49:16] <bin> yeap we updated our dev machine
[05:49:25] <bin> and it's with the latest mongodb version
[05:49:36] <bin> as far as i know (just asked the colleague)
[05:49:56] <bin> culthero: well through terminal using mongo shell
[05:51:14] <bin> ok i dropped the collection
[05:51:18] <bin> will try again ..
[05:51:41] <joannac> use a new database
[05:51:51] <bin> haha it worked ..
[05:52:04] <culthero> is it a capped collection?
[05:52:11] <bin> of course not ..
[05:52:35] <joannac> bin: only explanation i have is reused extents
[05:52:42] <bin> meaning ?
[05:52:46] <joannac> your first document had a new extent allocated
[05:53:02] <bin> well the problem remains even if i remove all documents
[05:53:03] <joannac> but your second document could use the space from previously deleted documents
[05:53:04] <bin> and insert new
[05:53:30] <bin> btw collections were restored from a dump
[05:53:42] <bin> could that be a possible problem
[05:53:48] <joannac> no?
[05:54:01] <bin> anyways ... thanks for your time guys :)
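
(The moral of the exchange: "natural" on-disk order is never guaranteed, since freed extents get reused; if order matters, sort explicitly. A sketch with hypothetical names, noting that ObjectIds embed a creation timestamp:)

    from pymongo import ASCENDING, MongoClient

    coll = MongoClient()["mydb"]["message_queue"]
    # An explicit sort on _id gives a stable, index-backed ordering,
    # regardless of where documents physically landed on disk.
    for doc in coll.find().sort("_id", ASCENDING):
        print(doc["_id"])
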
[06:01:06] <iksik> hm, what would be the simplest way to clone a collection? i don't want to migrate it to different db instance, just need to create local copy of it
[06:04:55] <iksik> ok, nvm found it
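
(One way iksik may have found: an aggregation ending in $out (MongoDB 2.6+) writes the pipeline's result into a new collection in the same database. Note that indexes are not copied. Names are hypothetical:)

    from pymongo import MongoClient

    coll = MongoClient()["mydb"]["events"]
    # Copy every document into "events_copy" within the same database.
    coll.aggregate([{"$match": {}}, {"$out": "events_copy"}])
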
[07:58:47] <acidjazz> hey every1.. any1 here particularly good w/ mongo+python?
[08:06:14] <acidjazz> so like.. how do i get full microseconds from a sec and usec
[08:06:27] <acidjazz> to pass into python datetime.fromtimestamp()
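
(A sketch of what acidjazz is after: combining separate seconds and microseconds into one datetime. Variable values are hypothetical:)

    from datetime import datetime, timedelta

    sec, usec = 1412728800, 123456
    dt = datetime.fromtimestamp(sec) + timedelta(microseconds=usec)
    # Equivalent, but subject to float rounding:
    # dt = datetime.fromtimestamp(sec + usec / 1e6)
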
[11:34:06] <PirosB3> Hi all, is it possible to do a $geoNear query on an array of coordinates?
[11:34:16] <PirosB3> I have a document structure similar to this:
[11:34:18] <Derick> no, only to a point.
[11:34:43] <Derick> but your document can have an array of points, if you use a MultiPoint geo-json type.
[11:34:46] <PirosB3> { city: xxx, stores: [{name: xxxx, coords: [x,y]}, {name: xxxx, coords: [x,y]}, {name: xxxx, coords: [x,y]}]}
[11:35:03] <Derick> PirosB3: I would change that schema to:
[11:35:18] <PirosB3> if I can $unwind them before, then it would be great. Unfortunately it looks like not possible
[11:35:39] <Derick> { city: xxx, store_name: xxxx, coords: { type: 'Point', 'coordinates' : [ x, y ] } }
[11:35:55] <Derick> geoNear needs to be the first stage in the aggregation, so you can't unwind beforehand
[11:36:14] <Derick> PirosB3: i.e., store each "store" into its own document
[11:36:25] <PirosB3> exactly
[11:36:32] <PirosB3> ok
[11:36:46] <PirosB3> but then, if I have to count unique cities
[11:36:52] <PirosB3> it’s not as easy as doing a count()
[11:36:59] <PirosB3> I do need to aggregate before counting
[11:37:06] <Derick> right, but it's not really difficult either
[11:37:06] <PirosB3> mmm
[11:37:13] <Derick> you should be able to use distinct
[11:37:13] <PirosB3> is it expensive?
[11:37:23] <Derick> collection.distinct( 'city' ) should do
[11:37:31] <PirosB3> Derick: do you have any reference to operation complexity in Mongo?
[11:37:40] <Derick> distinct would be a full table scan
[11:37:49] <Derick> but, so would a count on a collection
[11:38:09] <PirosB3> sure, but while I assume a count is stored internally
[11:38:19] <Derick> PirosB3: it's not
[11:38:32] <PirosB3> oh really?
[11:38:35] <PirosB3> interesting :)
[11:38:43] <Derick> i don't believe it is
[11:38:45] <PirosB3> so a count is O(N) ?
[11:38:48] <Derick> how many stores are we talking about?
[11:39:14] <PirosB3> we are talking about around 6000 cities (it’s abstract)
[11:39:24] <Derick> that will of course, easily fit in memory
[11:39:29] <PirosB3> and for each city, around 30-40 stores
[11:39:36] <Derick> yeah, that's nothing really.
[11:39:41] <PirosB3> 240,000
[11:39:44] <PirosB3> on average
[11:40:01] <Derick> you do have 24MB of memory I hope? :-)
[11:40:17] <PirosB3> how expensive do you expect an aggregate({$group: "$store"}).count() to be?
[11:40:26] <PirosB3> I think I should have 24MB :)
[11:40:33] <Derick> probably a bit more than a distinct, but not much
[11:40:39] <PirosB3> oh wait
[11:40:43] <PirosB3> that’s true
[11:40:47] <PirosB3> I can just do a distinct
[11:42:26] <Derick> I've just checked, a distinct on a 227k document collection, takes about a second
[11:43:06] <PirosB3> mmmm
[11:43:17] <Derick> about the same for $group
[11:43:27] <Derick> let me create an index on the field
[11:43:43] <PirosB3> I guess if you index by the key you are distincting by
[11:43:48] <PirosB3> then it would be more efficient
[11:43:57] <PirosB3> (maybe, just sayin)
[11:44:02] <Derick> yes, then the distinct is nearly instant
[11:44:03] <PirosB3> anyways thanks for your help Derick
[11:44:07] <Derick> but not the $group
[11:44:10] <PirosB3> that’s fine
[11:44:26] <Derick> (based on non-scientific experimentation)
[11:44:27] <PirosB3> you know what, that’s actually even better. It makes me avoid using $unwind all the time
[11:44:38] <Derick> yes - which would be really slow
[11:44:49] <Derick> and also, you won't be able to do geonear at all ;-)
[11:44:58] <PirosB3> exactly!
[11:45:34] <PirosB3> also Derick, do I need to specify type: "Point"
[11:45:46] <PirosB3> explicitly save it in my documents?
[11:45:57] <PirosB3> or is it okay just to save coordinates: [x,y]
[11:46:01] <Derick> yes, you should store geojson
[11:46:17] <Derick> which requires: type: "Point" and coordinates: [x,y]
[11:46:31] <PirosB3> okay, so that means location: {type: "Point", coordinates: [x,y]}
[11:46:33] <PirosB3> exactly
[11:46:34] <Derick> yes
[11:46:35] <PirosB3> wow thx
[11:46:59] <PirosB3> the more I use MongoDB the more I think it’s cool!
[11:46:59] <Derick> that, combined with a 2dsphere index and a geoNear using the same geojson format, is likely the fastest
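
(Derick's advice assembled into a PyMongo sketch, assuming PyMongo 3.x: one document per store, a GeoJSON Point, a 2dsphere index, and $geoNear as the first aggregation stage. Names and coordinates are hypothetical; GeoJSON order is [longitude, latitude]:)

    from pymongo import GEOSPHERE, MongoClient

    coll = MongoClient()["mydb"]["stores"]
    coll.insert_one({
        "city": "London",
        "store_name": "Store A",
        "location": {"type": "Point", "coordinates": [-0.1278, 51.5074]},
    })
    coll.create_index([("location", GEOSPHERE)])  # 2dsphere index

    pipeline = [
        {"$geoNear": {
            "near": {"type": "Point", "coordinates": [-0.1257, 51.5085]},
            "distanceField": "dist_meters",
            "spherical": True,
        }},
        {"$limit": 5},
    ]
    for doc in coll.aggregate(pipeline):
        print(doc["store_name"], doc["dist_meters"])

    print(coll.distinct("city"))  # unique cities; near-instant once "city" is indexed
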
[11:47:13] <PirosB3> one thing that would be cool
[11:47:46] <PirosB3> is to find a way to do map-reduce using your own programming language
[11:47:51] <PirosB3> so for example
[11:47:59] <PirosB3> I could use numpy in the map phase
[11:48:01] <Derick> yeah, but that would mean embedding your favourite language into MongoDB
[11:48:04] <PirosB3> or something like that
[11:48:16] <PirosB3> yeah :(
[11:48:18] <Derick> and as there are dozens of "favourite languages", that's not going to be easy
[11:48:35] <PirosB3> nope, in fact I mean more provide a generic interface
[11:48:39] <Derick> Map/Reduce is also not very performant - you're much better off using the Aggregation Framework
[11:48:45] <PirosB3> true
[11:49:08] <PirosB3> but map/reduce would be very useful for me, if I could use my own libraries in both phases
[11:49:15] <PirosB3> what is your general go-to in this case?
[11:49:18] <PirosB3> any suggestions?
[11:49:31] <Derick> hmm, I'd avoid M/R ;-)
[11:49:51] <PirosB3> ahahaha :) ok I will accept your suggestion :)
[11:50:12] <Derick> but if you get a lot of data, you should look at hadoop and its mongodb integration
[11:50:28] <Derick> I believe that hadoop allows you to use your favourite language (but I'm not certain on that)
[11:51:03] <PirosB3> oh okay..
[11:51:09] <PirosB3> so hadoop has a mongo integration?
[11:51:13] <PirosB3> didn't know that
[11:51:15] <PirosB3> cool
[11:51:28] <Derick> yeah, http://docs.mongodb.org/ecosystem/tutorial/getting-started-with-hadoop/
[11:51:30] <PirosB3> but I’ll stick to your suggestion and generally avoid MR till I really need to
[12:14:03] <lipiec> hi, after upgrading to the new MMS agent 2.6.0
[12:14:54] <lipiec> I see these errors in the MMS agent log
[12:14:55] <lipiec> http://pastebin.com/TYFvcxi6
[12:15:10] <lipiec> I've just upgraded, nothing more.
[12:15:36] <lipiec> What can I do about it? Because MMS cannot monitor my instances anymore.
[12:16:02] <lipiec> I don't want any kerberos...
[12:17:15] <PirosB3> hey Derick
[12:17:26] <PirosB3> is there any way I can do a distinct() query on > 1 field?
[12:17:46] <PirosB3> like collection.distinct(['city', 'store'])
[12:17:47] <PirosB3> ?
[12:19:30] <Derick> PirosB3: seems you'll have to use the aggregation framework for that
[12:19:42] <Derick> why would you store a "store" twice though?
[12:20:38] <PirosB3> because the data would be {city: xx, store: xxx, location: { type: Point2D, location: [xxx,xxxx]}}
[12:20:57] <PirosB3> but actually that turns out not to be what I want
[12:21:07] <PirosB3> if I want to say, how many stores are there in a city
[12:21:21] <PirosB3> collection.distinct('store', {city: xxx}).count()
[12:21:29] <PirosB3> does that sound correct?
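
(One catch there: distinct returns a plain array, not a cursor, so it has no .count(); in PyMongo you would take the length of the returned list. Names hypothetical:)

    from pymongo import MongoClient

    coll = MongoClient()["mydb"]["stores"]
    stores = coll.distinct("store", {"city": "London"})
    print(len(stores))  # number of distinct stores in that city
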
[12:31:53] <PirosB3> hi all, I am getting errors when using ensureIndex on my 2D field
[12:32:02] <PirosB3> this is an example of a document in my collection
[12:32:02] <PirosB3> https://gist.github.com/PirosB3/2d48d92bff017c1978a1
[12:32:20] <PirosB3> The error I get is: location object expected, location array not in correct format
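
(That error usually means some documents' location fields match neither format the index accepts: a 2dsphere index wants valid GeoJSON objects, while a legacy 2d index wants bare [x, y] pairs, and every indexed document must conform. A hedged sketch of the two forms, with hypothetical names:)

    from pymongo import GEO2D, GEOSPHERE, MongoClient

    coll = MongoClient()["mydb"]["places"]
    # 2dsphere: the indexed field must be a GeoJSON object.
    coll.insert_one({"location": {"type": "Point", "coordinates": [2.35, 48.85]}})
    coll.create_index([("location", GEOSPHERE)])
    # Legacy 2d: the field would instead be a bare pair, e.g.
    # {"location": [2.35, 48.85]}, indexed with
    # coll.create_index([("location", GEO2D)])
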
[13:45:59] <MaxSan> Hey folks. So if I am in a directory on linux such as /home/Me/Project/SomeProject
[13:46:28] <MaxSan> and run mongo from that directory in the terminal, does this only access collections from this location?
[13:46:48] <MaxSan> as in, they are not stored in some .mongo directory or something but are specific to the location I'm running the command from?
[16:28:52] <huleo> hi
[16:30:52] <huleo> get one element matching some query, but only one, with the biggest value of another field - am I aiming for the impossible?
[16:33:50] <huleo> for example, only the one latest article (with the biggest date) of each author ($in?)
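
(Not impossible: sort by date descending, then $group per author taking $first; $$ROOT (MongoDB 2.6+) keeps the whole winning document. Field names hypothetical:)

    from pymongo import MongoClient

    coll = MongoClient()["mydb"]["articles"]
    pipeline = [
        {"$match": {"author": {"$in": ["alice", "bob"]}}},
        {"$sort": {"date": -1}},
        # After the sort, $first per group is each author's latest article.
        {"$group": {"_id": "$author", "latest": {"$first": "$$ROOT"}}},
    ]
    for doc in coll.aggregate(pipeline):
        print(doc["_id"], doc["latest"]["date"])
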
[16:36:58] <dgarstang> Anyone awake?
[16:37:26] <dgarstang> I'm trying to create a user in mongo with mongo foo.js. The foo.js has a 'use dbc' in the first line but mongo doesn't like that
[16:42:45] <dgarstang> god, why is this so hard?
[16:51:02] <dgarstang> hellooo
[16:54:15] <dgarstang> Is MongoDB an abandonware product?
[16:56:38] <jngd> dgarstang, I don't think so, why do you ask?
[17:08:07] <culthero> dgarstang: If you're writing scripts for the mongo shell are you following this? http://docs.mongodb.org/manual/tutorial/write-scripts-for-the-mongo-shell/
[17:08:28] <culthero> IE: db = db.getSiblingDB('<db>')
[17:08:37] <culthero> not 'use dbc'
[17:14:31] <dgarstang> culthero: thanks. looking...
[17:15:39] <dgarstang> Btw, here's my issue https://groups.google.com/forum/#!topic/mongodb-user/ex3mwDlgcC0
[17:16:32] <dgarstang> What I don't understand is... to create a user in the dbc database, do I connect to the dbc database or the admin database?
[17:17:49] <dgarstang> Because... if I connect to the admin database (even though I say 'use') it still creates the user in the admin database. If I connect to the dbc database as the admin user, then auth fails
[17:22:10] <dgarstang> got it
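
(What dgarstang presumably landed on, in PyMongo terms: authenticate against admin, then issue createUser against the target database, so the user is created and stored there. Credentials and roles are hypothetical:)

    from pymongo import MongoClient

    client = MongoClient("mongodb://admin:secret@localhost/?authSource=admin")
    # Run createUser on dbc itself: the user ends up scoped to dbc,
    # not to the admin database the connection authenticated against.
    client["dbc"].command(
        "createUser", "appuser",
        pwd="apppass",
        roles=[{"role": "readWrite", "db": "dbc"}],
    )
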
[17:55:12] <NodeJS> Does mongodb support server side javascript functions?
[18:08:21] <cheeser> http://docs.mongodb.org/manual/core/server-side-javascript/
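
(Yes, within limits: functions stored in the special system.js collection become callable from server-side contexts such as $where and mapReduce. A hedged sketch assuming PyMongo 3.x; the shell equivalent is db.system.js.save(...):)

    from bson.code import Code
    from pymongo import MongoClient

    db = MongoClient()["mydb"]
    # Upsert a named server-side function into system.js.
    db.system.js.replace_one(
        {"_id": "echoTwice"},
        {"_id": "echoTwice", "value": Code("function (x) { return x + x; }")},
        upsert=True,
    )
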
[18:52:31] <mondan> can anyone help me with the setParameter syntax in the new YAML config? I am trying to disable the localhost exception, but whatever I put in there doesn't seem to want to work
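
(For reference, in the YAML config format setParameter takes a nested mapping; disabling the localhost exception should look like this, assuming a 2.6-style config file:)

    setParameter:
      enableLocalhostAuthBypass: false
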
[18:55:04] <dgarstang> Can I set the mongo cli password via an environment variable?
[18:58:00] <mocx> yo, i have a mongoose problem, middleware issue, not sure if this is the right place to be
[18:58:39] <mocx> here's my code: pastie.org/pastes/9632075/text?key=gddijiwhkvbgeb22v2swg
[18:59:02] <mocx> basically when a password is less than 8 chars a new error should be thrown
[18:59:30] <mocx> it's catching the length, but the error only appears as {}
[19:00:38] <mocx> console.log(err) is showing it, but it's not being passed to next() it seems
[20:00:35] <dgarstang> Can I set the mongo cli password via an environment variable?
[23:18:28] <drags> hello, I'm running 2.4.10, I've enabled authentication (using keyFile directive) and now I cannot seem to grant myself the ability to use rs.status(), getting: { "ok" : 0, "errmsg" : "unauthorized" }
[23:18:41] <drags> my "superuser" has: roles: [ "userAdminAnyDatabase", "readWriteAnyDatabase", "userAdminAnyDatabase", "dbAdminAnyDatabase", "clusterAdmin" ]
[23:19:00] <drags> I can't seem to find any other roles to apply to grant more access.. how can I restore access to the rs.status() command?
[23:24:14] <Boomtime> @drags: it should work already
[23:24:16] <Boomtime> http://docs.mongodb.org/manual/reference/built-in-roles/#clusterAdmin
[23:24:51] <Boomtime> clusterAdmin grants clusterManager, which grants replSetGetStatus (rs.status)
[23:25:17] <Boomtime> so.. what does db.auth() return when you authenticate?
[23:25:43] <drags> Boomtime: "1" (minus quotes)
[23:26:02] <Boomtime> ok, and what database are you auth'ing against?
[23:26:18] <drags> admin
[23:27:02] <drags> Boomtime: is there a method for showing the roles granted to the current user?
[23:27:42] <Boomtime> not sure, but i would just look at the user creds doc
[23:28:21] <drags> Boomtime: hrm, welp, I made a new user with the roles listed above and it seems to work
[23:28:21] <Boomtime> system.users.findOne( { user: "superuser" } )
[23:28:47] <drags> I had actually forgot the clusterAdmin role on the superuser account initially, so I removed and re-added that user.. it may be possible I wasn't use'ing admin at the time
[23:28:54] <Boomtime> i think the roles on your superuser are not what you think
[23:29:08] <Boomtime> ah, yep
[23:29:29] <Boomtime> yeah, being in the wrong database at the time is a common/easy mistake
[23:29:57] <drags> doh, yar, running that findOne command (with db. prepended) shows the "superuser" user was still missing the clusterAdmin role
[23:30:06] <drags> thanks for walking me through this Boomtime :)
[23:30:16] <Boomtime> cheers
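
(On drags's side question about listing the current user's roles: the connectionStatus command reports the authenticated users and their roles. A PyMongo sketch with hypothetical credentials:)

    from pymongo import MongoClient

    client = MongoClient("mongodb://superuser:secret@localhost/?authSource=admin")
    status = client.admin.command("connectionStatus")
    print(status["authInfo"]["authenticatedUsers"])
    print(status["authInfo"]["authenticatedUserRoles"])
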