PMXBOT Log file Viewer


#mongodb logs for Wednesday the 8th of October, 2014

[00:33:49] <hydrajump> has anyone used https://github.com/sunitparekh/data-anonymization to anonymise mongodb data?
[01:04:23] <amcgregor> hydrajump: So far I've just rolled my own exporters in Python to anonymize the data we share.
[01:05:01] <amcgregor> hydrajump: Mostly it's just hashing things for my dataset, though. Or generating unique, consistent random string pseudonyms to re-map session IDs and things.
[01:06:11] <amcgregor> (The latter is a special case to prevent correlation between individual exports, which simple hashing wouldn't accomplish.)
[04:29:30] <kakashi_> hi, I have a question about write_concern option in mongodb.
[04:30:45] <kakashi_> when we set it to { w: 0 }, we find that some data gets lost.
[04:32:13] <kakashi_> is there any way to check whether mongod hit an error while applying the writes?
[04:33:25] <joannac> yes, specify w:1
[04:34:53] <kakashi_> okay... but w:1 is pretty slow when we import large amounts of data
[04:38:11] <Boomtime> @kakashi: what are you using to import the data?
[04:39:25] <kakashi_> I just use python mongodb driver
[04:39:35] <Boomtime> multi-thread your import
[04:40:29] <kakashi_> got it, good idea!
[04:41:44] <Boomtime> alternatively, use the bulk API: http://api.mongodb.org/python/current/examples/bulk.html#ordered-bulk-write-operations
[04:44:01] <kakashi_> yes, we already use bulk insert
[04:44:23] <Boomtime> then multi-threading won't help much more
[04:45:00] <Boomtime> the slowdown is just the difference between the documents you can actually insert versus the documents which didn't get inserted at all when you were using w:0
[04:45:29] <Boomtime> you are almost certainly at the limit of your hard drives
[04:45:42] <kakashi_> okay... I got it
[04:45:47] <kakashi_> thanks
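
(A minimal sketch of the approach discussed above, assuming PyMongo 3.x: acknowledged writes (w:1) with batched, unordered bulk inserts. Database, collection, and document contents are hypothetical.)

    from pymongo import InsertOne, MongoClient
    from pymongo.write_concern import WriteConcern

    client = MongoClient()
    # w=1 means every batch is acknowledged, so insert errors are reported,
    # unlike w=0 where failed documents vanish silently.
    coll = client["mydb"].get_collection("events", write_concern=WriteConcern(w=1))

    batch = []
    for i in range(100000):
        batch.append(InsertOne({"seq": i}))
        if len(batch) == 1000:
            coll.bulk_write(batch, ordered=False)  # unordered: server may apply in parallel
            batch = []
    if batch:
        coll.bulk_write(batch, ordered=False)
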
[05:12:33] <culthero> Anyone have any suggestions for partitioning fulltext index searches when you don't have a fixed predicate for intersection?
[05:12:45] <culthero> Speeding them up rather, against large collections (50-100 GB)
[05:27:35] <culthero> I have 8m tweet documents, with text indexed, and searching against them for something that occurs 10k times is slow as shit, on a 4-shard setup with SSDs..
[05:29:11] <culthero> searching through the text index for something like RT
[05:29:55] <culthero> triggers between 2 and 10 minutes of 200MB/s reads from the disk.. can I somehow, maybe by timeframe, lessen the amount of index it needs to scan through?
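
(One option the channel didn't get to: since MongoDB 2.6 a text index can be compounded with an ascending prefix key; queries must then supply an equality match on the prefix, which confines the text scan to that slice of the index. A hedged PyMongo sketch with hypothetical field names:)

    from pymongo import ASCENDING, TEXT, MongoClient

    coll = MongoClient()["tweets_db"]["tweets"]
    # Partition the text index by day: every $text query must then include
    # an equality condition on "day", and only scans that day's entries.
    coll.create_index([("day", ASCENDING), ("text", TEXT)])

    cursor = coll.find({"day": "2014-10-08", "$text": {"$search": "RT"}})
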
[05:31:37] <Jester831> I'd like to use MongoDB ObjectIds for Redis bitmaps, which are 32-bit; what kinds of options are there to avoid collisions?
[05:43:58] <bin> guys i have some weird problem
[05:44:36] <bin> the _id field has index 1 (ascending)
[05:44:45] <bin> but my records are saved descending
[05:44:52] <bin> any idea why this is happening ?
[05:45:29] <joannac> what do you mean, your records are saved descending?
[05:45:30] <culthero> o.o so doing a simple db.collection.find() returns newest results without sort({_id: -1}) ?
[05:46:29] <bin> { "_id" : ObjectId("5434ce4764826092ecda9554") and after that { "_id" : ObjectId("5434ce4764826092ecda9553"),
[05:46:41] <bin> it should be { "_id" : ObjectId("5434ce4764826092ecda9553"), and after that { "_id" : ObjectId("5434ce4764826092ecda9554"),
[05:46:52] <bin> culthero: yeah
[05:47:18] <joannac> bin: single thread? pure insert (not an update)?
[05:47:55] <bin> database.get().getCollection("message_queue").insert(object)
[05:47:59] <bin> pure insert , not even save
[05:48:08] <bin> single thread ..
[05:48:12] <bin> it's a for loop
[05:48:21] <bin> in the for loop the sequence is correct
[05:48:48] <bin> but when saved in the collection first appear older documents
[05:49:05] <joannac> brand new database?
[05:49:08] <culthero> How are you viewing the collection? Through the mongo CLI or something else?
[05:49:16] <bin> yeap we updated our dev machine
[05:49:25] <bin> and it's with the latest mongodb version
[05:49:36] <bin> as far as i know (just asked the colleague)
[05:49:56] <bin> culthero: well through terminal using mongo shell
[05:51:14] <bin> ok i dropped the collection
[05:51:18] <bin> will try again ..
[05:51:41] <joannac> use a new database
[05:51:51] <bin> haha it worked ..
[05:52:04] <culthero> is it a capped collection?
[05:52:11] <bin> of course not ..
[05:52:35] <joannac> bin: only explanation i have is reused extents
[05:52:42] <bin> meaning ?
[05:52:46] <joannac> your first document had a new extent allocated
[05:53:02] <bin> well the problem remains even if i remove all documents
[05:53:03] <joannac> but your second document could use the space from previously deleted documents
[05:53:04] <bin> and insert new
[05:53:30] <bin> btw collections were restored from a dump
[05:53:42] <bin> could that be a possible problem
[05:53:48] <joannac> no?
[05:54:01] <bin> anyways ... thanks for your time guys :)
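
(The moral of the exchange: "natural" on-disk order is never guaranteed, since freed extents get reused; if order matters, sort explicitly. A sketch with hypothetical names, noting that ObjectIds embed a creation timestamp:)

    from pymongo import ASCENDING, MongoClient

    coll = MongoClient()["mydb"]["message_queue"]
    # An explicit sort on _id gives a stable, index-backed ordering,
    # regardless of where documents physically landed on disk.
    for doc in coll.find().sort("_id", ASCENDING):
        print(doc["_id"])
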
[06:01:06] <iksik> hm, what would be the simplest way to clone a collection? i don't want to migrate it to different db instance, just need to create local copy of it
[06:04:55] <iksik> ok, nvm found it
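
(One way iksik may have found: an aggregation ending in $out (MongoDB 2.6+) writes the pipeline's result into a new collection in the same database. Note that indexes are not copied. Names are hypothetical:)

    from pymongo import MongoClient

    coll = MongoClient()["mydb"]["events"]
    # Copy every document into "events_copy" within the same database.
    coll.aggregate([{"$match": {}}, {"$out": "events_copy"}])
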
[07:58:47] <acidjazz> hey every1.. any1 here particularly good w/ mongo+python?
[08:06:14] <acidjazz> so like.. how do i get full microseconds from a sec and usec
[08:06:27] <acidjazz> to pass into python datetime.fromtimestamp()
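
(A sketch of what acidjazz is after: combining separate seconds and microseconds into one datetime. Variable values are hypothetical:)

    from datetime import datetime, timedelta

    sec, usec = 1412728800, 123456
    dt = datetime.fromtimestamp(sec) + timedelta(microseconds=usec)
    # Equivalent, but subject to float rounding:
    # dt = datetime.fromtimestamp(sec + usec / 1e6)
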
[11:34:06] <PirosB3> Hi all, is it possible to do a $geoNear query on an array of coordinates?
[11:34:16] <PirosB3> I have a document structure similar to this:
[11:34:18] <Derick> no, only to a point.
[11:34:43] <Derick> but your document can have an array of points, if you use a MultiPoint geo-json type.
[11:34:46] <PirosB3> { city: xxx, stores: [{name: xxxx, coords: [x,y]}, {name: xxxx, coords: [x,y]}, {name: xxxx, coords: [x,y]}]}
[11:35:03] <Derick> PirosB3: I would change that schema to:
[11:35:18] <PirosB3> if I can $unwind them before, then it would be great. Unfortunately it looks like not possible
[11:35:39] <Derick> { city: xxx, store_name: xxxx, coords: { type: 'Point', 'coordinates' : [ x, y ] } }
[11:35:55] <Derick> geoNear needs to be the first stage in the aggregation, so you can't unwind beforehand
[11:36:14] <Derick> PirosB3: i.e., store each "store" into its own document
[11:36:25] <PirosB3> exactly
[11:36:32] <PirosB3> ok
[11:36:46] <PirosB3> but then, if I have to count unique cities
[11:36:52] <PirosB3> it’s not as easy as doing a count()
[11:36:59] <PirosB3> I do need to aggregate before counting
[11:37:06] <Derick> right, but it's not really difficult either
[11:37:06] <PirosB3> mmm
[11:37:13] <Derick> you should be able to use distinct
[11:37:13] <PirosB3> is it expensive?
[11:37:23] <Derick> collection.distinct( 'city' ) should do
[11:37:31] <PirosB3> Derick: do you have any reference to operation complexity in Mongo?
[11:37:40] <Derick> distinct would be a full table scan
[11:37:49] <Derick> but, so would a count on a collection
[11:38:09] <PirosB3> sure, but while I assume a count is stored internally
[11:38:19] <Derick> PirosB3: it's not
[11:38:32] <PirosB3> oh really?
[11:38:35] <PirosB3> interesting :)
[11:38:43] <Derick> i don't believe it is
[11:38:45] <PirosB3> so a count is O(N) ?
[11:38:48] <Derick> how many stores are we talking about?
[11:39:14] <PirosB3> we are talking about around 6000 cities (it’s abstract)
[11:39:24] <Derick> that will of course, easily fit in memory
[11:39:29] <PirosB3> and for each city, around 30-40 stores
[11:39:36] <Derick> yeah, that's nothing really.
[11:39:41] <PirosB3> 240,000
[11:39:44] <PirosB3> on average
[11:40:01] <Derick> you do have 24MB of memory I hope? :-)
[11:40:17] <PirosB3> how expensive do you expect an aggregate({$group: "$store"}).count() to be?
[11:40:26] <PirosB3> I think I should have 24MB :)
[11:40:33] <Derick> probably a bit more than a distinct, but not much
[11:40:39] <PirosB3> oh wait
[11:40:43] <PirosB3> that’s true
[11:40:47] <PirosB3> I can just do a distinct
[11:42:26] <Derick> I've just checked, a distinct on a 227k document collection, takes about a second
[11:43:06] <PirosB3> mmmm
[11:43:17] <Derick> about the same for $group
[11:43:27] <Derick> let me create an index on the field
[11:43:43] <PirosB3> I guess if you index by the key you are distincting by
[11:43:48] <PirosB3> then it would be more efficient
[11:43:57] <PirosB3> (maybe, just sayin)
[11:44:02] <Derick> yes, then the distinct is nearly instant
[11:44:03] <PirosB3> anyways thanks for your help Derick
[11:44:07] <Derick> but not the $group
[11:44:10] <PirosB3> that’s fine
[11:44:26] <Derick> (based on non-scientific experimentation)
[11:44:27] <PirosB3> you know what, that’s actually even better. It makes me avoid using $unwind all the time
[11:44:38] <Derick> yes - which would be really slow
[11:44:49] <Derick> and also, you won't be able to do geonear at all ;-)
[11:44:58] <PirosB3> exactly!
[11:45:34] <PirosB3> also Derick, do I need to specify type: "Point"
[11:45:46] <PirosB3> explicitly save it in my documents?
[11:45:57] <PirosB3> or is it okay just to save coordinates: [x,y]
[11:46:01] <Derick> yes, you should store geojson
[11:46:17] <Derick> which requires: type: "Point" and coordinates: [x,y]
[11:46:31] <PirosB3> okay, so that means location: {type: "Point", coordinates: [x,y]}
[11:46:33] <PirosB3> exactly
[11:46:34] <Derick> yes
[11:46:35] <PirosB3> wow thx
[11:46:59] <PirosB3> the more I use MongoDB the more I think it’s cool!
[11:46:59] <Derick> that, combined with a 2dsphere index and a geoNear using the same geojson format, is likely the fastest
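
(Derick's advice assembled into a PyMongo sketch, assuming PyMongo 3.x: one document per store, a GeoJSON Point, a 2dsphere index, and $geoNear as the first aggregation stage. Names and coordinates are hypothetical; GeoJSON order is [longitude, latitude]:)

    from pymongo import GEOSPHERE, MongoClient

    coll = MongoClient()["mydb"]["stores"]
    coll.insert_one({
        "city": "London",
        "store_name": "Store A",
        "location": {"type": "Point", "coordinates": [-0.1278, 51.5074]},
    })
    coll.create_index([("location", GEOSPHERE)])  # 2dsphere index

    pipeline = [
        {"$geoNear": {
            "near": {"type": "Point", "coordinates": [-0.1257, 51.5085]},
            "distanceField": "dist_meters",
            "spherical": True,
        }},
        {"$limit": 5},
    ]
    for doc in coll.aggregate(pipeline):
        print(doc["store_name"], doc["dist_meters"])

    print(coll.distinct("city"))  # unique cities; near-instant once "city" is indexed
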
[11:47:13] <PirosB3> one thing that would be cool
[11:47:46] <PirosB3> is to find a way to do map-reduce using your own programming language
[11:47:51] <PirosB3> so for example
[11:47:59] <PirosB3> I could use numpy in the map phase
[11:48:01] <Derick> yeah, but that would mean embedding your favourite language into MongoDB
[11:48:04] <PirosB3> or something like that
[11:48:16] <PirosB3> yeah :(
[11:48:18] <Derick> and as there are dozens of "favourite languages", that's not going to be easy
[11:48:35] <PirosB3> nope, in fact I mean more provide a generic interface
[11:48:39] <Derick> Map/Reduce is also not very performant - you're much better off using the Aggregation Framework
[11:48:45] <PirosB3> true
[11:49:08] <PirosB3> but map/reduce would be very useful for me, if I could use my own libraries in both phases
[11:49:15] <PirosB3> what is your general go-to in this case?
[11:49:18] <PirosB3> any suggestions?
[11:49:31] <Derick> hmm, I'd avoid M/R ;-)
[11:49:51] <PirosB3> ahahaha :) ok I will accept your suggestion :)
[11:50:12] <Derick> but if you get a lot of data, you should look at hadoop and its mongodb integration
[11:50:28] <Derick> I believe that hadoop allows you to use your favourite language (but I'm not certain on that)
[11:51:03] <PirosB3> oh okay..
[11:51:09] <PirosB3> so hadoop has a mongo integration?
[11:51:13] <PirosB3> didn't know that
[11:51:15] <PirosB3> cool
[11:51:28] <Derick> yeah, http://docs.mongodb.org/ecosystem/tutorial/getting-started-with-hadoop/
[11:51:30] <PirosB3> but I’ll stick to your suggestion and generally avoid MR till I really need to
[12:14:03] <lipiec> hi, after upgrading to the new MMS agent 2.6.0
[12:14:54] <lipiec> I see these errors in the MMS agent log
[12:14:55] <lipiec> http://pastebin.com/TYFvcxi6
[12:15:10] <lipiec> I've just upgraded, nothing more.
[12:15:36] <lipiec> What can I do about it? Because MMS cannot monitor my instances anymore.
[12:16:02] <lipiec> I don't want any kerberos...
[12:17:15] <PirosB3> hey Derick
[12:17:26] <PirosB3> is there any way I can do a distinct() query on > 1 field?
[12:17:46] <PirosB3> like collection.distinct(['city', 'store'])
[12:17:47] <PirosB3> ?
[12:19:30] <Derick> PirosB3: seems you'll have to use the aggregation framework for that
[12:19:42] <Derick> why would you store a "store" twice though?
[12:20:38] <PirosB3> because the data would be {city: xx, store: xxx, location: { type: Point2D, location: [xxx,xxxx]}}
[12:20:57] <PirosB3> but actually that turns out not to be what I want
[12:21:07] <PirosB3> if I want to say, how many stores are there in a city
[12:21:21] <PirosB3> collection.distinct('store', {city: xxx}).count()
[12:21:29] <PirosB3> does that sound correct?
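
(One catch there: distinct returns a plain array, not a cursor, so it has no .count(); in PyMongo you would take the length of the returned list. Names hypothetical:)

    from pymongo import MongoClient

    coll = MongoClient()["mydb"]["stores"]
    stores = coll.distinct("store", {"city": "London"})
    print(len(stores))  # number of distinct stores in that city
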
[12:31:53] <PirosB3> hi all, I am getting errors when using ensureIndex on my 2D field
[12:32:02] <PirosB3> this is an example of a document in my collection
[12:32:02] <PirosB3> https://gist.github.com/PirosB3/2d48d92bff017c1978a1
[12:32:20] <PirosB3> The error I get is: location object expected, location array not in correct format
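
(That error usually means some documents' location fields match neither format the index accepts: a 2dsphere index wants valid GeoJSON objects, while a legacy 2d index wants bare [x, y] pairs, and every indexed document must conform. A hedged sketch of the two forms, with hypothetical names:)

    from pymongo import GEO2D, GEOSPHERE, MongoClient

    coll = MongoClient()["mydb"]["places"]
    # 2dsphere: the indexed field must be a GeoJSON object.
    coll.insert_one({"location": {"type": "Point", "coordinates": [2.35, 48.85]}})
    coll.create_index([("location", GEOSPHERE)])
    # Legacy 2d: the field would instead be a bare pair, e.g.
    # {"location": [2.35, 48.85]}, indexed with
    # coll.create_index([("location", GEO2D)])
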
[13:45:59] <MaxSan> Hey folks. So if I am in a directory on linux such as /home/Me/Project/SomeProject
[13:46:28] <MaxSan> and run mongo from that directory in the terminal, does this only access collections from this location?
[13:46:48] <MaxSan> as in, they are not stored in some .mongo directory or something but are specific to the location I'm running the command from?
[16:28:52] <huleo> hi
[16:30:52] <huleo> get one element matching some query, but only one, with the biggest value of another field - am I aiming for the impossible?
[16:33:50] <huleo> for example, only the one latest article (with the biggest date) of each author ($in?)
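
(Not impossible: sort by date descending, then $group per author taking $first; $$ROOT (MongoDB 2.6+) keeps the whole winning document. Field names hypothetical:)

    from pymongo import MongoClient

    coll = MongoClient()["mydb"]["articles"]
    pipeline = [
        {"$match": {"author": {"$in": ["alice", "bob"]}}},
        {"$sort": {"date": -1}},
        # After the sort, $first per group is each author's latest article.
        {"$group": {"_id": "$author", "latest": {"$first": "$$ROOT"}}},
    ]
    for doc in coll.aggregate(pipeline):
        print(doc["_id"], doc["latest"]["date"])
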
[16:36:58] <dgarstang> Anyone awake?
[16:37:26] <dgarstang> I'm trying to create a user in mongo with mongo foo.js. The foo.js has a 'use dbc' in the first line but mongo doesn't like that
[16:42:45] <dgarstang> god, why is this so hard?
[16:51:02] <dgarstang> hellooo
[16:54:15] <dgarstang> Is MongoDB an abandonware product?
[16:56:38] <jngd> dgarstang, I don't think so, why do you ask?
[17:08:07] <culthero> dgarstang: If you're writing scripts for the mongo shell are you following this? http://docs.mongodb.org/manual/tutorial/write-scripts-for-the-mongo-shell/
[17:08:28] <culthero> IE: db = db.getSiblingDB('<db>')
[17:08:37] <culthero> not 'use dbc'
[17:14:31] <dgarstang> culthero: thanks. looking...
[17:15:39] <dgarstang> Btw, here's my issue https://groups.google.com/forum/#!topic/mongodb-user/ex3mwDlgcC0
[17:16:32] <dgarstang> What I don't understand is... to create a user in the dbc database, do I connect to the dbc database or the admin database?
[17:17:49] <dgarstang> Because... if I connect to the admin database (even though I say 'use') it still creates the user in the admin database. If I connect to the dbc database as the admin user, then auth fails
[17:22:10] <dgarstang> got it
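
(What dgarstang presumably landed on, in PyMongo terms: authenticate against admin, then issue createUser against the target database, so the user is created and stored there. Credentials and roles are hypothetical:)

    from pymongo import MongoClient

    client = MongoClient("mongodb://admin:secret@localhost/?authSource=admin")
    # Run createUser on dbc itself: the user ends up scoped to dbc,
    # not to the admin database the connection authenticated against.
    client["dbc"].command(
        "createUser", "appuser",
        pwd="apppass",
        roles=[{"role": "readWrite", "db": "dbc"}],
    )
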
[17:55:12] <NodeJS> Does mongodb support server side javascript functions?
[18:08:21] <cheeser> http://docs.mongodb.org/manual/core/server-side-javascript/
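
(Yes, within limits: functions stored in the special system.js collection become callable from server-side contexts such as $where and mapReduce. A hedged sketch assuming PyMongo 3.x; the shell equivalent is db.system.js.save(...):)

    from bson.code import Code
    from pymongo import MongoClient

    db = MongoClient()["mydb"]
    # Upsert a named server-side function into system.js.
    db.system.js.replace_one(
        {"_id": "echoTwice"},
        {"_id": "echoTwice", "value": Code("function (x) { return x + x; }")},
        upsert=True,
    )
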
[18:52:31] <mondan> can anyone help me with the setParameter syntax in the new YAML config? I am trying to disable the localhost exception, but whatever I put in there doesn't seem to want to work
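
(For reference, in the YAML config format setParameter takes a nested mapping; disabling the localhost exception should look like this, assuming a 2.6-style config file:)

    setParameter:
      enableLocalhostAuthBypass: false
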
[18:55:04] <dgarstang> Can I set the mongo cli password via an environment variable?
[18:58:00] <mocx> yo, i have a mongoose problem, middleware issue, not sure if this is the right place to be
[18:58:39] <mocx> here's my code: pastie.org/pastes/9632075/text?key=gddijiwhkvbgeb22v2swg
[18:59:02] <mocx> basically when a password is less than 8 chars a new error should be thrown
[18:59:30] <mocx> it's catching the length, but the error only appears as {}
[19:00:38] <mocx> console.log(err) is showing it, but it's not being passed to next() it seems
[20:00:35] <dgarstang> Can I set the mongo cli password via an environment variable?
[23:18:28] <drags> hello, I'm running 2.4.10, I've enabled authentication (using keyFile directive) and now I cannot seem to grant myself the ability to use rs.status(), getting: { "ok" : 0, "errmsg" : "unauthorized" }
[23:18:41] <drags> my "superuser" has: roles: [ "userAdminAnyDatabase", "readWriteAnyDatabase", "userAdminAnyDatabase", "dbAdminAnyDatabase", "clusterAdmin" ]
[23:19:00] <drags> I can't seem to find any other roles to apply to grant more access.. how can I restore access to the rs.status() command?
[23:24:14] <Boomtime> @drags: it should work already
[23:24:16] <Boomtime> http://docs.mongodb.org/manual/reference/built-in-roles/#clusterAdmin
[23:24:51] <Boomtime> clusterAdmin grants clusterManager, which grants replSetGetStatus (rs.status)
[23:25:17] <Boomtime> so.. what does db.auth() return when you authenticate?
[23:25:43] <drags> Boomtime: "1" (minus quotes)
[23:26:02] <Boomtime> ok, and what database are you auth'ing against?
[23:26:18] <drags> admin
[23:27:02] <drags> Boomtime: is there a method for showing the roles granted to the current user?
[23:27:42] <Boomtime> not sure, but i would just look at the user creds doc
[23:28:21] <drags> Boomtime: hrm, welp, I made a new user with the roles listed above and it seems to work
[23:28:21] <Boomtime> system.users.findOne( { user: "superuser" } )
[23:28:47] <drags> I had actually forgot the clusterAdmin role on the superuser account initially, so I removed and re-added that user.. it may be possible I wasn't use'ing admin at the time
[23:28:54] <Boomtime> i think the roles on your superuser are not what you think
[23:29:08] <Boomtime> ah, yep
[23:29:29] <Boomtime> yeah, being in the wrong database at the time is a common/easy mistake
[23:29:57] <drags> doh, yar, running that findOne command (with db. prepended) shows the "superuser" user was still missing the clusterAdmin role
[23:30:06] <drags> thanks for walking me through this Boomtime :)
[23:30:16] <Boomtime> cheers
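
(On drags's side question about listing the current user's roles: the connectionStatus command reports the authenticated users and their roles. A PyMongo sketch with hypothetical credentials:)

    from pymongo import MongoClient

    client = MongoClient("mongodb://superuser:secret@localhost/?authSource=admin")
    status = client.admin.command("connectionStatus")
    print(status["authInfo"]["authenticatedUsers"])
    print(status["authInfo"]["authenticatedUserRoles"])
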