PMXBOT Log file Viewer


#mongodb logs for Thursday the 4th of December, 2014

[02:06:41] <appledash> Uhh
[02:06:51] <appledash> So I'm inserting stuff into mongodb
[02:06:57] <appledash> A lot of stuff
[02:07:08] <appledash> It was at 98k rows inserted
[02:07:11] <appledash> I went to bed for 8 hours
[02:07:16] <appledash> now it is at 834k
[02:07:20] <appledash> why is it so slow?
[02:07:40] <joannac> indexes?
[02:07:54] <appledash> Are you suggesting I create indexes or are you asking if I have any?
[02:08:54] <joannac> do you have any?
[02:09:04] <appledash> Yes
[02:09:07] <appledash> I have one
[02:09:52] <joannac> are you maxing out i/o?
[02:09:55] <joannac> where's the bottleneck?
[02:10:49] <appledash> CPU seems fine, network isn't in the equation because this is a local server, and as for I/O... I'm not sure
[02:11:24] <appledash> dstat says I'm writing about 4000k/sec. Would that max out a standard desktop HDD? (I think the k is actually capital K, as in kilobytes, not kilobits.)
[03:00:16] <appledash> joannac: ?
[03:20:48] <d4rklit3> hi
[03:21:00] <d4rklit3> db.tweets.find().sort({"created_at":-1}).limit(5000) am i using limit wrong?
[03:31:29] <Boomtime> d4rklit3: as a shell command that looks fine
[03:37:46] <d4rklit3> it was robomongo
[03:37:48] <d4rklit3> hard limiting
[03:41:35] <Boomtime> righto
[03:47:28] <CipherChen> what's the internal command for db.getReplicationInfo
[03:49:58] <CipherChen> i need to get the info by pymongo
[03:50:16] <Noxywoxy> Quick Q: Is anyone able to recommend a consultancy service? I was looking here and I'm not really sure how to gauge quality. Hoping a personal recommendation would help.
[03:50:19] <Boomtime> that summary is made locally from several queries
[03:50:34] <Boomtime> type "db.getReplicationInfo" at a shell prompt; it will show you the JavaScript
[03:52:22] <CipherChen> thks, i just forgot the js source code
[03:52:42] <Boomtime> Noxywoxy: https://www.mongodb.com/products/consulting
[04:05:21] <joannac> appledash: I can't answer that for your HDDs, you need to test
[04:05:47] <appledash> Hmm
[04:20:35] <wizzardx6> quit
[04:20:36] <wizzardx6> exit
[07:17:19] <n^izzo> hey all,
[07:18:07] <n^izzo> I need to count the number of documents that reference another document
[07:18:21] <n^izzo> this is what I have at the moment db.histories.aggregate([{ $match : { archived : false } }, { $group:{ _id: "$_zone", amt: { $sum: 1 } } }])
[07:18:58] <n^izzo> I want to group by the value of _zone and count that
[07:22:07] <joannac> n^izzo: okay. and that doesn't work?
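(For reference, the grouping n^izzo describes can be sketched in plain JavaScript over an in-memory array; this only illustrates what the $match and $group stages compute, using the field names from the question, not a server-side run:)

```javascript
// Documents shaped like the histories collection in the question.
const histories = [
  { _zone: "east", archived: false },
  { _zone: "east", archived: false },
  { _zone: "west", archived: false },
  { _zone: "east", archived: true }, // dropped by the $match stage
];

// Equivalent of:
//   db.histories.aggregate([
//     { $match: { archived: false } },
//     { $group: { _id: "$_zone", amt: { $sum: 1 } } }
//   ])
const amtByZone = {};
for (const doc of histories) {
  if (doc.archived !== false) continue;                   // $match
  amtByZone[doc._zone] = (amtByZone[doc._zone] || 0) + 1; // $group with $sum: 1
}

console.log(amtByZone); // { east: 2, west: 1 }
```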
[10:11:32] <CipherChen> hi all, does your mongodb cluster size exceed 12 nodes? does a good solution exist for this?
[10:33:19] <Gargoyle> CipherChen: What needs solving?
[10:35:24] <CipherChen> when my cluster grows large enough that i need more than 12 slaves for querying
[10:37:16] <CipherChen> when i need a lot nodes for reading, for example 100 nodes
[10:38:17] <Gargoyle> CipherChen: Use sharding?
[10:41:20] <CipherChen> we cannot for now, because we deploy multiple data centers and the data in each center is totally the same
[11:19:56] <CipherChen> I'm running 2.4; if I want to upgrade to 2.8, when it says "To upgrade an existing MongoDB deployment to 2.8, you must be running 2.6.", does it mean that the WHOLE rs has to be on 2.6 first, or just the node being upgraded?
[12:08:45] <joannac> CipherChen: why are you upgrading to 2.8? testing?
[12:20:22] <markotitel> What needs to be done in order for mongod to log queries to a file? I have enabled "profile = 2" and "slowms = 30" in the config file.
[12:20:30] <markotitel> But still nothing :(
[12:36:17] <OrenDB> Hi there - "query not recording (too large)"
[12:36:22] <OrenDB> How can I get past this
[12:36:33] <OrenDB> and print the query?
[12:37:30] <OrenDB> I know the query is large but i still want to print it...
[13:00:11] <OrenDB> Someone?
[13:08:18] <joannac> markotitel: you should see it in the mongod log as well as in the profile collection
[13:13:38] <markotitel> joannac, it seems there is still no slow log; the DB is not used at all for now, so possibly no slow queries. Should I see read queries also?
[13:19:57] <joannac> markotitel: yes, for queries > 30ms
[13:20:58] <joannac> but all queries, even those <30ms, will go into the system.profile collection
[13:24:43] <markotitel> joannac, so mongo will not write all queries into file?
[13:24:59] <markotitel> just slow ones
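(For the record, markotitel's settings can also be applied at runtime from the shell. At level 2 every operation is recorded in the system.profile collection, but only operations slower than slowms are also written to the mongod log; a sketch:)

```javascript
db.setProfilingLevel(2, 30)  // profile all operations; log those slower than 30 ms
db.getProfilingStatus()      // should report { "was" : 2, "slowms" : 30 }
db.system.profile.find().sort({ ts: -1 }).limit(5)  // most recently profiled ops
```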
[15:23:47] <amk> hi all
[15:51:51] <Lingo__> hi. I have a collection with documents such as { id: 1, version: 1 }, { id: 1, version:2 }, { id: 2, version: 1 }. is it possible to construct a query that gets me the document with greatest version for each id?
[15:53:08] <Lingo__> so my query would return [{ id: 1, version: 2}, { id: 1, version: 1 }]...
[15:55:20] <amk> Can someone give me a hint where to look for causes of multiple lingering connections (e.g. 810 open connections on a low low low traffic app db)? Is the event of a connection not being closed always caused by the client?
[15:56:49] <amk> Lingo__, look at http://docs.mongodb.org/manual/reference/operator/query/gt/
[15:57:31] <Lingo__> amk: I need the max value, regardless of its actual value.
[15:57:59] <Lingo__> version could just as easily be 1234234, and will be different for every id
[15:58:28] <Lingo__> I basically want: select * from collection where id = X having version = max(version)
[15:58:36] <Lingo__> or something along those lines
[16:03:39] <amk> Lingo__, in other words 'find documents with distinct ID, limit to 1, sort by VERSION' ?
[16:03:48] <Lingo__> yes
[16:04:45] <Lingo__> amk: yes. 99% sure it's not possible with find(), thinking it's not possible with aggregate either, but not sure.
[16:09:57] <amk> Lingo__, how about this http://stackoverflow.com/questions/14697332/mongodb-query-for-all-documents-with-unique-field
[16:13:20] <drorh> Hi. having a unique hashed index is supported?
[16:20:46] <Lingo___> amk: trying that approach gives me "exception: the group aggregate field 'id' must be defined as an expression inside an object" and I have an id property defined.
[16:22:13] <amk> Lingo___, this is a bit over my head now; but sounds like you're throwing it an _id where you should be throwing an {something}
[16:22:24] <amk> I'm afraid I can't help you out more
[16:22:38] <Lingo___> thx
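(For the record, this is possible with aggregate: $sort by version descending, then $group on "$id" taking $first. The error Lingo___ hit suggests the group stage listed id as a bare field instead of _id: "$id" with accumulator expressions. Below, the same logic sketched in plain JavaScript over the sample documents, not against a server:)

```javascript
// Shell pipeline this mimics (a sketch, untested here):
//   db.things.aggregate([
//     { $sort: { version: -1 } },
//     { $group: { _id: "$id", version: { $first: "$version" } } }
//   ])
const docs = [
  { id: 1, version: 1 },
  { id: 1, version: 2 },
  { id: 2, version: 1 },
];

// Keep the document with the greatest version for each id.
const newest = {};
for (const d of docs) {
  if (!(d.id in newest) || d.version > newest[d.id].version) {
    newest[d.id] = d;
  }
}

console.log(Object.values(newest)); // [ { id: 1, version: 2 }, { id: 2, version: 1 } ]
```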
[17:25:58] <OrNix> hi. tell me please, what is the best way to make a replica set with 2 nodes? On which node should the arbiter be started: the master or the slave?
[17:35:25] <GothAlice> OrNix: Your question is nonsensical. An arbiter is its own thing, separate from a primary or secondary or any other data-storing process.
[17:36:03] <OrNix> GothAlice, i understand it. But is it possible to make rs with 2 nodes?
[17:36:09] <GothAlice> OrNix: http://docs.mongodb.org/manual/tutorial/deploy-replica-set/ — Give this a good read, in addition to http://docs.mongodb.org/manual/core/replication-introduction/
[17:36:41] <GothAlice> OrNix: Yes, it certainly is possible. It's not high-availability unless you have three or more nodes, though, since if one of the two goes offline the other has no way of knowing if its buddy went offline or it did.
[17:36:54] <GothAlice> (Thus; in a "two data storage replicas" setup, add an arbiter.)
[17:38:06] <GothAlice> All the arbiter does is voice an opinion in primary elections.
[17:40:00] <OrNix> i understand the arbiter's role. So if i need a 2-node rs, is it better to disable failover and select the PRIMARY by hand?
[17:40:37] <GothAlice> OrNix: The difficulty is that in that arrangement, if the SECONDARY goes down, the primary will demote itself to a *read-only* secondary.
[17:40:55] <GothAlice> I.e. even if the non-critical part of the replica set goes away, the critical part fails.
[17:41:11] <GothAlice> So, no. It's never better to try to "manually" manage that.
[17:42:50] <GothAlice> (Using priority levels, i.e. a priority zero secondary, you can ensure a certain host will remain primary at all costs, or will remain primary if no other option is available. You do get control over how the cluster arranges itself, but it's best to use these provided tools to do that, rather than any form of manual intervention.)
[17:44:32] <GothAlice> A la: http://docs.mongodb.org/manual/tutorial/configure-secondary-only-replica-set-member/
[17:44:59] <OrNix> thanks
[17:45:33] <GothAlice> (Still throw an arbiter in there to ensure a failure in the secondary doesn't bring down the primary, though.)
[18:07:58] <flyingkiwi> we added a new shard to our rs. when are the moved chunks deleted (from harddrive and freed) on the old shard?
[18:08:24] <flyingkiwi> currently its still moving the chunks around
[19:21:08] <RaceCondition> does mongo support queries in the form ((A or B) and (D or E))?
[19:21:19] <GothAlice> Yes, using combinations of $or and $and.
[19:21:26] <RaceCondition> ok
[19:24:33] <GothAlice> RaceCondition: db.example.find({$and: [{$or: [{A conditions…}, {B conditions…}]}, {$or: [{D conditions}, {E conditions}]}]}) — not wrapping in $and would mean attempting to have two values for the same key ($or) at the top level. Ref: http://docs.mongodb.org/manual/reference/operator/query/and/ http://docs.mongodb.org/manual/reference/operator/query/or/
[19:25:06] <RaceCondition> GothAlice: yes, already got it
[19:37:49] <aliasc> Hi, all
[19:37:52] <aliasc> :)
[19:40:43] <Zelest> evening
[19:45:19] <aliasc> so how well does mongodb work with c++?
[19:45:37] <aliasc> i mean i tried the driver seems decent
[19:46:04] <GothAlice> It's an officially supported client language, and most of the server is written in it. ;)
[19:46:24] <aliasc> :D nice to hear
[19:46:42] <GothAlice> So much boost…
[19:48:33] <aliasc> actually we are from the gamedev community
[19:48:54] <aliasc> we are making a multiplayer game and mongodb seems to fit our case
[19:49:00] <aliasc> if its capable enough
[19:49:08] <aliasc> just to store user data
[19:50:30] <aliasc> i wonder how secure it is
[19:50:34] <cheeser> there's at least 1 or 2 game companies i know of that use mongo for that.
[19:50:43] <GothAlice> aliasc: As a note, since MongoDB documents are inherently dynamically typed, use within a statically typed language comes at a cost in terms of the amount of boilerplate code you will need to write. (This is especially true in Java which rather amplifies the effect.)
[19:50:45] <GothAlice> Having some form of RESTful service written in a dynamic scripting language which your game speaks to (service-oriented-architecture) may speed development, reduce bugs, and increase agility. :)
[19:51:05] <GothAlice> (And also allow for independent scaling of the services.)
[19:51:17] <aliasc> this is the reason we want mongodb
[19:51:19] <aliasc> scalability
[19:51:25] <cheeser> a (dubious) short term gain with a long term cost :D
[19:51:52] <aliasc> wait what. i dont care if 2 companies use mongodb for games.
[19:51:58] <aliasc> if it works. then it works
[19:52:10] <GothAlice> aliasc: Optimization without measurement is by definition premature. What're your expected pain points (things that may hit performance walls and require independent scaling)?
[19:53:07] <GothAlice> aliasc: I've used MongoDB with several gaming clients, mostly Facebook games, though. https://gist.github.com/amcgregor/4207375 is my section of a presentation on the distributed RPC system I built for one of them. (Link to the full Python+MongoDB+Facebook gaming presentation in the comments.)
[19:54:08] <aliasc> we still don't know if there will be impacts on performance.
[19:54:32] <aliasc> but if mongodb is fast and clients link directly then there should be no problems.
[19:54:43] <aliasc> of course we have server client software to manage the process
[19:54:47] <harttho> We're seeing load spike on our mongo shards (all three replica sets) every 15 minutes
[19:54:51] <GothAlice> aliasc: Wait; you'd have gaming clients connecting to mongod directly?
[19:54:57] <harttho> Are there any cron sort of things that would result in that
[19:54:58] <aliasc> nope.
[19:55:03] <aliasc> client/server.
[19:55:15] <aliasc> a server software that deals with mongodb
[19:55:21] <GothAlice> harttho: Do you have any TTL indexes?
[19:55:22] <aliasc> and the client. the actual game
[19:56:07] <aliasc> while testing currently the server shows coordinates of another player in 2D space
[19:56:18] <aliasc> sends them to the client.
[19:56:52] <aliasc> i dont plan to use mongodb for coordinates and stuff just to store user data
[19:57:04] <aliasc> but for curiosity we tried that and it seems to work well
[19:58:08] <aliasc> however if we plan to support co-op we need to store object data from the game in the database. for example if we want users to save their co-op progress together
[19:58:26] <GothAlice> aliasc: May I PM you?
[19:58:30] <aliasc> of course
[20:33:55] <tim_t> can anyone advise the best approach to implementing a mail system in mongo? right now i have a collection holding mail documents with a to: field for single users… a giant inbox as it were. i get the feeling this is not the best use of mongo but i cannot put a fine point on why
[20:35:44] <tim_t> well, when a user gets their email i simply query using the user as a filter. each document has a single user reference on it
[20:38:40] <shoerain> any thoughts on extensions like https://github.com/gabrielelana/mongodb-shell-extensions? it adds jsonpath, lodash, and moment to the mongorc.js
[20:38:56] <tim_t> no, its more like… say you have a letterbox at a house and a bunch of people are living there. each person has a letter addressed only to them, but it is still held in the one letterbox. my collection is the letterbox, my documents are the letters and just like when you bring the letters in you look at who it is addressed to and give it to them.
[20:39:40] <GothAlice> tim_t: Since e-mail data is typically read-only (excluding flags on the message; those are entirely different) you could attempt to replicate the top-level MIME encoding of the message (headers vs. body, each header as a field, etc.) and index the ones you care to search on. What you have is referred to as a multi-tenant collection.
[20:40:16] <GothAlice> Generally this isn't an issue, esp. if the data is indexed. With a substantial enough user base, the cardinality (uniqueness) of any given user's ID in the pool of all messages will allow for highly efficient querying. (The index will be used to great effect.)
[20:40:16] <tim_t> is it the best use of mongo though?
[20:40:22] <GothAlice> tim_t: Define "best".
[20:40:51] <tim_t> lack of disadvantages
[20:41:23] <GothAlice> tim_t: What's the disadvantage of the current arrangement?
[20:41:36] <GothAlice> (All things are a trade-off.)
[20:42:14] <tim_t> nothing that i can tell. i am just curious if others had advice on a potentially better solution for issues i am not yet seeing
[20:43:43] <tim_t> like create an inbox upon account creation so each user gets their own
[20:44:10] <GothAlice> Maildir is better in that it's simpler, relies on existing atomic filesystem operations, and can exploit existing filesystem compression and encryption techniques without needing to even be aware of them. It's harder to network and scale. Using a collection-per-user will run into namespace count limits in MongoDB, but isolates each user's data.
[20:44:11] <tim_t> i guess having wide-open choices has me second-guessing the wisdom of the choices i am making?
[20:44:22] <GothAlice> In terms of simplifying querying, what you've got is best.
[20:45:14] <GothAlice> Index on [user_id, mailbox_path, -received] — you can now answer "what messages are in mailbox X for user Y, ordered with newest on top" very efficiently.
[20:46:09] <GothAlice> (Where mailbox_path is something like "INBOX", "Sent", or "Projects.Marrow", etc.)
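(In shell syntax, the index and the query it serves might look like the following; the field and collection names are placeholders following GothAlice's description, not a tested schema:)

```javascript
db.messages.ensureIndex({ user_id: 1, mailbox_path: 1, received: -1 })

// "What messages are in mailbox X for user Y, newest on top?"
db.messages.find({ user_id: userY, mailbox_path: "INBOX" })
           .sort({ received: -1 })
```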
[20:46:17] <tim_t> okay thanks. isolation is not really an issue in my case so it looks like this is a good approach
[20:46:45] <GothAlice> (I'd recommend following IMAP conventions on mailbox naming for interoperability and consistency reasons.)
[20:48:18] <tim_t> great. i'll implement as advised. thanks!
[21:08:35] <harttho> :GothAlice Yes, we do have TTL indexes
[21:09:40] <GothAlice> That could very well be your issue. Compare the timestamps triggering cleanup vs. the periods of excessive load; you may have to begin staggering your TTL runs. (I.e. shift the dates around a bit. TTL indexes are minute-accurate.)
[21:09:45] <GothAlice> harttho: ^
[21:11:08] <hicker> Hi Everyone :-) I'm getting "exception: Cannot apply $addToSet modifier to non-array" when executing Practice.findByIdAndUpdate(req.params.PracticeId, { $addToSet: { users: req.params.UserId } } ... but users is an array, so I'm not sure why it's throwing an exception.
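(hicker's error typically means the matched document already stores users as a non-array value, e.g. a string or null left by an earlier write, regardless of what the application schema declares. A plain-JavaScript sketch of $addToSet's semantics, assuming that diagnosis:)

```javascript
// Mimics $addToSet: create the array if the field is missing, append only
// if the value is absent, and fail (as mongod does) when the existing
// value is not an array.
function addToSet(doc, field, value) {
  if (!(field in doc)) {
    doc[field] = [value];
  } else if (Array.isArray(doc[field])) {
    if (!doc[field].includes(value)) doc[field].push(value);
  } else {
    throw new Error("Cannot apply $addToSet modifier to non-array");
  }
  return doc;
}

addToSet({ users: ["a"] }, "users", "b");  // → { users: ["a", "b"] }
addToSet({}, "users", "a");                // → { users: ["a"] }
// addToSet({ users: "a" }, "users", "b"); // throws, like hicker's update
```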
[21:11:57] <harttho> :GothAlice Will look into it, though the number of documents is small
[21:36:02] <harttho> What causes messages like these to pop up?
[21:36:10] <harttho> command collectionname.$cmd command: { dbstats: 1 } ntoreturn:1 keyUpdates:0 locks(micros) r:926075 reslen:255 926ms
[21:36:23] <GothAlice> harttho: That'd be a slow query.
[21:37:04] <harttho> Any way to find out what is causing the query?
[21:37:24] <GothAlice> See: http://docs.mongodb.org/manual/tutorial/manage-the-database-profiler/
[21:37:36] <GothAlice> It actually tells you.
[21:37:45] <GothAlice> $cmd running dbstats took an insane amount of time.
[21:38:28] <GothAlice> Looks to have been stuck in a read lock for nearly a second.
[21:40:06] <GothAlice> harttho: The system.profile collection will contain additional information; it includes client info, authenticated user, and the raw query, but you may need to enable profiling first.
[21:41:00] <harttho> Yeah, I'm confused on why the dbstats would take so long
[21:41:11] <GothAlice> They were stuck waiting for another operation to finish. Possibly indexing.
[21:41:24] <GothAlice> (Indexing blocks other administrative $cmds from operating against the collection being indexed.)
[21:42:05] <GothAlice> You'd need to go spelunking through your mongod logs to find out what it was, though.
[21:44:40] <harttho> Any way to see current admin operations only (so as to not have to sift through the rest)?
[21:45:19] <GothAlice> The profiler data is a collection like any other; formulate a query and run it. ;) (You can filter the logged queries down to only the ones you care about.)
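(A sketch of such a filter against the profiler collection; op, millis, and ts are real system.profile fields, while the 100 ms threshold is a placeholder:)

```javascript
// Only profiled command-type operations slower than 100 ms, newest first.
db.system.profile.find({ op: "command", millis: { $gt: 100 } })
                 .sort({ ts: -1 })
                 .limit(20)
```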
[21:46:22] <harttho> Okay, thanks :)
[21:47:10] <harttho> Also seeing command admin.$cmd command: { writebacklisten: ObjectId('5473af0118c6a1efae65b96b') } ntoreturn:1 keyUpdates:0 reslen:44 300000ms
[21:47:37] <GothAlice> Yeeeeeeeeek. That's a full five minute query.
[21:47:55] <harttho> gotta see what's causing it
[21:47:59] <GothAlice> What type of cluster are you running? And what type of write concern are you setting on your queries?
[21:48:35] <harttho> 2 shards with 3 replica sets
[21:48:51] <harttho> 3 config servers
[21:50:05] <GothAlice> So, {[shard:(replica, replica, replica), shard:(replica, replica, replica)], (config, config, config)} or {[replica:(shard, shard), replica:(shard, shard), replica:(shard, shard)], (config, config, config)}
[21:50:20] <harttho> the former
[21:50:50] <GothAlice> And the write concern you use by default? (And peak?)
[21:51:20] <harttho> yeah default write concern I believe
[21:54:43] <GothAlice> Are you using capped collections and tailable cursors? (a la the "awaitData" option?)
[21:54:52] <harttho> I don't believe so
[21:55:37] <GothAlice> Hmm. Well, long-running writebacklisten queries are OK, they're just idle.
[21:55:55] <GothAlice> (writebacklisten is an internal command; not one generated in your own code)
[21:56:01] <harttho> Also seeing (once the load starts)
[21:56:02] <harttho> getmore local.oplog.rs query: { ts: { $gte: Timestamp 1415929198000|76 } } cursorid:1662888153033004078 ntoreturn:0 keyUpdates:0 numYields: 1 locks(micros) r:226 nreturned:2 reslen:924 2167ms
[21:56:19] <harttho> and command admin.$cmd command: { listDatabases: 1 } ntoreturn:1 keyUpdates:0 locks(micros) R:799 r:34188 reslen:28407 9040ms
[21:56:51] <GothAlice> You must be missing an index on ts. Nothing in normal operation should really query listDatabases… and that read lock is nuts. There is certainly something up with your current configuration.
[21:57:07] <GothAlice> SSDs or platter disks on those hosts?
[21:57:41] <harttho> platter
[21:57:54] <GothAlice> What's your IO load like?
[21:59:02] <GothAlice> joannac: MMS can handle 2x3 sharded replica sets, right?
[21:59:29] <GothAlice> (In terms of provisioning, that is.)
[22:00:13] <harttho> seeing around 1K/s read and write, occasionally write spiking to 20K
[22:01:55] <harttho> During the load
[22:02:07] <harttho> around 10% max load
[22:02:21] <harttho> (io load) server load higher
[22:02:32] <GothAlice> Last question before I need to run: filesystem?
[22:02:59] <harttho> xfs
[22:03:56] <GothAlice> harttho: Could you run "mongotop/mongostat" and see what your read/write latencies are, and if they're coordinated with that bulk load operation? If so, to prevent hosing your database each time, you may have to slow that import down. If not… I'm at a loss. I'd spin up a staging environment for more vigorous testing that won't impact production.
[22:03:58] <harttho> well, once the load goes down we'll try profiling it
[22:05:07] <harttho> bounces around
[22:05:19] <GothAlice> Enabling more than slow query profiling will have a deleterious effect on performance, FYI. mongotop/mongostat require no changes to the profiling configuration; you're looking for excessive lock percentages.
[22:05:19] <harttho> 100ms -> 1200ms
[22:05:32] <GothAlice> Also faults.
[22:05:47] <harttho> Oplog near the top a good amount of time
[22:06:20] <GothAlice> … "near the top"? Your mongotop/mongostat output isn't just a stream?
[22:06:28] <harttho> Locks bounce around too, and 10 or so faults per replicate set a second
[22:06:57] <harttho> Yeah they're streams
[22:07:54] <harttho> config server hitting 2K+ queries, too
[22:08:14] <GothAlice> That'd be sharding doing its thing.
[22:09:09] <harttho> 750 'get mores' on 'local' db pretty consistently
[22:09:57] <GothAlice> Could you pastebin/gist your db.serverStatus() output?
[22:11:47] <GothAlice> http://cl.ly/image/3n312V3Z1B3N — also sometimes amusing. Apparently my DB host has been up for so long that I've exceeded 48 bits of integer accuracy in counting how long locks have been held. XD
[22:13:53] <harttho> http://pastebin.com/LB9ctBBf
[22:14:13] <GothAlice> Ahm, no, all of the output of that command, not just lock counters.
[22:14:22] <GothAlice> Also those local db lock counters are insane.
[22:14:43] <GothAlice> What's the uptime of that process? :/
[22:16:09] <harttho> http://pastebin.com/1buuAV1j
[22:16:28] <harttho> (there is some missing data, sorry for that; didn't want to waste time obfuscating the data)
[22:16:32] <harttho> update is 316 days
[22:16:36] <harttho> uptime*
[22:22:39] <GothAlice> accessesNotInMemory: 27543272, pageFaultExceptionsThrown: 17166926, page_faults: 2370184717, average_ms: 802, asserts.warning: 85896
[22:22:45] <GothAlice> All of these are clear indications of impending doom.
[22:23:17] <harttho> Any way to clear the stats to see what is happening now vs the previous 300ish days?
[22:23:24] <harttho> Thanks for all of the assistance btw
[22:23:26] <GothAlice> Warning level assertions should appear in the mongod log. Page faults indicate a clear need to continue to scale up (add more RAM) or horizontally (further subdivide into additional shards).
[22:23:50] <GothAlice> 'Cause it's causing insane lag to wait for the kernel to pull in pages off-disk, swapping out others in the process.
[22:24:28] <GothAlice> Your average *disk flush* is taking nearly a full second.
[22:25:50] <GothAlice> (noatime,nodiratime,notail FTW)
[22:26:58] <GothAlice> Eek, you're also running 2.4.9.
[22:30:01] <GothAlice> http://cl.ly/2c2g2I2m1T10 are the issues resolved (37 of them) between 2.4.9 and 2.4.12, many of which are _extremely_ important fixes. (Data corruption, reliability, filesystem, security, and performance.)
[22:31:36] <GothAlice> harttho: ^ I have to run, but I'll be back in a bit.
[22:33:02] <harttho> GothAlice: thanks
[22:35:06] <styles> https://privatepaste.com/23e314d5f5
[22:35:34] <styles> This query works, but if you remove MINUTE and only count by hour, it does nothing
[22:42:32] <harttho> upgrading to .12
[22:42:37] <GothAlice> styles: Remove which "minute" precisely?
[22:43:00] <styles> I actually think it works, GothAlice; it's my code that generates this query
[22:43:16] <styles> Better question, are there any tools to realtime debug queries sent to mongo?
[22:43:32] <GothAlice> You could always tail the oplog.
[22:43:37] <GothAlice> (There are some tools to do this, yes.)
[22:43:53] <GothAlice> http://blog.mongolab.com/2012/06/introducing-dex-the-index-bot/ is one of them (note the follow-up articles)
[22:43:55] <styles> what do you recommend
[22:44:04] <styles> ah
[22:44:39] <GothAlice> (We keep a copy of "notable events" in the oplog and store them as an audit log and version tracking.)
[22:45:37] <styles> Ok so I need to understand oplog better I'll go read the docs on that
[22:45:37] <styles> Thanks
[22:46:12] <styles> Curious GothAlice how much data is in your guys mongo cluster?
[22:46:20] <styles> I'm using it for every HTTP request & calculate stats
[22:46:26] <GothAlice> styles: My at home dataset is 25 TiB.
[22:46:32] <styles> woah
[22:46:34] <styles> what's it of?
[22:46:43] <GothAlice> Everything digital I have ever touched since 2001/2002 or so.
[22:46:54] <styles> gridfs?
[22:46:58] <styles> You're using GridFS*?
[22:46:58] <GothAlice> (De-duplicated, compressed, and searchable.) Aye.
[22:47:04] <styles> I want that :(
[22:47:06] <GothAlice> With some serious hacks on top of it. ;)
[22:47:21] <styles> Do you have like a web UI / mounted drive that mimics GridFS?
[22:47:27] <GothAlice> (I added full-text indexing to my codebase before such a thing existed in MongoDB itself. Same with compression.)
[22:47:29] <styles> what servers are you running at home?
[22:47:32] <styles> Oo
[22:47:49] <styles> I was looking for a few home servers for long running tasks & testing
[22:47:54] <GothAlice> Three Dell 1Us and three Drobo 8-something-i arrays.
[22:48:17] <GothAlice> (iSCSI from Drobo to paired server, that MongoDB cluster exposing GridFS to a local FUSE filesystem.)
[22:48:19] <styles> How much did you get those Dell 1us for?
[22:48:28] <GothAlice> The dell boxen were… $600 ea?
[22:48:30] <styles> Curious why GridFS over... like FreeNas?
[22:48:37] <styles> omg I want that
[22:48:45] <styles> Dell C class servers? those cloud ones?
[22:48:48] <GothAlice> Because my filesystem represents an infinite number of nodes stored in an infinite number of directories infinitely deep.
[22:48:58] <GothAlice> No, these are ancient beasts. AMD Opterons, even.
[22:49:11] <styles> I want to mimic that exact setup
[22:49:40] <styles> I have like 5 processes I consistently have to run on my local machine, which is just a daunting task & my makeshift 6TB raid here (local comp) is running out
[22:49:44] <GothAlice> I priced it out; most cloud storage providers would cost me ~$2K/terabyte/year. (compose.io would cost me half a million a month)
[22:49:51] <styles> Did you buy them from dell?
[22:50:09] <styles> Yeah plus you have the security of it being at home
[22:50:10] <GothAlice> Nah; the 1Us were second-hand from a startup that was shutting down. XD
[22:50:15] <styles> :D
[22:50:16] <styles> That's the best
[22:50:18] <GothAlice> Also unlimited electricity is covered by my rent.
[22:50:20] <GothAlice> >:D
[22:50:34] <styles> LOOOOOOL
[22:50:38] <GothAlice> I triple-checked the contract with my landlord.
[22:50:42] <styles> hahahahahahaha
[22:51:08] <styles> It's like the story of a guy who had a whole rack at home and was using like 24TB of bandwidth a month with Verizon
[22:51:15] <GothAlice> …
[22:51:19] <styles> they got so mad
[22:51:25] <styles> hahaha
[22:51:28] <styles> Yeah same
[22:51:31] <styles> I get Google Fiber soon too
[22:51:35] <styles> So I'm going to abuse the crap out of it
[22:51:47] <styles> I want to ramp up and have my home lab up by then
[22:52:20] <GothAlice> Bell keep coming by to try to sell me "Bell Fibe", and I keep having to teach them that they're not selling me fiber, they're selling me ADSL last-mile to the concentrator on my apartment complex roof, *then* it's fiber. They don't offer unlimited unless you bundle, and I can only tell them I don't own a home phone or television…
[22:52:48] <styles> LOL
[22:53:00] <ianp> I'm surprised people still pay for home phones
[22:53:03] <styles> It's sad when you know more than the people hawking you services :-/
[22:53:13] <styles> ianp, it's not a bad idea tbh
[22:53:19] <GothAlice> … they try to sell me the bundle anyway, as the "solution" to my "unlimited" problem. Then I tell them I have offsite backup of 25 TiB of data, and they back away from the door.
[22:53:20] <ianp> I already have a cell phone
[22:53:28] <styles> If there's an emergency chances are your mobile towers will go out or be overloaded fast
[22:53:49] <GothAlice> styles: That's why I run my own VoIP, too.
[22:53:50] <GothAlice> ¬_¬
[22:54:00] <styles> heeh
[22:54:16] <styles> Side question, any recommendation on getting cheap Dell 1Us?
[22:54:17] <ianp> never thought of that... but what kind of large scale emergency would i need that for
[22:54:29] <GothAlice> styles: Dumpster diving after a .com crash.
[22:54:39] <GothAlice> (Guaranteed to win.)
[22:54:50] <styles> ianp, tons: tornados, earthquakes, etc.
[22:55:04] <styles> mostly natural disasters
[22:55:19] <styles> GothAlice, any advice on finding .com crashes? :D
[22:55:21] <GothAlice> Civil disruption is also becoming more of a risk in NA.
[22:55:23] <ianp> I figure I'm screwed beyond the point of the phone helping in that case.
[22:55:29] <styles> GothAlice, yeah
[22:55:36] <styles> ianp, nope
[22:55:46] <styles> most of the time people are just freaking out contacting loved ones making sure they're ok
[22:55:51] <ianp> yea
[22:56:03] <styles> GothAlice, you're set
[22:56:21] <GothAlice> SDRs FTW
[22:56:22] <styles> One of the things I intend on doing is backing up wikipedia every week
[22:56:34] <l2trace99> does anyone know what permissions i need to create users from a script ?
[22:56:36] <GothAlice> styles: That's included in my exocortex. (The name for that at home project.)
[22:56:46] <styles> open source that shit!
[22:57:10] <GothAlice> l2trace99: userAdmin or userAdminAnyDatabase
[22:57:28] <ianp> that sounds pretty neat.
[22:57:31] <GothAlice> (userAdmin only applies to an admin user of one database; userAdminAnyDatabase would apply to an admin in the 'admin' database.)
[22:57:34] <ianp> I want to buy the cryptome DVDs too
[22:57:38] <styles> GothAlice, I've been looking at... uh
[22:58:03] <styles> https://camlistore.org/
[22:58:13] <styles> This is sorta interesting, but I like your MongoDB approach better
[22:58:21] <GothAlice> styles: If I attempted to open it, I'd be in violation of a dozen patents that I'm aware of, and probably many that I'm not.
[22:58:38] <styles> watttt?
[22:59:05] <GothAlice> http://www.freshpatents.com/Framework-for-the-dynamic-generation-of-a-search-engine-sitemap-xml-file-dt20081023ptan20080263005.php for one.
[22:59:43] <l2trace99> GothAlice: is that true for 2.6.5 ?
[23:00:00] <GothAlice> l2trace99: Should be true for most if not all of the 2.6.x series.
[23:00:35] <GothAlice> l2trace99: I hope you mean the userAdmin/userAdminAnyDatabase thing, not the patent thing. ¬_¬
[23:01:05] <l2trace99> yeah userAdmin thing
[23:01:15] <l2trace99> I have a provisioning script that was working for 2.4
[23:01:24] <l2trace99> but fails with 2.6
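(One common 2.4-to-2.6 breakage: the old db.addUser(username, password) forms were superseded by db.createUser() with an explicit roles document. A sketch of the 2.6-style call, with placeholder names:)

```javascript
use myapp   // placeholder database
db.createUser({
  user: "appuser",   // placeholder user name
  pwd: "secret",     // placeholder password
  roles: [ { role: "readWrite", db: "myapp" } ]
})
```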
[23:01:40] <GothAlice> https://gist.github.com/amcgregor/c33da0d76350f7018875#file-cluster-sh-L76-L79 is the "add_user" chunk of my own staging provisioning script.
[23:01:44] <GothAlice> l2trace99: ^
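The 2.6 user-management commands that GothAlice's gist wraps in shell calls can also be issued straight from a driver. A minimal pymongo-style sketch of the idea — the database name, user name, password, and URI below are placeholder assumptions, not taken from the gist:

```python
# Sketch: build a 2.6-era "createUser" command document for pymongo.
# OrderedDict makes the command name explicitly the first key, which
# MongoDB requires of command documents.
from collections import OrderedDict

def build_create_user_cmd(user, password, roles):
    """Return a createUser command document (command key first)."""
    return OrderedDict([
        ("createUser", user),
        ("pwd", password),
        ("roles", roles),
    ])

# Hypothetical user on a hypothetical "app_db" database:
cmd = build_create_user_cmd(
    "app_user", "s3cret", [{"role": "readWrite", "db": "app_db"}]
)

# With a live server this would be sent as:
#   from pymongo import MongoClient
#   MongoClient("mongodb://localhost:27017")["app_db"].command(cmd)
# The authenticated user running it needs userAdmin on "app_db"
# (or userAdminAnyDatabase granted in "admin"), per the discussion above.
```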
[23:02:05] <styles> GothAlice, with dex does it show agg functions?
[23:03:40] <GothAlice> styles: AFAIK, no? I test the first few stages ($match, $sort, $skip, $limit) as a normal query first, anyway.
[23:03:58] <styles> damn dex probably wont work then
[23:04:10] <GothAlice> (Those are the only bits that'll be affected by an index in an aggregate query, and only if they lead the pipeline.)
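GothAlice's trick — testing the index-eligible prefix of a pipeline as a normal query — can be done mechanically. A sketch of a helper that peels the leading $match/$sort/$skip/$limit stages off a pipeline; the sample pipeline and field names are hypothetical:

```python
# Sketch: extract the index-eligible prefix of an aggregation pipeline
# so it can be re-run (and explained) as a plain find().
def leading_find_equivalent(pipeline):
    """Return (filter, sort, skip, limit) for the leading
    $match/$sort/$skip/$limit stages; stop at the first other stage."""
    query, sort, skip, limit = {}, None, 0, None
    for stage in pipeline:
        (op, arg), = stage.items()       # each stage is a one-key dict
        if op == "$match" and not query:
            sort is None
            query = arg
        elif op == "$sort" and sort is None:
            sort = list(arg.items())
        elif op == "$skip" and skip == 0:
            skip = arg
        elif op == "$limit" and limit is None:
            limit = arg
        else:
            break  # $group, $project, etc. end the index-eligible run
    return query, sort, skip, limit

# Hypothetical pipeline over a "tweets" collection:
pipeline = [
    {"$match": {"lang": "en"}},
    {"$sort": {"created_at": -1}},
    {"$limit": 5000},
    {"$group": {"_id": "$user", "n": {"$sum": 1}}},
]
flt, srt, skp, lim = leading_find_equivalent(pipeline)
# With pymongo, the prefix would then run as a normal query:
#   db.tweets.find(flt).sort(srt).skip(skp).limit(lim)
```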
[23:04:13] <styles> I have an agg function in Go and it's not returning the same results as me doing it by hand
[23:04:20] <styles> So I want to see what's being generated and can't find a nice way to do it
[23:04:46] <GothAlice> Alas you'd have to inspect the implementation of the driver to see if you can get the "raw" SON being sent over the wire.
[23:05:17] <styles> https://privatepaste.com/05531d8f23
[23:05:22] <styles> yeah blah
[23:05:24] <styles> I couldn't find much
[23:05:32] <l2trace99> GothAlice: yeah this going to make me sad
[23:05:42] <GothAlice> l2trace99: How so?
[23:05:45] <l2trace99> I was doing via the php driver
[23:05:54] <l2trace99> I was calling eval
[23:06:22] <GothAlice> PHP is not a general purpose scripting language, no matter how much it tries to be one…
[23:06:35] <harttho> what's an acceptable accessesNotInMemory and pageFaultExceptionsThrown per second?
[23:06:42] <harttho> (Or is 0 the goal)
[23:06:44] <GothAlice> harttho: Zero for most datasets.
[23:07:20] <GothAlice> My at home dataset is highly "at rest", and waaaaay larger than any amount of RAM I could throw at it, so expect some when you have terabytes of data.
[23:07:23] <styles> Kinda off topic, but since we know Google has been running document stores for ages now, I wonder what they have running that people don't know about :P
[23:08:04] <l2trace99> GothAlice: thanks I have some pain ahead of me
[23:08:15] <styles> GothAlice, do you work for Google?!
[23:08:23] <GothAlice> No…
[23:08:26] <styles> o ok haha
[23:08:45] <GothAlice> But when NDAs are involved, it's easier to simply avoid the whole subject. ¬_¬
[23:09:30] <styles> Why? You don't like being sued? what's wrong with you :D
[23:10:29] <GothAlice> :P
[23:11:11] <GothAlice> "Did I guess this, or was I actually told this at some point?"
[23:12:05] <GothAlice> l2trace99: Feel free to adapt my script; all it really needs is some "ssh" commands wrapped around everything to execute these commands remotely.
[23:12:55] <GothAlice> l2trace99: Or, even more highly recommended, provision using MMS. https://mms.mongodb.com/
[23:13:04] <styles> GothAlice, yep!
[23:13:12] <styles> Yeah looks like mgo logging doesn't even show agg functions
[23:13:15] <styles> fml
[23:13:18] <l2trace99> GothAlice: this is all wrapped within a beanstalk worker
[23:13:40] <l2trace99> GothAlice: it creates the db and the user account to access it
[23:15:21] <l2trace99> it was until we upgraded to 2.6
[23:15:26] <GothAlice> XD
[23:17:10] <l2trace99> GothAlice: nice I am going to use the same tech as your script tho
[23:17:45] <GothAlice> Bash: one of the most underrated, and the single most widely deployed, scripting languages on the planet.
[23:17:45] <GothAlice> :)
[23:18:50] <l2trace99> GothAlice: yes but it makes me itch. When things get heavy I chicken out and go perl
[23:18:56] <l2trace99> GothAlice: but this had to be php
[23:19:24] <GothAlice> Because perl is a light-weight solution? O_o Perl is the peg that fits any hole… by changing the meaning of the hole. XP
[23:20:43] <l2trace99> GothAlice: well yeah.
[23:21:01] <l2trace99> GothAlice: I just hate text manipulation in bash
[23:21:09] <l2trace99> GothAlice: math in bash
[23:21:20] <l2trace99> GothAlice: functions in bash
[23:21:23] <l2trace99> GothAlice: ;)
[23:21:43] <GothAlice> Hey, look at the functions in my example here. One even defines a default value for an argument.
[23:21:56] <GothAlice> Functions are the least of bash's worries. Text manipulation… sure. That's why the gods invented sed/awk/ed/etc. ;)
[23:21:58] <GothAlice> #truepowerofunix
[23:23:46] <GothAlice> (My favourite example from my automation is this: git diff --name-only @{1}.. | xargs qfile -Cq | xargs --no-run-if-empty equery files | grep init.d/ | sort -u — that's the pipeline that generates a list of services to reload/restart prior to doing so after a git pull. Yes, I automate using git hooks and the… #truepowerofunix.)
[23:24:32] <styles> I prefer Go :(
[23:24:43] <styles> Go, Bash, Python, PHP
[23:24:46] <styles> <3
[23:25:16] <styles> lol
[23:25:24] <styles> Clueless specs? :D
[23:25:36] <styles> Go is by far my favorite, it's just simple and seems to work
[23:25:43] <GothAlice> https://gist.github.com/amcgregor/a816599dc9df860f75bd — have some sample code.
[23:26:04] <styles> ewww
[23:26:11] <GothAlice> It uses a mutable tree-based parser that allows you to redefine the syntax as you go.
[23:26:14] <GothAlice> >:D
[23:26:25] <GothAlice> (It also already has a JIT.)
[23:26:29] <bmillham> Hi all, new to MongoDB, playing trying to learn, and I've run into a problem with aggregate and unicode.
[23:26:39] <GothAlice> bmillham: How so?
[23:27:03] <styles> GothAlice, that's pretty cool :D
[23:27:14] <styles> I wanted to make a lang as practice one of these days, don't have the time this second
[23:27:17] <bmillham> I have a document, Artists and am trying to get a count of artists by the first letter of their name
[23:27:28] <bmillham> It works, but mangles the unicode
[23:27:46] <styles> How does it mangle unicode? What language did you write it in too?
[23:28:21] <bmillham> Here's the command from the mongo utility:
[23:28:24] <GothAlice> bmillham: I'd need to see the code you've written.
[23:28:29] <GothAlice> Pastebin it, don't paste it here!
[23:28:34] <GothAlice> (Or gist.)
[23:28:35] <bmillham> > db.albums.aggregate({$group: {_id: {$substr: [ {$toUpper: "$name"}, 0, 1]}, count: {$sum: 1}}}, {$sort:{_id:1}})
[23:28:41] <bmillham> Oops, sorry
[23:29:06] <GothAlice> Could you gist/pastebin the query and its output?
[23:29:11] <styles> bmillham, I suspect it's the code you used to query / input the data
[23:29:14] <GothAlice> (I'd love to see this mangling.)
[23:29:24] <bmillham> According to the MongoDB docs, $substr is not unicode safe
[23:29:36] <GothAlice> No, it's a byte-level operation.
[23:29:57] <GothAlice> Not to mention you have issues with combined or split accents.
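Because $substr counts bytes rather than characters, taking the "first character" of a non-ASCII name slices mid-codepoint. A minimal Python demonstration of the same byte-level slice (the artist name is a made-up example):

```python
# $substr in 2.x MongoDB operates on bytes, not codepoints. Slicing one
# byte off a UTF-8 string reproduces the mangling described above:
name = "Édith"                # É is U+00C9, two bytes in UTF-8
raw = name.encode("utf-8")    # b'\xc3\x89dith'

first_byte = raw[:1]          # what a byte-wise substr(0, 1) returns
first_char = name[0]          # what was actually wanted

assert first_byte == b"\xc3"  # half a codepoint...
try:
    first_byte.decode("utf-8")
except UnicodeDecodeError:
    pass                      # ...which is not valid UTF-8 on its own
assert first_char == "É"
```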
[23:32:29] <bmillham> Here's the pastebin: http://pastebin.com/tvMcWcQf
[23:42:29] <harttho> GothAlice: is there a way to set profiling level of all db's? or is it just the one you're currently using?
[23:42:33] <bmillham> I do know that the data in the database is good, because I can display a complete document just fine, with the unicode
[23:43:59] <GothAlice> harttho: Per-db.
[23:44:01] <GothAlice> AFAIK.
[23:44:17] <harttho> So i'd need to set it on the admin db?
[23:44:32] <harttho> (For the above slow queries I pasted)
[23:44:59] <Boomtime> harttho: if you are happy to read log files instead of the profiling colection: http://docs.mongodb.org/manual/reference/parameters/#param.logLevel
[23:45:19] <Boomtime> you can cause all ops to be logged if you need to
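Boomtime's suggestion, sketched as a pymongo call. The command document follows the linked logLevel parameter docs; actually sending it assumes a reachable mongod on localhost:

```python
# Sketch: raise mongod's log verbosity via setParameter so that all
# operations end up in the log file instead of the profiling collection.
cmd = {"setParameter": 1, "logLevel": 1}

# With a live connection this would be:
#   from pymongo import MongoClient
#   MongoClient("mongodb://localhost:27017").admin.command(cmd)
# (Unlike db.setProfilingLevel(), this is server-wide, not per-db.)
```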
[23:45:41] <harttho> gotcha, that's helpful
[23:48:11] <GothAlice> bmillham: Indeed. However, what's the difference between: "ù" and "ù"?
[23:49:01] <GothAlice> (Hint: the second is one character "\u00F9", the first is two: "\u0060\u0075".)
[23:49:28] <GothAlice> bmillham: Unicode is hard.
[23:50:36] <GothAlice> Sorry, not \u0060, \u02CB.
[23:51:26] <GothAlice> In raw UTF-8 these turn into the byte sequences (in hex): C3 B9, or CB 8B 75.
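Those codepoints and byte sequences can be checked directly in Python. The normalization note at the end is an addition to what's said above: only a canonical combining accent (U+0300), not the standalone modifier letter U+02CB, composes into the precomposed ù:

```python
import unicodedata

# The two look-alike strings: precomposed ù versus modifier grave + u.
one_char = "\u00f9"         # ù, one codepoint
two_char = "\u02cb\u0075"   # ˋ + u, two codepoints

assert len(one_char) == 1 and len(two_char) == 2
assert one_char.encode("utf-8") == b"\xc3\xb9"      # hex C3 B9, as above
assert two_char.encode("utf-8") == b"\xcb\x8b\x75"  # hex CB 8B 75

# u + U+0300 (COMBINING GRAVE ACCENT) does normalize to the precomposed
# form; the modifier-letter variant does not:
assert unicodedata.normalize("NFC", "u\u0300") == one_char
assert unicodedata.normalize("NFC", two_char) != one_char
```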
[23:51:38] <bmillham> I can handle little things like that. I have a web page that's currently written in PHP with MySQL. I'm considering switching to Django/MongoDB.
[23:51:56] <GothAlice> bmillham: Slicing off the first byte will give you an invalid character. Slicing off the first two will give you either the ù or only the ˋ.
[23:52:05] <bmillham> But I have to figure out a workaround for this issue, because it's important.
[23:52:36] <bmillham> I played with that and saw that behaviour.
[23:52:36] <GothAlice> bmillham: Avoid Django. Django is what happens when a cluster bomb goes off in code: disjoint parts and shrapnel everywhere. (I recommend any form of µframework like WebCore, or Flask. Full disclosure: WebCore is mine.)
[23:52:49] <GothAlice> (Django, along with Redis and Celery, killed my last project.)
[23:53:28] <GothAlice> bmillham: If that particular query matters, extract the first literal character, strip accents (ensure ASCII), and store that separately.
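GothAlice's workaround — extract the first character, strip accents, and store it separately at insert time — could look like this in Python. A sketch; the field names in the comment are hypothetical:

```python
# Sketch: compute an ASCII "index letter" when the document is written,
# so the letter-grouping aggregation never has to $substr unicode text.
import unicodedata

def index_letter(name):
    """Uppercase first letter of `name`, accents stripped (NFD
    decompose, then drop combining marks)."""
    first = name[0].upper()
    decomposed = unicodedata.normalize("NFD", first)
    stripped = "".join(c for c in decomposed
                       if not unicodedata.combining(c))
    return stripped or first   # Hebrew/CJK etc. pass through untouched

assert index_letter("édith piaf") == "E"
assert index_letter("Ökonomie") == "O"
assert index_letter("zebra") == "Z"

# Stored as e.g. {"name": "Édith Piaf", "name_letter": "E"},
# the grouping query then becomes a plain:
#   db.artists.aggregate([
#       {"$group": {"_id": "$name_letter", "count": {"$sum": 1}}},
#       {"$sort": {"_id": 1}}])
```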
[23:53:35] <bmillham> Is WebCore python based?
[23:53:38] <GothAlice> bmillham: Sure is.
[23:53:44] <styles> lol
[23:53:52] <bmillham> I'll have a look at that GothAlice
[23:53:53] <styles> I stopped using frameworks a while back, full go net/http now
[23:54:09] <GothAlice> bmillham: https://github.com/marrow/WebCore/tree/master/examples and http://web-core.org
[23:54:26] <styles> GothAlice, dex doesn't seem to show agg functions, is there anything else I can do?
[23:54:28] <bmillham> I always have problems with frameworks. It seems like I spend more time trying to get the framework to do what I want...
[23:54:40] <GothAlice> styles: I find OO abstractions of request/response handy, and mapping URLs to controller methods handy, and that's about all WebCore does by default. ;)
[23:54:52] <GothAlice> Oh, and using templates and JSON serialization.
[23:54:54] <styles> yeah
[23:54:58] <styles> Gorilla does most of that in Go
[23:55:13] <GothAlice> styles: Indeed. µframeworks FTW. (There's a reason they're µ.)
[23:55:18] <styles> lol
[23:55:38] <bmillham> The site I'm looking to convert is: http://bmillham.millham.net It's a request site for listeners for my internet radio show.
[23:55:44] <styles> I had a problem w/ uhh crap I think CakePHP at one point w/ unicode. There were movies in the db not in English and it'd break
[23:55:57] <GothAlice> styles: And no, dex doesn't show aggregate function output. But again, extract the query and run as a normal find().
[23:56:13] <styles> What's the diff in agg and a normal find?
[23:57:14] <styles> nvm blah
[23:57:16] <GothAlice> styles: Aggregates aren't logged. :P
[23:57:33] <GothAlice> bmillham: https://github.com/amcgregor/TSA-Timeline is an example "light-weight" project that still has some minimal structure to it.
[23:58:01] <bmillham> If you want to see what I'm trying to replicate, go to my site, authenticate so you can browse the site. Select Browse: By Artist and you will see a list of letters at the top of the page
[23:58:43] <GothAlice> bmillham: http://cl.ly/image/010d1J1h392r Damn your captcha.
[23:59:06] <bmillham> Sorry, I hate it also, but it was for preventing spam.
[23:59:16] <GothAlice> Hashcash FTW. ;P
[23:59:40] <GothAlice> Yup, totally manually extract the first literal character and store that separately.
[23:59:58] <GothAlice> (It'll also deal with that Hebrew and Chinese/Japanese.)