PMXBOT Log file Viewer


#mongodb logs for Wednesday the 18th of February, 2015

[00:25:01] <BadHorsi1> I'm wondering how to save an IPv4 address (not in string notation) in the mongo shell... So like saving a 4 bytes value...
[00:27:08] <Boomtime> BadHorsie: what do you mean by "4 bytes value" can you give an example?
[00:27:20] <Boomtime> do you mean a 32bit integer?
[00:27:36] <Boomtime> perhaps an array of 4 byte values?
[00:29:54] <BadHorsie> Array of 4 bytes is OK I guess
[00:33:38] <BadHorsie> Like: perl -e 'use NetAddr::IP::Util qw(inet_aton);print inet_aton("192.168.1.1");'|od -c
[00:33:43] <BadHorsie> 0000000 300 250 001 001
[00:34:14] <BadHorsie> That's what it returns... So hrmmm \300 or something like that ?
[00:34:59] <Boomtime> this sounds like a data formatting problem, i think you need to decide how you want the data stored before trying to figure out how to do it
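A minimal shell sketch of two of the options Boomtime raises, with a hypothetical collection name:

    // store the address as an array of byte values...
    db.hosts.insert({ host: "gw", ip: [192, 168, 1, 1] })
    // ...or packed into a single integer; the shell's NumberInt is signed 32-bit,
    // so addresses above 127.255.255.255 need NumberLong (192.168.1.1 == 3232235777)
    db.hosts.insert({ host: "gw", ip: NumberLong(3232235777) })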
[00:56:25] <MacWinner> i have a sharded cluster with rs1 and rs2.. within rs1, I have a test_db.testcollection which I want to shard.. i've gone through the steps, but for some reason on mongos, if I do db.printShardingStatus(), it shows rs2 as having chunks.. even though all the data is on rs1.. also, I don't see any entry for chunks on rs1
[00:56:49] <MacWinner> shouldn't the chunks be on rs1? And then shouldn't they be moved to rs2?
[00:57:31] <MacWinner> the db is about 6GB
[01:01:09] <Boomtime> MacWinner: can you put your shard status in a gist?
[01:01:28] <MacWinner> sure
[01:02:26] <MacWinner> Boomtime: http://pastebin.com/ueekTQrS
[01:10:27] <Boomtime> MacWinner: you say all the data is on rs1, how do you know this?
[01:11:13] <MacWinner> Boomtime, I'm connected via mongoshell to rs1 and I see the data there.. also I ran mongorestore against rs1.. (i had mongodumped from our production database)
[01:11:40] <Boomtime> so you pushed data directly to a shard and expected the rest of the cluster to somehow know this?
[01:12:00] <MacWinner> i pushed it into a shard, then ran these commands to shard the collection:
[01:12:10] <Boomtime> if you have a sharded cluster you must do all operations through a mongos or you are on your own
[01:12:38] <MacWinner> sh.enableSharding('test_db')
[01:13:12] <Boomtime> "i pushed it into a shard" <- this data effectively doesn't exist
[01:13:31] <MacWinner> Boomtime, oh.. i see.. so I should run mongorestore against mongos?
[01:13:38] <Boomtime> correct
[01:13:40] <Boomtime> always
[01:14:03] <MacWinner> Boomtime, but what if I'm converted from a single replica set environment into a sharded cluster?
[01:14:07] <Boomtime> never talk direct to a shard unless you are doing something that relates directly to that shard - replica-set config, for example
[01:14:48] <MacWinner> isn't it possible for the original replica set to have all the data without it being pushed via mongos? (because mongos was not setup initially)
[01:14:53] <Boomtime> in order to upgrade from a single replica-set to sharded cluster, the upgrade necessarily involves adding your existing replica-set as the first shard
[01:14:58] <Boomtime> (and only shard to start with)
[01:15:36] <MacWinner> ahh.. ok..
[01:16:22] <MacWinner> i think I was skipping a step.. I setup a sharded cluster first, then imported all the data directly into a shard to try to simulate my initial state before setting up a shard key and such..
[01:17:32] <Boomtime> yeah, don't do that
[01:17:32] <MacWinner> Boomtime, thanks for the help! i'm gonna redo everything and try again
[01:17:39] <Boomtime> :D
[01:17:49] <MacWinner> this time doing mongorestore through the mongos
[01:17:56] <Boomtime> excellent
[01:18:28] <MacWinner> what's the expected behavior if you do the mongorestore through mongos before setting up a sharded collection or shard-key? will it by default put all the data into a single shard?
[01:18:53] <Boomtime> yes
[01:19:05] <Boomtime> the "primary" shard (yeah, this term is overloaded)
[01:19:24] <Boomtime> all databases have a primary shard
[01:19:46] <MacWinner> ahh.. ok.. i was confused by that..
[01:20:00] <Boomtime> the primary shard is used to house all unsharded collections
[01:20:11] <MacWinner> makes sense
[01:20:21] <Boomtime> once you shard a collection in a database, it no longer uses the "primary" shard for anything special
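As a rough sketch of the order of operations being described (run against mongos; the hashed shard key here is only illustrative):

    sh.enableSharding("test_db")
    sh.shardCollection("test_db.testcollection", { _id: "hashed" })
    // load the data through the same mongos (e.g. point mongorestore at it);
    // the balancer then distributes chunks between rs1 and rs2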
[04:28:33] <Jonno_FTW> hi, I'm trying to copy from localhost to a remote host that requires auth, how do I specify the auth?
[04:28:46] <Jonno_FTW> since the docs say you can only specify the local auth
[04:31:35] <joannac> Jonno_FTW: "copy" - what does that mean? do you mean the copyDatabase command?
[04:32:55] <Jonno_FTW> joannac: yes, but the copyDatabase command doesn't specify the remote auth
[04:33:10] <joannac> you're copying from local to remote?
[04:33:21] <Jonno_FTW> yes
[04:33:30] <joannac> so connect to remote, and auth in the shell
[04:33:35] <joannac> and then run copyDatabase
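A sketch of what that would look like, assuming you are already connected and authenticated to the remote server in the mongo shell (database name and local hostname are illustrative):

    // pulls "mydb" from the named host into the server you are connected to
    db.copyDatabase("mydb", "mydb", "my.local.host:27017")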
[04:33:38] <Jonno_FTW> ok
[04:33:50] <Jonno_FTW> actually I can't, the sysadmin won't let me
[04:33:55] <Jonno_FTW> it has to be the other way
[04:34:05] <joannac> that's not the way copyDatabase works
[04:34:10] <Jonno_FTW> :(
[04:34:13] <MacWinner> i have a bunch of advertisement images being uploaded.. their size will usually be below 100KB.. not reaching the 16MB BSON limit. Is it still a best practice to store the actual ad image binary data using GridFS?
[04:34:43] <Jonno_FTW> joannac: the sysadmin won't open the firewall on my machine for remote connections to mongod
[04:35:22] <joannac> if you can't connect to the remote mongod from your machine, how would it work in either direction?
[04:35:28] <Boomtime> MacWinner: gridfs is so you can exceed the 16MB limit, if you know you'll stay below that (with certainty) then you have no need of gridfs
[04:35:52] <Jonno_FTW> I can connect to the remote, but the remote can't access my machine because of the firewall
[04:36:10] <MacWinner> Boomtime, cool, thanks
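A minimal sketch of storing a small image inline instead of via GridFS, with hypothetical field names and a placeholder payload:

    db.ads.insert({
        campaign: "spring-sale",
        contentType: "image/png",
        image: BinData(0, "iVBORw0KGgo=")   // base64-encoded bytes; real payload goes here
    })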
[04:36:35] <joannac> Jonno_FTW: mongodump/mongorestore ?
[04:36:41] <Jonno_FTW> guess I'll have to
[04:42:44] <Jonno_FTW> joannac: can it be done in 1 step using mongorestore?
[04:45:59] <Jonno_FTW> joannac: I don't have admin access on the remote, is it still possible?
[04:49:19] <Boomtime> Jonno_FTW: let me see if I understand: you can login to the remote MongoDB with a mongo shell and you want to push a database from your local machine over to the remote MongoDB?
[04:49:36] <Jonno_FTW> Boomtime: yes
[04:49:43] <Boomtime> then you can use just mongorestore
[04:49:51] <Jonno_FTW> can you give an example usage?
[04:50:12] <Boomtime> http://docs.mongodb.org/manual/reference/program/mongorestore/
[04:50:15] <Boomtime> examples are there
[04:50:39] <Jonno_FTW> I tried using robomongo, but it only copied a third of all the data
[04:53:07] <Jonno_FTW> can I restore without doing a dump?
[04:54:18] <Boomtime> do you want to restore the entire content of dbpath on your local machine?
[04:54:18] <joannac> you tried using robomongo? robomongo is an alternative to the mongo shell
[04:54:36] <joannac> it's not needed to mongodump/mongorestore
[04:55:18] <Jonno_FTW> Boomtime: just 1 database
[04:56:24] <Boomtime> you are better off using dump, but alternatively, you can copy the db files that comprise the db you want to another directory and run restore against just that directory
[04:57:57] <Jonno_FTW> but it's like 4gb of files
[04:59:56] <Boomtime> my mistake, it can only restore *into* a dbpath, not from
[05:00:08] <Boomtime> you have to use mongodump
[05:05:38] <Jonno_FTW> I did the dump but the restore didn't work
[05:05:44] <Jonno_FTW> I used mongodump.exe --out D:\dump --db trafficdata
[05:05:45] <Jonno_FTW> mongorestore.exe -d mack0242 --host remote.trafficdb --port 27017 --username mack0242 --password mypass --dbpath D:/dump
[05:06:22] <joannac> no dbpath is necessary?
[05:06:25] <Jonno_FTW> I get the error: don't know what to do with file [dump]
[05:06:40] <joannac> get rid of the "--dbpath" bit
[05:07:07] <Jonno_FTW> but how will it know where to read the dump from?
[05:07:51] <joannac> because I said "remove the "--dbpath" bit, not the actual path?
[05:07:58] <Jonno_FTW> ah
[05:08:43] <Jonno_FTW> all good, thanks for the help
[05:09:24] <Jonno_FTW> karma++
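For reference, the corrected pair of commands from this exchange would look roughly like this (host, credentials, and paths as pasted above; mongorestore is pointed at the dump's per-database directory rather than given --dbpath):

    mongodump.exe --db trafficdata --out D:\dump
    mongorestore.exe --host remote.trafficdb --port 27017 --username mack0242 --password mypass -d mack0242 D:\dump\trafficdata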
[05:14:06] <GothAlice> Quick question, since my Google fu is weak tonight: is there a way to $inc and limit to a maximum/minimum? I.e. I'm wanting to implement "trickling pool"-based rate limiting rules where you have a maximum pool size (say, 10), performing the limited action $dec's by one if possible, and every N minutes M elements are added back to the pool, not exceeding the maximum.
[05:15:59] <NoOutlet> You can add "pool: { $lt: 10 }" to the find portion of the update.
[05:16:49] <NoOutlet> I guess M isn't necessarily 1, huh?... Hmmm.
[05:16:54] <GothAlice> Yeah. :/
[05:17:37] <GothAlice> Want to abuse atomicity to rate limit API calls, signup attempts, even things like "is this username available" calls.
[05:25:02] <GothAlice> Hmm. I guess I could do it, with the overhead of potentially many extra documents, by exploiting upserts. (TTL indexes FTW.)
[05:25:27] <NoOutlet> My only other idea is to perform a second update after each element with $min: http://docs.mongodb.org/manual/reference/operator/update/min/#up._S_min
[05:25:44] <GothAlice> It has to be atomic; race conditions can make rate limiting not limit. ;)
[05:25:46] <NoOutlet> But that goes against your atomicity ask.
[05:25:49] <NoOutlet> Yea
[05:30:24] <GothAlice> db.limits.update({_id: {event: "auth.available", slice: ISODate(…)}, pool: {$gt: 0}}, {$inc: {pool: -1}, $setOnInsert: {pool: 10}}, {upsert: true}) — the unique constraint will explode if one is found but the pool is empty. This handles a pool that completely refills after each division of time. Which is close.
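A sketch of the conditional-decrement half of this idea, assuming the per-time-slice document has already been seeded with pool: 10 (note that $inc and $setOnInsert aimed at the same field can be rejected as conflicting update operators):

    var res = db.limits.update(
        { _id: { event: "auth.available", slice: ISODate("2015-02-18T05:00:00Z") }, pool: { $gt: 0 } },
        { $inc: { pool: -1 } }
    );
    if (res.nModified === 0) {
        // no document with a non-empty pool for this slice: rate limited
    }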
[05:33:11] <giowong> hi
[05:33:16] <GothAlice> Howdy, giowong!
[05:33:20] <NoOutlet> To be honest, I am not following what you're doing.
[05:33:50] <NoOutlet> I understood the problem you asked for help on, but not the overall use case.
[05:33:53] <giowong> how do i save my database so that when i push to git my teammates can access the collections i inserted?
[05:34:27] <giowong> currently they can see the database i created, but its empty
[05:34:36] <giowong> after they pulled of course
[05:35:40] <cheeser> uh... what?
[05:35:51] <GothAlice> NoOutlet: http://www.shorewall.net/ConnectionRate.html < This is the "algorithm" I'm attempting to implement. 2/min:5 — allow a burst of up to five attempts, adding two attempts back to the pool each minute. The use: if a single source IP attempts to register multiple accounts on my webapp too rapidly they'll be blocked for a short time. (With a friendly message saying "try back in X time".)
[05:35:54] <cheeser> you want to put your mongo database files in to git?
[05:36:35] <Boomtime> GothAlice: i feel you could create a rate limiter using an array of dates, updated by $push, with a $slice to limit entries to your rate quantity, the query predicate would include a date that is set back in time by the allowed frequency..
[05:36:37] <GothAlice> giowong: Yeah, don't do that. Source code management systems like Git are designed for text, not large binary blobs. Instead, consider having a central server somewhere hosting the MongoDB data for all to utilize.
[05:36:57] <giowong> o shoot damn
[05:36:58] <giowong> i mean
[05:37:05] <giowong> its a school project
[05:37:23] <giowong> i was wondering if there was a way to share the data i inserted so that they wouldn't have to do it on their own
[05:37:29] <giowong> and just use the local database to develop
[05:37:45] <NoOutlet> You can export and they could import.
[05:37:52] <NoOutlet> Or dump and restore.
[05:38:10] <cheeser> dump would generate binary files
[05:38:11] <Boomtime> i.e to update it must contain a date entry that is old enough to be replaced, the slice operator ensures that one of the old slots is removed and the new one is pushed in
[05:38:20] <cheeser> you could, of course, share those but I wouldn't do it via git
[05:38:24] <GothAlice> giowong: There is. Dump/restore is one approach, another is "fixtures". Basically your code detects that it's running on a clean database and bulk-loads stock data from YAML, mongodump, or some other data source.
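A minimal sketch of the fixtures idea in the mongo shell, with hypothetical collection and seed data:

    if (db.products.count() === 0) {
        // clean database: bulk-load the shared seed data so everyone starts from the same state
        db.products.insert([
            { name: "Widget", price: 9.99 },
            { name: "Gadget", price: 24.99 }
        ]);
    }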
[05:38:24] <cheeser> dropbox maybe.
[05:38:42] <GothAlice> Dropbox would be potentially catastrophic for data.
[05:38:46] <cheeser> but for tests, yeah, generate that as part of the test harness setup phase
[05:38:50] <GothAlice> Its sync is not POSIX-safe.
[05:39:01] <cheeser> GothAlice: i sync with it all the time between OS X machines
[05:39:03] <giowong> so..
[05:39:05] <cheeser> even git repos
[05:39:11] <giowong> it seems like there is no easy fix?
[05:39:12] <giowong> ??
[05:39:21] <GothAlice> cheeser: Ah, but hopefully not mongodb on-disk stripes from a live mongod instance?
[05:39:30] <GothAlice> giowong: Language and database layer in use?
[05:39:55] <cheeser> GothAlice: oh, god no. i'm not an animal. :D
[05:40:53] <GothAlice> cheeser: :P On the other hand, I *do* perform filesystem snapshot syncs of my 26 TiB Exocortex dataset to offsite backup provider Backblaze. Took three months for the first backup…
[05:41:57] <GothAlice> giowong: An alternative, depending on how much data you have, is to sign up for a trial of something like https://mongolab.com/plans/ — their free tier should be more than sufficient for school work.
[05:42:34] <cheeser> GothAlice: crashplan, personally, but yeah. initial syncs are a bitch.
[05:42:56] <NoOutlet> You could also use one of 40,000 openly available mongodb servers that are unsecured on the internet.
[05:43:04] <GothAlice> lol
[05:43:04] <cheeser> haha
[05:43:05] <cheeser> nice
[05:43:09] <GothAlice> There's always that!
[05:43:42] <cheeser> i wish i could tweet that. ;)
[05:44:13] <cheeser> nothing about that!
[05:44:38] <cheeser> shards a plenty1
[05:44:40] <cheeser> !
[05:45:01] <cheeser> ok. it's almost 1am. i've had a bit too much to drink. i'm to bed.
[05:45:03] <cheeser> later all.
[05:45:09] <NoOutlet> Gnight.
[05:45:17] <GothAlice> Well, I wouldn't be so conspicuous as to promote random hosts on the 'net to a shard botnet. Too much interference with normal operation. ;)
[05:45:20] <GothAlice> Have a great one!
[05:47:23] <Boomtime> GothAlice: you want rate-limiting?
[05:47:28] <GothAlice> Boomtime: Aye.
[05:48:30] <Boomtime> how about a fixed size array of dates, 5 dates indicating the last time they logged in (or whatever), you find on having a date in the array "old enough" (2 mins?), use the positional operator to update it to current date
[05:49:11] <MacWinner> anyone have a recommendation on a PHP odm? I prefer simple. I see a bunch.. mandango, php-mongo-odm, purekid, phalcon
[05:49:34] <Boomtime> that gets you your burst ability and slow recovery, though i doubt it's exactly the algorithm you linked to (i didn't check)
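A sketch of what Boomtime describes, with illustrative names: a fixed-size array of the last five attempt times (assumed to be pre-seeded with old timestamps), where an attempt only succeeds if it can recycle a slot that is old enough.

    var cutoff = new Date(Date.now() - 2 * 60 * 1000);   // two minutes ago
    var res = db.limits.update(
        { _id: "signup:198.51.100.7", attempts: { $lt: cutoff } },
        { $set: { "attempts.$": new Date() } }   // positional operator overwrites the stale slot
    );
    if (res.nModified === 0) {
        // every slot is newer than the cutoff: burst exhausted, reject the attempt
    }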
[05:50:06] <GothAlice> MacWinner: Python and MongoEngine. ;) Sorry, suggesting people switch to a real^H^H^H^H^H different language is a compulsion. :P
[05:50:55] <MacWinner> if I switch, it's going straight to nodejs.. i think end-to-end javascript is the future.
[05:51:12] <Boomtime> MacWinner: yep, i think that too, for better or worse
[05:51:14] <MacWinner> python would be a close second though
[05:51:31] <GothAlice> MacWinner: https://www.destroyallsoftware.com/talks/wat (Gary Bernhardt's famous lightning talk) and http://youtu.be/et8xNAc2ic8 (WTFJS) may be amusing, or possibly depressing, for you. :)
[05:51:52] <NoOutlet> It would be pretty cool if all these unsecured databases were the inspiration for a paradigm shift in open data and cloud computing. That is a system of intentionally opening up thousands of database servers to the public.
[05:51:52] <MacWinner> Boomtime, i've learned to love javascript.. i used to hate the thought of it.. but have come to appreciate it quite a bit after doing a bunch of angularjs development
[05:51:54] <GothAlice> I consider PHP to be a useful template engine, and JS to be an anti-language. :P
[05:52:22] <GothAlice> Array(8).join("wat" - 1) + " Batman!"
[05:53:50] <GothAlice> NoOutlet: The dubious legality of non-forceful intrusion into others' systems is the only thing stopping me from slurping all of the data from all of those unsecured hosts into my Exocortex for analysis. ¬_¬
[05:55:14] <GothAlice> Imagine the glorious infographic one could create from all that, even just using the _id ObjectIds… a calendar heat map of the time of creation of every record across 40,000 hosts…
[05:59:36] <NoOutlet> Oooh. That's a good idea.
[06:00:34] <GothAlice> Some neat things could be made without ever really needing to access confidential data. For example, one could calculate how "green" MongoDB is by getting the collection stats from all 40,000 of 'em. Compare actual data size vs. on-disk stripe size to determine "efficiency". ;)
[07:52:56] <giowong> how do i deal with an ajax request that is too fast?
[07:53:10] <giowong> im trying to load 20 objects into an array, but the res.json returns nothing
[07:53:21] <giowong> but it returns fine with 6 objects
[09:56:43] <robopuff> Hi guys, what is the proper way to get a calculated value (I've got a basic price, but a price can also appear in an embedded calendar document - I need to combine them and filter through this) and then use it to filter the collection, or maybe it's not possible
[11:09:13] <guest9999> hello. how can i figure out which index is being used with .explain() ? it's a hash.. and getIndexes() doesn't give me the hash for each index
[11:10:37] <guest9999> nevermind
[12:27:39] <esko> Hi guys, i need a rest api for my local mongodb so i can post some stuff with jquery to it. any suggestions?
[12:49:57] <esko> ended up with restheart, thnx
[14:00:56] <Tug> Hey guys, I'm doing a test in the mongo shell and it seems I can't make the TTL feature work: http://pastebin.com/R9mK9Hee
[15:17:00] <d0x> Hi, is there a plan to have user defined functions in the aggregation framework?
[15:17:18] <d0x> Or can i do smth. custom like getting the calendar week out of a date
[15:17:19] <d0x> ?
[15:18:16] <NoOutlet> http://docs.mongodb.org/manual/reference/operator/aggregation-date/
[15:18:29] <NoOutlet> There is a $week operator.
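For example (collection and field names are hypothetical):

    db.events.aggregate([
        { $project: { week: { $week: "$created_at" }, year: { $year: "$created_at" } } }
    ])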
[15:22:36] <cheeser> Tug: get it figured out?
[15:23:00] <Tug> cheeser, yes :) http://stackoverflow.com/questions/28586271/documents-not-expiring-using-ttl-in-mongodb-2-6-7
[15:23:46] <Tug> NoOutlet, Thanks :-)
[15:24:54] <GothAlice> NoOutlet: G'mornin!
[15:25:05] <Tug> In fact, I didn't know you could add any options you wanted to an index
[15:25:17] <Tug> what's the purpose of this ?
[15:26:17] <NoOutlet> Good morning Alice!
[15:26:39] <GothAlice> Always good to see sunrises from the wrong direction… ¬_¬
[15:26:43] <NoOutlet> I was thinking about your problem. Isn't there a way to pre-allocate a specific size for documents?
[15:27:10] <GothAlice> NoOutlet: Actually, part of my code-on-my-phone-in-bed time last night was spent figuring out his solution.
[15:27:32] <GothAlice> Depending on how MongoDB handles $push with a $limit and $pop on the same field in an update, it might work. I can have it self-clean.
[15:27:44] <NoOutlet> Yeah.
[15:27:48] <cheeser> Tug: sadly it's just json. there's no real schema to it. just values we look for and use. :/
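Which is how TTL works, for instance: expireAfterSeconds is just another value riding along in the index spec document (names illustrative here):

    db.sessions.ensureIndex({ createdAt: 1 }, { expireAfterSeconds: 3600 })
    // a background task removes documents once createdAt is more than an hour old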
[15:28:47] <oznt> I'm lost with aggregation ... how can you sum the elements of an array within a document?
[15:29:03] <cheeser> you $unwind the array then $sum
[15:29:05] <cheeser> iirc
[15:29:06] <GothAlice> NoOutlet: One minor point, Boomtime's solution will _require_ a periodic background task to repopulate the lists on a regular schedule.
[15:30:13] <d0x> NoOutlet: That was just one example. Another one would be to get the domain out of a (string) url. For that the String functions are not sufficient (yet)
[15:30:17] <GothAlice> http://docs.mongodb.org/v2.6/reference/operator/aggregation/unwind/#pipe._S_unwind and http://docs.mongodb.org/v2.6/reference/operator/aggregation/sum/#grp._S_sum specifically, oznt.
[15:31:04] <GothAlice> NoOutlet: Whereas my slightly less optimal solution self-populates by way of upsert, but can't drain and fill the pools at different speeds.
[15:31:51] <oznt> GothAlice, read those, but they are too complex. I am trying to understand how to sum a simple array inside one document. I understand I need to use unwind, but not much further
[15:33:56] <GothAlice> document = {ages: [12, 24, 42]}; use test; db.foo.insert(document); db.foo.aggregate([{$unwind: '$ages'}, {$group: {_id: '$_id', total_age: {$sum: '$ages'}}}])
[15:34:17] <GothAlice> oznt: Have an example. This is first thing in the morning for me, so there may be errors in that due to a lack of caffeination.
[15:35:10] <GothAlice> Silly cold caches after offline maintenance…
[15:40:06] <NoOutlet> So a cron job for repopulating the lists or is there something more internally mongo that can do it, GothAlice?
[15:40:55] <GothAlice> NoOutlet: It'd have to be cron, or the equivalent. We use a distributed task execution system (basically the private version of marrow.task) that does scheduled one-off and periodic tasks, too.
[15:40:58] <oznt> GothAlice, that is very helpful. Can you concat the elements of an array if they are strings?
[15:41:28] <macabre> i'm looking for a free mongo gui, any suggestions?
[15:41:42] <NoOutlet> d0x, so yes there are times when you can't completely process the data in a $project step. The general advice is to create the data in a schema that works for your application.
[15:41:51] <cheeser> macabre: check mongodb.org under admin tools
[15:41:54] <GothAlice> oznt: There is http://docs.mongodb.org/v2.6/reference/operator/aggregation/concat/, but I don't think it does quite what you're wanting.
[15:42:00] <macabre> cheeser: thanks
[15:42:45] <NoOutlet> So if you need the domain of URLs a lot, store the domain when you create the document.
[15:42:52] <oznt> GothAlice, no it does not ... I am trying to do the following: [ "A", "B", "C"] should end up as the string "A,B,C"
[15:43:08] <GothAlice> oznt: Alas, you'll have to do that client-side.
[15:43:47] <NoOutlet> But it will be stupid easy client side, on the bright side.
[16:06:36] <izolate> how do you stop mongodb completely? i've done the "use admin; db.shutdownServer();", but if I type mongo, i can still connect to it ?
[16:06:58] <cheeser> can you?
[16:07:47] <izolate> i can. typing "mongo" connects me to the db
[16:08:03] <izolate> but ps aux | grep mongo doesn't have the process
[16:08:57] <cheeser> that seems odd
[16:09:18] <izolate> don't tell me that!
[16:09:37] <GothAlice> --nodb is a thing
[16:11:00] <izolate> what's that?
[16:11:13] <GothAlice> Prevents the shell from connecting to a database on startup.
[16:11:21] <GothAlice> I.e. it would behave as you are experiencing.
[16:12:02] <izolate> it still connected
[16:12:27] <izolate> also, when I do "sudo service mongod start" I get a red "[fail]" - pretty vague. any way to get more information on that?
[16:12:51] <GothAlice> izolate: /var/log/mongod — should be something around there.
[16:13:29] <GothAlice> Depends on your service configuration, though.
[16:14:49] <izolate> "exception in initAndListen: 10310 Unable to lock file: /data/db/mongod.lock. Is a mongod instance already running?, terminating"
[16:15:14] <izolate> seems to be the recurring problem. i have a persistent mongod process. can't kill it? thanks mongodb
[16:15:43] <GothAlice> It's pretty confident that there's a copy running. If you shut it down it's hardly persistent, eh?
[16:16:20] <izolate> i can't shut it down!
[16:16:34] <izolate> db.shutdownServer() doesn't work
[16:16:50] <izolate> and I'm not even convinced it's running since it doesn't exist in the ps aux list
[16:17:01] <izolate> but... "mongo" connects to it just fine
[16:17:02] <GothAlice> izolate: Step one: enhance your calm. This can be worked out. Step two: cd /proc; grep mongo */cmdline — excluding self/cmdline, are there any matches?
[16:17:26] <cheeser> telnet localhost 27017
[16:17:41] <GothAlice> Also that, but that won't help isolate a PID to kill. ;)
[16:17:54] <izolate> Binary file 24856/cmdline matches
[16:17:54] <izolate> Binary file self/cmdline matches
[16:18:04] <cheeser> no, but i'll confirm if something's actually there.
[16:18:24] <GothAlice> Perfect, there's a copy of mongo (or mongod) running as PID 24856. The telnet thing will inform you if it's actually mongod and not something else with 'mongo' in the name.
[16:19:15] <izolate> great, ill try to kill that
[16:19:46] <izolate> actually, that's not the mongod process. that's the mms automation agent
[16:19:54] <cheeser> haha. yeah.
[16:20:05] <izolate> ?
[16:20:06] <cheeser> the agent is probably restarting mongod for you
[16:20:13] <izolate> argh
[16:20:19] <cheeser> so kill the agent
[16:20:21] <GothAlice> I loathe most off-the-shelf automation systems.
[16:20:35] <cheeser> oh, this is bespoke!
[16:20:35] <cheeser> :D
[16:20:51] <izolate> you guys recommend against mms automation?
[16:20:57] <cheeser> i don't
[16:21:06] <cheeser> i highly recommend it.
[16:21:11] <GothAlice> Ah, no, mms is great. Puppet and kin are bad. ;)
[16:21:26] <cheeser> (disclaimer: i work on the mms automation team)
[16:21:52] <izolate> good to have you here
[16:22:13] <izolate> cheeser + GothAlice - thanks for your help. sorry for the frustration!
[16:22:54] <GothAlice> As a note, if you have automation, don't ever expect a db.shutdownServer() to stick. ;)
[16:23:01] <izolate> duly noted
[16:23:57] <GothAlice> (Also disclaimer: I roll my own cluster automation… and still manage to mix mms in there. ;)
[16:26:14] <izolate> i have another problem if you don't mind (mms related)? I get "Host is unreachable" in the mms dashboard. I thought I could fix it by putting "127.0.0.1 $HOSTNAME" into the hosts file, but to no avail. off the top of your head, would you know how to fix this?
[16:27:10] <cheeser> sounds like MMS can't reach the agent?
[16:28:12] <izolate> yeah definitely. but unsure why. incoming ports are limited, but all ports are exposed for outbound
[16:28:44] <izolate> i would imagine it's a one-way response from the mms agent -> outside
[17:07:09] <preaction> i've got a collection that has 1GB of data in it, but has allocated 20GB of space. how do I prevent Mongo from doing that?
[17:08:14] <GothAlice> preaction: You can't, not really. It's allocating "stripes" on-disk for data to grow into. You can enable "smallFiles" on the server to reduce the rate at which it grows that space.
[17:08:37] <blizzy> did I get this right? https://gist.github.com/VBlizzard/e95203eade5674c310b0
[17:09:08] <GothAlice> preaction: See: http://docs.mongodb.org/manual/reference/configuration-options/#storage.smallFiles
[17:09:11] <preaction> GothAlice: thanks. that'll do for now
[17:09:40] <GothAlice> blizzy: Does it run? :)
[17:10:02] <blizzy> GothAlice, I wasn't asking about that, I was asking if the comments were right about what is happening
[17:10:16] <blizzy> sorry, I should have said that.
[17:11:08] <GothAlice> blizzy: What I'm attempting to convey is that questions like that are most easily tested in an interactive Python shell. Copy and paste that script into one, then you can answer your question by typing "db" or "collection" (without quotes) and pressing enter.
[17:11:29] <blizzy> ok, thanks, GothAlice.
[17:11:30] <GothAlice> (That's effectively how I'd answer the question. ;)
[17:13:42] <Guest22055> hi all .. help me please ..
[17:13:56] <GothAlice> Guest22055: Ask your question, don't ask to ask. What do you need help with? :)
[17:14:00] <blizzy> I just got confused between the database and collection. basically, the mongo client can store multiple databases
[17:14:08] <blizzy> a database can contain multiple collections?
[17:14:22] <Guest22055> I have a centos 7.0 machine where can I find 2.6.7 rpm version ?
[17:14:24] <GothAlice> blizzy: *server, not client. The client is just asking the server for access to them.
[17:14:48] <blizzy> so server has multiple databases, database has multiple collections.
[17:14:52] <preaction> yes
[17:14:57] <blizzy> ok, thanks.
[17:15:05] <GothAlice> blizzy: And yes. "Database" is analogous to use of that term in other database engines like MySQL. "Collection" is analogous to a table.
[17:16:33] <blizzy> I'm still kind of confused, but I got the basics of it.
[17:16:39] <cheeser> Guest22055: http://docs.mongodb.org/manual/tutorial/install-mongodb-on-red-hat-centos-or-fedora-linux/
[17:17:03] <blizzy> so a database can contain multiple collections (tables)
[17:17:17] <preaction> it looks like i'm already using --smallfiles, but it still allocates 512MB files (down from 1GB files). is a nightly repairDatabase something I should consider then?
[17:17:20] <blizzy> so could someone give an example?
[17:17:53] <Guest22055> cheeser: I looked into that .. but I did not get the 2.6.7 repo
[17:17:57] <GothAlice> blizzy: See: http://docs.mongodb.org/manual/reference/glossary/#term-database and http://docs.mongodb.org/manual/reference/glossary/#term-collection
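A quick shell illustration of the hierarchy (names hypothetical): one server holds many databases, and each database holds collections.

    use school_project                      // selects (and lazily creates) a database on the server
    db.students.insert({ name: "Ada" })     // "students" is a collection inside that database
    show dbs                                // databases the connected server holds
    show collections                        // collections inside the current database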
[17:19:15] <NoOutlet> Goth, do you have any mongorc enhancements?
[17:19:36] <NoOutlet> Or do you not use the shell enough to worry about it.
[17:19:42] <cheeser> Guest22055: they're there: http://downloads-distro.mongodb.org/repo/redhat/os/x86_64/RPMS/
[17:19:47] <GothAlice> preaction: Depends on your data's inflation rate. I.e. do you frequently bulk-add data which is subsequently sparsely deleted? (That'd leave holes that might not be fillable later, requiring the stripes to grow.) Repair may be excessive, but compact might be an idea if that is the case. See: http://docs.mongodb.org/manual/reference/command/compact/
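For reference, compact is a database command run against the mongod that holds the collection (it blocks the database while it runs); the collection name here is illustrative:

    db.runCommand({ compact: "mycollection" })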
[17:20:01] <GothAlice> NoOutlet: I use the shell a lot. But it's mostly the Python shell, with all my MongoEngine goodies.
[17:20:26] <GothAlice> NoOutlet: This is important since I use MongoEngine's signal/event support to simulate triggers a lot.
[17:21:30] <preaction> GothAlice: the documents are long-lived, but have small bits added to them over time. I explicitly cap the number of bits I allow, so there's a limit to how big the documents get.
[17:22:13] <NoOutlet> I was wondering because I've been using mongo-hacker. It's got some niceties, but I think it's a bit too heavy, buggy, and version-specific.
[17:22:16] <GothAlice> preaction: Well, there's the 16MB hard limit. You may wish to investigate adjusting the document padding size to require moving records, and thus leaving holes, less frequently.
[17:24:38] <preaction> my avgObjSize is 113708, so you're saying if i adjust the padding, it'll leave more space for the updates so it doesn't have to move the document somewhere else, leaving a hole?
[17:25:12] <GothAlice> preaction: Aye. See also: http://blog.pythonisito.com/2012/09/mongodb-schema-design-at-scale.html ("Growing documents" section.)
[17:25:59] <preaction> GothAlice: thanks
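One knob available in this era for the padding problem is power-of-two record allocation via collMod, which leaves growth room so updated documents are moved (and leave holes) less often; the collection name is illustrative:

    db.runCommand({ collMod: "mycollection", usePowerOf2Sizes: true })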
[18:23:21] <abeard> I’m not sure the best way to ask this, but I’m trying to find out what can be done to optimize a less than query on a single indexed date field. I was expecting it to be a pretty quick query, but on two systems running mongo 2.6.7 it seems to be running suspiciously slowly
[18:24:00] <cheeser> did you index?
[18:24:11] <abeard> Yes
[18:24:27] <cheeser> pastebin the output of getIndexes() and your query, por favor.
[18:27:26] <abeard> http://pastebin.com/76PECaKt
[18:32:52] <abeard> I was trying to use a similar query to age out anything older than 3 weeks ago from the table, but it seems like trying to do a less than query on a single indexed date field is killing me
[18:41:55] <abeard> One thing I’m noticing is that if I run the query with explain() indexOnly is set to false
[18:42:17] <abeard> Which surprises me, since the only criteria in my query is $lt an indexed field
[18:56:37] <medmr> is the index in the right direction?
[18:57:27] <medmr> I don't remember if it matters in that case
[18:57:49] <medmr> but if you are doing $lt on "field"
[18:57:59] <abeard> All the docs I read said direction didn’t matter for single-field indexes
[18:58:09] <medmr> oh im thinking of sorting
[18:58:18] <medmr> like if it was sort -1 you'd want {field: -1}
[18:58:27] <medmr> to avoid scanandorder
[18:59:05] <abeard> right, but I don’t actually care about order, just finding the cutoff
[18:59:26] <abeard> I was under the impression it wouldn’t matter, but I could be wrong
[18:59:59] <medmr> what is the type of the field you are $lting on
[19:00:17] <medmr> ISODate?
[19:00:20] <abeard> yes
[19:00:41] <medmr> i have found that indexing on a string version of the date instead of an actual date object gives better cardinality
[19:00:46] <medmr> and faster lookups
[19:00:57] <medmr> which is kind of dumb at first glance
[19:01:10] <abeard> Really? That seems bizarre
[19:01:12] <medmr> that a field meant for a given data type would not be best at it
[19:01:13] <medmr> yes
[19:02:04] <medmr> looking up using $lt and $gt on a "YYYY-MM-DD HH:MM:SS" string has performed significantly better than doing it on an actual ISODate field
[19:02:30] <abeard> wow that’s… kinda scary
[19:02:55] <medmr> this was last tested in 2.4
[19:02:58] <medmr> not sure if that's changed
[19:03:02] <abeard> I guess I can spin up a test DB and give it a shot
[19:03:07] <medmr> yeah try it out
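A sketch of checking what the optimizer actually does for this kind of query (field and collection names illustrative). Note that indexOnly only becomes true for covered queries, i.e. when every returned field is in the index and _id is excluded, so indexOnly: false by itself doesn't mean the index was skipped.

    db.events.ensureIndex({ createdAt: 1 })
    var cutoff = new Date(Date.now() - 21 * 24 * 3600 * 1000);   // three weeks ago
    db.events.find({ createdAt: { $lt: cutoff } }).explain()
    // in 2.6, "cursor" should report BtreeCursor createdAt_1 if the index is used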
[19:18:33] <ubique> Hi! Is there any way to know which member of the replica set produced a change, while reading the oplog?
[19:28:58] <GothAlice> ubique: Primaries are the only hosts that can make changes in a replica set, and there can only be one.
[19:30:14] <ubique> GothAlice: ok, I will rephrase the question - is there any way to know which client made a change through the primary to a replica set?
[19:30:59] <MacWinner> i wanted to convert my replica set into a sharded cluster using the steps here: http://docs.mongodb.org/manual/tutorial/convert-replica-set-to-replicated-shard-cluster/ I wanted to see if I could do it on my production system without any downtime.. assuming I'm writing data into my existing replica set every 3 seconds. my current data pumper is pointed directly at the replica set.. if I convert it to a shard, will any new data that is directed
[19:30:59] <MacWinner> straight at the shard (not via mongos) be part of the sharded cluster?
[19:31:38] <MacWinner> i was doing some tests yesterday, and it seemed like if the data wasn't written via mongos, then it was not available to the cluster..
[19:32:39] <MacWinner> i'm planning on scheduling downtime.. but just more curious about if it's possible
[19:54:52] <edrocks> do I need to persist any data for an arbiter? ie if I run my arbiter in docker should I still use a volume for its data?
[21:37:20] <ladessa> Hi
[21:44:41] <daveops> edrocks the only "data" the arbiter needs is its config afaik
[21:45:43] <edrocks> daveops: I was testing it and It seems to be working fine on the same node even if i delete the container and rerun it without any volumes. I just have to pass in the name of replica set
[21:53:00] <daveops> edrocks awesome... that's what i meant by config. glad it worked for you.
[22:07:52] <MacWinner> i was reading this post about why fullcontact switched to cassandra: https://www.fullcontact.com/blog/mongo-to-cassandra-migration/ Is their reasoning FUD? I feel like they could have configured read operations to come from secondary nodes?
[22:17:46] <GothAlice> MacWinner: Without having read the article yet (it's processing…) there are a dozen FUD posts for every one with actually valid points.
[22:18:16] <GothAlice> https://blog.serverdensity.com/does-everyone-hate-mongodb/ is an excellent post for helping categorize the typical complaints.
[22:18:28] <MacWinner> GothAlice, yeah.. i saw that one..
[22:19:20] <MacWinner> i personally love mongo.. I only have a lingering concern in the back of my mind that things may slow down at scale.. we are not near scale yet. i also live in SF where it seems trendy to hate on mongo for no specific reason
[22:21:04] <GothAlice> "One day we realized our ‘data burn’ was on the order of 20GB a day and we had less than 200GB remaining on our SSDs." — Failure of monitoring and automated scaling. "We pushed Mongo onto the best database hardware AWS could provide…" — thinking just throwing hardware at a problem will solve a more fundamental architectural misunderstanding. "We could have done it in parallel, but…" — Admitting sharding was the right
[22:21:04] <GothAlice> solution, but they were unwilling to do it.
[22:21:10] <GothAlice> So, yeah, a reasonable amount of FUD spread throughout that.
[22:22:00] <fewknow> MacWinner: I have used mongo at scale and seen it actually speed up from scaling. The issue isn't mongo...it is the users of mongo. Not knowing how to design schema or access patterns
[22:22:13] <fewknow> that is what happens when you let developers try to use a database
[22:22:15] <salmaksim> FUD = ?
[22:22:22] <GothAlice> They also patched problems out instead of rewriting them out (law #50) resulting in duplicated code. FUD = Fear, Uncertainty, and Doubt
[22:22:27] <salmaksim> Are people fuddy duddying about?
[22:22:47] <MacWinner> salmaksim: fear uncertainty and doubt
[22:23:12] <GothAlice> fewknow: Instead of proper database engineers, yeah.
[22:23:39] <morenoh149> could someone explain to me - is there some limitation (or feature not allowed by mongodb) regarding one-to-many relationships?
[22:23:50] <GothAlice> morenoh149: MongoDB does not have relationships.
[22:23:58] <morenoh149> something to do with allowing indexing on array fields?
[22:24:06] <fewknow> morenoh149: you shouldn't be using one to many relationships ... or simulating them
[22:24:06] <morenoh149> GothAlice: refs rather
[22:24:25] <GothAlice> Danger, Will Robinson, danger!
[22:24:34] <GothAlice> http://www.javaworld.com/article/2088406/enterprise-java/how-to-screw-up-your-mongodb-schema-design.html
[22:24:39] <fewknow> morenoh149: don't use arrays to store many relationships
[22:24:40] <salmaksim> If you wanted that you could create a scheme for making connections between things though, right?
[22:25:18] <fewknow> morenoh149: what are the relationships you have? why can't you use nested structure?
[22:25:53] <morenoh149> asking for a friend https://github.com/keystonejs/keystone/issues/1075#issuecomment-74867092
[22:26:27] <GothAlice> Problem: you have a user, and users are associated in some way with invoices. User A has placed 400 orders. You want to query them, so you get the user, pull the list of invoice IDs, then re-query the invoices collection passing in the most goddess-awful query ever imagined. (And $in with 400 elements.) User B has placed 60,000 orders, tries to place another, and the system explodes when it is unable to add the invoice ID to their
[22:26:27] <GothAlice> account—you've reached the 16MB per document limit.
[22:26:34] <GothAlice> morenoh149: ^
[22:26:57] <fewknow> morenoh149: you don't want to have relationships like that in mongo. It would be better to duplicate data than to have relationships
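A sketch of the usual alternative: keep the reference on the "many" side so neither document grows without bound, and page through with an indexed query instead of a giant $in (names illustrative).

    var userId = ObjectId();   // stands in for an existing user's _id
    db.invoices.insert({ user_id: userId, total: 129.95, placed: new Date() })
    db.invoices.ensureIndex({ user_id: 1 })
    db.invoices.find({ user_id: userId }).sort({ placed: -1 }).limit(20)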
[22:27:21] <salmaksim> lol @ this risk "Risk writing a ranty hate blog that makes you look ignorant or worse"
[22:27:28] <GothAlice> lol
[22:27:48] <salmaksim> and crusty
[22:27:56] <GothAlice> Or incompetent.
[22:28:18] <GothAlice> And proudly so, if writing a post to advertise it. ;)
[22:28:28] <salmaksim> i have to admit that switching from doing sql databases for a long time broke my brain when i started using mongo.
[22:28:32] <salmaksim> i felt naked
[22:28:36] <salmaksim> and also confused
[22:28:49] <salmaksim> no ranty hate blogs came out of it though
[22:29:21] <fewknow> salmaksim: at least you thought about it...developers just start writing data because mongo will take the object model they are using in their code
[22:29:38] <fewknow> then months down the road they want to know why it is slow...or they can't do something.....
[22:29:41] <GothAlice> salmaksim: It isn't the easiest transition. That "how-to-screw-up" post covers many of the data modelling points, plus the many SQL vs. Mongo query comparison tables, esp. for aggregation. MongoDB really forces you to think about how you are going to *use* your data, rather than encouraging the fantasy of a "perfect schema".
[22:29:43] <MacWinner> i really fell in love with mongo after I was able to push json documents into it from AngularJS
[22:29:57] <fewknow> before the database would prevent you from messing up...now (with mongno) you can fuck yourself if you don't know what you are doing.
[22:30:07] <fewknow> MacWinner: that is horrible
[22:30:08] <GothAlice> MacWinner: That's the kind of thing fewknow is talking about biting devs. ;)
[22:30:18] <fewknow> lol
[22:30:41] <fewknow> MacWinner: you should have a data access layer written by a data engineer that you use to write to the database
[22:30:56] <fewknow> so you don't just store JSON and mess up your scaling later
[22:31:01] <fewknow> or blame mongo for something
[22:31:19] <MacWinner> it was a stopgap solution.. i've since come to love it for many other reasons
[22:31:26] <GothAlice> I go a step further. I have an AMVC separation, with API at the front. Controller speaks API and returns a view, API queries model, view consumes the data returned by the API.
[22:31:52] <salmaksim> GothAlice: i saw that it was about just using my data rather than dating it, getting beat up, and rejected by it
[22:32:33] <fewknow> GothAlice: i have API->DAL->mongo
[22:32:45] <fewknow> so it could be API->DAL->elastic,sql,mongo,etc
[22:32:55] <fewknow> API always returns JSON to the UI
[22:32:55] <GothAlice> salmaksim: Indeed. Relational modelling is like an abusive spouse… typically before the invention of divorce due to lock-in. EAV (entity-attribute-value) is a testament to this.
[22:33:36] <GothAlice> Plus the many different models for inheritance. (Multi-table concrete, multi-table sparse, single-table, the dreaded EAV…)
[22:33:45] <fewknow> BTW...this is not mongo but doesn't anyone have experience writing UDF's in PIG?
[22:33:52] <fewknow> does*
[22:34:44] <salmaksim> fewknow: i think i made it through my years without writing a single one
[22:35:05] <salmaksim> *arms up in celebration*
[22:35:20] <fewknow> salmaksim: i need to convert a timestamp to INT96 for impala....using pig
[22:35:31] <fewknow> well a java UDF in PIG
[22:35:53] <fewknow> and the #PIG is invite only...lol
[22:36:16] <salmaksim> are you feeding the PIG?
[22:36:45] <GothAlice> Always remember: PIG loves you.
[22:37:23] <fewknow> feeding it from HDFS with an avro schema ... converting to parquet adding timestamp and putting in impala
[22:37:37] <NoOutlet> Get the zookeeper to do it.
[22:37:52] <fewknow> lol
[22:38:47] <GothAlice> http://cl.ly/image/0t273e2e000f at least my dashboard is looking even more dashboard-y today. ¬_¬ Now to actually wire up those aggregate queries…
[22:40:07] <NoOutlet> Yeah, looking good.
[22:40:40] <GothAlice> A breakout by type pie chart thingy will go in the large block of blank space.
[22:42:08] <NoOutlet> I was going to suggest making those three block divs bigger to fill the space, but a pie chart would be nicer.
[22:42:16] <NoOutlet> Or maybe just a picture of some pie.
[22:42:39] <NoOutlet> http://i.kinja-img.com/gawker-media/image/upload/s--rc5NiOhM--/18zqjbdo1l118jpg.jpg
[22:42:44] <GothAlice> It's important to see the breakout of the distribution of use of print vs. pay-per-click, etc.
[22:43:58] <GothAlice> (Also, all numbers from that screenshot are static and fake, FYI, except for the line/area and doughnut charts.)
[22:44:02] <NoOutlet> Print? Like newspaper, flyers, magazines?
[22:44:05] <GothAlice> Aye.
[22:46:12] <NoOutlet> How can you know when it was print? Special URL on those materials? Or just by deduction (when it isn't from a search page or an ad)?
[22:46:44] <NoOutlet> I'm actually currently working on a BI project too.
[22:47:11] <GothAlice> We do end-to-end tracking and link shortening for our job distributions. We can track an intention to apply for a job back to not just the job, but the individual order (in case a job is distributed multiple times).
[22:49:05] <morenoh149> the answer to my question was - multikey indexes can't be done on more than one array field http://docs.mongodb.org/manual/core/index-multikey/#limitations
[22:49:38] <GothAlice> morenoh149: Additionally, $elemMatch can only match within one list per query, AFIK.
[22:51:00] <GothAlice> However, except for certain situations with extremely small lists that don't grow or have bounded growth, storing one:many that way is The Wrong Solution™.
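A quick illustration of the limitation morenoh149 found - a compound index can cover at most one array-valued field per document (names illustrative):

    db.things.insert({ tags: ["a", "b"], sizes: [1, 2] })
    db.things.ensureIndex({ tags: 1, sizes: 1 })   // fails with "cannot index parallel arrays"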
[22:54:37] <GothAlice> NoOutlet: https://gist.github.com/amcgregor/94427eb1a292ff94f35d is an example "record this click" upsert.
[22:55:37] <GothAlice> (With '__' replaced with '.' where appropriate.)
[22:57:15] <GothAlice> NoOutlet: And https://gist.github.com/amcgregor/1ca13e5a74b2ac318017 is an example aggregate query w/ complete sample record.