[00:03:59] <harttho> Kind of a silly question, but what's the standard way to name Databases/collections
[00:04:06] <harttho> databaseName vs. database_name
[00:04:10] <harttho> collectionName vs collection_name
[00:06:12] <lexi2> wanted to know: with tag-aware sharding, if I update a doc and the update causes it to move to another tag or range, what's the process that takes place?
[00:06:18] <Boomtime> harttho: that is entirely subjective
[00:08:36] <Boomtime> you can try, but any update that attempts to change the shard key of a document will be rejected
[00:09:27] <Boomtime> if you change the tag ranges themselves though i believe that it just triggers regular migrations.. but i'm not certain actually
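For reference, tag ranges are declared with the sh helpers, and changing them does trigger regular balancer migrations for chunks that no longer match their tag; a minimal sketch (the shard name, namespace, and range values here are made up):

    sh.addShardTag("shard0000", "NYC")
    sh.addTagRange("records.users", { zipcode: "10001" }, { zipcode: "10281" }, "NYC")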
[05:13:35] <speaker1234> any suggestions on how to detect duplicate records? The only thing I can think of is adding a SHA-1 of all the variable fields and searching on that.
[05:41:45] <Boomtime> speaker1234: can you define what you mean by duplicate records? _id must be unique across all documents, so it is theoretically impossible to have precisely duplicate records
[05:42:55] <speaker1234> The situation is that a customer of mine is sending out batches of records and every so often they send out a duplicate batch because their side burped
[05:43:16] <speaker1234> I have to do data normalization on what they send me so I get to handle all the fun filtering. :-)
[05:45:51] <Boomtime> ok, your problem is with the data you are receiving from some third party?
[05:46:13] <Boomtime> is there some sort of key in these records? how do you know they are repeats?
[06:23:56] <speaker1234> Boomtime, sorry missed your response.
[06:24:45] <speaker1234> Boomtime, the only way I can tell if a record is a dupe is if certain fields are the same.
[06:27:37] <Boomtime> speaker1234: if certain fields being the same indicate a duplicate, then make a unique index on the combination of those fields
[06:28:05] <Boomtime> mongodb would not allow the duplicate to enter the database, specifically giving you the error "duplicate key"
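A minimal sketch of that suggestion, assuming the duplicate-defining fields are batchId and payloadHash (the real names depend on speaker1234's schema); in the 2.4-era shell ensureIndex is the usual helper:

    // any later insert with the same combination fails with a "duplicate key" error
    db.records.ensureIndex({ batchId: 1, payloadHash: 1 }, { unique: true })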
[06:28:24] <sijojose> Hi, I've a collection called trainings, inside that there is a field courses:[{id:1,name:'abc'},{id:2,name:'def'},{id:3,name:'fgh'}] , how can I query for a specific course by using its id....?
[06:33:03] <Boomtime> sijojose: db.trainings.find({"courses.id": <id>}, {"courses.$": 1}) or something like that
[06:35:49] <sijojose> Boomtime: let me see.. thanks
[06:37:26] <sijojose> Boomtime: one more thing: for inserting values, is this approach courses:[{id:1,name:'abc'},{id:2,name:'def'},{id:3,name:'fgh'}] correct, right..?
[06:41:11] <sijojose> Boomtime, in the trainings collection courses is a field.. I'm inserting values into the courses field like that... in SQL, courses would be another table with a foreign key reference to the trainings table
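Putting Boomtime's answer together with that embedded-array layout, something like the following should cover both the query and the insert (the collection and values follow sijojose's example; the rest is assumed):

    // fetch only the matching embedded course via the positional ($) projection
    db.trainings.find({ "courses.id": 2 }, { "courses.$": 1 })
    // append another course to an existing training document
    db.trainings.update({ _id: someTrainingId }, { $push: { courses: { id: 4, name: "ijk" } } })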
[09:00:18] <optiz0r> Morning all, I've been banging my head against an authentication issue for a few hours. Could anyone help me with https://gist.github.com/optiz0r/f8ba0b8d382ab0884191 I can login via the shell with the user's credentials and run show collections but attempting to do the same via mongoengine gives an authorisation failure
[09:07:45] <optiz0r> looks like there was quite a lot of change relating to authentication between 2.4 and 2.6 so I should mention this is using 2.4.6
[10:54:02] <joannac> optiz0r: where's the authentication database in your test?
[10:54:54] <optiz0r> joannac: as in which database was the user created in? the pulp database
[10:56:58] <optiz0r> I also tried creating the user in the admin database, then adding it with userSource:"admin" in the pulp database; and adding it in the admin database with otherRoles:{"pulp":["readWrite","dbAdmin"]}. Both resulted in the same error message as when the user was defined in the pulp database directly
[11:00:51] <joannac> can you see the connection from your test program authing successfully in the mongod log?
[11:03:43] <optiz0r> joannac: I see the connection but not an auth attempt logged (successful or otherwise) https://gist.github.com/optiz0r/f8ba0b8d382ab0884191#file-mongodb-log Do I need to bump up the logging verbosity to see that?
[11:06:32] <joannac> optiz0r: I thought it would show up at default log level
[11:08:39] <joannac> yup just confirmed, should show up at default
[11:09:21] <joannac> where's the actual start of the connection?
[11:09:38] <joannac> is that all of the log entries for that connection?
[11:26:59] <optiz0r> joannac: sorry had to pop away from the desk for a few moments. you're right, there was an extra line I omitted (there are replication heartbeats every few minutes so the log is somewhat noisy). I'm slightly confused that it attempted authentication as __system with a key. I do have a keyfile for replication, but I'm explicitly attempting user/password authentication from the code
[11:27:13] <optiz0r> https://gist.github.com/optiz0r/f8ba0b8d382ab0884191#file-mongodb-log updated with the extra lines
[11:27:47] <optiz0r> and indeed that the authenticate db is local, rather than pulp
[11:36:41] <optiz0r> ok ignore that, I was grepping on the connection id and pulled in a line from another day by mistake. Even bumping up verbosity to 4 I don't see an authenticate line from my test script
[11:37:16] <joannac> right. well if it's not authing, that explains why it's not working
[11:39:51] <joannac> optiz0r: you'll need to get your program to actually auth :)
[11:40:14] <optiz0r> joannac: indeed, perhaps a bug in mongoengine then. thanks for your help :)
[13:41:34] <jmacdonald> As a beginner, it seems there are enough subtle administrative differences between 2.4.9 and 2.6.x that I should definitely go with 2.6 for updated docs. Is this assumption correct? I ask because Ubuntu 14.04 has 2.4.9 by default and I have to source up a 3rd-party repo to get 2.6
[13:41:53] <StephenLynx> yes, use the 3rd party repo.
[13:43:10] <jmacdonald> I don't actually use mongo, but i'm being asked to admin a dev setup of it, so i'm more wrapping my mind around roles for doing backups and whatnot.
[13:43:40] <jmacdonald> and yes, using official repo.
[14:28:38] <Folkol> Hello. Mongod just filled up my disk and crashed. The data is not important, so I would like to drop the database to free some disk space, but I cannot start mongod because there is no disk space left for the journal... Can I remove the datafiles manually, or will that leave mongo in an inconsistent state?
[14:29:03] <cheeser> well, if you don't care about the data, just blow it away.
[14:29:10] <kali> erase everything under the dbpath directory
[14:29:50] <cheeser> UberG0Su: anything to do with Gosu the language?
[14:30:42] <UberG0Su> @cheeser: nope rather with sc:bw ;p
[14:32:11] <cheeser> i don't remember that part...
[14:36:08] <jiffe> is there a way I can start a replica member and have it skip the first operation it will try to sync?
[14:52:07] <dschneider> I tried to update to mongodb 2.6.7 (with yum) but the rpm repository seems to be broken: http://downloads-distro.mongodb.org/repo/redhat/os/x86_64/repodata/primary.xml.gz: Metadata file does not match checksum
[14:52:46] <dschneider> Is this really a problem at mongodb.org or am I doing something wrong?
[14:53:04] <StephenLynx> I just updated a couple of hours ago, it's probably on your side.
[14:53:55] <StephenLynx> http://docs.mongodb.org/manual/tutorial/install-mongodb-on-red-hat-centos-or-fedora-linux/ this is what I use.
[15:01:23] <dschneider> StephenLynx: I have the same mongodb.repo configuration as in the tutorial and It was working before. I also did a "yum clean all" before I tried to update.
[15:35:22] <catphish> i'm planning to store power consumption of a large number of electronic devices (10s of thousands) at 1 minute intervals, i have a couple of basic questions about how to organize the data
[15:36:34] <catphish> firstly, is it common to use separate collections to shard data like mine on a per-device basis, or would you normally throw them all together into one collection, as I would have done with a single table in a SQL database?
[15:37:18] <catphish> and secondly, given that my data set will very quickly exceed available storage, does mongo offer anything to automatically aggregate averages of old data (similar to rrd)
[15:38:45] <StephenLynx> afaik, as long as you don't use $group, a sharded deploy will not have issues with a collection that is distributed among the machines
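On catphish's second question: MongoDB has no built-in RRD-style downsampling, but it can expire old raw documents automatically via a TTL index, provided the indexed field is a BSON Date rather than a string; a sketch with assumed collection and field names:

    // documents are removed automatically ~72 hours after their createdAt date
    db.readings.ensureIndex({ createdAt: 1 }, { expireAfterSeconds: 259200 })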
[16:07:11] <elux> hello.. will 2.8.0 be a drop-in replacement for 2.6.6 ..?
[16:13:08] <Torkable> can anyone comment on experience with the two-phase commit pattern or any alternatives?
[16:13:26] <Torkable> best comparison I could find was
[16:13:47] <Torkable> a job queue seemed like the only real alternative
[16:17:09] <cheeser> dschneider: any luck updating?
[17:08:33] <proteneer> my secondaries are stuck in { "$err" : "not master and slaveOk=false", "code" : 13435 }
[17:08:44] <proteneer> ie. it's not synced to the master
[17:08:52] <proteneer> but rs.status() shows that health etc. is fine
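That particular error is normally client-side rather than a sync problem: secondaries refuse reads unless the connection opts in. In the 2.x shell the opt-in looks like this (the collection name is illustrative; drivers expose the same thing as a read preference):

    rs.slaveOk()                                       // allow reads on this secondary connection
    db.records.find().readPref("secondaryPreferred")   // or set a per-query read preference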
[17:10:48] <Chubbs> I am building an API where I need to append a few fields to a mongo result (in Node.JS) before sending it as JSON, however whenever I try and add a key value pair to my mongo object it does not appear. I can change the value of an existing field without issue however. Is there some way to make this object modifyable or do I need to clone the result into a new object before I can alter it?
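If the result comes from an ODM such as Mongoose (an assumption; Chubbs doesn't say which library), the document is a model instance that drops unknown keys when serialized, and converting it to a plain object first is the usual fix:

    // toObject() is Mongoose-specific; extraField and the Express-style res.json are illustrative
    var obj = doc.toObject();
    obj.extraField = "computed value";
    res.json(obj);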
[18:14:39] <StephenLynx> "There's a whole stack of ideas that combine to make channels and transducers valuable, but I'll pick just one: events are a bad primitive for data flow."
[18:14:44] <StephenLynx> they solve the same problem.
[18:15:09] <StephenLynx> you can just have a secondary function as a property to the callback.
[18:15:39] <StephenLynx> they don't solve the same problems
[18:15:45] <matejjjj> collection of {array:[1,2,3]},{array:[1,2,4]},{array:[2,1,2]} and to return all objects where array has [1,2,*anyNumber] I didn't succeed, it always returns random positions and numbers
[18:15:47] <StephenLynx> because switches are a better fit for primitives
[18:16:15] <StephenLynx> that you can safely just do a boolean comparison instead of an equality check.
[18:16:37] <StephenLynx> with events and callbacks it is the same thing. you have a key and a function.
[18:16:49] <matejjjj> ok thanks guys, I will look more
[18:16:51] <StephenLynx> but with events you have a string and an object instead of a property in a function
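On matejjjj's earlier question: a bare {array: 1} matches a 1 at any position, which is why the results look random; dot-notation indexes pin the first two elements while leaving the third free:

    // matches [1,2,3] and [1,2,4] but not [2,1,2] (collection name is illustrative)
    db.coll.find({ "array.0": 1, "array.1": 2 })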
[19:23:09] <jiffe> is there a way I can start a replica member and have it skip the first operation it will try to sync?
[19:36:47] <Mmike> Hi, lads. When I do rs.initiate() that command returns 'ok' (assuming I didn't make a syntax error or provide a bogus rs.config or such), but the replica set is initialized only later. Subsequent calls to replSetGetStatus show that Mongo first throws OperationalFailure with 'replset being initialized' text, then goes through the states startup->startup2->recovering->secondary->primary.
[19:37:02] <Mmike> Is there a situation where going through this can stop with a failure?
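A minimal shell sketch of waiting out that sequence by polling (it assumes the happy path Mmike describes and doesn't diagnose failures):

    // poll replSetGetStatus via rs.status() until a PRIMARY shows up
    while (true) {
        var s = rs.status();
        if (s.ok && s.members.some(function (m) { return m.stateStr === "PRIMARY"; })) break;
        sleep(1000);  // mongo shell helper, milliseconds
    }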
[19:41:57] <bmbouter> wow mongodb 2.4 is difficult to configure
[19:42:25] <bmbouter> I set the parameter enableLocalhostAuthBypass=0 when I start mongod
[19:42:56] <bmbouter> and yet in the verbose output I still see 'note: no users configured in admin.system.users, allowing localhost access' in the output
[19:51:37] <harttho> Do you have any users defined?
[19:54:44] <harttho> Your 'note' would suggest you don't
[19:55:57] <bmbouter> I have a user defined on the database of interest
[20:05:35] <bmbouter> and now I have an admin user
[20:05:49] <harttho> Does it work as you would like now?
[20:06:11] <bmbouter> well I have auth now, but I need this to have certain roles on the pulp_database
[20:06:27] <bmbouter> so I can make a second user with fewer privs
[20:06:51] <bmbouter> but do I make it in the admin db like we did here, or do I add it into my specific db of interest 'pulp_database'
[20:06:59] <harttho> I'd have to brush up, but I think if you have the second user within the pulp_database, you can restrict things within that database under the umbrella of the admin user
[20:07:10] <bmbouter> I was looking at db.system.users.find() where db is pulp_database
[20:07:28] <bmbouter> do you always configure users in 'admin' or on the database itself 'pulp_database'
[20:08:58] <harttho> "Authentication requires at least one administrator user in the admin database. You can create the user before enabling authentication or after enabling authentication."
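In 2.4 the sequence harttho is quoting looks roughly like this (user names and passwords are placeholders; 2.6 renamed addUser to createUser):

    use admin
    db.addUser({ user: "admin", pwd: "secret", roles: ["userAdminAnyDatabase"] })
    use pulp_database
    db.addUser({ user: "pulp", pwd: "secret", roles: ["readWrite", "dbAdmin"] })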
[21:18:14] <jayjo> If I'm running a mongo daemon on my server to hold my data, what is the best way to ensure that data's integrity? Do I copy it to a local machine every night with a cron job or spin up a separate server for that same task?
[21:22:07] <joshua> I think that depends on how important the data is and what your resources are. You can do a database dump during off-peak hours if you have a period where it won't interfere, you can run a replica set and use one of those nodes to do a filesystem snapshot, or you can snapshot the filesystem where you have it now.
[21:30:03] <cheeser> jayjo: use a replica set and mms backup
[21:33:55] <joshua> Yeah if you can leverage MMS, go for it. Saves having to figure out all the stuff on your own
[21:59:39] <theRoUS> is there a way to apply .distinct() to a .find() resultset? i.e., i want to find the distinct values for a field within a subset of the collection rather than the whole collection
[22:01:32] <theRoUS> derrr, never mind, didn't read far enough
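For anyone else wondering: distinct() takes a query document as its second argument, which restricts it to exactly such a subset (the names here are made up):

    db.orders.distinct("status", { customerId: 42 })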
[22:16:44] <FunnyLookinHat> If I have a bunch of items in a collection with the same indexed key - is there a fast way to update each to have a new value for a different key rather than looping each record?
[22:17:44] <FunnyLookinHat> Ah I just wasn't googling right: http://docs.mongodb.org/manual/reference/method/db.collection.update/
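The relevant detail from that page: update() only touches the first match unless multi is set, so one statement covers every document sharing the indexed key (field names assumed):

    db.items.update(
        { indexedKey: "sharedValue" },
        { $set: { otherKey: "newValue" } },
        { multi: true }
    )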
[22:20:10] <catphish> i am looking to store large quantities of time-based numerical data, too much to reasonably store, so i'd like to store hourly, daily averages automatically and delete older data, is this easy to achieve?
[22:21:25] <kexmex> hey guys, i've found records with a missing field, and the code didn't change
[22:21:35] <kexmex> i looked through it, and besides save(), i see only updates with $set on specific fields
[22:32:27] <kexmex> although maybe i should check for documents missing those other fields as well
[22:32:35] <catphish> Torkable: thanks for your help by the way, i'm afraid i'm quite new to mongo so i hope i'm explaining myself clearly
[22:32:59] <joannac> kexmex: so the field is there, but no value?
[22:32:59] <catphish> as I said, my input data is {"device":1, "timestamp": "2015-01-01 00:00:00", "value":100} for many values of timestamp and device, and I will delete data in this collection older than approx 72 hours. I then want hourly averages for each value of device (I assume in another collection), however it is probably insane to completely regenerate all the hourly averages every time
[22:34:51] <Torkable> catphish, read the docs on aggregation queries and compound indexes
[22:35:28] <Torkable> may as well learn how to do it
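A minimal sketch of the hourly roll-up with the aggregation framework, using the field names from catphish's sample document (the collection name and the string-timestamp slicing are assumptions):

    // per-device hourly averages over recent readings; "2015-01-01 00:00:00"-style
    // strings sort lexicographically, so $gte works as a time cutoff
    db.readings.aggregate([
        { $match: { timestamp: { $gte: "2015-01-05 00:00:00" } } },
        { $group: {
            _id: { device: "$device", hour: { $substr: ["$timestamp", 0, 13] } },
            avgValue: { $avg: "$value" }
        } }
    ])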
[22:36:13] <kexmex> joannac: could it be that an update makes it to server before the insert? :)
[22:36:24] <kexmex> although the update is not touching that particular field
[22:36:27] <catphish> Torkable: thanks, am i right in thinking i can just overwrite data in my "hourly_averages" collection at regular intervals, looking only at data from he main table from the previous 120 minutes, running the query every 30 minutes, to ensure everything is correct?
[22:45:44] <kexmex> maybe the thread that was supposed to do the insert got aborted
[22:45:48] <kexmex> before the update went out over the wire
[22:46:25] <kexmex> i am guessing i should have a writer thread for this and i should just queue commands and have the writer thread pick them off one by one in proper order
[22:52:20] <kexmex> doesn't the Mongo driver have its own queue?
[22:52:47] <kexmex> so when i do collection.Save(), i am handing it over, and if the thread that called .Save() dies, that shouldn't prevent the insert from going through, should it?