[00:02:28] <joshua> unholycrab: I think you need to have them separated, because if one goes down and you have multiple config servers there, they all go down and it defeats the purpose. It might work OK sharing resources with a mongod node, but you can also use a really lightweight machine. Config servers don't need as many resources
[00:11:11] <valonia-> So I have two fields, one named level and one named maxlevel
[00:11:27] <valonia-> If I want to remove the ones where level == maxlevel
[00:20:50] <ruphos> it is precisely the mongo equivalent to delete
[00:21:09] <valonia-> joshua: so you are suggesting that I would use a find and then remove the ones I found?
[00:21:47] <joshua> Never done it with PHP, but here's the remove manual for that: http://www.php.net/manual/en/mongocollection.remove.php
[00:22:41] <joshua> I don't know how the where operator works in that case, you might be able to do the two in one command
[00:24:34] <joshua> the remove syntax says the first argument is the query, so on the mongodb shell at least you should be able to pass the same query to remove as you do to find
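(A minimal sketch of that, using pymongo 2.x-era calls; the db/collection names are hypothetical, and valonia- is on the PHP driver, where MongoCollection::remove() takes the same criteria array.)

    from pymongo import MongoClient

    players = MongoClient()["game"]["players"]   # hypothetical db/collection names

    # Comparing two fields of the same document needs $where (server-side JS),
    # which is slow on big collections, so check with find() before removing.
    criteria = {"$where": "this.level == this.maxlevel"}
    print(players.find(criteria).count())   # how many documents would go
    players.remove(criteria)                # remove() takes the same query as find()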
[00:36:44] <Zitter> Hi, I've found this collection http://media.mongodb.org/zips.json I'm new to mongo, is there a tutorial focused on that data?
[05:15:30] <fluter> so I should use python-pymongo, as it's the newer version, right?
[06:44:06] <Rahar> Hi everybody! I need advice on a db choice/design. Say I have millions of user ids and billions of post ids. I need to store whether a user has seen a post or not and later query that data. The query is something like "has this user seen this post?". The updates are done very frequently (each time a user reads a post). Any suggestions: use mongo in this case, or go with another type of db?
[07:36:21] <txt23> I have a CSV file like this http://pastebin.com/4Uy3wgK8 where the first row has the column names and the rest is data. How can I import it into mongodb via mongoimport? Can someone please guide me?
[07:37:02] <iwantoski> if my documents define a type as a string, i.e. { type: 'Answer' }, { type: 'Question' }, { type: 'Other' }, can I order by the type field like so: Question, Answer, Other? Basically, specify what priority each type (string) gets?
[07:37:10] <txt23> I know technically it will be --type csv --file /opt/backups/contacts.csv but how will the column names work?
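(For reference, a sketch of the mongoimport invocation; --headerline tells it to read the field names from the first CSV row. The --db/--collection values here are assumptions.)

    mongoimport --db mydb --collection contacts --type csv --headerline --file /opt/backups/contacts.csv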
[07:39:56] <ElephantHunter> Derick joannac Number6 - The interactive tutorial on the site cannot be completed, due to the server returning errors when modifying collections. There's no readily available site admin contact information... so I'm hoping one of you would be able to pass this on to whoever manages it.
[07:58:40] <Froobly> I have lists of items. I'm using mongoose. When I save an item it contains the list id, but the list does not update to contain the item id. Is there a way of doing this in mongoose?
[08:00:32] <joannac> Gr1: you have a primary for every shard?
[08:02:45] <daniel-s_> I'm using pymongo. What is the best way to store the time of something?
[08:03:13] <daniel-s_> Should I just use a field named "time" or something similar, then save the time as a floating point number?
[08:03:22] <daniel-s_> i.e., the output of time.time() ?
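(A minimal pymongo sketch with hypothetical names; a Python datetime is stored as a native BSON date, which sorts and range-queries cleanly, rather than an opaque float from time.time().)

    import datetime
    from pymongo import MongoClient

    events = MongoClient()["mydb"]["events"]   # hypothetical db/collection names

    # pymongo maps datetime.datetime to the BSON date type automatically
    events.insert({"what": "login", "time": datetime.datetime.utcnow()})

    # e.g. everything from the last hour
    since = datetime.datetime.utcnow() - datetime.timedelta(hours=1)
    recent = list(events.find({"time": {"$gte": since}}))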
[08:07:58] <Froobly> if it's not possible, what would be an elegant solution?
[08:50:57] <jacksmith> I have to store 4 GB of key-value pair data and I want faster access to it. I have 1.6 GB of RAM on my server; will it be beneficial if I migrate to MongoDB?
[08:55:03] <Froobly> I'm using mongoose. I have lists of items. When I save an item it contains the list id, but the list does not update to contain the item id. Is there a way of doing this in mongoose? If not, what is an elegant solution?
[09:01:37] <Nodex> jacksmith: if your indexes fit in RAM, yes
[09:02:41] <jacksmith> Nodex: I tried using redis-server but I could not afford that much RAM
[09:12:30] <kamol> how do I limit the number of results in runCommand?
[09:13:04] <ron> kamol: if you don't mind me asking, what's your native language?
[09:31:09] <alexgr> can I use @Reference List<String> usernames where the User entity has Id and userName as index? I want to have the username readily available without hitting the db
[09:32:13] <alexgr> I want to make a document that stores the usernames of the users (which are unique and indexed), but I don't need their ids and I don't want to fetch them while displaying the referencing document, only their usernames
[10:53:07] <BaNzounet> yep with .find(), thanks Derick :)
[10:53:56] <romaric_> Hi all, we have an issue with the php driver after we've sharded our mongodb database. Previously, we were connecting to mongo like this: $mongo = new Mongo($dsn, $params); $db = $mongo->selectDb($dbName); And we were getting the connection id like this: $lastError = $db->getLastError(); $conId = $lastError['connectionId']; Once the db has been sharded there is no more connectionId. Does anyone know the good
[10:53:56] <romaric_> practice for getting the connection id?
[11:20:28] <Nodex> you're asking a stupid question. How on earth do you expect a client to perform a query with no data?
[11:23:10] <gingerninja> Nodex someone told me that the whole collection is transferred to the client, and then the specific document is searched for on the client side
[12:21:50] <iwantoski> Is it possible to specify a projection of a subdocument?
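(Yes, with dot notation in the projection; a quick pymongo sketch with hypothetical names.)

    from pymongo import MongoClient

    posts = MongoClient()["mydb"]["posts"]   # hypothetical db/collection names

    # return only the author subdocument's name field from each post
    for doc in posts.find({}, {"author.name": 1, "_id": 0}):
        print(doc)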
[12:33:10] <groundup> I have a collection, named likes, that holds all of the likes that relate to several other things - places, events, and guides. Each document is something like this: {place: 'id', user: 'id'} or {guide: 'id', user: 'id'}. There is always a user but the other field can be absent.
[12:34:14] <groundup> So, I added a sparse, unique index for each type. ensureIndex({place:1, user:1}, {unique:1, sparse:1}) ensureIndex({guide:1, user:1}, {unique:1, sparse:1})
[12:34:44] <groundup> I am getting duplicate key errors when I add a new like for a different place. The guide index is the one that's complaining.
[12:44:17] <kali> I'm not sure sparse will work as expected with a compound (multiple-key) index
[12:45:24] <kali> I think the "sparse" logic says "if the key is null, don't index". But with a compound key, the key itself is not null; it's one of its components that is null
[12:46:25] <kali> groundup: that's just speculation, but it could explain what you're experiencing :)
[12:50:54] <groundup> kali, http://docs.mongodb.org/manual/core/index-sparse/ "You can specify a sparse and unique index, that rejects documents that have duplicate values for a field, but allows multiple documents that omit that key."
[12:51:12] <Gr1> I have a sharded cluster of 3 machines. Now, if I need to make it a replica set, do I need extra machines, or is it OK to use these same 3 machines as the replica set?
[12:51:50] <Gr1> And if I make it a replica set, there can be only one primary, right?
[12:55:56] <groundup> Ugh, this just made my life very complicated.
[12:59:05] <kAworu> hi, is this expected: https://gist.github.com/kAworu/9527549 ?
[12:59:39] <kAworu> basically I want a unique index on a subdocument's property, but when I use the $push operator I can create a duplicate.
[13:02:47] <kali> groundup: one way would be to add two composite fields yourself: place_and_user_if_both_not_null, guide_and_user_if_both_not_null, and have a unique sparse index on each
[13:02:57] <kAworu> I found nothing on the Indexes FAQ
[13:03:58] <kali> kAworu: unique index must be understood as "at most one document in the collection has this value"
[13:04:36] <kAworu> kali: thank you, I just found this link http://grokbase.com/t/gg/mongodb-user/1252d9dg96/push-with-unique-indexes in which the answer concurs with you.
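(A sketch of the usual workaround, in pymongo with hypothetical names: since the unique index only compares values across documents, guard the array inside a single document with $addToSet, or make the $push conditional.)

    from pymongo import MongoClient

    users = MongoClient()["mydb"]["users"]          # hypothetical db/collection names
    new_email = {"addr": "someone@example.com"}     # hypothetical subdocument

    # $addToSet only appends when an identical element is not already in the array
    users.update({"name": "kAworu"}, {"$addToSet": {"emails": new_email}})

    # or: only push when no element with that addr exists yet
    users.update({"name": "kAworu", "emails.addr": {"$ne": new_email["addr"]}},
                 {"$push": {"emails": new_email}})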
[13:05:50] <groundup> I think I am going to change it to something like this: user, type, id, value. Then type would be one of [place, event, guide, review]
[13:06:36] <groundup> Then remove the sparse index. Add the unique index as {user, type, id}.
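(That layout sketched in pymongo; the database name is an assumption, field names as groundup described.)

    from pymongo import MongoClient

    likes = MongoClient()["mydb"]["likes"]   # hypothetical database name

    # one like per (user, type, id); type is one of place/event/guide/review
    likes.ensure_index([("user", 1), ("type", 1), ("id", 1)], unique=True)
    likes.insert({"user": "u1", "type": "place", "id": "p1", "value": 1})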
[13:38:52] <coreyfinley> Has anyone experienced an issue where adding a query to a map/reduce seems to process more data than running the same map/reduce against the entire collection?
[14:02:39] <starfly> Gr1: you can use the 3 systems, assuming their performance capacity is adequate. Yes, one primary exists in a replica set, so with 3 shards at present, there will be 3 primaries inside 3 replica sets.
[14:32:46] <ekristen> what is the recommended way to backup mongodb, when you are dealing with 100s of GB of data?
[14:33:01] <ekristen> right now I’m at like 180gb and with mongodump it takes 3 hours
[14:34:19] <Joeskyyy> Point in time snapshots can work.
[14:34:38] <Joeskyyy> I typically set all my stuff in an LVM and take an LVM snapshot
[14:34:57] <Joeskyyy> That way if I need to restore anything from a backup, I have the actual data files to restore from
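(Roughly what that looks like with LVM; the volume group and LV names here are assumptions, and the data files need to be consistent at snapshot time, e.g. journaling on the same volume or db.fsyncLock() around the snapshot.)

    # point-in-time snapshot of the logical volume holding the mongod dbpath
    lvcreate --size 200G --snapshot --name mdb-snapshot /dev/vg0/mongodb
    # archive the snapshot off the host, then drop it
    dd if=/dev/vg0/mdb-snapshot | gzip > /backup/mdb-snapshot.gz
    lvremove /dev/vg0/mdb-snapshot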
[14:35:10] <KamZou> Hi, on this command: db.runCommand( { shardcollection : "stats.mycollection", key : {"_id":1} }) could you please explain to me what "_id":1 does?
[14:36:46] <Joeskyyy> That specifies "_id" as your shard key
[14:45:57] <Joeskyyy> If you're not balancing, sharding is pretty much silly
[14:46:14] <Joeskyyy> the point of sharding is to have relatively even data chunks to query to
[14:47:12] <KamZou> Joeskyyy, because we'll soon have no disk space left (the first shard is SSD for recent data, the second one is HDD for older chunks)
[14:49:16] <Joeskyyy> Gotcha, well typically you can do a dump on that collection, drop it, then reimport it
[14:49:21] <Joeskyyy> If you're going to reshard it
[14:49:47] <Joeskyyy> I still don't think sharding is the best solution you're looking for… but it's your database
[14:50:22] <KamZou> Joeskyyy, I'm open to suggestions if you have some for this particular case :)
[14:51:44] <Joeskyyy> Do you actually access the older data?
[14:52:06] <Joeskyyy> Or is it possible to archive it?
[14:52:23] <starfly> it depends on the profile of writes and reads to old and new data
[14:53:12] <KamZou> Joeskyyy, yes, our customers access the older data
[14:57:06] <step3profit> Can anyone point me at an example of consuming the oplog in python? I am using pymongo to connect. I have tried using a tailable cursor, and not using one... in any case, it seems to pause for about 20-40 seconds, then return 100-300 records, then pause again.
[14:57:27] <step3profit> I'm not using any timestamp or anything in the find at this time
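(A minimal sketch of a tailable oplog read, using pymongo 2.x-era keyword arguments; without tailable/await_data the cursor behaves like a normal query each time, which might explain the bursty behaviour described.)

    import time
    import pymongo

    oplog = pymongo.MongoClient()["local"]["oplog.rs"]   # replica-set oplog

    # tailable + await_data keeps the cursor open and blocks briefly for new entries
    cursor = oplog.find(tailable=True, await_data=True)
    while cursor.alive:
        try:
            entry = cursor.next()
            print(entry["ts"], entry["op"], entry.get("ns"))
        except StopIteration:
            time.sleep(1)   # nothing new yet; the cursor stays open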
[14:57:29] <Joeskyyy> It really depends on the access patterns KamZou.
[15:00:29] <KamZou> Joeskyyy, ok. Is it a big issue to set the shard key on the "full" _id instead of _id.d in my case?
[15:01:37] <Froobly> Is it possible to use an update to add a value to an array? Or am I going about this the wrong way?
[15:02:46] <Joeskyyy> KamZou: That should sparse it out pretty neatly
[15:03:02] <Joeskyyy> But functionality wise it won't make sense if you're not querying on _id
[15:03:03] <Nodex> Froobly: check out $push and/or $addToSet
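(A minimal sketch of that in pymongo, with hypothetical names; mongoose's Model.update accepts the same operators, so the back-reference Froobly wants is just a second, explicit update after saving the item.)

    from pymongo import MongoClient

    db = MongoClient()["mydb"]   # hypothetical database name

    # create a list, then an item that points back at it
    list_id = db.lists.insert({"name": "groceries", "items": []})
    item_id = db.items.insert({"name": "milk", "list": list_id})

    # push the new item's id onto the list's array ($addToSet avoids duplicates)
    db.lists.update({"_id": list_id}, {"$addToSet": {"items": item_id}})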
[15:16:42] <Nodex> yup, what I don't understand is that all the queries end up as something you must be able to do on the shell, so what's the problem with working in a format that most people know?
[15:17:03] <Nodex> the polymorphic part is given by the language and not by the ORM anyway
[15:27:56] <coreyfinley> When running my map reduce with a query, it aggregates 32 records with the same key, however when I run the same map reduce on the full collection, w/o a query, it only finds 4 records for the same key.
[15:32:37] <alexgr> I want to make a document that stores the usernames of the users (which are unique and indexed), but I don't need their ids and I don't want to fetch them while displaying the referencing document, only their usernames
[15:33:21] <alexgr> is @Reference List<String> with only the usernames (indexed), but without the id or part of the object (username, id), possible with Morphia?
[15:35:03] <cheeser> it isn't. morphia will pull the entire referenced object.
[15:36:29] <alexgr> but @Reference only keeps refs to the ids, according to the documentation, no?
[15:37:54] <cheeser> if you use @Reference morphia will store a DBRef (by default) in mongodb
[15:38:25] <cheeser> but when loading the containing object, that reference will be used to load the other object and that's done in its entirety
[15:39:34] <alexgr> so I can only have this functionality if I make the userName the default key for the collection... is it bad practice if I don't have an ObjectId but a unique string that I set?
[15:40:19] <cheeser> the id can be whatever you want it to be. ObjectId is just the default the database uses.
[15:40:45] <cheeser> but even using a string as the id, you can't do what you want. morphia will use that string to load the entire object
[15:41:13] <alexgr> can't I just print this array without fetching the objects?
[16:04:12] <KamZou> Joeskyyy, about the discussion we had a few minutes ago
[16:04:18] <KamZou> You told me: <Joeskyyy> Gotcha, well typically you can do a dump on that collection, drop it, then reimport it
[16:05:09] <KamZou> Since I don't have any data on the second shard right now, could I stop insertions, stop sharding and reshard the collection with the correct key?
[16:05:21] <KamZou> instead of dropping and recreating ?
[16:06:04] <Joeskyyy> No, because it's already been sharded. Mongo will complain
[16:19:04] <ekristen> if I am adding a new member to a replica set and I have a full backup using mongodump, what's the right way to restore it into mongo so that the initial sync is small?
[16:19:39] <Joeskyyy> mongoimport it before adding it to the repl set
[16:21:59] <kali> because there is no information in the mongodump about "when" the snapshot was done
[16:23:01] <Joeskyyy> what about if you use the --oplog flag in the mongodump?
[16:24:22] <ekristen> ok, I’ll snapshot the disk, this’ll be fun :/
[16:24:52] <ekristen> I wonder if it’d be easier to just let mongo sync 160gb at this point
[16:25:55] <kali> Joeskyyy: i did not know that exists, but even so... http://docs.mongodb.org/manual/tutorial/resync-replica-set-member/#copy-the-data-files
[16:26:27] <Joeskyyy> yeah just reading that as well
[16:26:35] <kali> ekristen: if your system is not overloaded, just letting the automatic sync run works damn well
[16:26:51] <ekristen> or if I should just rsync the files
[16:27:58] <kali> ekristen: honestly, just try the automatic way... it's so easy... if you're getting into trouble, stop the syncing node and try something else
[16:32:35] <Joeskyyy> kali: Thanks for pointing that out though haha. I've always just done it the easy way too and never attempted to put preliminary data in first
[16:35:57] <cheeser> that would've been foolishly late to the party.
[16:37:09] <starfly> Yeah, but after everyone in the tech company avoided most issues, most people were like "huh, what was the fuss, no problems resulted"
[16:40:53] <draggie> I have a dev environment with mongodb installed on it. My code checks to see if a particular document in a particular collection exists. If it does, then it returns it. Otherwise, it inserts the document into the database. I drop the collection containing that document from the database and then run the program. 2 out of 3 times, it still finds the document even though I've already told mongo to drop it via the mongo shell. Is there
[16:40:53] <draggie> a delay on mongo to drop collections? I am not replicating or sharding at all.
[16:43:08] <bcpk> is there a "this" equivalent in update operators?
[16:44:08] <bcpk> I just want to lowercase all keys in my objects
[16:44:15] <cheeser> write js in the shell or a small app in ${language} to iterate and update your docs
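(A small sketch of such an app in pymongo, with hypothetical names; it lowercases top-level keys only, nested keys would need recursion.)

    from pymongo import MongoClient

    coll = MongoClient()["mydb"]["stuff"]   # hypothetical db/collection names

    for doc in coll.find():
        lowered = {k.lower(): v for k, v in doc.items() if k != "_id"}
        lowered["_id"] = doc["_id"]
        coll.save(lowered)   # save() replaces the whole document by _id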
[17:27:54] <ekristen> kali, Joeskyyy: syncing the new member is going to take a while and I have 3 to add :/
[17:29:56] <unholycrab> is there any reason i shouldn't have more than 3 config servers?
[17:30:55] <unholycrab> like, running a config server on every persist instance?
[17:31:54] <Joeskyyy> You can only have 1 or 3 (1 for dev, 3 for production)
[17:40:36] <starfly> unholycrab: even if you wanted more than 3, presumably 3 for prod was chosen as the sweet spot between the need to maintain multiple copies of the critical sharding maps and the overhead of what's used to keep them in sync (two-phase commit)
[18:02:47] <royiv> Hi all; we're trying to configure MMS, but MMS and/or the MMS agent keeps overriding the hostname I give it with the hostname of the machine.
[18:03:29] <royiv> The monitoring agent appears to have been re-written in Go; the old one made this mistake, but you could just remove the broken hosts and re-add them.
[18:03:56] <royiv> However, with this agent, every time I add a host, it removes it and re-adds `hostname`:27017.
[18:04:20] <royiv> Is there any way to get around this? (Should I downgrade to the older Python agent?)
[19:14:04] <bobbytek1> Trying to increase the read performance
[19:14:11] <bobbytek1> I would have thought it would be quicker
[19:14:27] <bobbytek1> kali: Thanks for taking a look :)
[19:14:36] <kali> bobbytek1: not by throwing more memory at it
[19:14:49] <bobbytek1> Assuming I have the correct indexes setup, how can I optimize cursor reads?
[19:15:16] <Wil> For example, I find all documents that match some value. Then I want to return a specific field from the document whose value is some minimum. Does that make sense?
[19:17:11] <starfly> unholycrab: there's setup time for config servers and mongos routers, but as far as a prod outage to enable sharding goes, I believe it's minimal, though I haven't taken a large unsharded MongoDB database and done that. The heavy work occurs in the background when sharding large collections, but that (of course) could impact production performance, depending upon how well- or over-provisioned your production environment is.
[19:17:46] <Wil> Get documents that match X, and from the document which has the minimum value of ABC, return the value Y.
[19:19:41] <Wil> Let's say I have a document: { name: smith, age: 3 }... {name: bob, age: 6}... I want to select the minimum document by age (3), and return the name (smith)
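(One way to express that, sketched in pymongo with hypothetical names: sort ascending on age, take one, and project only the field you want back.)

    import pymongo
    from pymongo import MongoClient

    people = MongoClient()["mydb"]["people"]   # hypothetical db/collection names

    # put the match criteria in the first argument; smallest age wins
    cursor = people.find({}, {"name": 1, "_id": 0}).sort("age", pymongo.ASCENDING).limit(1)
    for doc in cursor:
        print(doc)   # -> {u'name': u'smith'}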
[19:19:54] <bobbytek1> I have a basic query that searches for all documents with a field that contains a certain value. I then stream through the cursor client side using the Java driver. How can I make this as performant as possible for a single node setup?
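(Not a silver bullet, but the usual cursor-side knobs, sketched in pymongo; the Java driver has equivalents, a fields argument to find(), DBCursor.batchSize() and hint(). The query, field and index names here are assumptions.)

    from pymongo import MongoClient

    coll = MongoClient()["mydb"]["docs"]   # hypothetical db/collection names

    cursor = (coll.find({"tags": "certain-value"},   # the filter
                        {"tags": 1, "title": 1})     # project only the fields you need
                  .hint([("tags", 1)])               # be explicit about the index
                  .batch_size(1000))                 # fewer round trips while streaming
    for doc in cursor:
        pass   # stream through the results client side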
[19:23:38] <unholycrab> I'm guessing the most disruptive phases of converting to a sharded cluster are creating the indexes, and the initial balancing out of the documents across new shards
[19:29:09] <starfly> unholycrab: new index generation can of course be a big hit whether sharding or not, so yes, additional indexing needed for formerly unindexed shard keys will be a load. Not sure disruptive is the right term, but I guess if your collection is large enough, that could apply. You can do things like pre-split documents across new shards to minimize automatic chunk movement by the balancer later, etc. to minimize the transition to a fully-balanced shard
[20:00:56] <starfly> unholycrab: this is probably the best reference: http://docs.mongodb.org/manual/tutorial/create-chunks-in-sharded-cluster/
[20:01:03] <unholycrab> hmm maybe I'm oversimplifying how this works
[20:01:34] <unholycrab> I want to shard over a timestamp key... leave old documents on the existing shard, and persist new documents to a new shard
[20:02:21] <unholycrab> it would not be an even split between the two shards
[20:02:36] <starfly> unholycrab: how big are the collection(s) you want to shard? If they aren't excessively large, you might be better off spending time considering the best shard key and letting automatic sharding do the work
[20:05:04] <ranman> sweet indeed, GL, come back if you need help :)
[20:10:08] <starfly> unholycrab: another consideration: sharding is primarily a way of scaling writes. If your load is mostly reads, you might be better off scaling those with multiple secondaries in a replica set
[20:17:55] <unholycrab> can i modify tag ranges as easily as i can add them?
[20:19:21] <unholycrab> if i make a change to the tag ranges, is the shards rebalance according to the new assignments
[20:19:31] <unholycrab> derp; would the shards rebalance
[20:22:17] <ekristen> so um yeah this doesn’t look good — http://pastebin.com/Uug8QapP
[20:53:42] <unholycrab> "the general rule is: minimum of N servers for N shards" is this true? if i get to the point where i have 4 shards, should i convert each shard to a 5 member replica set?
[21:32:27] <joannac> ekristen: don't worry about it. it's an internal message just saying the internal queue has gotten large
[21:32:38] <joannac> (which is fine since you're index building)