PMXBOT Log file Viewer

#mongodb logs for Wednesday the 29th of January, 2014

[00:58:26] <proteneer> how do I update a field in a document without deleting the other fields?
[00:58:38] <proteneer> i have a manager document with 3 fields
[00:58:42] <proteneer> mdb.managers.update({'_id': email}, {'token': new_token})'
[00:58:53] <proteneer> but that wiped out the 3rd field
[01:02:08] <proteneer> nm
[01:02:09] <proteneer> figured it out
[01:29:29] <queuetip> proteneer: what did you do?
[01:29:39] <queuetip> proteneer: i'm interested in your solution
[01:29:54] <proteneer> $set
[01:30:45] <queuetip> can you elaborate?
[01:30:58] <queuetip> i'm new to mongodb and this came up for me and i ended up just updating all the fields :/
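For reference, a minimal sketch of the $set form proteneer is referring to, mirroring the update call quoted above (the collection and field names come from that message):

    // $set changes only the named field; the document's other fields are left untouched
    mdb.managers.update({'_id': email}, {'$set': {'token': new_token}})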
[01:45:05] <mcr> is it hard to move to MongoDB.pm 0.702 (with MongoDB::MongoClient) from 0.45 (with MongoDB::Connection)? 0.45 is packaged in debian, and is therefore much easier to deploy.
[01:45:17] <mcr> I'm looking at why Connection() won't connect to IPv6 addresses.
[01:53:31] <mcr> reading source code, no IPv6 support.
[02:25:13] <thesheff17> I'm using pymongo... this works: output = mongo_collection.update({"revision": {"$gte": 1, "$lte": 5}},{"$set": {"status": False}}, multi=True) but this does not: output = mongo_collection.update({"revision": {"$gte": startValue, "$lte": endValue}}, {"$set": {"status": checkBoxValue}}, multi=True)
[02:25:35] <thesheff17> any ideas?...do I need to cast the types of startValue, endValue, and checkBoxValue...these are all unicode obj types
[02:29:40] <thesheff17> I got it...need to cast everything to the right type
[06:06:06] <mark___> anybody using moongoose-rbac
[09:39:00] <Richhh> i just want to use mongodb as a key value store like redis, how can i best do this? im guessing using collection names as keys, containing a single object value is a bad idea?
[09:39:34] <Richhh> or maybe im using the wrong database
[09:41:00] <zamnuts> Richhh, use a collection to store a single KVP, from there, you could add meta (i.e. tag/label) to further isolate it from other unrelated KVPs
[09:41:40] <zamnuts> Richhh, although you need to be cautious of flooding the mongodb socket (?), upserts will start failing if you're doing too much too fast
[09:42:11] <zamnuts> Richhh, that is... without the right mongodb infrastructure
[09:42:28] <Richhh> gah
[09:45:16] <zamnuts> redis is typically in-memory, so it is supa-fast and can handle frequent reads/writes, that's not really what mongodb is for, but w/o knowing your application I am simply supplying the warning
[09:45:55] <zamnuts> Richhh, ^ + when you say "KVP like redis" - that's the first thing i thought of, keep in mind mongodb is essentially a KVP database...
[09:52:48] <Richhh> i want to implement 'sometag':[id,id,id...100 ids] and id:'string' in a way that enables the highest read and upsert throughput
[09:53:09] <Richhh> essentially tag:['string','string',...] without dupes
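For reference, a hedged sketch of how that shape could be modeled directly in MongoDB ($addToSet keeps the array free of duplicates; the collection and field names are assumptions):

    // one document per tag; $addToSet only appends ids that are not already present
    db.tags.update(
        { tag: "sometag" },
        { $addToSet: { ids: "someId" } },
        { upsert: true }
    )
    // read the whole id array back in one query
    db.tags.findOne({ tag: "sometag" }, { ids: 1 })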
[09:58:46] <ruphos> fwiw, redis sets should prevent dupes and sorted sets can also preserve order
[10:00:27] <Richhh> thanks ruphos , any idea zamnuts ?
[10:00:36] <Nodex> Richhh : why don't you just use redis? - that's what it's built for
[10:00:49] <ruphos> ^ agree
[10:01:16] <zamnuts> jw... is redis persistent? will it recover from power loss / reset?
[10:01:17] <Richhh> because the affordable capacity is not great enough
[10:01:23] <Richhh> zamnuts: yeah
[10:01:35] <Nodex> zamnuts : yes
[10:02:07] <Richhh> though i am giving it consideration
[10:03:02] <Nodex> what are you storing in your app?
[10:03:11] <Richhh> lots of strings
[10:04:04] <Nodex> and what are you hoping to achieve from using Mongo over Redis or vv?
[10:04:39] <Richhh> greater storage, I'll check vv now
[10:04:52] <Nodex> vv = vice versa
[10:04:53] <Nodex> lmfaoo
[10:04:57] <Nodex> -o
[10:05:02] <Richhh> oh
[10:05:06] <Richhh> thought it was some database
[10:05:12] <Richhh> lol
[10:05:29] <ruphos> Richhh: how many are you expecting to have?
[10:05:30] <Nodex> if your dataset will outgrow RAM then you can't really use redis
[10:06:44] <Richhh> possibly up to TBs
[10:07:13] <zen_> hi, ist there a way to find what locks my db, i mean what type of request
[10:07:20] <Nodex> then unless you're sharding in your app, Redis is not a good choice because RAM is more expensive
[10:07:29] <Nodex> zen_ : tail the log
[10:07:35] <zen_> k
[10:08:14] <zamnuts> Richhh, in what sense do you want to read/write to a database, whether it is redis or mongodb...? is this for cached data? how important is performance?
[10:08:17] <zen_> 25% locked on average is too high
[10:08:20] <zen_> ?
[10:08:50] <ruphos> not great, not terrible
[10:09:02] <ruphos> depends on the speed of replies you need vs what you're getting
[10:09:33] <zen_> first question was in the direction:
[10:09:40] <ruphos> high load will cause lock, as will slow queries
[10:09:51] <zen_> there's dex to find slow queries but nothing to find locking queries
[10:10:26] <zen_> just moved to a really way bigger server and thought that would fix my performance problems but didn't help
[10:11:55] <Nodex> perhaps you need some indexes
[10:12:22] <zamnuts> zen_, are they write locks?
[10:12:29] <zamnuts> or you don't know yet
[10:13:17] <zamnuts> zen_, w/ mongodb, vertical scaling is better than horizontal scaling, i.e. replication sets / sharding opposed to a machine with SSD
[10:13:46] <Nodex> read locks are shared, all they do is stop a write lock from happening and they're fast. It's highly doubtful that it's read locks that's causing your problems
[10:13:50] <zen_> know that, but thought sharding could wait a few months
[10:14:15] <Nodex> zamnuts : you also have the scaling the wrong way round. Horizontal scaling is better
[10:14:32] <zen_> no, from mongostat i could say inserts and updates increase my lock %
[10:15:11] <Nodex> how big is your dataset?
[10:15:34] <zen_> i think the problem i have is "qr" in mongostat, that's always like 20 or so
[10:15:52] <zen_> not dramatic, it completely fits in ram
[10:16:01] <zen_> below a gig
[10:16:34] <zamnuts> Nodex, lawl, sorry brain fart there, ty for the edit
[10:16:36] <Nodex> and how big is your server?
[10:17:13] <kali> have you just grepped for nscanned in your logs ?
[10:17:14] <zen_> 24 cores 128GB ram !!!
[10:17:23] <zen_> should be enough ?
[10:17:25] <zen_> ;)
[10:17:32] <Nodex> have you tailed your logs
[10:17:38] <kali> yeah.
[10:17:39] <Richhh> zamnuts: throughput is more important than latency (if they are disentanglable) in both reads and writes, writes will be users adding a new string to the specified tag arrays (of strings), reads will be users querying with tags as keys for said string arrays as values
[10:17:43] <Nodex> and or grep nscanned as kali suggests?
[10:18:05] <Scud> Hi, im having trouble with the sort() operation. does mongodb allow me to apply the sort operation on a subdocument? e.g. entry: {foo:[obj1, obj2, obj3]} can i sort regarding parameters contained in objects 1..3?
[10:18:37] <zen_> missed that about nscanned
[10:18:44] <zen_> what does that show me?
[10:18:44] <zamnuts> zen_, what is your write concern?
[10:18:46] <Nodex> Richhh : I do similar things to that, I use Redis as a queue/cache in between the mongo persistence, gives me the best of both worlds
[10:19:03] <Nodex> it shows you what is locking your database (the query)
[10:19:21] <zen_> ah, answer to my question... sorry
[10:20:19] <kali> zen_: high figures there show slow queries scanning lot of documents. indexing the queries can help a lot
[10:21:55] <zen_> you know the dex tool. as far as that goes i have no slow queries
[10:22:27] <Nodex> tail -f /path/to/your/log
[10:22:39] <Nodex> don't rely on something else, look for yourself
[10:22:51] <zen_> ok
[10:23:13] <kali> the only dex tools I know about were used to kill people
[10:23:49] <Nodex> haha, on the table :P
[10:24:32] <zen_> i mean that: http://blog.mongolab.com/2012/06/introducing-dex-the-index-bot/
[10:24:59] <zen_> that is unreliable?
[10:25:02] <kali> ok great. now go and check out your logs :)
[10:25:17] <Nodex> lol
[10:25:26] <zen_> i'm on it, thanks so far
[10:28:25] <zamnuts> Richhh, i'm having trouble distinguishing your use case, you want to use it like you would redis/memcached but you are storing multiple TB of data
[10:38:10] <Richhh> zamnuts: i was thinking to store the arrays of indices to the strings in redis, then look them up with mongodb
[10:38:47] <Richhh> just .get() the whole array of indices
[10:39:09] <Richhh> i guess thats not gonna work because you'd have to query every one of them then
[10:39:14] <Richhh> i don't know
[10:40:37] <Scud> ah got it, didnt know i required the aggregate command
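A hedged sketch of the aggregation Scud likely ended up with (the collection name and the field used for sorting inside the embedded objects are assumptions):

    db.entries.aggregate([
        { $unwind: "$foo" },                                   // one document per element of the foo array
        { $sort: { "foo.someParam": 1 } },                     // order by a field inside each element
        { $group: { _id: "$_id", foo: { $push: "$foo" } } }    // reassemble the array, now sorted
    ])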
[10:40:57] <Richhh> i guess i need to think and research more about this
[10:42:49] <LoonaTick> Hi, I have a question. Reading in the MongoDB documentation I saw that MongoDB is atomic within 1 document. I'm trying to create an atomic lock around some piece of processing code, so I have a collection "lock", with a unique key on 'name'. For debugging purposes I do not intend to clear the documents from the collection when the lock is removed, but just have a 'processing' boolean in the
[10:42:49] <LoonaTick> document. When I update this lock I update the processing field, where name = 'lockname' and processing = false. I was wondering: In concurrent situations, will the update statement always return a correct value if another thread has already updated the row? In other words: With the atomicity, are the criteria of the query checked within that atomic lock in MongoDB?
[10:43:40] <LoonaTick> I mean: will the update statement always return a correct value of the number of documents affected by the query?
[10:43:42] <kali> LoonaTick: you need findAndModify
[10:43:55] <LoonaTick> kali: Thanks very much, I will read in to that
[10:51:01] <LoonaTick> kali: If I understand the documentation properly, I have to do findAndModify on the document. I make it return the old version of the document and check if the processing boolean was false before that query updated the document. Is that correct?
[10:52:40] <kali> LoonaTick: mmmm yeah that's the idea
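A minimal sketch of that findAndModify lock, assuming the collection and field names from the question above:

    // atomically flip 'processing' to true only if it is currently false
    var previous = db.lock.findAndModify({
        query:  { name: 'lockname', processing: false },
        update: { $set: { processing: true } },
        new:    false                  // return the document as it was before the update
    });
    // previous === null means another thread already holds the lock;
    // otherwise previous.processing was false and this thread acquired it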
[10:52:55] <LoonaTick> thanks again!
[11:02:38] <_boot> Hi there, I was wondering if it was possible to $push the entire document in aggregation?
[11:03:19] <Derick> _boot: I don't think so
[11:03:32] <Derick> _boot: I do think there is a ticket for it... let me see
[11:03:48] <_boot> hmm, didn't think so :( never mind
[11:04:17] <Derick> _boot: https://jira.mongodb.org/browse/SERVER-5916?jql=project%20%3D%20SERVER%20AND%20text%20~%20%22push%20full%20document%20aggregation%22
[11:04:23] <Derick> implemented for 2.5.3 it seems
[11:04:34] <_boot> nice
[11:05:03] <_boot> okay then, I'll bookmark this and revisit when the next version is released - thanks!
[11:07:12] <Derick> _boot: actually, if I read that right, you can already use $$ROOT
[11:07:30] <_boot> hmm, I'll give it a try in a moment
[11:08:15] <Derick> nope
[11:09:01] <ppetermann> i wish there was a proper hhvm mongo implementation
[11:11:26] <_boot> oh well ;)
[11:11:29] <Derick> _boot: it should be in, but I can't get it to work
[11:12:16] <_boot> it's not a huge problem
[11:13:00] <Derick> yeah, but it is
[11:13:05] <_boot> heh
[11:13:06] <Derick> it's shown as fixed, but it doesn't work :-)
[11:13:23] <_boot> says fix version 2.5.3 though
[11:13:30] <Derick> yup, I am running 2.5.4
[11:13:33] <_boot> oh right
[11:14:18] <_boot> i'll check a box with 2.5.4 too
[11:14:33] <Derick> it's a beta pre-release though
[11:14:39] <Derick> only even numbers are GA releases
[11:15:56] <_boot> works for me on 2.5.4
[11:17:39] <Derick> _boot: I've asked the author
[11:37:54] <Secretmapper_> guys if I use subdocuments
[11:37:58] <Secretmapper_> how can I reference them?
[11:38:26] <Secretmapper_> e.g. if I have a User that has an Assets Subdocument
[11:38:46] <_boot> { "assets.something" ...
[11:40:46] <Secretmapper_> sorry I just realized how to do it as I was typing out
[11:40:49] <Secretmapper_> :)
[11:41:19] <Secretmapper_> Basically I am having trouble how to make 'sellable' assets
[11:41:37] <Secretmapper_> because I don't have a unique ID for each of them like if I were using a RDBMS
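A hedged sketch of both points: dot notation to query into the embedded assets, and giving each embedded asset its own generated ObjectId so it can be referenced (or sold) individually. The field names here are assumptions:

    // dot notation reaches into the embedded documents
    db.users.find({ "assets.type": "sword" })

    // give each embedded asset its own _id when pushing it, so it has a unique reference
    db.users.update(
        { _id: userId },
        { $push: { assets: { _id: ObjectId(), type: "sword", sellable: true } } }
    )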
[11:53:45] <Derick> _boot: I made a mistake testing, $$ROOT works fine with 2.5.5-pre
[11:54:56] <_boot> works on 2.5.4 too
[11:56:44] <Derick> yes, it should :-)
[11:57:02] <_boot> ;D
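For reference, a hedged sketch of the $$ROOT usage being tested above (the collection and group key are assumptions; as discussed, it needs one of the 2.5.x pre-release builds or the eventual GA release):

    db.orders.aggregate([
        { $group: {
            _id: "$status",
            docs: { $push: "$$ROOT" }    // push each entire input document into the group's array
        } }
    ])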
[12:38:03] <Richhh> so if i db.c.insert({'k':'v'}) where k is a unique string key, how can i lookup the object with the key 'k'?
[12:38:42] <Derick> Richhh: you need to redesign that to: { key: 'k', value: 'v' }
[12:38:48] <Derick> don't have arbitrary key names
[12:39:37] <Richhh> ok, mongodb is not going to store the string 'key' repeatedly, then, i guess?
[12:39:42] <Derick> yes it is
[12:39:47] <Richhh> no way to avoid that?
[12:39:50] <Derick> no
[12:40:26] <Nodex> shortening the keyname is the knoly way to mitigate it
[12:40:32] <Nodex> knoly -> only
[12:41:51] <Richhh> seems like 1) it wastes at least 1 byte per object, 2) its going to iterate and so slow reads and writes
[12:42:24] <Richhh> both negligible?
[12:43:12] <Nodex> it is what it is, nothing can be done about it
[12:45:04] <Richhh> is there a key-value store disk database alternative?
[12:51:04] <Richhh> looking at maybe neo4j for that now
[12:58:01] <Richhh> seems to do the same
[12:58:12] <joannac> Richhh: iterate?
[12:58:20] <joannac> what do you mean by "iterate"?
[12:58:38] <Richhh> i mean when you query with .find(), it seems you can't do a lookup
[12:59:27] <Richhh> rather, mongodb is searching through objects
[12:59:29] <joannac> Have you got an index on "k"?
[13:00:28] <Derick> joannac: "k" is arbitrary, you can't have an unbound amount of arbitrary keys if you want to index them all...
[13:01:07] <joannac> Derick: I thought we were telling him to do {k: "key", v: "value"}
[13:01:17] <joannac> and then he can index on "k" to get an index on keys
[13:01:21] <Derick> oh yes
[13:01:24] <joannac> his arbitrary keys*
[13:02:57] <Richhh> i'm new to mongodb, trying to design my db, i can do ensureIndex({'sometag':1},{unique:true}), yes
[13:03:34] <Richhh> or 'someid'
[13:03:52] <Derick> yes you can
[13:06:24] <Richhh> if i wanted to lookup about 100 strings from a list of ~1M:1Bn strings, how am I going to do that?
[13:06:49] <Richhh> each string can have an associated unique id
[13:07:07] <Richhh> then i'd like in one query to look them all up, and return the array of them
[13:09:21] <Nodex> neo4j is a graphdb
[13:10:00] <Richhh> so if i do ensureIndex({'k':1},{unique:true}), is mongodb going to be able to lookup those strings, or is it going to iterate through objects to find them?
[13:10:24] <Richhh> Nodex: as i understand, a graphdb can implement a key-value store
[13:10:35] <Nodex> db.foo.find({k:"somestring"}); <--- will use an index
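Putting the suggestions above together, a hedged sketch of the key/value layout (the value field name is an assumption):

    db.foo.ensureIndex({ k: 1 }, { unique: true })     // one index covering every key
    db.foo.insert({ k: "someuniquekey", v: "some string value" })
    db.foo.find({ k: "someuniquekey" })                // single-key lookup, served by the index
    db.foo.find({ k: { $in: ["k1", "k2", "k3"] } })    // batch lookup, e.g. ~100 keys in one query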
[13:11:03] <Nodex> and "right tool for the job" r/e "can be implemented as"
[13:11:43] <Nodex> by that I mean - use the right tool for the job
[13:11:50] <Richhh> r/e = rarely equals?
[13:12:01] <Nodex> regarding
[13:12:11] <Richhh> k
[13:12:56] <Richhh> internally, if mongodb is using an index, is it doing a simple lookup, and not iterating over objects in a collection?
[13:13:21] <dandre> hello,
[13:13:28] <Nodex> yes, it's using a btree iirc
[13:13:34] <dandre> please see: http://pastebin.fr/32498
[13:15:16] <Nodex> {foo:1},{bar:1}
[13:16:00] <Nodex> if you mean "bar" with only foo:1 in it then no, not without an aggregation / map/reduce
[13:17:17] <Richhh> binary tree would still not be a lookup like with a KVP, and so still be slow for fetching 100s of objects from among a large number of objects, wouldnt it?
[13:17:45] <Richhh> talking about 100s of randomly distributed objects among a very large set of objects
[13:17:53] <Nodex> the cursor fetches the documents
[13:18:19] <Richhh> or not 100s, but up to 100
[13:18:45] <Nodex> it's this simple. Mongo will not be as fast as an IN MEMORY key/value store
[13:18:57] <Nodex> it does however offer persistence, sharding and rich querying
[13:19:09] <Nodex> if it doesn't fit your needs then use something else
[13:19:18] <Richhh> but im not asking for in memory
[13:19:31] <Richhh> just a hard disk key-value store
[13:20:31] <Richhh> seems like its not the right tool here as u suggest
[13:21:06] <dandre> ok
[13:21:10] <dandre> thanks
[13:21:13] <Nodex> I don't have a clue what you're trying to achieve bar retrieve 100 objects from a collection
[13:22:22] <dandre> in fact bar can be relatively large and I wanted to try to reduce the amount of data returned
[13:25:21] <joannac> dandre: how big are we talking?
[13:25:45] <joannac> Richhh: how large are your objects? You may be able to fit the whole collection + indexes into RAM anyway...
[13:26:50] <Nodex> joannac : he said "Tb's"
[13:26:58] <Nodex> but that was a few hours ago
[13:29:31] <dandre> joannac: in my test case 400kB
[13:35:35] <Richhh> joannac: the data (values for each id) collection size could grow to TBs from the large number of objects, each object being {k:'uniuquek',v:'upTo100Bstring'}
[13:36:16] <Richhh> joannac: the collection size could grow to TBs from the large number of objects, each object being {k:'uniuquek',v:'upTo100Bstring'}*
[13:36:52] <Richhh> 'uniquek'*
[13:38:24] <Richhh> large meaning millions to billions of objects
[13:40:10] <joannac> well, okay
[13:40:22] <joannac> but your index would still fit in memory
[13:40:54] <joannac> so you'd be restricted to 100*(size of document) needed to fit on disk
[13:43:03] <Richhh> sorry I'm new to all this, what do you mean by my index would still fit in memory?
[13:43:47] <joannac> well... we talked about creating an index for lookup, right?
[13:43:53] <joannac> that index will fit in memory
[13:44:12] <joannac> and then when you do a query, we look for matching documents in the index (fast)
[13:44:25] <joannac> and then go to disk for the full contents of the document (potentially slow)
[13:45:50] <dandre> Nodex: if you mean "bar" with only foo:1 in it then no, not without an aggregation / map/reduce
[13:45:50] <dandre> what could be the command to use for this?
[13:51:42] <Richhh> so, if im wanting to fetch 100 or so of these objects from a collection of Ms:Bns, is mongodb the best tool for that?
[13:51:51] <Richhh> ( joannac )
[13:53:06] <Richhh> (fetch them given their unique keys)
[13:58:23] <Richhh> reading about mongodb indexing now
[14:27:01] <Nodex> what is "Ms:Bns"?
[14:27:50] <BurtyB> A female called :Bns maybe
[14:28:01] <dandre> Ok I've found with $project then $unwind then match
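A hedged reconstruction of the pipeline dandre describes (the real field names are in the pastebin, so these are placeholders):

    db.coll.aggregate([
        { $project: { bar: 1 } },                        // drop everything except the large 'bar' array
        { $unwind: "$bar" },                             // one output document per array element
        { $match: { "bar.someField": "someValue" } }     // keep only the elements of interest
    ])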
[14:29:53] <Nodex> Makes sense BurtyB !
[15:37:18] <Richhh> joannac: Nodex Derick zamnuts thanks all for your help btw
[15:40:39] <Nodex> no problemi
[15:40:43] <Nodex> -i +o
[15:41:45] <sum1> hi
[15:42:17] <sum1> I have auth=true at config level. Is there a chance to leave some db without auth needed?
[15:47:03] <cheeser> i don't think so
[15:47:53] <sum1> thx cheeser ;-(
[16:13:29] <Mmike> Hello! Is there a way to have secondaries 'removed' from the mongodb replset cluster while they're catching up with the rest of the cluster?
[16:14:34] <Mmike> I'm doing a repair that takes cca 6 hours to complete. When I fire up the repaired secondary it takes cca 20-30 minutes for it to be in sync with the rest of the cluster - but during that time mongod is accepting connections - other than firewall, is there a way to tell mongod not to allow connections until it syncs?
[16:48:05] <Joeskyyy> Mmike: You can try a hidden replset member until you're all caught up?
[16:48:06] <Joeskyyy> http://docs.mongodb.org/manual/tutorial/configure-a-hidden-replica-set-member/
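A hedged sketch of what that tutorial boils down to (the member index is an assumption; a hidden member must also have priority 0):

    cfg = rs.conf()
    cfg.members[3].priority = 0    // hypothetical index of the recovering secondary
    cfg.members[3].hidden = true
    rs.reconfig(cfg)               // note: reconfiguring can briefly trigger an election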
[16:52:06] <brendan6> Does anyone know if there is a way to create an index so that that the query db.collection.find({'foo.0.bar': 1}) uses an index for a document {foo: [{bar: 1},{bar: 2}]}?
[16:52:56] <michaelholley> I'm looking to find an official word on supported distributions of Linux. Any suggestions on where to look?
[16:53:56] <michaelholley> I've looked all over mongodb.org and haven't found anything stated beyond RHEL, EC2, and Azure.
[16:55:15] <Derick> michaelholley: http://www.mongodb.org/downloads
[16:55:23] <Derick> look at "Packages" at the bottom
[16:56:51] <michaelholley> Derick: So if my company deploys mongodb on SLES 11 we can't get support because mongoDB doesn't build a package for it?
[16:57:28] <Derick> michaelholley: hmm, that I don't know!
[16:57:37] <Derick> michaelholley: let me ask
[16:57:44] <michaelholley> Derick: Thanks!
[17:05:55] <Joeskyyy> michaelholley: You should be able to just pull the files from this tutorial: http://docs.mongodb.org/manual/tutorial/install-mongodb-on-linux/
[17:06:09] <Joeskyyy> Granted, upgrades and such are little more painful without package managers
[17:06:21] <Joeskyyy> But it works. (On my OpenSuse install at least)
[17:08:37] <Mmike> Joeskyyy, thnx, will see how that will work. There is an issue with re-election, and my cluster being unavailable for 10-20 seconds, which I can't really have :/
[17:10:10] <Mmike> Joeskyyy, mongod docs say 'do not run arbiter on replicaset member' - do you know why is that?
[17:10:35] <Joeskyyy> Because if that part of the repl set goes down, so does your arbiter.
[17:10:47] <Joeskyyy> Which is pretty essential in an election process if you have one.
[17:11:24] <Joeskyyy> Typically you run an arbiter on a node outside of a repl set member, like say a mongos or something lightweight. Just so you have an extra voting member without having to sync data.
[17:11:57] <michaelholley> Joeskyyy: Thanks for the link. You are right, I would prefer a package manager, but at least there is a manual install method.
[17:15:37] <remonvv> \o
[17:16:49] <Purefan> Hi everyone! Not sure I'm understanding the docs properly, could someone please tell me: if in replication one of my shards (let's say I have 10 collections and 2 shards) goes down, will the apps still be able to get info from the db?
[17:17:27] <joannac> do you mean shard or relica set node?
[17:18:07] <Purefan> shard, Im not setting specific replicas so as I understand each shard would be its own and only replica
[17:34:38] <Joeskyyy> Purefan: If your shard goes down, then the data contained in that shard is also inaccessible, you'd need a sharded replset for autofailover and sharding
[17:41:00] <michael_____> Hi all, is it a problem to add tracking events to each of 1000+ embedded documents (e.g. blowing up the whole document?)
[17:41:28] <joannac> you might hit the document size limit
[17:42:17] <michael_____> so is it a better approach to store a referenced document for each embedded document which will contain all the tracking?
[18:18:21] <monmortal> hi gents I added node 4 to a single replicaset cluster but made the mistake of starting it without my mongodb config file, only the default
[18:18:24] <monmortal> node wont add
[18:18:27] <monmortal> :(
[18:18:44] <monmortal> so I turned it off and added the correct config and it still won't add
[18:18:57] <monmortal> this is with rsyncing the entire /data/mongo each time
[18:19:00] <monmortal> and checking perms
[18:19:07] <monmortal> now I redid the vm
[18:19:14] <monmortal> removed it from replicaset
[18:19:20] <monmortal> and added new on with proper config
[18:19:22] <monmortal> still ERROR!!
[18:19:24] <monmortal> why?
[18:19:29] <monmortal> do I have corrupt data?
[18:19:37] <monmortal> it says to run a fix on data or something
[18:19:48] <monmortal> mongo will start then down itself
[18:20:00] <monmortal> ERROR: writer worker caught exception: bad offset:0 accessing file: /data/mongo/ApplicantSearchIndexer.0
[18:20:06] <monmortal> now the 3 orig nodes are up
[18:20:20] <monmortal> but previously I could simply rs.add nodes with data synced and no problem
[18:20:24] <monmortal> no I fear data dmage
[18:20:32] <monmortal> now I fear data damage
[19:04:19] <kali> what ? you still have a working RS... why would you have data loss ?
[19:04:36] <kali> show us the full log of the failing instance
[19:11:49] <treaves> For queries on array fields, why is there no $contains operator?
[19:12:23] <treaves> I have to use $all, and specify an array with a single argument, in order to find documents with an array field that contain a value I want.
[19:13:01] <treaves> Whereas these two (in this case) are semantically the same, I'd think the cost to execute would be much higher.
[19:14:43] <kali> treaves: "a": "b" will match a: ["b"]
[19:15:59] <treaves> kali: I did not realize that. Thanks!
[19:16:42] <treaves> (although that's a bit not-obvious)
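In other words, a hedged one-liner of what kali means (the collection and field names are assumptions):

    db.items.find({ tags: "b" })    // matches documents where tags is "b" or an array containing "b"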
[19:18:39] <TheDracle> http://blog.mongodb.org/post/65517193370/schema-design-for-time-series-data-in-mongodb
[19:18:46] <TheDracle> So, I'm storing time series data in my mongodb.
[19:18:59] <TheDracle> And, at the moment I'm using a flat row by row model like described above.
[19:19:13] <TheDracle> I actually started with a model more similar to what they describe, with embedded blocks of time series data.
[19:19:27] <TheDracle> But the issue is- the data I have is spurious, and comes at any time.
[19:19:34] <TheDracle> And I want it to immediately update into the database.
[19:19:43] <TheDracle> Like, I don't want to cache a minute worth of data before doing an insert.
[19:19:59] <TheDracle> I want to push the very first data point, so people can see it immediately as it occurs in the database.
[19:20:19] <TheDracle> So.. The issue was, I was performing 'update' on an embedded array, and inserting the new values.
[19:20:32] <TheDracle> And it ended up being very very slow...
[19:20:59] <TheDracle> It seems like the only way to make it work well is to basically wait for a minutes worth of data, and do a single insert with a document containing that data embedded internally.
[19:22:06] <TheDracle> With the row by row model, the insertion is fast, but reading it out is slow.
[19:25:19] <whaley> TheDracle: it's slow because mongo has to allocate new disk space and move the entire document over with your appended values added once the document size goes over a certain threshold. It's easily the biggest performance problem I have with my system's usage of mongo at present.
[19:25:59] <TheDracle> whaley, Yeah, I assumed as much.. It also started causing the size of the document to explode.
[19:26:18] <whaley> TheDracle: when you say the row by row model is slow, what is slow about it? have you tried using aggregation?
[19:26:21] <TheDracle> whaley, Have you done any profiling to figure out where exactly it starts to move the document?
[19:26:47] <whaley> TheDracle: one of my coworkers has a pretty detailed ticket with 10gen on it. let me ask
[19:26:47] <TheDracle> whaley, Just the read out of the data is slow- I.E: for the reasons described in the above recommendation on storage of time series data.
[19:26:52] <TheDracle> Since it can store it anywhere on the disk.
[19:27:04] <whaley> TheDracle: you might be able to help that with proper indices
[19:27:07] <TheDracle> When I do a Collection.find() to get all of the data, it has to seek for every document that is dispersed.
[19:27:10] <TheDracle> Alright.
[19:27:38] <TheDracle> So, I'm sorry for my ignorance, but how would indices help here?
[19:28:09] <TheDracle> And if you could provide me with a link to that ticket, it would be really appreciated.
[19:30:01] <whaley> TheDracle: because the reads wouldn't involve scanning the full collection
[19:30:15] <TheDracle> Ahh.. so- that's the issue, I'm pulling the full collection pretty much.
[19:30:19] <whaley> TheDracle: it's not a public ticket. we have paid support
[19:30:41] <TheDracle> I'm basically plotting out 2-weeks worth of data.
[19:30:51] <TheDracle> And I basically take data older than two weeks, and move it into another collection.
[19:31:38] <TheDracle> So really I'm just doing Data.find(), taking all of the JSON, and plotting it.
[19:32:00] <TheDracle> And with even ~600 points of data, which is pretty small, it seems a lot slower than it ought to be.
[19:32:25] <TheDracle> When I start getting into 5000 + it's awful.
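For the two-week range reads described above, a hedged sketch of the index whaley is suggesting (collection and field names are assumptions):

    db.readings.ensureIndex({ ts: 1 })                  // index the timestamp field
    var twoWeeksAgo = new Date(Date.now() - 14 * 24 * 3600 * 1000);
    db.readings.find({ ts: { $gte: twoWeeksAgo } })     // range scan on the index instead of the whole collection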
[19:32:46] <Nodex> with time sensitive data I tend to cache it in redis first then move it after and use redis for my reads
[19:32:51] <Nodex> reads/writes
[19:33:12] <TheDracle> Nodex, Hm, yeah, I was worried something like that would be necessary.
[19:33:47] <Nodex> I have a rather large server (RAM) dedicated for that kind of thing, then move it to Mongo for persistence
[19:34:05] <TheDracle> I was thinking maybe I could have a process that every hour or something bundles up all of the data into a structure like above...
[19:34:21] <TheDracle> I.E: Pulls the single data points, bundles them into embedded documents, and removes the previous ones...
[19:34:21] <Nodex> I store the raw (json in my case) as files in gridfs and store aggregates in mongo after the 2 weeks (a month in my case)
[19:34:36] <TheDracle> I'm just scared it will do something weird, and cause an efficiency issue elsewhere.
[19:35:00] <Nodex> the rest is all read/written to Mongo in a queue and I use a redis slave for that also
[19:35:46] <Nodex> the pain is that modifying data is a migration :/
[19:36:23] <Nodex> luckily my data is pretty uniform and fits nicely, ironically the opposite of unstructured data
[19:36:36] <TheDracle> Yeah... It sounds a lot more painful than I had hoped.
[19:37:22] <Nodex> it really depends on your data tbh
[19:37:36] <TheDracle> It's really simple data.
[19:37:47] <TheDracle> Like, a timestamp, and a true/false or scalar value.
[19:38:00] <TheDracle> It's coming from Z-Wave sensors in a home.
[19:38:04] <Nodex> sounds perfect for a redis hash
[19:38:08] <TheDracle> So, someone opens the door, and it creates a piece of data.
[19:38:12] <TheDracle> Yeah.
[19:38:23] <TheDracle> The thing is I'm using a reactive data model to push updates to my front end.
[19:38:34] <TheDracle> And sort of using mongo as the single backend for that works really well for 90% of everything.
[19:38:49] <TheDracle> It's just this high-arity data that I need to preserve a lot of is the one case where it doesn't work well.
[19:39:20] <Nodex> :/
[19:39:31] <TheDracle> Nodex, I'll check out Redis..
[19:39:48] <TheDracle> It seems like there ought to maybe just be some sort of formal Redis system for solving this problem.
[19:40:42] <Nodex> for persistence you will want summit like mongo
[19:41:23] <brendan6> Does anyone know if there is a way to create an index so that that the query db.collection.find({'foo.0.bar': 1}) uses an index for a document {foo: [{bar: 1},{bar: 2}]}?
[19:41:53] <TheDracle> I think I'm going to try the sort of garbage collector approach first. I.E: Every hour process comes by, and bundles everything from the last hour into a document, and then trashes the rest of the data.
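A minimal sketch of that hourly roll-up idea, assuming a raw-points collection with a ts field (the names are placeholders):

    var cutoff = new Date(Date.now() - 60 * 60 * 1000);
    var points = db.raw_points.find({ ts: { $lt: cutoff } }).sort({ ts: 1 }).toArray();
    if (points.length > 0) {
        // one bucket document per hour keeps later reads mostly sequential
        db.hour_buckets.insert({ start: points[0].ts, end: points[points.length - 1].ts, points: points });
        db.raw_points.remove({ ts: { $lt: cutoff } });
    }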
[19:42:04] <brendan6> explain() tells me that no index is used when I expect db.collection.ensureIndex({'foo.bar': 1}) to be used
[19:45:53] <Nodex> brendan6 : if you have a document {foo: [{bar: 1},{bar: 2}]} and you do db.foo.ensureIndex({"foo.bar":1}); then all of bar should be indexed
[19:46:25] <Nodex> can you pastebin your explain() and your db.foo.getIndexes();
[19:50:57] <cortexman> any idea about this?
[19:51:18] <cortexman> $ wc -l sonic_lex.csv # 1465427 sonic_lex.csv (this means ~1.5 million lines)
[19:51:34] <cortexman> mongoimport --host 127.0.0.1 --port 3002 -d meteor -c lexicon --type csv --file sonic_lex.csv --headerline [….] Wed Jan 29 12:47:11.788 imported 734928 objects
[19:51:45] <cortexman> only around half of lines imported?!
[20:11:09] <cortexman> Hi.
[20:11:17] <cortexman> what is the correct way to escape a double quote when using mongoimport?
[20:16:15] <NaN> hi guys
[20:16:51] <NaN> on this collection > http://pastie.org/private/mg70dolewktolhiq937w < how could be the query that get's me all the 'parent' : '23489AJ'?
[20:18:08] <joannac> MongoDB doesn't support searching with a wildcard on keys. Why do you have custom keys?
[20:20:22] <NaN> I don't know, the collection comes from a json, but I suppose I will need to regenerate it
[20:34:39] <monmortal> mongo is a bastard
[20:35:01] <monmortal> it should just work
[20:35:04] <monmortal> but nooo
[20:35:13] <monmortal> it buggers me with a morningstar
[21:14:58] <pinvok3> Good evening guys. I'm writing a small web-based feed application for myself. I've found that mongodb is a bit more suitable for storing rss data than mysql. My question is, what is the best way to update a document when new feeds are available? I have a collection for each feed, and only one document. This document should contain an array of ~300 feed items. Newer ones get added, older ones
[21:15:08] <pinvok3> removed. Can someone help me a bit?
[21:15:46] <pinvok3> Or should I create a new document for every feed item?
[21:26:51] <a|3x> i am getting an exception when i am trying to search only for SOME keywords db['collectionname'].runCommand('text', { search: 'some keywords' ,limit:1, language: 'english' , filter: {"removed":{"$exists":false}} }); "exception: BSONObj size: -286331154 (0xEEEEEEEE) is invalid. Size must be between 0 and 16793600(16MB)"
[21:27:19] <a|3x> the strange thing is i only get this error if filter is specified, no filter returns 1 element
[21:28:27] <a|3x> any filter has this effect, even {"_id":{"$nin":[]}}
[21:40:49] <brendan6> Nodex: Here is the pastebin outlining http://pastebin.com/HFZ4eYm8
[21:41:09] <brendan6> Nodex: MongoDB shell version: 2.4.8
[21:46:37] <Mmike> Joeskyyy, but, if I have 4 boxes, and I put an arbiter on one, when that box goes down there will still be three node members and they'll be able to vote... no?
[21:49:01] <Joeskyyy> Correct. I think the documentation means something like
[21:49:48] <Joeskyyy> If you have three servers, each having a member of a replset, don't throw your arbiter on one of those three servers. Because then you won't have a majority if the server that goes down is the one with your arbiter on it.
[21:50:38] <Joeskyyy> The point of an arbiter is to be agnostic of your replset functions, not affected by the possibility of something happening with a member of the replset.
[21:50:50] <Mmike> Hm, but with 3 boxes I don't need an arbiter, right?
[21:51:28] <Joeskyyy> Correct. There's a lot of here's and theres with the examples
[21:51:42] <Joeskyyy> but it's just a little pointless to put an arbiter on a point of failure.
[21:51:57] <Joeskyyy> Because you can have two points of failure where there would typically only be one.
[21:52:01] <Joeskyyy> Just bad practice is all.
[21:52:04] <Mmike> But what harm can it do?
[21:52:30] <Joeskyyy> None, in reality. Unless you hit a snag like I mentioned. Again, it's just bad practice.
[21:52:50] <Mmike> I have 4 boxes in replset. I have arbiter on 5th box so I have 5 voting members. If one box goes down, all is gut. If box with arbiter goes down, all is also gut.
[21:53:47] <Joeskyyy> What's the point of your arbiter then? If one box goes down you still have 3/4 to decide.
[21:54:03] <Joeskyyy> Still a majority for the decision.
[21:54:34] <Joeskyyy> Taking the same example above, two nodes go down, one of those two has your arbiter on it. Now you only have 2/5 to vote.
[21:54:45] <Joeskyyy> Whereas if your arbiter was on its own low-risk node, you'd still have 3/5
[21:55:39] <Joeskyyy> You can take that same example and keep adding more nodes and the results would still be the same, just on a higher scale.
[21:56:18] <Mmike> I see.
[21:56:36] <Mmike> But, I don't need an arbiter in the first place then? If I have 4 boxes...
[21:56:46] <Mmike> Aha, I do, if I want my cluster to work on 2 boxes (+ arbiter)
[21:56:54] <Joeskyyy> You don't "need" one. I'd recommend one.
[21:56:57] <Joeskyyy> Correct.
[21:57:12] <Joeskyyy> Running an arbiter on a teeny tiny no risk box would be excellent for your cluster.
[21:57:23] <Joeskyyy> That way if you have two nodes go down, you're not completely f'ed
[21:57:58] <Mmike> yup, that makes sense.
[21:58:02] <Mmike> Thnx for the elaborate :)
[22:00:36] <Joeskyyy> anytime
[22:10:38] <brucelee_> is there a recommended size for a database server?
[22:10:41] <brucelee_> before creating another shard
[22:10:52] <brucelee_> or does it not matter
[22:23:01] <tongcx> hi guys, i have a big database and I want to map these data to another collection which just keep some fields, what's the best way to do it?
[22:23:17] <tongcx> i know i can use mapreduce, but the reduce step is not necessary and it will slow things down
[22:28:06] <monmortal> don't use mongo
[22:28:12] <monmortal> www.prevayler.org use this
[22:28:16] <monmortal> aw yeah
[22:28:21] <monmortal> zodb might not be bad either
[22:28:26] <tongcx> hi guys, i have a big database and I want to map these data to another collection which just keep some fields, what's the best way to do it?
[22:28:29] <tongcx> i know i can use mapreduce, but the reduce step is not necessary and it will slow things down
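For what it's worth, a hedged alternative to map/reduce for a plain copy-some-fields job (the collection and field names are placeholders):

    db.source.find({}, { fieldA: 1, fieldB: 1 }).forEach(function (doc) {
        db.target.insert(doc);    // the projection keeps _id by default, so the copy stays aligned with the source
    });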
[22:29:12] <monmortal> use common lisp
[22:30:20] <tongcx> monmortal: are u talking to me?
[22:30:40] <monmortal> yeah
[22:30:53] <monmortal> backward chaining might help
[22:49:40] <monmortal> zodb!
[22:51:04] <tongcx> monmortal: thanks
[22:51:16] <monmortal> well
[22:51:30] <monmortal> I love single lane rotaries see
[22:51:35] <monmortal> simplicity
[22:52:11] <monmortal> what confuses me about mongo is what is to stop new data incoming from corrupting old data?
[22:52:15] <monmortal> whats to stopit?
[23:09:22] <tongcx> hi, if i write a script.js and run it with mongo script.js, how do I change the db in script.js? if i write 'use xxx', there is an error
[23:13:50] <ehershey> tongcx: try db = db.getSiblingDB('otherdb');
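A minimal script.js along those lines (the database and collection names are placeholders):

    // 'use xxx' is a shell-only helper; inside a script, switch databases like this:
    var mydb = db.getSiblingDB('xxx');
    printjson(mydb.mycollection.findOne());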
[23:17:03] <schmidtj> Hello. Does anyone have a systemctl service definition file to run MongoDB as a service? I could create one, but this has got to be something someone else has already done before me. Anyone know where I could find one?
[23:17:39] <schmidtj> pmxbot: help
[23:19:58] <monmortal> what os?
[23:23:01] <blizzow> okay, total n00b question here. I have a collection called foo with index bar. I want to do a count of all indexes where the value of bar is set to 1. I failed when trying this -
[23:23:01] <blizzow> db.domains.count( { active: '1' } )
[23:24:31] <blizzow> Anyone know what I'm doing wrong?
[23:29:37] <monmortal> u didnt read the manual
[23:29:40] <monmortal> ;)
[23:34:53] <blizzow> monmortal. I'm looking at the little mongodb book and have tried a couple of different ways to query, but am obviously doing something wrong. Pardon my retardedness, I figured I'd ask for some clarification here.
[23:35:44] <Joeskyyy> blizzow: Is the value set to the string value "1" or the integer value 1?
[23:36:20] <blizzow> I think it's integer value 1.
[23:36:42] <Joeskyyy> Try db.domains.count({active: 1}) then
[23:36:47] <Joeskyyy> Without the '' around the 1.
[23:36:53] <Joeskyyy> That would signify a non integer value
[23:39:21] <Joeskyyy> Rather, putting the value IN the '' signifies a non-integer value
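A hedged illustration of the distinction Joeskyyy is drawing (MongoDB matches by BSON type as well as value):

    db.domains.count({ active: 1 })      // matches only documents where active is stored as a number
    db.domains.count({ active: "1" })    // matches only documents where active is stored as a string
    db.domains.findOne()                 // inspect one document to see which form is actually stored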
[23:41:33] <monmortal> I ask for help here
[23:41:36] <monmortal> not get much
[23:41:49] <monmortal> my life is sad
[23:41:50] <blizzow> It's returning 0 no matter what I try.
[23:46:31] <blizzow> I posted an image here http://i.imgur.com/C3sX3e1.png of what I'm trying to grab (a count of all records in the active column that are set to 1).
[23:53:07] <schmidtj> monmortal: OpenSuse 13.1, which uses the 'systemctl' startup/service manager package.
[23:55:10] <schmidtj> err, excuse me, 'systemd'
[23:55:18] <schmidtj> systemctl is the command to interact with systemd
[23:55:38] <schmidtj> Ahh well, I guess I'll have to follow up on this tomorrow. I really need to call it a night
[23:55:43] <schmidtj> goodbye
[23:56:05] <Joeskyyy> What does db.domains.find({active:1}).count() return?
[23:56:11] <Joeskyyy> A little redundant, but I'm curious.