PMXBOT Log file Viewer

#mongodb logs for Wednesday the 29th of January, 2014

[00:58:26] <proteneer> how do I update a field in a document without deleting the other fields?
[00:58:38] <proteneer> i have a manager document with 3 fields
[00:58:42] <proteneer> mdb.managers.update({'_id': email}, {'token': new_token})'
[00:58:53] <proteneer> but that wiped out the 3rd field
[01:02:08] <proteneer> nm
[01:02:09] <proteneer> figured it out
[01:29:29] <queuetip> proteneer: what did you do?
[01:29:39] <queuetip> proteneer: i'm interested in your solution
[01:29:54] <proteneer> $set
[01:30:45] <queuetip> can you elaborate?
[01:30:58] <queuetip> i'm new to mongodb and this came up for me and i ended up just updating all the fields :/
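For reference, a minimal sketch of the $set form proteneer is referring to, mirroring the update call quoted above (the collection and field names come from that message):

    // $set changes only the named field; the document's other fields are left untouched
    mdb.managers.update({'_id': email}, {'$set': {'token': new_token}})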
[01:45:05] <mcr> is it hard to move to MongoDB.pm 0.702 (with MongoDB::MongoClient) from 0.45 (with MongoDB::Connection)? 0.45 is packaged in debian, and is therefore much easier to deploy.
[01:45:17] <mcr> I'm looking at why Connection() won't connect to IPv6 addresses.
[01:53:31] <mcr> reading source code, no IPv6 support.
[02:25:13] <thesheff17> I'm using pymongo... this works: output = mongo_collection.update({"revision": {"$gte": 1, "$lte": 5}},{"$set": {"status": False}}, multi=True) but this does not: output = mongo_collection.update({"revision": {"$gte": startValue, "$lte": endValue}}, {"$set": {"status": checkBoxValue}}, multi=True)
[02:25:35] <thesheff17> any ideas?...do I need to cast the types of startValue, endValue, and checkBoxValue...these are all unicode obj types
[02:29:40] <thesheff17> I got it...need to cast everything to the right type
[06:06:06] <mark___> anybody using moongoose-rbac
[09:39:00] <Richhh> i just want to use mongodb as a key value store like redis, how can i best do this? im guessing using collection names as keys, containing a single object value is a bad idea?
[09:39:34] <Richhh> or maybe im using the wrong database
[09:41:00] <zamnuts> Richhh, use a collection to store a single KVP, from there, you could add meta (i.e. tag/label) to further isolate it from other unrelated KVPs
[09:41:40] <zamnuts> Richhh, although you need to be cautious of flooding the mongodb socket (?), upserts will start failing if you're doing too much too fast
[09:42:11] <zamnuts> Richhh, that is... without the right mongodb infrastructure
[09:42:28] <Richhh> gah
[09:45:16] <zamnuts> redis is typically in-memory, so it is supa-fast and can handle frequent reads/writes, that's not really what mongodb is for, but w/o knowing your application I am simply supplying the warning
[09:45:55] <zamnuts> Richhh, ^ + when you say "KVP like redis" - that's the first thing i thought of, keep in mind mongodb is essentially a KVP database...
[09:52:48] <Richhh> i want to implement 'sometag':[id,id,id...100 ids] and id:'string' in a way that enables the highest read and upsert throughput
[09:53:09] <Richhh> essentially tag:['string','string',...] without dupes
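For reference, a hedged sketch of how that shape could be modeled directly in MongoDB ($addToSet keeps the array free of duplicates; the collection and field names are assumptions):

    // one document per tag; $addToSet only appends ids that are not already present
    db.tags.update(
        { tag: "sometag" },
        { $addToSet: { ids: "someId" } },
        { upsert: true }
    )
    // read the whole id array back in one query
    db.tags.findOne({ tag: "sometag" }, { ids: 1 })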
[09:58:46] <ruphos> fwiw, redis sets should prevent dupes and sorted sets can also preserve order
[10:00:27] <Richhh> thanks ruphos , any idea zamnuts ?
[10:00:36] <Nodex> Richhh : why don't you just use redis? - that's what it's built for
[10:00:49] <ruphos> ^ agree
[10:01:16] <zamnuts> jw... is redis persistent? will it recover from power loss / reset?
[10:01:17] <Richhh> because the affordable capacity is not great enough
[10:01:23] <Richhh> zamnuts: yeah
[10:01:35] <Nodex> zamnuts : yes
[10:02:07] <Richhh> though i am giving it consideration
[10:03:02] <Nodex> what are you storing in your app?
[10:03:11] <Richhh> lots of strings
[10:04:04] <Nodex> and what are you hoping to achieve from using Mongo over Redis or vv?
[10:04:39] <Richhh> greater storage, I'll check vv now
[10:04:52] <Nodex> vv = vice versa
[10:04:53] <Nodex> lmfaoo
[10:04:57] <Nodex> -o
[10:05:02] <Richhh> oh
[10:05:06] <Richhh> thought it was some database
[10:05:12] <Richhh> lol
[10:05:29] <ruphos> Richhh: how many are you expecting to have?
[10:05:30] <Nodex> if your dataset will outgrow RAM then you can't really use redis
[10:06:44] <Richhh> possibly up to TBs
[10:07:13] <zen_> hi, ist there a way to find what locks my db, i mean what type of request
[10:07:20] <Nodex> then unless you're sharding in your app, Redis is not a good choice because RAM is more expensive
[10:07:29] <Nodex> zen_ : tail the log
[10:07:35] <zen_> k
[10:08:14] <zamnuts> Richhh, in what sense do you want to read/write to a database, whether it is redis or mongodb...? is this for cached data? how important is performance?
[10:08:17] <zen_> 25% locked on average is too high
[10:08:20] <zen_> ?
[10:08:50] <ruphos> not great, not terrible
[10:09:02] <ruphos> depends on the speed of replies you need vs what you're getting
[10:09:33] <zen_> first question was in the direction:
[10:09:40] <ruphos> high load will cause lock, as will slow queries
[10:09:51] <zen_> there's dex to find slow queries but nothing to find locking queries
[10:10:26] <zen_> just moved to a really way bigger server and thought that would fix my performance problems but didn't help
[10:11:55] <Nodex> perhaps you need some indexes
[10:12:22] <zamnuts> zen_, are they write locks?
[10:12:29] <zamnuts> or you don't know yet
[10:13:17] <zamnuts> zen_, w/ mongodb, vertical scaling is better than horizontal scaling, i.e. replication sets / sharding opposed to a machine with SSD
[10:13:46] <Nodex> read locks are shared, all they do is stop a write lock from happening and they're fast. It's highly doubtful that it's read locks that's causing your problems
[10:13:50] <zen_> know that, but thought sharding could wait a few months
[10:14:15] <Nodex> zamnuts : you also have the scaling the wrong way round. Horizontal scaling is better
[10:14:32] <zen_> no, from mongostat i could say inserts and updates increase my lock %
[10:15:11] <Nodex> how big is your dataset?
[10:15:34] <zen_> i think the problem i have is "qr" in mongostat, that's always like 20 or so
[10:15:52] <zen_> not dramatic, it completely fits in ram
[10:16:01] <zen_> below a gig
[10:16:34] <zamnuts> Nodex, lawl, sorry brain fart there, ty for the edit
[10:16:36] <Nodex> and how big is your server?
[10:17:13] <kali> have you just grepped for nscanned in your logs ?
[10:17:14] <zen_> 24 cores 128GB ram !!!
[10:17:23] <zen_> should be enough ?
[10:17:25] <zen_> ;)
[10:17:32] <Nodex> have you tailed your logs
[10:17:38] <kali> yeah.
[10:17:39] <Richhh> zamnuts: throughput is more important than latency (if they are disentanglable) in both reads and writes, writes will be users adding a new string to the specified tag arrays (of strings), reads will be users querying with tags as keys for said string arrays as values
[10:17:43] <Nodex> and or grep nscanned as kali suggests?
[10:18:05] <Scud> Hi, im having trouble with the sort() operation. does mongodb allow me to apply the sort operation on a subdocument? e.g. entry: {foo:[obj1, obj2, obj3]} can i sort regarding parameters contained in objects 1..3?
[10:18:37] <zen_> missed that about nscanned
[10:18:44] <zen_> what does that show me?
[10:18:44] <zamnuts> zen_, what is your write concern?
[10:18:46] <Nodex> Richhh : I do similar things to that, I use Redis as a queue/cache in between the mongo persistence, gives me the best of both worlds
[10:19:03] <Nodex> it shows you what is locking your database (the query)
[10:19:21] <zen_> ah, answer to my question... sorry
[10:20:19] <kali> zen_: high figures there show slow queries scanning lot of documents. indexing the queries can help a lot
[10:21:55] <zen_> you know the dex tool. as far as that goes i have no slow queries
[10:22:27] <Nodex> tail -f /path/to/your/log
[10:22:39] <Nodex> don't rely on something else, look for yourself
[10:22:51] <zen_> ok
[10:23:13] <kali> the only dex tools I know about were used to kill people
[10:23:49] <Nodex> haha, on the table :P
[10:24:32] <zen_> i mean that: http://blog.mongolab.com/2012/06/introducing-dex-the-index-bot/
[10:24:59] <zen_> that is unreliable?
[10:25:02] <kali> ok great. now go and check out your logs :)
[10:25:17] <Nodex> lol
[10:25:26] <zen_> i'm on it, thanks so far
[10:28:25] <zamnuts> Richhh, i'm having trouble distinguishing your use case, you want to use it like you would redis/memcached but you are storing multiple TB of data
[10:38:10] <Richhh> zamnuts: i was thinking to store the arrays of indices to the strings in redis, then look them up with mongodb
[10:38:47] <Richhh> just .get() the whole array of indices
[10:39:09] <Richhh> i guess thats not gonna work because you'd have to query every one of them then
[10:39:14] <Richhh> i don't know
[10:40:37] <Scud> ah got it, didnt know i required the aggregate command
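A hedged sketch of the aggregation Scud likely ended up with (the collection name and the field used for sorting inside the embedded objects are assumptions):

    db.entries.aggregate([
        { $unwind: "$foo" },                                   // one document per element of the foo array
        { $sort: { "foo.someParam": 1 } },                     // order by a field inside each element
        { $group: { _id: "$_id", foo: { $push: "$foo" } } }    // reassemble the array, now sorted
    ])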
[10:40:57] <Richhh> i guess i need to think and research more about this
[10:42:49] <LoonaTick> Hi, I have a question. Reading in the MongoDB documentation I saw that MongoDB is atomic within 1 document. I'm trying to create an atomic lock around some piece of processing code, so I have a collection "lock", with a unique key on 'name'. For debugging purposes I do not intend to clear the documents from the collection when the lock is removed, but just have a 'processing' boolean in the
[10:42:49] <LoonaTick> document. When I update this lock I update the processing field, where name = 'lockname' and processing = false. I was wondering: In concurrent situations, will the update statement always return a correct value if another thread has already updated the row? In other words: With the atomicity, are the criteria of the query checked within that atomic lock in MongoDB?
[10:43:40] <LoonaTick> I mean: will the update statement always return a correct value of the number of documents affected by the query?
[10:43:42] <kali> LoonaTick: you need findAndModify
[10:43:55] <LoonaTick> kali: Thanks very much, I will read in to that
[10:51:01] <LoonaTick> kali: If I understand the documentation properly, I have to do findAndModify on the document. I make it return the old version of the document and check if the processing boolean was false before that query updated the document. Is that correct?
[10:52:40] <kali> LoonaTick: mmmm yeah that's the idea
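A minimal sketch of that findAndModify lock, assuming the collection and field names from the question above:

    // atomically flip 'processing' to true only if it is currently false
    var previous = db.lock.findAndModify({
        query:  { name: 'lockname', processing: false },
        update: { $set: { processing: true } },
        new:    false                  // return the document as it was before the update
    });
    // previous === null means another thread already holds the lock;
    // otherwise previous.processing was false and this thread acquired it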
[10:52:55] <LoonaTick> thanks again!
[11:02:38] <_boot> Hi there, I was wondering if it was possible to $push the entire document in aggregation?
[11:03:19] <Derick> _boot: I don't think so
[11:03:32] <Derick> _boot: I do think there is a ticket for it... let me see
[11:03:48] <_boot> hmm, didn't think so :( never mind
[11:04:17] <Derick> _boot: https://jira.mongodb.org/browse/SERVER-5916?jql=project%20%3D%20SERVER%20AND%20text%20~%20%22push%20full%20document%20aggregation%22
[11:04:23] <Derick> implemented for 2.5.3 it seems
[11:04:34] <_boot> nice
[11:05:03] <_boot> okay then, I'll bookmark this and revisit when the next version is released - thanks!
[11:07:12] <Derick> _boot: actually, if I read that right, you can already use $$ROOT
[11:07:30] <_boot> hmm, I'll give it a try in a moment
[11:08:15] <Derick> nope
[11:09:01] <ppetermann> i wish there was a proper hhvm mongo implementation
[11:11:26] <_boot> oh well ;)
[11:11:29] <Derick> _boot: it should be in, but I can't get it to work
[11:12:16] <_boot> it's not a huge problem
[11:13:00] <Derick> yeah, but it is
[11:13:05] <_boot> heh
[11:13:06] <Derick> it's shown as fixed, but it doesn't work :-)
[11:13:23] <_boot> says fix version 2.5.3 though
[11:13:30] <Derick> yup, I am running 2.5.4
[11:13:33] <_boot> oh right
[11:14:18] <_boot> i'll check a box with 2.5.4 too
[11:14:33] <Derick> it's a beta pre-release though
[11:14:39] <Derick> only even numbers are GA releases
[11:15:56] <_boot> works for me on 2.5.4
[11:17:39] <Derick> _boot: I've asked the author
[11:37:54] <Secretmapper_> guys if I use subdocuments
[11:37:58] <Secretmapper_> how can I reference them?
[11:38:26] <Secretmapper_> e.g. if I have a User that has an Assets Subdocument
[11:38:46] <_boot> { "assets.something" ...
[11:40:46] <Secretmapper_> sorry I just realized how to do it as I was typing out
[11:40:49] <Secretmapper_> :)
[11:41:19] <Secretmapper_> Basically I am having trouble how to make 'sellable' assets
[11:41:37] <Secretmapper_> because I don't have a unique ID for each of them like if I were using a RDBMS
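A hedged sketch of both points: dot notation to query into the embedded assets, and giving each embedded asset its own generated ObjectId so it can be referenced (or sold) individually. The field names here are assumptions:

    // dot notation reaches into the embedded documents
    db.users.find({ "assets.type": "sword" })

    // give each embedded asset its own _id when pushing it, so it has a unique reference
    db.users.update(
        { _id: userId },
        { $push: { assets: { _id: ObjectId(), type: "sword", sellable: true } } }
    )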
[11:53:45] <Derick> _boot: I made a mistake testing, $$ROOT works fine with 2.5.5-pre
[11:54:56] <_boot> works on 2.5.4 too
[11:56:44] <Derick> yes, it should :-)
[11:57:02] <_boot> ;D
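For reference, a hedged sketch of the $$ROOT usage being tested above (the collection and group key are assumptions; as discussed, it needs one of the 2.5.x pre-release builds or the eventual GA release):

    db.orders.aggregate([
        { $group: {
            _id: "$status",
            docs: { $push: "$$ROOT" }    // push each entire input document into the group's array
        } }
    ])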
[12:38:03] <Richhh> so if i db.c.insert({'k':'v'}) where k is a unique string key, how can i lookup the object with the key 'k'?
[12:38:42] <Derick> Richhh: you need to redesign that to: { key: 'k', value: 'v' }
[12:38:48] <Derick> don't have arbitrary key names
[12:39:37] <Richhh> ok, mongodb is not going to store the string 'key' repeatedly, then, i guess?
[12:39:42] <Derick> yes it is
[12:39:47] <Richhh> no way to avoid that?
[12:39:50] <Derick> no
[12:40:26] <Nodex> shortening the keyname is the knoly way to mitigate it
[12:40:32] <Nodex> knoly -> only
[12:41:51] <Richhh> seems like 1) it wastes at least 1 byte per object, 2) its going to iterate and so slow reads and writes
[12:42:24] <Richhh> both negligible?
[12:43:12] <Nodex> it is what it is, nothing can be done about it
[12:45:04] <Richhh> is there a key-value store disk database alternative?
[12:51:04] <Richhh> looking at maybe neo4j for that now
[12:58:01] <Richhh> seems to do the same
[12:58:12] <joannac> Richhh: iterate?
[12:58:20] <joannac> what do you mean by "iterate"?
[12:58:38] <Richhh> i mean when you query with .find(), it seems you can't do a lookup
[12:59:27] <Richhh> rather, mongodb is searching through objects
[12:59:29] <joannac> Have you got an index on "k"?
[13:00:28] <Derick> joannac: "k" is arbitrary, you can't have an unbound amount of arbitrary keys if you want to index them all...
[13:01:07] <joannac> Derick: I thought we were telling him to do {k: "key", v: "value"}
[13:01:17] <joannac> and then he can index on "k" to get an index on keys
[13:01:21] <Derick> oh yes
[13:01:24] <joannac> his arbitrary keys*
[13:02:57] <Richhh> i'm new to mongodb, trying to design my db, i can do ensureIndex({'sometag':1},{unique:true}), yes
[13:03:34] <Richhh> or 'someid'
[13:03:52] <Derick> yes you can
[13:06:24] <Richhh> if i wanted to lookup about 100 strings from a list of ~1M:1Bn strings, how am I going to do that?
[13:06:49] <Richhh> each string can have an associated unique id
[13:07:07] <Richhh> then i'd like in one query to look them all up, and return the array of them
[13:09:21] <Nodex> neo4j is a graphdb
[13:10:00] <Richhh> so if i do ensureIndex({'k':1},{unique:true}), is mongodb going to be able to lookup those strings, or is it going to iterate through objects to find them?
[13:10:24] <Richhh> Nodex: as i understand, a graphdb can implement a key-value store
[13:10:35] <Nodex> db.foo.find({k:"somestring"}); <--- will use an index
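Putting the suggestions above together, a hedged sketch of the key/value layout (the value field name is an assumption):

    db.foo.ensureIndex({ k: 1 }, { unique: true })     // one index covering every key
    db.foo.insert({ k: "someuniquekey", v: "some string value" })
    db.foo.find({ k: "someuniquekey" })                // single-key lookup, served by the index
    db.foo.find({ k: { $in: ["k1", "k2", "k3"] } })    // batch lookup, e.g. ~100 keys in one query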
[13:11:03] <Nodex> and "right tool for the job" r/e "can be implemented as"
[13:11:43] <Nodex> by that I mean - use the right tool for the job
[13:11:50] <Richhh> r/e = rarely equals?
[13:12:01] <Nodex> regarding
[13:12:11] <Richhh> k
[13:12:56] <Richhh> internally, if mongodb is using an index, is it doing a simple lookup, and not iterating over objects in a collection?
[13:13:21] <dandre> hello,
[13:13:28] <Nodex> yes, it's using a btree iirc
[13:13:34] <dandre> please see: http://pastebin.fr/32498
[13:15:16] <Nodex> {foo:1},{bar:1}
[13:16:00] <Nodex> if you mean "bar" with only foo:1 in it then no, not without an aggregation / map/reduce
[13:17:17] <Richhh> binary tree would still not be a lookup like with a KVP, and so still be slow for fetching 100s of objects from among a large number of objects, wouldnt it?
[13:17:45] <Richhh> talking about 100s of randomly distributed objects among a very large set of objects
[13:17:53] <Nodex> the cursor fetches the documents
[13:18:19] <Richhh> or not 100s, but up to 100
[13:18:45] <Nodex> it's this simple. Mongo will not be as fast as an IN MEMORY key/value store
[13:18:57] <Nodex> it does however offer persistence, sharding and rich querying
[13:19:09] <Nodex> if it doesn't fit your needs then use something else
[13:19:18] <Richhh> but im not asking for in memory
[13:19:31] <Richhh> just a hard disk key-value store
[13:20:31] <Richhh> seems like its not the right tool here as u suggest
[13:21:06] <dandre> ok
[13:21:10] <dandre> thanks
[13:21:13] <Nodex> I don't have a clue what you're trying to achieve bar retrieve 100 objects from a collection
[13:22:22] <dandre> in fact bar can be relatively large and I wanted to try to reduce the amount of data returned
[13:25:21] <joannac> dandre: how big are we talking?
[13:25:45] <joannac> Richhh: how large are your objects? You may be able to fit the whole collection + indexes into RAM anyway...
[13:26:50] <Nodex> joannac : he said "Tb's"
[13:26:58] <Nodex> but that was a few hours ago
[13:29:31] <dandre> joannac: in my test case 400kB
[13:35:35] <Richhh> joannac: the data (values for each id) collection size could grow to TBs from the large number of objects, each object being {k:'uniuquek',v:'upTo100Bstring'}
[13:36:16] <Richhh> joannac: the collection size could grow to TBs from the large number of objects, each object being {k:'uniuquek',v:'upTo100Bstring'}*
[13:36:52] <Richhh> 'uniquek'*
[13:38:24] <Richhh> large meaning millions to billions of objects
[13:40:10] <joannac> well, okay
[13:40:22] <joannac> but your index would still fit in memory
[13:40:54] <joannac> so you'd be restricted to 100*(size of document) needed to fit on disk
[13:43:03] <Richhh> sorry I'm new to all this, what do you mean by my index would still fit in memory?
[13:43:47] <joannac> well... we talked about creating an index for lookup, right?
[13:43:53] <joannac> that index will fit in memory
[13:44:12] <joannac> and then when you do a query, we look for matching documents in the index (fast)
[13:44:25] <joannac> and then go to disk for the full contents of the document (potentially slow)
[13:45:50] <dandre> Nodex: if you mean "bar" with only foo:1 in it then no, not without an aggregation / map/reduce
[13:45:50] <dandre> what could be the command to use for this?
[13:51:42] <Richhh> so, if im wanting to fetch 100 or so of these objects from a collection of Ms:Bns, is mongodb the best tool for that?
[13:51:51] <Richhh> ( joannac )
[13:53:06] <Richhh> (fetch them given their unique keys)
[13:58:23] <Richhh> reading about mongodb indexing now
[14:27:01] <Nodex> what is "Ms:Bns"?
[14:27:50] <BurtyB> A female called :Bns maybe
[14:28:01] <dandre> Ok I've found with $project then $unwind then match
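A hedged reconstruction of the pipeline dandre describes (the real field names are in the pastebin, so these are placeholders):

    db.coll.aggregate([
        { $project: { bar: 1 } },                        // drop everything except the large 'bar' array
        { $unwind: "$bar" },                             // one output document per array element
        { $match: { "bar.someField": "someValue" } }     // keep only the elements of interest
    ])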
[14:29:53] <Nodex> Makes sense BurtyB !
[15:37:18] <Richhh> joannac: Nodex Derick zamnuts thanks all for your help btw
[15:40:39] <Nodex> no problemi
[15:40:43] <Nodex> -i +o
[15:41:45] <sum1> hi
[15:42:17] <sum1> I have auth=true at config level. Is there a chance to leave some db without auth needed?
[15:47:03] <cheeser> i don't think so
[15:47:53] <sum1> thx cheeser ;-(
[16:13:29] <Mmike> Hello! Is there a way to have secondaries 'removed' from the mongodb replset cluster while they're catching up with the rest of the cluster?
[16:14:34] <Mmike> I'm doing a repair that takes cca 6 hours to complete. When I fire up the repaired secondary it takes cca 20-30 minutes for it to be in sync with the rest of the cluster - but during that time mongod is accepting connections - other than firewall, is there a way to tell mongod not to allow connections until it syncs?
[16:48:05] <Joeskyyy> Mmike: You can try a hidden replset member until you're all caught up?
[16:48:06] <Joeskyyy> http://docs.mongodb.org/manual/tutorial/configure-a-hidden-replica-set-member/
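A hedged sketch of what that tutorial boils down to (the member index is an assumption; a hidden member must also have priority 0):

    cfg = rs.conf()
    cfg.members[3].priority = 0    // hypothetical index of the recovering secondary
    cfg.members[3].hidden = true
    rs.reconfig(cfg)               // note: reconfiguring can briefly trigger an election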
[16:52:06] <brendan6> Does anyone know if there is a way to create an index so that that the query db.collection.find({'foo.0.bar': 1}) uses an index for a document {foo: [{bar: 1},{bar: 2}]}?
[16:52:56] <michaelholley> I'm looking to find an official word on supported distributions of Linux. Any suggestions on where to look?
[16:53:56] <michaelholley> I've looked all over mongodb.org and haven't found anything stated beyond RHEL, EC2, and Azure.
[16:55:15] <Derick> michaelholley: http://www.mongodb.org/downloads
[16:55:23] <Derick> look at "Packages" at the bottom
[16:56:51] <michaelholley> Derick: So if my company deploys mongodb on SLES 11 we can't get support because mongoDB doesn't build a package for it?
[16:57:28] <Derick> michaelholley: hmm, that I don't know!
[16:57:37] <Derick> michaelholley: let me ask
[16:57:44] <michaelholley> Derick: Thanks!
[17:05:55] <Joeskyyy> michaelholley: You should be able to just pull the files from this tutorial: http://docs.mongodb.org/manual/tutorial/install-mongodb-on-linux/
[17:06:09] <Joeskyyy> Granted, upgrades and such are little more painful without package managers
[17:06:21] <Joeskyyy> But it works. (On my OpenSuse install at least)
[17:08:37] <Mmike> Joeskyyy, thnx, will see how that will work. There is an issue with re-election, and my cluster being unavailable for 10-20 seconds, which I can't really have :/
[17:10:10] <Mmike> Joeskyyy, mongod docs say 'do not run arbiter on replicaset member' - do you know why is that?
[17:10:35] <Joeskyyy> Because if that part of the repl set goes down, so does your arbiter.
[17:10:47] <Joeskyyy> Which is pretty essential in an election process if you have one.
[17:11:24] <Joeskyyy> Typically you run an arbiter on a node outside of a repl set member, like say a mongos or something lightweight. Just so you have an extra voting member without having to sync data.
[17:11:57] <michaelholley> Joeskyyy: Thanks for the link. You are right, I would prefer a package manager, but at least there is a manual install method.
[17:15:37] <remonvv> \o
[17:16:49] <Purefan> Hi everyone! Not sure I'm understanding the docs properly, could someone please tell me: if in replication one of my shards (let's say I have 10 collections and 2 shards) goes down, will the apps still be able to get info from the db?
[17:17:27] <joannac> do you mean shard or relica set node?
[17:18:07] <Purefan> shard, Im not setting specific replicas so as I understand each shard would be its own and only replica
[17:34:38] <Joeskyyy> Purefan: If your shard goes down, then the data contained in that shard is also inaccessible, you'd need a sharded replset for autofailover and sharding
[17:41:00] <michael_____> Hi all, is it a problem to add tracking events to each of 1000+ embedded documents (e.g. blowing up the whole document?)
[17:41:28] <joannac> you might hit the document size limit
[17:42:17] <michael_____> so is it a better approach to store a referenced document for each embedded document which will contain all the tracking?
[18:18:21] <monmortal> hi gents I added node 4 to a single replicaset cluster but made the mistake of starting it without my mongodb config file, only the default
[18:18:24] <monmortal> node wont add
[18:18:27] <monmortal> :(
[18:18:44] <monmortal> so I turned it off and added the correct config and it still won't add
[18:18:57] <monmortal> this is with rsyncing the entire /data/mongo each time
[18:19:00] <monmortal> and checking perms
[18:19:07] <monmortal> now I redid the vm
[18:19:14] <monmortal> removed it from replicaset
[18:19:20] <monmortal> and added new on with proper config
[18:19:22] <monmortal> still ERROR!!
[18:19:24] <monmortal> why?
[18:19:29] <monmortal> do I have corrupt data?
[18:19:37] <monmortal> it says to run a fix on data or something
[18:19:48] <monmortal> mongo will start then down itself
[18:20:00] <monmortal> ERROR: writer worker caught exception: bad offset:0 accessing file: /data/mongo/ApplicantSearchIndexer.0
[18:20:06] <monmortal> now the 3 orig nodes are up
[18:20:20] <monmortal> but previously I could simply rs.add nodes with data synced and no problem
[18:20:24] <monmortal> no I fear data dmage
[18:20:32] <monmortal> now I fear data damage
[19:04:19] <kali> what ? you still have a working RS... why would you have data loss ?
[19:04:36] <kali> show us the full log of the failing instance
[19:11:49] <treaves> For queries on array fields, why is there no $contains operator?
[19:12:23] <treaves> I have to use $all, and specify an array with a single argument, in order to find documents with an array field that contain a value I want.
[19:13:01] <treaves> Whereas these two (in this case) are semantically the same, I'd think the cost to execute would be much higher.
[19:14:43] <kali> treaves: "a": "b" will match a: ["b"]
[19:15:59] <treaves> kali: I did not realize that. Thanks!
[19:16:42] <treaves> (although that's a bit not-obvious)
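In other words, a hedged one-liner of what kali means (the collection and field names are assumptions):

    db.items.find({ tags: "b" })    // matches documents where tags is "b" or an array containing "b"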
[19:18:39] <TheDracle> http://blog.mongodb.org/post/65517193370/schema-design-for-time-series-data-in-mongodb
[19:18:46] <TheDracle> So, I'm storing time series data in my mongodb.
[19:18:59] <TheDracle> And, at the moment I'm using a flat row by row model like described above.
[19:19:13] <TheDracle> I actually started with a model more similar to what they describe, with embedded blocks of time series data.
[19:19:27] <TheDracle> But the issue is- the data I have is spurious, and comes at any time.
[19:19:34] <TheDracle> And I want it to immediately update into the database.
[19:19:43] <TheDracle> Like, I don't want to cache a minute worth of data before doing an insert.
[19:19:59] <TheDracle> I want to push the very first data point, so people can see it immediately as it occurs in the database.
[19:20:19] <TheDracle> So.. The issue was, I was performing 'update' on an embedded array, and inserting the new values.
[19:20:32] <TheDracle> And it ended up being very very slow...
[19:20:59] <TheDracle> It seems like the only way to make it work well is to basically wait for a minutes worth of data, and do a single insert with a document containing that data embedded internally.
[19:22:06] <TheDracle> With the row by row model, the insertion is fast, but reading it out is slow.
[19:25:19] <whaley> TheDracle: it's slow because mongo has to allocate new disk space and move the entire document over with your appended values added once the document size goes over a certain threshold. It's easily the biggest performance problem I have with my system's usage of mongo at present.
[19:25:59] <TheDracle> whaley, Yeah, I assumed as much.. It also started causing the size of the document to explode.
[19:26:18] <whaley> TheDracle: when you say the row by row model is slow, what is slow about it? have you tried using aggregation?
[19:26:21] <TheDracle> whaley, Have you done any profiling to figure out where exactly it starts to move the document?
[19:26:47] <whaley> TheDracle: one of my coworkers has a pretty detailed ticket with 10gen on it. let me ask
[19:26:47] <TheDracle> whaley, Just the read out of the data is slow- I.E: for the reasons described in the above recommendation on storage of time series data.
[19:26:52] <TheDracle> Since it can store it anywhere on the disk.
[19:27:04] <whaley> TheDracle: you might be able to help that with proper indices
[19:27:07] <TheDracle> When I do a Collection.find() to get all of the data, it has to seek for every document that is dispersed.
[19:27:10] <TheDracle> Alright.
[19:27:38] <TheDracle> So, I'm sorry for my ignorance, but how would indices help here?
[19:28:09] <TheDracle> And if you could provide me with a link to that ticket, it would be really appreciated.
[19:30:01] <whaley> TheDracle: because the reads wouldn't involve scanning the full collection
[19:30:15] <TheDracle> Ahh.. so- that's the issue, I'm pulling the full collection pretty much.
[19:30:19] <whaley> TheDracle: it's not a public ticket. we have paid support
[19:30:41] <TheDracle> I'm basically plotting out 2-weeks worth of data.
[19:30:51] <TheDracle> And I basically take data older than two weeks, and move it into another collection.
[19:31:38] <TheDracle> So really I'm just doing Data.find(), taking all of the JSON, and plotting it.
[19:32:00] <TheDracle> And with even ~600 points of data, which is pretty small, it seems a lot slower than it ought to be.
[19:32:25] <TheDracle> When I start getting into 5000 + it's awful.
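For the two-week range reads described above, a hedged sketch of the index whaley is suggesting (collection and field names are assumptions):

    db.readings.ensureIndex({ ts: 1 })                  // index the timestamp field
    var twoWeeksAgo = new Date(Date.now() - 14 * 24 * 3600 * 1000);
    db.readings.find({ ts: { $gte: twoWeeksAgo } })     // range scan on the index instead of the whole collection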
[19:32:46] <Nodex> with time sensitive data I tend to cache it in redis first then move it after and use redis for my reads
[19:32:51] <Nodex> reads/writes
[19:33:12] <TheDracle> Nodex, Hm, yeah, I was worried something like that would be necessary.
[19:33:47] <Nodex> I have a rather large server (RAM) dedicated for that kind of thing, then move it to Mongo for persistence
[19:34:05] <TheDracle> I was thinking maybe I could have a process that every hour or something bundles up all of the data into a structure like above...
[19:34:21] <TheDracle> I.E: Pulls the single data points, bundles them into embedded documents, and removes the previous ones...
[19:34:21] <Nodex> I store the raw (json in my case) as files in gridfs and store aggregates in mongo after the 2 weeks (a month in my case)
[19:34:36] <TheDracle> I'm just scared it will do something weird, and cause an efficiency issue elsewhere.
[19:35:00] <Nodex> the rest is all read/written to Mongo in a queue and I use a redis slave for that also
[19:35:46] <Nodex> the pain is that modifying data is a migration :/
[19:36:23] <Nodex> luckily my data is pretty uniform and fits nicely, ironically the opposite of unstructured data
[19:36:36] <TheDracle> Yeah... It sounds a lot more painful than I had hoped.
[19:37:22] <Nodex> it really depends on your data tbh
[19:37:36] <TheDracle> It's really simple data.
[19:37:47] <TheDracle> Like, a timestamp, and a true/false or scalar value.
[19:38:00] <TheDracle> It's coming from Z-Wave sensors in a home.
[19:38:04] <Nodex> sounds perfect for a redis hash
[19:38:08] <TheDracle> So, someone opens the door, and it creates a piece of data.
[19:38:12] <TheDracle> Yeah.
[19:38:23] <TheDracle> The thing is I'm using a reactive data model to push updates to my front end.
[19:38:34] <TheDracle> And sort of using mongo as the single backend for that works really well for 90% of everything.
[19:38:49] <TheDracle> It's just this high-arity data that I need to preserve a lot of is the one case where it doesn't work well.
[19:39:20] <Nodex> :/
[19:39:31] <TheDracle> Nodex, I'll check out Redis..
[19:39:48] <TheDracle> It seems like there ought to maybe just be some sort of formal Redis system for solving this problem.
[19:40:42] <Nodex> for persistence you will want summit like mongo
[19:41:23] <brendan6> Does anyone know if there is a way to create an index so that that the query db.collection.find({'foo.0.bar': 1}) uses an index for a document {foo: [{bar: 1},{bar: 2}]}?
[19:41:53] <TheDracle> I think I'm going to try the sort of garbage collector approach first. I.E: Every hour process comes by, and bundles everything from the last hour into a document, and then trashes the rest of the data.
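A minimal sketch of that hourly roll-up idea, assuming a raw-points collection with a ts field (the names are placeholders):

    var cutoff = new Date(Date.now() - 60 * 60 * 1000);
    var points = db.raw_points.find({ ts: { $lt: cutoff } }).sort({ ts: 1 }).toArray();
    if (points.length > 0) {
        // one bucket document per hour keeps later reads mostly sequential
        db.hour_buckets.insert({ start: points[0].ts, end: points[points.length - 1].ts, points: points });
        db.raw_points.remove({ ts: { $lt: cutoff } });
    }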
[19:42:04] <brendan6> explain() tells me that no index is used when I expect db.collection.ensureIndex({'foo.bar': 1}) to be used
[19:45:53] <Nodex> brendan6 : if you have a document {foo: [{bar: 1},{bar: 2}]} and you do db.foo.ensureIndex({"foo.bar":1}); then all of bar should be indexed
[19:46:25] <Nodex> can you pastebin your explain() and your db.foo.getIndexes();
[19:50:57] <cortexman> any idea about this?
[19:51:18] <cortexman> $ wc -l sonic_lex.csv # 1465427 sonic_lex.csv (this means ~1.5 million lines)
[19:51:34] <cortexman> mongoimport --host 127.0.0.1 --port 3002 -d meteor -c lexicon --type csv --file sonic_lex.csv --headerline [….] Wed Jan 29 12:47:11.788 imported 734928 objects
[19:51:45] <cortexman> only around half of lines imported?!
[20:11:09] <cortexman> Hi.
[20:11:17] <cortexman> what is the correct way to escape a double quote when using mongoimport?
[20:16:15] <NaN> hi guys
[20:16:51] <NaN> on this collection > http://pastie.org/private/mg70dolewktolhiq937w < how could be the query that get's me all the 'parent' : '23489AJ'?
[20:18:08] <joannac> MongoDB doesn't support searching with a wildcard on keys. Why do you have custom keys?
[20:20:22] <NaN> I don't know, the collection comes from a json, but I suppose I will need to regenerate it
[20:34:39] <monmortal> mongo is a bastard
[20:35:01] <monmortal> it should just work
[20:35:04] <monmortal> but nooo
[20:35:13] <monmortal> it buggers me with a morningstar
[21:14:58] <pinvok3> Good evening guys. I'm writing a small web-based feed application for myself. I've found that mongodb is a bit more suitable for storing rss data than mysql. My question is, what is the best way to update a document when new feeds are available? I have a collection for each feed, and only one document. This document should contain an array of ~300 feed items. Newer ones get added, older ones
[21:15:08] <pinvok3> removed. Can someone help me a bit?
[21:15:46] <pinvok3> Or should I create a new document for every feed item?
[21:26:51] <a|3x> i am getting an exception when i am trying to search only for SOME keywords db['collectionname'].runCommand('text', { search: 'some keywords' ,limit:1, language: 'english' , filter: {"removed":{"$exists":false}} }); "exception: BSONObj size: -286331154 (0xEEEEEEEE) is invalid. Size must be between 0 and 16793600(16MB)"
[21:27:19] <a|3x> the strange thing is i only get this error if filter is specified, no filter returns 1 element
[21:28:27] <a|3x> any filter has this effect, even {"_id":{"$nin":[]}}
[21:40:49] <brendan6> Nodex: Here is the pastebin outlining http://pastebin.com/HFZ4eYm8
[21:41:09] <brendan6> Nodex: MongoDB shell version: 2.4.8
[21:46:37] <Mmike> Joeskyyy, but, if I have 4 boxes, and I put an arbiter on one, when that box goes down there will still be three node members and they'll be able to vote... no?
[21:49:01] <Joeskyyy> Correct. I think the documentation means something like
[21:49:48] <Joeskyyy> If you have three servers, each having a member of a replset, don't throw your arbiter on one of those three servers. Because then you won't have a majority if the server that goes down is the one with your arbiter on it.
[21:50:38] <Joeskyyy> The point of an arbiter is to be agnostic of your replset functions, not affected by the possibility of something happening with a member of the replset.
[21:50:50] <Mmike> Hm, but with 3 boxes I don't need an arbiter, right?
[21:51:28] <Joeskyyy> Correct. There's a lot of here's and theres with the examples
[21:51:42] <Joeskyyy> but it's just a little pointless to put an arbiter on a point of failure.
[21:51:57] <Joeskyyy> Because you can have two points of failure where there would typically only be one.
[21:52:01] <Joeskyyy> Just bad practice is all.
[21:52:04] <Mmike> But what harm can it do?
[21:52:30] <Joeskyyy> None, in reality. Unless you hit a snag like I mentioned. Again, it's just bad practice.
[21:52:50] <Mmike> I have 4 boxes in replset. I have arbiter on 5th box so I have 5 voting members. If one box goes down, all is gut. If box with arbiter goes down, all is also gut.
[21:53:47] <Joeskyyy> What's the point of your arbiter then? If one box goes down you still have 3/4 to decide.
[21:54:03] <Joeskyyy> Still a majority for the decision.
[21:54:34] <Joeskyyy> Taking the same example above, two nodes go down, one of those two has your arbiter on it. Now you only have 2/5 to vote.
[21:54:45] <Joeskyyy> Whereas if your arbiter was on its own low-risk node, you'd still have 3/5
[21:55:39] <Joeskyyy> You can take that same example and keep adding more nodes and the results would still be the same, just on a higher scale.
[21:56:18] <Mmike> I see.
[21:56:36] <Mmike> But, I don't need an arbiter in the first place then? If I have 4 boxes...
[21:56:46] <Mmike> Aha, I do, if I want my cluster to work on 2 boxes (+ arbiter)
[21:56:54] <Joeskyyy> You don't "need" one. I'd recommend one.
[21:56:57] <Joeskyyy> Correct.
[21:57:12] <Joeskyyy> Running an arbiter on a teeny tiny no risk box would be excellent for your cluster.
[21:57:23] <Joeskyyy> That way if you have two nodes go down, you're not completely f'ed
[21:57:58] <Mmike> yup, that makes sense.
[21:58:02] <Mmike> Thnx for the elaborate :)
[22:00:36] <Joeskyyy> anytime
[22:10:38] <brucelee_> is there a recommended size for a database server?
[22:10:41] <brucelee_> before creating another shard
[22:10:52] <brucelee_> or does it not matter
[22:23:01] <tongcx> hi guys, i have a big database and I want to map these data to another collection which just keep some fields, what's the best way to do it?
[22:23:17] <tongcx> i know i can use mapreduce, but the reduce step is not necessary and it will slow things down
[22:28:06] <monmortal> don't use mongo
[22:28:12] <monmortal> www.prevayler.org use this
[22:28:16] <monmortal> aw yeah
[22:28:21] <monmortal> zodb might not be bad either
[22:28:26] <tongcx> hi guys, i have a big database and I want to map these data to another collection which just keep some fields, what's the best way to do it?
[22:28:29] <tongcx> i know i can use mapreduce, but the reduce step is not necessary and it will slow things down
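For what it's worth, a hedged alternative to map/reduce for a plain copy-some-fields job (the collection and field names are placeholders):

    db.source.find({}, { fieldA: 1, fieldB: 1 }).forEach(function (doc) {
        db.target.insert(doc);    // the projection keeps _id by default, so the copy stays aligned with the source
    });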
[22:29:12] <monmortal> use common lisp
[22:30:20] <tongcx> monmortal: are u talking to me?
[22:30:40] <monmortal> yeah
[22:30:53] <monmortal> backward chaining might help
[22:49:40] <monmortal> zodb!
[22:51:04] <tongcx> monmortal: thanks
[22:51:16] <monmortal> well
[22:51:30] <monmortal> I love single lane rotaries see
[22:51:35] <monmortal> simplicity
[22:52:11] <monmortal> what confuses me about mongo is what is to stop new data incoming from corrupting old data?
[22:52:15] <monmortal> whats to stopit?
[23:09:22] <tongcx> hi, if i write a script.js and run it with mongo script.js, how do I change the db in script.js? if i write 'use xxx', there is an error
[23:13:50] <ehershey> tongcx: try db = db.getSiblingDB('otherdb');
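A minimal script.js along those lines (the database and collection names are placeholders):

    // 'use xxx' is a shell-only helper; inside a script, switch databases like this:
    var mydb = db.getSiblingDB('xxx');
    printjson(mydb.mycollection.findOne());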
[23:17:03] <schmidtj> Hello. Does anyone have a systemctl service definition file to run MongoDB as a service? I could create one, but this has got to be something someone else has already done before me. Anyone know where I could find one?
[23:17:39] <schmidtj> pmxbot: help
[23:19:58] <monmortal> what os?
[23:23:01] <blizzow> okay, total n00b question here. I have a collection called foo with index bar. I want to do a count of all indexes where the value of bar is set to 1. I failed when trying this -
[23:23:01] <blizzow> db.domains.count( { active: '1' } )
[23:24:31] <blizzow> Anyone know what I'm doing wrong?
[23:29:37] <monmortal> u didnt read the manual
[23:29:40] <monmortal> ;)
[23:34:53] <blizzow> monmortal. I'm looking at the little mongodb book and have tried a couple of different ways to query, but am obviously doing something wrong. Pardon my retardedness, I figured I'd ask for some clarification here.
[23:35:44] <Joeskyyy> blizzow: Is the value set to the string value "1" or the integer value 1?
[23:36:20] <blizzow> I think it's integer value 1.
[23:36:42] <Joeskyyy> Try db.domains.count({active: 1}) then
[23:36:47] <Joeskyyy> Without the '' around the 1.
[23:36:53] <Joeskyyy> That would signify a non integer value
[23:39:21] <Joeskyyy> Rather, putting the value IN the '' signifies a non-integer value
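A hedged illustration of the distinction Joeskyyy is drawing (MongoDB matches by BSON type as well as value):

    db.domains.count({ active: 1 })      // matches only documents where active is stored as a number
    db.domains.count({ active: "1" })    // matches only documents where active is stored as a string
    db.domains.findOne()                 // inspect one document to see which form is actually stored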
[23:41:33] <monmortal> I ask for help here
[23:41:36] <monmortal> not get much
[23:41:49] <monmortal> my life is sad
[23:41:50] <blizzow> It's returning 0 no matter what I try.
[23:46:31] <blizzow> I posted an image here http://i.imgur.com/C3sX3e1.png of what I'm trying to grab (a count of all records in the active column that are set to 1).
[23:53:07] <schmidtj> monmortal: OpenSuse 13.1, which uses the 'systemctl' startup/service manager package.
[23:55:10] <schmidtj> err, excuse me, 'systemd'
[23:55:18] <schmidtj> systemctl is the command to interact with systemd
[23:55:38] <schmidtj> Ahh well, I guess I'll have to follow up on this tomorrow. I really need to call it a night
[23:55:43] <schmidtj> goodbye
[23:56:05] <Joeskyyy> What does db.domains.find({active:1}).count() return?
[23:56:11] <Joeskyyy> A little redundant, but I'm curious.