[00:54:54] <GothAlice> zivix: Never before have I seen anyone with as many network transitions as you. Please accept these bytes as a token of my sympathy. (Either that, or you've got a host fight on your hands. Both of these problems can be alleviated by running an "IRC bouncer" such as ZNC.)
[04:03:18] <Jonno_FTW> hi, when I use $near with min and max distance, I still get the document exactly on the query location
[04:04:48] <Jonno_FTW> my query and result looks like this: http://i.imgur.com/cX50K8D.png
[04:05:12] <Jonno_FTW> why is the first result included even though it's not within the range?
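For reference, a minimal sketch of the intended shape of that query, assuming a 2dsphere index and GeoJSON points on an illustrative "location" field; with distances in metres, any non-zero $minDistance should exclude the document sitting exactly at the query point:

    db.places.find({
        location: {
            $near: {
                $geometry: { type: "Point", coordinates: [ 151.207, -33.868 ] },
                $minDistance: 1,       // excludes points at distance 0 from the query location
                $maxDistance: 5000
            }
        }
    })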
[07:55:18] <amitprakash> Hi, with mongo I am getting a "key '$set' must not start with '$'" error at random for some update queries
[07:56:55] <amitprakash> db.items.update({'_id': ObjectId(...)}, {'$set': {'cnc': 'Heathrow'}}) for example throws this error
[08:40:56] <amitprakash> filing a bug with mongo support then
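For context, that error is the check (performed by several drivers and by mongod) that rejects $-prefixed field names in documents written as data; it usually means the {'$set': ...} document is being passed somewhere that expects a plain document (an insert, save, or full replacement) rather than as the update argument. A sketch of the failing and the working shapes in the shell:

    // fails with a "field names cannot start with $" style error:
    // the operator document is being treated as data to store
    db.items.insert({ "$set": { "cnc": "Heathrow" } })

    // the same operator document is fine as the update argument
    db.items.update({ "cnc": "LHR" }, { "$set": { "cnc": "Heathrow" } })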
[10:13:41] <yauh> is there a recommended way to get mongod stats into graphite/collectd/statsd?
[11:51:17] <BBin> Not entirely mongo-related but since it's similar: I have kind of the reverse situation of the regular use case. Instead of there being one query that should return all matching documents, I want to input a document and get all matching "queries" (match patterns, currently using the mongo syntax). Where should I look for info about this type of problem?
[11:55:18] <joannac> I don't understand what that means
[11:55:20] <BBin> joannac: Let me explain why I want it as well, I understand my explanation wasn't good
[11:59:28] <BBin> joannac: So, I have an arbitrary list of queries. Say {a: {$gte: 5}, b: "hello"}, {a: {$gte: 5}}, {a: {$lte: 5}} (This list will be LONG). Now when inputing an object I want to find the rules from that list that match the object in the most optimal fashion. Obviously any object that matches the first one also matches the second one and it certainly does not match the third one, I want to exploit this type of relationship between the
[12:03:49] <kexmex> i mean, the result of two queries will contain what you are looking for
[12:05:04] <BBin> hmm, I'm still not sure what you are driving at
[12:05:26] <BBin> If two queries match the document, two queries are returned as the result
[12:05:52] <joannac> BBin: I still don't understand the use case
[12:06:17] <BBin> joannac: I'll explain the actual use case. It's pretty simple
[12:07:51] <BBin> I am building a real time publish/subscribe app. Clients can connect to the server as either publishers or subscribers. When a client subscribes it does that in the form of submitting a "query" (matching pattern). When someone publishes an object it is matched against all the subscribers rules and if a subscriber rule matches, the object is delivered to that subscriber
[12:08:28] <kexmex> so you are doing wildcard subscriptions eh?
[12:09:01] <kexmex> you probably want a scheme for the topics
[12:09:10] <BBin> Obviously I could just check every rule against the object, but considering that clients will sometimes have identical or very similar subscriptions this should be possible to optimize pretty well I think
[12:09:15] <kexmex> something like /a/b/c/d/e so a wildcard would be /a/b/*/d/e or whatever
[12:09:37] <kexmex> you'll be able to do a graph search at least ;)
[12:14:55] <joannac> BBin: sounds a bit like this https://github.com/10gen-labs/socialite/blob/master/docs/feed.md
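For what it's worth, a rough sketch of the brute-force baseline BBin already has in mind (collection and field names are illustrative): store each subscriber's pattern, then test a published object by querying a one-document scratch collection with every pattern. It is O(number of subscriptions) per publish and does not yet exploit the overlap between similar patterns, which is the part BBin wants to optimize. Patterns are stored as JSON strings because, on MongoDB of this era, field names beginning with "$" cannot be stored inside documents.

    db.subscriptions.insert({ client: "c1", pattern: JSON.stringify({ a: { $gte: 5 }, b: "hello" }) })
    db.subscriptions.insert({ client: "c2", pattern: JSON.stringify({ a: { $gte: 5 } }) })
    db.subscriptions.insert({ client: "c3", pattern: JSON.stringify({ a: { $lte: 5 } }) })

    function matchingSubscribers(obj) {
        db.scratch.drop()
        db.scratch.insert(obj)                        // one-document collection to test against
        return db.subscriptions.find().toArray().filter(function (sub) {
            return db.scratch.findOne(JSON.parse(sub.pattern)) !== null
        }).map(function (sub) { return sub.client })
    }

    matchingSubscribers({ a: 7, b: "hello" })         // -> [ "c1", "c2" ]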
[13:06:52] <yauh> any recommendations on how to integrate MongoDB stats into graphite with collectd or statsd?
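One low-tech option for yauh's question (no particular plugin assumed): pull a few counters out of db.serverStatus() on a schedule and emit them in graphite's plaintext "name value timestamp" format; the metric names below are illustrative.

    // collect.js -- e.g. from cron: mongo --quiet localhost/admin collect.js | nc graphite-host 2003
    var s  = db.serverStatus();
    var ts = Math.floor(Date.now() / 1000);
    print("mongodb.connections.current " + s.connections.current + " " + ts);
    print("mongodb.opcounters.query "    + s.opcounters.query    + " " + ts);
    print("mongodb.opcounters.insert "   + s.opcounters.insert   + " " + ts);
    print("mongodb.mem.resident_mb "     + s.mem.resident        + " " + ts);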
[13:27:58] <the_drow> I accidentally dropped the system databases. How can I restore them?
[13:47:59] <cheeser> the_drow: reinstalling wouldn't fix anything if your data files are mangled.
[13:50:40] <ChALkeR> mmapv1 takes 1.6G on disk, but has a stable memory consumption (~220M) for me.
[13:54:57] <ChALkeR> Why is the memory limit calculated in whole gbs?
[13:57:20] <the_drow> cheeser: Well I removed /var/lib/mongodb/ and reinstalled again and it did help
[13:57:39] <the_drow> I have excluded the admin and local database from my script that clears everything in mongo
[13:59:01] <cheeser> right. you'd have to toss your data files and just restart mongod
[14:04:46] <fkt> i have an entry structured like such: { "_id" : ObjectId("54fc4db01b8562722b9e8c67"), "03082015" : { "test" : { "data1" : "490.56", "data2" : [ 6, 12, 21 ], "data3" : [ 5, 32, 74 ] }}} . each entry has a datestamp object which contains other items. how would i query mongo for that specific date - in this example - 03082015
[14:07:50] <cheeser> well, you could use $exists on that field name
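A sketch of that $exists approach against fkt's sample document (the collection name is illustrative):

    // matches only entries that have a top-level "03082015" field,
    // and projects just that day's subdocument
    db.entries.find({ "03082015": { $exists: true } }, { "03082015": 1 })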
[14:51:57] <cheeser> fkt: it depends on what you're modeling.
[14:53:09] <fkt> yeah i simply don't know enough about mongodb honestly, need to read more
[14:53:33] <ChALkeR> cheeser: Querying by _id saves an extra index. It has limited applications, though. If there is more than one timestamp per item, or if the item _id does not correspond to the timestamp it should represent, then you need a separate field (and index).
[14:53:34] <fkt> this isn't work related or anything, just trying to teach myself about something new vs using another sql type db for this which I've done countless times
[14:53:50] <fkt> each id will have unique timestamp
[14:55:50] <cheeser> if i had billions of such records i might consider it for space reasons, though.
[14:55:59] <cheeser> to date (ha!) that hasn't been a problem.
[14:57:01] <ChALkeR> cheeser: I replaced ObjectId with manually-crafted ids in some collections, to save space and indexes. =) Because it was a problem there.
[14:57:44] <fkt> oh so i could make object id my datestamp?
[14:58:19] <cheeser> you can reuse _id (if it's an ObjectId and it matches your timestamp).
[14:59:08] <GothAlice> However, on collections where I need to perform date-based aggregate queries, I use a distinct creation field, since date projection operators don't work on ObjectId.
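For the filtering side of that trade-off: an ObjectId's first four bytes encode its creation time in seconds, so a date range can be rewritten as an _id range in the shell (collection name illustrative), with the caveat GothAlice raises that aggregation date projection operators still cannot read it:

    function objectIdFromDate(d) {
        // seconds since the epoch as hex, padded out to a full 24-character ObjectId
        return ObjectId(Math.floor(d.getTime() / 1000).toString(16) + "0000000000000000")
    }

    db.entries.find({
        _id: {
            $gte: objectIdFromDate(new Date("2015-03-08")),
            $lt:  objectIdFromDate(new Date("2015-03-09"))
        }
    })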
[14:59:39] <ChALkeR> Has anyone used WT on low-ram machines?
[15:00:10] <GothAlice> ChALkeR: I'm running a WT instance that's consuming maybe 230MB of RAM.
[15:00:53] <GothAlice> (It's got a 3.3 GiB virtual size, of course, but currently 230 MiB wired into it.)
[15:02:05] <ChALkeR> I guess that no one cares about virtual on 64bit =)
[15:02:32] <ChALkeR> GothAlice: What's the data size?
[15:03:17] <GothAlice> It's got a dataSize of 196 MiB. (This is MongoDB running in development on my laptop with a reduced dataset vs. production.)
[15:04:38] <ChALkeR> GothAlice: Have you tried WT in production?
[15:04:48] <ChALkeR> In development, everything was ok for me.
[15:05:11] <GothAlice> ChALkeR: Not yet. Now that I've got the app working in development, next step is to roll out a new 3.0.0 WT cluster on MMS for staging with a full copy of production data.
[15:05:48] <ChALkeR> Ah. My production is memory-limited a bit.
[15:06:02] <GothAlice> ChALkeR: Our production is the reverse of that. ;)
[15:07:10] <GothAlice> (We're running 60% data growth vs. the prior 90 days… so yeah. We've had to over-provision up front quite a bit.)
[15:07:41] <ChALkeR> On ~700M dump size, mmap1 reports ~1.3G dataSize and ~11.5M index size. And consumes ~260M ram.
[15:08:54] <ChALkeR> On ~700M dump size, wt takes up on disk ~400M for data and ~4.7M for indexes. And consumes >800M ram.
[15:10:27] <ChALkeR> And I don't know how to limit that, because wiredTigerCacheSizeGB is calculated in whole GBs =).
[15:10:41] <ChALkeR> It doesn't understand 0.5 or 0,5.
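The knob being discussed is the WiredTiger cache size (storage.wiredTiger.engineConfig.cacheSizeGB, or --wiredTigerCacheSizeGB on the command line); going by ChALkeR's report it only accepts whole gigabytes at this release, so the smallest cap expressible in mongod.conf would look roughly like:

    storage:
      engine: wiredTiger
      wiredTiger:
        engineConfig:
          cacheSizeGB: 1    # integer only here; 0.5 is rejected, as ChALkeR found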
[15:10:51] <GothAlice> ChALkeR: My test data here has 68 indexes totalling 9 MiB of storage in WT. The key difference, I suspect, is that mmapv1 does what it says: it uses memory mapped files, so much of the virtual size is dedicated to the on-disk files, without adding to the "allocated stack size" of the process. (I.e. mmapv1 is giving you a very large under-estimate of the memory usage.) WT, OTOH, manages memory itself more, using actual malloc()'d memory
[15:10:51] <GothAlice> blocks and thus more apparent memory usage.
[15:11:09] <GothAlice> ChALkeR: I hope your database host is dedicated to the purpose…
[15:12:28] <ChALkeR> It actually looks like caches.
[15:13:10] <ChALkeR> Because it doesn't consume that on startup, but keeping it running increases memory usage over time.
[15:13:40] <ChALkeR> And no, no complex queries are involved.
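If what ChALkeR is describing is the WiredTiger cache filling over time, serverStatus() exposes the cache accounting directly, which makes it easy to compare against the process RSS; a small sketch to run in the shell against the WT instance:

    var s = db.serverStatus();
    print("WT cache in use (bytes):  " + s.wiredTiger.cache["bytes currently in the cache"]);
    print("WT cache maximum (bytes): " + s.wiredTiger.cache["maximum bytes configured"]);
    print("resident set (MB):        " + s.mem.resident);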
[15:14:30] <GothAlice> That's pretty normal. Again, as long as you have a host dedicated to your database, its growth is clearly bounded (to the limits of the machine). Have you benchmarked and identified a drop in performance when the memory usage hits the limit?
[15:14:39] <GothAlice> Or is this a case of pre-emptive worry? ;)
[15:15:22] <ChALkeR> The host is not dedicated yet. I could limit the memory usage by other means of course.
[15:16:28] <GothAlice> Step 1: isolate your DB. Mixing and matching services is a quick path to unknowable pain. (Unknowable as you'll have difficulty identifying the culprit if something terrible happens. OOM-killer is not your friend. ;)
[15:17:14] <ChALkeR> GothAlice: That would double the nodes count. I will probably do that in some time, but not now. Maybe in a few months =).
[15:17:37] <GothAlice> ChALkeR: I'll hazard a guess that you also aren't running a replica set.
[15:40:47] <GothAlice> Half-way through a multi-megabyte-per-second streaming find/update pass mongod disappeared, the testing app caught the socket error, then the whole host went away. Anyone else running into panics with WT under high load?
[15:55:04] <NoOutlet> Is the kernel panic reproducible, GothAlice?
[15:56:39] <GothAlice> NoOutlet: Not in the least, unfortunately. I've repeated that ten minute test suite four times now without another crash. (And looking over the dom0 logs, it looks like I somehow reset the virtual SATA controller and the root filesystem went *pif*. I've never, ever seen that before.)
[15:58:01] <nosleep77> hi there, is there a simple command I can run from the shell to get a short status on the health of the db?
[16:02:03] <GothAlice> nosleep77: For monitoring, you'd also want rs.status() if you have a replica set. See https://blog.serverdensity.com/mongodb-monitoring-rs-status/ for some in-depth monitoring advice. :)
[16:04:38] <nosleep77> which one can i use just to see if it's up or down? this gives me great information but I want to have that basic check
[16:05:24] <cheeser> if you can't connect, it's down
[16:05:51] <nosleep77> makes sense, so any of them should result in non-zero exit code
[16:10:00] <GothAlice> nosleep77: With the rs.status() command run on any member of the replica set, it'll let you know if any of the other members aren't available, or what state they are in. (See the "state" field.) In theory you only actually need to monitor one of the members.
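For nosleep77's bare up/down probe, a one-liner sketch that exits non-zero when mongod does not answer (host and port are illustrative), suitable for a cron or nagios-style check:

    mongo --quiet --host localhost --eval 'quit(db.adminCommand({ ping: 1 }).ok ? 0 : 1)'
    echo $?    # 0 when mongod answered the ping, non-zero otherwise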
[16:50:16] <oriceon> i created a test user db.createUser({ user: "test123", pwd: "password123", roles: [ { role: "userAdminAnyDatabase", db: "admin" } ]});
[17:46:27] <inspiron> i have a mongo db, and i have a json dump of it. i need to write a mongoose schema for the collection in it. is there a way to auto create that schema in code?
[17:46:49] <inspiron> like actually auto create the code that i would have written for the schema
[17:55:29] <GothAlice> inspiron: MongoDB has no concept of a schema, so no.
[17:56:03] <latestbot> Stupid question, is there a way I can specify fields in a collection?
[17:57:03] <GothAlice> latestbot: If you mean "enforce a schema", then no.
[17:59:47] <latestbot> After being used to SQL, it’s kinda weird to me
[18:00:02] <GothAlice> If you are used to SQL, the article I just linked should be helpful.
[18:01:12] <GothAlice> latestbot: See also: http://docs.mongodb.org/manual/reference/sql-comparison/ and http://docs.mongodb.org/manual/reference/sql-aggregation-comparison/
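Since inspiron and latestbot are both really asking about schemas, the usual answer is to enforce one in the application layer, e.g. with Mongoose; the model and fields below are purely illustrative, not derived from inspiron's dump:

    // application-level schema: documents saved through this model are validated,
    // but mongod itself will still accept any shape written by other clients
    var mongoose = require('mongoose');

    var userSchema = new mongoose.Schema({
        name:    { type: String, required: true },
        age:     { type: Number, min: 0 },
        created: { type: Date, default: Date.now }
    });

    var User = mongoose.model('User', userSchema);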
[18:32:32] <girb1> http://pastebin.com/nRL0gcL2 — the split is not happening; please may I know the reason?
[18:33:08] <attrib> hi all, I'm experiencing a weird issue. If I do a db.agg.findAndModify({query: {_id: {date: '2015-02-08', nid: 3898}}, update: {$inc: {session: 1}}, upsert: true}) - I get an E11000 duplicate key error.
[18:34:01] <GothAlice> attrib: findAndModify — why are you using this particular operation instead of a standard update?
[18:34:01] <cheeser> run that query manually and see what you get.
[18:34:07] <attrib> checking db.agg.stats() - I get indexDetails of _id with a count of 328, but if I do db.agg.count() I only get 298
[18:35:05] <attrib> GothAlice: In my code (originally using node.js), I return the modified document
[18:35:13] <attrib> just broke everything now down to this
[18:35:30] <attrib> to continue working with this document
[18:37:08] <GothAlice> attrib: Unfortunately, that's not what that does. http://docs.mongodb.org/manual/reference/command/findAndModify/#output — if an existing record gets updated, you get back the value prior to the modification, not the current value. Better, in this instance, is to remember the _id you are using and simply continue to issue atomic updates as needed. (The "session" value won't be reliable as it stands. If you do need an accurate value,
[18:37:08] <GothAlice> .update() then .find() separately.)
[18:37:18] <attrib> so having this issue now in plain mongo (wanted to be sure nothing interferes here)
[18:37:45] <GothAlice> attrib: However, cheeser's question would be helpful to diagnose. What happens if you .find() just the query portion of that findAndModify?
[18:43:42] <attrib> I did a remove today [db.agg.remove({'_id.date': '2015-02-08'})]
[18:44:00] <attrib> I have the feeling the index still has the values
[18:44:31] <GothAlice> Yeah. There's something pretty funky with that collection. It will probably be worthwhile to transfer the contents to a new collection.
[18:45:21] <attrib> If I now do a db.agg.reIndex()
[18:45:27] <attrib> it works, till I do the remove again
[18:45:31] <GothAlice> (That terminal recording demonstrates inserting explicitly, upserting the existing record, then upserting a new record.)
[18:47:01] <GothAlice> attrib: Why collection.remove instead of collection.find().remove/removeOne?
[18:47:37] <GothAlice> Yeah, collection.remove isn't really a thing, it's a console macro.
[18:47:51] <GothAlice> (Type "db.agg.remove" and press enter in the shell to see the source for it.)
[18:48:04] <GothAlice> Basically boils down to db.agg.find(query).remove()
[18:48:30] <attrib> okay, but like I said - in the normal case this is handled by the mongodb node.js driver
[18:48:43] <attrib> was currently just testing if I can reproduce it on the console
[18:48:52] <attrib> to not spam here with node.js issues
[18:50:19] <GothAlice> Are you able to reproduce the issue on a completely clean collection? (I.e. re-name the existing one out of the way, do nothing to the indexes on the new collection.)
[18:50:37] <GothAlice> My terminal recording demonstrates that this (clean) should work.
[18:54:58] <GothAlice> attrib: http://showterm.io/39afe4940aff9b7deb9eb < demonstrates that a potentially conflicting upsert works after record removal, too.
[18:56:55] <attrib> seems like I broke something with the index there
[18:57:12] <attrib> not sure what, can't replicate right now on a fresh collection
[18:58:09] <inspiron> has anyone used mongo-connector with elasticsearch?
[18:58:18] <attrib> but inserting a value, doing a count, returns me 299; checking stats, tells me the size of the id index is 299
[18:58:45] <GothAlice> attrib: .count() results are "best effort" guesses.
[18:58:46] <attrib> deleting the entry again, count is now 298, stats index size is still 299
[18:58:59] <GothAlice> Yeah, looks like you've got index updates disabled somehow.
[18:59:25] <GothAlice> AFAIK, that's impossible, as index updates are synchronous with the applied atomic operations.
[19:00:06] <attrib> I'm getting the impossible possible today... the day gets better and better :P
[19:00:59] <GothAlice> However, a disparity between .count() results and the rest of the world can be expected. Doubly expected if you are using sharding.
[19:01:29] <attrib> if I do a db.agg.reIndex() all the values are 298 again
[19:01:43] <attrib> so currently trusting count more than the indexes ;)
[19:01:45] <GothAlice> Indeed. MongoDB doesn't do much in the way of "online compaction".
[19:02:23] <attrib> I'm connected to a mongos, but only have one shard; I'm using this for easy access to the replication behind it
[19:03:19] <attrib> trying the same now on the primary
[19:03:26] <attrib> to test if mongos inbetween is the problem
[19:03:41] <GothAlice> Indexes are (using the mmapv1 back-end driver) stored as B-trees, with leaves representing ordered "buckets". Inserting records may allocate more buckets than removal will clean up. When attempting to count(), MongoDB estimates the number of records based on the number of populated buckets, and sharding makes this a hella non-trivial operation.
[19:04:01] <GothAlice> .reIndex() would throw away the existing buckets and create new, highly packed ones.
[19:04:52] <attrib> uhh, yeah. working directly on the primary it works...
[19:06:34] <attrib> so somehow I know now that the problem is with mongos
[19:11:15] <attrib> oO, now it works with mongos too
[19:13:11] <GothAlice> I'm not a fan of it spontaneously working...
[19:14:26] <cheeser> "things that start working by themselves tend to stop working by themselves." -- my dad
[19:17:14] <GothAlice> cheeser: In a related vein: “Do you have a backup?” means “I can’t fix this.” (Alice's Law #105: https://gist.github.com/amcgregor/9f5c0e7b30b7fc042d81) I'll totally need to add your father's quote in there, though.
[19:17:59] <GothAlice> cheeser: Your quote is now Law #145. ;)
[19:18:02] <cheeser> he also had a theory that electronics ran on smoke and once you let the smoke out you can't put it back in... so, you know, he was interesting. :D
[19:21:36] <attrib> colleague told me he had problems in his last company with mongodb and replicasets without using mongos, so....
[19:21:50] <cheeser> yeah. it was a premature optimization by the ops guys since when we finally decided to shard, the cluster topology didn't have to change
[19:22:05] <cheeser> how that worked out in practice, i dunno. i left before they sharded.
[19:29:49] <GothAlice> attrib: I only run one cluster (26 TiB dataSize) as a sharded replica set, for obvious reasons, but the other clusters I manage are simply bare replica sets. When their dataSize + indexSize grows > RAM, I'll consider sharding them. ;)
[19:42:43] <attrib> fun day. after now using the replica set directly without mongos, I get the error with the primary too... I really should just go home and never come back :P
[19:44:43] <GothAlice> attrib: If you can reproduce reliably, open a JIRA ticket that includes the steps needed.
[19:45:01] <GothAlice> Indexes going out-of-sync are a Problem™ that deserves a look at by MongoDB staff.
[22:29:50] <aliasc> i find it hard to track and analyze my data in mongodb
[22:30:05] <aliasc> are there good tools for analysis
[22:33:58] <GothAlice> aliasc: At work we rolled our own "dynamic aggregate query reporting framework". I used http://www.devsmash.com/blog/mongodb-ad-hoc-analytics-aggregation-framework and http://docs.mongodb.org/ecosystem/use-cases/hierarchical-aggregation/ to assist in structuring my data in the most efficient way for how it needed to be queried.
[22:35:20] <GothAlice> aliasc: Things like our dashboard (http://cl.ly/image/142o1W3U2y0x) are simply a series of standard and aggregate queries.
[22:43:13] <GothAlice> aliasc: For data analysis links, see the following presentations: http://www.slideshare.net/crcsmnky/analyzing-data-in-mongodb and http://www.mongodb.com/presentations/high-throughput-data-analysis and this may be of use/interest: http://community.pentaho.com :)
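As a flavour of what those links cover, the kind of ad-hoc aggregate behind a dashboard number is usually a short pipeline like this (collection and field names are illustrative):

    // events per day for the current month
    db.events.aggregate([
        { $match: { created: { $gte: new Date("2015-03-01") } } },
        { $group: { _id: { $dayOfMonth: "$created" }, hits: { $sum: 1 } } },
        { $sort:  { _id: 1 } }
    ])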
[23:58:45] <_blizzy_> so I can think of NoSQL like this? https://gist.github.com/NotBlizzard/53ab334627d7fc42eb8d