[02:33:12] <xudongz> looking at http://docs.mongodb.org/manual/reference/command/getLastError/. Does mongodb 2.6 (or 3.0.4) completely replace err with errmsg?
[02:48:58] <Boomtime> xudongz: that's a good question.. i don't know that specific answer, but i can say that you should use .ok to determine first-stage success, and then parse the other fields if that is false
[02:49:39] <Boomtime> oops, that's silly, you should use the .ok if TRUE to parse the other fields
[02:50:07] <Boomtime> since .ok tells you if the getLastError command worked, it will tell you if the other fields make sense for what you need to know
[03:10:01] <xudongz> it seems that when ok is 1 err is null, and when ok is 0, errmsg exists and is a string
[03:10:06] <xudongz> anyways i should probably do a bit more testing
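A minimal mongo shell sketch of the field handling discussed above (assuming a write was just issued on the same connection):

    var gle = db.runCommand({ getLastError: 1 });
    if (gle.ok === 1) {
        // the command itself ran; on a clean write gle.err is null,
        // otherwise it is a string describing the write error
        if (gle.err !== null) {
            print("write failed: " + gle.err);
        }
    } else {
        // the command itself failed; errmsg says why
        print("getLastError failed: " + gle.errmsg);
    }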
[03:37:44] <pylua> could the value of index be duplicated?
[04:01:39] <Boomtime> pylua: the "value of index"? do you mean can multiple documents match the same value in an index? yes
[09:51:42] <Lope> how can I findOne (or find with a limit of 1) with sort?
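A hedged sketch of the usual shell answer (collection and field names are made up): sort first, then limit to one document.

    // newest matching document; effectively a "findOne with sort"
    db.events.find({ type: "login" }).sort({ createdAt: -1 }).limit(1);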
[09:57:20] <d0x> Hi, i'd like to speed up this kind of query: {a: "foobar", b: {$exists: true}}. The field a exists in all documents, but b only in 10% of them. I created a sparse index on {a:1, b:1} but because a is included in all documents, that doesn't help. How can I speed up my query?
[09:57:37] <adrian_lc> hi, I'm trying the 3.0 and the localhost exception doesn't seem to allow me to create the superuser admin
[09:58:11] <adrian_lc> I'm getting this error: 2015-07-10T09:49:36.264+0000 E QUERY Error: couldn't add user: not master
[09:58:30] <adrian_lc> I have auth enabled with a keyfile
[09:59:08] <adrian_lc> doesn't make much sense that it's not the master, since I can't start the replica set without authenticating :/
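A hedged sketch of the usual cause: "not master" means the user-management command reached a member that is not (yet) PRIMARY. Credentials below are illustrative.

    rs.status()   // look for the member whose stateStr is "PRIMARY";
                  // if there is none, the set may simply not be initiated yet (rs.initiate())
    // then, connected to that primary:
    db.getSiblingDB("admin").createUser({
        user: "admin",
        pwd: "changeme",                              // illustrative credentials
        roles: [ { role: "root", db: "admin" } ]
    })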
[10:20:59] <pylua> Boomtime: I need to make multiple documents match only one record
[10:22:41] <pylua> Boomtime: it is saying: db.col.insert({"a":a,"b":b}) will generate a duplicate error if there is already a document with the same value in the collection
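A hedged sketch of what that sounds like: a unique index on the collection, which makes the second insert of the same values fail.

    db.col.createIndex({ a: 1, b: 1 }, { unique: true });  // at most one document per (a, b) pair
    db.col.insert({ a: 1, b: 2 });                         // ok
    db.col.insert({ a: 1, b: 2 });                         // fails with an E11000 duplicate key error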
[11:45:23] <d0x> Hi, i'd like to speed up this kind of query: {a: "foobar", b: {$exists: true}}. The field a exists in all documents, but b only in 10% of them. I created a sparse index on {a:1, b:1} but because a is included in all documents, that doesn't help. How can I speed up my query?
[12:06:32] <NoReflex> d0x, I'm no expert but why not create a sparse index on {b: 1} ?
[12:10:10] <d0x> NoReflex: The query needs to utilize an "a" filter also. If i remove "a" from the index, it needs to scan too many documents.
[12:10:54] <d0x> { a : "foobar" }) // 517.967 documens and { a : "foobar", b : { $exists: true} }) // 44.922 documents
[12:11:03] <d0x> I just posted the question to SO: http://stackoverflow.com/questions/31340290/optimal-compound-indexes-for-exists-true-sparse-indexes
[12:53:45] <NoReflex> and how many documents does {b : { $exists: true}} return?
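A hedged sketch of the index and query under discussion (collection name made up). Note that a sparse compound index still indexes a document as long as it contains at least one of the keys, which is why having "a" everywhere defeats the sparseness; newer servers (3.2+) add partial indexes with a partialFilterExpression on { b: { $exists: true } } for exactly this case.

    db.items.createIndex({ a: 1, b: 1 }, { sparse: true });          // indexes every document, since "a" is always present
    db.items.find({ a: "foobar", b: { $exists: true } }).explain();  // check how many documents/keys get scanned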
[15:23:38] <kexmex> so is it ok to use smallfiles? btw, when a collection is dropped, how is journaling involved?
[15:23:51] <kexmex> will it grow to size of collection being dropped?
[16:38:14] <GothAlice> kexmex: Howdy! I use smallfiles a lot in development, less so in production where the dataset size grows at a predictable, relatively slow rate. (In development I'm creating and dropping things left and right, so over-allocation is a waste of time.)
[16:38:53] <GothAlice> The instruction to drop a collection is journalled like any other write operation, but the collection doesn't grow when being dropped. (The file chunks the collection was using are marked as "free", but not otherwise cleaned up.)
[16:38:53] <kexmex> say, if i dont even write 128mb per hour, i probably don't need these big journals right?
[16:39:23] <kexmex> well i was wondering if whole collection gets copied to journal during drop or something
[16:39:30] <GothAlice> Well, no. The default journal size should be good. A journal is mostly used to ensure your data is consistent in the event of a crash.
[16:39:43] <kexmex> btw, i just turned smallfiles on an existing database
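A hedged config sketch for that (MMAPv1 only; the option spelling varies by version, and --smallfiles also works on the command line):

    # mongod.conf (YAML style, 3.0)
    storage:
      dbPath: /data/db
      mmapv1:
        smallFiles: true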
[16:50:53] <kexmex> with wiredTiger, are data files much smaller?
[16:53:01] <GothAlice> With compression enabled, they can be, by virtue of needing fewer stripes to store your data once compressed.
[16:53:58] <kexmex> well i am wondering about the space savings and all
[16:54:14] <kexmex> i know it depends on nature of data but...
[16:55:56] <GothAlice> I don't use WiredTiger at the moment (my dataset reliably kills mongod if I try) but my custom LZMA/XZ record compression gets around 80% compression on average. (Lots of text content that compresses well.)
[16:56:10] <GothAlice> With snappy compression, I'd expect that to be around 30-40% compression.
[16:56:46] <GothAlice> (zlib at sane levels would get 60-70% on this dataset.)
[17:04:20] <GothAlice> Bzip2 is the slowest, with Xz giving better compression at speed-comparable levels. (Xz can, in "extreme mode", take a truly ludicrous amount of time to compress things, though.)
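For the built-in WiredTiger side of this, a hedged sketch: the block compressor can be chosen per collection (snappy is the default), and stats() shows the effect.

    // collection whose blocks are compressed with zlib instead of the default snappy
    db.createCollection("articles", {
        storageEngine: { wiredTiger: { configString: "block_compressor=zlib" } }
    });
    db.articles.stats();   // compare size (uncompressed data) with storageSize (bytes on disk)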
[17:04:21] <kexmex> my datafiles are 6gig, that seems suspect
[17:04:38] <kexmex> maybe i should run repairdatabase
[17:04:41] <GothAlice> What's your average document size?
[17:05:08] <GothAlice> And yeah, repair will copy out the data from the old stripes into new, more space-efficient ones.
[17:05:56] <GothAlice> http://docs.mongodb.org/manual/core/storage/ goes into some detail, but if your documents are riding the edge of a power of two, in terms of size, then you could be fighting the document allocation strategy.
[17:06:49] <kexmex> i guess when i archive some old docs, i should repairDatabase() either way
[17:07:08] <GothAlice> This is a big, big problem if you use and adjust the GridFS chunk size to be something like "64KB" exactly. You'd actually be allocating 128KB chunks accidentally.
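A hedged way to see how much of those 6 GB is slack, plus the repair step mentioned above:

    // dataSize = live documents, storageSize = allocated extents, fileSize = bytes on disk
    db.stats(1024 * 1024);    // scale the numbers to MB
    db.repairDatabase();      // rewrites the database into fresh, compact files;
                              // needs free disk roughly equal to the data set and blocks the db while it runs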
[17:07:09] <kexmex> the problem is, my Azure server is burning through cash because they charge for IO
[17:07:29] <GothAlice> … pro tip: don't run real network services on Windows boxen. ;P
[17:08:36] <kexmex> did some things today, lets see if i saved some money hehe
[17:09:48] <chxane> Is there a way to limit the amount of resources used on a server by multiple mongodb connections without sharding?
[17:10:24] <kexmex> chxane: i saw something about memory
[17:10:29] <chxane> I have tried indexing and other solutions, but it seems at a certain point mongodb stops allowing connections and has too many locks
[17:10:51] <chxane> kexmex, I have tried increasing the ram on the server to 16GB
[17:10:54] <GothAlice> One can certainly run into "open file limits".
[17:17:34] <GothAlice> With 13GB of files to "memory map".
[17:18:04] <chxane> right now doing a free -g I get 14GB total mem
[17:18:05] <GothAlice> 16GB of RAM would thus be quite adequate for your current needs, based on data size.
[17:18:38] <GothAlice> Your question basically comes down to: https://gist.github.com/amcgregor/4fb7052ce3166e2612ab#how-much-ram-should-i-allocate-to-a-mongodb-server :P
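A hedged sketch of where those sizing numbers come from in the shell:

    var s = db.stats(1024 * 1024 * 1024);                               // scaled to GB
    print("data: " + s.dataSize + " GB, indexes: " + s.indexSize + " GB");
    db.serverStatus().connections;   // current vs available connections
    db.serverStatus().mem;           // resident / mapped memory (mapped is MMAPv1-specific)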
[17:19:59] <chxane> does it help having the webserver and mongodb running on the same machine? possibly the webserver and/or the wsgi server may be eating too many resources
[17:20:24] <StephenLynx> yeah, dbs usually use lots of RAM
[17:20:27] <GothAlice> (Since they tend to consume all accessible resources on a given machine.)
[17:20:52] <chxane> yes possibly I can have a lower demand server run the website and another just for the mongodb
[17:21:00] <GothAlice> If you co-lo (colocate) you only have yourself to blame for problems. ;P
[17:21:02] <chxane> wish aws ec2 wasn't so expensive lol
[17:21:19] <StephenLynx> I never use these services.
[17:21:32] <StephenLynx> either I use dedicated or VPS.
[17:21:53] <StephenLynx> I'd rather pay for time and have absolute control of the machine.
[17:21:54] <GothAlice> chxane: clever-cloud.com I found to be cheaper than EC2 or Rackspace for "app servers", and currently are beta testing MongoDB (including 10gen official support) in their Europe region.
[17:22:12] <chxane> thanks GothAlice I will check it out
[17:22:29] <chxane> if I can separate like you are saying it will probably remove a lot of these problems
[17:22:50] <GothAlice> If I can't "git push" and have the app update just happen, I'm a sad high-altitude melting panda.
[17:22:57] <chxane> then I would just need to worry about server latency when passing the data back and forth between the database server and web server
[17:23:57] <chxane> wish I could just run the server out of my office or something but the internet in this town sucks
[17:24:32] <GothAlice> Considering that network access is not just "typical" but the only practical way anyone _ever_ deploys a database service…
[17:26:32] <GothAlice> I use them because they cover two of our needs: data from Canada being stored in Canada (legal requirements and whatnot), and a European zone. We also use Rackspace for our Southern US DC.
[17:27:04] <chxane> well I just need it to run django, celery, mysql and mongodb all without any issues which I don't think will be a problem
[17:28:13] <GothAlice> Funnily enough, you already have, by including MongoDB in there, too. ;P
[17:28:20] <StephenLynx> GothAlice, if you are operating outside of canada, do you still have to store incoming data from canada on canadian soil, or only if you are located in canada too?
[17:29:02] <GothAlice> Certain businesses require that their data not leave the country. So even if I was a service provider in another country, to accommodate local requirements I would need to have a DC in Canada for those clients.
[17:29:38] <chxane> GothAlice, the mysql database doesn't do much tho
[17:29:40] <StephenLynx> and what would the government do to this company outside of canada?
[17:29:42] <GothAlice> (Or not have those clients.)
[17:31:12] <chxane> lol yeah especially with the hell of migration
[17:31:32] <GothAlice> Even the little things: https://web.archive.org/web/20130813180719/http://tech.matchfwd.com/promiscuous-django-models/
[17:31:45] <GothAlice> (Fun side-effects of Celery using Pickle to serialize Django model objects…)
[17:32:09] <GothAlice> At that time we used Celery. A year later we did not. ;P
[17:37:51] <mike_edmr> django models are useful for creating django tutorials :D
[17:38:29] <GothAlice> A standard conference joke is "raise your hand if you've written a blog in Django" (most hands go up) "keep your hand up if you've written anything else using it" (all hands go down but one).
[17:48:17] <GothAlice> MongoDB encourages one to model your data in a way that best suits how you will need to query it.
[17:48:27] <GothAlice> What types of questions do you need to ask that data?
[17:49:10] <GothAlice> I'd also give http://www.javaworld.com/article/2088406/enterprise-java/how-to-screw-up-your-mongodb-schema-design.html a read through.
[17:50:19] <m4k> Filtering on the basis of SKU, which is in the 4th nested array
[17:53:48] <m4k> Currently I have designed RDBMS tables for such data; I wanted to store the data as-is in mongo so that I can query and return it.
[17:57:01] <GothAlice> m4k: The example I typically use is a bit smaller-scale than yours. Forums. Replies to a thread are embedded in the thread, since when viewing the replies you need the thread data, and deleting a thread should delete all replies. (Single query optimization for the most frequent use.) The threads are their own collection and _not_ embedded in the Forums.
[17:57:57] <GothAlice> The latter is because a forum may have a truly excessive number of threads, and "editing" a reply to a thread embedded in a forum becomes doubly nested and nearly impossible to do.
[17:58:16] <GothAlice> (I.e. threads embedded in a forum would be far more likely to hit the 16MB document size limit.)
[17:59:09] <GothAlice> There's also the "I only want a subset of the threads" issue. I.e. give me the 10 most recently updated threads. If they're embedded, this goes from being a simple query to being a full aggregate unroll and projection, which is more complex and slower.
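A hedged sketch of that forum/thread/reply shape (field names illustrative):

    // threads are their own collection; the forum is referenced, replies are embedded
    db.threads.insert({
        forum: "announcements",
        title: "Welcome",
        replies: [
            { author: "Bob", comment: "First!", posted: new Date() }
        ]
    });
    db.threads.findOne({ title: "Welcome" });   // the thread and all of its replies in one query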
[18:00:24] <GothAlice> "Filtering on SKU" isn't a query. That's the match part of a query. Projection is the part of the question to the database that says "give me back X" (… filtered on Y). The data you want back is just as important as what you want to filter on.
[18:02:44] <StephenLynx> yeah, that looks like a very complex model.
[18:03:18] <m4k> In this case I am not able to utilize a single document with mongo, and have to create a collection for each subdocument, and there will also be references.
[18:04:30] <GothAlice> m4k: Definitely read through that "how to screw up your MongoDB schema design" article I linked.
[18:05:24] <GothAlice> Treating MongoDB as SQL is a recipe for disaster, but so is treating it as a "just throw JSON at it" document store. Neither extreme will produce a good result.
[18:06:35] <GothAlice> (And both lead to writing bitter blog posts that get panned by experts. A la https://blog.serverdensity.com/does-everyone-hate-mongodb/ which covers some of the preconceptions and traps people tend to fall into.)
[18:07:05] <kexmex> need to run repairDatabase() after dropping a collection or will space be released?
[18:07:18] <GothAlice> kexmex: The space is "released", but not de-allocated on disk.
[18:07:43] <GothAlice> Meaning the space can be re-used by other collections, but the stripes won't actually shrink or get freed up. (Defragmentation == repairDatabase.)
[18:09:39] <m4k> GothAlice: Thanks for the help, seems I need to rethink my approach; it would be better to use multiple collections for it.
[18:10:05] <GothAlice> A few references aren't a bad thing, but again, it comes down to the (complete) questions you need to ask the database.
[18:10:14] <GothAlice> And that's where apparent "duplication" of data becomes really useful.
[18:11:30] <GothAlice> For example, replies to a thread store a reference to the user who made the reply. Instead of having to load up the replies, then load up the user data separately, I embed a copy of the data a thread view would need about the user with the reference. E.g. instead of {author: ObjectId(…Bob…), comment: "This is great!"}, then looking up the ID to get the name "Bob" for display, store: {author: {name: "Bob Dole", ref: ObjectId(…)},
[18:12:23] <GothAlice> Sure, if a user changes their name you may (or might not in some cases, i.e. stored invoice data) want to go back and update the "cached" references.
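A hedged sketch of that embed-plus-reference pattern, continuing the thread example (ids and names illustrative):

    var userId = ObjectId();                                  // the reference
    db.threads.update(
        { title: "Welcome" },
        { $push: { replies: {
            author: { ref: userId, name: "Bob Dole" },        // cached copy of the display name
            comment: "This is great!",
            posted: new Date()
        } } }
    );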
[18:50:36] <blizzow> How do I change the "account_over_foo_limit" to be false in this single record - http://pastie.org/10285151 I tried and failed with db.mycollection.update( { account_id: "109" }, { $set { account_over_foo_limit: "false" } } )
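A hedged correction of that attempt: $set needs a colon after it, and if the field is meant to be a boolean then false should not be quoted.

    db.mycollection.update(
        { account_id: "109" },
        { $set: { account_over_foo_limit: false } }
    );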
[20:39:23] <svm_invictvs> cheeser: Yeah, I was going to ask how hard that would be drop in.
[20:40:07] <svm_invictvs> cheeser: And if I could submit a patch if it wasn't too difficult, but it sounds like anything I'd submit as a pull request would probably not be fruitful
[20:46:29] <saml> hey, i have a field url, which has unique index. how can I find the longest url in the collection?
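One hedged, version-agnostic way in the shell (collection name made up) is a single pass that keeps the longest value; newer servers also have string-length operators in the aggregation pipeline.

    var longest = null;
    db.pages.find({}, { url: 1 }).forEach(function (doc) {
        if (doc.url && (longest === null || doc.url.length > longest.url.length)) {
            longest = doc;
        }
    });
    printjson(longest);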
[21:43:21] <diegoaguilar> Hello, can I use http://www.hastebin.com/uvububawat.sm to trick a sort for an $addToSet?
[22:48:09] <mastyd> I need to store a ton of position data (one geolocation coordinate a second) for a lot of concurrent users. I need a way to grab all the position data after an event is done (20 - 30 minutes). My thought is to store each position as a document with lat, lng, direction. Would Mongo be able to handle, say, 100+ writes a second with something like this without choking? Postgres seemed to not be able to handle it very well.
[22:48:33] <mastyd> 100+ writes a second because 100+ clients will send the coord a second
[22:49:55] <StephenLynx> but I wouldn't do that anyway.
[22:50:35] <StephenLynx> update the client every second all the time for all clients, that is.
[23:18:53] <nofxx> mastyd, worked once on truck tracking software. What we did was a queue using resque. But there was a ton of calculation on each position, and we could lose none, so redis stored it easily and the worker emptied the queue; the DB in that case was postgres.
[23:20:22] <nofxx> that being said, 100+ writes/s is pretty easy for mongo. Read a little about writes; there are a lot of options (speed vs consistency)
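A hedged sketch of that kind of position schema and the post-event read (names illustrative); the write concern on the inserts is the speed-vs-safety dial mentioned above.

    // one document per position sample
    db.positions.insert({
        user: "rider-42",
        at: new Date(),
        loc: { type: "Point", coordinates: [ -73.97, 40.77 ] },   // GeoJSON is [lng, lat]
        direction: 135
    });
    // pull everything for one user once the event is over, in time order
    db.positions.createIndex({ user: 1, at: 1 });
    var eventStart = new Date(Date.now() - 30 * 60 * 1000), eventEnd = new Date();
    db.positions.find({ user: "rider-42", at: { $gte: eventStart, $lte: eventEnd } }).sort({ at: 1 });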