[04:13:13] <preaction> the best way is the one that maximizes storage efficiency and makes querying as simple as possible
[04:14:30] <aliasc> suppose i have a channel with thousands of videos
[04:14:42] <aliasc> and a Channels collection with my channel inside
[04:15:03] <aliasc> i cant embed videos as arrays in the document since the document would grow
[04:15:09] <aliasc> so large it will exceed the limit
[04:16:24] <aliasc> so i need a separate Videos collection
[04:17:02] <aliasc> and link documents with ids, i think this is not the way mongodb is meant to work right ?
[04:19:23] <preaction> for that exact situation, it sounds like that is what you want
[04:19:44] <preaction> but if videos have comments, you could make those as an array of inner documents
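(A minimal shell sketch of the referenced layout being discussed here: Channels and Videos as separate collections linked by an id, with the small, bounded comments embedded in each video. Collection and field names are illustrative only.)

    db.channels.insert({_id: "UC_mychannel", title: "My Channel"})
    db.videos.insert({channelId: "UC_mychannel", title: "First video",
                      comments: [{user: "bob", text: "nice"}]})
    db.videos.find({channelId: "UC_mychannel"})   // all videos for one channel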
[04:20:25] <aliasc> i've just read an interesting answer in stackoverflow
[04:20:53] <aliasc> embedding is good for small documents that dont grow fast and do not change regularly
[04:21:31] <aliasc> actually for my project i dont perform too much actions on videos
[04:22:32] <aliasc> its not even a video sharing project, im using mongodb as cache db to compare and control some data
[04:22:43] <aliasc> on channels we own on youtube through youtube-api
[04:24:07] <aliasc> how do i insert documents only if they don't already exist, with batchInsert?
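(That question goes unanswered in the log; one common insert-if-missing pattern is an upsert per document, sketched below with made-up field names. A unique index plus inserts with continueOnError is another option.)

    db.videos.update(
        {youtubeId: "abc123"},
        {$setOnInsert: {youtubeId: "abc123", title: "...", fetchedAt: new Date()}},
        {upsert: true}
    )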
[04:40:54] <MacWinner> HI, with gridFS or with the gridfs files collection, is there a way I can find all files that are under a prefix? like filename: "/mypath/subdir/*"
[04:41:18] <MacWinner> i have all my files stored in gridfs with a filename key that corresponds to a linux file path
[04:49:31] <joannac> MacWinner: isn't that just a regex search?
[04:49:58] <MacWinner> didn't know if there was some special gridfs search
[04:50:09] <MacWinner> mongofiles seems to have a commandline switch for it
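(The regex approach joannac suggests, as it might look in the shell against the GridFS files collection; anchoring the pattern at the start lets a prefix query use an index on filename.)

    db.fs.files.find({filename: /^\/mypath\/subdir\//})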
[05:07:55] <MacWinner> running into weird issue with php driver. where findOne() returns a document.. but find() does not with the same exact query: eg: $files = $collection->findOne(["md5" => "d41d8cd98f00b204e9800998ecf8427e"]);
[05:08:08] <MacWinner> if I replace findOne, with just find, i don't get any results back
[11:20:01] <pamp> i need to create an index on fields k and v in the pros array
[11:20:18] <Lujeni> pamp, your index is not fully in memory?
[11:20:31] <pamp> but i get the error "key too large to index"
[11:20:47] <bowlingx1> hey, I have a question about a gist: https://gist.github.com/BowlingX/2b5e4420f1da73decb4d#file-gistfile1-js-L14-L16. I have lots of data that I'm going to iterate with this script (for a migration).
[11:27:54] <joannac> also, if your v could be an array, you need to rethink your schema
[11:43:24] <pamp> how can I know the size of a particular field in a specific document??
[12:11:04] <Waheedi> Why would mongodb crash on this. http://pastie.org/9993441
[12:11:24] <Waheedi> i understand something fishy is happening here but it should not crash
[12:24:01] <rosenrot87> I do have a question about a weird behaviour of the MongoDB driver for C#. It works flawlessly if I do not use the replSet option. Once I activate this option, I get a connection refused from the server. What do I miss?
[12:26:43] <StephenLynx> i think it is for replica sets.
[12:26:48] <StephenLynx> do you have a replica set?
[12:27:48] <rosenrot87> I use the option replSet=rs0 in the mongod.conf
[12:28:10] <rosenrot87> i also use rs.init() and there is a oplog.rs within the local database
[12:28:41] <rosenrot87> if i comment the replset=rs0 within the conf file it works
[12:30:19] <rosenrot87> I use the oplog.rs to get notified if there are changes within my database. This is the only reason why i use replica sets
[12:30:55] <Derick> what's the full error that you get?
[12:34:15] <rosenrot87> Exception:Caught: "No connection could be made because the target machine actively refused it" (System.Net.Sockets.SocketException) A System.Net.Sockets.SocketException was caught: "No connection could be made because the target machine actively refused it"
[12:34:57] <Derick> and what's in your mongodb log?
[12:35:02] <Derick> what's the connection string that you used?
[12:47:54] <joannac> what benefit would you get from it staying up?
[12:48:04] <Waheedi> it does not affect the replica set
[12:48:12] <joannac> it can't take writes. it had stale data so you probably don't want to read
[12:48:25] <rosenrot87> joannac: I can connect from everyelse, even from my shell on my client without any problem
[12:48:28] <Waheedi> no i want to read actually joannac
[12:48:44] <GothAlice> Waheedi: Except that it detected that its data is wrong. Unless you _want_ wrong data coming back in answer to queries, you pretty much want it to "crash".
[12:48:47] <joannac> Waheedi: you want to read from a member that's behind? why?
[12:49:23] <rosenrot87> joannac: it is only the mongodb driver which tells me connection refused. this is why I'm here :)
[12:49:29] <guest999> hi. im using the windows driver for mongoDB, but cannot resolve the 'Mongo' class in Visual Studio. I have referenced MongoDB.Driver.dll and MongoDB.Bson.dll and have the necessary 'using' clauses
[12:49:49] <GothAlice> Simply have automation to detect removal of a member which spins up a replacement. High-availability means you need monitoring and automatic resolution of certain issues.
[12:50:10] <guest999> ... i have the Mongo system up and running
[12:50:11] <rosenrot87> guest999: Did you use nuget to install it? is your .net version > 4?
[12:50:23] <joannac> rosenrot87: weird. you can connect to it using the same user/password/IP?
[12:51:17] <rosenrot87> joannac: I can connect from my windows gui app, my windows shell, I did not change anything on the client side, only the replset within the conf file on the server running mongodb
[12:51:24] <guest999> rosenrot87: yes. .Net 4.5 and yes using Nuget. The dll files don't seem to have a class called 'Mongo' anywhere that i can see !?!
[12:51:48] <joannac> rosenrot87: that's very weird.
[12:52:13] <rosenrot87> guest999: what about MongoServer,MongoClient?
[12:52:43] <rosenrot87> joannac: if i remove the flag from the config I can connect again....any ideas?
[12:52:58] <guest999> rosenrot87: yes i can see those. the tutorials im reading refer to 'Mongo.Connect()'
[12:53:30] <rosenrot87> guest999: look for another tutorial, there were a lot of changes. keep to the latest ones 2.6.8
[12:54:29] <joannac> rosenrot87: what version of driver?
[12:54:32] <boutell> MLM: maybe Mongoose is using getters and setters (if you’re still here…)
[12:54:51] <boutell> they might not be actual properties
[12:55:02] <rosenrot87> guest999: I also experienced this. Even the tutorial of the MongoDB Driver website says you should use client.GetServer(), but this is already deprecated
[12:55:29] <guest999> rosenrot87: ah! ok cheers. im using Mongo 2.6.8
[12:55:29] <joannac> why are you using a beta driver?
[12:56:12] <guest999> rosenrot87: oh dear. i guess i'll try and work it out somehow.
[12:56:20] <rosenrot87> joannac: because i just started with mongodb and thought for something basic like connecting to a database even a beta driver should work
[12:56:45] <GothAlice> "Beta" and "should work" are a strange combination.
[12:57:18] <rosenrot87> I mean I do not want to do fancy stuff. Connecting to a database should work in a beta, at least from my point of view :)
[12:57:26] <rosenrot87> otherwise it should be called alpha :)
[12:58:15] <rosenrot87> Maybe there is one of the developers here which could verify if this is a bug maybe?
[12:59:02] <GothAlice> Both represent "pre-release, use at own risk". Neither should be assumed to do anything other than explode at the least opportune time.
[12:59:09] <joannac> I would try a non-beta version and see if that works
[12:59:23] <GothAlice> Indeed. When in doubt, avoid experimental code.
[12:59:33] <joannac> if it does, then I would consider filing a bug report
[12:59:47] <joannac> however my money is on a configuration problem
[13:01:14] <joannac> rosenrot87: I would also like to see the output of the working cases i.e. windows gui, mongo shell
[13:01:21] <rosenrot87> joannac: which configuration problem?
[13:02:24] <joannac> rosenrot87: I don't know, that's why I'm asking you to test
[13:03:07] <joannac> i highly doubt a driver would have gone out not being able to connect to a mongod, like you said it's a pretty basic feature :)
[13:03:58] <joannac> so I'm asking you for more data to independently verify what you said
[13:05:49] <guest999> all the tutorials i see use the statement: var m = MongoDB(); but in MongoDB 2.6.8 using driver 1.10, the website http://docs.mongodb.org/ecosystem/drivers/csharp/ says its ok to do it, but i cannot see this 'MongoDB' class.
[13:06:19] <Waheedi> so if this db.terms.remove(d._id) is taking more than 40 minutes to execute that means there is definitely something wrong in there right?
[13:06:40] <guest999> is there a list or something showing the API and/or whats deprecated ? (this is all very confusing for me)
[13:06:50] <rosenrot87> joannac: both the shell and the gui application work right now...i can use them as usual
[13:11:57] <joannac> what is going on in your system?
[13:12:10] <Waheedi> joannac: should i really tell you
[13:13:04] <Waheedi> joannac: I'm using a famous hosting company :) and a few cloud block storage volumes lost connection. and this machine was a primary node in the replset
[13:13:55] <Waheedi> another node got elected to be the primary
[13:14:29] <Waheedi> but unfortunately that node was really outdated and it didn't shutdown or crash
[13:14:52] <Waheedi> while it was 1 month outdated*
[13:23:41] <GothAlice> Waheedi: I'm certainly something, but classification appears to elude you. You seem very obstinate for one requesting assistance; writing p-code describing what's going on is far less useful than providing logs demonstrating without modification or the need for assumptions what is _actually_ going on.
[13:24:14] <rosenrot87> joannac: I can connect from the shell when replset is OFF and ON, once it responds with ">" and the other time with "rs0:primary>" :) Like it should
[13:24:53] <joannac> rosenrot87: yes, I want to see the output, including what options you give the shell
[13:36:16] <GothAlice> Waheedi: RTFM and see that even if it works, it's not how you're supposed to do it. The fine manual does not mention that usage anywhere in the documentation for the command, thus the ability to use it the way you are is a fluke, unsupported, and subject to change without notice.
[13:43:35] <rosenrot87> the window just keeps blank
[13:43:36] <joannac> rosenrot87: none at all? blank screen?
[13:43:55] <joannac> that means there's no actual connection
[13:44:02] <GothAlice> All hail the mighty firewall?
[13:44:08] <joannac> so the question is, how come the mongo shell works?
[13:44:22] <rosenrot87> telnet server port -f output.txt should write everything to file.....file is just empty
[13:45:10] <rosenrot87> joannac: If i hit enter several times...I come back to the cmd again
[13:48:50] <joannac> rosenrot87: can you add the output of db.isMaster() in the mongo shell?
[13:49:13] <amitprakash> Hi, I have a collection package reflecting manifested shipments ( assigned waybills w/o information such as consignee/address etc )
[13:49:46] <amitprakash> When we get the actual shipment information, we go and individually update each document in the collection
[13:50:03] <amitprakash> However, would it be possible to bulk update these ?
[13:50:23] <amitprakash> i.e. when _id = blah, update with this and when _id = blahblha update with this_ and so on
[13:50:39] <GothAlice> amitprakash: Technically, yes, there do exist bulk operations. However they don't operate the way you're wanting, really.
[13:53:19] <GothAlice> rosenrot87: Does the bare name "phoebe" DNS resolve?
[13:53:33] <rosenrot87> joannac: should the host contain the IP address?
[13:53:52] <rosenrot87> GothAlice: I do not think so
[13:55:04] <GothAlice> rosenrot87: Make sure all DNS names, both fully qualified and bare, resolve, even if you have to add them to /etc/hosts (%SYSTEM32%/drivers/etc/hosts on Windows, AFAIK).
[13:55:16] <amitprakash> GothAlice, would it be advisable to go with bulk updates as opposed to updating one by one?
[13:55:32] <amitprakash> GothAlice, we'd be using unorderedbulkOps
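(A sketch of the 2.6 unordered bulk API amitprakash mentions, assuming a package collection and updates keyed by _id; each document still gets its own update clause, they are just sent to the server in one batch.)

    var bulk = db.package.initializeUnorderedBulkOp();
    bulk.find({_id: id1}).updateOne({$set: {consignee: "...", address: "..."}});
    bulk.find({_id: id2}).updateOne({$set: {consignee: "...", address: "..."}});
    bulk.execute();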
[13:55:40] <rosenrot87> joannac: Could it be the problem that the host and primary entry "phoebe:27017" can not be resolved?
[13:56:21] <rosenrot87> GothAlice: Could I replace the "phoebe" with the plain IP address of the server?
[14:08:04] <rosenrot87> Thank you very much! I'm happy right now
[14:08:45] <rosenrot87> GothAlice: How to replace the phoebe entries with the actual IP address?
[14:09:33] <GothAlice> Two points: first, I don't know. Second, it's not something I'd ever do. DNS allows for reconfiguration, potentially without downtime or reconfiguration steps (other than adjusting DNS resolution), so this is a great feature, not a bug.
[14:10:52] <arussel> should I expect my driver (reactivemongo) to use a snapshot when doing a query ?
[14:13:22] <GothAlice> arussel: See the "snapshot" method reference on http://reactivemongo.org/releases/0.10/api/index.html#reactivemongo.api.collections.GenericQueryBuilder
[14:13:36] <GothAlice> arussel: Out of the box, likely not. No driver I know of snapshots by default.
[14:14:59] <GothAlice> (A QueryBuilder instance is what is returned by collection.find().)
[14:23:34] <arussel> GothAlice: thanks, not easy to google "snapshot" as we get all the results about version SNAPSHOT
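(For reference, the shell-level equivalent of the snapshot option exposed on reactivemongo's QueryBuilder; it only applies to unsharded collections and cannot be combined with sort() or hint(). Collection name is illustrative.)

    db.things.find({active: true}).snapshot()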
[14:28:24] <menger> hey guys, how do i install mongodb-2.2.4 via yum?
[14:29:00] <Derick> you really don't want to install an old version like that
[14:30:36] <GothAlice> The trick, I find, is in the balance between evaluating what the client wants, and giving them what they need.
[14:30:42] <GothAlice> http://docs.mongodb.org/v2.2/tutorial/install-mongodb-on-red-hat-centos-or-fedora-linux/ < these are the docs from 2.2, no guarantee it'll work.
[14:30:47] <Derick> but, really, you want the latest 2.6
[14:34:46] <GothAlice> menger: https://jira.mongodb.org/browse/SERVER/fixforversion/12313 < browse forward from here (navigation in upper right) to see what has improved over time up until 2.2.7.
[14:35:27] <GothAlice> ("View issues" at the bottom for the full list for any release.)
[14:36:00] <GothAlice> (Or "Release Notes" button in top right for a breakdown by category for any release.)
[15:05:39] <Cygn> Anyone ever used the jenssegers laravel plugin for mongodb and figured out how to use map/reduce there?
[15:37:55] <calmbird> hi, could you help me: http://gyazo.com/a6d611265928b046c91f41edacb81877 , mongoose: how could I deal with category = All in my case?
[15:38:31] <calmbird> without changing to much given code
[15:39:14] <GothAlice> calmbird: Progressive query refinement. Basically, store the intermediate result of Idea.where('author').equals(author) somewhere, then only conditionally restrict on category through re-assignment of the query variable you are building up.
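(A rough sketch of the progressive query refinement GothAlice describes, assuming the Idea model and the "All" category sentinel from the gist; the category clause is only added when it actually restricts anything.)

    var query = Idea.where('author').equals(author);
    if (category !== 'All') {
        query = query.where('category').equals(category);
    }
    query.exec(function (err, ideas) { /* render results */ });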
[15:39:44] <GothAlice> Wish that was a text gist instead of an image…
[15:45:23] <GothAlice> Collection.where('field').equals(value) — How is this "easier" than Collection.find({field: value})? What the hojek does .where() even return? Can you save _that_ for later? (I wouldn't try it…) I love libraries that give more questions than answers. XD
[15:46:19] <calmbird> GothAlice: I don't know actually
[15:47:11] <calmbird> Just listening to mongouniversity, but starting to hate mongoose lately
[15:47:33] <GothAlice> The majority of support cases I deal with in here are mongoose-related injuries.
[15:47:34] <d0x> To make our "operational" data (collected over the day) available in a dashboard, I use a daily MongoDB MR job to convert it (joining collections). I used MR because I don't want to have another "infrastructure" to do this job. (in sum we have around 400GB). Is it common to use MR for this kind of task? Because I have already hit limitations like calling `db.xxx.find(this.xxx)` in the map stage.
[15:47:34] <d0x> As a workaround i simply loaded the whole xxx database into the scope... which works because the joined collections aren't huge.
[15:48:46] <calmbird> Oh I can't find a clear answer on the internet. Can we start mongod with some parameter that will ensure data is written to disk before answering?
[15:49:23] <calmbird> Because by default, mongodb says ok and then writes the data later, right?
[15:49:33] <GothAlice> d0x: We took a different approach for our event analytics. We pre-aggregate the relevant statistics, allowing us to produce whole dashboards a la http://cl.ly/image/2W0a2D3I370F that generate, from live data, in less than 100ms.
[15:49:49] <calmbird> pretty bad for important data
[15:50:43] <GothAlice> calmbird: There are many different options you can choose when telling MongoDB to read and write data. What you're looking for is the ability to set a default "write concern" for your application connections.
[15:52:00] <calmbird> I just want to be sure, that mongodb will write data to disk before answering ok.
[15:52:45] <GothAlice> calmbird: By default these days, MongoDB will wait for confirmation of receipt by the server. If you want disk-persistence guarantees, enable "journalling" in your write concern. If you want to avoid a single-host failure nuking the data, you can specify how many hosts you want the data to appear on before the write is considered complete.
[15:54:50] <calmbird> I've seen the video https://www.youtube.com/watch?v=JWaDa8taiIQ, that guy says mongodb is bad for critical data, because you are not sure data will be written to disk, but mongo will always answer yes. But I'm guessing that has been fixed by now.
[15:55:01] <GothAlice> That's been fixed for a while.
[15:55:10] <GothAlice> It's now just a common misconception.
[15:55:35] <GothAlice> calmbird: As a fun note, I actually _lower_ the write concern for many of my inserts. Centrally aggregated logging records need to be fast, it's OK if a few get lost during high load. (There are machine local files which contain any missing records.)
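(What the per-operation write concern GothAlice describes looks like in the shell: j waits for the journal, w: "majority" waits for replication to a majority of the set. Collection and field names are illustrative.)

    db.payments.insert(
        {amount: 10000, currency: "EUR", at: new Date()},
        {writeConcern: {w: "majority", j: true}}
    )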
[15:56:43] <krisfremen> funny how people believe everything they read on the internet without doing any testing for themselves
[15:57:37] <calmbird> krisfremen: Lack of time, and laziness
[15:58:43] <calmbird> I can just use a mature database like mysql, postgres etc, that everyone is using. Or take a lucky shot with mongodb, and then cry that a 10k Euro transfer wasn't written to the DB. :)
[15:59:02] <GothAlice> calmbird: I have 26 TiB of data in MongoDB, and I've had it in MongoDB for 8 years…
[16:00:54] <jeho3> MongoDB is software from pussies for pussies
[16:00:56] <Derick> GothAlice: wikipedia says our first release was in 2009 though :)
[16:01:22] <GothAlice> Okay, 6. My "nearest even number" rounding failed in this instance. calmbird: I migrated my data out of MySQL as soon as a viable alternative was presented. MongoDB fit the bill for my requirements, even back then.
[16:02:22] <Derick> i can play this game for a while...
[16:02:45] <GothAlice> calmbird: If you've ever had a failure in MySQL that required you to reverse engineer the on-disk InnoDB table structure… BSON is a joy to work with by comparison.
[16:03:01] <Derick> GothAlice: yeah - at least one document stays together
[16:03:55] <GothAlice> Derick: It was a bit of a shock that even with directoryPerDB enabled, InnoDB stored critical table structural information in a common pool a directory up… quite the surprise indeed.
[16:04:18] <calmbird> GothAlice: In mongodb we need to join data on the application side, in mysql we can join data on the database side. is that an issue with mongodb? is it slower etc?
[16:04:47] <GothAlice> calmbird: In MongoDB you don't design your data models as if they were relational, so in general the difference is a non-issue.
[16:05:12] <GothAlice> There are other ways of storing related data that make more sense when you have document storage, for example, I store all of the replies to a forum thread within that thread's document.
[16:05:39] <GothAlice> Need to move an entire thread to a different forum? $set one value in one document and you're done.
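(A sketch of the embedded-replies layout GothAlice describes, with hypothetical field names; moving the whole thread to another forum is then a single $set on one document.)

    db.threads.insert({title: "Patch 1.2 feedback", forum: "general",
                       replies: [{author: "alice", body: "First!", at: new Date()}]})
    db.threads.update({_id: threadId}, {$set: {forum: "offtopic"}})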
[16:05:42] <medmr> calmbird: if you have highly relational data
[16:05:49] <medmr> you may want to step back and reconsider mongodb
[16:05:51] <calmbird> GothAlice: Well in mongouniversity course, they said that for larger data, we should add more collections. So then we need to join some data.
[16:05:59] <Derick> GothAlice: what sort of "tricks" did you do to make sure you're not hitting the 16MB doc limit that way? Or, don't you have that many replies?
[16:06:16] <GothAlice> Indeed. Highly relational data, data where you have transactional requirements, or data where you are performing deep graph traversal are all examples where MongoDB might not be the right solution for you.
[16:07:01] <medmr> but doing multiple calls to get related data
[16:07:06] <medmr> makes sense sometimes with mongo
[16:07:14] <GothAlice> Derick: 16MB doc limit = (16*1024*1024/6.5) ~2.5 million words, if the average word is 5.5 letters plus a space.
[16:07:26] <Derick> GothAlice: yes - I know it's quite a lot :-)
[16:07:34] <GothAlice> Derick: But, in general, it's trivial to have a "comment" at the end that links to the continuation thread, then the initial thread is locked.
[16:07:39] <medmr> especially if you are looking up the related data by _id, thats pretty fast
[16:07:50] <Derick> GothAlice: okay - just curious what you did there.
[16:07:50] <Cygn> Hey guys, just a general question: if i have a collection, let's say with customers, and each customer did multiple sales, would you say this is something that should ALWAYS be done with a relational database? Or could i insert the sales as a sub-array inside a customer document without going against any best-practice rule?
[16:08:21] <Derick> Cygn: how does your app want to make use of the data?
[16:09:00] <calmbird> look: http://gyazo.com/9094325c7d11126a3cb65e261200a670, this is from mongouniversity.com course, they are making related data here
[16:09:26] <medmr> there are different approaches each with pros and cons
[16:09:36] <calmbird> I will need to join this data
[16:09:37] <medmr> best choice depends on how you intend to use the data
[16:09:38] <Derick> medmr: I don't think I would have separated that out.
[16:10:12] <Cygn> Derick: I need to fetch sales, by ONLY counting them, using their date and market (would be saved in the subarray). Before i do that, they would be filtered using attributes of the customer. (For Example, i could need all sales of an iPad (article name in the subarray), but only if the customer is from USA(country in the customer data) )
[16:10:13] <calmbird> yeah we have something like a 15MB limit, can't remember. if we expect documents to have a large amount of data, we should split it into more collections and eventually join it
[16:10:56] <Derick> Cygn: i would not create subdocuments in that case then. Have each document store date, market, customer attributes that you need to query on - perhaps
[16:11:17] <Derick> calmbird: yes - but 16MB is a lot
[16:11:34] <calmbird> Derick: I understand, but subdocuments are part of the document right? and if we have really a lot of subdocuments, can we exceed the ~15MB limit?
[16:11:52] <Derick> in the case of blog comments f.e. - if it's something like my tech blog, it's not a problem as I rarely have any comments at all. If you're hackernews, you might run into issues.
[16:11:54] <calmbird> but never say too much, or you can be surprised :D
[16:12:03] <Derick> calmbird: no, 16MB is per top level document
[16:12:04] <Cygn> Derick: But that would mean i should rather use a relational database? Because i was very happy about the speed that mongodb delivers right now, but that was BEFORE i had to filter it by the other filter criteria.
[16:12:15] <Derick> subdocuments, are part of the full document and count towards the 16MB
[16:12:37] <Derick> Cygn: mongo also handles it just fine - and potentially faster?
[16:12:46] <medmr> is there any way you can fetch related documents from another collection with out first looking up the salesperson
[16:12:55] <Derick> calmbird: you can't increase the 16MB limit.
[16:13:06] <medmr> i.e. look up the salesperson and the sales documents in parallel?
[16:13:13] <medmr> if you have a key that they all include
[16:13:20] <Derick> medmr: store the sales person with each sale, and you don't need to
[16:13:27] <calmbird> i have a game, a user document with subdocuments, mapobjects, mapblockers, friends, everything, and I started to worry about the 16MB limit actually
[16:13:40] <GothAlice> calmbird: Consider that when I was writing those forums I had to import all of the old forums for a gaming group with 14,000 people. The entire old forums could have fit in a single 16MB document.
[16:13:48] <Derick> medmr: then you need to do two queries...
[16:13:53] <Cygn> Derick: Actually right now i don't see how to get this without joins… medmr: in every sale there is an attribute which declares the salesperson
[16:14:00] <medmr> GothAlice: you can't just say its a lot and not address the limit
[16:14:17] <GothAlice> medmr: I previously described, for the forums example, a method of having continuations.
[16:14:45] <medmr> Derick: two queries is okay, its like doing a join
[16:15:02] <Derick> medmr: yeah, it's like doing a join client side (ie, on the web server)
[16:15:02] <calmbird> Well in subdocument operations, we still miss some functionality in mongodb.
[16:15:06] <Derick> nothing really wrong with that
[16:15:26] <Cygn> medmr: So you would just fetch all sales that fit and all customers that fits and then iterate through it and see which fits together?
[16:15:33] <medmr> if the queries can be done in parallel its not significantly slower than looking up one monolith document
[16:15:53] <medmr> if i had a /salesperson/00001/ page
[16:16:07] <medmr> i would fetch the salesperson using id 00001
[16:16:34] <calmbird> something like db.something.update({…}, {$set: {'array1.$.variableA': 'someValue'}})
[16:16:36] <medmr> with an index on sales for .salespersonid
[16:17:09] <Cygn> The Problem is this could result in quite a high amount of data… let's say a few million sales and a few million customers, and i would have to fetch nearly 90% just to sort it out… does not feel like the most clever way :/
[16:17:17] <calmbird> it won't set variableA on all objects in the subdocument array, only on the first one
[16:17:27] <medmr> no you wouldnt have to fetch a lot to sort it out
[16:17:35] <medmr> you can adjust your query on sales collection as needed
[16:22:06] <Cygn> medmr: okay, most of the time, the question will f.e. be "how many sales in june" - i don't need the customer for that, only the sales (data is in the sales). But then there will be questions like "how many sales in june by vip customers" - that's where i will have to start filtering the results by the customer criterias
[16:22:32] <GothAlice> "VIP customers" is, however, a small subset.
[16:22:46] <GothAlice> Thus a client_id: {$in: […]} isn't too egregious.
[16:22:48] <Cygn> medmr: Actually i have to in a way (because i always have to count sales, filtered by customer criteria) but no i don't have to display the customer information by itself.
[16:24:03] <Cygn> GothAlice: That was just one example without digging to deep in the actual topic. The Customers we are talking about here have very much specific criteria (because actually it is not a customer buying but a sales unit, which contains information about series, headunit, country, model etc.)
[16:24:04] <medmr> you could either include flags that make sales searchable by those criteria
[16:24:49] <medmr> or do the customer lookups first
[16:25:05] <medmr> and get sales custid: {$in: []}
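(A sketch of the two-step lookup medmr and GothAlice describe, with invented field names: fetch the matching customer ids first, then count sales with $in. In the shell, cursor.map() returns an array.)

    var vipIds = db.customers.find({country: "USA", vip: true}, {_id: 1})
                             .map(function (c) { return c._id; });
    db.sales.count({custId: {$in: vipIds},
                    date: {$gte: ISODate("2015-06-01"), $lt: ISODate("2015-07-01")}})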
[16:25:16] <GothAlice> Cygn: The trick with MongoDB is that your "schema" isn't sacrosanct, it should instead adapt to meet your query needs. Pre-aggregation (i.e. including "vip" in the sale) isn't data duplication, it's making your data queryable.
[16:26:06] <Cygn> GothAlice: Yeah i'm kind of getting this idea right now, but that would mean i would have to update every single sale of the customer when the customer changes, right?
[16:26:31] <GothAlice> Potentially, but again, an index on cust_id will make that type of update fast.
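(The corresponding pre-aggregation update: when a customer's status changes, fan the flag out to their sales with a multi update; with an index on custId this stays cheap. Field names are illustrative.)

    db.customers.update({_id: customerId}, {$set: {vip: true}})
    db.sales.update({custId: customerId}, {$set: {vip: true}}, {multi: true})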
[16:26:38] <medmr> this is the problem you get into sometimes
[16:26:42] <Cygn> medmr: by flags you mean just insert the customer attributes into the data, because i just googled "mongodb flags" ;)
[16:28:43] <Cygn> GothAlice, medmr: Just one last follow-up question. If i would like to insert the data, my structure should be Customer1 { id: 1, sales: [ sale1, sale2, sale3 ]}, Customer2 { id: 2, sales: [ sale1, sale2, sale3 ]}, and an index on the customer id, correct?
[16:29:32] <GothAlice> I personally wouldn't store sales embedded under customers.
[16:29:46] <GothAlice> In fact, at work, invoices are their own collection.
[16:30:00] <cheeser> it's the only reasonable choice
[16:30:31] <Cygn> GothAlice: Okay, but (sorry for asking again) then i don't get how i should filter the sales using the customer data, without fetching the customer first in my client/server-side code?
[16:30:44] <GothAlice> Indeed. We also pre-aggregate the related data and embed it all, since our invoices need to be involatile.
[16:31:13] <Cygn> GothAlice: But by pre-aggregation you mean aggregate on database-side or on server-side?
[16:31:34] <GothAlice> Cygn: Pre-aggregation means doing it at the time you insert the record.
[16:31:39] <GothAlice> Cygn: Question: if a customer changes their billing address, do you want old, archived orders to change to reflect the new address?
[16:32:04] <GothAlice> (The only correct answer is: no. Archived financial information should not change.)
[16:33:06] <GothAlice> As such, we clone pretty much every detail about the user placing an order, and the company the order is being billed to, within our invoices. (All that make sense to store, at least. Login history from the user doesn't need to be there. ;)
[16:33:36] <GothAlice> Querying our invoices for company or user-related information is then easy; no join, since the data is "pre-joined".
[16:34:18] <Cygn> GothAlice: Sure, actually in our case we are storing the sales only for statistic use. But anyway, if i get you right, you would aggregate the two collections in the moment you would query for it, like creating a View in the classic sql world?
[16:34:56] <GothAlice> Pretty much the exact opposite of what you just described.
[16:36:12] <Cygn> GothAlice: You would insert all the customer data in the sale just when you save the sale? But didn't you say invoices are their own collections? Sorry, i really try to get it.
[16:37:08] <GothAlice> Indeed. If a user changes their e-mail address, or a company's VIP status changes, in our case we still want those archived orders to say "yeah, this order was a VIP order".
[16:37:48] <GothAlice> Real joins, or fake joins, would actually give us the wrong answers. :)
[16:38:21] <Cygn> GothAlice: I won't have this situation (since our "customers" are sales units for statistical data, if they change their status, it should be changed for older sales also), but anyway i could think of doing it that way anyway.
[16:39:22] <Cygn> Now i just have to find out how to fetch for subarrays of an entity using a condition for only specific subarrays ;)
[16:39:35] <mikeputnam> can someone share or point me to pragmatic methods to handle data migrations/deployments as part of a workflow? devs create code + data/schema changes => i orchestrate those changes into deployments that get applied to staging, production
[16:40:26] <mikeputnam> in the mysql world, i've done this with capistrano and incrementally numbered .sql files that get applied sequentially.
[16:40:51] <StephenLynx> with mongo you don't have a schema.
[16:41:23] <mikeputnam> i realize this. but that is a semantic argument. the concept of a schema still exists.
[16:41:40] <StephenLynx> one can write code that performs queries that rename fields and adapt data to a different logic.
[16:41:53] <Cygn> GothAlice, medmr: thank you VERY much !
[16:42:43] <StephenLynx> when I am writing code I always assume things may not be there and write defensively around that.
[16:42:58] <StephenLynx> so if something changes it won't break anything
[16:44:09] <StephenLynx> so unless you are working with a tool that implements a pseudo-schema, like mongoose, the developer will have to develop it from scratch.
[16:46:56] <mikeputnam> i see. that makes sense. from my perspective, i will likely be adding a "db_changes" step to my deployment process that just runs whatever the devs come up with.
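(A sketch of the kind of migration script StephenLynx describes: the $exists precondition makes it safe to run more than once, which also matches the precondition-checking approach GothAlice mentions further down. Field names are made up.)

    // rename users.fullName -> users.name, only where the old field still exists
    db.users.update({fullName: {$exists: true}},
                    {$rename: {fullName: "name"}},
                    {multi: true})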
[17:08:51] <StephenLynx> especially when I just need to know if the array contains a specific element
[17:08:55] <StephenLynx> so I don't have to iterate over it.
[17:09:15] <StephenLynx> but not always, though. sometimes I want the full array.
[17:09:32] <GothAlice> mikeputnam: The runner searches a given Python package namespace for scripts with main() functions (sorta, it's a bit indiscriminate at the moment) and runs them, using docstrings to output progress. The migrations themselves are written so that multiple execution is safe, i.e. they check preconditions.
[17:13:47] <Cygn> StephenLynx: in my case it could be really useful since i never need the full data, i only need to count the subarrays using one of their criteria
[17:18:39] <mikeputnam> GothAlice: thanks! this confirms the direction StephenLynx suggests. generic runner runs specific developer-created code that changes mongo appropriately.
[17:19:34] <mikeputnam> in the past i've come upon suggestions that documents should have a version number to enable rollbacks. any thoughts on this?
[17:20:28] <GothAlice> Well, that alone doesn't enable rollbacks, but consider: why does one really need to move backwards?
[17:22:04] <d0x> GothAlice: Thx for the references. Our application uses several collections which are designed for the daily business. And to "pre-aggregate" them for the dashboard i utilised MR (because the Aggregation Framework had some limitations for us, like missing string functions, accessing data from other collections, ...). After the MR is executed i use the aggregation framework as well to get fast responses.
[17:22:04] <d0x> How would you "pre-aggregate", "pre-join", ... the data? I thought the best is to utilise mongodb as well, and the only option for me was MapReduce. As an alternative i could do something like db.xxx.find().forEach(...), but that doesn't scale.
[17:23:14] <mikeputnam> my perspective is that of a system administrator. i'm concerned about late-at-night code+data migrations that fail due to unforeseen data conditions (or whatever) and i'm left holding the bag. as a non-developer with 15 developers generating change, this has me concerned about the stability of production.
[17:24:11] <mikeputnam> (and my own ability to recover from that sort of scenario)
[17:25:53] <mikeputnam> we are moving toward an always up/no downtime deployment model (blue/green) but this introduces more concerns for the developers writing the db changes correctly. -- at least in this model, i can recover by just reinstating the previous version.
[17:26:07] <GothAlice> mikeputnam: Standardize your deployment procedures. We have a random YouTube music video go up on the projector when the automation gets triggered, and the developer deploying must wear a silly hat.
[17:26:33] <GothAlice> (These two things generally attract attention, which means oversight of whatever that developer is doing in production.)
[17:28:07] <GothAlice> For example, we also deploy first to a clone of production called 'staging'. And only if everything goes well there do we repeat the automation on the real production environment.
[17:29:07] <daidoji> if inserting with continue_on_error=True, is there a way to get all the exceptions?
[17:29:17] <daidoji> or is it like the documentation says, I only get the last error that occurred in the batch?
[17:29:31] <GothAlice> I suspect the documentation would be correct in this instance.
[17:30:36] <daidoji> hmm, then that's kind of pointless for my use case. Guess it's better to use the old style bulk_import functions if I need all the errors?
[17:31:15] <GothAlice> Indeed. You'll get reduced throughput (inserts/sec) by querying for the result of each, but if that's what you need, that's what you need.
[17:34:50] <daidoji> also out of curiosity, what's the fastest way to load data in your experience? python -> pymongo or python parse -> stdout -> mongoimport
[17:35:32] <GothAlice> I've never really bothered to benchmark bulk loads. I do the smallest number of hops, so pymongo direct.
[17:35:34] <daidoji> I ask because we pipe-delimit here like all god-fearing people and mongoimport doesn't support choice of delimiters for some reason :-(
[19:08:18] <StephenLynx> the downside with this option ,though, is that if the array is empty, you will get nothing, if I'm not mistaken.
[19:08:21] <GothAlice> It's a more… verbose… way to query your data vs. normal find().
[19:08:42] <GothAlice> keeger: For the most part attempting to keep elements within a nested array in any order other than insertion order (i.e. always append or always prepend) is difficult.
[19:08:51] <StephenLynx> unwind splits the array into a series of documents, each with one element of the array.
[19:09:38] <keeger> can i do the equivalent of a select order by?
[19:09:48] <GothAlice> Of a sub-array, not normally, no.
[19:09:58] <StephenLynx> because sort will only work for documents.
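(The unwind approach StephenLynx mentions, sketched for sorting a sub-array in the aggregation pipeline; names are hypothetical. Each array element comes back as its own result document.)

    db.people.aggregate([
        {$match: {_id: personId}},
        {$unwind: "$locations"},
        {$sort: {"locations.name": 1}}
    ])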
[19:20:57] <keeger> i am thinking of putting in a simple lock system, because i need to loop through this array and update other parts of the same document. but it could take 100 ms who knows
[19:20:58] <mordonez> I want to add a replica for my db
[19:21:20] <mordonez> I have to run rs.initiate() on the replica?
[19:21:43] <keeger> so i was thinking of putting a document.LockId entry, and then the clients would read, look for lock, if missing lock it with a unique Id, read it back and confirm the Id matches, and it's theirs
[19:22:03] <keeger> then when done processing, clear the lock
[19:22:08] <GothAlice> keeger: MongoDB doesn't ever really do two things to the same data at once. Instead, MongoDB reduces operations down to individual atomic operations. So operations A and B on the same data will either resolve A, B, or B, A.
[19:22:52] <GothAlice> mordonez: Or, if you don't already have a replica set, see: http://docs.mongodb.org/manual/tutorial/convert-standalone-to-replica-set/
[19:24:32] <GothAlice> keeger: Update operations (A and B in my continuing example) may have conditions that need to be true before the update can be applied. This would be the query that is passed as the first argument to .update() (normally). You can check for ownership of a lock and if nUpdated=0 (in the returned result) the condition failed.
[19:25:14] <keeger> GothAlice, ah, that would work nicely
[19:25:40] <keeger> i think that's more straightforward and lightweight than trying to do a network level semaphore
[19:26:57] <GothAlice> keeger: As a note, so far I haven't ever required a dedicated field for locking except for my distributed task runner, which locks based on current worker. Simple field updates and update-if-not-modified (conditional) updates are more than sufficient for most cases.
[19:27:47] <keeger> i agree, for the majority of my app that should be fine
[19:28:14] <keeger> but i do have one process that i don't know how long it will run, and it can only run once. i'd like to lock it at the db level because of the horizontal scaling
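(A sketch of the conditional-update lock GothAlice outlines above: the update only matches while the lock field is empty, and nModified in the WriteResult tells you whether you won. Field names are invented.)

    var res = db.jobs.update({_id: jobId, lockId: null},
                             {$set: {lockId: myLockId, lockedAt: new Date()}});
    if (res.nModified === 1) {
        // we own the lock: do the long-running work, then release it
        db.jobs.update({_id: jobId, lockId: myLockId}, {$set: {lockId: null}});
    }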
[19:30:48] <MLM> How do I find documents that have a certain string in one of their array of string fields? Mongoose `Model.find({ usersIds: req.user._id })` isn't pulling up any results (confirmed that they are in the collection)
[20:01:50] <sivli> Thus one of the selling points to mongodb :)
[20:02:20] <sivli> (and the fact that meteor js locks us to it, but I am not complaining)
[20:02:22] <kali> well, the geojson it supports is usefull
[20:02:40] <kali> and the implementation is reasonably efficient
[20:03:21] <sivli> Agreed. Ok well, thanks. I just wanted to be sure before I told the boss man no can do for topo. Not worth losing the mongodb support.
[20:19:23] <MLM> Running `db.queues.find({userIds: '54effc576676066c20293010'})` returns some results in the mongo shell. Running the same query with Mongoose `Queue.find({ usersIds: '54effc576676066c20293010' })` gives no results. What is the best way to debug this?
[20:20:06] <GothAlice> MLM: It's actually a critically important distinction: are you storing the hex-encoded string version of the ObjectIds, or are you storing actual ObjectIds?
[20:21:13] <MLM> When it is created: `queue.userIds = [req.user._id];`
[20:21:15] <GothAlice> Less cool, but consistency is key. You can't mix the two and get sane results back. It's less cool because you're wasting 12 bytes for every single reference stored, but…
[20:21:22] <GothAlice> Ah, then that's an ObjectId.
[20:21:32] <GothAlice> So you're mixing, and your ability to query that field goes out the window. :(
[20:24:10] <GothAlice> That's a very good question. Likely the ODM-side of things recognizes that that field is an ObjectId reference and is attempting to automatically cast the (valid) string-version you are supplying.
[20:24:16] <GothAlice> Please: standardize on real ObjectIds.
[20:24:34] <GothAlice> Real ObjectIds store some interesting data that is useful to be able to access. And halving the storage space is nice.
[20:25:28] <MLM> Will do (convert to consistent ObjectIds). Need to look up again the quirks with comparing them. Thanks for the help
[20:25:42] <GothAlice> They compare numerically based on time.
[20:25:50] <GothAlice> (Effectively. Time is the first field in the BLOB.)
[20:58:21] <MLM> How do I query for a field with that is an ObjectId?
[20:58:22] <MLM> The mongo shell is returning the documents with this query `db.queues.find({userIds: ObjectId('54effc576676066c20293010')})` but I can't get Mongoose to return it. I tried converting the string to a ObjectId in the query itself even though I have read that it should be auto casted.
[20:58:27] <MLM> `Queue.find({ usersIds: '54effc576676066c20293010' })` or `Queue.find({ usersIds: new require('mongoose').Types.ObjectId('54effc576676066c20293010') })`
[21:02:58] <obeardly> MLM: I'm a noob to Mongo, but couldn't you just: ObjectId("yourstringgoeshere").toString()
[21:03:16] <MLM> GothAlice :/ - I know we talked before about native Mongodb driver and I was tempted and tried moving over but I kinda like the enforced Schema system which I would need to find somewhere else or write custom
[21:04:24] <MLM> Maybe it is worth the leap because of these weird issues
[21:10:34] <MLM> Here is a full barebones test snippet that demonstrates the issue if anyone is interested. If it is not my fault then I'll make an issue: http://pastebin.com/enUPT03A
[21:15:47] <GothAlice> MLM: There's a schema, it obviously says "this is an array of ObjectIds", doesn't cast when querying. Looks like a bug, or, a feature, depending on how generous the Mongoose developers want to be.
[21:16:29] <mspro> hello, i think i managed to create an new collection with the name [object Object] when i used the copyTo command, now i can’t drop that collection, anyone an idea?
[21:17:20] <calmbird> hi, can mongodb somehow deal with transactions/object locking?
[21:17:35] <GothAlice> mspro: In the MongoDB shell you can access collections as attributes of the database (db.foo) but also as associative array elements (db['foo']) — this latter approach should let you clean up that mis-named collection.
[21:17:55] <GothAlice> mspro: Also love the combination of your IRC handle and having that particular issue, BTW.
[21:18:51] <GothAlice> calmbird: That's the typical approach. Lighter-weight approaches involve "update-if-not-different" mechanisms; two-phase is most useful for coordinating changes to multiple documents, on single documents the simpler approaches will likely be better.
[21:19:07] <MLM> GothAlice: Not totally understanding what you are saying. Is it just the way I am trying to query it or a true bug?
[21:20:16] <GothAlice> MLM: Unfortunately you'd have to ask the Mongoose developers. Whenever Mongoose comes up I'm reminded of a quote which I will modify appropriately: I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a design. I.e. they may consider the behaviour you are running into to be by-design.
[21:20:36] <mspro> thanks GothAlice i will look in to that
[21:22:41] <MLM> Made an issue: https://github.com/LearnBoost/mongoose/issues/2728 - If you see any way to improve the clarity/description of the issue, then I'm interested
[21:23:33] <GothAlice> MLM: The test case looks thorough and complete for the issue.
[21:28:15] <GothAlice> obeardly: One stores ObjectIds as ObjectIds to save 17 bytes of space (hex encoding doubles the number of raw bytes, plus requires a terminating CString null and leading 4-byte PString length), to maintain numeric comparison capability (i.e. you can query $gt/$lt ranges to filter based on record creation time), as well as maintain the client-side capability to examine the fields of the ObjectId (timestamp, host, PID, sequence number) without
[21:28:30] <GothAlice> I love it when people disappear mid-typing. XD
[21:30:24] <mspro> GothAlice: i can access my other collections with the [‘foo’] syntax, but when i do a ‘show collections’ i get a list with my collections + on top of that list within square brackets i get the [object Object] that wasn’t there before? It’s like when you want to print an object
[21:33:06] <mspro> GothAlice: you're a genius and i still have a lot to learn, so my collection was literally named [object Object], i did not expect that one :) thanks
[21:33:49] <GothAlice> mspro: MongoDB would "stringify" any value you try to use as a collection name. "show collections" is less useful to see what's going on than "db.getCollectionNames()" is in this instance.
[21:35:21] <keeger> so if I have a natural key for my document, would it be better to use that over the ObjectId?
[21:35:36] <GothAlice> keeger: If it's unique and can be generated without querying the existing dataset, yes.
[21:35:37] <mspro> yes, i tried that second one but still didn’t get that it was listing the literal name
[21:36:25] <keeger> GothAlice, when you say generated without querying, you mean, it's not like Id = Max(id) + 1?
[21:36:53] <GothAlice> keeger: Indeed. Or, say, something more complex like YYYYWWNNN where NNN is the invoice number for the week WW in year YYYY. (Our invoicing scheme at work.)
[21:37:16] <GothAlice> Any time you need existing data to insert a new record you run into race conditions.
[21:37:46] <GothAlice> (This is why MongoDB uses ObjectId and not auto-increment. ObjectId scales to multiple independent processes quite well.)
[21:38:35] <keeger> yeah, i'm just trying to figure out how i want to handle a data structure
[21:38:56] <keeger> i have a fixed length set of locations
[21:39:06] <GothAlice> (This is something Facebook and Twitter had to learn the hard way… Twitter even rolled an entire separately-scaled software service for the purpose of generating IDs without conflict. ;^)
[21:39:29] <keeger> and 1 person can only "own" a location at a time
[21:39:49] <keeger> i dont want to have to do joins, so i'm thinking the location is inside the person document
[21:40:13] <keeger> and if it changes, i copy the location sub document to the new owner's place
[21:40:25] <GothAlice> Setting ownership in a way that there can be only one victor (first person to claim wins) is easy using update-if-not-modified. db.locations.update({_id: …, owner: null}, {$set: {owner: ObjectId(…)}})
[21:40:37] <keeger> it's not even a race condition
[21:40:55] <keeger> more of a do i store a collection of Locations, and a person holds a reference to it?
[21:40:57] <GothAlice> The record with the given ID won't re-set the owner if one has already been set. (The result, returning nModified, lets you know if the assignment worked, or there was a conflict.)
[21:41:26] <GothAlice> Indeed, that's how I'd code up the initial implementation. Start simple. :)
[21:42:07] <keeger> how does that work for document atomicity
[21:42:22] <keeger> if i go, person A, location [1] update
[21:42:29] <keeger> and location[1] is a refence, is that atomic?
[21:43:29] <GothAlice> Updates to a single record will always be resolved in a linear fashion. So two updates, U1 and U2, will never really "conflict" (other than trampling each-other's data), and the "find" part of that update handles the trampling by requiring the field to be empty (null) or the $set won't be applied.
[21:44:25] <GothAlice> If your application layer issues the updates in the same microsecond it may seem semi-random which one will win. But with an update like the above, only one will win. The other will update nothing and be informed of this fact.
[21:44:39] <dscastro> what does this mean: failed with error 57: "The dotted field 'haproxy-1.4/web_proxy' in 'pending_op_groups.0.pending_ops..sub_pub_info.haproxy-1.4/web_proxy' is not valid for storage."
[21:45:00] <GothAlice> dscastro: You aren't allowed to have extra symbols in field names.
[21:45:11] <GothAlice> For example, not much about that field name is valid.
[21:46:06] <dscastro> GothAlice: did that start in 2.6.x?
[21:46:36] <dscastro> i just upgraded my database from 2.4 to 2.6
[21:46:38] <GothAlice> For sanity, I limit field names to the regex: [a-z_]+
[21:47:36] <GothAlice> dscastro: No, likely what you were doing wasn't expressly checked or forbidden before, but was still documented as being a no-no. (I.e. it let you get away with it, but you're still a bad person for trying to get away with it. ;)
[21:48:36] <GothAlice> dscastro: Someone using MongoDB should follow their language's variable naming convention. For most languages, this means the regex: [a-zA-Z_][a-zA-Z0-9_]*
[21:49:08] <GothAlice> With your field, you can't access it using attribute notation. (foo.haproxy-1.4/web_proxy would be interpreted as foo.haproxy minus 1.4 divided by the contents of the web_proxy variable.)
[21:51:44] <dscastro> GothAlice: its a moped object
[21:52:27] <dscastro> GothAlice: this is a rails model
[21:52:37] <GothAlice> Oh, ruby. That goes some distance in explaining the lack of convention. ;)
[21:53:17] <GothAlice> So, uh, the layer designed to abstract things and make your life easier is, in this case, making your life harder.
[21:54:18] <mikeputnam> hey! i too hate ruby and prefer python. :) small world
[21:55:13] <GothAlice> Little-appreciated fact: MongoDB, being schemaless, must store the _names_ of every field in each document that uses that field. Your key adds 20 bytes (beyond a single-letter key) per document that uses it. Something to consider. ;) (I use one- or two-character field names and let my ODM abstraction layer do its thing.)
[21:55:58] <GothAlice> dscastro: Moving forward, fixing your situation will require $rename'ing the fields, then trying to upgrade again.
[21:56:27] <GothAlice> dscastro: Don't forget to recreate any indexes as appropriate, too.
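(Where $rename can't address the offending name, for example because it contains dots that $rename would treat as a path, a document-by-document rewrite in the shell is a fallback. This sketch assumes the bad key sits at the top level of a hypothetical collection; nested occurrences need the same treatment at their own level.)

    db.things.find({}).forEach(function (doc) {
        if (doc["haproxy-1.4/web_proxy"] !== undefined) {
            doc.web_proxy = doc["haproxy-1.4/web_proxy"];   // pick a storable name
            delete doc["haproxy-1.4/web_proxy"];
            db.things.save(doc);
        }
    });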
[21:56:34] <dscastro> GothAlice: i'm just trying to figure out why it was working before the upgrade
[21:56:46] <GothAlice> dscastro: MongoDB was missing an assert() call.
[21:57:24] <GothAlice> dscastro: Are any of the freakishly-named fields indexed? If so, you're going to _have_ to reindex after re-naming in order to upgrade. Bogus field names would remain bogus, and current versions aren't missing the assert().
[22:17:17] <dscastro> GothAlice: what about this: /usr/bin/mongod: symbol lookup error: /usr/bin/mongod: undefined symbol: _ZN7pcrecpp2RE4InitEPKcPKNS_10RE_OptionsE
[22:18:26] <GothAlice> dscastro: Hmm, the symbol name is gibberish to me, but that typically means you have MongoDB compiled for a library not present on your system. I.e. you installed the Ubuntu version on Debian, etc. I compile MongoDB directly for my cluster, so that's rarely an issue for me.
[22:20:33] <GothAlice> (Also because I want SSL support, and am too cheap to buy MongoDB Enterprise. ;)
[22:21:13] <mocx> "MongoDB supports no more than 100 levels of nesting for BSON documents."
[22:21:58] <GothAlice> mocx: Indeed. As with the 16MB document size limit, if you're hitting it, you're probably doing something wrong in the design of your schema.
[22:22:30] <mocx> 100 levels as in field: { level2: { level3: { name: "Steve" } } }
[22:22:37] <GothAlice> mocx: Correct. See also: http://www.javaworld.com/article/2088406/enterprise-java/how-to-screw-up-your-mongodb-schema-design.html
[22:22:47] <mocx> how many documents within an array?
[22:22:57] <GothAlice> An array is considered one level.
[22:24:14] <GothAlice> To ballpark BSON storage sizes, reference: http://bsonspec.org/
[22:31:12] <bendyeraus> hello, I'm optimising my queries, and was wondering: to achieve a covered query why do I need to suppress _id? surely the _id is in the index? how else is the index linked to the documents in the heap?
[22:33:09] <GothAlice> bendyeraus: Specifically, you need all of the returned fields to be stored in the single index that gets used for the query. Unless you're including _id in your "covered" index, you'll need to exclude the field during projection.
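(A minimal covered-query sketch: with an index on email only, the projection has to exclude _id or the server must still touch the document to fill it in. Collection and field names are illustrative.)

    db.users.ensureIndex({email: 1})
    db.users.find({email: "a@example.com"}, {email: 1, _id: 0})   // covered by the index
    db.users.find({email: "a@example.com"}, {email: 1})           // not covered: _id forces a document fetch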
[22:34:16] <ParkerJohnston> i have a question about mongo cursor
[22:34:34] <bendyeraus> Hi GothAlice, thanks, my question was more to do with if _id is not automatically in the index, how is the index linked to the documents in the heap?
[22:34:38] <GothAlice> ParkerJohnston: Ask, don't ask to ask. :)
[22:35:09] <ParkerJohnston> public static MongoCursor<Document> getAllActiveHistoryByCaseNumber(String caseNumber) {
[22:35:30] <GothAlice> bendyeraus: Behind-the-scenes is a complex BTree bucketing process. Yes, there is some form of association between an individual index value and the record it came from, but it might not be the ObjectId. (It might be the stripe ID and index into the on-disk stripe of the record… might not be.)
[22:35:31] <ParkerJohnston> this is returning all embedded docs in a single item, not in a hasNext format, so i can't iterate over them
[22:37:19] <ParkerJohnston> just no idea why it is grouping the embedded doc
[22:37:33] <ParkerJohnston> it shows there are two documents but only counts it as one
[22:38:53] <cheeser> "embedded docs" are largely a conceptual thing for humans.
[22:39:24] <cheeser> a collection has documents and when you fetch that document you get everything inside it regardless of whether it could be considered an embedded document or not.
[22:39:26] <ParkerJohnston> but they are stored as an array, why can't i get them back out the same way
[22:40:04] <cheeser> can you pastebin what you're getting back and how it doesn't match expectations?
[22:43:55] <bendyeraus> While I'm here, one other question I was thinking about: for optimal count queries, should I be doing a find query, projecting the fields I know will cause a covered query and then calling count, i.e. find({x:123}, {x:1,_id:0}).count()? or is count smart enough to only access the index, i.e. is count({x:123}) enough?
[22:49:32] <cheeser> i dunno. why shouldn't it be an array?
[22:49:49] <ParkerJohnston> oh misread your statement
[22:50:40] <ParkerJohnston> the issue is that when i get the results and try to use history.next() to print out the results, it does not print out two lines, it is one large cluster
[22:51:40] <cheeser> what two lines? you have one document there.
[22:51:50] <cheeser> are you trying to iterate the history?