[03:55:27] <supershabam> I started an experiment to tail the oplog and write stat summaries automatically over 5m windows. Has anybody else worked with storing metric datapoints in mongodb?
[04:39:26] <Nick______> #mongodb Not able to load the extension on php 5.6
[04:48:18] <quadmasta> I've got a MEAN app that runs on distributed hardware. The nodes try to connect to the master and failing that they connect to their own instance. I need to, on occasion, grab the log collection from the nodes and push them into the master database. I don't know how to describe that simply enough to become a search query
[04:48:47] <quadmasta> could someone suggest some terms to use to search?
[07:02:02] <morenoh149> how can I tell from explain output whether the whole collection was scanned or not?
[07:03:02] <joannac> do a count on the collection and see how many docs are in it?
[07:06:24] <morenoh149> I'm just going to assume that because it says b-tree cursor it didn't do a full scan
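A minimal mongo shell sketch of reading the 2.6-style explain() output to tell whether a full collection scan happened (collection and field names are hypothetical):
    var plan = db.things.find({ field: "value" }).explain();
    // "BasicCursor" means the whole collection was scanned;
    // "BtreeCursor field_1" means the index on `field` was used
    print(plan.cursor);
    // nscanned close to the collection count also indicates a full scan
    print(plan.nscanned + " of " + db.things.count() + " documents examined");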
[07:08:03] <therrell> Hi everyone, I have a mongo cluster and I write to it. I then immediately try and read that document I just wrote and mongo can’t find it. Is this a common issue with default write concern behavior on mongo clusters?
[07:08:22] <therrell> Here’s a gist of the read & write code using a java client: https://gist.github.com/anonymous/3768f4f276e0a5e916a0
[07:08:59] <therrell> It's worth noting this is an intermittent issue where the read operations return null
[07:09:32] <therrell> so I’m thinking my write concern isn’t set correctly for my use case of write then immediately read
[07:10:06] <therrell> Does anyone here feel they might be able to shed some light on this issue?
[07:28:19] <bo_ptz> [Please help] mongo>>> db.objects.find({siteId: "boris"}) returns an endless stream of '\' characters
[07:45:21] <Pharmboy> sorry for posting javascript question in mongo forum, you are absolutely right it is not mongo
[07:45:46] <Pharmboy> but now I am on the right track
[07:51:26] <therrell> In case anyone paid attention to my question, I figured it out. I’m using the 2.2 java client, which has a default write concern of normal (now known as unacknowledged); that’s fire and forget and provides no guarantee that the write operation completed before mongo responds successfully.
[07:52:37] <therrell> So my issue can be fixed by upgrading to the 2.10 client which has the default write concern of acknowledged or I can configure my 2.2 client to use the non-default write concern of my choosing
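A shell-level sketch (collection name hypothetical) of requesting an acknowledged write explicitly instead of relying on the driver default:
    // wait for the primary to acknowledge the write before returning
    db.events.insert({ x: 1 }, { writeConcern: { w: 1 } });
    // or wait for a majority of the replica set, with a timeout in milliseconds
    db.events.insert({ x: 1 }, { writeConcern: { w: "majority", wtimeout: 5000 } });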
[07:54:08] <Boomtime> bo_ptz: your problem is most likely that there is invalid UTF-8 in a document field and the shell is failing to render it
[07:54:55] <Boomtime> bo_ptz: please understand the server does not care, it can handle whatever crazy string you push at it even if it's invalid, the only rule it has is that the first zero-byte marks the end of the string
[07:55:22] <Boomtime> bo_ptz: what other language do you use? (php, java, c#, etc)
[07:56:30] <Boomtime> ok, where are you sourcing the data?
[07:56:44] <Boomtime> the other possibility is you have found a bug in the shell
[07:57:45] <Boomtime> anyway, if you want to find the document that is causing this, you can grab each one of the docs in that list one at a time by _id and see if the shell can print it
[08:03:28] <bo_ptz> but why don't I have the problem when I do db.objects.find().limit(1)?
[08:04:33] <flusive> Hi, I have a problem with disk space on one shard (it's 98% used); I also have other shards (shard1: 50% disk space free, shard2: 20% disk space free). What can I do to increase free space on shard3, and how can I force mongo not to use more space on shard3?
[08:05:19] <Boomtime> bo_ptz: limit(1) means you only see one document in the shell
[08:05:31] <Boomtime> that particular document apparently isn't the problem document
[08:05:52] <Boomtime> and that prints only 10 documents
[08:06:47] <Boomtime> do you understand that the problem is just displaying the result in the shell?
[08:06:50] <bo_ptz> Boomtime, you're right, when I print it in the console it fails
[08:07:39] <Boomtime> you may find one document, or you may find several are a problem, you will need to play around a bit now to find out why the problem documents can't be printed in the shell
[08:07:53] <Boomtime> in my experience, this has *always* been because of invalid UTF-8
[08:08:13] <Boomtime> it is very common, because there are just tonnes of libraries that do not handle UTF-8 correctly
[08:08:25] <joannac> mongodump and then bsondump might help? or maybe mongoexport ?
[08:09:04] <bo_ptz> Boomtime, it's not only a display problem; when I fetch the data in nodejs with the native driver, it returns an object full of \\\\\
[08:10:25] <Boomtime> yep, because the characters are invalid
[08:10:54] <Boomtime> that's pretty much proof that the data is not valid UTF-8, that node is trying to interpret it as such and the translation comes out as garbage
[08:11:52] <Pharmboy> are mongoose questions acceptable here if I get crickets in the mongoose channel?
[08:12:59] <teoo> hello.. could anyone please check this SO question, and maybe answer, if you have experience? http://stackoverflow.com/questions/28684462/what-is-faster-to-search-in-hashmap-object-or-array-in-mongodb
[08:18:20] <Pharmboy> I asked it a while ago, it is in regards to $addToSet $each ... someone confirmed my syntax was correct, but when I pass it in node/mongoose, it seems there is a bracketing issue
[08:18:42] <Pharmboy> and node is trying to parse an element of my array as if it was a function
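For comparison, the shell form of $addToSet with $each that the question is about; the usual bracketing mistake is nesting $each somewhere other than directly under the array field (names are hypothetical):
    db.users.update(
      { _id: userId },
      { $addToSet: { tags: { $each: ["red", "green", "blue"] } } }
    );
    // wrong shapes like { $addToSet: { $each: { tags: [...] } } }, or passing the
    // array without $each (which adds the whole array as a single element), are
    // common causes of this kind of error in node/mongoose code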
[08:33:33] <kakashi_> I've got a question: mongodb couldn't be opened...
[08:44:27] <kakashi_> log: http://pastebin.com/BbFjVVrM, after I saw this log, mongodb hangs...
[08:46:32] <flok420> hi i have a problem converting a mongo query into the PyMongo equivalent. on the forum nobody knows the answer but maybe someone in this channel knows it? the posting is at https://groups.google.com/forum/#!topic/mongodb-user/32m6T_et3MY
[09:17:39] <flok420> it looks like it works with the match and the unwind. if i add the $redact statement then nothing is returned. I verified that max_age is 0. also with an explicit 0 (instead of that variable) I get nothing. if i replace "$ts" by "ts" then all data is returned, as if the whole $gt is not executed
[09:25:08] <teoo> could anyone please check this SO question, and maybe answer, if you have experience? http://stackoverflow.com/questions/28684462/what-is-faster-to-search-in-hashmap-object-or-array-in-mongodb
[09:40:05] <flok420> joannac: if I remove the redact, then I get all data in the collection. so something is wrong with that
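A hedged sketch of the kind of pipeline being described (collection and field names are hypothetical), with two things worth checking noted in the comments:
    var cutoff = 0;  // stand-in for the max_age-derived value
    db.samples.aggregate([
      { $match: { host: "example" } },
      { $unwind: "$values" },
      { $redact: {
          $cond: [ { $gt: [ "$ts", cutoff ] }, "$$KEEP", "$$PRUNE" ]
      } }
    ]);
    // if the path "$ts" does not resolve on the documents $redact sees, the $gt
    // comes back false and everything is pruned, which would match "nothing returned";
    // a bare "ts" (no $) is a string literal, and strings sort above numbers in BSON
    // comparison order, so that condition is always true and everything is kept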
[10:03:09] <flusive> Please help me solve a problem with free space on a shard. What happens when my shard is full? Will the whole sharded cluster crash? What can I do to increase free space on shard3, and how can I force mongo not to use more space on shard3? I need some solution to store around 25GB/week
[10:04:01] <flusive> maybe you know some hosting for that? i don't have my own infrastructure
[10:35:03] <styles> I'm trying to organize game cards (magic etc). I want a Card object to belong to a Set. I need to search on different factors. My idea was to add the Set to each card object.
[10:35:34] <styles> Or keeping cards separate from sets and just linking the set's ID and required info into each card object itself
[10:44:07] <joannac> styles: heh, i tried that with my MtG cards
[10:44:29] <joannac> if you want to do things like "find all cards in this set", just embed the setname/setID in the document
[10:44:54] <styles> joannac, that's what I figured
[10:45:03] <styles> joannac, I was just going to duplicate the data, it's trivial anyway
[11:53:25] <andrewjones> Hi can anyone help with a bit of background on mongodb
[11:53:49] <andrewjones> Just want to check the function of the arbiter and replica set members
[12:10:26] <esko> i have a collection with title: and anchor:, i would need to insert an incrementing id: into every (title: anchor:) document in the collection, could someone help?
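One way to do this from the shell is to walk the collection in a stable order and $set a counter on each document; a sketch, with the collection name hypothetical:
    var counter = 1;
    db.links.find({}, { _id: 1 }).sort({ _id: 1 }).forEach(function (doc) {
      db.links.update({ _id: doc._id }, { $set: { id: counter } });
      counter += 1;
    });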
[12:40:37] <andrewjones> Hi can anyone help with Mongo MMS? We're using it on a standalone server with no internet connection, as the guides imply you can, but we can't find out what the default login credentials would be on a standalone instance of MMS
[12:42:46] <StephenLynx> I'm not familiar with MMS, but isn't it the same as a default install, with no auth?
[12:43:16] <andrewjones> no we have it set up but it asks for user login details and we're not sure how you set this up
[12:43:52] <andrewjones> Tried to contact mongo directly about purchasing support but can't get hold of them
[13:27:20] <jayjo> What's the best way to write json data to mongodb? Would this be a bulk write operation from my language of choice and associated driver, or can I do this through mongo shell?
[13:28:18] <hayer> I have a document which looks like this http://pastebin.com/hevptGEf .. how can I select all the elements in data which has a date between X and Y?
[13:29:41] <jayjo> cheeser: thanks, that looks like it works!
[13:35:23] <joannac> hayer: if you are ever in a situation where you think "I want a subset of this array" your array should be split into top-level documents
[13:38:56] <hayer> joannac: okey, so I should have a document called "data", where each item has a "item_id" that points to the ID of the document containing things like name and somevalue?
[13:39:31] <StephenLynx> yes. sometimes using a pseudo relation is better than having a sub-array.
[13:40:08] <hayer> StephenLynx: but how will this be if the data array is massive? Like millions of rows.
[13:40:33] <StephenLynx> even better to have as a separate document, because BSON has a limit on the size of documents.
[13:41:10] <hayer> a separate document for each data entry? Aka each entry in the data-array should be one document?
[13:41:13] <StephenLynx> and since you can index, performance will not be much of an issue. just make sure you don't make something that will need multiple queries.
[13:41:27] <StephenLynx> yes, that is what we are suggesting in this case.
[13:41:50] <StephenLynx> let me link my model for one of my projects, hold on
[13:42:14] <StephenLynx> look at the posts collection.
[13:42:21] <hayer> Wouldn't that mean eventually I will have millions of documents? I've been told that document = table, but with no structure set in stone.
[13:42:41] <StephenLynx> a document is an entry in that collection.
[13:43:31] <StephenLynx> the posts collection used to be a sub array in the threads collection. but because I regularly had to query for just some posts in that thread and the 16mb limit, I chose to put them on a separate collection.
[13:44:19] <hayer> okay, so far I've created one collection for each vehicle. Each vehicle has data. Each element of data should be a separate document inside the vehicle's collection..?
[13:44:45] <StephenLynx> don't you mean it is one document for each vehicle?
[13:44:58] <StephenLynx> and all these documents in the same collection?
[13:45:20] <StephenLynx> and no, these data documents should be stored in a separate collection.
[13:45:25] <hayer> typo, yes. One document for each vehicle.
[13:46:01] <hayer> Okey, so one vehicle_info collection and one vehicle_data collection?
[13:46:35] <hayer> I'm starting to see how much I've misused this so far.
[13:46:52] <StephenLynx> now you either use _id or create your own unique field for each vehicle and use it to track which vehicle each data document belongs to.
[13:47:01] <StephenLynx> since you don't have actual foreign keys.
[13:47:27] <StephenLynx> you do have field references, but in the driver they just perform multiple queries behind the scenes.
[13:48:02] <hayer> So I can then ask the vehicle_data collection to return all documents where vehicle_id is X and date is between Y and Z! This makes so much more sense.
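A sketch of what that layout and query could look like in the shell (collection and field names are hypothetical):
    // one document per reading, tagged with the owning vehicle's _id
    db.vehicle_data.ensureIndex({ vehicleId: 1, date: 1 });
    db.vehicle_data.insert({ vehicleId: someVehicleId, date: new Date(), somevalue: 42 });
    // "all data for vehicle X between Y and Z" is then a single indexed find
    db.vehicle_data.find({
      vehicleId: someVehicleId,
      date: { $gte: startDate, $lt: endDate }
    });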
[13:48:32] <jayjo> Is there a reason when I imported a .5gb file with mongoimport my db is now 1gb? is that standard?
[13:50:14] <StephenLynx> jayjo keep in mind mongo preallocates disk space.
[13:50:39] <jayjo> duh, you're right that has to be it
[13:51:58] <cheeser> most databases preallocate "extents" so that information is stored close together.
[14:04:26] <jayjo> just fyi - I just wrote the next file and it is almost exactly 2x the import file size
[14:04:53] <jayjo> so now I'm at 7.95 GB, I wonder if that does have to do with space being allocated
[14:41:02] <flusive> Please help me solve a problem with free space on a shard. What happens when my shard is full? Will the whole sharded cluster crash? What can I do to increase free space on shard3, and how can I force mongo not to use more space on shard3? I need some solution to store around 25GB/week
[14:48:48] <bros> I have a key called "store_id" in a lot of tables. I'm finding myself having to do an extra query every time I call store_id, to verify that the store_id belongs to account_id. Would it be smarter to add account_id as well to avoid the extra query?
[14:49:18] <StephenLynx> how many extra queries do you have to perform per operation, bros?
[14:49:39] <bros> I want to make sure the user isn't trying to "spoof" a store ID and gain access to somebody else's data.
[14:50:00] <bros> Every time I do a query with a store, I precede it with a query for that users account and stores
[14:50:14] <StephenLynx> I would cache the user validation data.
[14:50:57] <bros> I considered doing that but couldn't come up with a reliable way for logged-in clients to get notification of the refresh while already logged in.
[14:59:00] <StephenLynx> yeah, I would have 3 collections then
[14:59:06] <StephenLynx> account, users and stores.
[14:59:30] <StephenLynx> the store has a field with the account it belongs to, and so do users.
[15:00:13] <StephenLynx> so you can just cache the user validation data, including the account it belongs to. so you can easily check against the account the store belongs to when you retrieve it.
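A sketch of the denormalisation being suggested (field names are hypothetical): keeping the owning account on each store document lets the ownership check ride along with the lookup itself instead of needing a preceding query:
    // stored once when the store is created
    db.stores.insert({ _id: storeId, accountId: accountId, name: "Main St" });
    // later, a spoofed store_id simply fails to match
    var store = db.stores.findOne({ _id: requestedStoreId, accountId: sessionAccountId });
    // store === null  ->  either no such store, or it belongs to another account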
[15:05:08] <StephenLynx> and since you are using node/io, I assume, you should be checking for the error anyway
[15:05:54] <StephenLynx> but for that you will need a separate collection, I guess. I tried to make an index for documents in a sub array and failed. but I could be wrong.
[15:07:00] <bros> Can we get anybody else in the channel to confirm?
[15:12:23] <GothAlice> (Since "inserts" to sub-arrays are actually $push operations, they're updates. In your update, query for $ne or $not matching the potential duplicate value, and your record will either push the value (nModified=1) as there is no conflict, or conflict and not update the record. An index will speed up that check, potentially. The only way to tell if it actually helps is by telling MongoDB to explain the query with and without the index for
[15:14:45] <GothAlice> Still part of your collection.update({}, {$push}) operation. No you won't.
[15:15:12] <StephenLynx> if you have a unique index, then yes. but if you just write a query that will match 0 documents, then no.
[15:16:02] <StephenLynx> you can only get the updated document with an operation whose name I forget, but it will only return IF it actually updates the document.
[15:17:02] <GothAlice> findAndModify—many caveats to use.
[15:17:12] <GothAlice> (Returns document prior to modification, amongst other goodies. The manual page has more.)
[15:17:20] <StephenLynx> not to mention you will have to just assume the query conflicted with existing values. it could be that the query parameters were wrong in the first place.
[15:18:18] <StephenLynx> using separate collections with unique indexes is the safest approach.
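A sketch of the $ne-guarded $push GothAlice describes, in shell syntax (names are hypothetical):
    // only push "blue" if it is not already present in tags
    var res = db.things.update(
      { _id: someId, tags: { $ne: "blue" } },
      { $push: { tags: "blue" } }
    );
    // res.nModified === 1  ->  pushed without conflict
    // res.nModified === 0  ->  the value already existed (or the _id didn't match)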
[15:24:24] <styles> I'm trying to organize game cards (magic etc). I want a Card object to belong to a Set. I need to search on different factors. My idea was to add the Set to each card object.
[15:25:05] <styles> Or keeping cards separate from sets and just linking the set's ID and required info into each card object itself
[15:25:25] <StephenLynx> will you have to query for just some cards within a set?
[15:27:19] <StephenLynx> when querying for cards, will you have to query for the set of said card for additional data? if you do, will you have to perform just one additional query or multiple queries depending on the number of cards?
[15:30:52] <StephenLynx> depending on what you answer, you might be better with separate collections or subarrays.
[15:37:00] <NoOutlet> The best practice is to fit your data schemas to your application requirements. There isn't an inherent best way to store data without the application context.
[15:38:02] <bros> I simply want to be able to store/retrieve barcodes for items. Sometimes, items have variants. If that's the case, the barcodes belong to the variant, not the item. I'm not sure how to best represent that.
[15:38:41] <winem_> hi. I have some trouble understanding the rollback process of mongods in a replica set. is this right: the primary node writes the stuff into its datafile and keeps the documents in the rollback file until it gets an ack from the majority of nodes, e.g.? depending on the write concern of course
[15:39:15] <winem_> would just like to understand what happens in detail if the primary can't reach any of his slaves and a file system backup from the datadirectory is running...
[15:39:39] <jayjo> Is javascript the default language for working with mongodb?
[15:40:14] <StephenLynx> using js makes lots of sense.
[15:41:41] <jayjo> but is it true that no other language is 'enabled' working directly with it in the shell? all of the examples in the docs are using js functions
[15:42:14] <StephenLynx> it stands for javascript object notation
[15:42:31] <cheeser> jayjo: the shell is a js shell. so, no, writing c#, say, in the shell probably won't ever happen
[15:42:38] <NoOutlet> Well, no, the shell understands javascript. Yes.
[15:42:53] <StephenLynx> so yeah, any other language pretty much has to mimic it. you can use json on the shell client because it just parses text.
[15:44:31] <jayjo> I like js- i'm just getting a feel for this. If i'm writing mapreduce or other aggregation tasks, this is _always_ done through the shell, right?
[15:44:51] <NoOutlet> winem_, the primary gets some writes but is (for some reason) unable to send those writes to the secondaries. One of the secondaries becomes primary and takes some new writes. When the initial primary comes back into the replica set, it will see that there are new writes which it doesn't have and that the replica set doesn't have the writes that it got before the disconnection. That's when a rollback occurs.
[15:45:23] <StephenLynx> I thought the shell was only used by the shell client.
[15:45:38] <StephenLynx> I may be wrong, but I believe the driver performs all operations.
[15:46:28] <winem_> ok. so does the initial primary node write its stuff to the datafile before it gets the ack from the secondaries, or does the primary just write it to the rollback file and move the docs into the datafile when it receives the ack
[15:47:28] <NoOutlet> I don't know what you mean by the ack. The acknowledgement? or like the "ACK! I'm dying!"
[15:47:49] <winem_> the acknowledgement of the majority of secondaries, e.g.
[15:48:22] <NoOutlet> If the primary gets acknowledgment from the secondaries, then there won't be a rollback.
[15:48:45] <NoOutlet> A rollback occurs when a write has not been replicated properly.
[15:49:48] <winem_> ok. but are the documents written to the datafile or not?
[15:50:49] <winem_> if yes, it would mean that the primary writes to the datafile -> sends the documents to the secondaries and waits for their acknowledgement -> primary rolls back if it gets no acknowledgement and touches the datafile again
[15:50:59] <winem_> or am I just overcomplicating it???
[15:51:21] <NoOutlet> They may or may not have been written to a datafile in the primary. But the writes will be rolled back (taken out of the datafile) and written to the rollback folder.
[15:51:56] <winem_> sure, I'll be still and wait to learn
[15:55:14] <jayjo> last time I'll ask about the js - this page from the pymongo documentation has the mapreduce functions put in as js. is this unique to the python driver?
[16:09:28] <NoOutlet> A network partition causes lost connections between the primary and the two secondaries. This happens immediately after a couple writes which have not been replicated to the secondaries.
[16:09:51] <NoOutlet> So now the primary has writes that the secondaries don't know about.
[16:10:32] <NoOutlet> And finding out that the secondaries aren't reachable causes an election on both sides of the partition.
[16:11:00] <NoOutlet> On the primary side, there is only one voter, so there is no majority and the primary steps down to a secondary, no longer able to take writes.
[16:11:16] <NoOutlet> On the secondaries side of the partition, there are two voters and one of them becomes a primary.
[16:12:14] <NoOutlet> If there are no writes to the new primary before the partition ends, then there will be no need for a rollback.
[16:14:26] <NoOutlet> When the partition ends, the datasets will be compared and if it's seen that the initial primary (the one in a data center by itself) has writes that the others don't, those writes will be replicated to the others then.
[16:15:24] <NoOutlet> However, if some writes are made _during_ the network partition, then when the partition ends the writes that the initial primary had will be rolled back.
[16:16:11] <winem_> ok. this was helpful for the understanding
[16:16:44] <winem_> but how does the rollback work? I saw that there is a folder called rollback in the data-directory
[16:16:47] <NoOutlet> It's irrelevant to the situation about whether it wrote to the datafile or the journal.
[16:17:24] <winem_> ok, I understand. So I just have to think a bit more abstractly and not care whether it's written to the data file
[16:18:09] <NoOutlet> Essentially, the data in the initial primary will be synced to the initial secondaries (one of which became a primary and took writes).
[16:18:59] <NoOutlet> So the initial primary will not have the writes that it took immediately before the partition, but it will have the writes that the other primary took during the partition.
[16:19:10] <NoOutlet> And it will create a rollback file to dump out the changes.
[16:19:31] <NoOutlet> This is so that if an administrator wants to apply those writes, they are available.
[16:19:51] <winem_> great, this is the point that confused me!
[16:20:45] <winem_> so the rollback file is written when the initial primary went back online again (as a primary e.g.) and rolls back some operations because the new primary did not receive them
[16:20:50] <NoOutlet> If you know much about revision control like git or subversion or anything like that, it's a similar problem.
[16:21:02] <winem_> and now, it's up to the admin to decide whether he would like to process the rollback file manually?
[16:21:24] <NoOutlet> Basically, when there are writes to two systems within a replica set, mongodb doesn't know how or if the writes can be merged.
[16:22:40] <winem_> great. so it was that "easy"... I could not explain when he writes the rollback file or if the documents are kept in the journal / data file or the rollback file at the same time, etc...
[16:22:50] <NoOutlet> I think it's not likely that the initial primary would become primary again unless it had some priority.
[16:23:24] <winem_> thank you very much. I will go in the background and play around on the dev environment to see if I got everything right now. this was very helpful - thanks for your time!
[16:23:53] <NoOutlet> Because when it comes online, if there is a rollback, that means that it needed to take writes from the new primary.
[16:34:07] <freeone3000> I have three servers part of an 8-member replicaset who seem to be stuck in recovery. Their state is listed as "RECOVERING", but their optimeDate doesn't seem to increase. What could cause this?
[16:34:34] <winem_> NoOutlet: great, this was very helpful! :)
[16:49:01] <Siamaster> Hi, I'm trying to choose an ORM for working with java
[16:49:21] <Siamaster> I looked at morphia, but there weren't many tutorials
[16:49:48] <Siamaster> Does anyone use any that he can recommend?
[16:50:29] <Siamaster> are orms even good when working with mongo?
[16:52:35] <domo> @Siamaster it depends on your needs
[16:53:34] <domo> we use one we wrote based on mgo - https://github.com/maxwellhealth/bongo
[16:54:11] <domo> we also don't use it in certain places
[16:55:17] <StephenLynx> I wouldn't use anything besides the driver, Siamaster.
[16:56:24] <Siamaster> domo, I was counting on having to mix
[16:57:29] <StephenLynx> because they don't actually provide any functionality, they just add bloat to your project.
[16:58:09] <Siamaster> I read that morphia is gonna get merged into the driver
[16:58:11] <styles> StephenLynx, It's going to be multiple things. All cards based on a set. But also (here's the tricky part) a set can be "hidden" (not public). So I need to be able to query for all cards w/ no hidden set
[16:58:12] <StephenLynx> have been using mongo in two projects with nothing but the driver and having no issues at all.
[16:58:24] <styles> This makes me feel like I have to have the set in the collection of card
[16:59:01] <styles> Would I just... keep the set separate still too?
[16:59:11] <styles> And anytime I update the set info (name) locate all cards and update those too?
[16:59:37] <domo> eh, for the amount of functionality we get out of our bongo service, the tradeoff is negligible
[17:00:08] <domo> I mean, writing an ODM in golang vs something like PHP definitely helps
[17:00:17] <ra21vi> I am trying to build a complex aggregation query, but failing. The required aggregation and JSON data structure is at - https://gist.github.com/anonymous/78028b8b42ce2044f64b
[17:01:34] <ra21vi> Right now I have to hit multiple queries and then apply logic at app side to get the result. Need help in building aggregation query for same.
[17:02:45] <ra21vi> In given gist (https://gist.github.com/anonymous/78028b8b42ce2044f64b), the record file describes how data is stored in mongodb. In query, the required aggregation query is explained
[17:13:46] <ra21vi> can anyone please help me with the aggregation.
[17:18:21] <Siamaster> Do I gain something by using long for ids instead of ObjectId?
[17:18:39] <Siamaster> It should take less space right?
[17:40:33] <GothAlice> Indeed, a 64-bit integer would take up 4 fewer bytes. What you lose if you do that is everything, however.
[17:41:41] <GothAlice> Siamaster: MongoDB has no concept of "auto increment", which means you need to jump through hoops and you encounter many more race conditions than before: http://docs.mongodb.org/manual/tutorial/create-an-auto-incrementing-field/
[17:42:51] <GothAlice> Notably, if you have multiple servers how do you synchronize ID creation between them to prevent duplication? ObjectId doesn't suffer this problem. (In fact, each client application connection can deal with ID creation in complete ignorant bliss of any other server or application creating IDs.)
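The linked tutorial's counters-collection pattern boils down to something like the following shell sketch (collection and sequence names hypothetical); every insert costs an extra findAndModify round trip, which is part of the hoop-jumping being described:
    function getNextSequence(name) {
      // atomically increment (and create on first use) the named counter
      return db.counters.findAndModify({
        query: { _id: name },
        update: { $inc: { seq: 1 } },
        new: true,
        upsert: true
      }).seq;
    }
    db.users.insert({ _id: getNextSequence("userid"), name: "Alice" });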
[17:44:50] <StephenLynx> yeah, it checks for types.
[17:44:57] <GothAlice> If you accidentally store an _id that is the hex-encoded string representation of the ObjectId, suddenly creation-order sorting goes out the window, as does range querying them.
[17:45:51] <GothAlice> (Pro tip: ObjectId includes the creation time allowing you to $gt and $lt query them to find records in date ranges. Also, unless you need to extract date-like values from a creation time, you won't need to store an additional creation time.)
[17:46:16] <GothAlice> Rather, extract date-like values from a creation time within an aggregate query. (Things like $hour, $minute, $day, etc.)
[17:46:35] <StephenLynx> hm, I guess I need to study ObjectId more.
[17:57:04] <Siamaster> what if, you already have a unique String id entity then? Would you create an ObjectId anyway?
[17:57:21] <Siamaster> In my case, I have a FacebookUser which I get a String Id from
[17:57:46] <GothAlice> If you have external ID sources, then no worries using that unique value as the _id in your collection. Making them unique is Facebook's problem. ;)
[17:57:57] <GothAlice> Just be absolutely certain you are consistent within a collection.
[18:02:21] <jayjo> If I have a query running a text search on an 8gb database and it's ~10 minutes in, should I be mad? ;) I didn't test it on a subset first
[18:03:55] <GothAlice> I'm not entirely sure how MongoDB's FTI search operates, however it's common to boolean search first, that way only a subset of the documents are ranked. If MongoDB does do this initial boolean search for you, then either your query is matching a whole lot more than expected or something's up. If it doesn't, then you might want to do that initial filter yourself. ;)
[18:04:30] <GothAlice> jayjo: There is the potential issue that your index doesn't fit in RAM… if this is the case then you may be there until doomsday. ;^P
[18:36:01] <eaSy60> Hello. Is there a way to create a query on the last element of an array inside a document?
[18:38:17] <GothAlice> eaSy60: There are several methods to almost do what you want. $elemMatch will let you query the contents of an array, but that only works if you store an index with the records, as you demonstrate. You could store a copy of the "last" record pushed outside the array, then it becomes trivial to get back. You could also do an aggregate query, unwind the array, and pick the $last of each of the fields when re-grouping.
[18:38:42] <StephenLynx> you can slice on find. or you can unwind the array.
[18:38:52] <StephenLynx> you can't slice on aggregate though
[18:39:31] <GothAlice> (Your $elemMatch example also has the race condition of needing to look up the size of the array first, to determine the index of the last element.)
[18:39:35] <eaSy60> Slice and conditions in the same query?
[18:40:09] <eaSy60> GothAlice: that was an example, I don't want to store an index or the last record :)
[18:41:11] <GothAlice> Storing the last record outside the array, and $set'ing it when you $push new values to the array, while a minor increase in processing (one extra operation) will allow you to easily query (and even index) the fields for that "last" record.
[18:41:40] <GothAlice> If I needed to query them, that's what I would have done for the forums I wrote. (As it stands, I just slice during projection since when I need the last reply to a thread I just need the last reply, no query.)
[18:43:11] <eaSy60> I need to get all the documents in a collection where document.anArray[last].status === 'online', I don't need to $push any thing.
[18:43:33] <eaSy60> I'm writing a .find query, not a .update query
[18:43:34] <GothAlice> I'm referring to how you update that data, not query it, when I refer to pushing. Are new values ever added to anArray?
[18:44:55] <eaSy60> yes I $push values in a different application
[18:45:04] <eaSy60> the .find query is for a cron job
[18:50:44] <GothAlice> This is an example of what I often refer to in MongoDB as non-duplication of data. It isn't technically a duplication of the data because it's pretty much required to query in the manner you wish.
[18:53:21] <GothAlice> Doing it the pure aggregate way would require a rather substantial $group projection that includes every field you care about, and the results come back in a manner different than standard queries. It would be more difficult to implement that way. Doable, but seriously, the $set when you $push makes everything extremely easy. You could limit the $set to a subset of the embedded document's fields if you really wanted, too.
[18:54:55] <GothAlice> (An aggregate would also require MongoDB to do a _lot_ more work.)
[18:55:07] <eaSy60> my array contains only : { status: String, date:Date.now }
[18:55:33] <GothAlice> You could $set lastStatus if that's the only value you care about on that last record. :)
[18:56:02] <GothAlice> This is also referred to as a form of pre-aggregation. (You're keeping what would be the result of an aggregate query up-to-date in the record itself during normal use.)
[18:57:20] <eaSy60> Maybe I should do one request with $elemMatch, and then postprocessing the result in order to filter the documents where the last element of the array match my condition
[18:57:41] <GothAlice> … again, that's even more work. And work that would require roundtrips from the database to your application.
[18:59:50] <GothAlice> Schemas aren't sacrosanct… especially in MongoDB which is effectively schema-less. Your model should model how you need to query your data, you shouldn't be falling back on application-side bulk processing basically ever.
[18:59:55] <eaSy60> okay, I'll do that with myArray[] + myArrayLast
[19:00:50] <GothAlice> (Bulk processing of multiple records application-side eliminates the point of even having a database. Might as well use on-disk JSON files. ;) (Ohgodspleasenobodydothis…)
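A sketch of the $push-plus-$set pre-aggregation being described, in shell syntax (collection and field names hypothetical):
    // the writer updates the array and the "last" copy in one operation
    db.devices.update(
      { _id: deviceId },
      { $push: { statusHistory: { status: "online", date: new Date() } },
        $set:  { lastStatus: "online" } }
    );
    // the cron job's query then becomes a plain (indexable) find
    db.devices.find({ lastStatus: "online" });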
[19:12:08] <hayer> I store a time field as .Net DateTime for when the data was created/inserted. I use this to select all data between date X and Y. Can I use the ObjectId.getTimestamp for the same?
[19:14:42] <GothAlice> hayer: Yes. Most ObjectId implementations have a factory method "from_datetime" or similar to allow you to construct an ObjectId for a particular moment in time, and you can then $gt/$lt range query _id.
[19:15:34] <GothAlice> It's very useful, and for most cases (where you aren't using $hour/$year/etc. projection against the creation time field in aggregate queries) eliminates the need for a discrete creation time field.
[19:17:39] <hayer> but is the time from when the item was inserted or when it was actually written to disk?
[19:18:17] <GothAlice> In most cases the client driver is what constructs the ID, so it would be the moment of the .insert() call, not the time it was committed to disk. (The ID needs to already exist at that point.)
[19:18:40] <GothAlice> As an example, using MongoEngine in Python: ohist = Invoice.objects(criteria).filter(id__gt=ObjectId.from_datetime(now-timedelta(days=30)), state__ne='voi').count()
[19:19:24] <GothAlice> (Count the Invoice documents matching mostly security-related search criteria, i.e. ownership, whose ID indicates the invoice was created within the last 30 days and state isn't void.)
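The shell equivalent of that from_datetime trick is to build an ObjectId whose leading four bytes encode the target time; a sketch (collection name hypothetical, mirroring the Invoice example above):
    function objectIdFromDate(d) {
      // seconds since epoch in hex, padded out to the full 24 hex characters
      return ObjectId(Math.floor(d.getTime() / 1000).toString(16) + "0000000000000000");
    }
    var since = objectIdFromDate(new Date(Date.now() - 30 * 24 * 3600 * 1000));
    db.invoices.find({ _id: { $gt: since }, state: { $ne: "void" } }).count();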
[19:40:14] <StephenLynx> the problem with these engines is the coupling.
[19:40:18] <StephenLynx> that is plain bad design.
[19:40:33] <StephenLynx> you don't have just a db, you have THEIR db.
[19:41:08] <Siamaster> not only that, it makes you too dependent on GAE and you need to always think about the pricing when designing your db
[19:41:17] <Siamaster> you have to anyway, but datastore was just too extreme
[19:41:36] <StephenLynx> yeah, I heard google boned people pretty hard on disk space.
[19:42:21] <Siamaster> you can't query on field you don't index in datastore
[19:42:42] <Siamaster> and every index will cost and allocate as much space as the row itself
[19:42:58] <Siamaster> so if you have a many to many relationship like User, Store, UserVisitStore
[19:43:23] <Siamaster> and you want to be able to query stuff like which stores does a user visit or which users are visiting a store
[19:43:32] <freeone3000> I'm getting "RS101 reached beginning of local oplog" when adding a new member to a replicaset after running rs.reconfig() on primary.
[19:56:37] <fewknow> ra21vi: because there are faster engines out there that are easier to use. Really depends on the aggregation, but complicated ones should be done in other engines
[19:56:51] <ra21vi> fewknow: just for a single query, I won't opt to handle two DBs. I just moved my DB to mongo, and I'm trying to map queries in mongo as much as possible
[19:56:59] <freeone3000> fewknow: https://gist.github.com/freeone3000/6c705b8979b0df843e54 is the logs. Error is "RS101 reached beginning of local oplog", which is the last message from "heartbeat" in rs.status()
[19:57:22] <ra21vi> fewknow: can you suggest some engines, I will read about them
[19:57:31] <fewknow> ra21vi: if you abstract away the db with a "data access layer" you don't need to know what DB the data comes from
[19:57:44] <fewknow> I use mongodb, elastic and hadoop all in same infrastructure
[20:01:14] <fewknow> thought you were having a problem adding replica set member?
[20:01:49] <freeone3000> fewknow: Yes. It's in FATAL immediately after adding, with the complaint "RS101 reached beginning of local oplog" in rs.status().
[20:02:25] <fewknow> but there is nothing in the log files about it?
[20:03:18] <fewknow> can you paste your rs.config() and rs.status()
[20:08:39] <freeone3000> fewknow: It's for a company whose customers, do, indeed, move that often. We also have some very strict time requirements.
[20:08:57] <freeone3000> fewknow: Executives, salespeople, government ministers, etc.
[20:09:21] <fewknow> you can use a caching layer or something to solve that issue...if you want to distribute the data and scale it you will need to shard
[20:09:30] <fewknow> you are limited on how many secondaries you can have
[20:09:39] <fewknow> also the more secondaries you have the more stress on the primary
[20:16:29] <freeone3000> Changed it to https://gist.github.com/freeone3000/50e21076527bd1808be9 with unique priorities. Have an odd number, since id 3 is a non-voting member. Still have id 7 in FATAL.
[20:19:46] <fewknow> freeone3000: just changing the priority is not going to fix the issue. Once you have the config correct you will need to remove the node and add it back. I would tail the log file when you add it back to see what is going on....tail the primary and the node you are adding
[20:22:52] <freeone3000> fewknow: Adding the node back on primary claims that the node is down.
[20:23:41] <freeone3000> fewknow: And now it's back into FATAL.
[20:31:00] <freeone3000> Any other suggestions? "RS101 reached beginning of local oplog" seems to mean that it's ahead of primary - how can I force it to understand that it's not?
[20:32:48] <fewknow> you can restore a backup and then add the node in.
[20:33:04] <fewknow> i am not sure why it is in FATAL...seems like a config issue where it doesn't know that it is a secondary
[20:40:02] <freeone3000> fewknow: Can't recover from backup - next backup window isn't available yet, and this replset had run out of disk at the last backup.
[20:40:21] <freeone3000> (Trying to restore those backups got a secondary stuck in RECOVERY with a last optime date of last friday, with no updates.)
[20:48:15] <hayer> How do I specify sort order in c# when using BsonDocuments and the FindAsync?
[20:48:34] <hayer> I've looked at the FindOptions but can't get it set up properly.
[22:02:40] <jordanpg> what's the correct syntax for #command?
[22:20:25] <jordanpg> added question to SO: http://stackoverflow.com/questions/28707415/what-is-the-correct-syntax-for-the-dbcommand-method-that-is-replacing-the-conve
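If the question is about the generic command helper, the shell form is db.runCommand with the command name as the first key; most drivers expose a similar command/runCommand method. A sketch using convertToCapped (collection name and size hypothetical):
    // convert an existing collection to a 100000-byte capped collection
    db.runCommand({ convertToCapped: "log", size: 100000 });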
[23:59:59] <ra21vi> is mongo recommended as a spatial/geo db for a 10% write and 90% read load scenario (geo queries, mostly radius searches)...