[00:07:52] <simpleAJ> hi.. I was looking at the db.createCollection API. It has a size field. Suppose my collection is not a capped collection and I didn't specify the size field.. how much space does mongodb allocate for that collection?
[01:54:43] <jadeLA222> Question: doing a nested lookup for the first time in RoR/Mongoid, and not sure what I’m doing wrong. Anyone spot something wrong w/ this syntax?
[02:05:57] <sx> would there be much performance difference for storing 1000 properties in one document vs 1000 documents w 1 property each
[02:08:43] <Frozenlock> Any suggestion on how to get $near functionality for timestamps? My crude approach would probably be using $lte and $gte, sort them and take the closest to my timestamp.
[02:12:34] <joannac> sx: if the 1000 properties are logically connected, what would be the point of separating them?
[03:15:11] <in_deep_thought> can someone explain the ObjectId to me? I have objects in my database that are autogenerated like this: db.images.insert({ "source" : "www.easyvn.net--45-best-real-world-full-hd-wallpapers--007.jpg", "_id" : ObjectId("534de25fce66b4cc1ab8fbe5"), "score" : 0, "__v" : 0 }) the ObjectId field throws everything off because I can't add it to other dbs, because it's not valid json. then when I manually strip off the object id I get an error
[03:15:11] <in_deep_thought> when doing findbyID. what is the deal?
[03:17:06] <cheeser> the shell has an enhanced syntax that supports creating ObjectIDs, dates, etc. like that.
[03:17:15] <cheeser> and, no, it isn't valid json.
[03:17:33] <cheeser> you can convert it to a document like { $oid : "534de25fce66b4cc1ab8fbe5" } though
[03:20:29] <in_deep_thought> cheeser, so in the database itself, should the documents look like {"blah":5,"_id":ObjectId("jl3l4js234234jlk")} or should they look like {"blah":5,"_id":"jl3l4js234234jlk"}
[03:37:35] <in_deep_thought> why is it that when I type typeof ObjectId("507c7f79bcf86cd7994f6c0e").valueOf() into try.mongodb.org it tells me it's an object? I thought the whole point was that valueOf() returns a string
[03:38:13] <in_deep_thought> ObjectId("507c7f79bcf86cd7994f6c0e").valueOf() seems to return itself
[04:09:25] <in_deep_thought> cheeser, Sorry my IRC quit. In any case, would a query of item._id return ObjectID("jl3l4js234234jlk") or "jl3l4js234234jlk" ?
[04:12:40] <cheeser> depends on the document, of course.
[04:12:59] <cheeser> but assuming the document uses an ObjectID for its _id then you'd get an ObjectID
[04:13:32] <in_deep_thought> so with { "source" : "www.easyvn.net--45-best-real-world-full-hd-wallpapers--007.jpg", "_id" : ObjectId("534de25fce66b4cc1ab8fbe5"), "score" : 0, "__v" : 0 }, item._id would return ObjectId("534....")
[04:14:02] <in_deep_thought> and then I could do item._id.toValue to get the string itself?
[04:15:20] <cheeser> item._id.valueOf() will get you the string value
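In recent versions of the mongo shell (behaviour may differ in other shells and drivers):

    var id = ObjectId("507c7f79bcf86cd7994f6c0e");
    id.str        // "507c7f79bcf86cd7994f6c0e" -- the raw hex string
    id.valueOf()  // also the hex string
    id.toString() // 'ObjectId("507c7f79bcf86cd7994f6c0e")'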
[04:16:22] <in_deep_thought> I have read that one of the whole points of MongoDB is that it stores everything in JSON format so that it is easily accessible. Is this not true when it comes to the autogenerated _id values?
[04:16:56] <cheeser> no, it stores documents in BSON
[04:17:16] <cheeser> the shell has an extended syntax to make certain things nicer, e.g.
[04:18:39] <in_deep_thought> so everything is encoded in BSON, not just the _id field?
[07:58:24] <kali> in_deep_thought: please use a pastebin next time
[07:59:00] <kali> i don't understand your question.
[07:59:01] <in_deep_thought> kali, sorry I accidentally copied the paste text instead of the url
[08:00:21] <in_deep_thought> http://bpaste.net/show/N0XZSHsBq3j4WOTCZ1Ke/ this is my object. I want to return that long string. what is the way to do this? it looks like to me it might be Item._id.$oid but I am not sure if that is right? and then would I have to do .valueOf() on the result of that to get a string?
[08:02:07] <kali> in_deep_thought: what language are you using ? { $oid: } is just a json conversion trick, the _id is actually an ObjectId, so you should look for the interface of the ObjectId class in the language you're using
[08:03:36] <in_deep_thought> I am using javascript. It says that the convention is ObjectID.str to get the string out
[08:04:25] <in_deep_thought> ok, so the whole $oid thing is just a convention?
[08:05:07] <kali> yes. it's a way of encoding bson documents in json. json has fewer types than bson, so...
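For example, the same document in the shell's native notation and serialized as extended JSON:

    // as the shell displays it, using shell helpers for BSON types
    { "_id" : ObjectId("534de25fce66b4cc1ab8fbe5"), "score" : 0 }
    // the same document as (MongoDB extended) JSON
    { "_id" : { "$oid" : "534de25fce66b4cc1ab8fbe5" }, "score" : 0 }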
[08:07:39] <in_deep_thought> do I have to call .str on the other objects I want to get? like if I want to get the url after source in that paste, would I need item.source.str?
[08:08:43] <kali> i don't think so, but my knowledge of js has its limits
[08:17:49] <bushart> Hi everyone. I have 60 million documents, and I'm trying to group them by an indexed field.... But it takes a veeeeery long time...
[08:18:04] <bushart> How can I speed up this process?
[08:19:41] <bushart> Does it generally use the index for groups?
[08:58:41] <rbnb> Are writes to the oplog journaled before being made visible to secondaries?
[09:01:35] <kali> bushart: it's "kali", not "kail" :)
[09:02:30] <kali> bushart: i don't know, i haven't spent enough time on 2.6 to get a good feeling for what the optimizer knows how to do yet. but explain() should help
[09:03:28] <kali> bushart: then depending on what you're doing in the match and group, a composite index might be the solution, but you'll have to show me what you're doing
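A sketch of what that can look like, with hypothetical collection and field names. In 2.6 only a leading $match/$sort can use an index; the $group stage itself cannot:

    db.events.ensureIndex({type: 1, ts: 1});
    db.events.aggregate([
      {$match: {type: "click", ts: {$gte: ISODate("2014-01-01T00:00:00Z")}}},  // can use the compound index
      {$group: {_id: "$type", n: {$sum: 1}}}                                   // runs over the matched docs
    ]);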
[09:56:12] <Guest53043> while importing a json document into mongodb i got this error
[09:56:13] <Guest53043> exception:BSON representation of supplied JSON array is too large: code FailedToParse: FailedToParse: Date expecting integer milliseconds:
[09:56:32] <Guest53043> can anyone help me out with a solution
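One common cause, assuming the file uses extended JSON dates (field name hypothetical): the 2.6-era import tools expect $date values as integer milliseconds since the epoch, not as a string.

    { "created" : { "$date" : 1397634973000 } }            // parses
    { "created" : { "$date" : "2014-04-16T08:36:13Z" } }   // "Date expecting integer milliseconds"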
[10:56:54] <tinix> kiiinda the same thing, almost. :P
[11:05:57] <tinix> anyone know if redis does that type of coercion? i'm already using it for something else... if i can go w/o having to setup solr that'd be nice.
[11:08:19] <tinix> i guess i'll use an update hook or something to keep the RDB up-to-date
[11:08:56] <Nodex> redis treats everything as binary afaik
[11:09:17] <rbnb> Are writes to the oplog journaled before being made visible to secondaries?
[11:22:21] <rbnb> I guess what I want to know is: does this image (reasonably) accurately portray MongoDB replication? http://daprlabs.com/blog/wp-content/uploads/2014/04/mongodb_repl.png << only one secondary is shown in that picture for simplicity
[11:36:02] <kali> rbnb: the "ok" is not necessarily sent that late, it depends on the writeconcern
[11:36:28] <Derick> rbnb: not anymore, writes have changed in 2.6
[11:37:12] <Derick> the rest looks fine, and what kali says
[11:42:26] <rbnb> Thanks, guys. Assuming that I want a writeconcern which gives a high consistency guarantee, when would it send? Would it be after journaling or application? I would assume after journaling is sufficient
[11:45:44] <rbnb> Derick, can I confirm that the only change in the flow for 2.6 is that getLastError is effectively implied by the write operation itself?
[11:46:42] <rbnb> i.e, insert() then getLastError() in 2.4 is equivalent to insert() in 2.6
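A sketch of the two patterns rbnb is comparing (collection name hypothetical):

    // 2.4 style: legacy write op, then an explicit acknowledgement request
    db.orders.insert({a: 1});
    db.runCommand({getLastError: 1, w: "majority"});

    // 2.6 style: the write command carries the write concern and returns the acknowledgement itself
    db.orders.insert({a: 1}, {writeConcern: {w: "majority"}});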
[11:47:30] <kali> rbnb: there is no general answer to that. it's a matter of infrastructure reliability and risk assessment. ack with w=majority has some benefits for instance.
[11:48:24] <rbnb> That's fair enough, kali. We will assume I always pass w=majority
[11:53:11] <rbnb> That seems to be the case from the release notes
[11:54:05] <kali> rbnb: my point is, with w=majority, you *don't* wait for journaling
[11:54:29] <rbnb> kali, you apply the write to the data before journaling it?
[11:56:17] <kali> rbnb: (fyi, i'm not a mongodb developer) my understanding is that in "ack" mode, the write goes to the journal, but you don't wait for the journal to be committed to disk
[11:56:47] <kali> rbnb: and if i understand correctly, that's what http://docs.mongodb.org/manual/core/write-concern/ says
[11:57:38] <rbnb> w=majority -> "enough systems are aware of this write and therefore you may as well consider it successful, even though it hasn't actually been made durable"
[11:58:29] <rbnb> why only "majority ack" and no "majority journaled" write concern mode?
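There is no single "majority journaled" mode in 2.6, but the write concern fields can be combined; note that j:true only waits on the primary's journal. A sketch (collection name hypothetical):

    db.orders.insert(
      {a: 1},
      {writeConcern: {w: "majority", j: true, wtimeout: 5000}}
    );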
[11:58:44] <rbnb> Also... the diagram scares me when it shows "apply" before "journal"
[11:58:54] <rbnb> Imagine if disk filesystems did that....
[11:59:03] <kali> because it's faster, and the risk of having more than one replica failing is low
[12:00:32] <rbnb> This is how the GFC happened... a bunch of random events were modeled independently, but there was little thought put to the correlation between them... like if the DC has a power shortage just after acking :P
[12:01:02] <rbnb> strong tongue-in-cheek, but there's some truth to it... the option to have a more durable write concern seems useful
[12:01:15] <kali> rbnb: as i said, it's a risk vs performance compromise
[12:02:11] <kali> rbnb: some apps can tolerate higher latency, some others can tolerate occasional miswrites
[12:02:48] <kali> rbnb: the good thing with mongodb is, you don't even have to choose at the application scope. you can pick your durability policy request by request
[12:02:58] <kali> rbnb: and that is f***ing great :)
[12:03:27] <rbnb> That's true, but for the apps which can tolerate higher latency, there is no "bet the business" option
[12:04:11] <rbnb> For the occasional "this write has to happen or we're f****d" operation, there doesn't seem to be a "truly durable" write concern level
[12:04:22] <rbnb> w:majorityJournaled would be good
[12:05:04] <rbnb> also, this image scares the shit out of me: http://docs.mongodb.org/manual/_images/crud-write-concern-journal.png
[12:14:38] <rbnb> The whole system would be a heck of a lot faster & also more reliable if it was based on a distributed consensus algorithm (paxos, raft) rather than log shipping
[12:20:02] <Derick> beginning of page level locking
[12:20:18] <rbnb> page level locking will help a lot
[12:21:08] <rbnb> but it still seems that every write to a primary causes 4 things to be written on that primary rather than 2 (journal oplog, write oplog data, journal target, write target data)
[12:21:35] <Derick> those writes are into *memory* though
[12:21:58] <rbnb> Derick, which ones of those are in mem? Don't they all get persisted?
[12:22:43] <Derick> sure, they're backed by disk - they're memory mapped files
[12:22:55] <Derick> but the writes you speak of don't necessarily hit disk
[12:27:29] <rbnb> But they do all need to hit the disk for data to be durable (and they will all be in separate files, iirc, so four write ops) - or am I mistaken there?
[13:09:52] <sweb> does mongodb by default accept all connections from outside the machine?
[14:20:05] <bushart> I'm trying to reproduce this example: http://docs.mongodb.org/manual/core/aggregation-pipeline/#pipeline-operators-and-indexes but explains doesn't show that index is used for $group stage. Does it mean that index isn't used, or maybe explain just doesn't show it to me?
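In 2.6 the aggregation explain is requested as an option on aggregate() itself; a sketch with hypothetical names. Only an initial $match/$sort can use the index, so explain will only report index use for those early stages, never for $group:

    db.events.aggregate(
      [
        {$match: {type: "click"}},
        {$group: {_id: "$type", n: {$sum: 1}}}
      ],
      {explain: true}
    );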
[14:41:20] <Lujeni> and u have enough memory for your working set i guess
[14:44:01] <Jadenn> joannac: i didn't realize it yesterday, but yes the $and was supposed to be there, i was trying to replicate WHERE (friendUid = <id> AND confirmed = 1) OR uid = <id>
[14:44:21] <Jadenn> or maybe it wasm't, but yeah :P
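That WHERE clause maps onto $or with an implicit (or explicit) $and; a sketch with hypothetical names:

    db.friendships.find({
      $or: [
        {friendUid: userId, confirmed: 1},   // implicit AND
        {uid: userId}
      ]
    });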
[15:12:37] <Jadenn> luckily mongo takes mere seconds to change data so i'll just go in and foreach everything in it
[15:14:22] <Joeskyyy> Just playin' around with stuff here to preface, I know elasticsearch does more what I'm looking for, BUT. Anyone know if you can do a partial text search using the new fancy $text op?
[15:14:37] <Joeskyyy> i.e. "coffe" and "coffee" return similar documents
[15:14:49] <Nodex> I think $text has stemming in it
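$text in 2.6 does use stemming for supported languages, but it matches stemmed whole words rather than prefixes or substrings, so it is not a partial-match search. A sketch with hypothetical names:

    db.posts.ensureIndex({body: "text"});
    db.posts.find({$text: {$search: "coffee"}});   // also matches inflected forms such as "coffees"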
[15:49:20] <Nodex> you said you wanted all of them
[15:49:36] <Nodex> [16:44:16] <dragoonis> I need rows that have both 2845 and 2846
[15:50:07] <Nodex> none of your rows have both answer_id's because they're not an array
[15:50:17] <dragoonis> Nodex, I believe I need an array then.
[15:50:56] <dragoonis> Nodex, so response_id, should have an array of answer_ids ?
[15:51:20] <Nodex> i'm very confused as to what you want now
[15:51:47] <Nodex> db.answer_to_response_map.find({answer_id: { $in: ["2845", "2846"] } }).count(); <--- that will get you "WHERE answer_id = 2845 OR answer_id=2846"
[15:53:20] <Nodex> if your answers and responses are always integers then you should cast them as ints - you will save some space and sorting will work better
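For "has both 2845 and 2846" over an array field, $all is the operator (shown here with integer values, per the casting suggestion above):

    db.answer_to_response_map.find({answer_id: {$all: [2845, 2846]}}).count();
    // $in, by contrast, matches documents having either value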
[17:42:19] <boutell> in thanks for an answer earlier, here is node code to verify the server is at least mongo 2.6:
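A shell-side equivalent of such a check (a sketch, not boutell's node code):

    var v = db.serverBuildInfo().versionArray;   // e.g. [2, 6, 0, 0]
    if (v[0] > 2 || (v[0] === 2 && v[1] >= 6)) {
        print("server is at least 2.6");
    }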
[19:48:41] <Dimrok> I'm having trouble with a request, so I would like to know if you'd mind giving me a hand? I want to atomically find or modify a document.
[19:49:54] <Dimrok> My case is: I want to get a field from a document (a User that I get from users.find_one({'email': email})). The problem is, it's a new field that I want to add if it's not present in the document.
[19:57:40] <Dimrok> @saml yeah but how do I distinguish users that already have the field (to use this one) ? Is your answer to do something like: users.update({"email": email, field: Null}, {"$set": {field: The_New_Value}}) every time?
[20:03:27] <Dimrok> I'm scared of: -a race condition / -a cpu-hungry function. Let me explain: I want to compute a special hash per user, based on their email and some other data, if they don't have one. The problem is that if I use the python driver, it will compute the hash before knowing whether the {"$exists": False} is positively evaluated by mongo.
[20:04:27] <Dimrok> So I'll probably have to use a non-atomic request, but that's ok. :)
[20:05:56] <saml> Dimrok, not clear what you're trying to do
[20:07:35] <tscanausa> Dimrok, why dont you start off with what problem are you trying to solve and then we might be able to help?
[20:09:18] <Dimrok> Sorry: > Imagine this database: [ {"email": "foo@bar.io"}, {"email": "bar@foo.io", "hash": "4242"} ]. I want a function that returns the hash field of the user (computing it if not present). So: method("bar@foo.io") will directly return "4242", and method("foo@bar.io") will compute the hash, store it in the document and return it.
[20:11:49] <Dimrok> So what I really want is an atomic function that find_OR_modify my document (and lazily evaluate the update argument).
[20:12:42] <kali> Dimrok: well, you don't have much choice but to write exactly what you describe. your computation has to be application side, as mongodb has no provisions for doing that kind of stuff, so you'll need two requests anyway
[20:13:44] <kali> Dimrok: as long as your hash is a function of other fields of the document, your compute-and-store branch will be idempotent, so the worst that can happen is having two clients compute the value at the same time, and getting the same result
[20:15:13] <Dimrok> In fact, as long as my parameters are the same, the hash function should be idempotent so it should work.
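A sketch of that two-step, idempotent flow in the shell ("hash" and computeHash() are hypothetical names):

    var user = db.users.findOne({email: email});
    if (user.hash === undefined) {
        var h = computeHash(user);   // application-side computation
        db.users.update({email: email, hash: {$exists: false}}, {$set: {hash: h}});
        user.hash = h;               // if another client raced us, it computed the same value
    }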
[20:29:48] <aaronds> Hi I'm trying to get the id of a newly inserted object using the Java API. It looks like I should be able to use getUpsertedId(); on the returned WriteResult object but it seems to always return null. Running server 2.6 so this should work. Anyone got any ideas?
[20:32:14] <kali> aaronds: the usual approach is to set the _id yourself. just call new ObjectId(), that's exactly what the client does anyway
[20:35:40] <aaronds> Ah ok, use that on the insert so I've already got a reference to it kali?
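A sketch of the pattern in the shell (the Java driver's new ObjectId() behaves the same way; collection name hypothetical):

    var id = new ObjectId();                       // generated client-side, known before the insert
    db.things.insert({_id: id, name: "example"});
    print(id.str);                                 // no need to read anything back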
[21:30:30] <saml> I have two databases: articles and images (not in the same database). can I do a query like: find all articles whose image is larger than certain size? article example: {img: 'foo/bar.jpg', _id: 'a/b.html'} image example: {_id: 'foo/bar.jpg', width:100, height:200}
[21:31:29] <saml> currently, I do articles.articles.find({img:{$exists:1}}) and for each doc, query images.images.find({width:{$gt:100}})
[21:37:40] <tscanausa> do your images have meta data about the article?
[21:44:05] <saml> tscanausa, what do you mean? images have image metadata. articles read from it too
[21:44:14] <saml> to render image credit, alt text... etc
[21:44:29] <saml> i think i should duplicate those into the article
[21:44:45] <saml> but different team is managing images. image metadata.. etc
[21:45:16] <saml> so when they update image metadata, need to find all articles that use the image and update metadata on the article, too
[21:59:23] <tscanausa> so if your image collection has a field called article then it is really easy to do; otherwise you need to loop through all of your articles, collect your image ids, and then go fetch the info from the images collection
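A sketch of that loop, using the database/collection names from the conversation and the width filter as an example threshold:

    var imagesDB = db.getSiblingDB("images");
    db.getSiblingDB("articles").articles.find({img: {$exists: 1}}).forEach(function (article) {
        var img = imagesDB.images.findOne({_id: article.img, width: {$gt: 100}});
        if (img) {
            printjson(article._id);   // article whose image is wider than the threshold
        }
    });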
[23:23:52] <LetterRip> I'm doing a $project and the field isn't appearing, here is my code and an example data and my output
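One common cause worth checking: $project only passes along fields that are explicitly included (plus _id unless it is suppressed), and nested fields need the full dotted path. A minimal sketch with hypothetical names:

    db.items.aggregate([
        {$project: {name: 1, "meta.score": 1}}   // anything not listed here is dropped
    ]);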