[00:15:27] <MacWinner> GothAlice, do you know if wiredtiger will have any benefits for gridfs collections? (assuming the files being stored are not zipped or encrypted)
[00:15:46] <MacWinner> or more sepcifically, will wiredtiger treat gridfs collections the same as any other
[00:16:30] <GothAlice> MacWinner: Well, yes. Many file formats compress exceptionally well, for example, most PDFs and Microsoft Office documents. (Microsoft chose to go with UTF-16 encoding, thus for English text every other byte is wasted. ;)
[00:16:34] <MacWinner> if for instance I have 1000 files stored which are 1MB each, and they have exactly the same content and MD5 hash
[00:16:48] <GothAlice> I don't see why it'd make a distinction. GridFS is just a protocol on top of MongoDB, at the application layer.
[00:16:57] <GothAlice> Well, no, it can only optimize so far.
[00:17:01] <GothAlice> It's not document-aware compression.
[00:17:39] <MacWinner> so it's just compressing each document data individually?
[00:18:13] <GothAlice> Very likely, and that'd mean a new compression dictionary for each document. Either that or it's done at the block level, in which case there can be savings within a single stripe…
[00:18:22] <MacWinner> no gains to be made if the data is repeated across files? I vaguely recall that wiredtiger works on repeating data very well
[00:18:47] <MacWinner> wasn't sure if it's the repeating part that makes it work well.. or just the fact that log data compresses well individually
[00:18:53] <GothAlice> They may have a different strategy to store the compression dictionary, if so, it can benefit from dataset-wide deduplication.
[00:19:18] <GothAlice> Log data is typically text-based, thus very low information density.
[00:20:40] <GothAlice> (And numerical logged data typically has large numbers of empty high bits.)
[00:21:18] <MacWinner> trying to find more info on the specifics to understand better
[00:21:29] <GothAlice> When in doubt: source code.
[00:22:21] <MacWinner> i'm hoping the wiredtiger storage engine config may give some clues first
[00:23:11] <MacWinner> "MongoDB supports compression for all collections and indexes using both block and prefix compression"
[01:57:20] <matthavard> I don't know if this is the right question to ask, but how much disk space would a 10 character string take up in mongo? Would it just be 10 bytes like if I wrote 10 characters in a file?
[01:58:06] <matthavard> Thanks GothAlice checking that out
[01:59:38] <GothAlice> At a minimum: identifying mark (1 byte), single character field name (letter + null byte), 4-byte length, 10 bytes of text, terminating null, terminating null = 19 bytes. Doesn't cover the surrounding document.
[02:03:22] <matthavard> an opening brace takes up more space than a closing brace?
[02:03:42] <GothAlice> It's not literally { turning into 4 bytes. Read that instead as "the start of a document".
[02:03:52] <matthavard> oh ok yeah that makes sense
[02:04:16] <GothAlice> (Because BSON is safe for use as a network protocol, certain variable-length structures must be prefixed by an int32 that says "there are this many bytes after this length marker".)
[02:04:45] <matthavard> oh wow that's neat. and here I am using it for some dumb crud app
[02:04:46] <GothAlice> Classically referred to as "Pascal strings".
[02:05:26] <GothAlice> But… to be compatible with C and most other languages, strings in BSON are *also* "C strings", i.e. terminated with a null character.
[02:06:33] <matthavard> But you're saying the int32 header contains the number of bytes in the rest of the document?
[02:07:00] <matthavard> Anyway, so if I added more characters to that string, am I just adding onto the 18 or is there some more fancy math?
[02:07:22] <GothAlice> That way your network code can simply: length = (int)socket.read(4) — get the length, then document = socket.read(length) (in p-code)
[02:07:36] <GothAlice> You're just adding more bytes to that initial 18.
[02:08:20] <matthavard> Ok cool. So if a document consisted of the _id field and a key with a 100,000 character string as a value, I could just think of that as taking up 100kb on disk
[02:12:23] <GothAlice> [4 byte length] [field type byte] ["_id" + null] [12 byte ObjectId] [field type byte] ["a" + null] [4 byte length] [10KB string + null] [null] — I believe that'd be the tokenized form of the resulting BSON document; might be missing something somewhere, though. ;)
[02:18:39] <GothAlice> matthavard: Clearly, it's getting late. You'll also need to take into account automatic snapping of document storage space to powers of two. I.e. a 900 byte BSON document is allocated 1024 bytes on disk. This has an impact if you're storing data near the edge of one of those powers. (I.e. 10KB would get 16KiB of allocated space—attempting to store 16KiB of text would result in a 32KiB document on-disk!)
[02:45:06] <hemmi> Hey there. I am interested in storing UUIDs in mongo and associating browser fingerprints between them into parent UUIDs. Could I use something like the built-in mapreduce to do this any easier?
[03:21:04] <hemmi> Hey there. I am interested in storing UUIDs in mongo and associating browser fingerprints between them into parent UUIDs. Could I use something like the built-in mapreduce to do this any easier?
[05:01:05] <hemmi> Hey there. I am interested in storing UUIDs in mongo and associating browser fingerprints between them into parent UUIDs. Could I use something like the built-in mapreduce to do this any easier?
[05:09:24] <Boomtime> hemmi: are you describing a join? mongodb does not have joins, you will either need to embed relevant sub-documents/arrays or use two queries
[05:18:54] <hemmi> Boomtime: I'm referring to having a list of new UUIDs and fingerprints that'd I'd like associate with UUIDs with matching fingerprints in mongo. Sorry for the wording I'm a little unsure how to say what I want.
[08:37:26] <eren> hello, I'm looking for HA mongodb installation. I've digged throuhg official documentation but it's still vague how to setup HA.
[08:38:22] <Gevox> Hi, i'm trying to retreive a certain field from a document. I have found that i can do using .find
[08:38:27] <Gevox> here is the line i'm using User.userDBCollection.find(query, new BasicDBObject("password", true).append("_id", false))
[08:38:34] <eren> the application only accepts one IP address to connect to. So, if this address fails, how will it failover to another node? I'm thinking about keepalived or is there any mongo way to achieve this failover?
[08:38:43] <Gevox> however, this returns DB cursor instead of just a field.
[08:39:23] <Gevox> eren: I don't know about HA, but are you trying to achieve fault tolerance?
[08:40:44] <KekSi> you could use keepalived or haproxy to a bunch of mongos instances
[08:41:48] <KekSi> or just have a local mongos on all the boxes you're running your app on so its unlikely to die unless both die, or tell your app about multiple mongos instances, or ...
[10:07:00] <pamp> I have string type fields with more than 255 characters. and I need to create an index on them. When creating the index received a "key is too large to index". How can I do this. I really need this field and indexed.
[10:48:15] <leporello> How to create array fields in mongoose?
[10:48:24] <leporello> I've described them as [String
[10:49:17] <leporello> ], but when i use Account.create({somefield: ['a', 'b', 'c']}) it doesn't save arrays
[10:54:34] <Diplomat> Hey! I have a strange issue.. I made a readwrite user for my database, but for some reason I'm not able to auth my user.. is there some kind of FLUSH PRIVILEGES; alternative for mongodb? or I'm doing something wrong?
[10:54:46] <Diplomat> Fatal error: Uncaught exception 'MongoConnectionException' with message 'Failed to connect to: localhost:27017: Authentication failed on database 'articles' with username '{USERNAME WAS HERE}': auth failed'
[11:17:03] <spuz> What is the difference between Bulk.find.replaceOne() and Bulk.find.updateOne()?
[11:17:29] <spuz> The documentation is not clear: http://docs.mongodb.org/manual/reference/method/Bulk.find.updateOne/
[11:19:28] <leporello> spuz, seems like replace creates document with new id
[11:22:11] <spuz> leporello: so if I always have an entire document to update, I could use replaceOne and if I only had certain fields to update I could use updateOne? I guess that makes sense.
[11:23:06] <leporello> spuz, yes, seems so. With replace you'll have only new fields and new id. With update id and all other fields left same
[11:45:47] <snowcode> hi, is it possible to perform a query to get all objects within a range of hours? I've two properties startHourMinutes and endHourMinutes in my schema, and I would to select all objects where the current time is in this range (currentTime > startHourMinutes && currentTime < endHourMinutes.)
[11:49:02] <wawrek> I am using map-reduce and I could not run indexOf on a list of ObjectIds. within the map function I wanted to check if an item is in a list of ObjectIds. The proper way is to use javascirpt's indexOf, but it does not work. did someone run into this problem?
[12:33:06] <pamp> its possible querying fileds by the lenght of the string?
[12:33:55] <pamp> something like ths db.foo.find({"field.length:{gt:1024}"})
[12:35:32] <joannac> snowcode: exactly the way you would expect? db.collection.find({startHourMinutes: {$gt: DateHere}, endHourMinutes: {$lt: DateHere}})
[12:35:53] <krisfremen> pamp: you can do something like find({$where:"this.field.length > 1024"})
[12:35:57] <joannac> pamp: nope. have a naother field and store the legnth of the string yourself
[12:38:15] <snowcode> joannac: seems to be a date comparisor...or not? i would to compare only the time component of the date, not the date itself
[12:40:12] <joannac> snowcode: I don't understand what you're asking
[12:41:21] <snowcode> I'm not interested in compare datetime, only time. So I should save "15:00" or the entiredatetime, but in both cases my comparison is only for time "15:00" > "12:00" && "15:00" < "16:00"
[13:39:44] <fscd> i'd expect that findOne would find the object (it's the same but the array elements have been reversed and the keys inside those elements are reversed too)
[13:58:19] <fscd> using $all should match the order inside the array
[13:58:40] <fscd> > db.order_test.findOne({"a": {$all:[{"x":"3", "y":"4"}, {"y":"2", "x":"1"}]}}) --> this is null too
[13:59:07] <fscd> > db.order_test.findOne({"a": {$all:[{"x":"3", "y":"4"}, {"x":"1", "y":"2"}]}}) --> this gets it right, because x and y are on the same order
[14:00:14] <fewknow> I believe you can use a.x:3 and a.y:4 or a.x:1 and a.y:2
[14:00:52] <fewknow> in your current findOne you are requesting a specific order
[14:02:16] <fscd> how should i be requesting an object whose 'a' array has both{x:1, y:2} and {x:3, y:4} without specifying order on the array nor element keys?
[14:12:55] <fscd> fewknow: this works, is there any better way? > db.order_test.findOne( { $and : [ {a: {$elemMatch: {y:'4', x:'3'} } }, {a: {$elemMatch: {y:'2', x:'1'}}}] })
[14:21:54] <leporello> fscd, what about { $and : { 'array.x': '4', 'array.y': '3'}, {'array.x': 1, 'array.y': '2'} } ?
[14:22:07] <leporello> Just reading about something similar
[14:26:00] <fscd> dot notation might be better in one-off queries, but i'm generating these queries
[14:26:21] <fscd> i want to match an object with some other identical objects (except order of elements in arrays and order of keys in those elements)
[14:26:38] <fscd> so i think elemMatch is better for generating each query
[14:28:34] <matthavard> Hi. How do I translate this pymongo client connection to the terminal command to start mongo with the same inputs as the pymongo call? https://pastebin.mozilla.org/8830840
[15:23:15] <fscd> is there a better way than generating { $and: [ {$elemMatch: ... }, ...]} of checking if a document exists without specifying order on array elements or array element keys?
[17:34:58] <DragonPunch> say i have an array, how do i find all the items for each element in my array
[17:42:13] <pagios> hi, how can i flush a collection?
[18:13:51] <GothAlice> find().update_one() doesn't make much in the way of sense. :/
[18:14:16] <GothAlice> That translates to "update the first record in the collection".
[18:15:09] <GothAlice> I also assume your Ruby ODM/DAO translates the argument to update_one into $set operations?
[18:16:51] <GothAlice> pagios: Yeah, okay, you're not doing a real update there. The data you're passing in is all wrong. When issuing updates, you need to tell MongoDB explicitly what you want it to do. I.e. {$set: {field: value}}, simply having {field: value, …} won't cut it.
[18:17:08] <GothAlice> For examples, reference: http://docs.mongodb.org/ecosystem/tutorial/ruby-driver-tutorial/#updating
[18:21:47] <GothAlice> That's getting there. :) That should now correctly set those values, or create a new record containing them if no other records can be found. Note that you're still attempting to update every record in that collection—your find() isn't limiting to any unique value, such as "arn".
[18:22:16] <pagios> thats mainly to create a new field
[18:22:32] <pagios> and if it is already there, to update it
[18:22:38] <GothAlice> Well, again, you're selecting _all records_ (find()), but only updating the _first_ (update_one()) — this does not make sense.
[18:24:33] <GothAlice> Then the query you are building there makes no sense: you are attempting to update one out of all users, without specifying which user…
[18:25:03] <pagios> GothAlice: i am specifying the user wit hthat arn no?
[18:25:15] <GothAlice> Not at all: that arn is in the update portion, not the query portion.
[18:26:24] <GothAlice> What this would do is find the user matching that :arn, and set the :token and :subscribedTo values. If a user doesn't exist matching that :arn, it'll create a new one, set the :arn, then perform the update on that.
[18:26:26] <pagios> so if nothing is found upset will take the stage
[18:27:27] <GothAlice> See also http://docs.mongodb.org/manual/reference/operator/update/setOnInsert/ if there are any values you want to set on create, but not on update.
[18:28:49] <GothAlice> pagios: You might also want to add "upsert" to your system's dictionary. That auto-correct to "upset" is… mildly amusing. ;)
[18:36:32] <GothAlice> Don't keys in Ruby require a : in front of them? (One is missing on $set. Additionally, $ may be special, requiring that key to be quoted.)
[18:36:54] <GothAlice> Likely a problem on the upsert bit, too.
[18:38:11] <GothAlice> Actually, looking at the examples http://docs.mongodb.org/ecosystem/tutorial/ruby-driver-tutorial/#updating — there are a number of minor syntax changes you need to make.
[18:45:22] <pagios> result = client[:'userTable'].find(:arn => @subscriberARN).update_one( '$set' => {:token => subscriberToken , :subscribedTo => @topicTypes } , { :upsert => true } ) GothAlice still not there, thats weird
[18:59:50] <GothAlice> Alas, I'd first have to learn a new programming language. ;) The syntax still isn't balanced; I just have no idea what the appropriate syntax is for an update_one in Ruby that also has options. No options? No surrounding braces. Options? Maybe surrounding braces?
[19:55:36] <GothAlice> cheeser: Well, there are some notable examples of that not being the case. XD EVE Online's "boot.ini" problem was an interesting one. I think there was a linux package somewhere that had an extra space in the installation script turning rm -rf /usr/share/<package> into rm -rf /usr /share/<package>…
[19:56:27] <GothAlice> Two cases where waiting a bit for some extra perfection would have saved tens of thousands of computers from destruction. XD
[20:02:40] <deathanchor> ah I can use unwind for my purpose
[21:33:37] <LurkAshFlake> I get this php error: Fatal error: Class 'MongoClient' not found when i wrote $mongoClient = new MongoClient(); how can i check if i install correctly (i never worked with database before)
[21:37:34] <LurkAshFlake> i wrote extension=mongo.so
[21:44:55] <saml> after changing storage engine to wiredtiger, mongo connector fails to send a doc
[23:09:16] <derbie> hi! var userSchema = mongoose.Schema({ local : { email : String, password : String, }, details : { ssn : String, firstName : String, lastName : String, }}); What's the most efficient way to get a list of unique lastName s from the collection?
[23:10:14] <derbie> i basically need to loop through all the documents don't i ?
[23:14:34] <joannac> the shell has a distinct() method
[23:14:39] <joannac> not sure about mongoose though
[23:17:21] <derbie> The information i'm trying to get is something similar to "find a single most recent entry of all unique lastNames in the collection"
[23:18:10] <derbie> I'm wondering if because of the schema this will be inefficient
[23:18:35] <derbie> And maybe i should have another collection with unique lastnames
[23:20:00] <derbie> "find a single most recent entry of each unique lastNames in the collection"