[00:03:18] <joannac> wsmoak: if you have auth off and ports open, anyone can connect
[00:03:39] <wsmoak> okay, good to know. I will go learn what “localhost exception” actually means.
[00:03:53] <joannac> wsmoak: if you have auth on but no users, anyone can connect on localhost and have full access. Anyone connecting from outside will need to auth, except they can't, because there are no users
[00:04:03] <GothAlice> wsmoak: It means that after enabling authentication, if there are no users added yet you will be given the chance to connect and add one, but only from localhost.
[00:04:20] <GothAlice> (That's the intent of the feature. ;)
[00:06:45] <GothAlice> An important note: make your first user a full admin. If you don't, after adding the user you'll be required to authenticate… and won't be able to do anything. ;)
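A minimal sketch of that first localhost connection, assuming MongoDB 2.6+ (the user name and password are placeholders):

```javascript
// In the mongo shell, connected from localhost via the localhost exception:
use admin
db.createUser({
    user: "admin",                            // hypothetical name
    pwd:  "choose-a-strong-password",         // hypothetical password
    roles: [ { role: "root", db: "admin" } ]  // full admin, per the advice above
})
```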
[00:11:05] <wsmoak> I need query help. From this: https://gist.github.com/wsmoak/679c25268d21fcb8a13a How do I get (for each) the *max* date out of that inside array of documents ?
[00:11:19] <wsmoak> or is the data model just done all wrong to begin with ?
[00:11:35] <GothAlice> wsmoak: An aggregate query could give you the answer you seek.
[00:13:41] <GothAlice> You are free to perform whatever query you need to before finally finding the max.
[00:14:45] <wsmoak> it starts out as coll.find(“type” => “breeding”).each { |x| … }
[00:15:01] <wsmoak> so I’m looking at one document at a time.
[00:15:13] <wsmoak> oh, do another query on that document ?
[00:15:45] <GothAlice> "how do I get the *max* date" was your question, no? The problem with the find().foreach approach is that you will be needlessly transferring data from the server to your application, the majority of which is simply thrown away.
[00:16:47] <wsmoak> the question was … How do I get (for each) the *max* date out of that inside array of documents ?
[00:16:56] <wsmoak> sorry I may not be asking the right questions just yet :)
[00:18:37] <GothAlice> wsmoak: You have fixated on one potential solution, the inferior of two solutions given the questions and examples you have provided. Why, exactly, do you need the results back for iteration using foreach? (Again, this will be hideously inefficient. Is this a homework question?)
[00:19:04] <GothAlice> wsmoak: Additionally, which language are you using? Is that Ruby?
[00:19:53] <wsmoak> yes that is ruby. no it’s not homework. More like My First Sinatra and mongoDB App.
[00:21:17] <GothAlice> wsmoak: Then I feel it's doubly important to learn to perform tasks in the "correct" (optimum) way instead of potentially learning bad habits. "How do I 'each' across this?" is a generic Ruby question (and one which I can't actually help with, #ruby may be more useful). "Getting the max" is a MongoDB question, and an aggregate query is the Right Way… even if you're doing other aggregation. That's why the aggregate framework exists.
[00:24:20] <GothAlice> db.coll.aggregate([{"$match": {"type": "breeding"}}, {"$project": {"date": 1}}, {"$group": {"_id": 1, "maximum": {"$max": "$date"}}}]) — this is the mongo shell / JS / Python way of doing it, including your initial query and an optimization limiting processing to just the "date" field.
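The same pipeline, spread out for readability:

```javascript
db.coll.aggregate([
    {"$match":   {"type": "breeding"}},                      // the initial query
    {"$project": {"date": 1}},                               // carry only "date" forward
    {"$group":   {"_id": 1, "maximum": {"$max": "$date"}}}   // one max across all matches
])
```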
[00:25:07] <wsmoak> I possibly don’t understand aggregate. (the docs I found started talking about map reduce and such.) But I don’t want the max across all the breedings in the database.
[00:25:47] <GothAlice> wsmoak: Then what is your query? I can certainly help formulate it (and the query will be useful in both find() and aggregate() operations; the syntax is the same).
[00:26:11] <wsmoak> I updated the gist so you can see where it gets used. https://gist.github.com/wsmoak/679c25268d21fcb8a13a
[00:26:25] <wsmoak> for each of the breedings, I need to know the last (max) exposure date
[00:28:24] <wsmoak> yes, they do get $push’ed in there. so go with that.
[00:28:56] <wsmoak> I will just have to be disciplined about data entry :)
[00:29:04] <giuseppes> ok, I'm trying to create a user but createUser throws
[00:29:11] <giuseppes> "talking to an old (pre v2.6) MongoDB server");
[00:29:56] <giuseppes> the weird thing is that it was working the first time, but I realized I had typed the wrong password, so I interrupted it
[00:30:56] <GothAlice> wsmoak: db.coll.find({"type": "breeding"}, {"exposures": {"$slice": -1}}) — you'll have to translate this into Ruby, though. When you .each across these results, the "exposures" array will only contain the last entry appended.
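If append order ever can't be trusted, an aggregation computes the true per-breeding max instead; this sketch assumes each entry in the "exposures" array carries a "date" field:

```javascript
db.coll.aggregate([
    {"$match":  {"type": "breeding"}},
    {"$unwind": "$exposures"},                  // one document per exposure
    {"$group":  {"_id": "$_id",                 // regroup per breeding
                 "lastExposure": {"$max": "$exposures.date"}}}
])
```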
[00:31:20] <giuseppes> also mongo --version returns 2.6.5
[00:32:14] <GothAlice> wsmoak: I avoid needing funny projections by adding a "latest" date field to my top-level documents, then having the operation that $push'es a sub-document to the array also $set the top-level "latest" date. Then "latest" is always available.
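A sketch of that combined operation, with hypothetical field names:

```javascript
// One atomic update: append the sub-document and refresh the denormalized
// "latest" date in the same operation, so the two can never drift apart.
db.coll.update(
    {"_id": breedingId},
    {"$push": {"exposures": {"date": exposureDate}},
     "$set":  {"latest": exposureDate}}
)
```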
[00:32:25] <GothAlice> giuseppes: What does mongod --version return on the server?
[00:32:59] <giuseppes> GothAlice, sec, the problem disappeared after two restarts
[00:34:20] <wsmoak> GothAlice: interesting. so duplicating data like that is not frowned upon here?
[00:34:36] <GothAlice> wsmoak: It's not a duplicate; not really. :)
[00:35:12] <wsmoak> well… it’s already stored inside the array, and it could get out of sync, so yeah, duplicate :)
[00:35:23] <xxtjaxx> Hi! I have a live-ticker-esque app where I want to show data to the customer that just came in. This should(tm) work with a capped collection and a tailable cursor, sending off new items to the clients upon request. However, I also want to store the data in an uncapped collection for further analysis. Is there a way to make this possible without inserting each document twice (once in the capped and once in the uncapped collection)?
[00:36:15] <wsmoak> (and thank you for the query help GothAlice )
[00:36:43] <GothAlice> wsmoak: If you always $set that value when you $push the more complete record, it won't get out of sync.
[00:37:20] <GothAlice> wsmoak: And it's not really a duplicate. Each child record has its own date… but you have a *separate* value (that you want, likely frequently!) that represents the "latest" one. That's a different thing, no duplication.
[00:39:27] <wsmoak> GothAlice: thank you, very valuable. I did multi-value long ago, and then some SQL in between … it’s been fun letting the data model for this app emerge (even if I get stuck sometimes).
[00:40:19] <wsmoak> and *so* much easier to change when I realize I’ve left something out or done something entirely wrong!
[00:43:31] <probablyafish> Hello! Had a question about using data from mongodump that's located on an external hard drive. I don't have enough space for all of the data on my local HD. If I specify --dbpath to the external HD when I run mongorestore, and have set --dbpath to the external HD as well in the mongod process, should I be good to go? Do I need to be worried about mongorestore copying data to my local hard drive?
[00:47:28] <GothAlice> xxtjaxx: No. You'll have to issue two inserts. I do this in my distributed RPC system, built on capped collections. It's doubly important for my data, since the data describing a "job" may be large, but the notifications about new jobs being added and old jobs completing are basically just {added: ObjectId(…)} or {finished: ObjectId(…), success: true}.
[00:47:47] <GothAlice> (The latter being the data sent to the capped collection.)
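A sketch of the two-insert pattern in the shell, with made-up collection and variable names:

```javascript
// The full document goes to the uncapped collection for later analysis…
var id = ObjectId();
db.ticks.insert({"_id": id, "payload": tickData, "ts": new Date()});
// …and only a tiny notification goes to the capped collection being tailed.
db.ticks_feed.insert({"added": id});
```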
[00:49:35] <xxtjaxx> GothAlice: Thank you very much.
[00:50:07] <GothAlice> probablyafish: You should be good. Note that there are some other files MongoDB might want to create, for example log files, a process ID file (if running in the background), etc. You may wish to double-check the locations of these files. (If configured to append to the log, it could potentially grow without restriction.)
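For example (paths here are hypothetical), keeping everything mongod writes on the external drive:

```sh
mongod --dbpath /mnt/external/db \
       --logpath /mnt/external/log/mongod.log --logappend \
       --pidfilepath /mnt/external/mongod.pid \
       --fork
```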
[00:51:58] <probablyafish> @GothAlice: Thanks! Also, for the --filter option for mongorestore, am I able to use query operators? e.g. something like --filter '{"field": {"$regex": ...}}'
[00:52:35] <nitestryker> I know this is sort of a PHP question, but I can't get an answer from anyone over in the other chan. I was hoping someone could help me out. I need to know the proper way to create a new collection in PHP
[00:52:59] <GothAlice> nitestryker: Inserting your first record should automatically create the collection.
[00:53:42] <GothAlice> probablyafish: That's a good question. I'm not sure. The documentation doesn't state if that's a full query descriptor or just a simple == matching.
[00:58:15] <GothAlice> Why is that a problem? (Other than being hardcoded.)
[00:58:23] <nitestryker> I think it should be $collection = $database->userp_name
[00:59:00] <GothAlice> I always recommend using descriptive names, where possible. If this is a collection of data about users, call the collection "users".
[00:59:32] <GothAlice> (And never again be confused.) Note that your first example (using a dynamic collection name), while useful in some circumstances, is not useful here.
[01:02:29] <nitestryker> let me try that and see if it fixes the problem
[01:02:36] <GothAlice> Pastebin your code, please.
[01:03:46] <GothAlice> (The reason I said no is that you either copied and pasted that from elsewhere, or typed it manually into IRC; it isn't your actual code. Pasting lines that aren't actually in your code leads to confusion.)
[01:06:49] <GothAlice> (In the grand scheme, too, you don't want to createCollection all the time; it's pointless, since inserting a record will create the collection anyway. It *may* be worthwhile to check if it exists, and if it doesn't, create it and add the indexes. But I'm seeing here the MongoDB class doesn't provide a function to check if a collection exists, other than getCollectionNames.)
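In shell terms (the PHP driver calls are analogous; the collection and index are made up), creating an index is itself enough to create the collection, and ensureIndex() is a no-op when the index already exists:

```javascript
db.users.ensureIndex({"email": 1}, {"unique": true});  // creates "users" if absent
db.users.insert({"email": "alice@example.com"});       // the collection already exists
```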
[01:08:35] <GothAlice> *Anything* is better than PHP.
[01:09:27] <nitestryker> well… Perl or Python could be used for this, but it's just kind of messy
[01:09:41] <GothAlice> https://gist.github.com/amcgregor/18e3180215736369b49e is an example of Python declaring a DB signal handler that updates timestamps automatically, and a sample document schema that uses it. (Using the MongoEngine package for Python.)
[01:09:49] <LouisT> GothAlice: so PHP is better than PHP? interesting
[01:13:13] <nitestryker> unless you have your server set up for Perl
[01:13:51] <nitestryker> and ASP is the worst language on the planet
[01:14:37] <GothAlice> LouisT: You remember register_globals? (I.e. all GET/POST/SESSION vars get injected into the global scope) It's still there in the handy little http://php.net/parse_str function. It takes an optional second argument to populate an array, cool. Still populates the current scope even if you supply it. They recommend passing it $_SERVER["QUERY_STRING"] — register globals. And, my favourite little PHP oddity, the second variable you pass
[01:49:56] <GothAlice> In my case the ipython shell. (A much improved Python REPL with full syntax highlighting and pretty-printing by default.)
[01:50:13] <wsmoak> I was going to ask about pretty-printing.
[01:50:41] <GothAlice> bpython is another, lighter-weight alternative that does the ncurses thing to provide a text-mode GUI, and for interactive debugging I use pudb (same ncurses UI).
[01:50:54] <GothAlice> Unless it's web, then WebError interactive HTML exceptions are nifty.
[01:51:00] <wsmoak> so, maybe an improved irb for ruby… will look.
[01:51:18] <GothAlice> Indeed; when you use a data access layer, it's a good idea to use it for administration, as well.
[01:51:44] <GothAlice> (A lot of my collections have events that are only triggered Python-side, like automatic updating of modification timestamps, that using another tool wouldn't replicate.)
[01:54:13] <wsmoak> what about naming… I (of course) have “mydb” right now. And I have a collection for this app… do I call the collection “nivens” (the app’s name) or “rabbits” (what’s in the collection) ?
[01:54:28] <GothAlice> Does your collection only manage one type of document?
[01:54:53] <wsmoak> no, it has rabbits, breedings, and litters
[01:55:13] <GothAlice> (It's unusual to only need one collection for an entire app. While possible, mixing multiple types of documents into one collection can be very confusing to develop/debug.)
[01:55:50] <wsmoak> oh, I saw examples of documents that used {“type” => “type-of-thing”, … } and went with that
[01:56:37] <wsmoak> okay, so perhaps database=nivens and three collections
[02:02:23] <GothAlice> wsmoak: Most often having a "type" value is used when you want to have "inheritance" (the equivalent of SQL's concrete single-table inheritance, just way more efficient).
[02:09:44] <GothAlice> In MongoDB, you'd simply omit the fields that don't matter.
[02:10:33] <GothAlice> You'd then use "type" in your own application to determine each record's type. Read the tree from bottom to top as: "Bowler is a type of cricketer, which is a type of player."
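One common encoding of such a tree (MongoEngine does something similar with its "_cls" field) stores the full inheritance path in each document, so a single equality query matches a class and all of its subclasses; the collection and values here are illustrative:

```javascript
db.players.insert({"type": ["Player", "Cricketer", "Bowler"], "name": "Jane"});
// Matches bowlers and every other cricketer subtype, since the query
// value only has to appear somewhere in the "type" array:
db.players.find({"type": "Cricketer"});
```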
[02:13:33] <wsmoak> yeah… I was using it for entirely separate (but related) kinds of things, like players, balls, [some other sports-related thing]
[02:13:41] <GothAlice> (rabbits are not a type of breeding; litters aren't a type of rabbit ;)
[02:14:05] <wsmoak> so those go in different collections
[02:29:40] <wsmoak> GothAlice: thanks for all your help. those changes to the db made the code much cleaner.
[09:45:08] <dorongutman> I have a document with several dates in it - createdAt, updatedAt, startDate, endDate, approvedDate
[09:45:51] <dorongutman> I will need to create an index on a few of them - mainly startDate and endDate
[09:47:23] <dorongutman> since they are all dates, I thought about grouping them in a sub-document, like {_id: “abc”, “dates”: {startDate: …, endDate: …, createdAt: …, updatedAt: …, approvedDate: …}}
[09:47:53] <dorongutman> so from an application point of view, it’ll be a bit easier to manage them in my mapper
[09:48:14] <dorongutman> but I’m afraid it won’t be a good thing from a performance point of view
[09:48:28] <dorongutman> specifically because of the indexes
[09:48:58] <dorongutman> does the sub-document matter in terms of indexes ?
[10:51:03] <kali> dorongutman: it won't make much of a difference
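Indexes address fields by dotted path, so the nesting only changes the key names; a sketch using dorongutman's proposed layout:

```javascript
db.coll.ensureIndex({"dates.startDate": 1, "dates.endDate": 1});
// Queries must use the same dotted paths to be able to use the index:
db.coll.find({"dates.startDate": {"$gte": ISODate("2015-01-01")}});
```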
[13:33:13] <wsmoak> now you can write up a tutorial :)
[13:34:49] <giuseppesolinas> wsmoak, I wish I had the time!
[13:37:30] <giuseppesolinas> now I'm having an issue though: it looks like I'm not able to save a timestamp in mongo
[13:38:07] <wsmoak> giuseppesolinas: pffft. keep notes on a gist while you’re working. clean up formatting. done.
[13:38:44] <wsmoak> what language? in ruby it’s Date.new(year,month,day).to_time.utc
[13:40:42] <giuseppesolinas> wsmoak, node.js. I create an object with all my post's data, then save the post with that as an argument. I usually add info on conditions, but if I create an object with a timestamp as an attribute, it doesn't get saved to the db
[13:41:40] <giuseppesolinas> If I take a look at the object before it gets saved I can see my timestamp, but looking at the object in the db all attributes are there except the timestamp
[13:42:15] <giuseppesolinas> wsmoak, I wish I could work with ruby, maybe I'll be able soon ;)
[13:44:05] <wsmoak> heh, and I’m trying to get to node.js next. just using sinatra to develop the data model and api because it’s simple.
[13:44:52] <giuseppesolinas> wsmoak, next project I'm hopefully using spree ;)
[13:45:04] <wsmoak> maybe your timestamp isn’t in the right (internal) format? that’s my guess.
[13:45:22] <wsmoak> I’d think there ought to be an error logged somewhere though…
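With the node.js driver, the usual fix for this is handing it an actual Date object rather than a formatted string or a custom type the mapper silently drops; a minimal sketch with made-up names:

```javascript
var post = {
    title: "My first post",
    timestamp: new Date()   // a real Date is stored as a native BSON date
};
collection.insert(post, function (err, result) {
    if (err) throw err;
});
```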
[14:00:26] <giuseppesolinas> wsmoak, also something prevents me from using "createdAt" as an attribute name
[16:16:22] <the_drow> Is there a way to specify multiple db paths if I use the directory per database configuration?
[16:16:39] <the_drow> Someone on the mailing list suggested symlinking but I don't think that's a good idea
[20:13:16] <probablyafish> Hello! Was wondering what the easiest way is to convert output from mongodump to valid JSON… my mongodump .bson is ~95GB. I used bsondump, but the file still has a bunch of ObjectId("...") and Date("...") things, so I can't parse it. Then I thought I'd just use mongorestore and restore from the .bson file, but it's building the index and it's been > 24 hrs. I want to extract some data from the .bson file.
[20:46:30] <chx> hi. is there a way to say "update all fields except this one"?
[20:59:36] <giuseppesolinas> chx, just avoid including the field in your statement
[21:00:26] <giuseppesolinas> if your schema has properties a, b, c, call save with an object that has only properties a and b
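Worth noting: the shell's save() is a full-document replace, so omitting c that way would actually delete it. An update with $set only touches the fields it names and leaves the rest of the document alone (a sketch, field names made up):

```javascript
// Updates a and b; c keeps its current value.
db.coll.update({"_id": someId}, {"$set": {"a": 1, "b": 2}});
```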
[22:12:02] <ningu> I'm trying to bulk-load some data into mongodb using cayley (a graph database that can use mongodb as a store), but after loading the first ~44GB or so I get this error in the mongodb logs: SocketException handling request, closing client connection: 9001 socket exception [SEND_ERROR] server [127.0.0.1:41297]
[22:13:30] <ningu> the database is all stored in 20 or so files in /var/lib/mongodb, so I don't think it's the number of open files... but could it be something else?
[23:31:25] <bunbury> cheeser: it makes sense to do that though, right? Google business bandwidth limits would be exceeded each day doing IMAP queries
[23:36:09] <bunbury> found this https://developers.google.com/resources/api-libraries/documentation/gmail/v1/java/latest/com/google/api/services/gmail/model/MessagePart.html
[23:38:19] <bunbury> and this https://developers.google.com/gmail/api/
[23:42:20] <cheeser> depends on what you want to do
[23:46:17] <bunbury> be able to find messages that bounced because people's email addresses don't exist, or the message was delayed