[00:59:53] <Boomtime> also, why are you using findAndModify instead of update?
[01:01:03] <rndmsh> From what I can tell, this library tries to find docs (which in this case represent jobs) to process and locks them once it finds them, to prevent other workers from processing the same job.
[01:06:51] <zamnuts> MongoDB has the $nor logical query operator. What is the reasoning for not having a $nand logical query operator? use case: http://stackoverflow.com/q/20810698/1481489
[01:11:00] <zamnuts> another use case (more appropriate): https://groups.google.com/d/topic/mongoose-orm/2UnZf2LtKkk/discussion, notice the "solution" uses $ne on specific fields - what if that query is complex?
[01:16:42] <joannac> zamnuts: file a server ticket
[01:17:06] <zamnuts> joannac, was going to, figured i'd ask here first :)
[01:39:58] <zamnuts> Boomtime, to get the modified documents after update, which is helpful with things like $set or $setOnInsert, esp in conjunction with multi/upsert
[01:40:42] <zamnuts> i like being able to benefit from mongodb's internal logic to apply the query/update BSON to a document and get the consolidated/rendered document
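For reference, a minimal mongo-shell sketch of the pattern zamnuts describes (collection and field names are invented, not from the discussion): with new: true, findAndModify returns the document as it looks after the update is applied.

    db.jobs.findAndModify({
        query: { status: "queued" },                                  // find an unclaimed job
        update: { $set: { status: "locked", lockedAt: new Date() } },
        new: true                                                     // return the post-update document
    })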
[01:46:36] <Boomtime> but if you test 1000 findAndModify concurrently, versus 1000 update/find concurrently, you will likely see the update/find combination doing better
[01:46:55] <Boomtime> it might even be very significant
[01:47:14] <Boomtime> findAndModify might be the right thing to do, it depends on your use case, you should test it
[01:47:17] <zamnuts> based solely on lock duration?
[02:17:38] <Boomtime> remember though that $ne/$nor/$not have fairly terrible performance characteristics - you should use them only if there is a crazy psychopath nearby demanding that you do
[02:19:33] <Boomtime> to avoid their use btw, you can usually come up with a field value which logically means what you want - i.e. cast all queries into a positive search ("is X" rather than "is not X")
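A small illustration of that advice (field names invented): store a value that can be matched positively and indexed, instead of negating.

    // negative form: $ne must consider every value, so indexes help little
    db.jobs.find({ lockedBy: { $ne: null } })

    // positive form: store an explicit state field and match it directly
    db.jobs.find({ state: "locked" })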
[02:19:47] <zamnuts> Boomtime, i'm aware of the perf problems with those, but i'm willing to trade that for ease of implementation on my application layer
[02:23:08] <Boomtime> it sounds like you have a good handle on this
[02:23:39] <zamnuts> granted i can't leverage intersected indices, but w/e
[02:24:38] <zamnuts> which is such a great 2.6 feature, i must say! i was sure missing them in 2.4
[03:31:08] <rh1n0> whats the preferred method for syncing a subset of records from one mongodb to a different mongodb? These are separate projects under the same company and are not identical. I was thinking i could easily export the data but i was hoping for more of a solution i could automate so it could run daily. Thanks for any ideas
[03:34:51] <Boomtime> rh1n0: have you considered a tailable cursor on the oplog?
[03:35:34] <Boomtime> there are lots of ways to do what you want as a once-per day process, but they all boil down to export/import, it just depends on what tools you choose to do it
[03:36:03] <Boomtime> a tailable cursor on the oplog is slightly more elegant than that, but requires a bit more thought/effort to implement
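A rough sketch of that approach in the legacy mongo shell (namespace and starting timestamp are placeholders; drivers expose the same options under different names):

    var lastApplied = Timestamp(1412800000, 0);  // placeholder: last oplog entry already synced
    var cursor = db.getSiblingDB("local").oplog.rs
        .find({ ns: "sourcedb.records", ts: { $gt: lastApplied } })
        .addOption(DBQuery.Option.tailable)
        .addOption(DBQuery.Option.awaitData);
    while (cursor.hasNext()) {
        var entry = cursor.next();
        // apply entry.op ("i"/"u"/"d") and entry.o to the destination deployment here
    }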
[03:36:30] <rh1n0> Boomtime: Thanks for the help. Tailing the oplog is an interesting idea. Especially if i needed to run this process multiple times per day. Ill check into that. Otherwise ill just automate the export ;) thanks again
[06:05:07] <sandelius> Can I get an honest answer here? Is MongoDB a valid choice for a project management system, CMS and applications like that? It's hard to get a "real" answer so I guess it's best to actually ask you guys here?
[06:12:59] <Boomtime> sandelius: you are basically asking if mongodb is a database, to which the obvious answer is yes. as with any deployment of any technology though, it is easy to deploy it badly, so the realistic answer becomes "it depends"
[06:13:59] <Boomtime> if you want a "real" answer to any question, the question needs to be less nebulous, so there is a chance of actually answering it
[06:14:57] <Boomtime> "could I (me personally) write a project management system backed by mongodb?", absolutely yes
[06:15:20] <Boomtime> could _you_ write a project management system backed by mongodb: i have no idea
[06:15:54] <sandelius> Boomtime I can, but my question was more: is it a good idea
[06:19:58] <joannac> sandelius: the answer is "it depends"
[06:49:42] <NodeJS> Is it possible to load javascript file into mongodb shell and get access to it via db.eval script?
[07:24:01] <NodeJS> Does mongodb support storing server-side javascript functions?
[07:29:01] <NodeJS> Boomtime: look at my SO question http://stackoverflow.com/questions/26020618/is-rethinkdb-useful-on-partial-updating-json-documents-according-rfc6902
[07:29:23] <NodeJS> I need to patch document and save it back
[07:30:36] <Boomtime> so use a single update with $set: http://docs.mongodb.org/manual/reference/operator/update/set/
[07:32:08] <NodeJS> Boomtime: it's not atomic, I need to fetch the document, check the revision, apply patches and save it back
[07:32:43] <Boomtime> so use a single update with $set: http://docs.mongodb.org/manual/reference/operator/update/set/
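One way to get the revision check without fetch-then-save (a sketch; docId, expectedRevision and the patched fields are placeholders): put the expected revision in the update's query so it behaves like a compare-and-swap.

    var res = db.docs.update(
        { _id: docId, revision: expectedRevision },  // only matches if nobody raced us
        { $set: { "qux.thud": patchedValue },
          $inc: { revision: 1 } }
    );
    // res.nModified === 0 => the revision moved on; re-fetch, re-apply patches, retry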
[07:37:08] <NodeJS> Boomtime: how could I move documents inside when applying move patch like the following: [{ "op": "move", "from": "/foo/waldo", "path": "/qux/thud" }] ?
[07:37:55] <joannac> NodeJS: I suggest you pastebin a before and after
[07:38:08] <joannac> I don't know what "move documents inside" means
[07:39:18] <NodeJS> Look at my nested JSON document in my question http://stackoverflow.com/questions/26020618/is-rethinkdb-useful-on-partial-updating-json-documents-according-rfc6902 please
[07:48:36] <Boomtime> There are examples on the "update" command showing even how to make multiple updates, both multiple changes to a single document and changes to multiple documents
[07:49:55] <NodeJS> Boomtime: I can't find those operations there unfortunately :(
[07:50:48] <joannac> NodeJS: your patches look like { "op": "move", "from": "/foo/1", "path": "/foo/3" }
[07:57:09] <joannac> I don't think it's the right thing to be doing in a database
[07:58:21] <NodeJS> @joannac: I'm implementing a collaborative system that stores a big nested JSON document. I need to update it the same way on the client side and server side
[07:59:16] <NodeJS> @joannac: each update is a bunch of patches that must be applied at once
[08:05:12] <NodeJS> @joannac: I'm using the https://github.com/Starcounter-Jack/JSON-Patch module and it looks good, but I want to reduce queries to mongodb, so I've started learning stored javascript functions in mongodb
[08:20:16] <NodeJS> @joannac: can you help me with my problem?
[08:44:26] <geekou> hello all! i've got a question: when using db.cloneCollection, it copies the source collection not to the current database
[08:44:33] <geekou> but to a database with the same name as the original one
[08:44:45] <geekou> is there a way to have it copied to the current database ?
[10:50:09] <Lope> I've heard that mongoDB likes to eat up all of the memory it can. That's a problem if I run it on my server with a bunch of openVZ VMs because it's going to starve the VMs of memory. Any suggestions? Should I put MongoDB in a VM and cap its memory?
[11:30:03] <Wh0> hi, i have a question, maybe any1 in here can help me out
[11:50:05] <wnkz> Hi, mongodb package seems to have changed to v2.6.5 but I can't find any release notes online ; does someone have some more information ? Is it safe to upgrade ?
[11:54:24] <Wh0> i found a bare metal orm for mongodb which should be blazing fast according to the description: https://www.npmjs.org/package/iridium
[11:54:49] <Wh0> when i run the benchmark.js included i get extreme differences between the mongo driver and iridium
[11:55:25] <Wh0> the benchmark on the page says it's 100ms slower than the mongo driver, but on my machine the mongo driver takes 200-300 ms and iridium takes 40000ms (40s) for the same task
[12:30:40] <wh0s_th3rE> can any1 help me with this module? https://www.npmjs.org/package/iridium
[12:33:59] <cheeser> please stop repeating your question over and over. if someone here *could* answer your question and was willing to, (s)he would have. repeating it only annoys everyone else.
[12:34:12] <cheeser> try waiting a few hours when it's not early morning for most of the US
[12:43:08] <wh0s_th3rE> whats with you cheeser? you cant help me?
[12:43:26] <wh0s_th3rE> maybe its a very very simple problem, but i dont know what i'm doing wrong
[12:58:14] <braz> @wnkz the announcements are normally sent out to this google group (https://groups.google.com/forum/#!forum/mongodb-announce) and this Jira query will give you all the updates for 2.6.5 (https://jira.mongodb.org/issues/?jql=project%20%3D%20SERVER%20AND%20fixVersion%20%3D%20%222.6.5%22%20ORDER%20BY%20priority%20DESC) whilst the release notes on the web page are being updated
[13:23:41] <electronplusplus> I'm unable to connect to a remote mongo by setting the env var MONGO_URL, why?
[13:43:40] <Lope> I'm trying to make a replicaset but getting an error: first I tried rs.initiate(); now I tried: rs.initiate({_id:'rs0',version:1,members:[{_id:0,host:'0.0.0.0:27017'}]},{force:true}); "couldn't initiate : can't find self in the replset config"
[13:47:18] <Lope> replSet can't get local.system.replset config from self or any seed
[13:52:21] <skot> you cannot use a host of 0.0.0.0, you need a real address.
[13:53:09] <skot> And the address you use must resolve to the server you are initiating on, hence the error message
[13:53:14] <Lope> which is the computer's hostname
[13:53:21] <Lope> if I type ping mongo it gets replies from itself
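For reference, a minimal initiate sketch using a hostname that resolves to the node itself (here the "mongo" hostname from the ping test above):

    rs.initiate({
        _id: "rs0",
        members: [ { _id: 0, host: "mongo:27017" } ]  // must resolve to this server, not 0.0.0.0
    })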
[13:54:48] <skot> I would suggest posting your shell session and logs to pastebin/etc to get help.
[13:55:35] <skot> i would also suggest connecting using the mongo shell with the exact name:port you are using to make sure you don't have a firewall rule or something...
[13:56:55] <Lope> is the mongo console normally on the same port as where you'd access the DB for other things?
[13:57:57] <skot> the mongo shell doesn't listen to any port, it is a client.
[13:58:23] <skot> The server has a single port used by all clients, including replication/sharding
[13:58:50] <Lope> I've changed the hostname to mongoserver now. so I tried mongo mongoserver:27017 and I got Failed to connect to 10.0.0.101:27017, reason: errno:111 Connection refused
[13:59:46] <skot> yep, I'd suggest you work with your sysadmin/network expert to figure out what you have configured
[14:00:50] <skot> If you know much about networking I'd suggest starting with making sure you don't have any firewall restricting access, that you started mongod bound to everything or the right address, etc. etc.
[14:01:46] <Lope> well, I'm running the console on the same VM as the mongod is running on. and there is no iptables installed.
[14:02:07] <Lope> so I'm thinking it's a configuration issue with mongo. listening on the wrong interface or some security setting.
[14:02:41] <skot> yep, I would assume you have it limited to localhost or something
[14:03:25] <skot> netstat -na | grep LIST will give you all listening ports.
[14:51:58] <tiso> hi, what's the best approach for writing mongodb scripts? in relational I used sql, the previous developer in mongo used javascript, but I'm not sure this is the best approach
[14:52:37] <GothAlice> tiso: Stored procedures and map/reduce/finalize functions in MongoDB must be written in JS.
[14:54:29] <skot> GothAlice: fyi, there are no stored procedures in mongodb
[14:55:03] <GothAlice> skot: Except that the linked page stores one in its first code example. ;)
[14:55:36] <skot> That isn't a stored proc, it is javascript text.
[14:56:46] <GothAlice> Seems like a rather trivial semantic differentiation, there. I note the example isn't quoted or anything, so that specific detail is fairly craftily hidden from the user.
[14:56:53] <tiso> I never really liked javascript, if I could avoid using it I would, but it seems mongo suggests using it
[14:57:36] <GothAlice> tiso: After my first dabblings in map/reduce, I immediately switched to aggregate queries, which can be performed from any language that MongoDB supports client connections from.
[14:57:39] <skot> GothAlice, it's a huge distinction and difference. Stored procs are used very differently from stored javascript.
[14:58:44] <skot> Generally stored procs are used for data validation, as an api for clients, for triggers/events and have significant transaction/internal support in relational databases
[14:59:04] <skot> None of those exists in mongodb...
[14:59:27] <tiso> so really anything you can do from a javascript script you can also do from any language in this list?
[14:59:57] <GothAlice> skot: With inclusion in $where evaluation and allowable use as a component in mapReduce or a simple db.eval() (which can cover the client api aspect) it's pretty close. Of course, you still shouldn't really be using them. Aggregate queries FTW. ;)
[15:05:49] <tiso> so theoretically I could build a python app and through the api I should be able to do anything that I can do with a js shell script? why is javascript in a different category in this list? http://api.mongodb.org/
[15:06:20] <GothAlice> tiso: Because JavaScript isn't a client language, it's the language MongoDB uses internally.
[15:06:36] <GothAlice> (Though of course there are Node.js bindings and whatnot.)
[15:06:59] <tiso> GothAlice: but regardless of this, I should be able to create a java app that does anything I can do with a shell js script
[15:07:11] <GothAlice> tiso: With certain exotic restrictions, yes.
[15:11:55] <GothAlice> tiso: Performance can be great even if you do the processing external to MongoDB itself. My MongoDB-as-RPC framework can churn through 1.9 million records (tasks) per second per host.
[15:22:03] <tiso> GothAlice: what does this mongodb-as-rpc do?
[15:24:18] <GothAlice> tiso: Have a readme: https://github.com/marrow/marrow.task#1-what-is-marrow-task :)
[15:26:18] <GothAlice> https://github.com/marrow/marrow.task/blob/develop/marrow/task/model/query.py?ts=4#L30-L58 is the real magic.
[15:27:25] <GothAlice> tiso: https://gist.github.com/amcgregor/4207375 — was my segment of a presentation on how we were abusing MongoDB at work.
[16:05:17] <Sawbones> Are there any good articles on rules for structuring my mongodb collections?
[16:14:22] <GothAlice> Sawbones: The MongoDB documentation itself contains well written tutorials on "thinking the MongoDB way" about a number of common data structures (like blog posts with comments, etc.)
[16:17:12] <GothAlice> Sawbones: The most difficult thing for my SQL buddies to grok is nested documents, i.e. storing replies to a forum thread within the forum thread. Mostly it's the unexpected ability to query, retrieve, update, and insert those nested documents which blows them away. ;)
[16:17:56] <Sawbones> GothAlice: Yeah, I'm rooted into the sql way to do this and I want to make sure I don't mess up my mongo setup
[16:19:49] <GothAlice> Sawbones: https://github.com/bravecollective/forums/blob/develop/brave/forums/component/thread/model.py#L99-L192 is the code my forum software uses to deeply query, retrieve, or update nested results, and other goodies like getting the first or last comment in the thread.
[16:46:05] <tiso> is there a better alternative to robomongo in windows?
[17:13:30] <jkretz> hi all - i've got a question about $elemMatch - specifically in versions greater than 2.2 if anyone is around
[17:13:51] <jkretz> i see in the docs that $elemMatch in versions > 2.2. now will only return the first match
[17:13:59] <GothAlice> jkretz: I use $elemMatch fairly frequently in my forum software. What's the Q?
[17:14:02] <jkretz> but i need to return all matches
[17:14:20] <jkretz> how do i get the behavior as described in the docs from pre 2.2 ?
[17:14:42] <jkretz> the student example there is exactly what i need
[17:14:45] <GothAlice> jkretz: AFAIK $elemMatch can't match multiples, however you can get the behaviour back by switching to aggregate queries and using $unwind prior to your $match.
[17:15:41] <jkretz> hrmmm i was hoping to avoid aggregate queries as i am a complete newbie to this, and unfortunately the tutorial i went through was out of date :\
[17:15:53] <GothAlice> Aggregate queries are one of the most beautiful things ever invented. ;)
[17:16:26] <jkretz> do you happen to know of any good examples using aggregation to pull sets of subdocuments out of an array?
[17:16:58] <GothAlice> db.students.aggregate([{"$unwind": "$students"}, {"$match": { ... your find() here ... }}])
[17:18:15] <GothAlice> $unwind generates one record in the aggregate per child record per top level record. (I.e. within the $match, $students isn't a list, it's one of the students at that school.)
[17:20:36] <jkretz> i've not used grouping - well i did, but again the tutorial i did was out of date (it was the tuts plus course FYI - if anyone asks similar questions)
[17:21:30] <GothAlice> jkretz: Most of our aggregate queries for reporting look roughly like that screenshot.
[17:21:50] <jkretz> does the ordering matter within the aggregate array?
[17:24:14] <GothAlice> I use $project to reduce the amount of information MongoDB needs to keep around and manipulate during intermediary steps, which in theory speeds things up, but I haven't bothered to measure it. (Most of our queries take a few tens of milliseconds.)
[17:26:41] <jkretz> again forgive my complete inexperience here (i literally installed mongo yesterday, coming from a few years of tinkering in microsoft sql.... shame)
[17:26:55] <jkretz> but in a grouping operation, for example on that student data set
[17:27:34] <jkretz> if i want to group by the zip code from the parent document
[17:28:52] <jkretz> i'm sure it's for the best, but you have no idea how elated i was when going through the videos in that course and they went through the (old) behavior of elemMatch
[17:29:13] <jkretz> i literally said "great! this is exactly what i need for my data!"
[17:29:26] <jkretz> but the behavior is no more :(
[17:29:35] <GothAlice> Heh, yeah. There are a number of times I'm absolutely positive something is possible and it turns out not to be the case.
[17:29:37] <jkretz> i'm sure aggregation is the right answer, and infinitely more flexible
[17:29:51] <GothAlice> Well, it has some limitations which should be covered in the introduction.
[17:30:00] <GothAlice> (16MB document size limit being the biggest one.)
[17:30:35] <GothAlice> The aggregate query that generates our "last two-weeks" vs. "two-weeks prior" hit comparison charts takes ~25ms when unfiltered—examining all client data—and the "all time" vs "last 30 days" aggregate to generate the "which languages are in use right now" chart takes ~18ms. ^_^ And yup, we're using MongoDB for analytics, amongst other things. Looots of aggregate queries.
[17:32:00] <jkretz> great stuff, my application is some internet of things temperature sensors
[17:33:13] <GothAlice> jkretz: Let me give you some extra links to third-party blog posts.
[17:33:32] <jkretz> that would be great - thanks so much for all the help!
[17:33:56] <GothAlice> http://www.thefourtheye.in/2013/04/mongodb-aggregation-framework-basics.html < student example
[17:34:18] <GothAlice> http://stackoverflow.com/questions/15938859/mongodb-aggregate-within-daily-grouping < a neat trick for extracting components from dates
[17:35:58] <GothAlice> http://www.nrg-media.de/2013/10/mongodb-aggregation-group-by-any-time-interval/ < URL is self explanatory ;)
[17:36:55] <GothAlice> And last (I promise!) is my favourite article ever about aggregation: http://www.devsmash.com/blog/mongodb-ad-hoc-analytics-aggregation-framework — this example uses temperature data from ocean buoys as the sample.
[17:37:23] <GothAlice> jkretz: These are the resources I used to kickstart myself into aggregate queries. :)
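The date-grouping trick those articles build on boils down to something like this (a sketch; collection and field names are invented): split the timestamp into components with the date operators, then group on them.

    db.readings.aggregate([
        { $group: {
            _id: { y: { $year: "$ts" }, m: { $month: "$ts" }, d: { $dayOfMonth: "$ts" } },
            avgTemp: { $avg: "$temp" },   // e.g. daily average temperature
            samples: { $sum: 1 }
        } },
        { $sort: { "_id.y": 1, "_id.m": 1, "_id.d": 1 } }
    ])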
[17:47:44] <stefandxm> theres no documentation there, just crap :D
[17:47:56] <GothAlice> Click the "Documentation" link (towards the bottom of the list of links), then click the top link on the list of versions, and explore. :)
[17:48:14] <GothAlice> Full disclosure: I don't use C++.
[17:51:36] <stefandxm> shit is up and running and valgrind isnt complaining
[17:51:45] <stefandxm> but id like to verify all my api calls etc
[17:51:52] <stefandxm> with the doxygen crap its not possible
[17:52:07] <stefandxm> tbh i am quite flabbergasted
[18:02:02] <geri> hi is there a way to convert an xml doc to json and insert it into mongo?
[18:05:00] <GothAlice> geri: Usually one would roll their own solution, as each DOM is different. https://gist.github.com/bufferine/1330475 is one example, in Scala.
[18:05:16] <GothAlice> (You could use simplexml in PHP, or lxml in Python to achieve the same result.)
[18:05:42] <geri> so all i need is installing scala?
[18:05:58] <GothAlice> geri: No, all you need is a way to load an XML file in your preferred scripting language.
[18:06:23] <geri> i thought this script generates a json doc
[18:06:56] <GothAlice> geri: The sample script I gave you will load an XML document and insert it into MongoDB.
[18:08:37] <GothAlice> geri: https://github.com/bravecollective/core/blob/develop/brave/core/util/eve.py#L52-L111 is an example of some processing code I use to transform inconsistent XML into Mongo-safe data, used here: https://github.com/bravecollective/core/blob/develop/brave/core/util/eve.py#L252-L268
[18:09:15] <GothAlice> EVE Online's API calls generate really bad output.
[18:09:39] <geri> GothAlice: but you could also use the scala script?
[18:09:43] <GothAlice> geri: It could be short in Python, too, but I really needed to clean that data up. If I had pristine input I could avoid all that.
[18:10:10] <GothAlice> geri: Indeed I could. The resulting structures would be slightly different, but yeah, it'd work.
[18:38:03] <jkretz> GothAlice - is there any way to get the performance stats of an aggregate call - similar to .explain() for find?
[18:40:43] <GothAlice> jkretz: I'm not sure. Generally I do so simply by cutting the aggregate query short. (I.e. time the first step, time the first and second step, …)
[19:19:22] <NoReflex> hello! I have some problems with indexes. for example if I query for documents newer than a given ObjectId it is fast and everything is well. but if I add a condition for the device that generated the document it does not complete; I also tried with hint({_id: 1}) but no luck
[19:20:20] <NoReflex> the only thing I can think of is that I'm not using the filter condition properly
[19:21:07] <NoReflex> for example this query locks up: db.messages.find([{"_id": {"$gt": ObjectId("5436db19a21fa71f35e046dc")}}, {"device": "some_device"}]).hint({"_id": 1}).sort({"_id": 1}).limit(10).explain(true)
[19:21:32] <NoReflex> but if I remove the condition for device it works OK
[19:23:25] <NoReflex> isn't a list of conditions the correct way to query for something like: _id GREATER THAN oid AND device is "some_device"
[19:28:01] <NoReflex> how can I query for documents like : field_a = value AND field_b > value ?
[19:28:17] <NoReflex> do I use list or a dictionary
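For the record, conditions that must hold together form an implicit AND and belong in a single filter document, not a list; the query above, corrected:

    db.messages.find({
        _id: { $gt: ObjectId("5436db19a21fa71f35e046dc") },
        device: "some_device"
    }).sort({ _id: 1 }).limit(10)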
[19:31:55] <ssarah> hei guys, so whats the easiest way to, on a default install, start a replicate set with ssl enabled?
[19:46:23] <ssarah> ok... what's anyway to do it ;D
[20:47:01] <kerms> mongo newbie here trying to figure out a query on embedded documents, anyone got a few minutes to help out?
[21:19:57] <OneAngryDBA> Is there a flag in the 2.6 YAML based config file for the "cpu" setting which existed in the 2.4 configuration file format?
[21:22:16] <tmwsiy> hi I am trying to do a date query where I just want to see if one date on the document is greater than another date on the same document. All of the examples i see about this are testing against a date parameter and not a field from the record. I am still pretty green but I cannot readily find the pieces for this
[21:24:53] <GothAlice> tmwsiy: Standard or aggregate query?
[21:25:28] <tmwsiy> I am not doing any grouping or summing or the like, so my inclination is to say standard
[21:25:47] <tmwsiy> trying to work my tabular mind into this paradigm :)
[21:25:56] <GothAlice> tmwsiy: And do you want to filter on the condition, or include it in the results?
[21:26:46] <tmwsiy> I just want to find all documents where the one date field is greater than another date field on the same document
[21:27:30] <GothAlice> Unfortunately $where is, AFAIK, the only way to do what you want with standard queries.
[21:27:33] <jiffe> mongodb work well as an in-memory nosql cache?
[21:27:46] <GothAlice> jiffe: It can, if you have enough RAM for the entire dataset.
[21:28:18] <GothAlice> jiffe: MongoDB will still use memory mapped on-disk files, but if they fit entirely in RAM you can still get the performance you'd expect from pure RAM.
[21:28:29] <tmwsiy> nice! thank you, I thought I could somehow do it with $gte
[21:28:52] <GothAlice> tmwsiy: It's important to note that $where can not use indexes and executes JavaScript, meaning it *will* slow your query down.
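A minimal example of the $where form (field names are placeholders): the comparison runs as JavaScript per document, so no index can satisfy it.

    db.sightings.find({ $where: "this.reported < this.occurred" })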
[21:29:24] <tmwsiy> gotcha, my dataset is like 120MB so its not a problem :)
[21:29:28] <GothAlice> tmwsiy: It's easier to do what you ask when using aggregate queries, but then there are a bunch of other things to be aware of (i.e. no more than 16MB of data returned per query.)
[21:29:49] <tmwsiy> I think this is the best way at my current level of understanding
[21:30:34] <GothAlice> tmwsiy: I've been using MongoDB for years and have never needed to resort to a $where clause, for comparison. ;)
[21:30:58] <GothAlice> (Usually it's a sign that you should try structuring your data differently.)
[21:32:19] <GothAlice> For example, if you have a creation date and a modification date, and you only want records that have been modified since creation it's much better to $exists or $not null it (i.e. don't store a modification date if it hasn't been modified) than try to compare if one is greater than the other.
[21:32:23] <tmwsiy> well this is just scraped web data and I am trying to figure out what appears to be a switched field with respect to these two dates, its more a one off sanity check
[21:33:24] <tmwsiy> I could easily just pull the dates out of the database and do this, just trying to learn mongo :)
[21:34:21] <GothAlice> MongoDB rewards good data design (and punishes bad design) more than most DB engines, I've found.
[21:46:36] <tmwsiy> yes, that $where does suck some cycles, huh?
[21:47:27] <GothAlice> Aggregate would avoid that entirely, but if more than 16MB of your dataset matched it'd explode somewhat hilariously.
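A sketch of the aggregate alternative (2.6-era operators; field names invented): compute the comparison in a $project stage, then $match on the result.

    db.sightings.aggregate([
        { $project: {
            occurred: 1, reported: 1,
            misordered: { $lt: ["$reported", "$occurred"] }  // boolean aggregation expression
        } },
        { $match: { misordered: true } }
    ])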
[21:49:52] <tmwsiy> as a side note working with ufo report data and it appears as if in about 16% of the reports the date the sighting was reported shows up as a date before the sighting occurred, :) maybe those folks are just that out there .. lol time shifting and what not
[21:50:14] <tmwsiy> or maybe it was a bug in the site
[21:51:52] <GothAlice> Heh; sounds like a fun dataset.
[21:52:06] <GothAlice> I work for an HR firm. Pretty uninteresting data from an abstract standpoint. ;)
[21:52:29] <GothAlice> Business critical, sure, but not very sexy. ^_^
[21:57:24] <tmwsiy> Yes, the bread and butter is rarely sexy. this is for a stats regression class, and this helps in the interest department for sure - not a flat file I can pull in directly like most everyone else has. I am going to have to do some more creative thinking to get 6 relatively non-correlated variables and a response to do the multiple regression stuff
[21:59:03] <GothAlice> tmwsiy: Actually, you could just do a straight import of CSV or equivalent tabular data. I've done that lots to migrate data out of SQL.
[21:59:42] <GothAlice> For the most part I prefer doing complex data processing outside of MongoDB, though obviously I make use of pre-filtering and whatnot, if possible.
[22:01:11] <tmwsiy> right this was easy peasy to set up for web scraping and I really like it for that! Also been using it for a greek language project that has been fun. all trivial dataset sizes, not at all where this sort of technology seems to really shine
[22:01:34] <GothAlice> Sometimes, though, simple things are made hard. https://gist.github.com/amcgregor/1623352 was the silliest map/reduce I've ever written. ^_^
[22:01:43] <GothAlice> Wish I had aggregate queries back when I wrote that.
[22:06:11] <jiffe> is it possible to disable automatic db/collection creation?
[22:06:46] <GothAlice> Not really. You can do tricky things with file permissions to prevent MongoDB from being able to create the files, but that's… ugly.
[22:06:54] <tmwsiy> heh, well for map reduce its pretty simple :)
[22:06:57] <GothAlice> jiffe: You could also do it by enabling authentication, I suspect.
[22:07:15] <GothAlice> tmwsiy: Yeah, but an entire map/reduce for want of a GROUP BY statement makes me cry tears of blood. ;)
[22:10:24] <jiffe> well I am trying to handle the event where a db/collection exists at one point and then doesn't exist such as could be the case with a fresh memory based instantiation, the collection would have to be recreated with proper indexes
[22:10:26] <GothAlice> tmwsiy: Especially after the "map/reduces store into actual collections by default" change. Wow my DB got cluttered fast until I updated the relevant code to clean up after itself.
[22:11:08] <GothAlice> jiffe: You'd likely have to trap that condition yourself, or simply run ensure_index before each block of calls.
[22:11:33] <GothAlice> jiffe: ensureIndex (ugh, camel) caches so it's pretty efficient if frequently called.
[22:12:24] <GothAlice> jiffe: Best would be, of course, to trigger such operations automatically during the creation of a new memory backed store, prior to any use by the application.
[22:12:48] <tmwsiy> yes I have done Hadoop and you run into similar stuff, only having to implement java objects with a ton of ways to shoot yourself in the foot right out of the gate :)
[22:12:50] <GothAlice> (My code runs ensureIndex across all indexes and all collections once each time the application starts up)
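A sketch of that startup pattern (index keys are hypothetical): ensureIndex is a no-op when the index already exists, so calling it on every boot is cheap.

    // run once at application start-up, before serving traffic
    db.jobs.ensureIndex({ status: 1, scheduledAt: 1 })
    db.jobs.ensureIndex({ owner: 1 }, { sparse: true })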
[22:13:26] <GothAlice> "I had a problem and thought ot use Java. Now I have an AbstractProblemFactoryObserver, and the original problem."
[22:14:16] <tmwsiy> I agree, wrote a lot of java but its json/python all the way for me now!
[22:15:18] <GothAlice> Our NLP code is Python running in Jython to make use of the many Java language libs. It's a bit of a frankenstein monster.
[22:16:20] <tmwsiy> hah, sounds fun.. embedded python that runs on jvm and clr is a great boon to the language, esp with all the CS departments switching to it
[22:16:51] <GothAlice> tmwsiy: For the most part, though, our production deployments are on pypy.
[22:16:56] <tmwsiy> actually having my first experience with nltk, so far it's pretty slick
[22:17:37] <GothAlice> nltk has some warts, but they're not too hard to work around. Our MongoDB-based fulltext indexer uses it.
[22:20:02] <GothAlice> https://github.com/marrow/contentment/blob/develop/web/extras/contentment/components/search/model.py#L37 is the old Okapi BM-25 ranking algorithm and boolean pre-filter (basically the exact same as Sphinx and most other dedicated packages for search ranking) I wrote for Mongo. :3
[22:21:21] <kerms> long time relational user trying to figure out a query in mongo, anyone got a second to help?
[22:21:33] <GothAlice> kerms: Ask, don't ask to ask. What's the issue?
[22:22:30] <kerms> i’ve got a master document, schema looks like: Master: { _id, details:[Detail]} where Detail: {_id, start, end, keywords:[string]}
[22:22:33] <kerms> i want to be able to search by the keywords on the detail
[22:22:37] <kerms> and get back the matches with their master records
[22:22:40] <kerms> and the start and end properties of the detail as well
[22:23:18] <imanc> is there a way to atomically assign a unique ID (objectID) to an embedded document?
[22:23:28] <GothAlice> kerms: A ha! {'details.keywords': keyword}
[22:23:41] <GothAlice> kerms: That'd be the approach using Python dict literals.
[22:23:42] <kerms> getting back the whole records isnt an issue, its getting the matches within that array of embedded documents that is giving me trouble
[22:25:08] <GothAlice> kerms: Ah. $elemMatch will only return one record in your case. I can highly suggest trying aggregate queries, since $unwind followed by a $match and a $group will do what you want.
[22:25:12] <GothAlice> (Let you filter sub-documents.)
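Against the schema kerms described, that pipeline might look like this (collection name assumed): unwind the details, keep only the matching ones, then regroup them under their master.

    db.masters.aggregate([
        { $unwind: "$details" },
        { $match: { "details.keywords": "some-keyword" } },
        { $group: {
            _id: "$_id",
            details: { $push: "$details" }  // only matching details survive;
            // other master fields can be carried along with $first if needed
        } }
    ])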
[22:27:23] <kerms> my only worry about unwind and group is performance - i’ve read about using project and setdifference in mongo 2.6: second part of accepted answer (http://stackoverflow.com/questions/25080013/mongodb-limit-response-in-array-property)
[22:28:20] <GothAlice> Indeed, adding $project operators to restrict the fields being manipulated to just those you need can save a lot of processing power.
[22:29:25] <GothAlice> http://cl.ly/image/1I1k2N263U24 is the common structure for most of my report generation queries.
[22:29:51] <kerms> i’ve got to run but thank you for the help goth!
[22:29:52] <GothAlice> (And yes, my entire db is optimized to use "shortest possible" keys.)
[23:33:53] <choke> https://gist.github.com/ch0ke/a86ed12e852e008c69e4 i have this criteria -- and it works... it returns 7 rows, but what i need to know is how can i modify this search criteria to limit the results to the latest result, and only 1 per element on the $in?
[23:37:12] <Boomtime> choke: you need to define what you mean by "latest result"
[23:37:58] <choke> we have timestamps in the form of: "locationTimeStamp":"2014-10-09 13:59:59"
[23:39:19] <Boomtime> ok, so you can sort on that to get the "latest"... i'm not sure what you mean by your other requirement either
[23:39:39] <choke> let me show you a gist example of what it does now.. and what i'm looking for
[23:43:15] <choke> https://gist.github.com/ch0ke/68cc2d0a38366c0ee38e there we go, had to do some formatting to make it easy to read
[23:44:54] <choke> we're searching off an array of locationUserKey -- need to get the latest ( locationTimeStamp) limited to 1 per locationUserKey
[23:45:41] <Boomtime> yep, are those separate documents, or an array contained in a single document?
[23:46:10] <Boomtime> I ask because your _id is not an ObjectID
[23:47:03] <Boomtime> anyway, whatever they are, a query can't do it for the general case but the aggregation pipeline can
[23:47:08] <choke> it is, but i'm running my queries through PHP and it removes the ObjectID portion... So each one is a separate document
[23:50:18] <Boomtime> aggregation pipeline, one sequence that would work is $match ($in) -> $sort (locationTimeStamp) -> $group (locationUserKey) and select using $first
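Spelled out with the field names from the gist (collection name assumed; $$ROOT, available in 2.6+, keeps the whole document):

    var userKeys = ["key-1", "key-2"];  // the same array used in the $in
    db.locations.aggregate([
        { $match: { locationUserKey: { $in: userKeys } } },
        { $sort: { locationTimeStamp: -1 } },            // newest first
        { $group: {
            _id: "$locationUserKey",
            latest: { $first: "$$ROOT" }                 // one document per key
        } }
    ])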
[23:56:48] <choke> okay so run a match and sort... on the collection, and group on the cursor?